tl;dr: The tokenmaxxing conversation is obsessed with burning tokens on the output side. The real bottleneck — and the real unlock — is on the input side: collecting and feeding your own context to the agent.
There’s a lot of talk about “burning tokens” these days. Generate 1T tokens per day. Keep those GPUs humming. Speed up decoding, parallelize subagents, hit your rate limits like a badge of honor.
And honestly? It works — if you’re wealthy enough, and smart about prompt engineering. Prompts like #autoresearch, #agent/goal, #loop-engineering — they’re all just prompts at the end of the day, but good ones that make the “burning tokens” job way easier than it used to be.
Then people realized: maybe we should care about the quality of the output, not just how fast we’re burning. Welcome to the era of token economics.
The Cost Problem Nobody Solved
Here’s the frustrating part: what you spend on tokens is approaching what you’d spend on a developer in Silicon Valley, a place where developers are already more expensive than anywhere else in the world.
Sure, you can use cheaper models like DeepSeek V4. But that’s not the real problem I want to talk about.
We want to be AI native. We want to rebuild everything with AI. We want to kick the human out of the loop. We want to build tools for AI, not just tools with AI.
But what’s actually stopping us?
The Real Bottleneck: Input, Not Output
Everyone’s focused on cranking up the output. Faster generation. More tokens. Bigger firehose.
But the most critical issue is the input. The context. Not the generation.
There are amazing things out there — autoresearch, gstack, superpower, you name it. But they’re all prompts. Great prompts, sure. The authors are brilliant. And I know it’s hard for most people to create prompts at that level, so copying them with whatever tools or skills you have is a smart move.
But ask yourself: in one year, will an agent still be unable to generate prompts like that on its own? Of course not. The ability to craft amazing prompts is not the gap that keeps us from the AI native era.
The real gap is: how do I collect all of my own context and give it to the agent?
Tokenmaxxing should be about the input, not the output.
The Human Token Ceiling
Here’s a number that should make you uncomfortable: how many tokens does a human generate per day?
Speech, typing, writing by hand — sum it all up. Maybe 20k. Maybe 100k. Maybe 200k on a good day. That’s it. We don’t scale. There’s a ceiling built into our biology.
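To make that ceiling concrete, here is a back-of-the-envelope sketch. Every number in it is my own assumption, not a measurement: the words-per-minute rates, the hours per day, and the rough 1.3 tokens-per-word ratio (which varies by tokenizer and language). Plug in your own day.

```python
from dataclasses import dataclass

# Assumption: a rough English average; the real ratio depends on the tokenizer.
TOKENS_PER_WORD = 1.3


@dataclass
class Channel:
    """One way a human emits text, with assumed rate and daily duration."""
    name: str
    words_per_minute: float
    hours_per_day: float

    def tokens_per_day(self) -> float:
        words = self.words_per_minute * 60 * self.hours_per_day
        return words * TOKENS_PER_WORD


def daily_tokens(channels) -> float:
    """Sum the estimated token output across all channels."""
    return sum(c.tokens_per_day() for c in channels)


# Hypothetical day: all rates and hours below are illustrative guesses.
CHANNELS = [
    Channel("speech", words_per_minute=150, hours_per_day=2.0),
    Channel("typing", words_per_minute=40, hours_per_day=2.0),
    Channel("handwriting", words_per_minute=15, hours_per_day=0.25),
]

if __name__ == "__main__":
    for c in CHANNELS:
        print(f"{c.name:12s} {c.tokens_per_day():>10,.0f} tokens/day")
    print(f"{'total':12s} {daily_tokens(CHANNELS):>10,.0f} tokens/day")
```

With these guesses the total lands near the low end of the range above. Even a very talkative day only multiplies it by a small factor, which is the point: the ceiling is biological, not a rate limit you can pay to raise.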
But how many of those tokens are actually feeding into the agent?
- You have an idea — then what?
- You meet someone interesting — then what?
- You sit through a meeting, read a great article, or skim some AI slop — then what?
- Yeah, you take notes every day. That’s good. But who’s your reader?
An interesting thing we neglect: humans learn primarily from visual data. But text is what changes the world. Not images. Poor humans can only generate text, and even that at a pathetically limited rate.
The Creepy Conclusion
I wish I had a company that sells microphones and cameras. Then I could sell you the most important devices for bridging the gap between you and AI native. Or maybe we should hurry up and buy stock in the companies that make them.
We need to collect what we speak, hear, write, and read. Use a microphone to capture your speech. Use a camera to record your day. Capture your screen. Not just text. Everything, everywhere, all the time.
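What would "collect everything and feed it to the agent" look like in the simplest possible form? Here is a minimal sketch, not a real product. It assumes the capture side (speech-to-text, screen OCR, and so on) already hands you plain text. The `ContextEvent` record, the source labels, and the 4-characters-per-token estimate are all hypothetical choices made for illustration.

```python
import json
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass
class ContextEvent:
    source: str       # hypothetical label, e.g. "microphone", "screen", "notes"
    captured_at: str  # ISO 8601 timestamp
    text: str         # transcript, OCR output, or note text


def estimate_tokens(text: str) -> int:
    # Crude assumption: about 4 characters per token. A real tokenizer will differ.
    return max(1, len(text) // 4)


def append_event(log: list, source: str, text: str, when=None) -> ContextEvent:
    """Append one captured piece of text to the in-memory log."""
    when = when or datetime.now(timezone.utc)
    event = ContextEvent(source, when.isoformat(), text)
    log.append(event)
    return event


def build_context(log: list, token_budget: int) -> str:
    """Pick the newest events that fit the budget, returned oldest-first as JSON lines."""
    picked, used = [], 0
    for event in reversed(log):  # walk from newest to oldest
        cost = estimate_tokens(event.text)
        if used + cost > token_budget:
            break
        picked.append(event)
        used += cost
    picked.reverse()  # restore chronological order for the agent
    return "\n".join(json.dumps(asdict(e)) for e in picked)
```

Usage is just `append_event(log, "microphone", transcript)` all day, then `build_context(log, token_budget)` whenever the agent needs to know what you have been up to. The hard part is not this code; it is being willing to run the capture side at all.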
That will be the creepiest moment. And we’ll be there.
Once we’re fully AI native.