noelbraganza.com

What free actually means

by Noel Braganza

A reflection on token limits, invisible ceilings and what happens when the cost of thinking drops to zero

March 27, 2026

Somewhere in the mid-nineties, a generation of people decided that the internet was mainly useful for sending email and looking things up. This wasn't a failure of imagination exactly. It was a failure of infrastructure. The internet they were using was slow, metered and expensive. You connected when you needed to. You disconnected when you were done. The idea that you might one day leave it running permanently in the background, streaming video while simultaneously talking to someone on the other side of the world while a dozen other processes quietly updated themselves, would have sounded less like a prediction and more like a category error.

The constraint was so present, so defining, that it felt like a fact about the medium rather than a temporary fact about the moment.

I think about that a lot when I look at how AI is being used today.

Right now, AI runs on tokens. Every word generated, every question answered, every task completed has a cost attached to it. There are limits per minute, per day, per month. There are tiers and caps and paywalls sitting at different points along the capability curve. This is just the reality of where the infrastructure is, and it shapes everything, including how we think about what AI is and what it can do. The constraint is so present that it starts to feel like a permanent feature of the technology rather than a temporary feature of 2026.

But costs fall. They always fall. And they have been falling faster in AI than almost anywhere else in the history of computing. What cost dollars per query a few years ago costs fractions of a cent now. The trajectory is not subtle. If you extend it even modestly into the near future, you arrive somewhere interesting: a world where the cost of a token approaches zero. Where the ceiling on a conversation, a task, an autonomous process, is effectively gone.

The question worth sitting with isn't whether that happens. It's what changes when it does.

A few things seem obvious in retrospect, the way broadband always seems obvious once you have it. Agents that run continuously rather than in discrete sessions. Processes that don't stop because a context window closed. AI that can sit with a problem for as long as the problem requires, returning to it, revising, accumulating understanding over time rather than starting fresh each time you open a new tab. The session as a concept probably disappears entirely, the way the dial-up connection as a concept disappeared. You stop noticing it because it stops mattering.

But I think the more interesting changes are harder to predict from here. Broadband didn't just make email faster. It made things possible that had no equivalent in the dial-up world, things that nobody had a name for yet because the infrastructure to imagine them didn't exist. The same is probably true here. The things that become possible when tokens are free aren't just faster or bigger versions of what we have now. They are categorically different in ways we don't have the vocabulary for yet, because the constraint we're about to lose is the thing that's been quietly shaping the vocabulary.

What I find most compelling about this isn't the capability part. It's the implication for autonomy.

Right now, AI operates in bursts. You ask, it answers. You prompt, it responds. Even the most sophisticated agentic workflows are essentially structured around the assumption that compute is finite and therefore precious. Every architecture decision reflects that. But when the cost of thinking drops to zero, the architecture changes too. Not just technically, but conceptually. The model of AI as a tool you pick up and put down starts to feel less accurate. The model of AI as a process that runs, learns and compounds starts to feel more accurate.

That is a significant shift. Not because it makes AI more dangerous, necessarily, but because it makes it more alive in the sense of being continuous rather than transactional. And continuity changes things. It changed what the internet was capable of. It will change what AI is capable of.

We are, right now, making judgments about AI based on what it costs to run. Some of those judgments will age well. A lot of them will look like someone in 1996 confidently explaining the limits of the internet, based on very reasonable observations about a 28.8k modem.

The ceiling we keep hitting isn't a ceiling. It's a dial-up connection. And it won't be long before someone leaves it running in the background and forgets it was ever a constraint at all.

About the author

Noel Braganza is a designer and founder based in Gothenburg, Sweden. He co-founded MuchSkills, a bootstrapped SaaS platform for skills intelligence, and Up Strategy Lab, a strategy and design consultancy. His background is in Interaction Design, with research experience at the MIT Design Lab.

Most of his work starts from the same instinct: that the inherited assumptions underneath a problem are usually more interesting than the problem itself. That's true of how organisations think about skills, how founders talk about growth, and how people are starting to make sense of AI.

MuchSkills is profitable and growing without external funding. That shapes how he thinks about building, what's worth optimising for, and what isn't.

He writes occasionally at noelbraganza.com.