The capital theory of AI

Computer scientist Richard Sutton articulated what he called the “bitter lesson” in his 2019 essay of the same name:

The bitter lesson is based on the historical observations that 1) AI researchers have often tried to build knowledge into their agents, 2) this always helps in the short term, and is personally satisfying to the researcher, but 3) in the long run it plateaus and even inhibits further progress, and 4) breakthrough progress eventually arrives by an opposing approach based on scaling computation by search and learning. The eventual success is tinged with bitterness, and often incompletely digested, because it is success over a favored, human-centric approach.

This was a hard-to-accept conclusion drawn from years of empirical evidence. Researchers have kept inventing novel ways of embedding intelligence into machines, but these efforts have been consistently surpassed by advances in computing hardware (lower costs and greater raw capability).

In the following year (2020), OpenAI published a paper titled “Scaling Laws for Neural Language Models”, which further crystallised Sutton’s hypothesis.

The core insight was that AI performance improves with mathematical predictability, following power laws, as you scale compute, data, and model parameters. This suggests that raw scale is a far more powerful driver of intelligence than any specific architectural design.

Why does this matter?

It matters because technological progress is seldom predictable. The evolution of the internet, the advent of mobile, and the shift to cloud were all characterised by huge unpredictability.

Scaling laws, however, change this equation. If you know that a 10x increase in compute yields a reliable improvement in capability, then the frontier of AI is not a research problem; it’s a capital allocation problem.
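To make that concrete, here is a toy sketch of a power-law scaling curve. The constants `a` and `alpha` are illustrative placeholders, not the fitted values from OpenAI’s paper; the point is only the shape of the relationship:

```python
# Toy illustration of a compute scaling law: loss(C) = a * C**(-alpha).
# The constants a and alpha below are made-up placeholders, not the
# empirically fitted values from "Scaling Laws for Neural Language Models".
def predicted_loss(compute, a=10.0, alpha=0.05):
    """Predicted training loss as a power law in compute."""
    return a * compute ** (-alpha)

# A 10x jump in compute shrinks loss by the same fixed factor every time,
# regardless of the starting point -- that constancy is what makes
# scaling "predictable" enough to plan capital around.
ratio_small = predicted_loss(1e21) / predicted_loss(1e20)
ratio_large = predicted_loss(1e24) / predicted_loss(1e23)
print(ratio_small, ratio_large)  # identical ratios: 10 ** -alpha
```

Because the improvement per 10x of compute is a constant multiplier, you can budget for a target capability in advance, which is precisely what turns research into capital allocation.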

The implications ripple outward from this core insight.

Consider the competitive dynamics among AI labs. If capability is purchased with compute, then the labs that can raise the most capital and secure the most chips will produce the best models. This is not a cottage industry for brilliant researchers in garages; it is a capital-intensive manufacturing process with massive economies of scale.

This is a major shift. For the past two decades, the defining logic of the technology industry has been software-centric. Software scales infinitely at zero marginal cost, which is why the aggregators (Google, Meta, Amazon) could build global monopolies with relatively modest infrastructure investments. The valuable asset was the code, the algorithm, the network effect.


This doesn’t mean that there is no value to be captured for other players. There are, in fact, several layers in this emerging stack:

At the bottom: compute infrastructure. Nvidia (and by extension TSMC, ASML Holdings, etc.) is the obvious winner here. Their GPUs are the de facto platform for AI training. But beyond Nvidia, the hyperscalers (Microsoft, Google, Amazon) are investing tens of billions of dollars annually in data centers.

In the middle: foundation model providers. OpenAI, Anthropic, Google DeepMind, Meta’s AI labs: these are the entities actually training frontier models. Their advantage is the combination of talent, capital, and proprietary training techniques. But here’s the uncomfortable reality: if scaling laws continue to hold, the moat is not the model itself (which can be approximated by anyone with enough compute) but the rate of improvement.

Switching and multi-homing costs are near zero, which makes it very hard for these model providers to build a defensible moat beyond staying ahead of the curve on capabilities and experimenting with distribution and pricing (e.g., ChatGPT Go, Google’s bundling, and Perplexity’s Airtel partnership).

At the top: applications and integrations. This is where most businesses will play, and owning proprietary data, specific workflows, or customer relationships can be the key to differentiation.

The a16z podcast “The AI opportunity that goes beyond models” highlights three amazing examples of startups doing this:

  1. Slingshot AI: They build an AI mental health companion. They started by taking notes for human therapists—a classic way to “do things that don’t scale.” By helping with the paperwork, they saw thousands of private sessions that aren’t on the internet. This gave them the “shadow data” needed to train a model that actually understands clinical empathy, which a general AI simply can’t learn from Reddit.
  2. FamilySearch: They run the world’s largest genealogy database. Their moat is physical friction: for decades, they sent people to remote villages to microfilm paper birth and death records. You can’t scrape what isn’t digital. They own the data because they were willing to do the tedious, manual work that Silicon Valley usually tries to avoid.
  3. vLex: This is a legal research platform that digitised fragmented, analog law records in Spain and beyond. By turning messy, offline paper trails into a “system of record,” they built a walled garden. Even the most powerful AI is useless if it can’t access the source of truth, and in this niche, vLex owns the keys to the library.

It would be a mistake to treat scaling laws as guaranteed to continue indefinitely. There are legitimate reasons for caution.

First, there is the data wall. As we exhaust the “low-hanging fruit” of high-quality human-generated internet text, the industry is forced into the expensive, artisanal world of synthetic data and human-in-the-loop labelling. There are early indications of performance degradation, sometimes called “model collapse”, in models trained on the outputs of other AIs.

Second, there is the distinction between loss (the statistical measure that scaling laws predict) and capability (the practical abilities we actually care about). Loss decreases smoothly, but capabilities often emerge abruptly and unpredictably. A model might go from 0% to 90% accuracy on a task with a single increment of scale. We cannot yet predict which capabilities will emerge when.

Third, scaling follows power laws, so linear gains require exponential increases in cost. We’ve reached a point where the next 1% of intelligence requires billions in hardware and massive amounts of energy, making “brute force” economically unsustainable for all but a handful of players.
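The arithmetic behind that claim can be sketched by inverting a power law: each equal step down in loss multiplies the compute bill, and the multiplier itself keeps growing. The exponent here is an illustrative assumption, not a fitted value:

```python
# Toy arithmetic: if loss = C**(-alpha), then the compute needed to hit
# a target loss is C = target ** (-1/alpha). Alpha is an illustrative
# placeholder, not an empirically fitted exponent.
alpha = 0.05

def compute_required(target_loss):
    return target_loss ** (-1 / alpha)

# Each equal 0.05 step down in loss costs several times more compute
# than the previous step -- linear gains, exponential bills.
for target in (0.80, 0.75, 0.70, 0.65):
    print(f"loss {target:.2f} -> relative compute {compute_required(target):.2e}")
```

Under these toy numbers, going from 0.80 to 0.75 loss costs roughly 3.6x the compute, and the next identical step costs even more; this is the shape of the curve that turns “one more increment” into billions of dollars.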


That being said, we still appear to be far from hitting that scaling wall. As Tomasz Tunguz writes in “The scaling wall was a mirage”:

Then Gemini 3 launched. The model has the same parameter count as Gemini 2.5, one trillion parameters, yet achieved massive performance improvements. It’s the first model to break 1500 Elo on LMArena & beat GPT-5.1 on 19 of 20 benchmarks.

Oriol Vinyals, VP of Research at Google DeepMind, credited improving pre-training & post-training for the gains. He continued that the delta between 2.5 & 3.0 is as big as Google has ever seen with no walls in sight.

This is the strongest evidence since o1 that pre-training scaling still works when algorithmic improvements meet better compute.

Second, Nvidia’s earnings call reinforced the demand.

“We currently have visibility to USD 0.5 trillion in Blackwell and Rubin revenue from the start of this year through the end of calendar year 2026. By executing our annual product cadence and extending our performance leadership through full stack design, we believe NVIDIA will be the superior choice for the $3 trillion to $4 trillion in annual AI infrastructure build we estimate by the end of the decade.”

“The clouds are sold out and our GPU installed base, both new and previous generations, including Blackwell, Hopper and Ampere is fully utilized. Record Q3 data center revenue of $51 billion increased 66% year-over-year, a significant feat at our scale.”


This brings us back to Sutton’s bitter lesson. The evidence suggests we haven’t yet exhausted what scale can buy. But “not yet” is doing a lot of work in that sentence. The economics are getting harder. The next order of magnitude in compute will cost tens of billions of dollars. At some point, the capital allocation problem becomes a capital availability problem. Until then, the race continues. The winners will be those who can write the biggest checks, or those clever enough to build moats that don’t depend on writing checks at all.
