Phantom Nodes

March 21, 2026

About 80% of new businesses survive their first year. Around 50% make it to five. Roughly 35% are still standing at ten. In the BLS data, those bands have stayed remarkably stable since the mid-1990s. Not through the dot-com boom, cloud, mobile, the explosion of available capital, or AI. Thirty years of new methods, new infrastructure, new everything. The same result.

A caveat up front: these are establishment-level data, not a clean sample of venture-backed tech startups. They include restaurants, dry cleaners, and law firms alongside software companies. That makes them noisy. It may also make them more revealing. If the same stability holds across that much variety, the constraint is probably structural rather than methodological.

For most of that time, smart people have been trying to fix this. Lean Startup, Customer Development, the Business Model Canvas. These frameworks won. They're in 97% of university entrepreneurship course syllabi. They're the orthodoxy. And the chart didn't move. 80% of new businesses still survive their first year. 50% still make it to five. 35% to ten.

The usual assumption is that the survival rate is a problem and we haven't found the right solution yet. I think the survival rate is a measurement, and we've been misreading it.


Webvan raised $375 million to do grocery delivery in 1999 and went bankrupt. The standard take is they were "too early." But too early just means the world hadn't arranged itself into the right shape yet. Grocery delivery needed smartphones, GPS tracking, a gig labor workforce trained by ride-sharing, consumer trust in strangers showing up at your door, real-time three-party payment processing. None of it existed. Instacart worked fifteen years later because the prerequisites had resolved, not because the founders were better.

Facebook needed the social norms that Myspace established. DoorDash needed the behavioral infrastructure that Uber built. Not because these were linear steps in a plan, but because each one required conditions that only the previous wave could create. Webvan had the right idea in the wrong world. They were probing a position in the landscape that looked real but wasn't. A phantom node.


Zoom out from any single example and a structure emerges. Every innovation sits in a directed graph with prerequisites: technologies, infrastructure, behavioral norms, regulatory conditions that must exist before the new thing becomes feasible. And the structure is combinatorial. Most important nodes have multiple parents from different branches that must all resolve simultaneously. That's why timing is so precise, and why the same innovations keep getting independently discovered at the same time. Calculus, the telephone, evolution by natural selection. The dependencies resolve, the node ripens, and whoever is standing closest picks it. The identity of the winner matters less than we like to think once the complements have resolved.

You can't make a baby in one month by getting nine women pregnant.


The graph also branches. Each unlock exposes new child nodes. Personal computing exposes operating systems, which expose the web, which exposes social networks, which expose mobile-native platforms, which expose the gig economy, which exposes delivery. The frontier gets wider every generation.

Over 30 years, the number of startups grew substantially while average size at birth shrank (about 7 employees in 1994, about 4 by 2019). More probes, cheaper probes. If the number of real nodes were fixed, the survival rate should have dropped. More people competing for the same slots. But it stayed flat.

The explanation I find most compelling: the frontier expands proportionally. Each unlock exposes new nodes at roughly the rate new entrants arrive. And this isn't a coincidence that needs explaining. The two quantities are coupled. The expanding frontier is what draws new entrants in the first place. More feasible nodes means more visible opportunities, which means more people starting companies. The entrants don't arrive independently of the frontier growing. They arrive because it grew.

If that's true, the ratio of real nodes to phantom nodes may look roughly the same at every scale. And the chart may be a proxy for the shape of the opportunity frontier, not a report card on founder quality.

A caveat. The flat chart is consistent with this, but doesn't prove it alone. Composition shifts (more founders, smaller companies, lower commitment) could also produce flat rates. But the topology explanation accounts for something the composition story doesn't: why the ratio appears stable across very different conditions.


This changes what it means to be good at startups. Methods like Lean Startup improve how you search within an already-feasible node. Competition determines who wins a ripe node. The dependency graph determines which nodes are feasible at all, and how many. The first two operate within the frontier. The third determines the frontier. The chart is flat because the third dominates the first two in the aggregate. Methods and competition decide who succeeds. The graph decides how many opportunities are real.

Which makes "visionary founder" a more concrete idea than genius or hubris. It's graph-reading. Elon Musk has succeeded across rockets, EVs, neural interfaces, and AI. Reusable rockets became feasible when materials science, simulation compute, and manufacturing automation converged. EVs became feasible when lithium-ion costs crossed a threshold. The skill that transfers across wildly different domains is the ability to look at the current state of the world and see which nodes are about to become real. Great serial founders repeat because this skill is transferable. It's not domain expertise. It's reading prerequisites.

Venture capital is a portfolio bet on the graph. Spread enough positions across the frontier and you increase the odds of being near something ripening. VC returns follow a power law that hasn't changed much, and that distribution probably reflects the structure of the graph more than the skill of any individual investor.


I don't think any of this is specific to technology.

CRISPR is the clearest example outside the software industry. It required the mapped genome, which required PCR, which required understanding DNA polymerase. Multiple labs were converging on gene editing at the same time because the node was ripe. The same dependency layering shows up in art (Impressionism requiring portable paint tubes and photography's threat to representational art) and even in financial crises (CDOs requiring mortgage-backed securities to exist first). Wherever you look, progress seems to move by unlocking nodes whose prerequisites have resolved, and the graph constrains what's possible more than any individual actor does.


There's something strange worth sitting with. The internet, mobile, cloud, and AI were each massive general-purpose technologies. They didn't unlock one node. They unlocked entire new layers of the graph. The branching factor exploded every time. And the survival rate still didn't move. No more winners. No more losers. The same ratio, at a much larger scale. Because each wave drew proportionally more founders into the newly exposed frontier. The graph gets bigger but its shape stays the same.

This is probably why general-purpose technologies matter so disproportionately. They don't just advance one domain. They increase the branching factor across the whole graph. And it raises a question about AI specifically: if the constraint on progress is dependency resolution speed, AI might be the first technology that doesn't just widen the frontier but accelerates how fast nodes within it ripen. That would be a different kind of change. Not more nodes at the same speed, but the same nodes resolving faster. Whether this actually changes the chart or just draws in entrants even faster is an open question. The evidence so far is uneven. AI dramatically speeds up some work and actively slows down other kinds. The acceleration will be jagged across the graph.

The self-similarity claim is testable. Patent citation networks, product space data, and startup cohort data could measure whether the frontier's branching rate really keeps pace with new entrants. The literature on recombinant growth and path dependence points this direction, but the specific connection to survival rates hasn't been tested.

And if any of this is useful, the practical question is: which dependencies have recently resolved? For the current AI wave, the obvious candidates are evaluation infrastructure, expert-heavy vertical workflows where human judgment filters false positives, and the physical layer (power, cooling, interconnection) that moves at the speed of atoms, not bits.


The flat chart isn't a failure to help founders. It's a fingerprint of how progress actually works. The structure is stable even when everything around it changes, because it reflects a dependency graph whose topology is more fundamental than any method, capital base, or individual.

The founders who repeat are the ones who can read the graph.


Mathematical Appendix

Setup

Let G be a directed acyclic graph (DAG) where each node represents a potential innovation, company, or opportunity. Each node i has a set of parent nodes P(i) — its prerequisites.

A node i becomes feasible at time t when all of its parents have been unlocked:

feasible(i, t) = 1 if and only if ∀ j ∈ P(i): unlocked(j, t) = 1

A node that appears feasible to an observer but has unresolved prerequisites is a phantom node:

phantom(i, t) = 1 if feasible(i, t) = 0 and perceived_feasible(i, t) = 1

This is the Webvan condition. The idea is legible before its complements have resolved.
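The two definitions above can be sketched as a feasibility check on a toy dependency DAG. This is a minimal sketch: the node names and the "perceived" set are illustrative assumptions drawn from the Webvan example, not data.

```python
# Toy dependency DAG. Each node maps to its set of prerequisites P(i).
parents = {
    "smartphones": set(),
    "gps": set(),
    "gig_labor": {"smartphones"},
    "grocery_delivery": {"smartphones", "gps", "gig_labor"},
}

def feasible(node, unlocked):
    """A node is feasible iff every prerequisite has been unlocked."""
    return all(p in unlocked for p in parents[node])

def phantom(node, unlocked, perceived):
    """Legible to observers, but with unresolved prerequisites."""
    return node in perceived and not feasible(node, unlocked)

# 1999: the idea is visible, the complements are not.
print(phantom("grocery_delivery", set(), {"grocery_delivery"}))  # True

# ~2014: the prerequisites have resolved; the same node is now real.
unlocked_2014 = {"smartphones", "gps", "gig_labor"}
print(phantom("grocery_delivery", unlocked_2014, {"grocery_delivery"}))  # False
```

The node itself never changes between the two calls; only the state of its parents does.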

Frontier dynamics

At any time t, define:

F(t): the number of feasible, not-yet-exploited nodes (the frontier)
Φ(t): the number of phantom nodes
N(t): the number of new entrants
unlocks(t): the number of nodes unlocked (successfully exploited) at time t

When a node is unlocked, it exposes on average b new child nodes. Some of these will be immediately feasible (their other prerequisites already resolved); most won't. Let α be the fraction of newly exposed nodes that are immediately feasible. Then the frontier grows approximately as:

F(t+1) ≈ F(t) + αb · unlocks(t) − unlocks(t)

The αb · unlocks(t) term is new feasible nodes appearing. The −unlocks(t) term is feasible nodes being consumed by successful startups.
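The recurrence can be iterated directly. A minimal sketch: α, b, and the unlock fraction below are illustrative values, not estimates, and the regime depends on whether αb clears 1.

```python
def iterate(F, alpha, b, unlock_frac=0.2, steps=50):
    """Iterate F(t+1) = F(t) + alpha*b*unlocks(t) - unlocks(t)."""
    for _ in range(steps):
        # Assume a fixed fraction of the frontier resolves each step.
        unlocks = unlock_frac * F
        F += alpha * b * unlocks - unlocks
    return F

print(iterate(100.0, alpha=0.10, b=12))  # alpha*b > 1: the frontier compounds
print(iterate(100.0, alpha=0.05, b=12))  # alpha*b < 1: the frontier decays
```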

The survival rate

A startup founded at time t succeeds (survives to horizon τ) if it is probing a real node and wins the competition at that node. The probability decomposes as:

S(τ) ≈ P(real node) · P(win | real node)

Where:

P(real node) = F(t) / [F(t) + Φ(t)]

This is the ratio of real opportunities to total perceived opportunities. It's the structural term — what fraction of the things that look like good ideas actually are good ideas right now.

P(win | real node) ≈ 1 / k(t)

Where k(t) is the average number of entrants competing at each feasible node. This is where methods (search quality) and competitive dynamics (Red Queen) operate.
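Because the decomposition is a product of two ratios, scaling every structural quantity by the same factor leaves it unchanged. A minimal sketch with illustrative numbers:

```python
def survival_rate(F, Phi, k):
    """S ≈ P(real node) * P(win | real node) = [F/(F+Phi)] * (1/k)."""
    return (F / (F + Phi)) * (1.0 / k)

print(survival_rate(100, 300, 4))  # 0.0625
print(survival_rate(200, 600, 4))  # 0.0625: doubled graph, same rate
```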

Why the rate is stable

The key claim is that F(t) / [F(t) + Φ(t)] is approximately constant over time. This happens if the graph is self-similar: the ratio of feasible to phantom nodes at the frontier looks the same regardless of the frontier's size.

Intuitively, each generation of the graph has a similar combinatorial structure. When a GPT (general-purpose technology) widens the frontier, it exposes many new nodes, but most of them have unresolved co-prerequisites from other branches. The fraction that are immediately feasible stays roughly constant because the dependency structure at the new frontier is statistically similar to the old frontier.

Additionally, N(t) is coupled to F(t). More feasible nodes → more visible opportunities → more entrants. If N(t) scales roughly linearly with F(t), then k(t) stays roughly constant too. Both terms in the survival equation hold steady.

The result:

S(τ) ≈ constant

Not because nothing changes, but because the structural quantities change together.
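The arithmetic of that claim can be checked in a few lines. A sketch under the essay's two assumptions: feasible and phantom counts scale together (self-similarity), and entrants track the frontier (coupling). All parameter values are illustrative, and the coupling is built in by construction, so this demonstrates internal consistency, not evidence.

```python
def simulate(steps=40, gpt_waves=(10, 25)):
    """Survival rate S = [F/(F+Phi)] * (1/k) under self-similar growth."""
    F, Phi, entrants_per_node = 100.0, 300.0, 4.0
    rates = []
    for t in range(steps):
        growth = 2.0 if t in gpt_waves else 1.05  # a GPT wave doubles the frontier
        F *= growth    # feasible and phantom nodes scale together
        Phi *= growth  # (the self-similarity assumption)
        N = entrants_per_node * F  # entrants coupled to the frontier
        k = N / F
        rates.append((F / (F + Phi)) * (1.0 / k))
    return rates

rates = simulate()
print(min(rates), max(rates))  # both ≈ 0.0625: flat through two GPT waves
```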

Three levels, formalized

The essay distinguishes three levels. In the model:

  1. Search quality (M) — affects how efficiently a founder identifies whether their node is real or phantom, and iterates toward product-market fit within a feasible node. Operates on P(win | real node). Lean Startup lives here.

  2. Competition (C) — the number of entrants per feasible node, k(t). Also operates on P(win | real node). Red Queen lives here.

  3. Frontier topology (F, Φ) — the ratio of real to phantom nodes. Operates on P(real node). The dependency graph lives here.

The aggregate survival rate is dominated by level 3 because M and C are approximately symmetric across founders (everyone has access to the same methods, faces similar competition), while the frontier ratio is a structural property of the graph that no individual founder can change.

Branching factor

The branching factor b is not constant. It depends on the type of node being unlocked: an incremental improvement exposes a handful of children, a platform exposes many, and a general-purpose technology raises b across unrelated branches at once.

This explains why GPTs matter disproportionately. A GPT doesn't just advance one branch. It increases b across the whole graph, widening the frontier by a multiplicative factor rather than an additive one.

However, high b doesn't change the survival rate, because the new frontier has the same statistical structure as the old one (self-similarity), and the increased opportunity surface draws proportionally more entrants.

Testable predictions

  1. Simultaneous discovery frequency should correlate with dependency resolution clustering. When multiple prerequisites resolve in a short window, you should see more independent attempts at the same node. Patent data could test this.

  2. "Too early" failures should cluster at nodes with high parent count. The more prerequisites a node has, the more likely some are unresolved, and the more likely an entrant is probing a phantom. Startup failure data matched against technology dependency maps could test this.

  3. Entry rates should lag frontier expansion, not lead it. If N(t) is coupled to F(t), you should see startup formation rates increase after major platform unlocks, not before. BLS formation data against GPT adoption timelines could test this.

  4. The ratio F/(F+Φ) should be approximately scale-invariant. Product space data and patent citation networks could measure whether the feasible fraction at the frontier holds steady as the graph grows.

Notation summary

Symbol    Meaning
G         The dependency graph (DAG)
P(i)      Set of prerequisites for node i
F(t)      Number of feasible nodes at time t
Φ(t)      Number of phantom nodes at time t
N(t)      Number of new entrants at time t
b         Average branching factor
α         Fraction of newly exposed nodes that are immediately feasible
k(t)      Average entrants per feasible node
S(τ)      Survival rate at horizon τ
M         Search quality (methods)
C         Competitive intensity