The Diligence Stack - By Creative Strategies

The E/AI Index: Budget Architecture and the Next Phase of Enterprise AI Adoption

Ben Bajarin — Tue, 12 May 2026 15:15:42 GMT

We have been working through the enterprise AI monetization question in sequence because the market still mistakenly compresses several distinct issues into one broad adoption narrative. The first report in this series focused on customer ROI and argued that AI becomes economically meaningful when it changes the cost, speed, staffing, quality, or capacity attached to a repeated unit of work. The second report moved from ROI proof to platform control and asked which layer captures the economics once those workflows move from pilot to production. This report takes the next step into budget architecture, where the AI cycle becomes easier to analyze because CIOs are now deciding which existing spend pools should fund the next wave of deployment.

One observation worth highlighting, drawn from our 25+ years studying this industry, is that enterprise AI adoption will not move through the market as one uniform curve. Consumer technology has always had different adoption profiles, with early adopters, the early majority, late majority buyers, and laggards all behaving differently as a market matures. Enterprise technology follows a similar pattern, although the behaviors are shaped by budget ownership, governance, security posture, regulatory burden, data readiness, organizational complexity, and tolerance for operational risk. The most progressive enterprises are important case studies because they give us leading indicators of what may become possible. They also represent a minority of the market, and their behavior should not be treated as the steady-state adoption pattern for the long tail of enterprise buyers.

That is why we are committed to tracking the adoption path in real time rather than drawing broad conclusions from the most aggressive early adopters alone. Early enterprises can move faster, accept more deployment complexity, tolerate higher model or integration cost, and make bigger organizational changes because they often have stronger technical teams, cleaner data architecture, more executive sponsorship, or a clearer strategic mandate. The broader enterprise market behaves differently. The majority of companies move through procurement, governance, legal review, security architecture, workflow integration, and CFO scrutiny at a slower pace. That does not make the cycle less real, but it changes how we should think about timing, value capture, and the durability of current usage patterns.

Our CIO E/AI Index work helps locate where enterprise AI sits on the adoption curve. AI has clearly moved beyond experimentation, but it remains early in terms of full-scale operating change. In the panel, 67% of respondents now have AI running in production functions, 69% have a dedicated AI budget, and AI represents roughly 9% of IT spend on average, with budgets expected to grow again in the next planning cycle. That supports the broader AI spending cycle, but the more important finding is that enterprise behavior is becoming segmented. Progressive adopters are moving faster, funding AI more deliberately, and beginning to separate what deserves scale from what stays in pilot.

The funding mix is where the budget story becomes more useful. Roughly 28 cents of the incremental AI dollar comes from net-new IT budget expansion, while the remaining 72 cents is being pulled from somewhere else inside the enterprise. Existing software budgets, IT services, systems integrators, BPO contracts, headcount savings or slower hiring, license consolidation, cloud expansion, and business-unit budgets all show up in the mix. We call this out because a dollar added to the IT envelope behaves differently from a dollar taken out of a software renewal or services contract. Enterprise AI increasingly contains both at the same time, which is why the vendor read-through is becoming more selective than the aggregate spending numbers imply.
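A minimal sketch of the funding-mix arithmetic, assuming the panel's average 28/72 split applies uniformly to a single hypothetical enterprise's incremental AI budget. The $10 million figure and the names `split_incremental_ai_spend` and `FundingSplit` are illustrative, not from the survey. At this mix, roughly 2.6 reallocated dollars accompany every net-new dollar, which is why the vendor read-through depends on which spend pool those reallocated dollars leave.

```python
from dataclasses import dataclass

# Panel average quoted in the text; treat it as approximate.
NET_NEW_SHARE = 0.28  # share of the incremental AI dollar from net-new IT budget


@dataclass
class FundingSplit:
    net_new: float      # dollars that expand the IT envelope
    reallocated: float  # dollars pulled from software, services, BPO, headcount, etc.

    @property
    def reallocated_per_net_new_dollar(self) -> float:
        # How many reallocated dollars accompany each net-new dollar.
        return self.reallocated / self.net_new


def split_incremental_ai_spend(incremental: float,
                               net_new_share: float = NET_NEW_SHARE) -> FundingSplit:
    """Split a hypothetical incremental AI budget into net-new vs. reallocated dollars."""
    if not 0.0 < net_new_share <= 1.0:
        raise ValueError("net_new_share must be in (0, 1]")
    net_new = incremental * net_new_share
    return FundingSplit(net_new=net_new, reallocated=incremental - net_new)


# Hypothetical enterprise adding $10M of AI spend in the next planning cycle.
split = split_incremental_ai_spend(10_000_000)
print(f"net-new: ${split.net_new:,.0f}")                      # net-new: $2,800,000
print(f"reallocated: ${split.reallocated:,.0f}")              # reallocated: $7,200,000
print(f"ratio: {split.reallocated_per_net_new_dollar:.2f}")   # ratio: 2.57
```

Splitting the reallocated 72 cents further (software renewals versus services versus headcount) would require source-level weights that are not given here, so the sketch stops at the two-way split.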

This is also where the pricing question gets more complicated. CIOs are asking which tools can be funded from an existing cost pool, which workflows have enough measurement discipline to justify the spend, and which vendors are charging an AI premium without changing the underlying job being done. A vendor with AI attached to a governed workflow, tied to a measurable operating metric and funded from services avoidance, license consolidation, or cycle-time compression, has a stronger pricing claim than a vendor adding AI on top of a seat count the customer is already trying to rationalize.

The early ROI evidence still points toward throughput before broad labor replacement. Respondents report higher output with the same headcount, faster project delivery, lower external services spend, and slower hiring before large-scale headcount reduction. That means the first financial evidence of AI is more likely to show up in services-line compression, faster internal project cycles, reduced outside implementation capacity, slower backfill behavior, and lower unit cost inside support, development, analytics, and IT operations. Headcount reduction may come later in some categories, but it is not the first or cleanest marker of value.

The incumbent read is therefore conditional on workflow depth. Cloud infrastructure, cybersecurity, data platforms, and productivity suites screen positively because they own interfaces and control points AI must traverse to be useful and governable. Categories tied more closely to seat access, content production, manual routing, summarization, or labor-heavy services carry more risk because AI can compress the work without requiring replacement of the system of record. The full report works through that category split in detail, because the market can treat both as “software,” while CIO budget behavior is already separating the two.

Agentic AI fits the same logic. The priority is clear, but production deployment remains concentrated in bounded workflows where approval paths, data access, and auditability can be controlled. The constraints are increasingly operational: data readiness, integration, identity, permissioning, auditability, security, exception handling, and cost predictability. That suggests the next layer of enterprise AI spend should accrue not only to model access or generic copilots but also to the control plane that makes AI deployable at scale.

The practical implication is that enterprise AI should now be analyzed through budget formation and adoption-profile behavior, not adoption alone. The evidence increasingly says AI is economically meaningful, but the value will not distribute evenly across the stack or arrive uniformly across enterprise cohorts. The next phase of diligence is identifying which budget lines AI is consuming, which workflows are producing measurable returns, and which vendors can convert those returns into pricing power without triggering procurement pushback. That is where AI moves from usage to earnings power.

The E/AI Index is Creative Strategies’ recurring CIO/CTO research series tracking how enterprise AI moves through adoption profiles, budget formation, deployment maturity, workflow ROI, and vendor selection.

What paid subscribers get in the full report

The full E/AI Index dataset and adoption-profile read, including how progressive early adopters differ from the broader enterprise market and why the long tail of CIO behavior matters for sizing where we are in the AI cycle.
A budget-source model for the incremental AI dollar, separating net-new IT budget expansion from dollars being reallocated out of software, services, BPO, headcount, license consolidation, cloud, and business-unit budgets.
A category-level spend-intent map across the enterprise stack, showing where CIOs expect to increase spend, where budgets are being reviewed, and which categories screen as most exposed to AI-driven substitution.
A framework for separating real AI ROI from usage theater, focused on workflows with measurable baselines, budget owners, governance requirements, and operating denominators that procurement can actually defend.
Our read on why services compression comes before broad application replacement, including the specific kinds of SI, managed-services, implementation, testing, documentation, and support work most exposed to AI-enabled internal teams.
An incumbent-risk model based on workflow depth, distinguishing platforms with identity, data context, governance, and execution control from vendors more exposed to seat access, manual routing, content production, or shallow workflow attachment.
Updated agentic AI deployment data, including the gap between pilots, employee-assist workflows, narrow execution under human approval, and true multi-step production agents across systems.
A control-plane spending map, covering the data readiness, identity, security, auditability, observability, workflow integration, and cost-predictability layers that CIOs say are now gating broader deployment.
A risk hierarchy for enterprise AI adoption, including where CIOs are most concerned about data leakage, reliability, regulatory exposure, shadow AI, proprietary workflow exposure, and model-cost unpredictability.
The full read-through for 2026 and 2027, including what we are watching, what would change our view, and which signals would indicate AI is moving from controlled production into broader budget reallocation across the enterprise.

From Model Wars to Platform Wars

Ben Bajarin — Thu, 07 May 2026 14:57:20 GMT

This report is the natural follow-on to our recent note, From AI Usage to AI Earnings Power.

That note argued that the first investable AI cycle is forming inside dense workflows where work is repeated, measured, permissioned, and close to execution. Customer ROI is the bridge from usage to monetization, and the durability of AI software, model, and infrastructure spend depends on whether AI changes the economics attached to a repeated unit of work.

This report takes the next step. If customer dollars begin moving inside those workflows, the value-capture question becomes which layer of the enterprise stack captures them as deployments convert from pilot to production. The platform competition is the immediate consequence of the AI ROI discussion. Platform control follows budget formation.

The model race still matters for capability and lab-layer unit economics. The customer relationship, however, increasingly depends on the interfaces that own permissions, telemetry, audit, and the budget conversation once agents do real work.

From ROI Proof To Platform Control

Our prior report focused on which workflows justify enterprise AI spend. This report focuses on who captures the economics once those workflows move into production.

Three points connect the two reports. First, AI ROI evidence is forming inside a more specific set of workflows than the broad enterprise AI narrative implies, and those workflows share a baseline, a budget owner, and a measurable economic unit of work. Second, those same workflows are where execution, permissioning, and audit are most concentrated, which makes them the workflows where platform position is most contested. Third, value capture follows the layer that proves and prices the completed work. Model supply and interface ownership also matter, but they do not automatically determine where the durable economics settle. Usage and revenue can grow at different layers at the same time, and the long-duration economics tend to accrue to the layer the customer associates with budget movement.

The Three Control Points

We map the value-capture question across three layers. The model, runtime, and interface layer is where foundation model vendors compete to become the default intelligence behind agentic workloads and, increasingly, the entry point where user intent enters the system. The orchestration and workflow layer turns model calls into completed business outcomes. The system-of-record and governance layer owns identity, entitlement, approval, and audit. Economics do not always accrue where compute is consumed. They accrue where enterprises standardize the system of action.

OpenAI is executing a horizontal strategy built on distribution and intent capture, and the test is whether horizontal usage becomes recurring workflow ownership. Anthropic is executing a narrower production-credibility strategy, with coding as the strongest current ROI wedge, and the test is whether that wedge travels into other measurable-ROI workflows. Neither answer is fully proven against the workflow evidence we have so far. See our deep dives on both companies.

Why Coding Is The Operational Laboratory

Coding is the first workflow where AI ROI, workflow ownership, and governance bottlenecks are visible at the same time. The bottleneck has moved from authoring to review, integration, security, and policy. We treat coding as the live laboratory for how the value-capture question is likely to evolve in legal, finance, customer operations, regulated servicing, and analytics, not as the full market.

Three Cross-Cutting Themes

Pricing is moving from seats toward hybrid seat, action, and outcome models that price the economic unit of work. Enterprises are multi-model by design and consolidating by workflow, which is not the same thing as consolidating by vendor. The labor mix is moving toward a new role we describe as the AI orchestrator.

The market can support more than one winner across layers. The model, runtime, and interface layer likely supports two to three franchises with operating credibility. The orchestration and workflow layer likely supports a handful of category leaders. The governance layer should consolidate toward a smaller set of platforms that own identity, entitlement, and audit. Reading the question as a binary between OpenAI and Anthropic risks missing where customer ROI dollars are accumulating inside the workflows we tracked in the prior note. The next question is whether buyers are ready to fund those workflows at scale, which is where our CIO/CTO survey work picks up next week.

What Paid Subscribers Get In The Full Report

• Our three-layer map of the enterprise AI control plane, with the criteria we use to judge which layer captures lasting margin in each category.
• A direct mapping from the AI ROI evidence in the prior report to platform position, including which workflows are most likely to define the next wave of value capture.
• A variant-perception section: what consensus believes, what is underappreciated, and where this report differs.
• A side-by-side read on OpenAI and Anthropic, sharpened around distribution and intent capture versus production credibility and utility economics, with the conditions that would cause us to reweight each.
• The counterthesis on workflow and system-of-record incumbents, with a named list of vendors we view as structurally advantaged in the agentic cycle once ROI is proven.
• Our triangulated view of frontier-lab revenue, enterprise penetration, workload share, and growth, expressed as ranges rather than point estimates and marked as triangulated rather than company-guided.
• An expanded read on Claude Code and coding agents, including what ROI, bottleneck migration, pricing dynamics, and workflow ownership tell us about where the next high-value workflow lands.
• Pricing architecture in practice, with the seat, action, outcome, and work-unit combinations we are seeing across CRM, ITSM, developer, and security suites.
• The buyer behavior view, including why multi-model by design does not prevent consolidation at the workflow layer.
• Token economics and unit cost curves, including how we think about deflation, call volume, and gross margin trajectory for model vendors and platforms.
• Labor and org design, including the AI orchestrator role, the change in junior hiring, and the productivity and quality tradeoffs we see in main-branch work.
• The security and governance read, including why AI-generated code failure rates matter for enterprise budget decisions and how that lesson generalizes to adjacent workflows.
• A diligence approach for evaluating any company in the stack, with the questions we ask and the red flags we weight most heavily.
• A twelve-month watch list across labs, platforms, incumbents, and private companies, with specific and measurable signals.
• What must be true for OpenAI, Anthropic, and the counterthesis to each hold, expressed as testable conditions rather than narratives.
• What would make us wrong, expressed as explicit falsification conditions with observable signals investors can track.
• The value-capture implications for public platforms, private labs, and the enterprise software stack, expressed as analytical weighting and indicators to track.

From AI Usage to AI Earnings Power

Ben Bajarin — Tue, 05 May 2026 14:50:20 GMT

Research Series Note: We are spending the next several reports on the AI monetization question from three angles. Today’s note focuses on what customer evidence is telling us about AI ROI. Thursday’s report will look at how we see the AI platform war evolving as value capture moves from models to workflows, data, and control points. Next Tuesday, we will publish findings from a CIO/CTO survey we collaborated on, focused on AI ROI, budget formation, and how enterprise buyers are deciding which deployments deserve more funding.

Over the better part of the last two years, we have tracked enterprise AI adoption as it moved from experimentation to early production, with the ROI discussion becoming more central as deployments moved closer to operating budgets. The process is still early, and we would be careful not to overread any single customer story or survey result, but we are at the point where the burden of proof for AI has clearly shifted from capability to economics. Most companies no longer need to be convinced that AI can generate useful output, improve workflows, or create new product experiences. The harder issue is whether those improvements are large enough, repeatable enough, and measurable enough to justify the next layer of spending across software, models, services, and infrastructure. That is where the market’s AI narratives start to miss the economic question. Usage shows that a product has distribution. ROI shows whether the customer has a reason to keep funding it. We outlined the software monetization model needed for AI in the report below.

That is why we went through the exercise of collecting tangible AI ROI use cases rather than relying on broad adoption surveys (we have that data as well) or vendor-level attach commentary. We are sensitive to the broader question of AI cycle durability, and one of the key variables in that discussion is whether customers can point to enough measurable value to justify the spending now being built into software roadmaps, frontier-lab revenue expectations, and infrastructure deployment plans. The durability of this cycle will not be determined by capability alone. It will depend on whether AI becomes economically useful enough inside customer workflows to support continued budget formation.

That distinction is becoming more important as AI moves deeper into enterprise budgets. Software vendors need customer ROI to defend premium SKUs, higher attach rates, and usage-based pricing. Frontier labs need enterprise ROI to support consumption growth and the revenue expectations now embedded in the category. Infrastructure vendors need the same proof because the broader AI capex cycle ultimately depends on customers being able to convert compute into business value. CIOs and CFOs are also changing how they evaluate AI projects. The experimentation phase is not over, but the next wave of spending will face a more practical test: which workflows are being repriced, restaffed, accelerated, or made cheaper to run?

Our latest research note looks at agentic AI through that lens. Our current conviction is that the first investable ROI cycle is forming in a more specific set of workflows than the broad enterprise AI narrative implies. The strongest evidence appears where AI is tied to repeated work with a measurable baseline and a budget owner already attached. Contact centers, regulated servicing operations, IT access management, developer workflows, and enterprise search or context layers all prove out well because they have visible denominators: calls, tickets, access requests, summaries, code cycles, compliance reviews, or knowledge retrieval tasks. When AI changes the cost, speed, quality, or capacity of those units of work, the monetization claim becomes easier to evaluate.

Outside that group, the evidence is still mixed. Sales, RevOps, HR, and finance back-office workflows have large budgets and repeated tasks, but attribution, exceptions, permissions, and liability make automation harder to underwrite. Broad knowledge-worker assistants may show usage and time saved, yet those metrics often stop short of proving a funded operating change. Fully autonomous cross-enterprise agents remain even earlier, with reliability, identity, data integration, and liability still limiting the move from interface to execution. The evidence is still incomplete, but the direction is worth taking seriously.

For stakeholders, we believe the economic issue is budget formation (as we will show in our CIO/CTO survey). The customer ROI test starts with the spend pool the agent is permitted to influence. Labor in a contact center, after-call work in a servicing operation, IT ticket queues, access management, developer capacity, onboarding, enterprise search, and compliance documentation are all different budget conversations. The more measurable the work, the easier it is for the customer to defend spend and for the vendor to price against value created.

That also changes how we think about value capture. The agent interface may not always be the economic control point (more on this Thursday). In some workflows, the application vendor controls the system of action. In others, the data and context layer gets funded first because enterprises need governed knowledge before agents can act. Identity and security vendors may become more central as agents behave like non-human actors inside enterprise systems. Model providers can see consumption grow while the workflow economics accrue to applications, orchestration layers, or internal routing systems. Services firms may benefit from data cleanup and integration near term while facing pressure later if AI automates repetitive support, QA, migration, or maintenance work. For more on this, see our report on who has competitive moats in SaaS.

The practical implication is that AI ROI should be evaluated at the workflow level before it is extrapolated to the enterprise software stack. The cycle that earns the most analytical weight first will be the one where the productivity claim and the monetization claim become the same claim. That is the core focus of this report.

Paid subscribers get the full report, including:

A ranked evidence ladder separating stronger operating-economics cases from projected, anecdotal, infrastructure, and risk evidence.
Five primary customer case studies across healthcare call centers, regulated mortgage servicing, IT access management, developer workflows, and enterprise search/context.
A workflow-density heat map showing which enterprise AI use cases have the clearest near-term ROI visibility and which remain harder to underwrite.
A customer-story-to-budget-formation table mapping each use case to the affected budget line, likely control point, public-company read-through, and durability test.
A value-capture layer map covering workflow owners, data/context platforms, developer platforms, identity and security, model providers, SIs, and vertical AI vendors.
An earnings-call tracking dashboard focused on production conversion, paid attach versus bundling, workflow-level economics, data-readiness pull-through, agent governance, developer-tool durability, services mix, and pricing-model evolution.
A risk framework for where the market may be overgeneralizing, including broad copilot adoption, AI attach, seat-based SaaS pricing, SI exposure, model-layer value capture, and data-readiness bottlenecks.
The broader AI monetization read-through tying customer ROI evidence to software pricing, frontier-lab consumption, and the durability of infrastructure spend.

SanDisk’s NBM Moment: A Different Commercial Model for Memory and Storage

Ben Bajarin — Fri, 01 May 2026 15:00:35 GMT

SanDisk’s earnings were extraordinary by any normal semiconductor standard; however, what stood out most from the quarter was the company’s disclosure around new multi-year customer agreements. SanDisk’s updated NBM disclosure gives us a useful early view into how the commercial structure of storage (and memory) may evolve as AI infrastructure becomes a larger share of demand. The historical model for NAND has been built around bit supply, utilization, spot pricing, inventory, and capex discipline. Those variables don’t go away, although they no longer capture the full economic structure if customers are willing to reserve future supply and attach financial commitments to that demand. The more useful forward framework is likely to include contract coverage, enforceability, pricing structure, renewal cadence, and the portion of future bits already tied to customer infrastructure plans.

The nature and size of SanDisk’s disclosure are what make the shift worth highlighting, and they invite the question of how this structure may fundamentally change memory and storage contracts going forward. Management said it has signed five multi-year NBMs to date, with more than one-third of FY27 bits already under firm customer commitments, more than $11 billion of financial guarantees, and roughly $42 billion of minimum contractual revenue from only the three agreements signed during the quarter. The agreements include quarterly volume commitments, a mix of fixed and variable pricing, and durations that can extend up to five years. We call this out because the agreements shift attention away from near-term pricing strength and toward the value customers are placing on assured future access. For SanDisk, the benefit is better visibility into consumption, allocation, mix, and margin durability.

Resolving the supplier-customer mismatch is the key functional change. SanDisk runs a fab-based model with relatively consistent output, while customers have historically wanted supply assurance and quarterly pricing optionality at the same time. Management described the new structure as a way to obtain “certainty of economics,” which we think is the most useful phrase from the call. A supplier can make different decisions around allocation, inventory, capex, and customer mix when demand is committed and financially backed rather than forecasted and repriced every quarter. The customer also receives a more reliable supply path for infrastructure plans that are becoming harder to adjust at the last minute. We believe this structure will become the norm for the companies we label masters of the supply chain in storage and memory, and very likely across the entire semiconductor supply chain.

As should be obvious, AI is the demand mechanism behind the change. But more specifically, as we have been calling out with regard to memory and storage, it is the co-designed nature of AI-related memory and storage content that is driving this change. For that reason, management connected the NBM structure to inference, longer context, KV cache, RAG, and agentic systems, all of which increase the need for high-performance, low-latency flash inside AI infrastructure. In that environment, NAND is becoming more integrated into the AI factory because systems need to retain context, intermediate data, and external datasets around the model. When customers commit years of demand against those requirements, it tells us storage access is becoming valuable enough to reserve in advance, especially when the cost of being wrong on supply can affect broader infrastructure deployment. We detail all that is going on with storage in co-optimized AI infrastructure in our deep dive on storage.

We would be careful with any claim that cyclicality is over as a whole. NAND and DRAM will still have pricing cycles, inventory corrections, supply responses, and periods of digestion after customers pull demand forward. The better observation, and our conviction, is that AI infrastructure may be changing the shape and severity of those cycles. The old model allowed customers to preserve optionality while suppliers absorbed most of the volatility. This model moves part of that volatility back to customers through committed demand, financial guarantees, and purchase obligations. If these structures broaden, the cycle becomes less dependent on quarterly spot-price negotiation and more dependent on how much future supply has already been allocated under enforceable commitments. The latter is how we see this playing out.

The broader read-through is that SanDisk may be the clearest public example of a larger shift already forming across memory and storage. The same logic should apply to HBM, high-capacity DRAM, and other AI-tied memory configurations where future access is even more strategically important. Hyperscalers, model labs, and AI infrastructure operators are planning GPU clusters and inference capacity years in advance. That planning increasingly requires a memory and storage stack with similar visibility. Customers may still prefer flexibility, although the cost of being under-allocated is rising as AI systems become more dependent on specific memory and storage configurations.

For stakeholders, the modeling framework should expand beyond near-term ASPs and bit growth. Contracted bit coverage, RPO-like disclosure, financial guarantees, fixed versus variable pricing exposure, customer renewal behavior, and the maturity ladder of agreements become more useful indicators of earnings quality. The key debate moves from whether current earnings represent peak conditions to how much of the earnings base is supported by customer behavior that looks structurally different from prior cycles.
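A simplified sketch of why contracted bit coverage and fixed versus variable pricing exposure change earnings sensitivity. It assumes, hypothetically, that uncommitted bits and the variable-priced share of committed bits both reprice fully with the market, while fixed-price committed bits do not, and it holds volumes constant. The one-third committed share echoes SanDisk’s disclosure that more than one-third of FY27 bits are under firm commitments; the 50% fixed-price split and the `BitBook` class are illustrative assumptions, not company figures.

```python
from dataclasses import dataclass


@dataclass
class BitBook:
    total_bits: float         # forward-year bit shipments (any consistent unit)
    committed_share: float    # share of bits under firm multi-year commitments
    fixed_price_share: float  # share of committed bits priced at a fixed contract price

    def market_exposed_bits(self) -> float:
        """Bits whose realized price still moves with the market (simplifying assumption)."""
        committed = self.total_bits * self.committed_share
        uncommitted = self.total_bits - committed
        variable_committed = committed * (1.0 - self.fixed_price_share)
        return uncommitted + variable_committed

    def revenue_change(self, price_move_per_bit: float) -> float:
        """Revenue change for a per-bit market price move, holding volumes fixed."""
        return self.market_exposed_bits() * price_move_per_bit


# Hypothetical book: 100 units of bits, one-third committed, half of those at fixed prices.
book = BitBook(total_bits=100.0, committed_share=1 / 3, fixed_price_share=0.5)
print(round(book.market_exposed_bits(), 2))  # 83.33
```

In this stylized case about 83% of bits still move with market pricing, versus all of them under a fully quarterly-repriced book; raising either the committed share or the fixed-price share lowers that exposure, which is one way to express the volatility transfer described above.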

Our view is that SanDisk’s NBM disclosure is an early sign that memory and storage are moving toward a different commercial architecture. The market has historically discounted peak memory earnings because the cycle eventually gave them back. If a larger portion of forward supply becomes contracted, enforceable, and tied to multi-year AI infrastructure demand, then durability becomes a larger part of the discussion. That would be a materially different way to model storage and memory over the next several years, or longer. For further reading, see our reports below.

Microsoft’s AI Capex Is Buying the Enterprise Agent Stack

Ben Bajarin — Thu, 30 Apr 2026 17:13:34 GMT

After completing AI growth thesis reports on Amazon and Google, both of which held up very well in light of both companies’ earnings reports yesterday, we now turn our attention to Microsoft.

Microsoft’s AI debate has been too centered on the visible parts of the model: Azure capacity, GPU availability, capex, and cloud gross margin. Those are the right places to start, but they do not fully explain the growth thesis. The issue we keep coming back to is allocation. Microsoft does not have unlimited deployable AI capacity, and every GPU pushed toward one workload has an opportunity cost somewhere else. Some capacity is sold externally through Azure. Some supports OpenAI-related demand. Some is consumed internally by Microsoft 365 Copilot, GitHub, Fabric, Foundry, Dynamics, Security, and the agent-governance layer now forming across the enterprise stack. That allocation decision is where the earnings power debate sits.

Q3 gave more support to the view that Microsoft’s AI numerator is expanding. AI annualized revenue run rate surpassed $37 billion, Azure grew 40%, and Microsoft 365 Copilot crossed 20 million paid seats. We would not treat those as separate proof points. They point to the same mechanism: Microsoft is turning AI capacity into infrastructure consumption, software seats, developer usage, data-platform pull-through, and agent tooling. This is why Azure-only return math is too narrow a lens. It captures the most visible revenue stream while missing the monetization that shows up across the software estate.

We understand why high fixed costs draw so much attention in this cycle. Capex remains elevated, Q4 spend is guided higher, and calendar-year 2026 capex is expected to reach roughly $190 billion, including about $25 billion tied to component inflation rather than incremental capacity. Microsoft Cloud gross margin compressed to 66%, with Q4 guided to roughly 64%. We recognize the margin pressure, but the key question is whether Microsoft’s software monetization scales quickly enough to offset the denominator the market can already see. If Microsoft can convert scarce compute into Microsoft 365 ARPU, GitHub usage, Fabric consumption, Security attach, and agent-governance revenue, this becomes a broader enterprise software monetization cycle rather than a pure Azure capacity build.
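As a rough illustration of why the component-inflation disclosure matters, the sketch below splits the roughly $190 billion calendar-2026 capex figure into the ~$25 billion attributed to component inflation and the remainder. It uses only the two figures cited above; treating the remainder as capacity-driven spend is a simplifying assumption, not Microsoft’s own breakdown, and the function name is hypothetical.

```python
def split_capex(total_capex_bn: float, inflation_bn: float) -> dict[str, float]:
    """Split a capex figure into a price-driven part and the remainder."""
    if not 0 <= inflation_bn <= total_capex_bn:
        raise ValueError("inflation component must lie between 0 and total capex")
    return {
        "capacity_bn": total_capex_bn - inflation_bn,  # assumed capacity-driven remainder
        "inflation_bn": inflation_bn,
        "inflation_share": inflation_bn / total_capex_bn,
    }

# Figures cited above: ~$190B CY2026 capex, ~$25B tied to component inflation.
parts = split_capex(190.0, 25.0)
print(f"capacity-driven: ${parts['capacity_bn']:.0f}B, "
      f"inflation share: {parts['inflation_share']:.1%}")
```

On these figures, roughly 13% of the capex dollars buy higher component prices rather than more compute, which is the slice of the denominator that adds cost without adding deployable capacity.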

The broader cloud RPO and backlog data help explain why the debate over whether the AI infrastructure cycle is durable is increasingly misframed. Across AWS, Google Cloud, and Microsoft, the forward revenue base tied to cloud and AI infrastructure has moved from steady compounding into a steeper AI-era slope. The numbers are not perfectly comparable: AWS discloses long-term performance obligations primarily related to AWS, Alphabet reports remaining performance obligations, or “revenue backlog,” primarily related to Google Cloud, and Microsoft reports total commercial RPO rather than Azure-specific backlog. Even with those caveats, the direction is useful. AI demand is now showing up in contracted revenue visibility across the major cloud platforms, adding another layer of evidence beyond management commentary, quarterly growth rates, and capacity announcements.

The backlog inflection provides the industry evidence behind the durability debate. AWS and Google confirm that AI infrastructure demand is broadening across hyperscale cloud, but Microsoft’s growth thesis turns on a familiar question: whether scarce compute allocated internally can convert into software ARPU, usage meters, Fabric pull-through, security attach, and agent-governance control points. That is why the Microsoft debate cannot stop at backlog, Azure growth, or capacity additions.

The first-party allocation thesis remains central to our view. Our working model assumes roughly 30% of newly deployed GPU capacity is going to first-party workloads and roughly 70% to external Azure customers, triangulated from channel work and consistent with Q3 commentary. Internal capacity allocated to Microsoft 365 Copilot, GitHub Copilot, Foundry, and Dynamics AI does not appear as third-party Azure revenue. It monetizes through seat ARPU, data pull-through, workflow attach, and governance. The cost is visible in cloud gross margin today, while the software return is still scaling across Microsoft 365 Commercial Cloud and the broader enterprise estate.

Customer evidence is moving in Microsoft’s direction, but adoption is still early. CIO and channel work places AI at roughly 8–10% of IT budgets, with most large organizations now carrying a dedicated AI budget line. Funding is coming from both reallocation and incremental budget, and the reallocation pressure appears heavier on IT services and systems integrators than on cloud infrastructure or cybersecurity. That mix favors Microsoft’s protected cloud and security layers and its packaged AI software model. The constraint is production maturity, with channel checks still showing GenAI rollout in the single digits and most customers in pilot phases.

We understand the risks that still exist. Microsoft Cloud gross margin is under pressure, component inflation is now a material capex variable, and bookings will remain noisy as the OpenAI comparison rolls through the model. The revised OpenAI partnership improves Microsoft’s economics on OpenAI-derived workloads through retained equity, revenue-share payments through 2030 subject to a cap, continued non-exclusive IP rights through 2032, and the removal of Microsoft’s revenue-share obligation to OpenAI. The offset is that OpenAI now has more freedom to serve products across other clouds.

We would evaluate Microsoft’s AI capex as a portfolio monetization cycle, not only as an Azure capacity cycle. Azure is the infrastructure base, but the return path runs through Microsoft 365 Copilot, E5, the announced E7 frontier suite, Fabric, GitHub, Dynamics, Security, and the Agent 365 / Copilot Studio governance layer. The infrastructure cost is already visible in the income statement, while software and governance ARPU remain earlier in their curve. The next four quarters should show whether Microsoft’s user-plus-usage transition accelerates faster than depreciation pressure weighs on cloud gross margin. Microsoft’s AI capex is buying the foundation for a broader enterprise software pricing and control layer.

What subscribers get in the full note

Full multi-vector AI ROIC framework, including why Azure-only return math understates the numerator after the Q3 print.
Updated earnings analysis covering AI ARR, Azure growth, capex composition, capacity, and the OpenAI restructure.
First-party versus third-party GPU allocation, refreshed with the Q3 allocation language.
Customer-side validation triangulated across CIO, partner, and channel work on AI budget formation, vendor positioning, and deployment maturity.
Azure growth, capex, margin, and capacity model, including component-price disclosure and Q4 trajectory.
Copilot, E5, and the E7 frontier suite as a layered ARPU model.
Fabric and the data pull-through layer underneath enterprise agents.
Agent 365 and Copilot Studio as the enterprise agent governance and control plane.
OpenAI concentration, the new revenue-share structure, multi-cloud risk, and Anthropic integration.
Competitive framework versus AWS and Google, scenario framework, and a metric monitoring dashboard for the next four quarters.

Custom ASIC Is No Longer One Market

Ben Bajarin — Tue, 28 Apr 2026 16:08:29 GMT

Coming out of discussions at Google Cloud Next last week, we expect custom AI silicon to move back toward the center of the AI infrastructure debate. The spend trajectory across hyperscalers, dedicated infrastructure operators, and frontier model labs continues to support our view that custom silicon remains a key component of the buildout. The next phase of the discussion should focus on how that spend is captured, which layers of scope carry durable economics, and where strategic control shifts as customers become more sophisticated.

The public conversation still tends to group custom ASIC exposure into a single market narrative at a point when the underlying market is becoming more layered. Custom ASIC now covers programs with meaningfully different economics, margin structures, durability, and control points. That shorthand was useful when the category mostly referred to a compute die. The category now increasingly describes full systems spanning compute, memory, networking, I/O, packaging, and integration.

That shift changes how we frame the opportunity. Vendor exposure should be evaluated by scope quality: which layers of the program are owned, how scarce those layers are, how much execution risk they remove, and whether the role can support durable earnings quality as hyperscalers retain more architectural control internally.

Across the cohort, the businesses being grouped together are doing very different work. Some vendors are selling broad execution ownership across compute, packaging, and networking; others are selling attach; others are selling I/O and modular implementation; and others are monetizing physical design, foundry adjacency, and packaging coordination. Hyperscalers will continue to outsource what is scarce, risky, or time-sensitive while pushing to reclaim the layers where internal ownership lowers cost or improves control. The questions investors should be tracking are which vendor owns which layer of the stack, how durable that scope is, what attach travels alongside it, and where insourcing pressure is most likely to land first.

Why this matters now

Google’s most recent TPU roadmap is the most timely evidence for this view. By separating the eighth-generation TPU family into training-oriented and inference-oriented chips, Google is signaling that workload classes are now diverging at the silicon level. We expect that divergence to deepen as training systems continue to optimize around scale, synchronization, memory bandwidth, and reliability, while inference systems optimize around latency, utilization, cost per token, and deployment flexibility.

That divergence should also change supplier allocation. The full partner structure across these systems is not completely disclosed, but supply-chain checks point to a more layered model in which different external partners participate in different parts of the stack while Google retains significant internal architectural ownership. The same direction of travel is visible across AWS, Microsoft, Meta, OpenAI, and Anthropic. Each customer appears to be making its own decision about where internal architecture matters most and where external execution, IP, or capacity can create leverage.

The framework

The full report separates the cohort into five distinct business models, each with its own profit pool and its own relationship to insourcing pressure. Premium full-scope orchestration carries the broadest scope and the strongest margin tier, alongside the most concentrated insourcing pressure across the next two to three generations. Flexible compute plus attach can be margin dilutive on standalone compute, and its economics are decided by whether attach travels alongside the compute award. Hybrid I/O and modular custom silicon represents a different price point and is the most important business-model experiment in the group. Back-end implementation and turnkey execution carries lower margin and higher beta to ramps, while foundry-adjacent packaging and enablement carries scarcity exposure tied to advanced packaging tightness, with revenue conversion that is lumpier than headline ASIC framings imply.

Applied across Broadcom, Marvell, MediaTek, Alchip, and GUC, the framework produces five very different exposures. The margin spread alone runs from blended margins in the teens at the back end of the stack to the mid-50s at the premium full-scope orchestration tier. That gap is too wide to treat custom ASIC exposure as economically equivalent across the vendor base. Custom silicon continues to be an important part of the AI infrastructure buildout, but the next phase of the debate will be decided by scope quality rather than socket wins: who owns scarce IP, who reduces execution risk, and who can attach higher-quality content to the compute program itself.

A Broadcom dollar, a Marvell dollar, a MediaTek dollar, an Alchip dollar, and a GUC dollar do not carry the same margin structure, durability, or strategic risk. Treating them as comparable assets confuses exposure with exposure quality. The vendors that retain scarce IP, reduce execution risk, or attach higher-quality content to custom compute should have a better chance of holding their economics through the buildout. The vendors with thinner positions will need volume, repeatability, or scarcity to support valuation durability. That distinction is the core of the full report.
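To make the point about unequal dollars concrete, the sketch below computes the revenue-weighted gross margin of a mix drawn from different scope tiers. The two tier margins (15% for back-end implementation, 55% for premium full-scope orchestration) are illustrative point values standing in for the “teens” and “mid-50s” cited above, and the revenue mixes are hypothetical; neither is a disclosed vendor figure.

```python
# Illustrative gross margins per tier: hypothetical point values within the
# ranges cited in the text, not vendor disclosures.
TIER_MARGIN = {
    "back_end_implementation": 0.15,
    "premium_full_scope": 0.55,
}

def blended_margin(revenue_by_tier: dict[str, float]) -> float:
    """Revenue-weighted gross margin for a mix of custom-silicon scope tiers."""
    total_revenue = sum(revenue_by_tier.values())
    if total_revenue <= 0:
        raise ValueError("revenue mix must be positive")
    gross_profit = sum(rev * TIER_MARGIN[tier] for tier, rev in revenue_by_tier.items())
    return gross_profit / total_revenue

# The same $1B of "custom ASIC revenue" under three hypothetical mixes.
premium_heavy = {"premium_full_scope": 1.0}
back_end_heavy = {"back_end_implementation": 1.0}
balanced = {"premium_full_scope": 0.5, "back_end_implementation": 0.5}
for name, mix in [("premium", premium_heavy), ("back end", back_end_heavy), ("balanced", balanced)]:
    print(f"{name}: {blended_margin(mix):.0%} gross margin")
```

Under these assumptions, a premium-tier dollar carries more than three times the gross profit of a back-end dollar, which is why headline custom ASIC revenue says little on its own about earnings quality.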

What full subscribers receive

The full institutional report, including the framework, company sections, scorecards, margin work, and watchlist, written in our voice and structured for buy-side use.
The custom silicon stack map with a layer-by-layer view of where economic value, insourcing risk, and strategic leverage actually sit.
The five-model business framework covering premium full-scope orchestration, flexible compute plus attach, hybrid I/O and modular silicon, back-end implementation, and foundry-adjacent enablement.
Full company sections for Broadcom, Marvell, MediaTek, Alchip, and GUC, including stack ownership, revenue pool exposure, and the central underwriting question for each name.
The comparative framework table covering primary role, scope ownership, main revenue pool, margin quality, insourcing risk, attach leverage, customer concentration, and what investors are underwriting.
Per-company scorecards summarizing strengths, weaknesses, opportunities, threats, and margin tier.
A dedicated margin debate section that traces gross margin tiers across the cohort and explains why hyperscalers will continue to pressure pricing.
The Google case study applied to the broader market transition rather than as a single-customer note.
A risks section covering what would weaken and what would strengthen the framework.
The full What We Are Watching section across the cohort and per name, updated through subsequent notes.

Google’s AI Capex Is Being Measured Against the Wrong Revenue Line

Ben Bajarin — Thu, 23 Apr 2026 15:25:49 GMT

We attended Google Cloud Next this week, sat through the key sessions, and spent time with management, partners, customers, and the broader ecosystem. What follows is our framework for evaluating Google’s business as a whole through the lens of its current AI capital cycle.

The market’s debate around Google has settled around a question that captures the visible part of the investment cycle, but not the full return structure underneath it. Can the company earn an acceptable return on roughly $180 billion of 2026 capex if that spend is measured against cloud AI revenue alone? We think that framing leaves out too much of where the infrastructure is already monetizing.

More fundamentally, we think it misses the structure of the business Google is building, and has been building for some time.

The same infrastructure supporting Gemini inside GCP also powers Search, lifts ad yield, expands Workspace, and increasingly sits underneath agentic software and orchestration layers that do not show up neatly in a single “AI revenue” bucket. From there, the more useful question becomes whether Google’s AI stack is generating returns across enough monetization surfaces at once to make the capital cycle more rational than the market currently assumes.
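The attribution argument can be sketched as a simple return calculation. In the minimal example below, every profit figure is hypothetical and chosen only to show the mechanics: a cloud-only view divides one surface’s incremental operating profit by the full capex base, while a multi-surface view sums the incremental profit across the surfaces the same infrastructure serves. This is a simplified return on capex rather than a full ROIC calculation, and how much profit to attribute to each surface is the hard analytical question the sketch does not answer.

```python
from dataclasses import dataclass

@dataclass
class Surface:
    name: str
    incremental_operating_profit_bn: float  # hypothetical, attributed to AI infrastructure

def return_on_capex(surfaces: list[Surface], capex_bn: float) -> float:
    """Simplified annual return: attributed incremental operating profit / capex."""
    if capex_bn <= 0:
        raise ValueError("capex must be positive")
    return sum(s.incremental_operating_profit_bn for s in surfaces) / capex_bn

# Hypothetical attribution (illustrative only, not estimates).
surfaces = [
    Surface("Cloud", 10.0),
    Surface("Search", 8.0),
    Surface("Workspace", 2.0),
]
capex = 180.0  # roughly the 2026 capex figure cited in the text, in $B

cloud_only = return_on_capex([s for s in surfaces if s.name == "Cloud"], capex)
multi_surface = return_on_capex(surfaces, capex)
print(f"cloud-only: {cloud_only:.1%}, multi-surface: {multi_surface:.1%}")
```

The denominator is identical in both views; only the numerator changes, which is the sense in which cloud-only math can understate returns on shared infrastructure.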

As the evidence has started to broaden beyond management commentary, the debate has become easier to frame. In the fourth quarter of 2025, Google Cloud grew 48% year over year and backlog doubled to $240 billion. Search revenue grew 17%, its fastest rate in roughly four years. Cloud operating margin reached 30.1%. In the first quarter of 2026, agency checks pointed to conversion-rate growth of 13.7%, up from 6.6% a year earlier. We do not cite these datapoints in isolation; taken together, they are what the flywheel should look like if the same compute base is improving more than one revenue line at once.

The Search debate is also more nuanced than either side is allowing. The bear case has been right about one thing: AI Overviews are reducing click-through rates on informational and educational queries, with third-party work suggesting declines of roughly 20% to 30%. But those were never the core monetization surfaces. The more important point is that the categories where Google actually monetizes (shopping, travel, local services, and other bottom-funnel queries) look much more resilient than the broad disruption narrative implies. Commercial-intent share loss remains unproven, and the more relevant signal so far is improving matching and conversion rather than rising ad density. That makes monetization parity more credible than feared, though still short of fully proven. For a competitive comparison, see our full note on OpenAI and its revenue prospects.

Cloud Next sharpened the infrastructure side of the story. Google launched two eighth-generation TPUs built for different workload classes, with TPU 8t aimed at training and TPU 8i aimed at inference. That split gives a clearer read on how Google sees AI economics evolving, and it supports a point we have argued for some time: training and inference are distinct infrastructure problems. Inference is becoming more constrained by memory, latency, and coordination overhead, which is why Google emphasized BoardFly networking, HBM expansion, and “breaking the memory wall.” In our view, that is the deeper signal from Cloud Next. Google is positioning for the phase of AI where serving models efficiently across Search, Gemini, and enterprise agents matters as much as training them in the first place.

Our report also develops a point the market is still underweighting. Agentic AI is increasingly bringing CPUs and orchestration closer to the center of the stack. Triangulated estimates suggest that as these workloads scale, CPU attach becomes a more meaningful contributor to cloud revenue and margin over time. Axion is still early, but it no longer makes sense to treat it as immaterial.

The full report is narrower than a complete Alphabet sum-of-the-parts review and more expansive than the cloud-only ROIC debate. It is focused on the market’s most important current question: whether Google’s AI capital cycle is earning acceptable returns once those returns are measured across the surfaces the infrastructure actually powers.

What paid subscribers get in the full note:

• The full multi-vector ROIC framework, including the attribution logic behind why cloud-only return math understates the cycle

• The disaggregated Search durability section, including what is proven, what is still management claim, and where the real risk is concentrated

• A full Cloud Next infrastructure analysis on TPU 8t, TPU 8i, BoardFly, goodput, and what the split roadmap says about Google’s workload assumptions

• The Axion and CPU opportunity, sized as an emerging call option rather than a current run-rate business

• Bull, base, and bear valuation frameworks, with the variables most likely to move the debate over the next four quarters

Google’s TPU Strategy Offers a Clearer View of the Next AI Bottleneck

Ben Bajarin — Wed, 22 Apr 2026 16:44:53 GMT

Why Google’s New TPU Strategy Carries More Strategic Weight Than a Routine Silicon Update

This note is not intended to be a chip-versus-chip architectural comparison, nor is it an attempt to rank Google’s latest TPU designs against competing silicon. That is a different exercise, and in many ways a less useful one at this stage. What we are more interested in is the logic behind Google’s design choices, because those choices offer a clearer view into how the company sees AI workloads evolving and where it believes custom silicon creates strategic leverage. In that sense, the announcement is useful less as a scorecard and more as a window into Google’s infrastructure philosophy. What stands out is a company designing around anticipated workload behavior two to three years ahead, and doing so from the vantage point of a tightly integrated stack that spans models, products, networking, systems, and data centers.

Google’s latest TPU announcement makes more sense when viewed through the underlying infrastructure question it is addressing. The market is still trying to frame AI infrastructure around a single curve of larger models, more compute, and more accelerators. Google is signaling a more developed view of where it believes the stack is headed. Training and inference are evolving into distinct infrastructure problems, each with its own bottlenecks, each with its own economic logic, and increasingly each with its own optimal silicon design. That is the larger takeaway from Google introducing two eighth-generation TPUs at once: TPU 8t for training and TPU 8i for inference. In our view, that design split offers a more useful window into Google’s competitive position than the raw performance figures alone.

What stands out to us is how deliberately Google tied the product story to workload behavior. TPU 8t is framed as the training “powerhouse.” TPU 8i is framed as the reasoning engine. Google is emphasizing that these are separately designed chips built for different use cases rather than variations of the same underlying product. That framing carries strategic significance because it reflects an internal judgment that the most valuable AI workloads are diverging in ways a single generalized design would serve less efficiently. We agree, as we expect specialization to continue. As model development continues to scale, training still demands immense bandwidth, synchronization, and tightly coordinated compute. As AI products move into reasoning, agents, and real-time enterprise workflows, inference places increasing pressure on latency, memory capacity, and the network path between chips. Google is building for both trajectories at the same time.

The training side of this announcement extends a path Google has been on for years. TPU 8t preserves Google’s emphasis on giant pods and supercomputer-scale coordination while materially lifting the performance envelope of the system. On the slide shown during the presentation, Google compares Ironwood with TPU 8t and shows FP4 exaflops per pod increasing from 42.5 to 121, bidirectional scale-up bandwidth per chip rising from 9.6 to 19.2 Tb/s, and scale-out networking bandwidth moving from 100 to 400 Gb/s. Pod size also increases from 9,216 chips to 9,600. Those figures point to a familiar but still consequential objective. Google wants larger training domains with fewer communication penalties as model complexity rises. The practical outcome is that more of the theoretical compute becomes usable compute, which supports faster iteration on frontier models and a better chance of sustaining performance leadership at large scale.
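A quick way to read the slide is to normalize the pod-level figures per chip and compare generation over generation. The sketch below uses only the Ironwood and TPU 8t figures quoted above; because pod size grows only about 4%, most of the pod-level compute gain shows up per chip. Dividing exaflops per pod by chips per pod to get a per-chip figure is our simplification, and all values are peak, not delivered, throughput.

```python
# Figures as quoted from Google's Ironwood vs. TPU 8t slide.
IRONWOOD = {"fp4_exaflops_per_pod": 42.5, "chips_per_pod": 9_216,
            "scale_up_tbps_per_chip": 9.6, "scale_out_gbps": 100}
TPU_8T = {"fp4_exaflops_per_pod": 121.0, "chips_per_pod": 9_600,
          "scale_up_tbps_per_chip": 19.2, "scale_out_gbps": 400}

def per_chip_petaflops(spec: dict) -> float:
    """Peak per-chip throughput implied by the pod figure (1 exaflop = 1,000 petaflops)."""
    return spec["fp4_exaflops_per_pod"] * 1_000 / spec["chips_per_pod"]

def gen_over_gen(old: dict, new: dict) -> dict[str, float]:
    """Ratio of new to old for each quoted metric, plus implied per-chip compute."""
    ratios = {k: new[k] / old[k] for k in old}
    ratios["per_chip_petaflops"] = per_chip_petaflops(new) / per_chip_petaflops(old)
    return ratios

for metric, ratio in gen_over_gen(IRONWOOD, TPU_8T).items():
    print(f"{metric}: {ratio:.2f}x")
```

On these figures, pod compute rises about 2.8x and implied per-chip peak compute about 2.7x, while scale-up bandwidth per chip doubles and scale-out bandwidth quadruples, consistent with the goal of keeping communication from eating the added compute.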

The inference narrative was helpful because it reveals more about how Google views the next phase of AI deployment. TPU 8i is presented as a dedicated reasoning engine, and the slide language around it is revealing: “Breaking the Memory Wall,” “Accelerated Agent Processing,” “Boardfly Networking,” and “Cost Efficiency.” The points they are emphasizing tell us Google sees inference as a systems problem shaped by memory pressure, coordination overhead, and the need to keep response times low as model behavior becomes more dynamic. Reasoning and agentic workloads place heavier demands on infrastructure because the work extends beyond generating one response and increasingly involves multiple steps, more memory state, and more system coordination. As that becomes more common, the economics of inference depend increasingly on how efficiently the infrastructure handles those behaviors. These inference economics will be one of the biggest factors in ROIC, if not the single biggest.

The best example of that design philosophy is the network topology Google discussed for TPU 8i. The team explains that previous connectivity favored throughput, which aligned well with moving large quantities of data. Their newer inference design shifts attention toward minimum latency. Google describes a network topology called Boardfly that reduces the diameter of the network, which shortens the distance between chips. That is a technical choice with a direct product-level benefit. Lower network distance supports faster movement of information between compute nodes. Faster information movement supports lower end-to-end latency. Lower latency improves the usability of search, assistants, enterprise agents, and any workflow where responsiveness shapes user experience. The design decision is therefore easier to understand when mapped to the service layer. Google is tuning silicon and interconnect around the characteristics of reasoning workloads rather than simply scaling a training-oriented architecture into inference. Again, system-level optimization, or specialized design for specific AI workloads, remains a strategic priority.
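Network diameter is the largest number of hops between any two nodes, and it bounds worst-case chip-to-chip latency. Boardfly’s actual wiring is not described in the material above, so the sketch below does not model it. Instead, it compares two textbook topologies with 64 nodes each: a 2D torus and a dragonfly-style layout of fully connected groups joined by one global link per group pair. Both topologies, their sizes, and the helper names are assumptions chosen only to show how topology choice changes diameter.

```python
from collections import deque
from itertools import product

Node = tuple[int, int]

def diameter(adj: dict[Node, set[Node]]) -> int:
    """Longest shortest-path hop count over all node pairs (BFS from every node)."""
    worst = 0
    for src in adj:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        if len(dist) != len(adj):
            raise ValueError("graph is disconnected")
        worst = max(worst, max(dist.values()))
    return worst

def torus_2d(k: int) -> dict[Node, set[Node]]:
    """k x k torus: each node links to its four wraparound neighbours."""
    return {
        (x, y): {((x + 1) % k, y), ((x - 1) % k, y), (x, (y + 1) % k), (x, (y - 1) % k)}
        for x, y in product(range(k), repeat=2)
    }

def dragonfly_like(groups: int, per_group: int) -> dict[Node, set[Node]]:
    """Fully connected groups plus exactly one global link between every pair of groups."""
    if per_group < groups - 1:
        raise ValueError("need at least groups - 1 nodes per group for global links")
    adj: dict[Node, set[Node]] = {(g, i): set() for g in range(groups) for i in range(per_group)}
    for g in range(groups):
        for i, j in product(range(per_group), repeat=2):
            if i != j:
                adj[(g, i)].add((g, j))  # local all-to-all links inside a group
    for g in range(groups):
        for h in range(g + 1, groups):
            a, b = (g, h - 1), (h, g)  # each node carries at most one global link
            adj[a].add(b)
            adj[b].add(a)
    return adj

torus = torus_2d(8)          # 64 nodes
dfly = dragonfly_like(8, 8)  # 64 nodes
print("torus diameter:", diameter(torus))
print("dragonfly-like diameter:", diameter(dfly))
```

With 64 nodes each, the torus needs up to 8 hops between its farthest pair, while the dragonfly-style layout needs at most 3. Real designs also trade off link count, per-link bandwidth, and cabling cost, so diameter is one input to latency rather than the whole story.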

The pod-level figures for TPU 8i reinforce the same point. Google shows pod size increasing from 256 chips on Ironwood to 1,152 on TPU 8i, FP8 exaflops per pod increasing from 1.2 to 11.6, and total HBM capacity per pod moving from roughly 49 TB to 331.8 TB. Those numbers suggest Google expects advanced inference to require substantially more clustered compute and materially more memory attached to the system. The broader implication is that inference capacity is becoming more tightly linked to memory architecture and interconnect design than many investors still assume. When Google references “breaking the memory wall,” it is pointing toward a familiar bottleneck in a new context. Model serving, especially for reasoning systems, becomes constrained not just by arithmetic throughput but by how much model state can be held and moved efficiently across the machine. More on memory in our report below.
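Dividing the quoted pod totals by chip count gives a rough per-chip view of the memory point. This is our arithmetic on the slide figures, not a Google per-chip specification, and the Ironwood HBM total is itself approximate (“roughly 49 TB”). Decimal units are assumed (1 TB = 1,000 GB).

```python
# Pod-level figures as quoted for Ironwood and TPU 8i (HBM in TB, FP8 in exaflops).
PODS = {
    "Ironwood": {"chips": 256, "fp8_exaflops": 1.2, "hbm_tb": 49.0},
    "TPU 8i": {"chips": 1_152, "fp8_exaflops": 11.6, "hbm_tb": 331.8},
}

def per_chip(pod: dict[str, float]) -> dict[str, float]:
    """Implied per-chip HBM (GB, decimal) and peak FP8 petaflops from pod totals."""
    return {
        "hbm_gb": pod["hbm_tb"] * 1_000 / pod["chips"],
        "fp8_petaflops": pod["fp8_exaflops"] * 1_000 / pod["chips"],
    }

for name, pod in PODS.items():
    chip = per_chip(pod)
    print(f"{name}: ~{chip['hbm_gb']:.0f} GB HBM/chip, ~{chip['fp8_petaflops']:.1f} PF FP8/chip")

# Pod-level growth from Ironwood to TPU 8i for each quoted metric.
growth = {k: PODS["TPU 8i"][k] / PODS["Ironwood"][k] for k in ("chips", "fp8_exaflops", "hbm_tb")}
print({k: round(v, 1) for k, v in growth.items()})
```

On these numbers, implied HBM per chip rises by about half (roughly 191 GB to 288 GB) while chip count per pod grows 4.5x, so total pod memory grows about 6.8x, faster than chip count alone would imply.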

That shift has strategic consequences for Google because many of its most important AI surfaces are increasingly inference-heavy. Search, AI Overviews, AI Mode, Gemini, productivity tools, enterprise assistants, and future agent-driven services all depend on serving models quickly and repeatedly at very large scale. Google says directly in the discussion that the value of AI increasingly comes from serving the model. That line deserves attention because it helps explain why a dedicated inference TPU deserves to be viewed as a core infrastructure investment rather than as a side branch of the roadmap. Training establishes capability. Inference determines how broadly that capability can be delivered, how efficiently it can be monetized, and how reliably it can be embedded across the company’s product portfolio.

The presentation also offered a useful reminder that Google is telling an infrastructure story rather than a chip story. The event begins with energy, data centers, cooling, racks, hardware, networking, software, models, and products. The TPUs sit inside that broader architecture. We think that framing is central to how Google wants investors to view its advantage. The company is not trying to win on component performance in isolation. It is trying to show that vertical integration allows it to shape the entire system around the workloads it sees emerging inside DeepMind, Search, YouTube, Ads, Gemini, and Cloud. That creates a design loop that few companies can replicate. When the teams building the models, the products, and the infrastructure are tightly linked, the hardware roadmap can reflect anticipated workload shifts two or three years ahead. Google says this explicitly when it describes the need to predict where these workloads are going before they become obvious externally.

This is where the announcement translates into competitive advantage. Many firms can purchase AI infrastructure. A smaller group can design some of it. An even smaller group can observe frontier workloads inside their own model teams, tune custom silicon for those workloads, deploy it across large internal products, and then expose that same infrastructure through cloud. Google occupies that narrower category. That creates leverage in several forms. It supports better utilization across internal and external demand. It improves the return on custom design investment because the technology can be amortized across many high-value workloads. It also allows the company to learn from production behavior at a scale that most enterprises and many model labs do not see firsthand. Over time, that feedback loop can compound into better system design and stronger service economics.

Another part of the discussion that deserves more attention is reliability. Google spends time on “goodput,” which it defines as actual forward progress in computation rather than theoretical throughput (we detail the concept in our NVIDIA report). At the pod sizes and supercomputer scale Google is describing, the system is coordinating thousands or tens of thousands of chips. In that environment, chip failures are part of operating reality. Google explains that once systems reach that scale, the important engineering challenge involves detecting failures quickly, reconfiguring the system, and avoiding silent data corruption that can spread through the broader workload. That is an important window into the company’s infrastructure philosophy. What Google is ultimately delivering to its own services and to cloud customers is a production-scale system built to keep making forward progress at very large scale. Viewed through a strategic lens, goodput shapes actual cost, deployment efficiency, and service reliability in live environments.
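Google’s exact goodput methodology is not described above, so the sketch below uses a standard first-order checkpoint/restart model to show why scale turns failure handling into an economic variable. It assumes independent chip failures (so system MTBF falls in proportion to chip count), fixed checkpoint and restart costs, and a checkpoint interval set by Young’s approximation, sqrt(2 × checkpoint cost × system MTBF). All parameter values and the function name are hypothetical.

```python
import math

def goodput_fraction(num_chips: int, chip_mtbf_h: float,
                     checkpoint_cost_h: float, restart_cost_h: float) -> float:
    """Fraction of wall-clock time spent making forward progress (first-order model)."""
    system_mtbf_h = chip_mtbf_h / num_chips  # independent failures: rate scales with chip count
    interval_h = math.sqrt(2 * checkpoint_cost_h * system_mtbf_h)  # Young's approximation
    checkpoint_overhead = checkpoint_cost_h / interval_h
    # Each failure loses, on average, half an interval of work plus the restart time.
    failure_overhead = (interval_h / 2 + restart_cost_h) / system_mtbf_h
    return max(0.0, 1.0 - checkpoint_overhead - failure_overhead)

# Hypothetical parameters: 50,000 h per-chip MTBF, 3-minute checkpoints.
for chips in (1_152, 9_600):
    for restart_h in (0.5, 0.05):  # slow vs. fast detect-and-reconfigure
        g = goodput_fraction(chips, 50_000, 0.05, restart_h)
        print(f"{chips:>5} chips, restart {restart_h * 60:>4.0f} min: goodput {g:.1%}")
```

The qualitative result is the point: under these assumptions the larger pod sees a failure roughly every five hours, and cutting restart time recovers far more goodput there than on the smaller pod, which is why fast detection and reconfiguration matter at supercomputer scale.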

Google suggests, and we agree, that agentic computing will bring general-purpose CPUs back into a more prominent role while specialization continues to deepen across AI infrastructure. That view fits with the broader direction implied by the TPU launch. The infrastructure stack is becoming more heterogeneous. Training remains specialized. Inference is becoming more specialized. Orchestration and agent execution will elevate the role of CPUs (and, per our models, increase the ratio of CPUs to GPUs) and additional system components over time. The implication is that AI data centers are moving toward a more segmented architecture where the economic value comes from balancing specialized compute, memory, networking, and orchestration around specific workload classes. More on the CPU resurgence in our agentic CPU report.

For investors, the key issue is how to interpret this in competitive terms. Google’s latest TPU generation is best evaluated through the economics and performance of the workloads becoming most valuable across the AI stack, and through how much its integrated infrastructure approach improves both. We think this announcement suggests that it does. The split between TPU 8t and TPU 8i shows a company designing around where demand is moving rather than around a generic view of compute growth. It shows an infrastructure organization translating technical insight into service-level outcomes. It also shows a company that is committed to custom silicon as one part of a broader systems advantage spanning networking, software, reliability, and product integration. That combination gives Google a stronger position in AI than the market often credits, particularly when the debate shifts from who can train frontier models to who can deliver them efficiently across products used by billions of people and enterprises alike. The serving of models at scale, the inference era, is upon us.

The larger takeaway is that Google is shaping AI infrastructure around a more mature understanding of the workload mix ahead. TPU 8t extends its training ambitions. TPU 8i brings more explicit architectural focus to reasoning and agentic inference. Together, they reflect a company designing for the next phase of AI deployment rather than reacting to the last one. For a market still looking at AI silicon through a mostly generalized lens, that is a useful signal about where the next layer of competitive separation may emerge.

Full report on the Google growth thesis for subscribers tomorrow.


The Diligence Stack delivers analyst-grade intelligence on the companies reshaping the technology landscape. We analyze the full stack—from silicon to software to business model—to reveal what drives real differentiation. For investors, operators, and technical leaders who need to understand strategy, competitive advantage, and whether the technical foundations beneath them are built to last.


Foundry Economics in the AI Age

Ben Bajarin — Tue, 21 Apr 2026 15:03:32 GMT

A fresh perspective is needed on foundry economics at the leading edge. For years, the industry lacked a clear cost-per-transistor framework, even as the economic foundation of Moore’s Law was becoming less durable. The traditional relationship, in which each new node lowered cost per transistor, held, if only marginally, through N3E. It no longer holds at N2 and A16, and the evidence is now strong enough to reframe the investment case. More importantly, the break did not begin only at the newest nodes. The engineering cadence of Moore’s Law continued, but the economic logic behind it began to fade years ago. Cost per transistor had already flattened, with only marginal improvement, across leading-edge transitions. What our work now shows is that the curve has moved beyond flattening and into outright inversion. Anyone who has studied the semiconductor industry for any length of time knows why this matters: the industry was built not just on scaling density, but on the expectation that each new node would make compute cheaper to manufacture. Once that stopped being true, each new node increasingly needed to be justified by the value of the end workload rather than by automatic cost reduction alone.

Our proprietary front-end cost model shows that the cost to manufacture a billion transistors on a large monolithic AI die rises from roughly $5.4 at TSMC’s N5 node into the high-single-digit to low-teens range at A16. We view roughly $11 as the middle of a plausible range that holds across reasonable variation in wafer pricing, large-die yield, and effective transistor density. The strongest claim is the slope, not the exact level, given how much data we had to triangulate: across three node generations, the cost curve has moved decisively upward rather than down. That directional shift marks the inversion of the economic engine that powered five decades of semiconductor progress. For most of that period, each shrink reliably lowered cost per transistor, and the industry built its growth model around that assumption. For the AI-exposed portion of the stack, that assumption now needs to be retired.
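The mechanics behind a cost-per-billion-transistor figure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not our proprietary model: dies per wafer uses a common first-order approximation for a 300mm wafer, yield is passed in directly rather than derived from defect density, and every input in the usage lines is a hypothetical placeholder rather than an estimate for any real node.

```python
import math


def gross_dies_per_wafer(die_area_mm2: float, wafer_diameter_mm: float = 300.0) -> float:
    """First-order estimate of candidate dies on a round wafer.

    Wafer area divided by die area, minus an approximate count of partial
    dies lost around the wafer edge.
    """
    radius = wafer_diameter_mm / 2
    area_term = math.pi * radius**2 / die_area_mm2
    edge_term = math.pi * wafer_diameter_mm / math.sqrt(2 * die_area_mm2)
    return area_term - edge_term


def cost_per_billion_transistors(
    wafer_cost_usd: float,
    die_area_mm2: float,
    die_yield: float,
    density_mtr_per_mm2: float,
) -> float:
    """Spread the processed-wafer cost over transistors on good dies only."""
    good_dies = gross_dies_per_wafer(die_area_mm2) * die_yield
    transistors_per_die = density_mtr_per_mm2 * 1e6 * die_area_mm2
    billions_per_wafer = good_dies * transistors_per_die / 1e9
    return wafer_cost_usd / billions_per_wafer


# Hypothetical inputs for two nodes with the same die size: the newer node is
# denser, but wafer cost rises faster and large-die yield is lower.
older = cost_per_billion_transistors(20_000, die_area_mm2=800, die_yield=0.6, density_mtr_per_mm2=140)
newer = cost_per_billion_transistors(32_000, die_area_mm2=800, die_yield=0.5, density_mtr_per_mm2=180)
print(f"older node: ${older:.2f}, newer node: ${newer:.2f} per billion transistors")
```

The toy comparison shows the slope argument in miniature: density gains lower cost per transistor only if wafer cost and yield loss do not rise faster.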

The inversion starts with the cost of building leading-edge capacity itself. Capital intensity per 50,000 wafers per month of advanced capacity has risen from roughly $16 billion at N5 to an estimated $28 to $30 billion at N2 and A16. Equipment reuse rates, which ran 60 to 70 percent at the last major transition (covered in our WFE report), are collapsing at the newest nodes because the shift to gate-all-around architecture and backside power delivery renders much of the existing installed base obsolete. Yield pressure remains one of the clearest reasons cost is rising at the leading edge. Very large AI chips already lose a meaningful share of output before they ever become finished products. When those chips are then paired with stacked high-bandwidth memory, the economics become even less forgiving because each additional layer creates another opportunity for usable output to fall away. In practical terms, memory dies that look healthy on their own can still produce a much lower effective yield once they are assembled into a full stack, even before packaging losses are included.
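The compounding effect of stacking can be seen in a toy model: if every layer in a stack must be good, the per-layer yields multiply. A minimal sketch, assuming independent failures and a single bonding step per layer; the function names are hypothetical, and the 98% die and 99% bond yields in the usage lines are placeholders, not supply-chain estimates.

```python
def stack_yield(die_yield: float, layers: int, bond_yield_per_layer: float = 1.0) -> float:
    """Probability that a stack is fully good when each layer must pass both
    its own die test and its bonding step, with failures assumed independent."""
    if not 0.0 <= die_yield <= 1.0 or not 0.0 <= bond_yield_per_layer <= 1.0:
        raise ValueError("yields must be probabilities between 0 and 1")
    if layers < 1:
        raise ValueError("a stack needs at least one layer")
    return (die_yield * bond_yield_per_layer) ** layers


def effective_yield(die_yield: float, layers: int, bond_yield_per_layer: float = 1.0,
                    package_yield: float = 1.0) -> float:
    """Stack yield further discounted by a final packaging step."""
    return stack_yield(die_yield, layers, bond_yield_per_layer) * package_yield


# Hypothetical inputs: each die and each bond looks healthy in isolation,
# yet the share of fully good stacks falls as layer count rises.
for layers in (8, 12, 16):
    print(f"{layers} layers: {stack_yield(0.98, layers, 0.99):.1%} of stacks fully good")
```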

What stands out to us is that this cost inflection is arriving at the same moment AI demand is redefining what advanced compute is worth. In a different era, this kind of cost structure might have forced a much broader pause at the leading edge, or at least a drastic slowdown, because each new node would have been hard to justify. Instead, AI has become the first end market large enough, urgent enough, and economically important enough to absorb it. The procurement framework at the top of the infrastructure stack has evolved from cost per transistor to throughput per watt, tokens per dollar of deployed infrastructure, and revenue capacity per cluster. The relevant question is whether the system generates enough economic return to justify the silicon premium, and for AI training and inference at scale the answer remains clearly affirmative. Visibility into GPU demand extends well into 2027 and 2028 with gross margins maintained in the mid-70s, and high-bandwidth memory supply is essentially spoken for into 2028.
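The procurement metrics named above can be made concrete with a small calculator. A minimal sketch, assuming sustained throughput, utilization, facility power, and amortization life are known inputs; the class name and every example value are hypothetical, and operating costs such as energy are deliberately left out to match the "per dollar of deployed infrastructure" framing.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 3600


@dataclass
class ClusterEconomics:
    """Hypothetical serving cluster; every field is an illustrative input."""

    capex_usd: float          # cost of deployed infrastructure
    power_w: float            # facility power while serving
    tokens_per_second: float  # sustained aggregate serving throughput
    utilization: float        # fraction of time the cluster serves at that rate
    life_years: float         # period over which the capex is amortized

    def throughput_per_watt(self) -> float:
        return self.tokens_per_second / self.power_w

    def lifetime_tokens(self) -> float:
        return self.tokens_per_second * self.utilization * self.life_years * SECONDS_PER_YEAR

    def tokens_per_dollar(self) -> float:
        return self.lifetime_tokens() / self.capex_usd

    def revenue_capacity_usd(self, price_per_million_tokens: float) -> float:
        return self.lifetime_tokens() / 1e6 * price_per_million_tokens


cluster = ClusterEconomics(capex_usd=3e9, power_w=100e6, tokens_per_second=5e7,
                           utilization=0.7, life_years=5)
print(f"{cluster.throughput_per_watt():.2f} tokens/s per W")
print(f"{cluster.tokens_per_dollar():,.0f} tokens per capex dollar")
# Does lifetime revenue capacity at a hypothetical $2 per million tokens exceed capex?
print(cluster.revenue_capacity_usd(price_per_million_tokens=2.0) > cluster.capex_usd)
```

Framing the question this way puts the silicon premium in the denominator and the workload's value in the numerator, which is why a more expensive node can still win the procurement decision.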

The implication is that cost inflation and value inflation are occurring simultaneously. The leading edge is becoming a premium manufacturing tier rather than a universal cost-down path, and the suppliers that benefit most are the ones converting scarce, expensive silicon into systems valuable enough that buyers do not negotiate on the per-transistor premium. In that sense, AI did not restore the old economics of Moore’s Law. It introduced a new economic rationale for scaling after the original one had already weakened. We view that shift as highly significant for the broader semiconductor industry. Without AI, the loss of economic Moore’s Law would likely have imposed a much harder ceiling on how far and how fast the leading edge could continue to advance. Equipment spending reflects this shift. WFE is projected to reach $125 to $145 billion this year (some estimates exceed $170 billion), driven less by the number of new fabs and more by the tool intensity required at each successive node.

Where the value accrues, which end markets can absorb the rising cost, how the memory stack reinforces the same thesis, and what signals would change our view are the subjects of the full report. We develop the foundry TAM and WFE market framework separately in our companion reports on the capacity buildout.

What subscribers get in the full report:

Our proprietary front-end cost model with node-by-node economics from N7 through A16, including cross-foundry comparisons for Intel 18A, Intel 14A, and Samsung SF2
The complete memory cost framework: commodity DRAM, NAND, HBM3e, and HBM4 with yield-layer decomposition and wafer trade ratio analysis
A nine-segment value capture matrix identifying where pricing power, margin expansion, and durability sit across the AI semiconductor supply chain
End-market justification table showing which product categories can absorb rising silicon cost and which cannot
WFE structural consequence analysis: why equipment intensity is rising even as the number of qualifying end markets shrinks
Tiered stock positioning: best expressions, debated names, and where the thesis is crowded or conditional
Five concrete proof points to watch over the next two to three quarters
Full model methodology appendix

Neoclouds and the Three Business Models

Ben Bajarin — Tue, 14 Apr 2026 16:06:43 GMT

Neoclouds sit at the center of the AI infrastructure durability debate because they are increasingly the external channel through which hyperscalers secure additional power and compute to convert demand into revenue. CoreWeave ended 2025 with $66.8 billion in contracted backlog. Nebius signed a deal with Meta worth up to $27 billion. IREN landed a $9.7 billion contract with Microsoft. Core Scientific is locked into $10.2 billion over 12 years from CoreWeave. Cipher Digital secured $9.3 billion across two flagship leases. Together, those deals show that hyperscalers are still reaching beyond their own footprints (even as they move quickly to secure first-party contracts) to claim scarce capacity wherever it can be found. That suggests the buyers with the best visibility continue to view AI demand as durable enough to justify locking up external power and compute today. From there, the more useful question is how that demand maps onto the very different business models now being grouped together under the Neocloud label.

Our view for some time has been that the category is analyzed too broadly. These companies are often discussed as though they are variations of the same model responding to the same opportunity. We do not think that is right. What the market is calling Neocloud increasingly breaks into three distinct businesses with different economics, different risk exposure, and different claims on long-term durability. That distinction is more important now because the constraint that created this category is changing. For the past two years, the defining bottleneck has been access to GPUs, and that remains the case. But deliverable power now deserves equal weight as a constraint.

That shift is why simple revenue comparisons can be misleading if they are not tied back to business model. CoreWeave generates roughly $10 to $12 million in annual revenue for every energized megawatt of capacity it operates. Core Scientific generates about $1.4 million per megawatt. At first glance, that spread seems to settle the comparison. In reality, it captures two very different positions in the value chain. CoreWeave owns the GPUs, builds the software stack, manages the orchestration layer, and retains the technology-refresh risk on hardware that, some argue, depreciates economically faster than its accounting life suggests. Core Scientific owns the land, power, and cooling, while letting its customer bring the hardware, absorb the obsolescence risk, and pay for long-duration access. The revenue per megawatt is lower, but so is the exposure to the most volatile part of the stack.

We think that distinction is where the category begins to separate. Full-stack AI platforms such as CoreWeave and Nebius capture the most value per megawatt, but they also carry the greatest exposure to hardware refresh, financing, and utilization risk. Bare-metal GPU cloud providers such as IREN monetize owned power and owned hardware with less software differentiation and a different return profile. Then there is a third group that we think should be viewed on its own terms: former bitcoin miners such as Core Scientific, Cipher Digital, and TeraWulf that are increasingly functioning as long-duration infrastructure hosts. These businesses may ultimately prove to have the most durable economics in the group precisely because they sit lower in the stack and avoid much of the technology-refresh cycle that compresses returns higher up. A related question, and an increasingly important one, is which of these models is best positioned to deliver better customer economics as the market shifts from securing capacity to optimizing token production.

The reason this takes on more weight now is that power is becoming the strategic asset that is hardest to replicate on a useful timeline. Switchgear can take a year to procure. Transformers can take more than two. High-voltage substations often run three to five years. ERCOT has reported 137 new large-load submissions totaling roughly 140,000 megawatts of potential demand by 2036. That number is important not simply because it is large, but because it shows how quickly grid demand is compounding against infrastructure that remains slow to permit, slow to procure, and slow to energize. The companies that secured power early, whether through owned sites, brownfield conversions, or interconnection rights, have a strategic advantage that cannot be recreated quickly with capital alone.
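Why capital alone cannot compress the timeline can be shown with simple critical-path logic. A minimal sketch, assuming all long-lead equipment is ordered on the same day and procured in parallel; the function name is hypothetical, and the lead times are the low ends of the ranges cited above, in months.

```python
def energization_gate(lead_times_months: dict[str, float]) -> tuple[str, float]:
    """Return the item that gates energization and its lead time, assuming all
    long-lead equipment is ordered on the same day and procured in parallel."""
    if not lead_times_months:
        raise ValueError("need at least one lead time")
    gating_item = max(lead_times_months, key=lead_times_months.get)
    return gating_item, lead_times_months[gating_item]


# Low ends of the cited ranges: switchgear ~1 year, transformers 2+ years,
# high-voltage substations 3 to 5 years.
site = {"switchgear": 12, "transformers": 24, "substation": 36}
print(energization_gate(site))

# Paying to expedite a non-gating item does not move the energization date.
expedited = {**site, "switchgear": 6}
print(energization_gate(expedited) == energization_gate(site))
```

Sites that already hold power rights have effectively taken the longest item off their critical path, which is the advantage the paragraph describes.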

At the same time, stakeholders need to keep sight of who is writing the largest contracts. The same hyperscalers outsourcing capacity today are also the entities most capable of internalizing that demand over time. Microsoft represented 67% of CoreWeave’s 2025 revenue, up from 62% the year before. That is a sign of strong current demand, but it is also a reminder that concentration and renewal dynamics will matter much more once hyperscalers bring more of their own capacity online. That is the real debate from here. Demand is clearly real. The harder question is which business model can convert that demand into durable returns once power, not GPUs, becomes the primary constraint and once the largest customers have more internal options.

The full report explores each of these models in depth. It is available to paid subscribers below.

What the full report covers:

The three-business-model framework: why full-stack AI platforms, bare-metal GPU clouds, and Power Landlords should not be valued on the same basis
Revenue per megawatt by company: triangulated estimates for CoreWeave, Nebius, IREN, Core Scientific, Cipher, and TeraWulf, with margin and capital-intensity context
A near-term deliverability ranking: which companies can actually energize capacity versus which are still in permitting queues
The software-yield test: what separates a defensible platform from scarce-capacity rental, and who passes
Hardware refresh and the duration mismatch: why six-year depreciation schedules on three-to-four-year economic lives create a hidden fault line
The Power Landlord case: contract structures, NOI margins, tenant credit quality, and why lower in the stack may mean cleaner on duration
The hyperscaler paradox: how the same customers funding the buildout could compress the opportunity, and what contract structures protect against it
Valuation framework: current market snapshot plus street-derived analytical frameworks, with the three traps investors should avoid
What changes the view: the specific variables to monitor and how they affect each business model differently
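The duration mismatch in the list above reduces to straight-line arithmetic. A minimal sketch, assuming zero salvage value and a normalized hardware cost of 100; the function names are hypothetical. On a six-year schedule, half the original cost is still on the books at year three and a third of it at year four.

```python
def book_value(cost: float, schedule_years: float, age_years: float) -> float:
    """Net book value under straight-line depreciation to zero salvage value."""
    age = min(max(age_years, 0.0), schedule_years)
    return cost * (1 - age / schedule_years)


def duration_mismatch(cost: float, schedule_years: float, economic_life_years: float) -> dict[str, float]:
    """Book value still carried when the hardware stops being economically
    competitive, plus the annual expense under each life assumption."""
    return {
        "stranded_book_value": book_value(cost, schedule_years, economic_life_years),
        "annual_expense_booked": cost / schedule_years,
        "annual_expense_economic": cost / economic_life_years,
    }


for life in (3, 4):
    print(life, duration_mismatch(cost=100.0, schedule_years=6, economic_life_years=life))
```

The gap between the booked and economic expense lines is the earnings flattery that a longer schedule creates until the stranded value has to be recognized.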

Liquid Cooling: The Thermal Prerequisite for AI Infrastructure Scale

Ben Bajarin — Thu, 09 Apr 2026 14:35:48 GMT

Prerequisite: 800 VDC: The Inflection Point Reshaping Datacenter Power and AI Infrastructure (March 2026). This report builds directly on that analysis and assumes familiarity with the power architecture transition it describes.

The Thesis

Our view is that liquid cooling has become the latest in a long line of deployment gating factors for next-generation AI compute. The GPU power roadmap is the biggest driver of this shift. For two decades, server racks ran at 5 to 20 kilowatts, and air conditioning handled the heat without difficulty. A single GB200 NVL72 rack runs at roughly 120 kilowatts, six to twenty-four times that historical baseline and well past the roughly 40 to 50 kilowatt threshold where air cooling becomes increasingly uneconomic without hybrid or liquid assist. The current GPU generation already operates in territory where liquid is the only viable primary thermal architecture. Vera Rubin pushes further in a fully liquid-cooled, fanless tray design. Our supply-chain checks suggest subsequent platforms push rack power considerably higher still. The gap between GPU thermal requirements and air-cooling capacity widens with every generation.
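A sensible-heat balance shows why the thermal architecture changes at these densities. A rough sketch, assuming essentially all rack power ends up as heat, typical room-temperature fluid properties, and hypothetical coolant temperature rises of 15 K for air and 10 K for water; real designs vary on every one of these inputs.

```python
# Approximate fluid properties near room temperature (assumed constants).
AIR_DENSITY_KG_M3 = 1.2
AIR_CP_J_PER_KG_K = 1005.0
WATER_DENSITY_KG_M3 = 997.0
WATER_CP_J_PER_KG_K = 4186.0
M3_PER_S_TO_CFM = 2118.88  # 1 m^3/s expressed in cubic feet per minute


def required_flow_m3_per_s(heat_w: float, density: float, cp: float, delta_t_k: float) -> float:
    """Solve the sensible-heat balance Q = rho * V_dot * cp * dT for V_dot."""
    return heat_w / (density * cp * delta_t_k)


rack_heat_w = 120_000  # GB200 NVL72-class rack power cited above
air = required_flow_m3_per_s(rack_heat_w, AIR_DENSITY_KG_M3, AIR_CP_J_PER_KG_K, delta_t_k=15)
water = required_flow_m3_per_s(rack_heat_w, WATER_DENSITY_KG_M3, WATER_CP_J_PER_KG_K, delta_t_k=10)
print(f"air:   {air:.2f} m^3/s (~{air * M3_PER_S_TO_CFM:,.0f} CFM)")
print(f"water: {water * 1000:.2f} L/s")
print(f"volume ratio: ~{air / water:,.0f}x")
```

Because water carries far more heat per unit volume, the volumetric flow it needs is on the order of thousands of times smaller than air for the same rack, which is the physical basis for the shift described above.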

Most of the market is still treating liquid cooling as a component upgrade, and we think that framing understates the actual constraint. Liquid cooling is the thermal gating factor on how much next-generation compute gets deployed. If the cooling infrastructure is not ready, the GPUs do not get installed. That single dependency changes how the TAM should be sized, how the supply chain should be valued, and which companies hold positions that matter for earnings durability.

What We Found

The economics of liquid cooling at the rack level have shifted in ways that most models have not absorbed. We estimate that cooling content per GB200-class rack now exceeds power content per rack by a factor of roughly two to three, reversing the historical relationship where power infrastructure was the dominant cost line in datacenter builds. The subscriber section breaks down the building blocks of that estimate, from tray-level cold plate content through shared CDU and plumbing infrastructure, and reconciles our full-stack TAM framework against industry equipment forecasts from firms like Dell’Oro. The TAM story is larger than most investors expect, but only when you define the scope correctly.

Beyond the volume ramp, three structural shifts are reshaping the competitive landscape simultaneously. The CDU architecture is moving from sidecar to in-row configurations, changing the plumbing topology of the datacenter and which suppliers benefit. A new technology layer, micro-channel lids, is emerging at the chip-package level and creating a competitive surface that did not exist twelve months ago. And a consolidation wave exceeding $15 billion in disclosed and estimated deal value has swept through the sector as legacy HVAC and electrical infrastructure companies acquire liquid cooling capabilities they cannot build organically in the time the market requires. We detail each of these shifts, the companies positioned at the center of them, and the specific milestones that would change our view in the subscriber section.

The Power-Cooling Convergence

The liquid cooling transition and the 800 VDC transition we analyzed in Part 1 are co-dependent. More efficient power delivery reduces the waste heat that cooling systems must handle. Denser liquid-cooled racks justify the capital investment in high-voltage DC distribution. The AI datacenter of the future is being co-designed around both systems as a single integrated infrastructure, and the companies that can deliver across both power and cooling hold a structural advantage over point-product suppliers. The subscriber section quantifies that convergence and identifies which companies span both layers.

What subscribers get in the full report:

Cooling technology comparison. Air, rear-door heat exchangers, direct-to-chip, and immersion rated across six dimensions including viable rack density, PUE, and GPU roadmap alignment
GPU-roadmap adoption timeline. Three-phase framework from GB200 through Vera Rubin through Rubin Ultra, with density thresholds, CDU architecture shifts, and MCL adoption milestones
Facility-level TCO analysis. Capex per megawatt, operating cost savings, payback periods, and a full rack-level content bridge breaking out tray content, shared CDU/plumbing allocation, and power versus cooling spend per rack
Layered TAM waterfall. Rack-level versus full-stack sizing from 2024 through 2030, with explicit scope notes reconciling our estimates against Dell’Oro and other industry forecasts
Full-stack company scorecard. Nine companies across three tiers (component specialists, system suppliers, infrastructure integrators), with share estimates, margin profiles, key risks, and the specific conditions that would drive estimate revisions for each name
M&A tracker. Eight deals totaling $15 billion+ in disclosed and estimated value, with deal structures, strategic logic, and remaining private acquisition targets
Catalyst tracker. Eight specific milestones we are monitoring, from Vera Rubin volume ramp and MCL qualification timing to CDU lead-time normalization and first major brownfield conversions
Risk framework. Four scenarios that would change our view, from cold plate commoditization and CDU supply bottlenecks to immersion cooling disruption and operational reliability failures, each with specific trigger conditions
800 VDC convergence analysis. How the liquid cooling transition co-depends on the power architecture shift we covered in Part 1, and which companies hold positions that span both layers

Storage Wars: When Memory and Storage Collapse Into One Layer

Ben Bajarin — Tue, 07 Apr 2026 15:07:47 GMT

This report is a companion to our memory report found here:

The market still tends to frame AI infrastructure demand through compute. In that view, storage is a downstream beneficiary. More accelerators ship, more storage gets attached. We think that framing is becoming less useful as inference architectures evolve. Agentic workloads are shifting a larger share of the system burden into orchestration, memory movement, and state management, which means the pressure point increasingly sits in IO and memory coordination rather than raw compute alone. Industry estimates that point to a four-fold increase in CPU core requirements in a fully agentic 1GW datacenter matter less as a precise forecast than as a signal of where the architecture is tightening. More of the system is being consumed by coordination. That is why storage IO is moving toward the center of the inference stack.

The clearest way to see that shift is at the level of a single session. A 128K-context interaction on a 70-billion-parameter model can require roughly 167GB of KV cache for one inference sequence. That figure is useful because it breaks the conventional intuition around where the bottleneck should sit. Context memory can outrun premium memory capacity before the system runs out of compute. Once that happens, overflow is no longer an edge case. It becomes an architectural requirement. At hyperscale, where thousands of these sessions are served concurrently, the industry is being pushed toward a tiered hierarchy with HBM handling active cache, DRAM absorbing overflow, and NVMe flash serving persistence and extended context. Flash is moving into the inference loop because the memory system increasingly needs it there.
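The KV cache arithmetic and the tiered spill can be sketched together. A minimal illustration, assuming a hypothetical 70B-class configuration of 80 layers, 32 KV heads of dimension 128, FP16 values, and 128,000 tokens, which lands near the ~167GB figure above. Models that use grouped-query attention keep fewer KV heads and need proportionally less, and treating "128K" as 131,072 tokens raises the total by about 2.4 percent. The per-session tier budgets are also hypothetical.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int, bytes_per_elem: int, tokens: int) -> int:
    """Keys and values (hence the factor of 2) for every layer, KV head, and token."""
    return 2 * layers * kv_heads * head_dim * bytes_per_elem * tokens


def place_in_tiers(demand_bytes: int, tiers: list[tuple[str, int]]) -> dict[str, int]:
    """Fill tiers fastest-first; whatever does not fit spills to the next tier."""
    placement, remaining = {}, demand_bytes
    for name, capacity in tiers:
        placed = min(remaining, capacity)
        placement[name] = placed
        remaining -= placed
    if remaining:
        raise MemoryError(f"{remaining} bytes did not fit in any tier")
    return placement


GB = 10**9
session = kv_cache_bytes(layers=80, kv_heads=32, head_dim=128, bytes_per_elem=2, tokens=128_000)
print(f"KV cache per session: {session / GB:.1f} GB")

# Hypothetical per-session capacity budgets, fastest tier first.
tiers = [("HBM", 40 * GB), ("DRAM", 100 * GB), ("NVMe", 1_000 * GB)]
print({name: round(b / GB, 1) for name, b in place_in_tiers(session, tiers).items()})
```

A greedy fastest-first fill is the simplest possible policy; production systems also weigh which blocks are hot, but the sketch shows how a single long session can reach flash.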

That same logic is now reshaping the demand base for NAND. For most of flash history, the defining end market was the smartphone. In 2026, datacenter applications are positioned to consume more than half of global NAND output. The significance of that crossover is not simply that demand is rising. It is that the center of gravity in flash is moving toward infrastructure workloads at the same time supplier inventories have fallen to roughly one to two weeks, the leanest level since 2018 by our work, and fulfillment rates for some OEM customers have dropped as low as 20 percent. The enterprise SSD market is projected to reach $51.4 billion by 2027, while fabricators continue reallocating cleanroom space from NAND toward DRAM and HBM. That combination points to a tighter supply environment, firmer pricing, and a market structure increasingly influenced by datacenter requirements rather than consumer replacement cycles.

The physical architecture is changing in parallel. A standard CPU rack historically operated in the 5 to 10 kilowatt range. NVIDIA’s next-generation Vera Rubin NVL72 system pairs compute with an STX storage cabinet that includes 1,152TB of SSD NAND, while rack-scale power moves toward the megawatt range. The important point is not simply that rack density is rising sharply, though it is. It is that storage is being designed directly into the compute system (co-optimized) as a structural layer of the platform. NVIDIA’s SCADA architecture, which provides direct PCIe connections from GPUs to NVMe SSDs without routing that traffic through the CPU, reflects the same design logic. At these densities and throughput requirements, storage placement becomes part of system architecture rather than a downstream deployment choice.

Where this becomes more complicated for stakeholders is in value capture. NVIDIA is pulling the hot storage tier into its own platform architecture through STX, while hyperscalers are expanding proprietary infrastructure in parallel. That leaves the storage opportunity more uneven than the usual assumption that rising AI demand lifts all boats equally. The shift itself looks increasingly clear. The harder question is who captures the economics as storage moves closer to the center of the inference stack. Our full report lays out a five-layer framework for thinking about value capture across that hierarchy, where merchant vendors remain exposed, where captive platforms gain leverage, and which signals would cause us to revise that view.

What the Full Report Covers

The KV cache mechanism in detail, including the specific memory hierarchy architecture in Vera Rubin NVL72 and how SCADA bypasses traditional I/O bottlenecks
NAND supply-demand sizing with bear/base/bull scenario analysis, including our base-case estimate of incremental ICMSP-related NAND demand in 2026, reconciled against the memory report
A five-layer value-capture framework mapping NAND bit supply, SSD controllers, system/software, platform/HCI, and captive/integrated layers with merchant exposure and captive risk assessments for each
The merchant-versus-captive analysis: our assessment of how much incremental storage spend remains available to public-market vendors versus being absorbed into NVIDIA’s STX stack or hyperscaler proprietary fabrics
Developed investment views on Pure Storage and Micron, including financial detail, technical differentiation, margin trajectories, and specific risk factors
Private company tracking on VAST Data, Weka, and Hammerspace as potential public-market or acquisition candidates
A quarterly tracking scoreboard with six indicators and explicit bullish/bearish signals for monitoring the thesis over the next two to three earnings cycles
Four specific falsifiers that would cause us to materially revise the thesis, including CXL displacement risk, hyperscaler storage internalization, NAND supply overshoot, and AI monetization slowdown
Timing analysis with explicit uncertainty bands on ICMSP adoption rates, STX commercialization, and the transition from current to next-generation deployment configurations

The Next Debate in Memory Is Duration, Not Demand

Ben Bajarin — Thu, 02 Apr 2026 15:25:36 GMT

We believe the debate in memory has evolved from whether AI has created a real super cycle (the market has largely accepted that point) to what kind of market memory is becoming, how long the current earnings window can last, and what the first real signals of normalization will look like. Our view is that the market structure has changed materially. Memory is behaving less like the commodity component category, governed by ordinary cyclical pricing, that it has historically been, and more like a constrained infrastructure layer inside the AI stack. We believe this is a key distinction, even if it describes the market as it is today rather than where it settles years from now, because it changes how investors should think about duration, through-cycle earnings power, and where the next pressure points and leverage emerge.

We outlined the first leg of the thesis in our memory report. AI servers absorb far more memory than traditional server deployments. HBM has absorbed a disproportionate share of wafer allocation and packaging capacity, and our supply-chain work suggests that dynamic becomes more pronounced in 2027 as a larger share of wafers shifts toward HBM production. Conventional DRAM and NAND have been left with less flexible supply just as inventories have fallen and demand has broadened. That alone was enough to create a meaningful upcycle. What stands out to us now is that the shortage has moved well beyond a narrow HBM story. Tightness has spread into conventional DRAM, NAND, and selected older nodes that many in the market had assumed would remain manageable. Once that happens, the discussion shifts from premium AI memory to the structure of the broader market.

Micron’s latest quarter strengthened that narrative. The company posted fiscal Q2 revenue of $23.9 billion, gross margin of 75%, and guided fiscal Q3 to roughly $33.5 billion of revenue, 81% gross margin, and $19.15 of EPS. Those are extraordinary numbers, but the more important part of the update sits structurally underneath them. Management reiterated that both DRAM and NAND remain constrained beyond 2026, and that customers are still receiving only roughly 50% to two-thirds of medium-term demand. That points to a pricing environment grounded in a real physical shortage of bits, with new supply still arriving too late to meaningfully relieve the market.

The contract discussion matters more now than it did earlier in the cycle. Multi-year agreements, tighter fulfillment, and a greater focus on assured supply all suggest buyer behavior is changing alongside pricing. We think that holds weight because it points to a memory market where allocation and mix control carry more power than they did in prior cycles.

Breadth is the second important shift. HBM remains the visible bottleneck, but it no longer explains the full earnings profile of the group. Conventional DRAM and NAND are contributing more meaningfully to the current setup, which broadens the reset across the stack and supports a wider earnings window than a narrow HBM view would imply.

The third piece is where the pressure shows up first outside the supplier group. Memory scarcity is increasingly showing up as a hardware constraint, with the most price-sensitive parts of the market sitting closer to the point where higher input costs begin to pressure margins and demand. That helps explain why supplier strength and downstream weakness can coexist for longer than many expect.

We also think the variables that could change the view are becoming clearer. The most important ones are contract structure, fulfillment, node mix, storage attach, and the pace at which architectural efficiency begins to matter in the outer-year model.

Our bottom line is that the memory story now needs to be understood through market structure as much as magnitude. Scarcity has broadened across the stack, customer behavior is changing, supply relief still looks late, and the next debate is about durability. For investors, the key question is how much of current memory earnings power belongs in the through-cycle base, and what evidence would signal that normalization is beginning. Our current view is that those signals still sit ahead of us.

What Subscribers Get in the Full Report

A full framework for why the memory market is shifting from ordinary shortage to allocation-led market structure, and why that matters for through-cycle earnings power.
A deeper DRAM section on why the price reset has moved beyond HBM into conventional DRAM and why that changes the stock setup.
A dedicated section on legacy nodes, including why DDR4, LPDDR4, and selected embedded categories have become strategically scarce again.
A more detailed HBM4 discussion focused on qualification timing, node mix, premiums, and how the HBM3E to HBM4 transition shapes blended economics.
A full NAND section on why enterprise SSD, inference context storage, and the broader AI memory hierarchy make NAND more important than the market has been discounting.
A downstream hardware section that separates the most price-inelastic buyers from the most elastic end markets and explains where the memory tax is most likely to break demand first.
A supply section on why rising capex does not translate into near-term relief, with a focus on cleanroom timing, migration constraints, and bit-growth limits.
A competitive read-through on Micron, Samsung, SK hynix, and the categories where stock dispersion is likely to widen.
A “what we are watching” section covering fulfillment rates, multi-year agreements, HBM4 mix, enterprise SSD adoption, and the efficiency debates that could alter duration.
Our bottom-line view on what the market is still discounting incorrectly about scarcity, breadth, and normalization.

800 VDC: The Inflection Point Reshaping Datacenter Power and AI Infrastructure

Ben Bajarin — Tue, 31 Mar 2026 15:23:02 GMT

There is a dynamic that is still underappreciated as we think about AI infrastructure: the future of datacenter compute topology is unsettled. A lot is changing each year, and we expect significant advances in compute capabilities at the CPU, GPU, and XPU levels as architectures evolve, chiplet designs diffuse, and we push further into the angstrom era. There are, however, a few constants. AI rack density will be the year-over-year story, in multiple dimensions, and it is beginning to force a real electrical re-architecture inside the datacenter. The legacy power delivery chain, built around AC distribution and low-voltage conversion for 10 to 20 kilowatt racks, was never designed for the 120 to 300-plus kilowatt densities now showing up in accelerated AI infrastructure. Our view is that 800-volt DC is the likely architectural destination, and NVIDIA’s platform roadmap is the factor most likely to determine the pace of that transition.

The real debate is whether the datacenter can keep absorbing more compute density without changing the underlying power architecture. The efficiency gains from 800 VDC are real, but they are secondary to the structural question of how much rack-level compute the electrical plant can physically support. At these rack power levels, the old system becomes physically and economically strained. Delivering 120 kilowatts at 48 volts requires roughly 2,500 amps. That drives oversized copper, more conversion stages, more waste heat, and more cooling burden. Moving to 800 volts cuts required current by roughly 16 times, reduces conversion losses, shrinks cabling and busbar burden, and improves the share of total site power that can be translated into useful AI work. We think the market is still underappreciating the real issue. It largely treats 800 VDC as a power-efficiency story, when the more important question is what it changes about rack-level architecture, utilization, and ultimately AI infrastructure economics. We think it is better understood as a compute-capacity story.
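The current figures above follow directly from I = P / V, and resistive loss scales with the square of current. A minimal sketch, assuming a simple DC feed and a hypothetical, identical conductor resistance at both voltages; in practice designers take much of the margin as smaller copper and busbars rather than as lower loss.

```python
def current_amps(power_w: float, voltage_v: float) -> float:
    """I = P / V for a DC feed."""
    return power_w / voltage_v


def conductor_loss_w(power_w: float, voltage_v: float, resistance_ohm: float) -> float:
    """Resistive (I^2 R) loss in a conductor path of fixed resistance."""
    return current_amps(power_w, voltage_v) ** 2 * resistance_ohm


rack_w = 120_000
for volts in (48, 800):
    print(f"{volts:>4} V -> {current_amps(rack_w, volts):,.0f} A")

# Same hypothetical conductor resistance at both voltages.
r = 0.0001  # ohms, illustrative only
ratio = conductor_loss_w(rack_w, 48, r) / conductor_loss_w(rack_w, 800, r)
print(f"I^2R loss ratio at equal resistance: ~{ratio:.0f}x")
```

Cutting current by roughly 16 times cuts resistive loss in the same conductor by that factor squared, which is why voltage, not just conversion efficiency, sets the copper and thermal budget at these rack densities.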

That distinction becomes more important as the GPU roadmap advances. NVIDIA has already been clear about how architectures are evolving, with legacy 54V architecture giving way to a path that points toward 800 VDC in Kyber-era AI factories. Our supply-chain work suggests the transition likely passes through mixed-voltage Rubin-era configurations before Rubin Ultra (and beyond) pushes the architecture into territory where 800 VDC becomes far less optional. We are careful to separate what is publicly confirmed from what comes from our own work, but the broader point is: future compute performance is increasingly constrained by the facility as much as the silicon. The ability to design more performance into the chip only translates into usable compute if the power and thermal envelope can support that performance at the intended rack density and utilization. That puts power delivery losses, thermal overhead, and facility-level efficiency much closer to the center of the AI infrastructure debate. In other words, the power plant is moving into the critical path of the compute roadmap.

That has implications well beyond utility savings, even though the economics help. If more of a site’s megawatts can be converted into productive compute rather than lost in conversion and cooling, then electrical architecture begins to influence realized throughput, rack density, floor-space efficiency, and deployment speed. The future AI datacenter is likely to be more power-defined than prior generations. We expect more sidecar deployments, more dedicated high-density AI zones, and a clearer separation between AI-native capacity and general-purpose datacenter capacity. Over time, we think this creates a bifurcation in facility design: one class of datacenter optimized around the power and thermal demands of next-generation AI clusters, and another that remains suited to lower-density enterprise and mixed workloads. That is an architectural shift, and it should change how the industry evaluates datacenter capacity from here.

The investment implication is that this transition restructures the bill of materials for datacenter power. It expands the role of SiC and GaN power semiconductors, high-voltage busbars and connectors, rack-level power conversion, and DC-native infrastructure, while reducing the relevance of parts of the legacy low-voltage and multi-stage AC stack. That makes the content escalation material, and it raises the bar for the economic and compute performance benefits we outline below. We estimate power component content per rack increasing roughly 4x from GB200 to Vera Rubin and approximately 11.5x from GB200 to Rubin Ultra, spanning three GPU generations. When content per rack moves by more than 10 times over two generational transitions, the market structure changes with it. That is where we think the opportunity becomes more interesting. The question is which suppliers gain share in a redesigned electrical architecture and which ones lose content as the stack changes underneath them. The AI capex tailwind is real, but the value distribution shifts materially when the underlying power delivery platform restructures.
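
The two multiples above imply uneven step-ups between generations. The small sketch below uses only the quoted ratios, with GB200 normalized to 1.0 and no dollar values. It shows the implied Vera Rubin to Rubin Ultra step and the geometric-average step across the two transitions.

```python
# Power-component content per rack, normalized to GB200 = 1.0 (ratios from the text).
content_vs_gb200 = {"GB200": 1.0, "Vera Rubin": 4.0, "Rubin Ultra": 11.5}

first_step = content_vs_gb200["Vera Rubin"] / content_vs_gb200["GB200"]        # 4.0x
second_step = content_vs_gb200["Rubin Ultra"] / content_vs_gb200["Vera Rubin"]  # ~2.9x

# Geometric mean per transition: the constant step that compounds to 11.5x over two steps.
avg_step = (content_vs_gb200["Rubin Ultra"] / content_vs_gb200["GB200"]) ** 0.5  # ~3.4x

print(f"GB200 -> Vera Rubin: {first_step:.1f}x | Vera Rubin -> Rubin Ultra: {second_step:.2f}x | "
      f"average per step: {avg_step:.2f}x")
```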

The timing also looks more front-loaded than many models imply. What we are hearing from the supply chain is that the transition is already underway in targeted AI clusters at the largest operators. Power semiconductor suppliers are qualifying for higher-voltage datacenter applications, PSU vendors are ramping new product lines, and connector vendors are already working through qualification tied to next-generation rack builds. Public datapoints are beginning to reflect the same trend. Infineon is guiding to a sharp ramp in AI-datacenter power revenue, and Vertiv’s backlog and order trends point to a market where infrastructure demand is already tightening. We continue to think the market is earlier in recognizing the earnings implications for the direct beneficiaries than it is in recognizing the AI compute build itself.

Where we still want more evidence is around pace and breadth. We do not know yet how quickly 800 VDC moves from hyperscaler AI islands to broader colocation and retrofit adoption. We are also watching whether 380/400V DC remains good enough for longer than expected, whether SiC cost curves and yields move fast enough to support broader rollout, and when NVIDIA’s future rack requirements are confirmed in a way that removes any remaining ambiguity for operators. Those are the swing variables. But the core point is already obvious: the next phase of AI infrastructure is going to be constrained as much by how effectively operators can convert megawatts into deployed compute as by how many accelerators they can buy. That puts 800 VDC much closer to the center of the investment debate than most investors currently treat it.

The subscriber deep dive that follows covers the full technical, economic, and investment case. It includes:

An architecture comparison of 800 VDC versus legacy AC, enhanced 48V, and 380/400V DC, covering efficiency, current burden, copper cost, and ecosystem maturity
The three-phase adoption timeline tied to NVIDIA’s GPU roadmap, from hyperscaler pilots through broad colocation adoption, with the specific forcing function at each stage
Payback economics at 100 MW and 1 GW scale, including incremental system cost, energy savings, cooling capex avoidance, and floor space recapture
A TAM waterfall reconciling $30 to 45 billion in total datacenter power, AI-specific component TAM, serviceable retrofit spend (haircut from 42 GW to 8 to 15 GW), and SiC/GaN opportunity
A consensus-gap earnings bridge for five key names, showing where we think sell-side models are underestimating the 800 VDC revenue ramp through 2028
A tiered company scorecard ranking 10 beneficiaries across power semiconductors, server PSUs, datacenter infrastructure, and connectors, with directness of exposure, timing, qualification status, and key risks
A company-specific catalyst tracker with 11 milestones we are monitoring, from NVIDIA architecture disclosures to OCP specification finalization to individual company product launches
An analysis of displaced layers and strategic losers, including legacy PDU manufacturers, low-voltage cabling suppliers, silicon MOSFET/IGBT vendors, and AC-mode UPS providers
Our risk framework covering four scenarios that would change our view, each framed with the specific conditions and thresholds we are watching

TSMC COUPE: Why the CoWoS Pattern Is Repeating in Silicon Photonics

Ben Bajarin — Thu, 26 Mar 2026 15:13:13 GMT

Much of the current narrative around the optical market is, understandably, spent debating when co-packaged optics scales and which active optics vendors benefit first. We think the more important question is who controls the manufacturing platform that the transition depends on. That distinction matters because the connectivity stack is already moving from a secondary infrastructure category toward a first-order AI systems constraint. In our broader work, we have framed connectivity as a market expanding from roughly $25 billion to $35 billion today toward $67 billion to $96 billion by the end of the decade, with high-end transceiver demand accelerating sharply alongside cluster scale. In that context, the foundry that controls the optical manufacturing layer matters more than most investors currently appreciate.

Given multiple discussions with folks at NVIDIA, Broadcom, and Ayar Labs, we felt the foundry discussion around optics was timely, especially around TSMC, which counts all three of those companies as customers. TSMC is increasingly positioned to occupy the same role in silicon photonics that it came to occupy in advanced AI packaging. COUPE (Compact Universal Photonics Engine) is a manufacturing platform that combines leading-edge electronic logic, a mature photonics process, and advanced packaging integration in a way competitors cannot yet match at system level. Today, that means a 6nm electronic IC, a 65nm SOI photonic IC, hybrid bonding through SoIC-X, and then a roadmap into CoWoS that lets optical engines move from board-level and substrate-level integration toward eventual interposer-level integration next to the switch ASIC or XPU. TSMC is the only foundry that currently brings all of those pieces together in one stack and under one roof.

TSMC has indicated COUPE can lower power consumption by 40% at the same speed versus conventional micro-bump approaches, or deliver a 170% speed gain at the same power envelope. More broadly, the optical roadmap points to a step down in energy per bit from more than 30 pJ/bit for conventional copper toward under 5 pJ/bit for co-packaged optics on substrate and ultimately below 2 pJ/bit as optical I/O moves onto the interposer. In our Copper to Fiber report, we detail the economics of power savings for optical. It is system-level efficiency gains of this kind that change packaging economics, cluster power budgets, and the competitive position of the foundry enabling them.
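
Energy per bit converts to power once a bandwidth is fixed. Watts equal picojoules per bit times terabits per second, because 1 pJ/bit at 1 Tb/s is 1 W. The sketch below treats the quoted bounds (30, 5, and 2 pJ/bit) as point values and assumes a hypothetical 100 Tb/s of aggregate I/O. That bandwidth is illustrative only and is not a figure from TSMC or from our Copper to Fiber report.

```python
# Interconnect power = energy per bit x bit rate.
# Units: 1 pJ/bit x 1 Tb/s = 1e-12 J/bit x 1e12 bit/s = 1 W, so watts = pJ/bit * Tb/s.

def io_power_watts(pj_per_bit: float, tbps: float) -> float:
    """Power in watts to move tbps terabits per second at pj_per_bit picojoules per bit."""
    return pj_per_bit * tbps


AGGREGATE_TBPS = 100.0  # hypothetical aggregate I/O bandwidth, for illustration only

ENERGY_PER_BIT = {
    "conventional copper (>30 pJ/bit)": 30.0,
    "CPO on substrate (<5 pJ/bit)": 5.0,
    "optical I/O on interposer (<2 pJ/bit)": 2.0,
}

for label, pj in ENERGY_PER_BIT.items():
    print(f"{label}: {io_power_watts(pj, AGGREGATE_TBPS):,.0f} W at {AGGREGATE_TBPS:.0f} Tb/s")
```

On these point values, moving the same bandwidth from copper to interposer-level optical I/O cuts interconnect power by 15x.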

The near-term reality, however, is that this is still an early market. CPO is not yet the cheapest answer. CPO solutions are at least 10% more expensive than comparable optical transceivers, and NVIDIA’s current Quantum-X800 implementation shows why. The switch integrates 72 optical engines and 18 external light sources, and the optical engine alone represents roughly 44% to 45% of the CPO bill of materials. That is a useful reminder that the value is already moving toward semiconductor process content even before CPO becomes the dominant form factor. In other words, the market is paying more upfront for a design that improves power, density, and future scaling.

What makes the foundry discussion interesting is that TSMC is not the only credible player, just the one with the strongest end-state position (with two of the market-leading customers in NVIDIA and Broadcom). GlobalFoundries deserves credit for an earlier start, a real installed base, and a legitimate monolithic integration story through Fotonix. It expects silicon photonics revenue to exceed $1 billion by 2030, and its AMF acquisition is expected to bring more than $75 million of revenue in 2026. Tower is also a key player. The company has meaningful PIC traction, over 50 active silicon photonics customers, a fresh NVIDIA-related 1.6T module engagement, and reported $228 million of silicon photonics revenue in 2025, more than double the prior year. Both companies remain important to the optical value chain. But the comparison shifts once the market moves from standalone optical engines, pluggables, and NPO toward fully integrated CPO. Neither GlobalFoundries nor Tower has TSMC’s combination of fine-node EIC capability and CoWoS-class packaging integration, and that becomes the structural dividing line as optical moves closer to the processor. From our supply chain discussions, whether the customer uses GlobalFoundries or Tower Semi, all roads will still lead back to TSMC, and the attraction of having ALL of your semiconductor manufacturing under one roof is why we believe TSMC will be best positioned over the long haul.

That is why we think the right way to model this is not through a standalone “CPO revenue line” for TSMC. The direct revenue remains small relative to TSMC’s size for now. The value shows up more indirectly through better advanced packaging utilization, a richer wafer mix across EIC and PIC production, and, most important, strategic leverage. Once customers design optical engines, packaging flows, and qualification cycles around the TSMC stack, the dependency compounds. That is the same pattern investors saw with CoWoS before the financial impact became obvious in reported numbers. Our argument is that silicon photonics is beginning to follow a similar path.

What’s in the full report

A detailed breakdown of how COUPE works, why heterogeneous integration is winning, and why node disparity between EIC and PIC is structural rather than temporary.
A fair comparison of TSMC, GlobalFoundries, and Tower Semi, including where each still matters in pluggables, PICs, NPO, and CPO.
A buy-side framework for underwriting the thesis in the model, including where the economics show up even before direct CPO revenue becomes material.
A 12–18 month monitoring checklist covering qualification milestones, substrate-to-interposer migration, and the proof points that would strengthen or weaken the thesis.
The real bear case: why the bigger risk may be a longer bridge through NPO and detachable optical architectures, rather than outright displacement of TSMC.
Ecosystem evidence from NVIDIA, Broadcom, and Ayar Labs showing where product gravity is already forming in the optical stack.

Secret Agent CPU

Ben Bajarin — Tue, 24 Mar 2026 18:32:21 GMT

The Thesis in 60 Seconds

We believe the shift from monolithic LLM inference to multi-step agentic workflows structurally changes the compute mix inside datacenters. Training-era architectures assumed GPUs would dominate every phase of inference. Agentic workloads have challenged that assumption. When an agent calls a tool, queries a database, waits for human approval, or orchestrates sub-agents, the GPU sits idle while the CPU does the work. Our model estimates put that idle window at ~12 to ~22 percent of total inference time, and it scales with agent complexity. We characterize today’s datacenter CPUs as “cloud native,” meaning built to run cloud and web software. We believe a new class of CPU, one that is agent native, will grow the CPU market more than most forecasts assume. We position agent-native CPUs as a dedicated CPU tier, architected specifically for agentic workflows and deployed alongside the GPU cluster. That tier recovers the idle capacity and, at modest allocations, improves both throughput and cost per token.
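
The idle window matters because a GPU's cost accrues whether or not it is working. If that holds, an idle fraction f inflates the cost of each productive GPU-hour by 1/(1 - f). The sketch applies that identity to the ~12 to ~22 percent range above. It is simple arithmetic under that assumption and does not reproduce our throughput model.

```python
# Cost inflation from idle GPU time, assuming GPU cost accrues whether the GPU is busy or idle.
# Effective cost per productive GPU-hour = cost per GPU-hour / (1 - idle_fraction).

def productive_cost_multiplier(idle_fraction: float) -> float:
    """Multiplier on cost per productive GPU-hour for a given idle fraction (0 <= f < 1)."""
    if not 0.0 <= idle_fraction < 1.0:
        raise ValueError("idle_fraction must be in [0, 1)")
    return 1.0 / (1.0 - idle_fraction)


for idle in (0.12, 0.22):  # idle window range cited in the text
    print(f"{idle:.0%} idle -> {productive_cost_multiplier(idle):.2f}x cost per productive GPU-hour")
```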

Early estimates put a CPU rack at roughly $300K fully loaded in a base configuration ($500K in our most bullish modeling). A GPU rack costs roughly $4M or more. GPU-to-CPU rack power draw runs about 7 to 1. At a 5 percent CPU power allocation in a 1GW facility, our model shows a ~2 percent increase in effective token throughput and a ~3.7 percent decrease in cost per token, with only a ~1.7 percent increase in total capex. The breakeven sits at roughly 10 percent allocation, beyond which token throughput declines faster than cost savings accrue. The sweet spot is narrow but real, and it scales with facility size.
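
The rack figures alone show part of why a small CPU tier is cheap to add. The minimal sketch below compares capex per watt. It assumes rack-level hardware cost only (no facility, networking, or power opex) and reads the 7 to 1 figure as GPU-rack power relative to CPU-rack power. It does not reproduce our 1GW throughput and cost-per-token model.

```python
# Relative capex per watt of a CPU rack versus a GPU rack, from the rack figures in the text.
# Assumptions: rack hardware cost only; GPU rack draws ~7x the power of a CPU rack.

GPU_RACK_COST_USD = 4_000_000
CPU_RACK_COST_USD = {"base": 300_000, "bullish": 500_000}
GPU_TO_CPU_POWER_RATIO = 7.0


def cpu_capex_per_watt_vs_gpu(cpu_cost: float, gpu_cost: float, power_ratio: float) -> float:
    """CPU-rack $/W divided by GPU-rack $/W, where power_ratio = GPU rack power / CPU rack power."""
    cost_share = cpu_cost / gpu_cost   # CPU rack cost as a fraction of a GPU rack
    power_share = 1.0 / power_ratio    # CPU rack power as a fraction of a GPU rack
    return cost_share / power_share


for case, cpu_cost in CPU_RACK_COST_USD.items():
    rel = cpu_capex_per_watt_vs_gpu(cpu_cost, GPU_RACK_COST_USD, GPU_TO_CPU_POWER_RATIO)
    print(f"{case}: CPU rack capex per watt = {rel:.2f}x GPU rack capex per watt")
```

On these figures, a watt of CPU rack costs roughly half to seven-eighths as much as a watt of GPU rack. Shifting a few percent of facility power into CPUs therefore moves total capex only modestly.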

The Context

How fast things change in AI land. A year ago, the prevailing thesis would have been that data center CPUs were being commoditized while the GPU absorbed the vast majority of the value. GPU/XPU racks will still command the largest dollar share, but the relevance of the CPU has come full circle as the compute layer shifts to agentic workloads. As inference evolves from prompt-response interactions toward agentic systems that retrieve information, call tools, manage state, take actions, and coordinate multi-step workflows, the bottleneck begins to move. In that environment, the limiting factor is less often raw model throughput in isolation and more often the system’s ability to orchestrate work around the model. That has direct implications for how future AI datacenters should be architected and, by extension, how stakeholders should think about where incremental infrastructure dollars will be allocated.

Our view is that training-era CPU-to-GPU assumptions are being challenged by agentic inference. The older framework treated the CPU largely as a control layer attached to a GPU-heavy architecture, essentially just a basic head-node CPU. That may have been sufficient when the dominant job was training large models or serving relatively simple, turn-based LLM inference requests. That framework comes under strain once the workload shifts to reasoning models. In agentic environments, each model step can trigger retrieval, reranking, serialization, memory lookup, policy checks, browser or application interaction, logging, compliance handling, computer use, and workflow management. All of that consumes CPU cycles, memory capacity, and system-level coordination, often at a rate that is out of proportion to how the market currently thinks about CPU as a percentage of AI infrastructure.

Agentic workloads introduce latency and utilization penalties that do not show up in conventional compute framing. GPUs are highly efficient when fed continuously. They become materially less efficient when they are forced into intermittent work patterns because the orchestration layer cannot keep up. Said differently, if the GPU is waiting on the CPU tier to prepare the next step, the most expensive part of the cluster is underutilized because the cheaper part of the cluster is underprovisioned. That is a poor trade even before we get to the question of power and capex allocation. We think this is where much of the current market framework is still lagging the workload transition.

This is also why we believe the CPU side of AI infrastructure is being underappreciated in both architecture planning and market modeling. This is not a thesis that GPU demand weakens. In fact, the opposite is likely true. A better-provisioned CPU orchestration layer can improve effective GPU utilization and increase the economic output of the GPU fleet already being deployed. That is an important distinction in a quickly changing market. The CPU does not compete with the GPU in this framework. It raises the return on the GPU asset by reducing idle time and smoothing workflow execution. In practical terms, that means AI infrastructure can become more CPU-intensive even as the center of gravity remains decisively GPU-led.

CPU Architectures: Cloud Native and Agent Native

Our work suggests that this ratio shift is fundamentally underappreciated. The training-era world looked closer to a 1:4 CPU-to-GPU/XPU resource relationship in AI deployments. Agentic inference moves toward 1:1, and in some forms of enterprise workflow automation, can push the requirement higher still. That should not be confused with a claim that power or capex allocation becomes balanced in the same way. CPU server-equivalents consume a fraction of the power and cost of GPU racks. That is precisely why the economics matter when measured on a true TCO basis. In our datacenter modeling, even a relatively modest dedicated CPU tier can improve overall system economics by lifting effective throughput and reducing cost per unit of useful work. We find it useful to separate this CPU demand into two layers: cloud-native CPUs, meaning the existing fleet that is being repurposed or reconfigured to handle agentic orchestration tasks, and agent-native CPUs, meaning greenfield racks purpose-built for agentic workloads from the ground up. NVIDIA's progression from Grace to Vera dedicated CPU racks, and now Arm's AGI CPU with 136 cores at 300 watts, a dedicated core per thread, and rack densities above 45,000 cores in liquid-cooled configurations, represent a new class of agent-native silicon purpose-built for orchestration-heavy workloads rather than adapted from existing cloud infrastructure. The market, in our view, still tends to assume that the best answer is to push nearly every available watt into GPUs. That may remain directionally right for training clusters. It is unlikely to be right for agentic serving environments where coordination overhead is a critical part of the workload.

While we remain convinced the datacenter CPU TAM will experience growth not seen in years, questions remain. The cloud-native installed base will absorb some of this demand, which is why our model applies a shared infrastructure discount that narrows over time as agent-native deployments scale. The exact pace of this transition will vary by workload mix, model architecture, enterprise adoption patterns, specific customer needs, and how quickly AI systems move from answer engines to execution engines. Not every inference workload is equally agentic, and not every agentic workload is equally CPU-constrained. We also need to keep monitoring what hyperscalers actually optimize for in production, because architecture decisions in the field have a way of clarifying debates faster than conference slides do. What would change our view is evidence that orchestration overhead compresses materially as model and serving stacks mature, or that agentic software patterns prove narrower than current adoption signals suggest. For now, what stands out to us is that the software stack is moving in the opposite direction. It is becoming more stateful, more tool-heavy, and more operational.

The market broadly understands the GPU buildout. We are less convinced it fully appreciates the CPU infrastructure required to make agentic AI economically work at scale. As AI shifts from generating answers to completing work, the orchestration layer becomes more consequential, and the infrastructure conversation broadens with it. That does not reduce the importance of GPUs. It changes what must be attached to them, what improves their utilization, and which parts of the stack are quietly becoming more valuable than consensus currently reflects.

What subscribers get in the full report

The full architecture case for why training-era CPU-to-GPU assumptions break down as workloads move from chat and copilots toward agentic inference and enterprise automation.
Our framework for the main CPU demand drivers, including multi-agent fan-out, retrieval and reranking, state management, policy and compliance logic, software-operation agents, and workflow coordination.
A detailed 1GW datacenter stress test showing how a modest CPU power allocation can improve effective throughput and lower cost per unit of useful work.
The installed-base argument for why existing enterprise CPU infrastructure becomes a force multiplier for future AI deployment rather than a displaced legacy layer.
An updated datacenter CPU forecast, with TAM expansion from agentic CPUs.
The monitoring framework we would use to validate, refine, or challenge the thesis as real-world deployment data comes in.
The strategic implications across hyperscaler silicon, merchant CPUs, memory, networking, and the broader attach opportunity created by agentic serving.

NVIDIA Is Expanding the AI Factory and Narrowing the Competitive Window

Ben Bajarin — Thu, 19 Mar 2026 16:37:26 GMT

Executive takeaway. GTC 2026 reinforced our view that NVIDIA is shifting the competitive field away from performance and toward the economics (TCO) of the AI factory as a whole. We believe the buying decision is actively shifting to cost per token, tokens per second per watt, deployment speed, utilization, and revenue per megawatt. As it does, the moat widens materially, because the advantages compound across the full stack rather than resting on any single component. Vera CPU racks and Groq LPX racks sit inside that framework. They expand NVIDIA’s direct addressable market, but the more important contribution is that they occupy the adjacencies competitors had hoped to use as entry points. When we look at the post-GTC story in its entirety, we see something broader than faster silicon. We see a company that is systematically extending its control of the AI factory, and the competitive attack surface is getting narrower with each product cycle.

NVIDIA disclosed visibility into more than one trillion dollars of Blackwell and Rubin purchase orders through 2027, and that figure covers only GPU systems and likely associated networking, though not all networking. It does not include the Groq LPX low-latency inference racks, standalone Vera CPU systems (Jensen said this could be a multibillion-dollar business), storage, or the enterprise RTX data center business, all of which are incremental. Our work suggests the total revenue opportunity through 2027 sits meaningfully above current consensus, though the exact magnitude depends on how quickly the newer product categories ramp and how broadly customers adopt the full five-rack factory architecture. This maps to one of our key research questions: what is the attach rate of Vera CPU racks and Groq racks to Vera Rubin superpods, and in what ratio?

In line with NVIDIA’s theme of “extreme co-optimization,” GTC was about the introduction of a complete five-rack AI factory design. NVIDIA now ships five distinct rack types, all liquid-cooled and sharing common power and cooling assumptions: the Vera Rubin NVL72 GPU rack, the Vera CPU rack, the Groq 3 LPX rack, the BlueField-4 STX storage rack, and the Spectrum-6 SPX Ethernet rack. Ratio and attach rate aside, when we look at the reference superpod, we shouldn’t focus only on the sheer compute density (more than 1,100 Rubin GPUs, over 2,500 LPUs, and roughly 1,400 Vera CPUs in a single deployment) but also on how deliberately that compute is organized as one system. What this shows is how NVIDIA’s roadmap is evolving into a more tightly integrated system, where the rack, and increasingly the whole compute system, is the unit of compute.
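
The reference superpod counts give a rough sense of the attach ratios in question. The quoted counts are approximate ("more than," "over," "roughly"), so the ratios below are indicative only.

```python
# Approximate component counts for the reference superpod, as quoted in the text.
superpod = {"Rubin GPUs": 1_100, "LPUs": 2_500, "Vera CPUs": 1_400}

gpus = superpod["Rubin GPUs"]
cpu_per_gpu = superpod["Vera CPUs"] / gpus  # ~1.3 Vera CPUs per Rubin GPU
lpu_per_gpu = superpod["LPUs"] / gpus       # ~2.3 LPUs per Rubin GPU

print(f"Vera CPUs per Rubin GPU: {cpu_per_gpu:.2f}")
print(f"LPUs per Rubin GPU: {lpu_per_gpu:.2f}")
```

The GPU count is stated as a lower bound, so the true per-GPU ratios may be somewhat lower than shown.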

We don’t view the strategy here purely in terms of incremental revenue from new categories, but in terms of the way NVIDIA is turning adjacent layers of the stack into extensions of the broader platform. The portfolio expansion is not for its own sake. These layers become more valuable when designed to work together, improving the performance and economics of the overall system while reducing the number of viable entry points for competitors. What were once potential wedges for alternative vendors increasingly become integrated parts of the NVIDIA architecture, and that materially raises the bar for competition.

The full report goes significantly deeper on all of these dimensions, including specific revenue estimates by product category, the monetization framework for LPX and Vera, a detailed competitive map, the optical transition sequence, infrastructure bottleneck analysis, and what would need to go wrong for the thesis to weaken.

What subscribers get in the full report

• Revenue bridge by bucket for LPX, Vera CPU, storage, and networking through 2027, with assumptions and confidence intervals for each category

• LPX monetization framework mapping three value creation paths (ARPU expansion, utilization uplift, workflow throughput) to specific buyer segments and workloads

• Vera CPU attach economics including our estimate of CPU-resident compute share in agentic workloads and why customers prefer NVIDIA-supplied CPU over mixed-vendor alternatives

• Competitive map with three explicit buckets: where hyperscaler ASICs still make sense, where merchant GPU competitors remain relevant, and where NVIDIA is raising the bar with platform integration

• Software TCO analysis tying CUDA, Dynamo, Spectrum-X800, BlueField STX, DSX Air, and NemoClaw into measurable deployment economics

• Optical transition sequencing with supply chain estimates for optical engine demand, OCS revenue trajectory, and the path to the $80-100B AI optical TAM by 2030

• Five detailed watchpoints with risk scenarios explaining what a negative outcome on each variable would mean for the broader thesis

• Stock catalyst framework identifying the three proof points that move NVIDIA from a strong-quarter story to a durable-platform story, with near-term datapoints to track

• Full superpod architecture breakdown with GPU, LPU, and CPU counts per pod, generational performance comparisons, and factory output metrics per gigawatt

OpenAI: Three Engines, One Platform

Ben Bajarin — Mon, 16 Mar 2026 15:52:59 GMT

As with the exercise we did on Anthropic, we wanted to do a similar deep dive on OpenAI. In our view, both companies have meaningful overlap but also important differences in technology, strategy, and market positioning. The exercise for both was a hypothetical: what would an initiation note look like on these companies, given neither is public yet? We focused on the business and technology fundamentals and the path to revenue growth using data we could find or were able to triangulate from numerous sources.

The Scale

When it comes to OpenAI’s scale, there is perhaps no technology, or consumer experience, that has diffused faster than ChatGPT. 900 million weekly active users. 50 million paid consumer subscribers. 9 million-plus business users. 7 million workplace seats. 4 million developers building on the platform. Revenue that went from 2 billion dollars in 2023 to 6 billion in 2024 to approximately 15 to 20 billion dollars of recognized revenue in 2025. An annualized run rate likely to be in the range of 20 to 30 billion by year end. And a fresh 110 billion dollar raise at an 840 billion dollar post-money valuation, with management guiding to 280 billion dollars-plus in revenue by 2030.

We think it is interesting how early OpenAI has architected for five parallel monetization vectors: consumer subscriptions, enterprise seats, developer APIs, advertising, and infrastructure. Not all five are at scale yet (advertising in particular is still nascent), but all five are in market and generating signal within three years of meaningful commercial launch. Most technology companies take a decade or more to reach this breadth.

The Shape of the Numbers

OpenAI has 50 million paid subscribers out of 900 million weekly active users. That is a mid-single-digit paid conversion rate. For context, Spotify runs roughly 44 percent paid conversion. Netflix is essentially 100 percent since the free tier went away. Even at the low end of consumer internet, most mature platforms convert 5 to 10 percent of actives into paying users over time. OpenAI is at roughly 5.6 percent today with a product that is still rapidly improving and expanding use cases every month.
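
The conversion figures are easy to check. The quick sketch below uses the 900 million WAU and 50 million paid subscribers above. It also shows the subscriber counts implied at 8 and 10 percent conversion, holding WAU at today's level.

```python
WAU = 900_000_000
PAID_SUBSCRIBERS = 50_000_000

conversion = PAID_SUBSCRIBERS / WAU  # ~5.56%
print(f"Current paid conversion: {conversion:.2%}")

# Subscribers implied at higher conversion rates, holding WAU at today's level.
for target in (0.08, 0.10):
    print(f"{target:.0%} conversion -> {WAU * target / 1e6:,.0f}M paid subscribers")
```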

If paid conversion moves to 8 to 10 percent against a growing WAU base, the subscription revenue opportunity alone could reach 15 to 20 billion dollars annually before you layer on enterprise, API, or advertising revenue. The current subscription run rate is already approximately 10 billion dollars, and what makes this particularly compelling is that the consumer story is the smallest of the three engines we underwrite. On subscriptions, we think the argument that most consumers won’t be enticed by a subscription is fair if the only valuable use case is essentially “search.” But that is not what we believe AI evolves into, particularly in the age of agentic AI. While we do think most mainstream consumers will be free users monetized (very well) by ads, we also think a much higher percentage of consumers will subscribe to at least some tier given the broader value AI will bring.
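
The 15 to 20 billion dollar range can be roughly reconstructed. This minimal sketch assumes revenue per subscriber stays at the level implied by today's run rate: about 10 billion dollars over 50 million subscribers, or roughly 200 dollars per year. It uses a hypothetical 1 billion WAU as the "growing base" case.

```python
# Back-of-envelope subscription revenue = WAU x paid conversion x revenue per subscriber.
# Assumption: revenue per subscriber holds at the level implied by the current run rate.
CURRENT_RUN_RATE_USD = 10e9
CURRENT_SUBSCRIBERS = 50e6
ARPU_USD = CURRENT_RUN_RATE_USD / CURRENT_SUBSCRIBERS  # ~$200 per subscriber per year


def subscription_revenue(wau: float, conversion: float, arpu: float = ARPU_USD) -> float:
    """Annual subscription revenue in dollars."""
    return wau * conversion * arpu


# 900M is today's WAU; 1B is a hypothetical growing-base scenario.
for wau in (900e6, 1e9):
    for conversion in (0.08, 0.10):
        revenue_b = subscription_revenue(wau, conversion) / 1e9
        print(f"WAU {wau / 1e6:,.0f}M at {conversion:.0%}: ${revenue_b:.1f}B")
```

On these assumptions, the range runs from about 14 billion dollars at today's WAU to 20 billion dollars at 1 billion WAU. That is broadly in line with the 15 to 20 billion dollar estimate once the user base grows.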

The Enterprise Signal

The enterprise data is where we spent the most time and where the thesis becomes most differentiated. Business users grew from 150,000 in early 2024 to more than 9 million by February 2026. CIO surveys from late 2025 show more than half of CIOs naming OpenAI as their primary large language model, roughly double the nearest competitor. More than 90 percent report net spending increases for AI, and the vast majority are funding it with new budget rather than cannibalizing existing IT spend.

The reality is that most enterprises are still early. Our checks suggest most organizations are somewhere between crawl and walk in their AI deployment journey. Agents for complex use cases could take two to five years to reach production. This is a positive for the thesis because it means the revenue ramp has a very long runway. The current trajectory is the beginning of a multi-year expansion cycle, and enterprise will become the largest revenue contributor by 2030 by a wide margin.

The Margin Story Most People Miss

There is a widespread narrative that OpenAI is a money-losing company that needs to grow into profitability. The reality is more nuanced. Compute margins, which measure inference-specific unit economics, expanded from 35 percent in January 2024 to 70 percent by October 2025. Inference is already profitable at the model layer. Company-level losses arise because training the next generation of frontier models is funded from the same compute budget during a period of exponential scale-up. Adjusted gross margin, which includes training cost amortization, sits in the low-to-mid 30s percent range today and is projected to expand toward 58 to 65 percent by 2028 to 2029. The profitability framing is better understood as a demand forecasting problem rather than a structural question. That distinction matters for valuation.

The 850 Million (and growing) Free Users

If OpenAI has 900 million WAU and 50 million paid subscribers, that leaves roughly 850 million free-tier users generating no subscription revenue today. At even 2 to 3 dollars per user annually in advertising revenue, that free tier represents 1.7 to 2.6 billion dollars in annual ad revenue from a user base that is already engaged and growing. OpenAI’s initial CPM rates are running roughly three times the digital advertising industry average, suggesting premium pricing power from the start. We are intentionally leaving advertising out of our base case because the thesis works without it, but the scale of the free-tier opportunity is hard to ignore as optionality.
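The free-tier arithmetic above can be written out directly. This minimal sketch uses only the figures in the paragraph (900 million WAU, 50 million paid subscribers, 2 to 3 dollars of annual ad revenue per free user); the function name is hypothetical, and treating every free WAU as ad-eligible is a simplifying assumption.

```python
def free_tier_ad_revenue(wau: float, paid_subs: float, ad_rev_per_user: float) -> float:
    """Annual ad revenue from the free tier = (WAU - paid subscribers) x ad revenue per free user."""
    free_users = wau - paid_subs  # 900M - 50M = 850M free users
    return free_users * ad_rev_per_user


for per_user in (2.0, 3.0):
    revenue = free_tier_ad_revenue(900e6, 50e6, per_user)
    print(f"${per_user:.0f} per free user -> ${revenue / 1e9:.2f}B per year")
```

The results are 1.70 and 2.55 billion dollars, which the paragraph rounds to 1.7 to 2.6 billion. A fuller model would apply the rate to ad-eligible users rather than all free WAU.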

The Valuation Math

At 840 billion dollars post-money on approximately 13 billion dollars in recognized 2025 revenue, OpenAI is valued at roughly 65x current revenue. By traditional metrics, this looks expensive. Against management’s 2030 revenue guidance of more than 280 billion dollars, the multiple compresses to roughly 3x. The question we spent the most time on was whether that 2030 figure is achievable. After triangulating across company disclosures, enterprise survey data, developer ecosystem growth, and our proprietary consumer model, we believe it is. We also believe OpenAI requires a multi-tier valuation framework rather than a single revenue multiple, because SaaS subscription revenue, consumption-based API revenue, and infrastructure economics carry fundamentally different margin profiles and comparable sets.
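The two headline multiples come from dividing the same post-money valuation by two different revenue figures. This sketch reproduces that arithmetic with the paragraph's numbers; the helper name is hypothetical, and the forward figure is an undiscounted ratio that does not adjust for time or execution risk.

```python
def revenue_multiple(valuation_usd: float, revenue_usd: float) -> float:
    """Valuation divided by revenue: the single-multiple view the article argues is incomplete."""
    return valuation_usd / revenue_usd


post_money = 840e9
current = revenue_multiple(post_money, 13e9)   # on 2025 recognized revenue
forward = revenue_multiple(post_money, 280e9)  # on 2030 guidance, undiscounted
print(f"{current:.1f}x current, {forward:.1f}x on 2030 guidance")
```

This prints 64.6x and 3.0x, matching the "roughly 65x" and "roughly 3x" in the text.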

How This Compares to Our Anthropic Work

We published a separate institutional thesis on Anthropic, and the two analyses are worth reading together because they illuminate different paths through the same market. Both companies operate in what we frame as a Cournot oligopoly, a three-to-four player frontier market with high barriers and positive margins in equilibrium. Both face the same underlying inference economics, where profitability is a demand forecasting problem rather than a structural threshold. And both benefit from the same structural tailwind: enterprise AI budgets that are overwhelmingly incremental rather than cannibalistic of existing IT spend.

Where they diverge, at least for now, is on the consumer front. OpenAI is a consumer-first platform that is pulling enterprise behind it. A large share of current revenue comes from individual subscriptions, and the 900 million WAU base creates a lead generation funnel for workplace deployment that no competitor can replicate. We also believe OpenAI has some structural first mover advantages in the consumer market that are not fully appreciated. Anthropic is enterprise-first and developer-first, with roughly 30 to 35 percent of revenue from enterprise and a developer tool in Claude Code that is already generating meaningful run-rate revenue on its own. OpenAI monetizes breadth across five simultaneous vectors. Anthropic monetizes depth with a tighter product surface and a multi-rail compute strategy that gives it infrastructure optionality.

The margin trajectories also differ in composition even if they converge in destination. OpenAI’s compute margins have expanded faster on a larger revenue base, but Anthropic’s gross margin bridge from roughly 50 percent today toward 58 to 77 percent by 2028 reflects a different cost structure with less consumer infrastructure overhead. We think both companies reach healthy margins by decade end, but through different paths, and that distinction matters for how you build a position in the category.

The broader point is that this is not a winner-take-all market. We believe both OpenAI and Anthropic can sustain leading positions in a durable oligopoly, and that the analytical vocabulary we developed across both reports (the intelligence utility classification, the three-tiered valuation decomposition, and the Cournot equilibrium framing) applies to the category as a whole rather than to any single name. Subscribers to the Diligence Stack have access to both theses and the framework connecting them.

What’s in the Full Report

Our three-engine thesis framework (consumer monetization, enterprise and API, inference economics) with detailed analysis of each
A proprietary consumer revenue model projecting subscription and advertising revenue through 2040
Denominator bridge explaining the relationship between WAU, modeled consumer actives, ad-eligible DAU, and paid subscribers
Full metric taxonomy separating recognized revenue, annualized run rate, adjusted gross margin, and compute margin with precise definitions
Revenue segment bridge table decomposing estimated revenue by segment across 2025, 2027, and 2030 with source tags on every line
Intelligence utility valuation framework with three-tiered comparable set analysis and applied multiples
Consumer growth cycle analysis with historical parallels to Google and Meta platform economics
Enterprise adoption analysis informed by CIO surveys and channel checks across 25-plus organizations
Detailed analysis of upside optionality in advertising, hardware, and third-party infrastructure
“What Must Be True” milestone framework for 2027 and 2030
“What Would Make Us Wrong” section covering model commoditization, enterprise deceleration, engagement plateau, compute cost, and competitive share loss scenarios
Cournot oligopoly competitive structure analysis

Masters of the Supply Chain

Ben Bajarin — Thu, 12 Mar 2026 15:59:26 GMT

To quote NVIDIA’s CEO Jensen Huang, from a Q&A he participated in at the Morgan Stanley TMT conference, “I love constraints. I love constraints. And the reason for that is because in a world of constraint, you have no choice but to choose the best. You can't squander your choice.”

As a series of our reports has explored, we are in the midst of the most severe supply chain constraints the semiconductor industry has collectively faced. Supply chains that were architected for a different era of compute demand are now running into constraints that cannot be resolved by capital alone, at least not quickly. It is our view that what is happening across every vector (for this report specifically, advanced packaging, leading-edge logic, high-bandwidth memory, and substrates) is not a temporary dislocation but a structural reorganization of how value is captured across the semiconductor stack, one that will persist through at least 2027 and likely well beyond. As we are fond of saying, the semiconductor industry has been forever changed, and the competitive edge will flow to those who are the masters of the supply chain.

The core dynamic is this: in a deeply constrained supply environment, scale matters not as an end in itself, but because it buys pre-commitment to scarce capacity, qualification priority at every binding chokepoint, and supply assurance that cannot be obtained by capital alone once the shortage is already underway. Companies with the financial commitments already in place, the supplier relationships already locked in, and the full-stack execution capability to navigate simultaneous shortages at every layer of the stack are compounding their advantages faster than the market appreciates. The gap between the companies that control supply and those who are competing for what remains is not narrowing; it is widening by the quarter.

The numbers that have come out of our research stand out for a few reasons. First, the concentration of the most constrained packaging resource in the AI infrastructure build-out is more extreme than most market participants realize. TSMC is guiding toward 140,000 CoWoS wafers per month by end-2026 as an exit-rate target, and that capacity ramps through the year, meaning effective full-year output is considerably lower. NVIDIA has secured approximately 650,000 wafers against that ramp, representing roughly half of total projected 2026 CoWoS production on an average basis and leaving every other high-complexity accelerator program competing for the remainder of an already undersupplied market. Adding to this, our supply chain checks indicate that NVIDIA is securing most of the capacity added through TSMC’s continual upward revisions to its CoWoS ramp. The second dynamic worth appreciating is what this environment has done to memory economics.
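The gap between the 140,000 wafer-per-month exit rate and effective full-year output can be made concrete by working backward from NVIDIA’s allocation. This sketch assumes the "roughly half" share is exactly 50 percent, so treat the result as an order-of-magnitude check rather than a capacity forecast; the helper name is hypothetical.

```python
def implied_annual_output(allocation_wafers: float, share: float) -> float:
    """Total annual output implied by one customer's allocation and its share of that output."""
    return allocation_wafers / share


total_2026 = implied_annual_output(650_000, 0.5)  # ~1.3M wafers for the year (assumed 50% share)
avg_monthly = total_2026 / 12                      # average wafers per month across the ramp
exit_rate = 140_000                                # TSMC's end-2026 exit-rate target
print(f"average {avg_monthly:,.0f} wpm vs exit {exit_rate:,} wpm "
      f"({avg_monthly / exit_rate:.0%} of exit rate)")
```

Under that assumption, average 2026 output works out to roughly 108,000 wafers per month, about three quarters of the exit rate, which is what "considerably lower" looks like in practice.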

Producing a single wafer of HBM consumes the equivalent factory capacity of three wafers of standard server RAM, and that constraint has translated directly into operating margins that would have seemed implausible for a memory company two years ago. The reality is that the suppliers positioned at the binding constraints of this build-out (and around the periphery of the ecosystem) are experiencing a generational windfall, and the conditions that created it are not going away.
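A simplified way to see why HBM squeezes standard server memory: if each HBM wafer occupies the factory capacity of three standard server RAM wafers, every wafer start shifted to HBM removes three wafers’ worth of standard output. The 3:1 ratio comes from the paragraph above; the capacity figures below are hypothetical placeholders measured in standard-wafer equivalents, and the model ignores yields, node differences, and new capacity additions.

```python
HBM_CAPACITY_RATIO = 3.0  # standard server RAM wafers displaced per HBM wafer (from the text)


def standard_capacity_remaining(total_capacity: float, hbm_wafers: float,
                                ratio: float = HBM_CAPACITY_RATIO) -> float:
    """Standard-wafer-equivalent capacity left after allocating HBM wafer starts."""
    displaced = hbm_wafers * ratio
    if displaced > total_capacity:
        raise ValueError("HBM allocation exceeds total factory capacity")
    return total_capacity - displaced


# Hypothetical fab with 100 units of capacity; 10 HBM wafer starts absorb 30 units.
remaining = standard_capacity_remaining(100.0, 10.0)
print(f"standard capacity remaining: {remaining:.0f} of 100")
```

In this toy example, ten HBM wafer starts consume 30 percent of the hypothetical fab, leaving 70 units for standard server RAM.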

We are intentionally framing this analysis around supply chain control rather than AI demand exposure more broadly, because we believe the latter framing is where most of the market remains anchored and where the analysis is least differentiated. Pure demand exposure to AI is no longer a sufficient thesis for generating alpha in this sector. What matters now is identifying which specific bottlenecks remain binding and which companies are structurally positioned at those chokepoints. That is a much smaller and more precisely defined set of names than the broad AI basket, and it requires a different analytical framework than most sector research currently applies.

There is also a consumer electronics dimension to this story that we have spent time working through, and it does not fit the simple winner-or-loser framing. The dominant consumer electronics platform is navigating this environment from a position of relative strength, absorbing component cost increases that are significant by any historical measure, securing supply that competitors cannot obtain at any price, and reporting blended corporate gross margins above 48 percent through it all. That thesis is structurally different from the infrastructure layer, for a few reasons we detail carefully in the full report, and it carries qualifications that matter for institutional positioning. But it belongs in this framework. And in case it wasn’t obvious, this company is Apple.

The flip side of the supply chain masters thesis is equally important to understand. Tier-2 foundries are running at utilization rates that make their depreciation burdens increasingly painful to carry. Commodity hardware OEMs are trapped between suppliers with complete pricing power and consumers who will not absorb even modest retail price increases. Smaller AI chip designers are discovering that announced deployment timelines are aspirational targets, not commitments, in an environment where packaging allocation is controlled by the largest customer in the market. The crowding-out dynamics that benefit the masters of this supply chain actively penalize everyone below them in the stack.

What’s in the Full Report

The advanced packaging chokepoint — why a persistent 25–30% CoWoS supply deficit through 2027 structurally advantages one company above all others, and what that means for every other accelerator program competing for the same capacity
The HBM supercycle mechanics — the manufacturing physics that deleted 30% of global standard server RAM capacity and created the conditions for record memory operating margins, with a detailed look at where the challengers stand in the catch-up race
TSMC’s pricing power transition — how the world’s most important foundry moved from cost-plus to value-added pricing, and what the 85% wafer price premium between nodes means for customer economics and margin distribution across the stack
The custom silicon integration moat — why full-stack ASIC capability commands a record $162 billion consolidated backlog, and why the financial barrier to entry for competing programs keeps the incumbent firmly in position
Consumer electronics: adjacent beneficiary — the case for why one consumer hardware platform is navigating the memory supercycle from a position of relative strength, the margin hedge that makes the math work, and the geopolitical concentration risk that distinguishes it from the infrastructure pure-plays
The vulnerable — a detailed breakdown of tier-2 foundries, commodity OEMs, and AI chip startups, and why the dynamics that benefit the supply chain masters actively penalize this group
Positioning framework — a four-tier positioning structure that distinguishes infrastructure chokepoint masters, consumer electronics scale leaders, names requiring supplemental research, and outright avoidance candidates

Apple: The Full Stack Compounder

Ben Bajarin — Tue, 10 Mar 2026 17:43:56 GMT

While we don’t always favor a sum-of-the-parts analysis and framing for every company, we think it applies perhaps most relevantly to Apple. The deep ecosystem experience is a byproduct of product excellence that hooks customers, keeps them loyal, and naturally extends their value to that ecosystem over time. The key upside driver for Apple has always been getting first-time customers onto the platform, and we believe Apple is now as well-positioned as it has ever been to grow its base at a faster rate and benefit from the flywheel of its ecosystem. That flywheel is the connective thread of this report: product excellence (a consistent brand experience) drives installed base growth, which fuels services monetization, which deepens hardware lock-in, which generates the free cash flow that funds the next turn of the wheel. It is now accelerating across five dimensions simultaneously, and the compounding effects are what make Apple’s forward trajectory more durable than any single product cycle.

A key validation of the Apple thesis is the health of the business. Apple consistently ranks in the top three of our financial health scorecard, and the numbers to date speak to that health. Apple reported record total company revenue of $143.8 billion in Q1 FY2026 (the December quarter), up 16% year-over-year, with Products at $113.7 billion and iPhone surging 23% to $85.3 billion. But the number that matters more for the long thesis is the services line, which hit $30 billion in a single quarter at a 76.5% gross margin, building on $109.2 billion in full-year FY2025 Services revenue. That margin is nearly double the roughly 36 to 40% on hardware, and as services grow from roughly 26% of total revenue toward 30% and beyond, Apple’s blended corporate gross margin trends higher over time. Consolidated gross margins have already reached 48.2%, and our research points toward 50%+ over the medium term, which represents not a cyclical improvement but a secular mix shift. All of this comes from a company that, for the better part of a decade, was widely viewed as only one low-cost competitor away from disruption, and that is now potentially on the cusp of disrupting those same low-cost competitors.
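The mix-shift argument is a weighted average: blended gross margin is each segment’s margin weighted by its share of revenue. This sketch uses the 76.5% services margin from the paragraph and assumes a flat 38% hardware margin (the midpoint of the 36 to 40% range); it is a two-segment simplification, not our margin model, and the function name is hypothetical.

```python
SERVICES_GM = 0.765  # services gross margin cited for Q1 FY2026
HARDWARE_GM = 0.38   # assumption: midpoint of the 36% to 40% hardware range in the text


def blended_gross_margin(services_share: float,
                         services_gm: float = SERVICES_GM,
                         hardware_gm: float = HARDWARE_GM) -> float:
    """Revenue-weighted gross margin for a two-segment mix (services vs hardware)."""
    return services_share * services_gm + (1.0 - services_share) * hardware_gm


for share in (0.26, 0.30, 0.35):
    print(f"services at {share:.0%} of revenue -> blended GM {blended_gross_margin(share):.1%}")
```

Holding hardware margin flat at 38%, the blend sits near 48% at a 26% services share and crosses 50% at roughly a 31% share, which is why mix alone can move the consolidated margin without any change in segment pricing.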

A key thesis in our Apple growth framework has always been a more affordable entry point to the Mac. Our consumer research has consistently validated this, showing high interest and consideration for Mac but with price remaining the primary barrier. Enter the MacBook Neo, which at $599 ($499 for education) targets the estimated 80 to 90 million consumer and education PCs shipping annually below $800, a segment of the market where Apple has had no meaningful presence. Our proprietary TAM model projects a base case of 8 to 13 million Neo units annually by 2028, generating $5 to $7 billion in incremental annual revenue. What makes this strategic is that roughly 3 to 3.5 million of the Neo units sold in 2026 go to pull-in buyers entering the Apple ecosystem for the first time, each becoming a new endpoint in the flywheel.

The third dimension is geographic expansion, and India has long been pegged as one of the next growth vectors for Apple. Apple’s iPhone unit share in India has roughly doubled since 2022, with our estimates placing India-specific revenue at $8 to $10 billion and growing at a 25 to 30% CAGR, faster than any other major geography. The iPhone installed base in India is still under 60 million in a market of 500 million middle-class consumers, with service attach rates well below global averages, which means the monetization runway is substantial. Apple has committed structural capital to the region, with 20%+ of global iPhone production now coming from India, the supplier base having tripled, and six company-owned retail stores open. A sub-$450 entry iPhone alone could expand Apple’s addressable smartphone TAM in India by 15 to 20 million units annually, with the $599 Neo opening a separate PC opportunity in a 14 to 16 million unit market where Apple holds less than 5% share.

The fourth dimension is AI hardware, where Apple is entering the wearable AI category across three form factors: smart glasses, AI companion wearables, and smart home devices, targeting a combined 40 to 50 million units and $20 to $25 billion in hardware revenue by the end of the decade. The global smart glasses market alone is growing above 35% annually to an estimated 65 to 80 million units by 2030, and Apple’s vertical integration in silicon, software, and privacy architecture gives it a structural advantage in the most power- and latency-constrained computing form factors. Layer in the potential (and still rumored) health wearables roadmap, including blood pressure, blood glucose, and hearing aids, and the wearables franchise could become Apple’s most significant new hardware growth vector since Apple Watch.

The fifth dimension, and the most underpriced, is Apple’s structural positioning for agentic AI at the edge. The consensus narrative that Apple is “behind” in AI fundamentally miscategorizes the competitive dynamic (and, we believe, underestimates the impact agentic AI will have on edge devices), because Apple is not competing in the model layer but in execution: the silicon and software infrastructure on which models run at the point of user interaction. With 2.5 billion active devices growing by an estimated 150 to 180 million per year, Apple is quietly assembling what may be the most extensive edge inference fleet in the industry, funded by a business that generated $98.8 billion in company-wide free cash flow in FY2025.

The reality is that the flywheel is not slowing down. It is finding new areas on which to compound. The Neo opens a TAM the Mac has never addressed. India is delivering 25–30% revenue growth as Apple’s leaky bucket thesis converts competing platform users at scale. AI companion hardware opens a $20–25 billion product category. And the services business continues its transformation from growth driver to margin engine. Each vector reinforces the others, and the full thesis is detailed in the subscriber report.

What’s in the Full Report

Services as Lead Argument: Why $109B+ in FY2025 Services revenue at 76.5% gross margin is the real re-rating catalyst, and how Apple Creator Studio ($12.99/mo) is the first proof point for AI software monetization.
Complete MacBook Neo TAM Model: Full scenario analysis with gross vs. net revenue sensitivity, pull-up vs. pull-down unit splits, vendor displacement by OEM, and projections through 2028.
The Leaky Bucket, India and Platform Switching: Our proprietary framework for why Apple’s 90%+ retention rate and aggressive entry pricing are converting competing platform users at record rates. Includes deep-dive India analysis: $8–10B in estimated revenue growing at 25–30% CAGR, installed base under 60M in a 500M addressable market, and how a sub-$450 iPhone could expand smartphone TAM by 15–20M units with the $599 Neo opening additional PC share.
AI Companions, The $20 to $25B Hardware Opportunity: Our framework for Apple’s entry into smart glasses, AI wearable pendants, and smart home across 40–50M units by end of decade. Includes three-tier market analysis, Apple’s two-phase glasses strategy, health wearables roadmap ($10–15B long-term revenue potential), and why Apple’s silicon advantage is structurally larger in wearable AI than in phones. Estimated 4–6% EPS contribution by FY30.
Edge AI as Multiple Support: How Apple’s unified memory architecture, Foundation Models framework, and 300M+ AI-capable SoCs shipping per year [Estimate] support duration and valuation rather than near-term revenue.
Why the Market Disagrees: A direct engagement with the strongest sell-side valuation counterargument, and three specific reasons we think the premium is structurally justified.
KPI Scoreboard and Kill Criteria: Six observable metrics to track over the next 12 months and five conditions that would cause us to reconsider the structural growth case.