
Research
AI Agents
Let the racehorses run. Your enterprise needs workhorses.

Article by
Dr. Anoj Winston Gladius
·
The five largest US cloud and AI infrastructure providers — Microsoft, Alphabet, Amazon, Meta and Oracle — have collectively committed to spending between $660 and $725 billion on capital expenditure in 2026. That is roughly 3.5 times their combined 2024 capex. Bank of America and Evercore both project 2027 will clear $1 trillion. Across the same period, Anthropic's internal documents project a $14 billion loss for 2026 and no positive free cash flow before 2028 at the earliest; OpenAI began serving advertisements on its free tier in February. The combined revenue of every pure-play frontier lab still amounts to a small fraction of the infrastructure investment being deployed on their behalf. The investors funding this build-out are betting on revenue trajectories that will eventually have to clear trillions of dollars to justify the math. There are two very different ways European enterprises can respond to that landscape. Most are responding the wrong way.
This is the seventh piece in a series I have been writing for neuland.ai. The thread that runs through all of them is the same: in enterprise AI, the value, the risk and the moat sit in the layer above and around the model, not in the model itself. [¹] February's piece was about the control plane that turns LLMs into governed enterprise capabilities. April's was about model drift and the Multi-LLM resilience answer. The first May piece was about model topology — where models run, who controls them. The second was about compliance as a system property. The third made the same argument for agent security. The fourth made it for protocol governance with MCP. The argument I want to make here is the one that ties everything together economically. Cost discipline in enterprise AI is also a property of the system. And the system most European enterprises are quietly committing themselves to today is one optimised for a different set of investors, in a different currency, under a different legal regime, with no path to positive cash flow before the back end of the decade.
There is a better strategy available, and it has been quietly maturing while everyone was distracted by the frontier.
The market that is not the frontier
It is worth being clear that two distinct AI markets are now operating in parallel, and conflating them is the most expensive analytical mistake a CIO can make in 2026.
The first market is the frontier model market — the race between OpenAI, Anthropic, Google DeepMind, xAI, Meta Superintelligence Labs and a handful of others to build the largest and most capable generalist systems. This is the market driving the $700 billion capex line. The players in it are competing for benchmark leadership, mindshare and the eventual revenue trajectories their investors are pricing in. The economics of this market are extraordinary: roughly 88% gross margins for the chip supplier (Nvidia), a small group of cloud platforms taking the capex risk, and the model labs sitting between them holding the unit economics. [²] Mark Zuckerberg's Meta alone increased its 2026 capex guidance to $125–145 billion in April. [³] Anthropic moved from a $1 billion annualised revenue run rate in December 2024 to roughly $30 billion by April 2026 — the fastest enterprise revenue ramp in recorded history — while still projecting a $14 billion loss for the year and no positive free cash flow before 2028. [⁴]
The second market is the enterprise deployment market — the work of integrating AI capability into the actual business processes of actual companies, generating measurable productivity and predictable cost. The technology used in this market is mostly downstream of the first one. Open-weight models released eighteen months ago, on serving stacks that have matured, on hardware that has depreciated, against business problems that have been understood for decades. The value created in this market is real and substantial. The investment required to participate in it is, in 2026, a tiny fraction of what is being spent at the frontier.
The strategic insight is that these are not the same market. The first market is making the second market possible by absorbing the cost of the science. The first market's investors are paying for the discovery; the second market gets to use the results once they have stabilised. The first market is operating on a five-year revenue projection that requires step-function increases in inference monetisation. The second market is operating on a quarterly procurement cycle that requires predictable, defensible cost per unit of business value.
The honest engineering position is this: most enterprises do not need to participate in the first market. They need to participate carefully and well in the second.
Why this strategy works now (and didn't 18 months ago)
The "let them run ahead, we'll deploy what they ship" position is not new. What is new is that the conditions to actually execute on it have, in the past 12 to 18 months, all simultaneously arrived.
Open-weight models have measurably caught up for enterprise workloads. GLM-5.1 briefly held the top spot on SWE-Bench Pro under MIT licence in April. [⁵] Mistral Large 3 sits at 675 billion parameters under Apache 2.0. DeepSeek-V3.2, Qwen 3.6 and the broader open ecosystem are within striking distance of frontier proprietary models on the workloads that make up the majority of enterprise AI traffic — document processing, classification, retrieval-augmented chat, structured extraction, summarisation, internal knowledge. On these workloads, independent evaluations now place leading open-weight models at roughly 94–95% of comparable frontier proprietary capability. [⁶]
Serving stacks for open-weight models have matured into production-grade infrastructure. vLLM, SGLang, Ktransformers, xLLM and similar projects now offer the throughput, batching, quantisation, KV-cache management, speculative decoding and observability that an enterprise inference workload actually needs. [⁷] What required a research team to operate in 2024 requires a platform engineering team in 2026.
Fine-tuning toolchains have become genuinely productive. LoRA, QLoRA, DPO, RLAIF, parameter-efficient fine-tuning of all variants — the techniques that let a small enterprise team take a mid-sized open-weight model and adapt it to a specific vertical (legal, financial, manufacturing, healthcare) are now mainstream engineering practice rather than research projects. A fine-tuned 13–70 billion parameter model that has absorbed your domain's vocabulary, document structure, decision logic and edge cases will outperform a frontier API on your specific workload, at a fraction of the per-inference cost, with the additional benefit that your proprietary knowledge is no longer being transmitted to a third-party endpoint on every request.
The used GPU market is liquid and mature. A new H100 80GB runs €25,000–40,000 in 2026; refurbished, €21,000–34,000; used, €15,000–28,000. [⁸] A100 80GB pricing has dropped further — €4,000–9,000 used, against €7,000–15,000 new. [⁹] Even three-year-old H100s are holding roughly 75–85% of their acquisition value, because inference demand for previous-generation hardware has remained strong as enterprises catch up with frontier capacity. The DACH market has multiple resellers, integrators and ITAD specialists operating in this space at enterprise scale. With the proper due diligence — service-tag verification, SMART power-on hours, Lifecycle Log review on the iDRAC controller — a refurbished H100 acquisition can be made to procurement and audit standards that survive board scrutiny.
Sovereign and EU-jurisdictional cloud has become a real option, not a slogan. This is the development that has changed most quickly and that most DACH procurement teams have not yet fully internalised. STACKIT, run by Schwarz Digits and backed by the Schwarz Group — independent of external investors and therefore not under quarterly VC pressure — now offers GPU instances on enterprise-grade infrastructure with full GDPR compliance and no CLOUD Act exposure. [¹⁰] IONOS has launched an AI Model Hub with OpenAI-compatible API endpoints, H200 and H100 GPU instances and Intel Gaudi accelerators, on infrastructure owned and operated under German listed-company governance. [¹¹] T Cloud Public (formerly Open Telekom Cloud) gives Deutsche Telekom's regulated-sector posture. Plus Server and Hetzner round out the German options for self-managed deployment. Aleph Alpha's PhariaAI provides a fully sovereign stack with on-premises deployment and guaranteed European data residency, in partnership with AMD and Schwarz Digits. [¹²] The European Commission's binding Cloud Sovereignty Framework came into effect in October 2025. [¹³] As Andreas Nauerz, IONOS's Chief Product Officer, put it earlier this month: "Sovereignty is determined not just by the physical location of a data centre, but by who owns the provider, and where the legal jurisdiction sits." [¹⁴]
The cautionary tale to keep in mind, however, is that not every product marketed as sovereign actually is. OpenAI for Germany, announced in early 2026 and positioned as the public-sector sovereign answer for AI in the Federal Republic, is delivered through SAP's Delos Cloud — which itself runs on Microsoft Azure infrastructure underneath. [¹⁵] The German federal procurement marketing materials use the word "sovereign." The underlying deployment topology routes through a US-incorporated hyperscaler. This is the difference between sovereignty by sticker and sovereignty by construction, and it is the same pattern I argued in the Copilot piece earlier this month: compliance is a property of the entire system, not of the layer the marketing material chose to highlight. [¹⁶]
What the deployment topology actually looks like
A serious DACH enterprise stack in 2026, operated to a discipline that survives both the next funding cycle of the frontier labs and the next downturn of the European procurement environment, looks roughly like this.
The model layer has three tiers, not one. The workhorse tier — fine-tuned open-weight models in the 13 to 70 billion parameter range — runs on owned hardware where possible, on used or refurbished H100/A100 capacity, behind a serving stack the platform team controls. This tier handles 70 to 90 percent of enterprise inference traffic by volume: document processing, classification, RAG, summarisation, structured extraction, internal Q&A, simple agent tasks. The sovereign cloud tier runs equivalent or larger open-weight models on STACKIT, IONOS, T Cloud Public or another EU-jurisdictional provider, handling variable load and the workloads that justify cloud economics over capital purchase. The frontier tier is reserved for the specific workloads where the capability gap still justifies the dependency, the cost, and the jurisdictional exposure: long-horizon coding agents, certain reasoning-heavy analytical tasks, multimodal work that the open ecosystem has not yet caught. This tier is a small minority of total traffic.
The orchestration layer above this tiered model layer is what makes the strategy executable. It needs to route per workload under policy, abstract capability so that the application surface doesn't know or care which tier served the request, enforce identity and audit uniformly across all three tiers, hold the integration to enterprise systems centrally, and accommodate fine-tuning workflows that move workloads from frontier dependency toward workhorse independence over time. It also needs to do all of this while remaining hyperscaler-independent, so that the customer's strategic choice of topology is preserved rather than locked into one vendor's roadmap.
This is what the neuland.ai HUB is built to be: an enterprise management and orchestration platform. The HUB sits as the management and orchestration plane above heterogeneous execution surfaces — MCP servers, CLI and shell execution, controlled code-execution sandboxes, browser automation, deterministic orchestration of multi-step workflows, direct enterprise connectors — and above the three-tier model layer described above, applying identity, RBAC, audit, policy, capability abstraction and cost-aware routing uniformly. The HUB itself deploys on the same range of options: customer's own data centre, EU-jurisdictional sovereign cloud, or a hyperscaler region where the workload genuinely justifies it. We integrate fine-tuning workflows directly, so that the path from "we are paying frontier API prices for this workload" to "we are running a workhorse model that has absorbed our domain" is a managed transition rather than a research project. [¹⁷]
A measured call-out
I want to be careful with the framing here. This piece is not anti-frontier-lab. The work coming out of Anthropic, OpenAI, Google DeepMind and Meta Superintelligence Labs is genuinely impressive engineering science. The open ecosystem is downstream of decisions those labs made about how much science to publish, which depends on revenue trajectories that depend on the $700 billion capex commitment being deployed on the labs' behalf. Without the racehorses, there are no workhorses 18 months later.
What I am pushing back against is the assumption that the right strategy for a European enterprise is to ride the racehorses at retail price. The math does not work. The combined hyperscaler 2026 capex is roughly 3.5 times their 2024 spend. The aggregate revenue of pure-play frontier labs is a small fraction of that infrastructure outlay. When investors eventually price the return expectation correctly — and they will, eventually, as Bank of America and Evercore are now both projecting 2027 capex over $1 trillion — the pricing decisions, the feature decisions, the jurisdictional decisions and the contract terms will all rationalise in the direction of those investors' return requirements. They will not rationalise in the direction of European enterprise procurement predictability, DSGVO compliance posture or sovereign data residency. They are not unsuited to the European market by design; they are merely unaligned by structure.
The structural answer is the deployment topology I have described. Workhorse models, owned or sovereign-cloud-hosted infrastructure, fine-tuning for the vertical, frontier capacity reserved for the genuine minority of workloads that need it, and an orchestration layer that makes the whole thing operate as a single managed capability rather than a collection of integrations into US-hosted APIs — or, equally important, into EU-hosted endpoints that mask US-operated models running on US infrastructure underneath.
Personal take
I want to close with the cultural argument, since this audience is principally in the DACH market and the cultural fit of this strategy with German Mittelstand instincts is, in my view, the strongest argument of all.
The German Mittelstand has been doing capital-conservative, engineering-excellent, long-relationship-supplier, generational-thinking strategy for two centuries. Owned assets over rented ones; predictable cost over exciting capability; deep supplier relationships over transactional ones; quality over speed; engineering substance over marketing surface. Apply the same discipline to AI procurement and the conclusion is almost trivially obvious. Owned and sovereign-cloud-hosted workhorse models, fine-tuned for your business, governed through a control plane you can audit, accessed through an orchestration layer that gives you optionality across the model tiers — this is exactly the architecture a serious German engineering company would have arrived at, on its own, if the AI hype cycle of 2023–2025 had not pulled procurement into the racehorse market by default.
I have written it in this series before and will continue to write it: the differentiator in enterprise AI in 2026 is no longer the model. It is the topology, the orchestration, the governance, and the discipline. If a frontier model lands in our HUB two weeks after it has landed in one of our competitors' products, or on the public API of one of those competitors, that is not a delay we apologise for. We need that time to verify the tool-call surface, the data-flow paths, the jurisdictional behaviour and the operational characteristics of the new model against the same 100% DSGVO conformance standard we apply to every other capability the HUB exposes to a regulated workload. Better safe, properly evaluated, fine-tuned where it matters, sovereignly hosted, and predictably costed than fast at retail price under a foreign jurisdiction with a $14 billion annual loss profile on the other side of the API.
A brief note on the regulatory landscape, since it continues to develop. On 7 May 2026, the EU's Digital Omnibus on AI agreement postponed the high-risk Annex III obligations from 2 August 2026 to 2 December 2027, and Annex I obligations to 2 August 2028. [¹⁸] GPAI enforcement powers under Chapter V remain on the original 2 August 2026 schedule. The strategic implication for European enterprises is unchanged from what I argued in the previous pieces in this series: regulatory deadline pressure has weakened slightly, buyer scrutiny has intensified, and the architecture decisions of Q3 2026 are the ones that determine whether your AI stack survives the 2027 procurement cycles intact.
Let the racehorses run. Build workhorses. The topology is what matters.
¹ Series articles at neuland.ai/insights. Previous pieces: "Control Panels, Execution Surfaces…" (Feb 2026); "Wenn KI-Systeme plötzlich schlechter werden" (Apr 2026); "Open weights took the top spot. Meta walked away." (Apr/May 2026); "Compliance is a system property, not a checkbox" (May 2026); "The lethal trifecta is not a vulnerability. It is a property of the system." (May 2026); "MCP solved the integration problem. It just made the governance problem bigger." (May 2026).
² Combined hyperscaler 2026 capex: Microsoft tracking $120–190 billion, Alphabet $175–185 billion, Amazon ~$200 billion, Meta $125–145 billion, Oracle $50 billion — aggregate $660–725 billion. See Futurum Group, "AI Capex 2026: The $690B Infrastructure Sprint," February 2026; Yahoo Finance, "Hyperscalers Hit $700 Billion in 2026 AI Spending Plans," April 2026; Tom's Hardware, "Google, Microsoft, Meta, and Amazon capex spending to hit $725 billion in 2026," April 2026. Bank of America and Evercore both project 2027 capex above $1 trillion. Nvidia gross margins ~88%.
³ Meta increased its 2026 capex guidance to $125–145 billion in late April 2026, citing higher component pricing (notably memory) and growing competition for land, power and skilled labour. Q1 2026 revenue grew 33% to $56.3 billion.
⁴ Anthropic financial position per Vin Vashishta, "$700 Billion in Capex. $50 Billion in Revenue. AI's Math Is Broken.," May 2026; Anthropic moved from ~$87 million run rate (January 2024) to ~$30 billion (April 2026). Internal documents project $14 billion 2026 loss; positive free cash flow not projected before 2028. Series G February 2026 at $380 billion post-money; offers reported at $800–900 billion in May 2026. OpenAI began serving advertisements on the free tier in February 2026.
⁵ See earlier piece: "Open weights took the top spot. Meta walked away. The real question is where these models actually run." (April/May 2026). GLM-5.1 released 7 April 2026 by Z.ai; MIT licence; 754 billion parameters (40B active); SWE-Bench Pro: 58.4 — held leaderboard top spot for nine days.
⁶ Independent SWE-Bench Verified composite analyses, April–May 2026. GLM-5.1 trails Claude Opus 4.6 on Terminal-Bench + NL2Repo composite (54.9 vs 57.5) and reaches ~94.6% of Claude Opus 4.6's coding capability on the broader composite. DeepSeek-V3.2, Qwen 3.6, Mistral Large 3 within striking distance for the majority of enterprise non-frontier workloads.
⁷ Mature open-weight serving stacks (April 2026): vLLM (UC Berkeley, broadly deployed), SGLang (LMSYS), Ktransformers (Tsinghua), xLLM (multiple contributors), TGI (Hugging Face). All support production-grade throughput, batching, quantisation, KV-cache management, speculative decoding.
⁸ H100 GPU 2026 pricing: new $25,000–40,000 (PCIe at lower end, SXM5 at upper end); refurbished $21,000–34,000; used non-refurbished $15,000–28,000. H100 holding 75–85% of acquisition value across 24 months. See Compute Exchange, "NVIDIA H100 GPU Price in 2026," April 2026. Blackwell general availability expected to exert 10–20% downward pressure on H100 secondary pricing once widely available.
⁹ A100 80GB pricing: new $7,000–15,000; used $4,000–9,000. See Jarvislabs, "NVIDIA A100 GPU Price in 2026," March 2026; Introl, "Secondary GPU Markets," March 2026. A100 prices may decline a further 10–15% through 2026 as enterprises continue upgrading to Hopper and Blackwell generations.
¹⁰ STACKIT — Schwarz Digits cloud service, backed by Schwarz Group (Lidl, Kaufland — the largest retail company in Europe). Marketed as independent of external investors. GDPR-compliant; data hosted exclusively in Europe. Partnerships with ServiceNow and Salesforce (Tableau).
¹¹ IONOS — German cloud and hosting provider, part of United Internet AG (Frankfurt-listed). 2026 portfolio includes AI Model Hub with OpenAI-compatible API endpoints, H200 and H100 GPU instances, Intel Gaudi accelerators, managed Kubernetes, Object Storage, Data Centre Designer. Member of Gaia-X and Sovereign-X initiatives.
¹² Aleph Alpha PhariaAI: enterprise GenAI operating system with on-premises deployment and guaranteed European data residency; partnerships with AMD and Schwarz Digits.
¹³ European Commission Cloud Sovereignty Framework, binding from October 2025. Defines and assesses digital sovereignty of cloud services in terms of ownership, jurisdiction, operational control, technical implementation.
¹⁴ Andreas Nauerz (Chief Product Officer, IONOS), as quoted in IoT Now, "Europe's €180 million move: Sovereign cloud rebuild starts now," 25 May 2026. Full quote: "Sovereignty is determined not just by the physical location of a data centre, but by who owns the provider, and where the legal jurisdiction sits."
¹⁵ "OpenAI for Germany" announcement, 2026: SAP–OpenAI partnership delivering OpenAI capability to the German public sector via SAP's Delos Cloud platform, which itself runs on Microsoft Azure infrastructure. SAP announced expansion of Delos Cloud in Germany to 4,000 GPUs to support demand. See TechRadar coverage, "Germany is getting its own sovereign version of OpenAI," 2026.
¹⁶ See previous piece in this series: "Compliance is a system property, not a checkbox" (May 2026), particularly the argument that EU-residency for the inference layer does not retroactively launder data flows that exit the EU through the tool layer or the underlying infrastructure.
¹⁷ neuland.ai HUB capabilities referenced: identity / RBAC / audit trail / tool-call governance / capability abstraction / Multi-LLM routing / cost-aware routing per workload classification / fine-tuning workflow integration / hyperscaler-independent deployment (on-premises, EU-jurisdictional sovereign cloud, hyperscaler region as required). neuland.ai AG retains responsibility for content quality and clean delivery of results.
¹⁸ Council of the EU and European Parliament provisional political agreement on the Digital Omnibus on AI, 7 May 2026. Annex III high-risk obligations postponed from 2 August 2026 to 2 December 2027 (16-month delay); Annex I obligations postponed to 2 August 2028 (12-month delay); Article 50(2) watermarking moved to 2 December 2026. GPAI enforcement powers under Chapter V remain on the original 2 August 2026 schedule.