Compute Designed for Agents

Build 2026 just validated the local AI strategy and gave IT the compliance answer it needed. Here is the architecture and why it matters.


The Short Version

▸ Build 2026 was a platform architecture event, not a hardware launch

▸ MAI (Microsoft AI) models are designed to run both on-device and in Azure

▸ Foundry Control Plane routes requests across MAI models based on policy

▸ Agent 365 governs both venues through Entra, Defender and Purview

▸ Scout is the first end-to-end proof that the pieces compose

▸ In a Q1 2026 study, tiered routing cut median blended token cost from $18.40 to $2.31 per million tokens

When Microsoft CEO Satya Nadella unveiled the Surface RTX Spark Dev Box at Microsoft Build 2026, the room read it as a hardware moment. The machine is built on the NVIDIA RTX Spark chip that NVIDIA CEO Jensen Huang had introduced a day earlier at Computex Taipei. But it was not a hardware moment. The hardware was the prop. The actual announcement was that Microsoft has finished building the platform IT departments have been holding out for: a single governed fabric where the same agent can run on a desk in Boston or in an Azure region, with one identity model, one policy plane and one set of audit hooks.

Build 2026 closed a gap that had stalled enterprise agent deployment for two years. The gap was never the model. It was the operating layer underneath the model. Local inference solved the cost and latency problem. Cloud inference solved the scale and reasoning problem. Nobody had stitched the two together with governance that would survive a SOX audit, a PII data-handling review or an IP exfiltration assessment. Microsoft just did, and it built the hardware to enable it.

This is the angle that matters for IT leaders sitting on stalled agentic AI projects: the compliance answer arrived. The architecture is no longer hypothetical. And the silicon, on both sides of the boundary, was designed for agents, not people.

"For forty years, you launched apps. The PC is being reinvented."
— Jensen Huang, NVIDIA · Computex Taipei, June 1 2026

A Platform Event Disguised as a Hardware Launch

The Surface RTX Spark Dev Box looks like a compact desktop, and Nadella announced it himself on the Build keynote stage. Inside is the NVIDIA RTX Spark superchip with 128 GB of unified memory and a petaflop of FP4 tensor performance. The Dev Box runs the same software stack, container formats and model tooling as Azure GPU infrastructure. Binaries need a recompile when moving between architectures, but frameworks, inference runtimes and deployment pipelines transfer intact. Microsoft preloads it with VS Code, GitHub Copilot in Terminal, WSL and PowerShell 7. What matters is that a developer can build, test and run an agent locally and ship it to Azure without changing a single line of code.

Nadella also announced the Surface Laptop Ultra in the days leading into Build, and it is confirmed for fall 2026 availability. It carries the same chip in a 15-inch chassis with a 2,880×1,920 mini-LED display at up to 2,000 nits peak brightness. Apple's M5 Max, shipping today in the MacBook Pro, supports up to 128 GB of unified memory at 614 GB/s and delivers 12 to 18 tokens per second on 70B-parameter models. Both camps landed on the same architectural conclusion. They tore out the discrete GPU and put the CPU, GPU and memory into a single package. They also stopped pretending that hardware designed for web browsing and office use can serve agentic workloads.


Daymark's Experience

I have been testing an M5 Max MacBook Pro on a variety of agentic workloads for a few months now. The mix includes OpenClaw and NemoClaw for orchestration, Hermes for long-horizon planning and research, VS Code for management, and supporting tools such as Ollama. All of them have shown steady improvement. MLX, Apple's framework tuned for the unified memory architecture, delivers 40 to 80 percent more throughput than llama.cpp on the same hardware.

The takeaway is straightforward. We now have credible options for running models locally to power agentic workflows, which translates directly into a more cost-effective approach to enterprise AI. Microsoft's hardware announcements, MAI LLM releases and enhancements to its supporting cloud services further solidify this trajectory. The local side of the architecture is no longer aspirational; it is something a developer can spec, buy and have on the desk in short order.

Hybrid Models Address the Cost Gap

The reason this matters now is that the economics of cloud-only agentic workloads stopped being defensible in the first quarter of 2026. Gartner's March 2026 analysis confirmed that agentic workflows consume 5 to 30 times more tokens per task than conversational AI. A Q1 2026 study of 2.4 billion enterprise API calls found that organizations routing everything to frontier models paid a median of $18.40 per million tokens. Organizations running tiered routing (local models for routine tasks, frontier models for complex reasoning) paid $2.31. That is a roughly eight-fold cost penalty for ignoring local inference. GitHub itself moved Copilot to usage-based AI Credits billing on June 1 because the inference cost had become impossible to absorb at a flat rate.
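The arithmetic behind the eight-fold figure is easy to reproduce. The sketch below uses the two median prices quoted above. The monthly token volume and the per-tier prices in the blended example are hypothetical placeholders, not figures from the study. In particular, the $0.50 per million amortized local price is purely an assumption.

```python
# Minimal sketch of the blended-cost arithmetic. The two medians come from the
# Q1 2026 study cited above; every other number is a hypothetical placeholder.

FRONTIER_ONLY_PER_M = 18.40  # median $/M tokens, all traffic to frontier models
TIERED_PER_M = 2.31          # median $/M tokens with tiered routing


def blended_price(local_share: float, local_per_m: float, frontier_per_m: float) -> float:
    """Weighted-average $/M tokens when local_share of tokens run locally."""
    if not 0.0 <= local_share <= 1.0:
        raise ValueError("local_share must be between 0 and 1")
    return local_share * local_per_m + (1.0 - local_share) * frontier_per_m


def local_share_for_target(target_per_m: float, local_per_m: float, frontier_per_m: float) -> float:
    """Fraction of tokens that must run locally to reach a target blended price."""
    return (frontier_per_m - target_per_m) / (frontier_per_m - local_per_m)


def monthly_bill(tokens_millions: float, price_per_m: float) -> float:
    """Dollar cost for a monthly volume expressed in millions of tokens."""
    return tokens_millions * price_per_m


if __name__ == "__main__":
    print(f"cost penalty: {FRONTIER_ONLY_PER_M / TIERED_PER_M:.1f}x")  # cost penalty: 8.0x
    volume_m = 5_000  # hypothetical workload: 5 billion tokens per month
    print(f"frontier-only: ${monthly_bill(volume_m, FRONTIER_ONLY_PER_M):,.0f}/month")  # $92,000
    print(f"tiered:        ${monthly_bill(volume_m, TIERED_PER_M):,.0f}/month")         # $11,550
    # Hypothetical tier prices: $0.50/M amortized local, $18.40/M frontier.
    share = local_share_for_target(TIERED_PER_M, 0.50, FRONTIER_ONLY_PER_M)
    print(f"local share needed: {share:.0%}")  # local share needed: 90%
    print(f"check: ${blended_price(share, 0.50, FRONTIER_ONLY_PER_M):.2f}/M")  # check: $2.31/M
```

Under those assumed tier prices, about nine in ten tokens would need to stay local to land at the tiered median. The real split depends entirely on your own local and frontier costs.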

The two most public confirmations of the structural problem landed in the last three weeks. Uber burned through its entire 2026 AI coding tools budget in four months after rolling out Claude Code to roughly 5,000 engineers and watching monthly per-engineer spend reach $500 to $2,000 for heavy users. Uber's COO publicly questioned whether the spend was worth it. Days later, Microsoft confirmed it was cancelling Claude Code licenses across its Experiences and Devices division by June 30, steering thousands of engineers toward GitHub Copilot as tool costs continued to climb. When two of the most sophisticated AI buyers in the industry both hit the same wall in the same quarter, the signal no longer requires interpretation. Cloud-only agentic deployment is a budgeting problem before it is anything else.


Figure 1 · Blended Token Cost at Agentic Scale


Build 2026 turns that math from a finance problem into an architecture problem. The Foundry Control Plane routing policy is the artifact that operationalizes it. The MAI model family removes the behavioral risk of routing across providers. Agent 365 removes the compliance risk of routing across execution environments. The hardware removes the performance gap associated with local compute.

MAI: One Model Family, Two Execution Models

The MAI model family was one of the most consequential announcements at Build. Microsoft announced seven in-house models at once:

▸ MAI-Thinking-1, its first reasoning model, at 35 billion active parameters with a 256K context window

▸ MAI-Code-1-Flash, the compact coding model now rolling into GitHub Copilot

▸ MAI-Image-2.5 and its Flash variant, for text-to-image and image-to-image work

▸ MAI-Transcribe-1.5, for speech recognition

▸ MAI-Voice-2 and its Flash variant

All seven were trained from scratch on Microsoft's own Maia silicon, with no distillation from other labs' models.

The strategic logic behind the family matters as much as any individual model. MAI-Code-1-Flash is not there to compete with GPT-5 on benchmarks. It is there to handle the high-volume, lower-complexity coding requests that currently run through Copilot at frontier model prices. MAI-Thinking-1 handles the long-horizon reasoning work in Azure, where it earns its cost. Microsoft's design direction is to route within a single model family across that cost curve. That removes the behavioral risk of routing between providers and gives enterprises a single training lineage to audit. Production use will have to confirm whether local NPU deployments of compact MAI models fully deliver on that promise of local-cloud consistency.

Foundry Control Plane: Routing as a Managed Concern

Microsoft Foundry’s Model Router makes model selection a platform decision instead of an application-code decision. Rather than hard-coding every app to a specific model or provider, teams can use Foundry to route requests based on goals like cost, latency, and quality. Developers can still choose a specific model when they need control, but the same application pattern can support routed or direct model calls.

"This is the end of vendor lock-in in AI. Enterprises should not have to rewrite their apps every time a model improves."
— Scott Guthrie, Microsoft EVP, Cloud and AI · Build 2026 breakout

That is a strong claim, and the Model Router is what makes it operationally possible. Application code does not change when the routing changes.

Foundry now gives customers access to more than 11,000 models across Microsoft, OpenAI, third-party, industry, and domain-specific model families. That does not eliminate vendor lock-in completely, but it reduces the amount of rewiring required when a better model becomes available.

For enterprise IT, the value is governance. When routing lives in Foundry instead of inside custom application code, model decisions become easier to review, change, and audit. Teams can see which model served a request, adjust routing behavior, and manage tradeoffs between performance and cost more consistently.

The key caveat is that Microsoft's public documentation describes model routing, cost/quality routing modes, failover and a common API pattern. It does not yet clearly document a fully automatic local-to-cloud handoff or a single policy that routes across every MAI, GPT, Claude and local endpoint.
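Microsoft's documentation does not yet describe a local-to-cloud handoff, so teams that want one today would likely express the policy themselves. The sketch below shows what such a policy could look like in plain Python. It is not the Foundry API. The endpoint names, prices, complexity tiers, local context limit and restricted-data flag are all hypothetical. In practice, the complexity score and data label would come from an upstream classifier and your data-labeling system; here they are simply assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str            # hypothetical endpoint label
    venue: str           # "local" (on-device) or "cloud"
    price_per_m: float   # hypothetical $/M tokens; local treated as ~0 marginal cost
    max_complexity: int  # highest task tier this endpoint is trusted to handle


@dataclass(frozen=True)
class Request:
    complexity: int        # 1 = routine, 2 = moderate, 3 = long-horizon reasoning
    restricted_data: bool  # payload carries a label that must not leave the device
    prompt_tokens: int


# Hypothetical catalogue: one compact local model and two cloud tiers.
ENDPOINTS = (
    Endpoint("local-compact", "local", 0.0, 1),
    Endpoint("cloud-flash", "cloud", 1.0, 2),
    Endpoint("cloud-reasoning", "cloud", 15.0, 3),
)


def route(req: Request, endpoints=ENDPOINTS, local_context_limit: int = 32_000) -> Endpoint:
    """Pick the cheapest endpoint that satisfies capability, data and context rules."""
    # 1. Capability: keep only endpoints rated for this complexity tier.
    candidates = [e for e in endpoints if e.max_complexity >= req.complexity]
    # 2. Data handling: restricted payloads stay on-device.
    if req.restricted_data:
        candidates = [e for e in candidates if e.venue == "local"]
    # 3. Context: prompts beyond the local window must go to the cloud.
    if req.prompt_tokens > local_context_limit:
        candidates = [e for e in candidates if e.venue != "local"]
    if not candidates:
        # Fail closed: no endpoint satisfies every rule, so a human decides.
        raise LookupError("no endpoint satisfies the routing policy")
    return min(candidates, key=lambda e: e.price_per_m)


if __name__ == "__main__":
    print(route(Request(1, False, 2_000)).name)   # local-compact
    print(route(Request(3, False, 2_000)).name)   # cloud-reasoning
    print(route(Request(1, False, 50_000)).name)  # cloud-flash
```

The design choice mirrors the argument above. The policy lives in one place, so changing a price, tightening a data rule or adding an endpoint does not touch the application code that calls `route`.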

Figure 2 · The Local Cloud Fabric, Agent 365 Governed


Agent 365: The Compliance Answer IT Was Waiting For

Agent 365 was one of the Build announcements enterprise IT leaders should not overlook. Microsoft is positioning it as the control plane for AI agents, extending Entra, Defender, and Purview controls to agents across the organization.

In practice, that means IT gets a more consistent way to discover, manage, secure, and govern agents, whether they are built with Microsoft tools, partner frameworks, or registered from other environments. Entra provides identity and access controls, Defender adds threat detection and security posture management, and Purview brings data protection, classification, and compliance oversight.

The important point is governance. As agents move from pilots into production, IT needs to know what agents exist, what they can access, what data they use, and whether they are behaving safely. Agent 365 gives administrators a central place to see and govern that activity instead of relying on one-off controls buried inside each agent framework.

Why This Matters

For corporate IT teams managing PII exposure, IP protection, SOX controls and internal data-loss prevention, this is what changes the calculation. Agent projects have stalled in many enterprises for two reasons. The first is that nobody could answer the question of which entity is performing the action when an agent acts on behalf of a user. The second is that nobody could classify the data crossing the agent’s tool boundary.

Agent 365 addresses both. Agents now have identities in Entra. Their data interactions get labeled by Purview. Their behavior gets monitored by Defender. The compliance story for an agent is the same compliance story IT has been running for users and devices for the last decade.
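To make those two questions concrete, here is a hypothetical sketch of the record a governed agent needs at its tool boundary. It captures which agent acted, on whose behalf, and what label the data carried. The code is plain Python, not the Entra, Purview or Defender APIs. The label names, the external-data threshold and the field names are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical sensitivity labels, ordered from least to most sensitive.
LABELS = ["public", "general", "confidential", "highly_confidential"]


@dataclass(frozen=True)
class ToolCall:
    agent_id: str      # the agent's own identity (answers "who acted?")
    on_behalf_of: str  # the user the agent is acting for
    tool: str          # destination tool name
    data_label: str    # sensitivity label on the payload
    external: bool     # does this tool send data outside the organization?


@dataclass
class AuditLog:
    entries: list = field(default_factory=list)

    def record(self, call: ToolCall, allowed: bool) -> None:
        """Append one decision, allowed or blocked, with both identities."""
        self.entries.append({
            "time": datetime.now(timezone.utc).isoformat(),
            "agent": call.agent_id,
            "on_behalf_of": call.on_behalf_of,
            "tool": call.tool,
            "label": call.data_label,
            "allowed": allowed,
        })


def check_tool_call(call: ToolCall, log: AuditLog, max_external_label: str = "general") -> bool:
    """Block data labeled above the threshold from reaching external tools; log every decision."""
    if call.data_label not in LABELS:
        raise ValueError(f"unknown label: {call.data_label}")
    allowed = (not call.external) or LABELS.index(call.data_label) <= LABELS.index(max_external_label)
    log.record(call, allowed)
    return allowed
```

For example, a confidential payload sent to an internal lookup tool passes. The same payload sent to an external tool is blocked, and both decisions land in the log with the agent and user identities attached.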

Scout: The First Autopilot and the First End-to-End Proof

Microsoft Scout, announced at Build and rolling out as an experimental release through the Frontier program, is the first instance of what Microsoft is calling an Autopilot: a persistent, always-on agent that takes action across the Microsoft 365 estate without being prompted each turn. Where Copilot is conversational, Scout stays resident, watching the cadence of the workday. Microsoft lists capabilities including inbox triage, calendar optimization, meeting preparation, daily briefings, follow-up management, risk detection across communications and cross-agent coordination. Scout operates across Teams, Outlook, OneDrive, SharePoint, the Windows desktop and the web simultaneously, grounded by Work IQ over the Microsoft Graph. The Build demo showed Scout reading a customer email in Outlook, cross-referencing SharePoint, updating an Excel dashboard and sending a Teams confirmation without user intervention.

The reason Scout matters is not the demo. Scout is the first evidence that the platform pieces work together. Microsoft built Scout on OpenClaw, the open-source agent harness, and on Work IQ as the context layer. Scout has its own Entra identity, its data interactions flow through Purview, its behavior is observable in Defender and it calls MAI models through Foundry Control Plane. Sensitive actions can require a human sign-off before they execute. Every Build 2026 announcement traces through Scout.
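The human sign-off step follows a common approval-gate pattern. The sketch below illustrates that pattern in general terms; it is not Scout's implementation. The list of sensitive action names and the `approve` callback, which stands in for whatever review prompt a platform shows a person, are hypothetical.

```python
from typing import Callable

# Hypothetical action types that require a human decision before running.
SENSITIVE_ACTIONS = {"send_external_email", "delete_file", "approve_payment"}


class ApprovalDenied(Exception):
    """Raised when a human reviewer rejects a sensitive action."""


def run_action(action: str, payload: dict,
               execute: Callable[[str, dict], str],
               approve: Callable[[str, dict], bool]) -> str:
    """Run routine actions directly; pause sensitive ones for a human decision."""
    if action in SENSITIVE_ACTIONS and not approve(action, payload):
        raise ApprovalDenied(f"{action} was rejected by the reviewer")
    return execute(action, payload)
```

Routine actions never reach the reviewer, which keeps the agent fast. The gate only interrupts the small set of actions an organization has decided a person must own.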

Compute Designed for Agents, not People

There is a phrase from Jensen Huang’s Computex keynote worth holding on to. His framing was that for forty years, you launched apps; the PC is being reinvented around agents. The implication is not metaphor. It is silicon design.

A laptop designed for a person needs a fast single-threaded CPU, a discrete GPU for occasional bursts and enough memory for thirty browser tabs. A laptop designed for an agent needs a wide memory bus, a deep tensor engine and enough unified memory to hold a 70B model alongside the operating system. The M5 Max and the RTX Spark, in the Surface Laptop Ultra and the Dev Box, are answers to the second question. They are not faster versions of the first kind of laptop. They are different machines, sized for a different workload and optimized for a different bottleneck.
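The memory requirement is easy to check with back-of-envelope arithmetic. Weight memory is roughly parameter count times bytes per parameter. The sketch below counts weights only. The 16 GB operating-system reserve is a hypothetical figure, and real deployments also need room for the KV cache and the inference runtime.

```python
# Back-of-envelope weight footprint for a dense model. Counts weights only;
# KV cache, activations and runtime overhead add more on top.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}
OS_RESERVE_GB = 16  # hypothetical headroom kept for the OS and other apps


def weight_footprint_gb(params_billions: float, precision: str) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes)."""
    return params_billions * BYTES_PER_PARAM[precision]


def fits(params_billions: float, precision: str, unified_gb: float = 128) -> bool:
    """True if the weights plus the OS reserve fit in unified memory."""
    return weight_footprint_gb(params_billions, precision) + OS_RESERVE_GB <= unified_gb


if __name__ == "__main__":
    for precision in ("fp16", "int8", "int4"):
        gb = weight_footprint_gb(70, precision)
        print(f"70B @ {precision}: {gb:.0f} GB of weights, fits in 128 GB: {fits(70, precision)}")
```

At 16-bit precision a 70B model's weights alone (140 GB) exceed 128 GB. At 8-bit (70 GB) or 4-bit (35 GB), they fit with room to spare, which is why quantization is part of the local story.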

Token generation is memory-bandwidth bound: producing each token requires reading essentially all of the model's active weights from memory. The Apple chip wins on raw bandwidth at 614 GB/s. The NVIDIA chip wins on quantized compute, with a petaflop of FP4, and on CUDA portability to Azure. Both are answers to the same question.
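A rough speed ceiling follows from that bandwidth argument. At batch size one, a dense model streams roughly all of its weights from memory for every generated token. Tokens per second therefore cannot exceed bandwidth divided by weight bytes. The sketch applies this roofline-style estimate to the 614 GB/s figure. It assumes a dense model and ignores KV-cache reads, so treat the result as an upper bound rather than a benchmark.

```python
def decode_ceiling_tok_s(bandwidth_gb_s: float, params_billions: float,
                         bytes_per_param: float) -> float:
    """Upper bound on batch-1 decode speed for a dense model.

    Each generated token streams roughly every weight from memory once, so
    tokens/s <= bandwidth / weight bytes. KV-cache reads are ignored, which
    makes the real figure lower.
    """
    weight_gb = params_billions * bytes_per_param
    return bandwidth_gb_s / weight_gb


if __name__ == "__main__":
    for label, bpp in (("int8", 1.0), ("int4", 0.5)):
        ceiling = decode_ceiling_tok_s(614, 70, bpp)
        print(f"70B {label} at 614 GB/s: at most {ceiling:.1f} tokens/s")
```

At 4-bit, the ceiling works out to roughly 17.5 tokens per second, in the same neighborhood as the 12 to 18 tokens per second cited earlier for 70B models. Results near the top of that range likely depend on more aggressive quantization or techniques such as speculative decoding.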

That is the architectural signal IT leaders should take from the last two weeks. The two most consequential silicon vendors in the industry converged on the same compute design for the same target workload. The hardware, the compliance plane and the model family are all real, and each is either shipping or scheduled to ship.

Build 2026 was not a hardware show. It was the first time Microsoft and its silicon partners showed up with the complete picture. The agent is governed. The model is portable. The hardware is designed for the workload. The compliance answer is named. That is the platform IT has been waiting for, and the strategy is now executable.

About Daymark

Daymark Solutions has been an IT integrator since 2001, headquartered in Burlington, Massachusetts and serving customers across North America. Daymark designs, implements and supports enterprise infrastructure across two converging practices. The first is a modern data center practice spanning virtualization, enterprise storage, data protection, networking and cybersecurity. The second is a Microsoft cloud and AI practice covering Azure, M365, Copilot, Copilot Studio, Foundry and Fabric.

Daymark holds Microsoft Frontier AI partner status, the designation Microsoft reserves for partners with the advanced certifications and proven delivery record to lead enterprise AI engagements. Daymark also operates a dedicated Azure Government practice, including GCC High enclave design and implementation for defense industrial base contractors managing CUI and pursuing CMMC compliance.

If you've got questions or want to discuss how these announcements can work within your environment, please contact us. We're excited about the shift this complete platform architecture will bring to our customers.

Daymark IT Insights
