Excel AI Agent Architecture Comparison Study
Nicolas Bustamante · Feb 25, 2026
How different are these systems really? How much do they diverge from just giving a general purpose agent access to a spreadsheet?
Excel AI agents are everywhere right now. Every major platform is shipping one, and the demos all look the same: type a prompt, get a spreadsheet. I kept seeing new ones pop up and started wondering: is there actually a tech edge here, or is this going to be a pure distribution play where the biggest platform wins regardless of architecture?
I reverse-engineered three production Excel AI agents: Claude in Excel (Anthropic), Microsoft's Copilot Excel Agent, and Shortcut AI.
I dug into their tool schemas, stress-tested their error handling, mapped their verification loops, and pushed each one to its limits. It was a lot of fun!
I learned a lot. Claude's tool design is genuinely impressive. Shortcut has ambitious scaffolding, some of it quite clever. Microsoft's Copilot is not where it should be yet, but the "yet" matters more than the current state. I think Microsoft is gonna win ultimately. They are moving fast and have a lot of distribution.
Interestingly, these agents are not wrappers around an LLM. They are tool-calling agents with structured schemas, Python sandboxes, overwrite protection protocols, and carefully designed verification loops. The differences between them reveal fundamental tradeoffs in agent design that apply to every AI agent being built today, not just Excel.
Here's what I'll cover:
- The Three Architectures - 14 structured tools vs. 2 raw tools vs. 11 tools with a helper API
- How They See Your Spreadsheet - Lazy loading vs. eager loading, and why it matters more than the model
- The Overwrite Protection Spectrum - Tool-enforced vs. behavioral, and only one gets it right
- The Two-Tier Tool Hierarchy - Why every agent needs a safe path and an escape hatch
- The Blind Agents Problem - Two out of three can't see your spreadsheet
- The Python Sandbox Bridge - Two isolated worlds connected by one agent
- The Bloomberg Formula Trick - Writing formulas for add-ins you don't control
- The Self-Verification Loop - How each agent checks its own work
- Memory, Simulation, and What's Broken - Features that ship before they work
- The DCF Test - Same prompt, three agents, three very different models
- What This Means for Agent Design - Five questions every agent builder must answer
The first surprise: the model matters less than the tools.
All three agents I tested use frontier models. Claude in Excel runs Claude, which makes it the only one locked to a single model provider. Microsoft's Copilot Excel Agent routes between Claude and GPT. Shortcut AI uses a mix of Anthropic and OpenAI models, with routing abstracted away. Staying model-agnostic is a smart move from Microsoft and Shortcut: the model layer is commoditizing fast, and the ability to swap providers without rearchitecting is a real advantage.
What's not equivalent is the tool architecture. And that's where the real differences live.
Let me show you what each agent actually has access to.
Claude in Excel: 14 tools (11 spreadsheet + 3 non-spreadsheet)
Claude has the most opinionated tool design. Each operation gets its own tool with a specific schema:
Each tool has a typed schema. set_cell_range takes a cells parameter: a 2D array where each cell object can contain value, formula, note, cellStyles, and borderStyles. Plus allow_overwrite, explanation (shown in the UI), copyToRange (for pattern expansion with formula translation), and resizeHeight/resizeWidth. The tool validates every parameter before executing. If something is wrong, it returns a structured error, not a JavaScript stack trace.
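To make "structured error, not a stack trace" concrete, here is a minimal sketch of what parameter validation in front of a write tool might look like. Only the field names value, formula, note, cells, and allow_overwrite come from the description above; the sheet and range parameters, the ToolError shape, the error codes, and the validateSetCellRange function are assumptions for illustration, not Anthropic's actual implementation.

```typescript
// Hypothetical shapes for illustration only.
interface CellInput {
  value?: string | number | boolean;
  formula?: string;
  note?: string;
}

interface SetCellRangeParams {
  sheet: string;
  range: string;
  cells: CellInput[][];
  allow_overwrite?: boolean;
}

interface ToolError {
  code: "INVALID_RANGE" | "SHAPE_MISMATCH" | "INVALID_FORMULA";
  param: string;
  message: string;
}

// Convert column letters to a 1-based index: A -> 1, Z -> 26, AA -> 27.
function columnIndex(letters: string): number {
  let n = 0;
  for (let i = 0; i < letters.length; i++) {
    n = n * 26 + (letters.charCodeAt(i) - 64);
  }
  return n;
}

// Return a list of structured errors; an empty list means the call is valid.
function validateSetCellRange(p: SetCellRangeParams): ToolError[] {
  const errors: ToolError[] = [];
  const m = /^([A-Z]+)([1-9]\d*)(?::([A-Z]+)([1-9]\d*))?$/.exec(p.range);
  if (!m) {
    errors.push({
      code: "INVALID_RANGE",
      param: "range",
      message: `"${p.range}" is not an A1-style range`,
    });
    return errors;
  }
  const startCol = columnIndex(m[1] || "");
  const startRow = Number(m[2]);
  const endCol = m[3] ? columnIndex(m[3]) : startCol;
  const endRow = m[4] ? Number(m[4]) : startRow;
  const rows = endRow - startRow + 1;
  const cols = endCol - startCol + 1;

  // The cells array must match the shape of the target range.
  if (p.cells.length !== rows || p.cells.some((r) => r.length !== cols)) {
    errors.push({
      code: "SHAPE_MISMATCH",
      param: "cells",
      message: `range is ${rows}x${cols} but cells has a different shape`,
    });
  }

  // Point at the exact cell whose formula is malformed.
  p.cells.forEach((row, r) => {
    row.forEach((cell, c) => {
      if (cell.formula !== undefined && cell.formula.charAt(0) !== "=") {
        errors.push({
          code: "INVALID_FORMULA",
          param: `cells[${r}][${c}].formula`,
          message: 'formulas must start with "="',
        });
      }
    });
  });
  return errors;
}

const okResult = validateSetCellRange({
  sheet: "Sheet1",
  range: "A1",
  cells: [[{ value: "Revenue" }]],
});
const badResult = validateSetCellRange({
  sheet: "Sheet1",
  range: "A1:B1",
  cells: [[{ formula: "SUM(A2:A9)" }]],
});
```

The point of the design is that each error names the offending parameter, so the model can repair one field and retry instead of parsing a stack trace.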
Microsoft's Copilot Excel Agent: 2 tools, raw power
Microsoft took a radically different approach. Two tools total.
Every spreadsheet operation funnels through one generic tool that generates and executes raw Office.js. Write a value? Generate Office.js. Create a chart? Generate Office.js. Format cells? Generate Office.js. The tool schema itself is remarkably minimal: a single program parameter of type string. No timeout, no error handling mode, no return format options. The code string is the entire interface.
The system prompt does impose structure: a progressive pattern of load initial state, apply changes, verify results, return confirmation. But within that template, the actual operations are freeform JavaScript.
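The four-step pattern can be sketched roughly as follows. This is not Microsoft's prompt or generated code: a plain object stands in for the worksheet, and the names FakeSheet, readCells, and applyChanges are hypothetical. Real generated code would run inside Excel.run and call ctx.sync() between steps.

```typescript
// Plain object standing in for a worksheet (assumption for illustration).
type SketchCellValue = string | number;
type FakeSheet = Record<string, SketchCellValue>;

interface ChangeReport {
  before: Record<string, SketchCellValue | null>;
  after: Record<string, SketchCellValue | null>;
  verified: boolean;
}

// Read the current contents of the given addresses (null means empty).
function readCells(
  sheet: FakeSheet,
  addresses: string[]
): Record<string, SketchCellValue | null> {
  const out: Record<string, SketchCellValue | null> = {};
  addresses.forEach((a) => {
    out[a] = a in sheet ? sheet[a] : null;
  });
  return out;
}

function applyChanges(
  sheet: FakeSheet,
  changes: Record<string, SketchCellValue>
): ChangeReport {
  const addresses = Object.keys(changes);

  // 1. Load initial state.
  const before = readCells(sheet, addresses);

  // 2. Apply changes.
  addresses.forEach((a) => {
    sheet[a] = changes[a];
  });

  // 3. Verify results by reading the cells back.
  const after = readCells(sheet, addresses);
  const verified = addresses.every((a) => after[a] === changes[a]);

  // 4. Return confirmation.
  return { before, after, verified };
}

const fakeSheet: FakeSheet = { A1: "Sales" };
const changeReport = applyChanges(fakeSheet, { A1: "Revenue", B1: 1200 });
```

In this in-memory version the read-back always matches; the step is included only to show where a verification check would sit in the template.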
This makes Copilot the most token-efficient architecture of the three for simple tasks. A single tool call can pack an entire section of a financial model into one block of generated code: headers, values, formulas, formatting, all in one shot. Fewer LLM round trips, lower latency. But it comes at a real cost.
The problem with running everything through raw code generation is threefold. First, the agent has to produce syntactically correct Office.js every single time. No typed parameters, no schema validation, no structured error messages. When something fails, you get a JavaScript stack trace instead of a clear explanation of which parameter was wrong. Second, there is no way to enforce safety at the tool level. Overwrite protection, input validation, range checking: all of that has to live in the generated code itself or in the system prompt, neither of which is reliable across millions of agent sessions. Third, debugging is harder. A 40-line Office.js script that fails on line 32 gives you far less signal than a structured tool call that rejects bad input before execution.
That said, the architecture is straightforward to improve. There is nothing fundamentally complex about adding structured tools, parameter validation, or tool-level safety on top of the existing Office.js execution layer. Microsoft owns both Excel and the Office.js API surface. That is a massive platform advantage, and closing the gap is engineering work, not a research problem.
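As a rough illustration of what "tool-level safety" could mean here, the sketch below puts an overwrite check in front of the write itself, so the protection does not depend on the generated code or the system prompt. Everything in it is an assumption: the in-memory GuardSheet, the guardedWrite function, and the result shape are hypothetical, and the allowOverwrite flag simply mirrors the allow_overwrite parameter described for Claude's set_cell_range.

```typescript
type GuardCellValue = string | number | boolean;
type GuardSheet = Record<string, GuardCellValue>;

// A string discriminant keeps the two outcomes easy to tell apart.
type WriteResult =
  | { status: "written"; written: string[] }
  | { status: "blocked"; error: "WOULD_OVERWRITE"; occupied: string[] };

// Refuse to touch non-empty cells unless the caller explicitly opts in.
function guardedWrite(
  sheet: GuardSheet,
  writes: Record<string, GuardCellValue>,
  allowOverwrite: boolean = false
): WriteResult {
  const targets = Object.keys(writes);
  const occupied = targets.filter((a) => a in sheet && sheet[a] !== "");

  if (occupied.length > 0 && !allowOverwrite) {
    // Nothing is written: the whole call is rejected before execution.
    return { status: "blocked", error: "WOULD_OVERWRITE", occupied };
  }

  targets.forEach((a) => {
    sheet[a] = writes[a];
  });
  return { status: "written", written: targets };
}

const modelSheet: GuardSheet = { A1: "Revenue", B1: 1200 };

// First attempt hits an occupied cell and is rejected.
const blocked = guardedWrite(modelSheet, { B1: 0, C1: 5 });
const b1AfterBlocked = modelSheet.B1;

// Second attempt opts in to overwriting.
const forced = guardedWrite(modelSheet, { B1: 0, C1: 5 }, true);
```

Because the check runs before any cell is modified, a rejected call leaves the sheet exactly as it was, which is the property a prompt-only rule cannot guarantee.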
Shortcut AI: 11 tools, one generic + rich helpers
Shortcut sits in the middle, but with an interesting twist. They also have one generic execution tool for spreadsheets, but they've built a rich helper API on top of it. Plus ten support tools for everything else.
The key architectural insight: where Claude has 11 separate tools for different spreadsheet operations, Shortcut has 1 generic execute_code tool with a rich TypeScript API layered on top. It's architecturally closer to Microsoft's raw Office.js approach, but with much better developer ergonomics.
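A helper layer of this kind can be pictured as a thin wrapper that translates friendly calls into raw operations. The sketch below is an in-memory stand-in, not Shortcut's API: the RawGrid type, parseA1, and createSheetHelper are hypothetical names, and the real helpers are asynchronous and sit on top of Office.js rather than a 2D array.

```typescript
// Raw layer: a bare 2D array addressed by zero-based row/column indexes.
type RawGrid = (string | number | null)[][];

// Translate an A1-style address into zero-based indexes: "B3" -> row 2, col 1.
function parseA1(address: string): { row: number; col: number } {
  const m = /^([A-Z]+)([1-9]\d*)$/.exec(address);
  if (!m) {
    throw new Error(`Invalid A1 address: ${address}`);
  }
  const letters = m[1] || "";
  let col = 0;
  for (let i = 0; i < letters.length; i++) {
    col = col * 26 + (letters.charCodeAt(i) - 64);
  }
  return { row: Number(m[2]) - 1, col: col - 1 };
}

// Helper layer: callers think in A1 addresses, never in array indexes.
function createSheetHelper(grid: RawGrid) {
  return {
    setCell(address: string, value: string | number): void {
      const pos = parseA1(address);
      // Grow the grid as needed so sparse writes are safe.
      while (grid.length <= pos.row) {
        grid.push([]);
      }
      const target = grid[pos.row];
      while (target.length <= pos.col) {
        target.push(null);
      }
      target[pos.col] = value;
    },
    getCell(address: string): string | number | null {
      const pos = parseA1(address);
      const target = grid[pos.row];
      if (!target || pos.col >= target.length) {
        return null;
      }
      return target[pos.col];
    },
  };
}

const rawGrid: RawGrid = [];
const sheetHelper = createSheetHelper(rawGrid);
sheetHelper.setCell("A1", "Revenue");
sheetHelper.setCell("C2", 1200);
```

The generated code gets shorter and harder to get wrong, but the tool boundary is still a single code string, so nothing is validated before execution.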
Shortcut also has the best UX of the three. Plan mode breaks complex requests into structured steps before executing. Queries are queued, so you can fire off multiple requests without waiting. And follow-up interactions present structured options that let you steer without re-prompting. Small details individually, but in practice they make multi-step workflows noticeably smoother than the other two.
Here's what the same operation looks like across all three:
WRITING "Revenue" TO CELL A1
Claude:
  Tool: set_cell_range
  Params: { sheet: "Sheet1", range: "A1", values: [["Revenue"]] }
  → Tool validates, writes, returns formula_results

Microsoft:
  Tool: ExcelAgent_excel_interact_document
  Code: await Excel.run(async (ctx) => {
          const sheet = ctx.workbook.worksheets.getItem("Sheet1");
          sheet.getRange("A1").values = [["Revenue"]];
          await ctx.sync();
        });
  → Raw JavaScript, no validation layer

Shortcut:
  Tool: execute_code
  Code: async function execute() {
          const sheet = await workbook.getSheet("Sheet1");
          await sheet.setCell("A1", "Revenue");
        }
  → Helper API wraps Office.js, cleaner but still generic
Three ways to write "Revenue" to a cell. The difference isn't the result. It's what happens when something goes wrong, and how many tokens it costs to get there.
Microsoft's approach is the most token efficient for simple operations: one tool call can write an entire section of a financial model. Claude's structured tools add overhead per operation but provide validation and automatic verification at every step, which pays off in complex models where a single wrong cell reference cascades across 50 formulas. Shortcut sits in between: cleaner developer ergonomics than raw Office.js, but without Claude's tool-level safety. For a quick formatting task, Copilot wins on speed. For a 500-row DCF with cross-sheet references, Claude's architecture is better suited to catch errors before they compound.
Most people assume Excel AI agents can see your spreadsheet. They can't.
When you open Claude in Excel and type "analyze this data," Claude doesn't receive a snapshot of your workbook. It receives a tiny metadata summary. Sheet names, dimensions, which sheet is active, what cell you have selected. That's it. No values. No formulas. No formatting.
WHAT CLAUDE ACTUALLY RECEIVES PER MESSAGE
This is lazy loading. Claude knows a sheet called "DCF" exists with 45 rows and 12 columns, but it has no idea what's in any of those cells until it explicitly calls get_cell_ranges or get_range_as_csv.
Microsoft's agent does the opposite. It gets eager loaded with a preview of actual values from the used ranges:
WHAT MICROSOFT'S AGENT RECEIVES PER MESSAGE
Shortcut uses lazy loading like Claude: sheet names and used ranges, no values upfront.
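The difference between the two strategies is easy to see in a toy version. The sketch below is an illustration under assumptions, not any vendor's actual context format: SheetData, lazyContext, eagerContext, and readRows are hypothetical names, the lazy summary carries only names and dimensions (a subset of the metadata described above), and the 45x12 "DCF" sheet reuses the example dimensions from the text.

```typescript
type SheetCell = string | number;

interface SheetData {
  name: string;
  values: SheetCell[][];
}

interface LazySheetInfo {
  name: string;
  rows: number;
  cols: number;
}

interface EagerSheetInfo extends LazySheetInfo {
  preview: SheetCell[][];
}

function describeSheet(s: SheetData): LazySheetInfo {
  const cols = s.values.length > 0 ? s.values[0].length : 0;
  return { name: s.name, rows: s.values.length, cols };
}

// Lazy: metadata only. Values arrive later through explicit read calls.
function lazyContext(sheets: SheetData[]): LazySheetInfo[] {
  return sheets.map((s) => describeSheet(s));
}

// Eager: metadata plus a preview of actual values, sent with every message.
function eagerContext(sheets: SheetData[], previewRows: number): EagerSheetInfo[] {
  return sheets.map((s) => {
    const info = describeSheet(s);
    return {
      name: info.name,
      rows: info.rows,
      cols: info.cols,
      preview: s.values.slice(0, previewRows),
    };
  });
}

// The on-demand read a lazy agent must issue before it can see any values.
function readRows(sheet: SheetData, startRow: number, rowCount: number): SheetCell[][] {
  return sheet.values.slice(startRow, startRow + rowCount);
}

// Build a 45-row, 12-column sheet filled with placeholder numbers.
const dcfValues: SheetCell[][] = [];
for (let r = 0; r < 45; r++) {
  const row: SheetCell[] = [];
  for (let c = 0; c < 12; c++) {
    row.push(r * 12 + c);
  }
  dcfValues.push(row);
}
const workbookData: SheetData[] = [{ name: "DCF", values: dcfValues }];

const lazyInfo = lazyContext(workbookData);
const eagerInfo = eagerContext(workbookData, 5);
const lazySize = JSON.stringify(lazyInfo).length;
const eagerSize = JSON.stringify(eagerInfo).length;
const firstTwoRows = readRows(workbookData[0], 0, 2);
```

The lazy payload stays the same size no matter how much data the sheet holds, at the price of an extra tool call before the agent can answer anything about cell contents.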
The tradeoff is real:
For a quick "what's in cell B5?" question, eager loading wins. The agent already has the value. For a complex financial model with 50 sheets and 100K rows, laz