Building an AI Assistant for Business Central: 61 tools, hardcoded write whitelists, and Preview-and-Confirm

Six months ago we shipped the AI Assistant for Business Central. It lets a BC user ask questions about their data, or trigger actions, in natural language. Claude does the reasoning; the assistant calls the appropriate BC API; no write reaches the database without explicit user confirmation.

This is what we got right, and what nearly went wrong.

The shape of the system

The assistant exposes 61 tools to Claude. 44 of them are read queries spanning the standard Business Central table set — customers, vendors, items, sales orders, purchase orders, invoices, journals, dimensions, the chart of accounts. 17 are write operations covering posting, status transitions, item maintenance and journal entry.

Each tool has:

A Zod schema for the input
A versioned name (CreateSalesOrderV1, PostPurchaseInvoiceV2) so we can evolve safely
An explicit BC API endpoint it maps to
A list of fields it may write to — hardcoded, in AL
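The sketch below shows what one tool definition might look like. Every name in it is hypothetical: the interface, the endpoint path and the input fields. A hand-written validator stands in for the Zod schema so the block runs without dependencies. The company code and posting date are required because every tool requires them. The field whitelist is deliberately absent, because it lives in AL on the BC side rather than next to the schema.

```typescript
// Sketch of one tool definition. All names, the endpoint path and the input fields are hypothetical.
// A hand-written validator stands in for the Zod schema so the example has no dependencies.

type ValidationResult<T> = { ok: true; value: T } | { ok: false; errors: string[] };

interface ToolDefinition<TInput> {
  name: string; // versioned name, e.g. "SetCustomerCreditLimitV1"
  kind: "read" | "write";
  endpoint: string; // the BC API endpoint the tool maps to (illustrative path)
  validate: (raw: unknown) => ValidationResult<TInput>;
}

interface SetCreditLimitInput {
  companyCode: string; // required on every tool
  postingDate: string; // required on every tool, YYYY-MM-DD
  customerNo: string;
  creditLimit: number;
}

// Splits "PostPurchaseInvoiceV2" into { base: "PostPurchaseInvoice", version: 2 }; null if unversioned.
function parseToolName(name: string): { base: string; version: number } | null {
  const m = /^([A-Z][A-Za-z]*)V(\d+)$/.exec(name);
  return m ? { base: m[1], version: Number(m[2]) } : null;
}

const setCustomerCreditLimitV1: ToolDefinition<SetCreditLimitInput> = {
  name: "SetCustomerCreditLimitV1",
  kind: "write",
  endpoint: "/companies({companyCode})/customers({customerNo})", // hypothetical path
  validate(raw: unknown): ValidationResult<SetCreditLimitInput> {
    const r = (typeof raw === "object" && raw !== null ? raw : {}) as Record<string, unknown>;
    const { companyCode, postingDate, customerNo, creditLimit } = r;
    const errors: string[] = [];
    if (typeof companyCode !== "string" || companyCode === "") errors.push("companyCode is required");
    if (typeof postingDate !== "string" || !/^\d{4}-\d{2}-\d{2}$/.test(postingDate)) {
      errors.push("postingDate must be YYYY-MM-DD");
    }
    if (typeof customerNo !== "string" || customerNo === "") errors.push("customerNo is required");
    if (typeof creditLimit !== "number" || !Number.isFinite(creditLimit) || creditLimit < 0) {
      errors.push("creditLimit must be a non-negative number");
    }
    if (errors.length > 0) return { ok: false, errors };
    return {
      ok: true,
      value: {
        companyCode: companyCode as string,
        postingDate: postingDate as string,
        customerNo: customerNo as string,
        creditLimit: creditLimit as number,
      },
    };
  },
};
```

Because the version is part of the name, a V2 of the same tool can be registered alongside V1 and retired later without changing V1's contract.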

The model can call any tool it has been given, but it cannot get a write through to a field outside that tool's whitelist. We did not enforce this with a prompt instruction; we enforced it in the AL layer, on the BC side, after the JSON payload is parsed.

Why the prompt isn't the security boundary

This is the most important section in this post.

Early in the build we had a tempting design: tell Claude in the system prompt which fields it could write to, and trust it to comply. Claude follows instructions reliably, and in testing it did the right thing. We rejected the design anyway.

The reason is that the prompt is not a security boundary. A future Claude release could relax instruction adherence by a few percent. A jailbreak embedded in user input could get through. A shift in context partway through a long conversation could lead the assistant to "remember" the wrong rule. None of these is likely; none has zero probability.

We pushed the whitelist down to AL. The model can author a write to any field it wants; the AL code drops every field outside the tool's whitelist before the write is carried out. If Claude tries to set Customer.PriceListCode and that field is not on the whitelist for SetCustomerCreditLimitV1, the field simply disappears from the payload. No noisy error, no escalation; just a silent drop, recorded in the audit trail.
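The drop-and-audit rule is easy to state in code. The real check runs in AL. What follows is only a TypeScript mirror of the logic, so it can be read alongside the other sketches. The whitelist contents and the field key "CreditLimitLCY" are hypothetical. Treating an unknown tool as having an empty whitelist is an assumption made here so the rule fails closed.

```typescript
// TypeScript mirror of the AL-side whitelist rule, for illustration only; in the real system this runs in AL.
// Whitelist contents and field keys are hypothetical.
const WRITE_WHITELIST: Readonly<Record<string, ReadonlySet<string>>> = {
  SetCustomerCreditLimitV1: new Set(["CreditLimitLCY"]),
};

interface AuditEntry {
  tool: string;
  droppedFields: string[];
  at: string;
}

interface FilterResult {
  allowed: Record<string, unknown>;
  dropped: string[];
}

// Keeps only whitelisted fields. Anything else is dropped silently (no error) but recorded for audit.
function applyWhitelist(
  tool: string,
  payload: Record<string, unknown>,
  audit: AuditEntry[],
  now: Date = new Date(),
): FilterResult {
  // Unknown tool: empty whitelist, so nothing may be written (fail closed).
  const whitelist: ReadonlySet<string> = WRITE_WHITELIST[tool] ?? new Set<string>();
  const allowed: Record<string, unknown> = {};
  const dropped: string[] = [];
  for (const [field, value] of Object.entries(payload)) {
    if (whitelist.has(field)) {
      allowed[field] = value;
    } else {
      dropped.push(field);
    }
  }
  if (dropped.length > 0) {
    audit.push({ tool, droppedFields: dropped, at: now.toISOString() });
  }
  return { allowed, dropped };
}
```

The function never throws on an unexpected field. From the model's side the write simply goes ahead without it, which matches the "silent drop with an audit trail" behaviour.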

We expect this design to survive the next decade of model releases without rework, because it does not depend on how any model behaves.

Preview-and-Confirm: the second guard

Every write, even one that passes the whitelist, is held on a Preview-and-Confirm screen before it executes. The user sees:

The exact records that will be affected
The exact field values that will be set
The exact BC API endpoint that will be called
A diff against current values for updates

The user accepts, rejects, or edits the preview. If they edit, the edited values go through the same whitelist on the way out. If they reject, the cancellation is logged the same as a confirmation, with timestamp and user.
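A minimal sketch of the preview diff and the three outcomes follows. The types and function names are hypothetical. The whitelist step is passed in as a plain function, so the block stands alone; in the real system that check runs in AL. The sketch assumes the proposed values were already filtered when the preview was built, so only edited values need filtering again.

```typescript
// Sketch of a preview diff and the three user decisions. All names are hypothetical.

interface FieldChange {
  field: string;
  before: unknown;
  after: unknown;
}

// Lists only the fields whose proposed value differs from the current record.
function diff(current: Record<string, unknown>, proposed: Record<string, unknown>): FieldChange[] {
  const changes: FieldChange[] = [];
  for (const [field, after] of Object.entries(proposed)) {
    const before = current[field];
    if (before !== after) changes.push({ field, before, after });
  }
  return changes;
}

interface Preview {
  tool: string;
  endpoint: string;
  recordId: string;
  proposed: Record<string, unknown>; // assumed already whitelist-filtered when the preview was built
  changes: FieldChange[];
}

type Decision =
  | { kind: "accept" }
  | { kind: "reject" }
  | { kind: "edit"; values: Record<string, unknown> };

interface DecisionLogEntry {
  user: string;
  recordId: string;
  decision: Decision["kind"];
  at: string;
}

// Stand-in for the whitelist step; in the real system that check runs in AL.
type WhitelistFilter = (tool: string, values: Record<string, unknown>) => Record<string, unknown>;

// Returns the values to write, or null if the user rejected. Every decision is logged, rejections included.
function resolvePreview(
  preview: Preview,
  decision: Decision,
  user: string,
  filter: WhitelistFilter,
  log: DecisionLogEntry[],
  now: Date = new Date(),
): Record<string, unknown> | null {
  log.push({ user, recordId: preview.recordId, decision: decision.kind, at: now.toISOString() });
  if (decision.kind === "reject") return null;
  // Edited values go back through the same whitelist on the way out.
  if (decision.kind === "edit") return filter(preview.tool, decision.values);
  return preview.proposed;
}
```

The log entry is written before the branch. A rejection therefore leaves the same kind of record as an acceptance.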

We considered making Preview-and-Confirm optional for low-risk operations and decided against it. The cost to the user is one extra click; the safety property (every BC write is authorised by a human) is worth more than the click.

Tool design that helps the model

Claude is good at picking the right tool when the tool names are descriptive and the schemas are precise. A few principles that worked:

Verbose names — SetCustomerCreditLimitV1, not UpdateCustomer. The model picks more accurately when names imply scope.
One concern per tool — we resisted "SuperWriteCustomer" tools that took dozens of optional fields. Each business action is a tool.
Mandatory dimensions — every tool requires the user's BC company code and posting date. Claude can fill these from context; passing them explicitly removes ambiguity.
Idempotency keys — every write tool accepts a client-generated key so retries are safe.
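Here is how an idempotency key makes a retry safe. The class name is hypothetical. The store is in-memory and synchronous for brevity; a real one would need to be persistent and shared across instances. The client would generate the key, for example as a UUID, and reuse it on every retry of the same action.

```typescript
// Sketch of idempotent write execution keyed by a client-generated key (e.g. a UUID).
// In-memory and synchronous for brevity; the class name is hypothetical.
class IdempotentWriter<R> {
  private readonly completed = new Map<string, R>();

  execute(idempotencyKey: string, write: () => R): R {
    // A retry with a key that has already completed replays the stored result instead of writing again.
    if (this.completed.has(idempotencyKey)) {
      return this.completed.get(idempotencyKey) as R;
    }
    // If write() throws, nothing is recorded, so a later retry will attempt the write again.
    const result = write();
    this.completed.set(idempotencyKey, result);
    return result;
  }
}
```

Only successful writes are remembered. A failed attempt does not block a legitimate retry, and a repeated success cannot create a duplicate.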

What read tools do that write tools don't

Read tools return data; write tools propose changes. Preview-and-Confirm fires only on writes, so reads skip the confirmation step and stay fast: a customer-lookup query round-trips in under a second.
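The read/write split could be modelled as a single dispatch point. The sketch below assumes hypothetical names throughout, and the read backend is an injected interface rather than a real BC client. Reads execute immediately. Writes are parked as pending previews and never execute on this path.

```typescript
// Sketch of the read/write split at dispatch time. All names are hypothetical.
type ToolKind = "read" | "write";

interface ToolCall {
  tool: string;
  kind: ToolKind;
  args: Record<string, unknown>;
}

type DispatchOutcome =
  | { status: "executed"; data: unknown }
  | { status: "awaiting_confirmation"; previewId: string };

interface ReadBackend {
  read(tool: string, args: Record<string, unknown>): unknown;
}

class Dispatcher {
  private readonly backend: ReadBackend;
  private readonly pending = new Map<string, ToolCall>();
  private nextId = 1;

  constructor(backend: ReadBackend) {
    this.backend = backend;
  }

  dispatch(call: ToolCall): DispatchOutcome {
    if (call.kind === "read") {
      // Reads go straight to BC; there is no confirmation step on this path.
      return { status: "executed", data: this.backend.read(call.tool, call.args) };
    }
    // Writes never execute here: they are parked as a preview until the user decides.
    const previewId = `preview-${this.nextId++}`;
    this.pending.set(previewId, call);
    return { status: "awaiting_confirmation", previewId };
  }

  pendingCount(): number {
    return this.pending.size;
  }
}
```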

For reads, we cache aggressively. Customer master, vendor master, item master, chart of accounts — all cached at the conversation level with a 5-minute TTL. Subsequent questions in the same conversation that touch the same master data don't pay the BC API round-trip.
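The conversation-level cache is sketched below, assuming one cache instance per conversation. The loader is synchronous for brevity; real BC API calls would be asynchronous. The class name and the key format are hypothetical. The 5-minute TTL is the one described above, and the clock is injectable so that expiry can be tested.

```typescript
// Sketch of a conversation-scoped master-data cache with a TTL. One instance per conversation (assumption).
const FIVE_MINUTES_MS = 5 * 60 * 1000;

class ConversationCache<V> {
  private readonly entries = new Map<string, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  constructor(ttlMs: number = FIVE_MINUTES_MS, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value while it is fresh; otherwise calls load() (the BC API round-trip) and caches it.
  getOrLoad(key: string, load: () => V): V {
    const t = this.now();
    const hit = this.entries.get(key);
    if (hit !== undefined && hit.expiresAt > t) return hit.value;
    const value = load();
    this.entries.set(key, { value, expiresAt: t + this.ttlMs });
    return value;
  }
}
```

Scoping the cache to the conversation bounds staleness twice over: by the TTL and by the life of the conversation.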

The model tier mix

Routing is by intent classification, not by tool. An incoming user message goes through a tiny Haiku classifier that decides:

Read + simple lookup → Haiku
Read + multi-step reasoning → Sonnet
Write of any kind → Sonnet
Cross-tenant aggregation, audit summary → Opus

About 20% of traffic ends up on Haiku, 70% on Sonnet, 10% on Opus. Cost per conversation averages under $0.04 at current Anthropic list pricing, before cache hits.
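The routing table can be sketched as a lookup from the classifier's intent label to a model tier. The key is the intent, not the tool. The intent labels are hypothetical, since the post does not say what the classifier emits. The tier names are labels, not API model identifiers. Falling back to Sonnet for an unrecognised label is an assumption made for this sketch, so that an unexpected label never lands on the smallest model.

```typescript
// Sketch of intent-based routing. Intent labels are hypothetical; tiers are labels, not API model IDs.
type Intent = "read_simple" | "read_multistep" | "write" | "cross_tenant_aggregation" | "audit_summary";
type Tier = "haiku" | "sonnet" | "opus";

const ROUTES: Readonly<Record<Intent, Tier>> = {
  read_simple: "haiku",
  read_multistep: "sonnet",
  write: "sonnet", // writes of any kind
  cross_tenant_aggregation: "opus",
  audit_summary: "opus",
};

function routeIntent(label: string): Tier {
  // Own-property check, so inherited names such as "toString" are not mistaken for intents.
  if (Object.prototype.hasOwnProperty.call(ROUTES, label)) {
    return ROUTES[label as Intent];
  }
  return "sonnet"; // assumed fallback for labels the table does not know
}
```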

What we'd build differently

Two things, looking back.

First, we wrote conversation data into the BC database too early. For the first month we should have run a thin shadow log into a separate Postgres database; debugging conversation patterns is harder than it needs to be when they are mixed in with the BC schema.

Second, we should have written the tool catalogue before writing any tools. We grew it organically and ended up with 61 tools instead of the ~40 we'd have designed top-down. We'll consolidate in V2.

But the field-whitelist-at-AL decision, and the no-exceptions Preview-and-Confirm rule, were both right. Those two together are what let us point at the product and say: it's safe, and it stays safe.

