AI agents in the back office: hype, reality, and what's actually shipping

Every vendor selling into investment ops in 2026 has an "AI agent." Most of those agents are wrappers around large language models that read a CSV and write a summary. That's fine for some things. It is not what ops actually needs.

Here's a more useful framing: sort agentic claims into three buckets, and ask the vendor which bucket their product lives in.

Bucket 1: Read and summarize

Agent reads a document, summarizes it, maybe extracts structured data. Genuinely useful for things like 10-K extraction or transcript parsing. Easy to demo. Easy to ship. The technology has been good enough since GPT-4.

If a vendor's pitch is "our AI reads your reports," they're in this bucket. Verify by asking: what happens when the extraction is wrong? Most products in this bucket have no useful answer.

Bucket 2: Recommend and route

Agent looks at a state, considers candidate actions, and recommends one, but a human takes the action. Think of a reconciliation tool that proposes a resolution for a human to approve, or a compliance tool that flags a trade for the CCO to act on.

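Bucket 2 is where most useful enterprise AI lives today. The agent reduces cognitive load and time-to-context without taking on the risk of being wrong. The economics are real: even at 60% recommendation acceptance, your team handles 2.5x the volume. The arithmetic assumes an accepted recommendation costs almost no analyst time, so only the remaining 40% of items need full manual work, and 1 / 0.4 = 2.5.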
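The 2.5x figure is easy to sanity-check, and it is worth seeing how quickly it shrinks once approvals stop being free. A minimal Python sketch, assuming an accepted recommendation costs a fixed fraction of the manual handling time (the `review_cost` parameter and the function name are illustrations, not anything measured here):

```python
def throughput_multiplier(acceptance_rate: float, review_cost: float = 0.0) -> float:
    """Volume multiplier relative to fully manual handling.

    acceptance_rate: share of items where the human accepts the recommendation.
    review_cost: time to approve an accepted item, as a fraction of the time
        needed to work that item by hand (hypothetical; 0.0 treats approval as free).
    """
    if not 0.0 <= acceptance_rate <= 1.0:
        raise ValueError("acceptance_rate must be between 0 and 1")
    if not 0.0 <= review_cost <= 1.0:
        raise ValueError("review_cost must be between 0 and 1")
    # Average time per item: rejected items cost full time, accepted ones cost review_cost.
    time_per_item = (1.0 - acceptance_rate) + acceptance_rate * review_cost
    if time_per_item == 0.0:
        return float("inf")
    return 1.0 / time_per_item


print(round(throughput_multiplier(0.6), 2))                    # 2.5
print(round(throughput_multiplier(0.6, review_cost=0.25), 2))  # 1.82
```

If approving a recommendation takes a quarter of the manual effort, the same 60% acceptance rate buys roughly 1.8x rather than 2.5x. Ask the vendor which number their case studies actually measured.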

A test for bucket-2 systems: ask the vendor, "Show me the agent's reasoning, the evidence it used, and the alternatives it considered." If the answer is a confidence score and nothing else, the product is bucket 1 wearing bucket-2 clothing.
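One way to make that test concrete is to look at the shape of the recommendation the product hands you. The sketch below is hypothetical (the `Recommendation` type and its field names are assumptions for illustration, not any vendor's schema), but it shows the difference between a confidence score and a reviewable recommendation:

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Recommendation:
    """Hypothetical payload a bucket-2 agent might return for one item."""
    action: str
    confidence: float
    reasoning: str = ""
    evidence: List[str] = field(default_factory=list)
    alternatives: List[str] = field(default_factory=list)


def passes_bucket_two_test(rec: Recommendation) -> bool:
    """True only if reasoning, evidence, and alternatives are all present."""
    return bool(rec.reasoning.strip()) and bool(rec.evidence) and bool(rec.alternatives)


score_only = Recommendation(action="write_off", confidence=0.97)
reviewable = Recommendation(
    action="match_to_custodian_record",
    confidence=0.91,
    reasoning="Amounts agree and trade dates differ by one day.",
    evidence=["internal ledger entry", "custodian statement line"],
    alternatives=["escalate_to_analyst", "leave_open"],
)

print(passes_bucket_two_test(score_only))  # False
print(passes_bucket_two_test(reviewable))  # True
```

The high confidence on the first object does not help it pass. That is the point of the test.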

Bucket 3: Act autonomously

Agent takes irreversible action on its own. Settles a trade. Closes a break. Sends an email to a counterparty.

Bucket 3 is where most marketing decks live and where almost no shipping product is. The reason isn't technical; agents can do this. The reason is operational risk. The downside of a wrong action is asymmetric, and the audit story is hard.

The few places bucket 3 makes sense in ops:

Actions that are reversible and idempotent
Actions with extremely tight tolerances that have to happen in milliseconds
Actions where the agent's track record is statistically dominant, e.g., it has resolved the same break shape 200 times correctly and the cost of a wrong action is low
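"Statistically dominant" can be given a rough number. A minimal sketch, assuming each resolution is an independent trial and the agent has had zero failures so far: the function below (a hypothetical helper, not part of any product) solves (1 - p)^n = 1 - confidence for the largest error rate p that is still consistent with n clean resolutions.

```python
def max_error_rate(successes: int, confidence: float = 0.95) -> float:
    """Upper bound on the true error rate after `successes` correct
    resolutions and no failures, at the given confidence level.

    Assumes independent, identically distributed trials.
    """
    if successes <= 0:
        raise ValueError("successes must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - p) ** successes = 1 - confidence for p.
    return 1.0 - (1.0 - confidence) ** (1.0 / successes)


def worst_case_expected_loss(successes: int, cost_of_wrong_action: float) -> float:
    """Error-rate bound times the cost of one wrong action (same currency unit)."""
    return max_error_rate(successes) * cost_of_wrong_action


print(round(max_error_rate(200), 4))  # 0.0149
```

So 200 clean resolutions still leave room for an error rate of about 1.5%. Whether that is acceptable depends entirely on the second half of the condition: the cost of a wrong action.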

Where most vendors actually are

Most "agentic" products are bucket 1 with a confidence score, marketed as bucket 3. The integration burden falls on you because the agent can't actually do anything in your systems; it can only describe what it would do.

What we shipped at ForeStrat is a deliberate bucket-2 system. The agent reasons about breaks, surfaces recommendations with full evidence, and acts only when:

The break shape has been resolved correctly more than N times before
The resolution is reversible
The fund has explicitly opted in to autonomous resolution for that break class

For everything else, a human approves. That's not a limitation; it's a feature. Audit trail intact, regulatory story clean, ops team in control.
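The three conditions above amount to a gate that must pass in full before the agent acts. The sketch below is illustrative only and is not ForeStrat's implementation; the `BreakContext` type, its field names, and the threshold parameter standing in for N are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class BreakContext:
    """Hypothetical facts the gate needs about one break."""
    break_class: str
    correct_resolutions: int            # times this break shape was resolved correctly
    resolution_reversible: bool
    fund_opted_in_classes: FrozenSet[str]


def may_act_autonomously(ctx: BreakContext, min_correct_resolutions: int) -> bool:
    """All three conditions must hold; otherwise the break goes to a human."""
    return (
        ctx.correct_resolutions > min_correct_resolutions
        and ctx.resolution_reversible
        and ctx.break_class in ctx.fund_opted_in_classes
    )


ctx = BreakContext(
    break_class="fx_rounding",
    correct_resolutions=250,
    resolution_reversible=True,
    fund_opted_in_classes=frozenset({"fx_rounding"}),
)
print(may_act_autonomously(ctx, min_correct_resolutions=200))  # True
```

The default path is the human queue. Autonomy is the exception that has to be earned per break class, per fund.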

What to ask vendors

Three questions that cut through the marketing layer:

What does the agent do when it's wrong? Real answer: it shows confidence, surfaces evidence, routes to a human. Bad answer: silence or "our accuracy is 99%."
Can I see the agent's reasoning trace? Real answer: yes, here it is, including alternatives considered. Bad answer: "it uses our proprietary model."
What does the audit log look like? Real answer: every action with timestamp, evidence, model version, and the human who approved. Bad answer: there isn't one.

If a vendor can't answer those, they're not ready for production ops. Probably not ready for UAT either.
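For the audit log question, it helps to know what a good answer looks like on disk. A minimal sketch of one log line carrying the four fields named above, written as JSON; the `AuditEntry` type, the `record_action` helper, and the sample values are hypothetical, not a real product's format.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import List


@dataclass(frozen=True)
class AuditEntry:
    """Hypothetical audit record for a single agent-assisted action."""
    timestamp: str        # UTC, ISO 8601
    action: str
    evidence: List[str]
    model_version: str
    approved_by: str      # the human who approved


def record_action(action: str, evidence: List[str],
                  model_version: str, approved_by: str) -> str:
    """Build one JSON log line; refuse to log an incomplete entry."""
    if not (action and evidence and model_version and approved_by):
        raise ValueError("action, evidence, model_version and approved_by are all required")
    entry = AuditEntry(
        timestamp=datetime.now(timezone.utc).isoformat(),
        action=action,
        evidence=list(evidence),
        model_version=model_version,
        approved_by=approved_by,
    )
    return json.dumps(asdict(entry), sort_keys=True)


print(record_action("close_break", ["custodian statement line"], "model-v1", "analyst.one"))
```

If a vendor's log cannot be reduced to something like this for every action, the audit story is not there yet.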

For more notes from the team, subscribe via LinkedIn or email demo@forestrat.ai.