Audit is a word that has quietly changed meaning over the last two years, and almost nobody has noticed.
For most of software history, audit was a defensive function. Something bad happened - or might have happened - and you went to the logs to figure out what. The logs existed for engineers, and they were treated as such: ephemeral, rotated, sampled, sometimes turned off entirely. If audit was a serious requirement, you bolted on a separate logging path with longer retention and stricter access. The serious version was usually called "compliance," and it was usually a project nobody wanted.
This view of audit assumes a few things. It assumes the actor is human or human-written code. It assumes the action was deterministic - somebody intended exactly what happened, and if they didn't, that's a bug to fix in the code. It assumes the question "what did the system do?" can be answered approximately, because the people asking it already have a strong prior about what the system was supposed to do.
None of those assumptions hold once an agent is in the loop.
What changes when the actor is probabilistic
When an agent takes an action, the question "what did our software do?" stops being a question about code paths and starts being a question about decisions.
Decisions made by something that is not deterministic. Decisions made on top of a model whose internal state is partially opaque even to the people who built it. Decisions that depend on context that may not exist anywhere outside that one moment. Decisions that, run again with the same inputs, might come out slightly different.
You cannot reconstruct those decisions from logs after the fact, because the relevant state - the prompt, the context, the model version, the tool descriptions, the user's last six messages, the agent's internal scratchpad - was never written down anywhere. The application logs the call. The provider logs the response. Nobody logs the why, because in traditional software the why is in the source code, and you can read it.
In agentic software, the why is gone the moment the request finishes, unless something captured it on the way through.
This is the first thing that has to change. Audit is no longer something you reconstruct. It is something you emit, as a first-class output of the system, at the moment the action happens. If the audit record is not produced by the same machinery that produced the action, it is not an audit record. It is a guess.
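To make "emitted, not reconstructed" concrete, here is a minimal Python sketch. The `execute_with_audit` wrapper and the `emit` callback are hypothetical names; `emit` stands in for whatever durable audit sink you actually use. The point is structural: the record is built inside the same function that performs the action, including when the action fails, so anything routed through this wrapper cannot run without leaving a record.

```python
import uuid
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Callable, Dict, Optional


@dataclass(frozen=True)
class AuditEvent:
    event_id: str
    action: str
    arguments: Dict[str, Any]
    outcome: str              # "succeeded" or "failed"
    result: Any
    error: Optional[str]
    recorded_at: str          # ISO 8601 UTC timestamp taken when the action finished


def execute_with_audit(
    action: str,
    arguments: Dict[str, Any],
    perform: Callable[..., Any],
    emit: Callable[[AuditEvent], None],
) -> Any:
    """Run perform(**arguments) and emit one AuditEvent describing what happened.

    Hypothetical wrapper: the event is produced by the same code path that
    executes the action, so as long as actions are only invoked through here,
    every execution leaves a record.
    """
    # Default to "failed" so an interrupted call is never recorded as a success.
    outcome, result, error = "failed", None, None
    try:
        result = perform(**arguments)
        outcome = "succeeded"
    except Exception as exc:
        error = repr(exc)
        raise  # the caller still sees the failure; the finally block records it first
    finally:
        emit(AuditEvent(
            event_id=str(uuid.uuid4()),
            action=action,
            arguments=dict(arguments),
            outcome=outcome,
            result=result,
            error=error,
            recorded_at=datetime.now(timezone.utc).isoformat(),
        ))
    return result
```

In a real system `emit` would write to the audit store before the function returns; if that write cannot happen, the safer design is usually to refuse the action rather than run it unrecorded.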
What the record has to contain
Once you accept that audit has to be emitted, not reconstructed, the next question is what it has to contain. This is where most teams set the bar too low.
The bare minimum is the call and the response: at 14:32:11, the system called POST /api/customers/{id}/refund with payload { amount: 150.00 }, and got back { status: "ok" }. Most APIs already log this, in some form. Most teams stop there.
That record is useless for the questions that actually matter.
It does not say why the system made that call. It does not say what the user asked for. It does not say what the agent decided. It does not say what governance ran on that decision, or what the verdict was, or whether a human approved it, or who that human was, or how long they took, or what they were looking at when they clicked yes. It does not say which model produced the decision, or which version, or which agent, or which application invoked the agent. It does not say which credential was used to make the call, or who configured that credential, or when.
It is, in other words, the answer to the wrong question. It tells you what happened; it tells you nothing about whether what happened was supposed to happen.
A real audit record for the agentic era has to capture the whole chain. Intent - what the agent or its caller said it was trying to do. Decision - what the system resolved that intent into, including which capability, which provider, and which arguments. Evaluation - what policy ran, what the verdict was, and which rule produced that verdict. Approval - if a human gated the action, who, when, with what context. Execution - the actual outbound call, with the credential reference (never the credential itself). Response - what came back, structured. Outcome - succeeded, failed, deferred, refused.
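One way to pin the chain down is as a typed record. The sketch below is Python, and every class and field name in it is hypothetical; each top-level field corresponds to one stage listed above, and the optional fields are `None` when a stage did not happen (no human gate, or a refusal before any call went out). Where and how the record is stored is deliberately left out.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class Outcome(Enum):
    SUCCEEDED = "succeeded"
    FAILED = "failed"
    DEFERRED = "deferred"
    REFUSED = "refused"


@dataclass(frozen=True)
class Decision:
    capability: str            # which capability the intent resolved to
    provider: str              # which downstream system receives the call
    arguments: Dict[str, Any]
    model: str                 # model that produced the decision
    model_version: str
    agent_id: str
    application_id: str        # application that invoked the agent


@dataclass(frozen=True)
class Evaluation:
    policy: str                # policy that ran on the decision
    verdict: str               # e.g. "allow", "deny", "require_approval"
    rule: str                  # the specific rule that produced the verdict


@dataclass(frozen=True)
class Approval:
    approver: str              # who approved
    requested_at: str          # ISO 8601; the gap to decided_at is how long they took
    decided_at: str
    context_shown: str         # what the approver was looking at when they said yes


@dataclass(frozen=True)
class Execution:
    method: str
    url: str
    credential_ref: str        # a reference to the credential, never the secret itself


@dataclass(frozen=True)
class AuditRecord:
    record_id: str
    intent: str                           # what the agent or its caller said it was trying to do
    decision: Decision
    evaluation: Evaluation
    approval: Optional[Approval]          # None if no human gated the action
    execution: Optional[Execution]        # None if nothing was sent, e.g. a refusal
    response: Optional[Dict[str, Any]]    # structured response, if a call was made
    outcome: Outcome

    def to_json(self) -> str:
        data = asdict(self)
        data["outcome"] = self.outcome.value  # Enum member -> plain string for JSON
        return json.dumps(data, sort_keys=True)
```

Each piece is a handful of strings, which is the sense in which this is "less data than people fear"; the discipline is in filling every field on every action.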
This is more data than most systems collect. It is also less data than people fear. Each individual piece is small. The cost is in deciding to collect it consistently and in storing it in a way that lets you ask questions later.
The append-only constraint
There is one more property that changes the meaning of the record entirely, and most teams resist it the first time they hear it. The audit log has to be append-only.
Not "we don't usually edit it." Not "we have permissions set up so most people can't change it." Append-only as a hard property of the storage layer. Once a record is written, the only legal operation on it is reading. Not amendment, not deletion, not "fix that typo," not even a small administrative correction.
This is an uncomfortable constraint because every other database in your system is mutable, and append-only databases feel inconvenient. They also feel paranoid. Who exactly are we worried about?
The answer is no one in particular. The point of append-only is not that you suspect your team will tamper with the records. The point is that an audit log whose contents can change is not evidence. It is just another database. Any record you can edit, you can be asked to edit. Any record you can edit, an attacker who reaches your systems can edit. And any record you can edit, a future you, under pressure, with the best of intentions, will eventually edit "just to clean it up." Append-only removes the question. The record is what was written. If the record was wrong, the correction is a new record that references the original. The history is intact.
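The correction-as-new-record pattern can be sketched in a few lines. This is an in-memory Python illustration built around a hypothetical `AppendOnlyLog` class: its only write operation is `append`, stored bodies are wrapped in read-only views, and a correction is a new entry whose `corrects` field points at the original. Python cannot actually stop someone from reaching into `_entries`, which is exactly why the real guarantee has to live in the storage layer rather than in application code.

```python
from dataclasses import dataclass
from types import MappingProxyType
from typing import Any, Dict, Iterator, List, Mapping, Optional


@dataclass(frozen=True)
class Entry:
    seq: int                         # position in the log, assigned at append time
    body: Mapping[str, Any]          # read-only view of the record contents
    corrects: Optional[int] = None   # seq of the entry this one corrects, if any


class AppendOnlyLog:
    """Illustration only: the sole write operation is append."""

    def __init__(self) -> None:
        self._entries: List[Entry] = []

    def append(self, body: Dict[str, Any], corrects: Optional[int] = None) -> int:
        if corrects is not None and not 0 <= corrects < len(self._entries):
            raise ValueError(f"no entry with seq {corrects} to correct")
        seq = len(self._entries)
        # Shallow-copy the body and wrap it in a read-only view so callers
        # cannot reassign its keys after it is written.
        self._entries.append(Entry(seq, MappingProxyType(dict(body)), corrects))
        return seq

    def correct(self, original_seq: int, body: Dict[str, Any]) -> int:
        """Record a correction as a new entry that references the original."""
        return self.append(body, corrects=original_seq)

    def get(self, seq: int) -> Entry:
        return self._entries[seq]

    def history(self) -> Iterator[Entry]:
        # Iterate over a snapshot so later appends do not affect this iteration.
        return iter(tuple(self._entries))

    def latest_version(self, seq: int) -> Entry:
        """Return the newest entry in the correction chain that starts at seq."""
        chain = {seq}
        current = self._entries[seq]
        for entry in self._entries[seq + 1:]:
            if entry.corrects in chain:
                chain.add(entry.seq)
                current = entry
        return current
```

After a correction, reading the original entry still returns exactly what was first written, while `latest_version` follows the chain forward. Both "what did we record at the time?" and "what do we now believe?" stay answerable, which is the sense in which the history is intact.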
This is also what makes the audit log useful for things beyond compliance. When you can trust that records were not retroactively changed, you can use them as the source of truth for analytics, for billing, for incident reconstruction, for product decisions. The same property that makes the log defensible to a regulator makes it trustworthy to your own team.
Why this matters now
The temptation, reading all of this, is to treat it as a future problem. Right now the agent is small, the actions are bounded, the team knows what it's doing, the audit can wait until the system gets serious.
The trap is that the system will get serious before you notice. By the time someone in the company asks the question that requires the record - show me every action this agent took on customer X over the last quarter - the relevant data, if it was not captured at the time, is not coming back. You can apologize. You cannot reconstruct.
The other trap is that the cost of designing the system for proper audit goes up sharply once the system is in production. Adding append-only audit to a system that has been running for a year is not a feature; it is a migration. Doing it on day one is a few decisions about where the records live and what they contain. Doing it on day three hundred is a quarter-long project with a steering committee.
The audit trail is one of the few parts of the agentic stack where there is no clever shortcut and no graceful retrofit. You either capture the chain at the time of the action, or you do not have it.
We capture it. Not as a side effect. As the point.