
Runs

A run represents a single test session within an environment. Starting a run captures a “before” snapshot, and ending it captures an “after” snapshot to compute the diff.

Run Lifecycle

1. Start Run

Takes a “before” snapshot of the environment state.
run = client.start_run(envId=env.environmentId)

2. Agent Execution

Your agent makes API calls that modify the environment.

3. Compute Diff

Compares before/after states to produce a diff.
result = client.diff_run(runId=run.runId)
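The three lifecycle steps can be wrapped in a small helper. This is a minimal sketch, not part of the SDK: `run_and_diff` and its `agent` callback are hypothetical names, and the only client calls used are `start_run` and `diff_run` as shown above.

```python
from typing import Any, Callable


def run_and_diff(client: Any, env_id: str, agent: Callable[[], None]) -> Any:
    """Hypothetical helper: start a run, execute the agent, return the diff."""
    # Step 1: starting the run takes the "before" snapshot.
    run = client.start_run(envId=env_id)
    # Step 2: the agent makes its API calls against the environment.
    agent()
    # Step 3: computing the diff compares the before/after states.
    return client.diff_run(runId=run.runId)
```

With a real client this might be called as `run_and_diff(client, env.environmentId, my_agent)`, where `my_agent` is whatever function drives your agent.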

Run Properties

| Property | Description |
| --- | --- |
| runId | Unique identifier |
| status | running, completed, failed |
| beforeSnapshot | Snapshot ID before agent execution |
| afterSnapshot | Snapshot ID after agent execution |

Diffs

A diff is the computed difference between the before and after states of an environment. The following example creates a Slack sandbox, posts a message, and prints the resulting diff:
from pprint import pprint

import requests
from agent_diff import AgentDiff

# Set your credentials as environment variables first,
# or pass them explicitly: AgentDiff(api_key="", base_url="https://api.agentdiff.dev")
client = AgentDiff()

# Create a sandbox from the Slack template and start a run.
env = client.init_env(templateService="slack", templateName="slack_default", impersonateUserId="U01AGENBOT9")
run = client.start_run(envId=env.environmentId)

# Post a message to #general through the sandboxed Slack API.
response = requests.post(
    f"{client.base_url}/api/env/{env.environmentId}/services/slack/chat.postMessage",
    headers={"Authorization": f"Bearer {client.api_key}"},
    json={"channel": "C01ABCD1234", "text": "Hello!"}
)
response.raise_for_status()  # surface HTTP-level errors early

# Compute the diff and print it as a plain dict.
diff = client.diff_run(runId=run.runId)
pprint(diff.model_dump())

# Clean up the sandbox.
client.delete_env(envId=env.environmentId)

Output Structure

{
  "inserts": [
    {
      "__table__": "messages",
      "message_id": "1732645891.000200",
      "channel_id": "C01GENERAL99",
      "user_id": "U01AGENBOT9",
      "message_text": "Hello World!"
    }
  ],
  "updates": [
    {
      "__table__": "channels",
      "before": { "last_message_at": null },
      "after": { "last_message_at": "2025-11-26T15:31:31" }
    }
  ],
  "deletes": []
}
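To get a quick overview of a diff with this structure, you can count the changed records per table. The sketch below assumes the diff is a plain dict shaped like the output above (for example, the result of `diff.model_dump()`); `summarize_diff` is a hypothetical helper, not an SDK function.

```python
from collections import Counter


def summarize_diff(diff: dict) -> dict:
    """Count changed records per table for each diff section."""
    summary = {}
    for section in ("inserts", "updates", "deletes"):
        # Each record names its source table under the "__table__" key.
        counts = Counter(record["__table__"] for record in diff.get(section, []))
        summary[section] = dict(counts)
    return summary


sample = {
    "inserts": [
        {
            "__table__": "messages",
            "message_id": "1732645891.000200",
            "channel_id": "C01GENERAL99",
            "user_id": "U01AGENBOT9",
            "message_text": "Hello World!",
        }
    ],
    "updates": [
        {
            "__table__": "channels",
            "before": {"last_message_at": None},
            "after": {"last_message_at": "2025-11-26T15:31:31"},
        }
    ],
    "deletes": [],
}

print(summarize_diff(sample))
# {'inserts': {'messages': 1}, 'updates': {'channels': 1}, 'deletes': {}}
```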

Diff Types

Inserts

New records created by the agent

Updates

Existing records modified (shows before/after)

Deletes

Records removed by the agent
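Because update entries carry both a before and an after image, you can work out exactly which fields changed. The following is a minimal sketch that assumes each update is a dict with `before` and `after` keys, as in the output structure above; `changed_fields` is a hypothetical helper name.

```python
def changed_fields(update: dict) -> dict:
    """Map each changed field to a (before, after) tuple."""
    before = update.get("before", {})
    after = update.get("after", {})
    return {
        field: (before.get(field), after.get(field))
        # Consider fields present on either side of the update.
        for field in sorted(set(before) | set(after))
        if before.get(field) != after.get(field)
    }


update = {
    "__table__": "channels",
    "before": {"last_message_at": None},
    "after": {"last_message_at": "2025-11-26T15:31:31"},
}
print(changed_fields(update))
# {'last_message_at': (None, '2025-11-26T15:31:31')}
```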

How Diffs Are Captured

Agent Diff uses PostgreSQL logical replication to capture every change:
  1. WAL Capture: All database writes are logged to the Write-Ahead Log
  2. wal2json: The wal2json output plugin decodes WAL entries into JSON
  3. Change Journal: Filtered and stored per environment/run
  4. Diff Computation: Aggregated into inserts/updates/deletes
This approach captures changes at the database level, so it works regardless of which API endpoint the agent used.
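The final aggregation step can be pictured roughly as follows. This is an illustrative sketch only, not Agent Diff's actual implementation: it assumes change records in wal2json's format-version 1 shape (`kind`, `table`, `columnnames`, `columnvalues`, and `oldkeys` for updates and deletes), and `aggregate_changes` is a hypothetical name. Which columns appear in `oldkeys` depends on the table's replica identity setting, so the before image here may contain only key columns.

```python
def aggregate_changes(changes: list[dict]) -> dict:
    """Group wal2json-style change records into inserts/updates/deletes."""
    diff = {"inserts": [], "updates": [], "deletes": []}
    for change in changes:
        table = change["table"]
        kind = change["kind"]
        if kind == "insert":
            row = dict(zip(change["columnnames"], change["columnvalues"]))
            diff["inserts"].append({"__table__": table, **row})
        elif kind == "update":
            after = dict(zip(change["columnnames"], change["columnvalues"]))
            old = change.get("oldkeys", {})
            # The old row image; its columns depend on replica identity.
            before = dict(zip(old.get("keynames", []), old.get("keyvalues", [])))
            diff["updates"].append(
                {"__table__": table, "before": before, "after": after}
            )
        elif kind == "delete":
            old = change["oldkeys"]
            row = dict(zip(old["keynames"], old["keyvalues"]))
            diff["deletes"].append({"__table__": table, **row})
    return diff


changes = [
    {
        "kind": "insert",
        "schema": "public",
        "table": "messages",
        "columnnames": ["message_id", "message_text"],
        "columnvalues": ["1732645891.000200", "Hello World!"],
    },
    {
        "kind": "delete",
        "schema": "public",
        "table": "messages",
        "oldkeys": {"keynames": ["message_id"], "keyvalues": ["1700000000.000100"]},
    },
]
result = aggregate_changes(changes)
```

A fuller version would also need to filter records by environment and run, and might collapse several changes to the same row into one entry; both are left out here for brevity.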

Next Steps

Evaluations

Verify your agent did the right thing

Assertions

Define expected outcomes with the DSL