Runs
A run represents a single test session within an environment. Starting a run captures a “before” snapshot, and ending it captures an “after” snapshot to compute the diff.
Run Lifecycle
Start Run
Takes a “before” snapshot of the environment state:
run = client.start_run(envId=env.environmentId)
Agent Execution
Your agent makes API calls that modify the environment
Compute Diff
Compares before/after states to produce a diff:
result = client.diff_run(runId=run.runId)
Run Properties
| Property | Description |
| --- | --- |
| runId | Unique identifier |
| status | running, completed, failed |
| beforeSnapshot | Snapshot ID before agent execution |
| afterSnapshot | Snapshot ID after agent execution |
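To make the lifecycle and the run properties concrete, here is a minimal in-memory sketch. It is a toy model, not the Agent Diff client: `ToyEnvironment`, its `end_run` method, and the use of random hex strings as IDs are all hypothetical names and choices made for illustration. Only the property names (`runId`, `status`, `beforeSnapshot`, `afterSnapshot`) come from the table above.

```python
import copy
import uuid
from dataclasses import dataclass
from typing import Optional


@dataclass
class Run:
    """Toy stand-in that mirrors the documented run properties."""
    runId: str
    status: str = "running"  # running, completed, or failed
    beforeSnapshot: Optional[str] = None
    afterSnapshot: Optional[str] = None


class ToyEnvironment:
    """Hypothetical in-memory environment (assumption, not the real service)."""

    def __init__(self):
        self.state = {}      # live data that the "agent" mutates
        self.snapshots = {}  # snapshot ID -> frozen copy of the state

    def _snapshot(self) -> str:
        snapshot_id = uuid.uuid4().hex
        self.snapshots[snapshot_id] = copy.deepcopy(self.state)
        return snapshot_id

    def start_run(self) -> Run:
        # Starting a run captures the "before" snapshot.
        return Run(runId=uuid.uuid4().hex, beforeSnapshot=self._snapshot())

    def end_run(self, run: Run) -> Run:
        # Ending a run captures the "after" snapshot.
        run.afterSnapshot = self._snapshot()
        run.status = "completed"
        return run


env = ToyEnvironment()
run = env.start_run()
env.state["greeting"] = "Hello!"  # stands in for the agent execution step
run = env.end_run(run)

print(run.status)
print(env.snapshots[run.beforeSnapshot], env.snapshots[run.afterSnapshot])
```

The two snapshot IDs on the run are what a diff would be computed from: the copy taken at start is untouched by later writes, so the comparison reflects only what happened during the run.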
Diffs
A diff is the computed difference between the before and after states of an environment.
from pprint import pprint

import requests
from agent_diff import AgentDiff

# Set credentials via environment variables,
# or pass explicitly: AgentDiff(api_key="", base_url="https://api.agentdiff.dev")
client = AgentDiff()

# Create sandbox.
env = client.init_env(
    templateService="slack",
    templateName="slack_default",
    impersonateUserId="U01AGENBOT9",
)
run = client.start_run(envId=env.environmentId)

# Post message to #general
response = requests.post(
    f"{client.base_url}/api/env/{env.environmentId}/services/slack/chat.postMessage",
    headers={"Authorization": f"Bearer {client.api_key}"},
    json={"channel": "C01ABCD1234", "text": "Hello!"},
)

# Get diff
diff = client.diff_run(runId=run.runId)
pprint(diff.model_dump())

# Cleanup
client.delete_env(envId=env.environmentId)
Output Structure
{
  "inserts": [
    {
      "__table__": "messages",
      "message_id": "1732645891.000200",
      "channel_id": "C01GENERAL99",
      "user_id": "U01AGENBOT9",
      "message_text": "Hello World!"
    }
  ],
  "updates": [
    {
      "__table__": "channels",
      "before": { "last_message_at": null },
      "after": { "last_message_at": "2025-11-26T15:31:31" }
    }
  ],
  "deletes": []
}
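A diff in this shape is straightforward to inspect programmatically. The sketch below assumes the dumped diff is a plain dict with `inserts`, `updates`, and `deletes` lists whose records carry a `__table__` key, as in the output above. The `summarize` helper is a hypothetical name, not part of the Agent Diff SDK.

```python
from collections import Counter

# Sample diff copied from the output structure above (JSON null -> Python None).
diff = {
    "inserts": [
        {
            "__table__": "messages",
            "message_id": "1732645891.000200",
            "channel_id": "C01GENERAL99",
            "user_id": "U01AGENBOT9",
            "message_text": "Hello World!",
        }
    ],
    "updates": [
        {
            "__table__": "channels",
            "before": {"last_message_at": None},
            "after": {"last_message_at": "2025-11-26T15:31:31"},
        }
    ],
    "deletes": [],
}


def summarize(diff: dict) -> dict:
    """Count changed records per (change type, table) pair."""
    counts = Counter()
    for kind in ("inserts", "updates", "deletes"):
        for record in diff.get(kind, []):
            counts[(kind, record["__table__"])] += 1
    return dict(counts)


summary = summarize(diff)
print(summary)
```

A summary like this is a quick sanity check that the agent touched only the tables you expected before you look at individual records.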
Diff Types
Inserts: New records created by the agent
Updates: Existing records modified (shows before/after)
Deletes: Records removed by the agent
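The three diff types can be illustrated by comparing two snapshots directly. This is a conceptual sketch only and not necessarily how Agent Diff computes its result. It assumes each snapshot maps a table name to rows keyed by primary key; `diff_snapshots` and the `drafts` table are hypothetical names introduced for the example.

```python
def diff_snapshots(before: dict, after: dict) -> dict:
    """Compare snapshots shaped as {table: {primary_key: row_dict}}."""
    result = {"inserts": [], "updates": [], "deletes": []}
    for table in sorted(set(before) | set(after)):
        old_rows = before.get(table, {})
        new_rows = after.get(table, {})
        for key, row in new_rows.items():
            if key not in old_rows:
                # Row exists only in the "after" snapshot.
                result["inserts"].append({"__table__": table, **row})
            elif row != old_rows[key]:
                # Row exists in both; report only the columns that changed.
                old = old_rows[key]
                changed = [c for c in sorted(set(old) | set(row))
                           if old.get(c) != row.get(c)]
                result["updates"].append({
                    "__table__": table,
                    "before": {c: old.get(c) for c in changed},
                    "after": {c: row.get(c) for c in changed},
                })
        for key, row in old_rows.items():
            if key not in new_rows:
                # Row exists only in the "before" snapshot.
                result["deletes"].append({"__table__": table, **row})
    return result


before = {
    "channels": {"C1": {"channel_id": "C1", "last_message_at": None}},
    "drafts": {"D1": {"draft_id": "D1"}},
}
after = {
    "channels": {"C1": {"channel_id": "C1",
                        "last_message_at": "2025-11-26T15:31:31"}},
    "messages": {"M1": {"message_id": "M1", "message_text": "Hello!"}},
    "drafts": {},
}

result = diff_snapshots(before, after)
print(result)
```

Unchanged rows never appear in the result, which is why a diff stays small even when the environment holds a lot of seeded data.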
How Diffs Are Captured
Agent Diff uses PostgreSQL logical replication to capture every change:
WAL Capture: All database writes are logged to the Write-Ahead Log
wal2json: Converts WAL entries to JSON format
Change Journal: Filtered and stored per environment/run
Diff Computation: Aggregated into inserts/updates/deletes
This approach captures changes at the database level, so it works regardless of which API endpoint the agent used.
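The final aggregation step can be sketched as follows. This is an illustration under stated assumptions, not Agent Diff's implementation: it assumes the journal holds wal2json format-version 1 transactions (a `change` list whose entries carry `kind`, `table`, `columnnames`, `columnvalues`, and, for updates and deletes, `oldkeys`). The `aggregate` function is a hypothetical name, and the sketch skips the per-environment filtering and any merging of repeated changes to the same row.

```python
def aggregate(transactions: list) -> dict:
    """Fold wal2json v1 transactions into inserts/updates/deletes."""
    result = {"inserts": [], "updates": [], "deletes": []}
    for txn in transactions:
        for change in txn.get("change", []):
            table = change["table"]
            kind = change["kind"]
            if kind == "insert":
                row = dict(zip(change["columnnames"], change["columnvalues"]))
                result["inserts"].append({"__table__": table, **row})
            elif kind == "update":
                # oldkeys holds the replica identity columns; with the default
                # identity that is only the primary key, not the full old row.
                old = change.get("oldkeys", {})
                before = dict(zip(old.get("keynames", []),
                                  old.get("keyvalues", [])))
                after = dict(zip(change["columnnames"], change["columnvalues"]))
                result["updates"].append(
                    {"__table__": table, "before": before, "after": after})
            elif kind == "delete":
                old = change["oldkeys"]
                row = dict(zip(old["keynames"], old["keyvalues"]))
                result["deletes"].append({"__table__": table, **row})
    return result


# Hand-written sample journal; values reuse the IDs from the output above.
journal = [{
    "change": [
        {
            "kind": "insert", "schema": "public", "table": "messages",
            "columnnames": ["message_id", "message_text"],
            "columntypes": ["text", "text"],
            "columnvalues": ["1732645891.000200", "Hello World!"],
        },
        {
            "kind": "update", "schema": "public", "table": "channels",
            "columnnames": ["channel_id", "last_message_at"],
            "columntypes": ["text", "timestamp without time zone"],
            "columnvalues": ["C01GENERAL99", "2025-11-26T15:31:31"],
            "oldkeys": {"keynames": ["channel_id"], "keytypes": ["text"],
                        "keyvalues": ["C01GENERAL99"]},
        },
    ]
}]

aggregated = aggregate(journal)
print(aggregated)
```

Producing complete "before" values for updates would require the old row to be present in the log, for example via a table's `REPLICA IDENTITY FULL` setting in PostgreSQL; whether Agent Diff does this is not stated here.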
Next Steps
Evaluations: Verify that your agent did the right thing
Assertions: Define expected outcomes with the DSL