Evaluate Run

Request

POST /api/platform/evaluateRun

Body Parameters

runId
string
required
Run ID from startRun

Example Request

curl -X POST https://api.agentdiff.dev/api/platform/evaluateRun \
  -H "X-API-Key: ad_live_sk_..." \
  -H "Content-Type: application/json" \
  -d '{
    "runId": "run-xyz789"
  }'
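
The same request can be built from Python with only the standard library. This is a minimal sketch, not the official client: the helper name build_evaluate_request is hypothetical, and the URL and X-API-Key header are copied from the curl example above. The request is constructed but not sent, so the snippet runs without network access.

```python
import json
import urllib.request

# URL taken from the curl example above.
API_URL = "https://api.agentdiff.dev/api/platform/evaluateRun"


def build_evaluate_request(run_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) the POST request for evaluateRun.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    body = json.dumps({"runId": run_id}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder values copied from the curl example; substitute your own.
req = build_evaluate_request("run-xyz789", "ad_live_sk_...")
```

To send it, pass req to urllib.request.urlopen and decode the response body with json.load.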

Response

runId
string
Run identifier
status
string
Final status: "passed" or "failed"
passed
boolean
Whether all assertions passed
score
object
Score details: passed and total assertion counts, plus the pass rate as percent
failures
array
List of failed assertions with details

Example Response (Passed)

{
  "runId": "run-xyz789",
  "status": "passed",
  "passed": true,
  "score": {
    "passed": 2,
    "total": 2,
    "percent": 100.0
  },
  "failures": []
}

Example Response (Failed)

{
  "runId": "run-xyz789",
  "status": "failed",
  "passed": false,
  "score": {
    "passed": 1,
    "total": 2,
    "percent": 50.0
  },
  "failures": [
    "assertion#2 messages expected count 1 but got 0"
  ]
}
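
A response body like the ones above can be decoded into a small typed structure before it is reported. The sketch below is one possible approach: the EvaluationResult class and both helper functions are hypothetical names, and the only assumption about the payload is that it carries the five documented fields.

```python
import json
from dataclasses import dataclass, field


@dataclass
class EvaluationResult:
    """Local container for the documented response fields (hypothetical)."""
    run_id: str
    status: str
    passed: bool
    score: dict
    failures: list = field(default_factory=list)


def parse_evaluation(payload: str) -> EvaluationResult:
    """Decode an evaluateRun response body into an EvaluationResult."""
    data = json.loads(payload)
    return EvaluationResult(
        run_id=data["runId"],
        status=data["status"],
        passed=data["passed"],
        score=data.get("score", {}),
        failures=data.get("failures", []),
    )


def summarize(result: EvaluationResult) -> str:
    """Return a header line followed by one line per failure."""
    score = result.score
    header = f"{result.run_id}: {result.status} ({score.get('passed')}/{score.get('total')})"
    return "\n".join([header] + [f"  - {failure}" for failure in result.failures])


# The failed example response from above.
failed = parse_evaluation("""
{
  "runId": "run-xyz789",
  "status": "failed",
  "passed": false,
  "score": {"passed": 1, "total": 2, "percent": 50.0},
  "failures": ["assertion#2 messages expected count 1 but got 0"]
}
""")
print(summarize(failed))
```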

What Happens

  1. After snapshot: Current database state is captured
  2. Diff computed: Before/after states compared
  3. Assertions evaluated: Test assertions checked against diff
  4. Results stored: Results saved for later retrieval
  5. Replication stopped: WAL capture ends

Use evaluateRun when the run has a test with assertions. To get only the diff, without evaluating assertions, use diffRun instead.
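
The snapshot, diff, and assertion steps listed under What Happens can be pictured with a toy model. This is an illustration under stated assumptions, not the service's implementation: snapshots are modelled as plain dicts of table name to rows, only inserted rows are diffed, every assertion is an expected insert count, and the function names and the channels table are invented. Storing results and stopping replication are not modelled.

```python
def diff_inserted(before: dict, after: dict) -> dict:
    """Return, per table, the rows present in `after` but not in `before`."""
    return {
        table: [row for row in rows if row not in before.get(table, [])]
        for table, rows in after.items()
    }


def evaluate(diff: dict, assertions: list) -> dict:
    """Check (table, expected_insert_count) assertions against a diff.

    The returned dict mirrors the documented response fields except runId.
    """
    failures = []
    for index, (table, expected) in enumerate(assertions, start=1):
        actual = len(diff.get(table, []))
        if actual != expected:
            failures.append(
                f"assertion#{index} {table} expected count {expected} but got {actual}"
            )
    total = len(assertions)
    passed = total - len(failures)
    return {
        "status": "failed" if failures else "passed",
        "passed": not failures,
        "score": {
            "passed": passed,
            "total": total,
            "percent": 100.0 * passed / total if total else 0.0,
        },
        "failures": failures,
    }


# Hypothetical snapshots: a channel was created, but no message was sent.
before = {"channels": [], "messages": []}
after = {"channels": [{"id": 1, "name": "general"}], "messages": []}

result = evaluate(diff_inserted(before, after), [("channels", 1), ("messages", 1)])
print(result["failures"])
```

With these inputs the second assertion fails, which reproduces the shape of the failed example response.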

Errors

Error                  Status  Description
run_not_found          404     Run doesn’t exist
run_already_completed  400     Run already evaluated
no_test_defined        400     No test associated with this run
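
A caller may want to turn these codes into readable messages. The sketch below assumes the error response is a JSON object with the code under an "error" key; that body shape is a guess and is not documented here, so adjust the lookup to match the real payload. The function name explain_error is hypothetical.

```python
import json

# Codes, statuses, and descriptions from the Errors table.
KNOWN_ERRORS = {
    "run_not_found": (404, "Run doesn't exist"),
    "run_already_completed": (400, "Run already evaluated"),
    "no_test_defined": (400, "No test associated with this run"),
}


def explain_error(status: int, body: str) -> str:
    """Turn an error response into a readable message.

    ASSUMPTION: the body is JSON with the error code under an "error" key.
    """
    try:
        code = json.loads(body).get("error")
    except (ValueError, AttributeError):
        # Body was not JSON, or was JSON but not an object.
        code = None
    if isinstance(code, str) and code in KNOWN_ERRORS:
        return f"{code} (HTTP {status}): {KNOWN_ERRORS[code][1]}"
    return f"unrecognized error (HTTP {status})"


print(explain_error(404, '{"error": "run_not_found"}'))
```

Falling back to a generic message keeps the caller working if the service returns a code that is not in the table.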

SDK Usage

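# Assumes `client` is an initialized SDK client and `run` is the
# object returned by the earlier startRun call.
result = client.evaluate_run(runId=run.runId)

print(f"Passed: {result.passed}")
print(f"Score: {result.score['percent']}%")
# `failures` is empty when every assertion passed.
if result.failures:
    for f in result.failures:
        print(f"  Failed: {f}")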