Overview

Agent Diff integrates with Prime Intellect’s verifiers framework for multi-turn agent evaluation. This lets you create reproducible benchmarks that evaluate agents on real API interactions.
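
A benchmark is packaged as a verifiers environment module that exposes a load_environment() entry point. The sketch below shows the rough shape of such a module under that assumption; the dataset row, the search_issues tool, and the exact_match reward are hypothetical placeholders, not the actual contents of the Linear benchmark.

import verifiers as vf
from datasets import Dataset

def search_issues(query: str) -> str:
    """Hypothetical tool: query an issue tracker and return matching issues."""
    return "..."  # a real environment would call the live API here

def exact_match(completion, answer, **kwargs) -> float:
    """Hypothetical reward: 1.0 if the expected answer appears in the agent's output."""
    return 1.0 if answer in str(completion) else 0.0

def load_environment(**kwargs) -> vf.Environment:
    # Toy single-row dataset; a real benchmark ships a full task set
    dataset = Dataset.from_list([
        {"question": "Find the open bug about login timeouts.", "answer": "BUG-42"},
    ])
    return vf.ToolEnv(
        dataset=dataset,
        tools=[search_issues],
        rubric=vf.Rubric(funcs=[exact_match]),
        max_turns=10,
    )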

Quick Start

Install our Linear API benchmark from the Prime Intellect hub:
prime env install hubert-marek/linear-api-bench
Run evaluations with any model:
AGENTDIFF_API_KEY="your_key" vf-eval hubert-marek/linear-api-bench -m gpt-5-mini
Results are saved to outputs/ and viewable with:
vf-tui outputs/evals/linear-api-bench--gpt-5-mini/latest
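
The same evaluation can also be driven from Python instead of the CLI. This is a minimal sketch assuming the standard verifiers Python API (vf.load_environment and Environment.evaluate); exact signatures and result field names may differ across verifiers versions.

import os
from openai import OpenAI
import verifiers as vf

os.environ["AGENTDIFF_API_KEY"] = "your_key"  # credentials the environment expects

# Load the installed environment and run a small evaluation
env = vf.load_environment("linear-api-bench")
results = env.evaluate(
    client=OpenAI(),
    model="gpt-5-mini",
    num_examples=5,  # evaluate a subset while iterating
    rollouts_per_example=1,
)
rewards = results.reward  # one score per rollout (assumed field name)
print(f"mean reward: {sum(rewards) / len(rewards):.3f}")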

Example: Linear API Benchmark

See our reference implementation, hubert-marek/linear-api-bench, on the Prime Intellect hub.