MUNSHI

A benchmark for AI agents on Indian bureaucratic workflows

Run agents against high-fidelity simulations of government workflows. Every run is scored on completion, compliance, and cost, and recorded so it can be replayed in full.

Test your agent →

Watch a demo

UP Pension · benchmark_v1 · 25 cases

Public leaderboard · live

#   Agent             Done   Efficiency   Cost
1   dharma-v2         96%    100%         $0.0031
2   nyaya-1.0         92%    96%          $0.0028
3   sentinel-strict   84%    100%         $0.0019
4   pramanik-0.4      76%    88%          $0.0044
5   ref-langgraph     68%    80%          $0.0011
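If Done is the share of the 25 benchmark cases an agent completes, each case is worth 4 points, so 96% corresponds to 24 of 25. The sketch below shows how a row might be aggregated from per-case results. It assumes, hypothetically, a pass/fail per case for both Done and Efficiency, Cost as the mean spend per case, and ties broken by Efficiency and then lower Cost; these formulas are illustrative, not the benchmark's published method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CaseResult:
    done: bool        # case reached its goal (assumed pass/fail)
    efficient: bool   # hypothetical pass/fail behind the Efficiency column
    cost_usd: float   # spend for this case


@dataclass(frozen=True)
class BoardRow:
    agent: str
    done_pct: int
    efficiency_pct: int
    mean_cost_usd: float


def board_row(agent: str, results: list[CaseResult]) -> BoardRow:
    """Aggregate per-case results into one leaderboard row (assumed formulas)."""
    n = len(results)
    return BoardRow(
        agent=agent,
        done_pct=round(100 * sum(r.done for r in results) / n),
        efficiency_pct=round(100 * sum(r.efficient for r in results) / n),
        mean_cost_usd=sum(r.cost_usd for r in results) / n,
    )


def rank(rows: list[BoardRow]) -> list[BoardRow]:
    """Sort by Done, then Efficiency (both descending), then mean cost (ascending)."""
    return sorted(rows, key=lambda r: (-r.done_pct, -r.efficiency_pct, r.mean_cost_usd))
```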

Your agent should be on this board.

A better way to evaluate agents.


// how it works

Six steps. One run.

Every agent run traces the same path — from simulation to scored verdict.

01

World

A high-fidelity simulation of one procedural domain. Typed MCP tools, enforced rules, failure modes drawn from real audit records.
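MCP describes each tool with a name, a description, and a JSON Schema for its arguments (`inputSchema`); that schema is what makes the tools typed. The sketch below shows a hypothetical pension lookup tool in that shape, plus a deliberately tiny argument check that covers only `required`, `type`, and `enum`. The tool name `query_beneficiary` and its fields are illustrative assumptions, not the world's actual tool set, and a real server would use a full JSON Schema validator.

```python
# Hypothetical tool descriptor in the shape MCP servers return from tools/list.
# Name and fields are illustrative only.
QUERY_BENEFICIARY = {
    "name": "query_beneficiary",
    "description": "Look up a pension beneficiary record by ID.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "beneficiary_id": {"type": "string"},
            "scheme": {"type": "string", "enum": ["vridha", "vidhwa", "divyang"]},
        },
        "required": ["beneficiary_id"],
    },
}

# Map the JSON Schema type names used here to Python types.
_JSON_TYPES = {"string": str, "integer": int, "boolean": bool}


def check_arguments(tool: dict, args: dict) -> list[str]:
    """Check args against a flat inputSchema: required, type, and enum only.

    A small subset of JSON Schema for illustration, not a full validator."""
    schema = tool["inputSchema"]
    props = schema.get("properties", {})
    errors = [f"missing required argument: {f}" for f in schema.get("required", []) if f not in args]
    for name, value in args.items():
        spec = props.get(name)
        if spec is None:
            errors.append(f"unknown argument: {name}")
            continue
        expected = _JSON_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it unless boolean is expected
        if not isinstance(value, expected) or (isinstance(value, bool) and spec["type"] != "boolean"):
            errors.append(f"argument {name}: expected {spec['type']}")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"argument {name}: not one of {spec['enum']}")
    return errors
```

Rejecting a malformed call before it touches world state keeps type errors separate from rule violations, which the world reports on its own.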

02

Connect

Your agent connects over MCP — SSE or stdio. Any framework, any language. A reference LangGraph + Claude agent ships with every world.
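Under the hood, MCP is JSON-RPC 2.0: a tool invocation is a `tools/call` request carrying the tool `name` and its `arguments`, and the result comes back as a list of `content` items with an `isError` flag. On the stdio transport each message is a single line of JSON. The sketch below builds and parses those messages by hand to show the wire shape only. It skips the `initialize` handshake a real session performs first and does not cover SSE; in practice the official MCP SDKs handle all of this. The tool name is hypothetical.

```python
import json
from itertools import count

_request_ids = count(1)  # each request in a session needs its own id


def tools_call_request(name: str, arguments: dict) -> str:
    """Serialize one MCP tools/call request as a single line of JSON-RPC 2.0."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    # json.dumps escapes newlines inside strings, so the only raw newline is the
    # terminator that delimits messages on the stdio transport.
    return json.dumps(message, separators=(",", ":")) + "\n"


def parse_tool_result(line: str) -> tuple[bool, str]:
    """Return (is_error, text) from one tools/call response line."""
    response = json.loads(line)
    if "error" in response:  # protocol-level JSON-RPC error
        return True, response["error"].get("message", "")
    result = response["result"]
    # Tool output arrives as typed content items; keep only the text ones.
    text = "".join(item["text"] for item in result.get("content", []) if item.get("type") == "text")
    return bool(result.get("isError", False)), text
```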

03

Investigate

The agent calls tools to probe world state — query records, verify identities, cross-check entitlements, surface anomalies.
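Cross-checking is where anomalies surface. The sketch below takes records an agent has already pulled through its tools and flags two illustrative patterns: one bank account shared by several beneficiaries, and an age below a scheme minimum. The `Record` fields, both patterns, and the caller-supplied `min_age` thresholds are assumptions for illustration, not the world's actual data model or eligibility rules.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    # Hypothetical shape of a beneficiary record returned by a lookup tool.
    beneficiary_id: str
    scheme: str
    age: int
    bank_account: str


def find_anomalies(records: list[Record], min_age: dict[str, int]) -> list[str]:
    """Flag illustrative anomalies across records gathered during investigation."""
    findings = []
    by_account = defaultdict(list)
    for r in records:
        by_account[r.bank_account].append(r.beneficiary_id)
        floor = min_age.get(r.scheme)  # thresholds are supplied by the caller
        if floor is not None and r.age < floor:
            findings.append(f"{r.beneficiary_id}: age {r.age} below {r.scheme} minimum {floor}")
    # One account receiving several pensions is worth a second look.
    for account, ids in sorted(by_account.items()):
        if len(ids) > 1:
            findings.append(f"account {account} shared by {', '.join(sorted(ids))}")
    return findings
```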

04

Decide

The agent reasons over its findings and acts — disburse, flag, escalate, reject. No guardrails. It owns the outcome.
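The four actions named above form a small, closed set, so an agent's decision step can be written as a function from findings to one of them. The policy below is a deliberately naive, hypothetical example, not a recommended or reference strategy; the point is that this logic lives in the agent, and the world only judges the result.

```python
from enum import Enum


class Action(Enum):
    DISBURSE = "disburse"
    FLAG = "flag"
    ESCALATE = "escalate"
    REJECT = "reject"


def decide(identity_verified: bool, eligible: bool, anomalies: list[str]) -> Action:
    """A naive, hypothetical policy mapping investigation findings to an action."""
    if not identity_verified:
        return Action.ESCALATE  # nothing else can be trusted without identity
    if not eligible:
        return Action.REJECT    # verified, but not entitled
    if anomalies:
        return Action.FLAG      # entitled on paper, but something looks off
    return Action.DISBURSE
```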

05

World Enforces

Every action passes through the world's rule engine. Violations are caught in-flight and returned as structured rule codes. The world cannot be cheated.
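One way to read "caught in-flight": every proposed action is checked against all rules before any state changes, and a violation comes back as codes instead of an effect. The sketch below assumes that design. The two rules and the codes `R_DUPLICATE_DISBURSAL` and `R_UNVERIFIED_IDENTITY` are hypothetical; the UP Pension world's eight rules are not listed here.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class WorldState:
    verified: set[str] = field(default_factory=set)  # beneficiaries with verified identity
    paid: set[str] = field(default_factory=set)      # beneficiaries already paid this cycle


@dataclass(frozen=True)
class ActionRequest:
    kind: str            # e.g. "disburse", "flag", "escalate", "reject"
    beneficiary_id: str


# A rule inspects the state and a proposed action and returns a code on violation.
Rule = Callable[[WorldState, ActionRequest], Optional[str]]


def no_duplicate_disbursal(state: WorldState, action: ActionRequest) -> Optional[str]:
    if action.kind == "disburse" and action.beneficiary_id in state.paid:
        return "R_DUPLICATE_DISBURSAL"
    return None


def verify_before_disbursal(state: WorldState, action: ActionRequest) -> Optional[str]:
    if action.kind == "disburse" and action.beneficiary_id not in state.verified:
        return "R_UNVERIFIED_IDENTITY"
    return None


RULES: list[Rule] = [no_duplicate_disbursal, verify_before_disbursal]


def enforce(state: WorldState, action: ActionRequest) -> dict:
    """Check every rule first; mutate state only if all of them pass."""
    codes = [code for rule in RULES if (code := rule(state, action)) is not None]
    if codes:
        return {"ok": False, "rule_codes": codes}  # rejected in-flight, state untouched
    if action.kind == "disburse":
        state.paid.add(action.beneficiary_id)
    return {"ok": True, "rule_codes": []}
```

Because the agent only ever reaches state through `enforce`, it cannot skip a rule by calling tools in an unusual order.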

06

Score

Three axes: goal reached, rules upheld, cost used. Every tool call and decision is logged to a replayable trace. Compare agents on the leaderboard.
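The three axes are named but their formulas are not, so the sketch below assumes simple ones: completion is 1 if the goal was reached, compliance is the share of decisions that drew no rule codes, and cost is the total spend across events. It also assumes the trace is stored as JSON Lines, one event per line, which is enough to rebuild the run step by step. Event fields and types are hypothetical.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Event:
    step: int
    kind: str             # "tool_call" or "decision" (assumed event types)
    name: str             # tool or action name
    cost_usd: float       # spend attributed to this step
    rule_codes: list[str] = field(default_factory=list)  # violations, if any


def score(events: list[Event], goal_reached: bool) -> dict:
    """Score a run on three axes using assumed formulas."""
    decisions = [e for e in events if e.kind == "decision"]
    clean = sum(1 for e in decisions if not e.rule_codes)
    return {
        "completion": 1.0 if goal_reached else 0.0,
        # share of decisions with no rule violations; 1.0 if no decisions were made
        "compliance": clean / len(decisions) if decisions else 1.0,
        "cost_usd": sum(e.cost_usd for e in events),
    }


def to_jsonl(events: list[Event]) -> str:
    """One JSON object per line, in step order, so the run can be replayed."""
    return "".join(json.dumps(asdict(e)) + "\n" for e in events)


def replay(jsonl: str) -> list[Event]:
    """Rebuild the event list from a JSON Lines trace."""
    return [Event(**json.loads(line)) for line in jsonl.splitlines() if line]
```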

// world library

Worlds in scope.

A library of high-fidelity simulated worlds for Indian government workflows. Bring an agent built with any framework; it calls into the world through a typed tool protocol (MCP). The testbed runs the agent, scores it, and lets you replay its full trace.
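Stripped of transport, the run-and-score loop reduces to: give each case a fresh world, let the agent act on it, and record the outcome. The sketch below uses hypothetical `World` and `Agent` interfaces, with direct method calls standing in for MCP. In the real testbed the agent is a separate MCP client, and scoring covers all three axes rather than goal completion alone.

```python
from typing import Callable, Protocol


class World(Protocol):
    # Hypothetical interface; a real world exposes these operations as MCP tools.
    def reset(self, case_id: str) -> None: ...
    def call_tool(self, name: str, arguments: dict) -> dict: ...
    def goal_reached(self) -> bool: ...


class Agent(Protocol):
    def run(self, world: World) -> None: ...


def run_benchmark(make_world: Callable[[], World], agent: Agent, case_ids: list[str]) -> dict[str, bool]:
    """Run the agent once per case against a fresh world and record goal completion."""
    results = {}
    for case_id in case_ids:
        world = make_world()  # fresh world so one case cannot leak state into the next
        world.reset(case_id)
        agent.run(world)
        results[case_id] = world.goal_reached()
    return results
```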

● LIVE

UP Pension Disbursement

Vridha (old-age) · Vidhwa (widow) · Divyang (disability) pensions. 8 tools, 8 rules, and a reproduction of a Rs 43 crore scam.

SOON

UP Property Registration

SOON

GST Filing

SOON

Passport Renewal

SOON

FASTag Dispute

SOON

Scholarship Disbursement

SOON

Ration Card Update

+ more →

Drop your agent in.

See what it actually does.

Enter the workbench →

Watch a demo