Judge HTTP API

The Judge server provides a REST API for submitting code archives, polling evaluation results, and managing interactive game sessions. Start it with sforge serve --port 8080.

Base URL

http://localhost:8080

All endpoints are prefixed with /api/v1.

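Authentication

The Judge API uses a token-based session model. Before submitting, a client must first register a session to obtain a token. Each token is tied to a task ID, a run ID, and auto-incrementing round counters.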

Endpoints

Task Discovery

Lists all available tasks loaded from the tasks/ directory.

http
GET /api/v1/tasks

Response:

json
[
  {"task_id": "ad_placement_optimization", "name": "Ad Placement Optimization"},
  {"task_id": "gitlet", "name": "Gitlet"},
  {"task_id": "tinykv", "name": "TinyKV"}
]
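For scripting, the task list can be wrapped in a pair of shell helpers. This is a minimal sketch that assumes curl and jq are installed and the server listens on the default port; JUDGE_URL, judge_list_tasks, and judge_has_task are hypothetical names introduced here, not part of sforge.

```bash
# Hypothetical base URL variable; defaults to the documented local server.
JUDGE_URL="${JUDGE_URL:-http://localhost:8080}"

# Print the raw JSON array returned by GET /api/v1/tasks.
# -s hides the progress meter; -f makes curl exit non-zero on HTTP errors.
judge_list_tasks() {
  curl -sf "$JUDGE_URL/api/v1/tasks"
}

# Exit 0 if a task with the given task_id is loaded, non-zero otherwise
# (including when the request fails and jq receives no input).
# jq -e sets its exit status from the last value it outputs (false -> 1).
judge_has_task() {
  judge_list_tasks | jq -e --arg id "$1" 'any(.[]; .task_id == $id)' >/dev/null
}
```

Usage: judge_list_tasks | jq -r '.[].task_id' prints one task ID per line, and judge_has_task gitlet && echo "gitlet is available" checks for a single task before registering.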

Session Registration

Registers a new session and returns a token to use in subsequent submissions.

http
POST /api/v1/register
Content-Type: application/json

{
  "task_id": "ad_placement_optimization",
  "run_id": "run-001"
}

Response:

json
{
  "token": "a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4"
}

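The token is a 32-character hexadecimal string. Internally it maintains the task ID, the run ID, and two independent counters: one for agent submissions (agent-1, agent-2, ...) and one for automated evaluation submissions (auto-1, auto-2, ...).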
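The registration request can be wrapped in a shell function. This sketch assumes curl is available and that task and run IDs contain no double quotes or backslashes (true of the IDs shown here), because the JSON body is assembled with printf rather than a JSON encoder; JUDGE_URL and judge_register are hypothetical names.

```bash
JUDGE_URL="${JUDGE_URL:-http://localhost:8080}"

# POST /api/v1/register and print the JSON response ({"token": "..."}).
# Assumption: the IDs need no JSON escaping, so printf can build the body directly.
judge_register() {
  local task_id="$1" run_id="$2" body
  body=$(printf '{"task_id": "%s", "run_id": "%s"}' "$task_id" "$run_id")
  curl -sf -X POST "$JUDGE_URL/api/v1/register" \
    -H "Content-Type: application/json" \
    -d "$body"
}
```

Usage: TOKEN=$(judge_register ad_placement_optimization run-001 | jq -r '.token') stores the token for the submit and history calls.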

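Submitting an Archive

Submits a code archive for evaluation. The server resolves the task ID and run ID from the token and assigns an incrementing round ID.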

http
POST /api/v1/submit
Content-Type: multipart/form-data

token: a1b2c3d4e5f6a1b2c3d4e5f6a1b2c3d4
archive: @solution.tar.gz
kind: agent
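
| Field | Type | Required | Description |
|---|---|---|---|
| token | string | Yes | Session token obtained from /register |
| archive | file | Yes | Solution archive in .tar.gz format |
| kind | string | No | "agent" (default) or "auto". Determines which round counter is incremented. |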
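To make the kind field concrete, the following client-side model mirrors the documented numbering: agent and auto submissions advance separate counters, so an auto submission never consumes an agent round number. It is an illustration only; the real counters live on the server, and next_round, AGENT_ROUNDS, AUTO_ROUNDS, and ROUND are hypothetical names.

```bash
# Client-side model of the two independent round counters kept per token.
# This only mirrors the documented numbering; the real state lives on the server.
AGENT_ROUNDS=0
AUTO_ROUNDS=0

# Advance the counter selected by KIND ("agent" by default, or "auto") and
# store the resulting round ID in ROUND. It sets ROUND instead of printing it,
# because a $(...) call would run in a subshell and discard the increment.
next_round() {
  case "${1:-agent}" in
    agent) AGENT_ROUNDS=$((AGENT_ROUNDS + 1)); ROUND="agent-$AGENT_ROUNDS" ;;
    auto)  AUTO_ROUNDS=$((AUTO_ROUNDS + 1));   ROUND="auto-$AUTO_ROUNDS" ;;
    *)     echo "unknown kind: $1" >&2; return 1 ;;
  esac
}
```

Calling next_round; next_round auto; next_round leaves ROUND set to agent-2 and AUTO_ROUNDS at 1.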

Response:

json
{
  "submission_id": "a1b2c3d4e5f6",
  "round_id": "agent-1",
  "status": "queued"
}

Submissions are processed asynchronously: the server immediately returns status: "queued". Poll the result endpoint for the outcome.
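A submission helper can check kind and the archive path locally before sending the multipart request. This is a sketch assuming curl is installed; judge_submit and JUDGE_URL are hypothetical names, and the local check only mirrors the two documented kind values.

```bash
JUDGE_URL="${JUDGE_URL:-http://localhost:8080}"

# POST /api/v1/submit as multipart/form-data and print the JSON response
# (submission_id, round_id, status). KIND defaults to "agent", as on the server.
judge_submit() {
  local token="$1" archive="$2" kind="${3:-agent}"
  case "$kind" in
    agent|auto) ;;
    *) echo "kind must be 'agent' or 'auto', got '$kind'" >&2; return 2 ;;
  esac
  if [ ! -f "$archive" ]; then
    echo "archive not found: $archive" >&2
    return 2
  fi
  # -F "archive=@path" makes curl upload the file contents as a form part.
  curl -sf -X POST "$JUDGE_URL/api/v1/submit" \
    -F "token=$token" \
    -F "archive=@$archive" \
    -F "kind=$kind"
}
```

Usage: SUBMISSION_ID=$(judge_submit "$TOKEN" solution.tar.gz auto | jq -r '.submission_id') submits against the auto counter.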

Fetching Results

Polls the evaluation result of a submission.

http
GET /api/v1/result/{submission_id}

Response (queued/running):

json
{
  "submission_id": "a1b2c3d4e5f6",
  "status": "running",
  "report": null,
  "error": null
}

Response (completed):

json
{
  "submission_id": "a1b2c3d4e5f6",
  "status": "completed",
  "report": {
    "total_tests": 50,
    "passed": 45,
    "failed": 5,
    "errors": 0,
    "pass_rate": 0.9,
    "score": null,
    "valid": true,
    "summary": "45/50 tests passed. Failed: test_forward, test_backward, ...",
    "test_details": [...]
  },
  "error": null
}

Response (error):

json
{
  "submission_id": "a1b2c3d4e5f6",
  "status": "error",
  "report": null,
  "error": "Container exited with code 137 (OOM killed)"
}

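Status values:

| Status | Description |
|---|---|
| queued | Submission received, waiting to be processed |
| running | The Judge container is running the tests |
| completed | Evaluation finished; the report is available |
| error | Evaluation failed (container crash, timeout, etc.) |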
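Only completed and error are terminal. A polling script can classify the status with a small function; reporting unrecognized values (including the empty string produced when a request fails and jq receives no input) separately keeps a loop from spinning forever. is_terminal_status is a hypothetical helper, not part of the API.

```bash
# Classify a status value from GET /api/v1/result/{submission_id}.
# Returns 0 for terminal states (completed, error), 1 for states that
# warrant another poll (queued, running), and 2 for anything unexpected.
is_terminal_status() {
  case "$1" in
    completed|error) return 0 ;;
    queued|running)  return 1 ;;
    *)               return 2 ;;
  esac
}
```

In a loop, capture the code with rc=0; is_terminal_status "$STATUS" || rc=$? and stop on 0 or 2.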

Run History

Returns the complete submission history of a run, including the result selected as the best score.

Query by token:

http
GET /api/v1/history?token=<token>

Query by run ID:

http
GET /api/v1/runs/{run_id}/history?task_id=<task_id>

Response:

json
{
  "run_id": "run-001",
  "best_pass_rate": 0.95,
  "best_score": null,
  "best_round": "agent-3",
  "agent_submissions": 4,
  "auto_submissions": 2,
  "entries": [
    {
      "type": "submission",
      "status": "completed",
      "submission_id": "abc123",
      "task_id": "ad_placement_optimization",
      "round": "agent-1",
      "pass_rate": 0.8,
      "score": null,
      "passed": 40,
      "failed": 10,
      "total_tests": 50,
      "valid": true,
      "summary": "40/50 tests passed. Failed: ..."
    }
  ]
}

The best_* fields are computed according to the selection strategy configured for the task (see Benchmark and Task Onboarding).
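Both query forms return the same document. This sketch wraps the two lookups, assuming curl and jq; the function names are hypothetical. curl's -G flag appends each --data-urlencode pair to the URL as a percent-encoded query parameter, so the token and task ID need no manual escaping.

```bash
JUDGE_URL="${JUDGE_URL:-http://localhost:8080}"

# GET /api/v1/history?token=<token>
judge_history_by_token() {
  curl -sfG "$JUDGE_URL/api/v1/history" --data-urlencode "token=$1"
}

# GET /api/v1/runs/{run_id}/history?task_id=<task_id>
judge_history_by_run() {
  curl -sfG "$JUDGE_URL/api/v1/runs/$1/history" --data-urlencode "task_id=$2"
}
```

Usage: judge_history_by_token "$TOKEN" | jq '{best_round, best_pass_rate, agent_submissions, auto_submissions}' summarizes the run, and jq -r '.entries[] | "\(.round)\t\(.pass_rate)"' lists per-round pass rates.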

Game Endpoints

For tasks with game_mode: true, the Judge server manages interactive game sessions in dedicated containers.

Create a New Game

http
POST /api/v1/game/{run_id}/{task_id}/new
Content-Type: application/json

{}

Response:

json
{
  "session_id": "abc123def456",
  "observation": "You are standing in a dark room...",
  "score": 0,
  "peak_score": 0,
  "max_score": 100,
  "done": false,
  "moves": 0
}
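All game endpoints share the prefix /api/v1/game/{run_id}/{task_id}. This sketch builds those URLs and starts a game, assuming curl; game_url, game_new, and JUDGE_URL are hypothetical names, and my_task stands in for the ID of a task with game_mode: true.

```bash
JUDGE_URL="${JUDGE_URL:-http://localhost:8080}"

# Print a game endpoint URL: game_url RUN_ID TASK_ID SEGMENT...
# e.g. game_url run-001 my_task new  or  game_url run-001 my_task SESSION_ID step
game_url() {
  local url="$JUDGE_URL/api/v1/game/$1/$2" segment
  shift 2
  for segment in "$@"; do
    url="$url/$segment"
  done
  printf '%s\n' "$url"
}

# POST .../new with an empty JSON object and print the initial game state.
game_new() {
  curl -sf -X POST "$(game_url "$1" "$2" new)" \
    -H "Content-Type: application/json" \
    -d '{}'
}
```

Usage: SESSION_ID=$(game_new run-001 my_task | jq -r '.session_id') keeps the session ID for the step, status, and close calls.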

Take an Action

http
POST /api/v1/game/{run_id}/{task_id}/{session_id}/step
Content-Type: application/json

{"action": "go north"}

Response:

json
{
  "session_id": "abc123def456",
  "observation": "You enter a dimly lit hallway...",
  "score": 10,
  "peak_score": 10,
  "max_score": 100,
  "done": false,
  "moves": 1
}
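Because actions are free-form text, the request body is safer to build with a JSON encoder than with string interpolation. This sketch uses jq -n --arg, which escapes quotes and backslashes in the action, and pipes the result to curl, which reads the body from standard input via --data-binary @-. game_step is a hypothetical helper.

```bash
JUDGE_URL="${JUDGE_URL:-http://localhost:8080}"

# POST .../{session_id}/step with {"action": ACTION} and print the new state.
game_step() {
  local run_id="$1" task_id="$2" session_id="$3" action="$4"
  # jq -n starts from null input; --arg binds the action as a JSON string.
  jq -n --arg action "$action" '{action: $action}' |
    curl -sf -X POST "$JUDGE_URL/api/v1/game/$run_id/$task_id/$session_id/step" \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```

Usage: game_step run-001 my_task "$SESSION_ID" "go north" | jq '{observation, score, done}' shows the result of one move.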

Query Status

http
GET /api/v1/game/{run_id}/{task_id}/{session_id}/status

Response:

json
{
  "session_id": "abc123def456",
  "score": 10,
  "peak_score": 10,
  "max_score": 100,
  "done": false,
  "moves": 1
}

Close a Session

http
POST /api/v1/game/{run_id}/{task_id}/{session_id}/close

Response:

json
{
  "session_id": "abc123def456",
  "final_score": 25,
  "peak_score": 30,
  "max_score": 100,
  "moves": 15
}

Close All Sessions

Closes all active game sessions for the specified run and task.

http
POST /api/v1/game/{run_id}/{task_id}/close-all

Response:

json
{"closed": 3}

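Game sessions time out after 10 minutes of inactivity. A session is archived automatically when the game ends (done: true) or when it is closed explicitly. At most 200 game sessions can run concurrently.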
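Given the 200-session cap, a script that exits early can leave sessions holding slots until the idle timeout. One option, sketched here under stated assumptions, is an EXIT trap that calls close-all; note that close-all closes every active session for that run and task, not only the ones this script opened. game_close_all, release_sessions_on_exit, RUN_ID, and GAME_TASK are hypothetical names.

```bash
JUDGE_URL="${JUDGE_URL:-http://localhost:8080}"

# POST .../close-all and print {"closed": N}.
game_close_all() {
  curl -sf -X POST "$JUDGE_URL/api/v1/game/$1/$2/close-all"
}

# Install traps so sessions are closed when the script exits.
# RUN_ID and GAME_TASK are read when the trap fires, not when it is set.
# INT and TERM are converted into exits so the EXIT trap also runs for them.
release_sessions_on_exit() {
  trap 'game_close_all "$RUN_ID" "$GAME_TASK" >/dev/null' EXIT
  trap 'exit 130' INT
  trap 'exit 143' TERM
}
```

Usage: set RUN_ID and GAME_TASK, call release_sessions_on_exit once near the top of the script, then create and play sessions as usual.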

curl Examples

Register, Submit, and Poll

bash
# 1. Register a session
TOKEN=$(curl -s -X POST http://localhost:8080/api/v1/register \
  -H "Content-Type: application/json" \
  -d '{"task_id": "ad_placement_optimization", "run_id": "run-001"}' \
  | jq -r '.token')

echo "Token: $TOKEN"

# 2. Submit the archive
SUBMISSION_ID=$(curl -s -X POST http://localhost:8080/api/v1/submit \
  -F "token=$TOKEN" \
  -F "archive=@solution.tar.gz" \
  -F "kind=agent" \
  | jq -r '.submission_id')

echo "Submission: $SUBMISSION_ID"

# 3. Poll for the result (repeat until status is "completed" or "error")
curl -s "http://localhost:8080/api/v1/result/$SUBMISSION_ID" | jq .

# 4. View the run history
curl -s "http://localhost:8080/api/v1/history?token=$TOKEN" | jq .

Polling Loop

bash
while true; do
  # Fetch once per iteration and reuse the response for the status check and the final print
  RESULT=$(curl -s "http://localhost:8080/api/v1/result/$SUBMISSION_ID")
  STATUS=$(echo "$RESULT" | jq -r '.status')
  echo "Status: $STATUS"
  if [ "$STATUS" = "completed" ] || [ "$STATUS" = "error" ]; then
    echo "$RESULT" | jq .
    break
  fi
  sleep 5
done
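The loop above polls forever if the server never reaches a terminal state or stops responding. A bounded variant, sketched under the same curl and jq assumptions; wait_for_result and its attempt and interval parameters are hypothetical, with defaults (60 attempts, 5 seconds apart) chosen only for illustration.

```bash
JUDGE_URL="${JUDGE_URL:-http://localhost:8080}"

# wait_for_result SUBMISSION_ID [MAX_ATTEMPTS] [INTERVAL_SECONDS]
# Prints the final result JSON and returns 0 once the status is terminal;
# returns 1 on timeout or on an unrecognized status.
wait_for_result() {
  local submission_id="$1" max_attempts="${2:-60}" interval="${3:-5}"
  local attempt=1 result status
  while [ "$attempt" -le "$max_attempts" ]; do
    result=$(curl -sf "$JUDGE_URL/api/v1/result/$submission_id") || result=''
    # '.status // empty' yields nothing for a missing field; empty input also yields ''.
    status=$(printf '%s' "$result" | jq -r '.status // empty')
    case "$status" in
      completed|error) printf '%s\n' "$result"; return 0 ;;
      queued|running|'') ;;  # keep polling; '' covers transient request failures
      *) echo "unexpected status: $status" >&2; return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$interval"
  done
  echo "timed out waiting for $submission_id" >&2
  return 1
}
```

Usage: wait_for_result "$SUBMISSION_ID" | jq '.report.pass_rate' prints the pass rate once evaluation completes, or fails after roughly five minutes with the defaults.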