Supported Agents
SForge manages different Code Agents through a plugin-style agent registry. To run EdgeBench evaluations, simply pick a built-in agent (e.g., claude-code) and provide an API key.
Agent Registry
| Agent | CLI Name | API Key Env | Model Env | Default Model | Stop Hook | Auto-Resume |
|---|---|---|---|---|---|---|
| Claude Code | claude-code | ANTHROPIC_AUTH_TOKEN | ANTHROPIC_MODEL | -- | Yes | Yes |
| Codex | codex | CODEX_API_KEY | CODEX_MODEL | -- | Yes | Yes |
Use the --agent flag to select an Agent:
sforge run --task ad_placement_optimization --agent claude-code
sforge run --task ad_placement_optimization --agent codexAgent Configuration
API Key
Set via the SFORGE_AGENT_API_KEY environment variable. The harness maps this to the agent-specific env var (e.g., ANTHROPIC_AUTH_TOKEN for Claude Code, CODEX_API_KEY for Codex):
SFORGE_AGENT_API_KEY="sk-xxxx" sforge run --task ad_placement_optimization --agent claude-codeAPI Endpoint
Set a custom API endpoint with SFORGE_AGENT_API_BASE_URL (for proxies or private deployments):
SFORGE_AGENT_API_BASE_URL="https://your-proxy.com/v1" \
SFORGE_AGENT_API_KEY="sk-xxxx" \
sforge run --task ad_placement_optimization --agent claude-codeModel Override
Override the model using the --model CLI flag or the SFORGE_AGENT_MODEL environment variable:
sforge run --task ad_placement_optimization --agent claude-code --model claude-opus-4-8Extra Environment Variables
Pass additional environment variables to the agent container using SFORGE_AGENT_EXTRA_ENV:
SFORGE_AGENT_EXTRA_ENV="KEY1=VAL1,KEY2=VAL2" \
sforge run --task ad_placement_optimization --agent claude-codeHow Agents Work
The complete Agent lifecycle is:
- Create container: create a Docker container from the Work image and inject environment variables such as API key and Judge URL
- Install Agent runtime: run the Agent's
install_cmdsinside the container, such as installing Node.js or npm packages - Install evaluation tools:
sforge-submit: installed into/usr/local/bin/; the Agent calls this command to submit code- Stop Hook (optional): prevents the Agent from exiting too early
- Auto-eval daemon (optional): periodically evaluates in the background
- Generate enhanced prompt: combine the original task description with evaluation instructions and strategy suggestions
- Run Agent: execute the Agent's
run_cmd; the Agent starts working inside the container - Collect results: after the Agent times out or finishes, read the best score from the state file
During the process, the Agent can call sforge-submit at any time to submit code and receive feedback. The background auto-eval daemon also periodically submits evaluations.
Stop Hook
Stop Hook is an important SForge mechanism that prevents the Agent from exiting too early.
How It Works
When an Agent such as Claude Code decides the task is complete and tries to exit, the Stop Hook intercepts the exit request and returns a blocking signal asking the Agent to keep working. This ensures that the Agent can fully use the allocated time and continue improving the code.
Supported Agents
| Agent | Hook Type | Description |
|---|---|---|
claude-code | Claude Code Stop Hook | Registered via .claude/settings.json |
codex | Codex Stop Hook | Registered via /etc/codex/hooks.json |
Disabling the Stop Hook
If you do not need Stop Hook, for example during debugging, disable it with --disable-stop-hook:
sforge run --task ad_placement_optimization --agent claude-code --disable-stop-hookAuto-Resume
Auto-resume handles abnormal agent exits (API disconnects, transient errors, etc.) that bypass the stop hook. When the agent process dies unexpectedly before the timeout, the harness automatically re-launches it using the agent's native session resume mechanism.
How It Works
- Agent exits abnormally (not due to timeout)
- Harness detects the early exit
- Agent is re-launched with its resume command (e.g.,
claude --continuefor Claude Code) - The remaining timeout budget is passed to the resumed session
- The agent picks up from its last conversation state and continues working
Supported Agents
| Agent | Resume Mechanism |
|---|---|
claude-code | claude --continue -p "Continue working." |
codex | codex exec resume --last "Continue working." |
Safety Guards
- If the agent exits in under 1 second, the harness assumes a systematic failure and stops retrying
- Maximum of 100 resume attempts per run
Disabling Auto-Resume
sforge run --task ad_placement_optimization --agent claude-code --disable-auto-resumeUsing Third-Party Models
Both Claude Code and Codex support third-party models for evaluation — point SFORGE_AGENT_API_BASE_URL at a compatible API endpoint and set the model name with --model.
Claude Code has internal multi-tier model routing (opus/sonnet/haiku tiers, subagent calls) and context window management, so using it with a third-party model requires additional configuration for cache optimization, model routing variables, and context window settings. See Single Task (Docker) — Using a Third-Party Model for details.
Custom Agents
SForge currently supports one way to add a custom agent: implement it in the SForge source tree and register it in the agent factory. There is no separate runtime plugin/module loading path.
Steps
- Create a new file under
sforge/harness/agent/, for examplemy_agent.py. - Define a class that extends
Agentfromsforge.harness.agent.base. - Set the required class attributes:
name-- CLI name used by--agentinstall_cmds-- commands run inside the work container to install the agent runtimerun_cmd-- command template used to start the agent; it should read{prompt_file}api_key_env-- environment variable expected by the agent for its API key
- Optionally set
api_base_env,model_env,default_model,stop_hook, andresume_cmd. - Register the class in
sforge/harness/agent/factory.pyby adding it to_REGISTRY.
Minimal example:
from sforge.harness.agent.base import Agent
class MyAgent(Agent):
name = "my-agent"
install_cmds = [
"pip install my-agent-cli",
]
run_cmd = 'my-agent run --prompt "$(cat {prompt_file})"'
api_key_env = "MY_AGENT_API_KEY"
model_env = "MY_AGENT_MODEL"
resume_cmd = 'my-agent resume --prompt "Continue working."'Then register it:
from sforge.harness.agent.my_agent import MyAgent
_REGISTRY = {
# ... existing agents ...
"my-agent": MyAgent,
}After registration, run it with:
SFORGE_AGENT_API_KEY="..." sforge run --task ad_placement_optimization --agent my-agentIf your agent needs custom environment mapping, command formatting, stop-hook installation, or resume behavior, override the corresponding methods on Agent such as augment_env(), format_run_cmd(), or install_stop_hook().