| Shell | Model | Backend | Hash | Format | Test | Status | Reason | Trajectory | Invoked | Match | Warmup (s) | Measured (s) | Details |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| continue-cli | local/gemma4:e2b | ollama | 7fbdbf8f5e45a75bb122155ed546e765b4d9c53a1285f62fd9f506baa1c5a47e | markdown | Web Search JSON Ranked | FAIL | answered without invoking tool | clean | no | 0% | 83.491 | 17.053 | details |
| continue-cli | local/gemma4:e2b | ollama | 7fbdbf8f5e45a75bb122155ed546e765b4d9c53a1285f62fd9f506baa1c5a47e | markdown | Weekly Plan Next Week | PASS | clean | yes | 100% | 13.042 | 27.928 | details |