| Shell | Model | Backend | Hash | Format | Test | Status | Reason | Trajectory | Invoked | Match | Warmup (s) | Measured (s) | Details |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| continue-cli | local/qwen3:8b | ollama | 500a1f067a9f782620b40bee6f7b0c89e17ae61f686b92c24933e4ca4b2b8b41 | markdown | Web Search JSON Ranked | FAIL | answered without invoking tool | clean | no | 0% | 99.187 | 36.447 | details |
| continue-cli | local/qwen3:8b | ollama | 500a1f067a9f782620b40bee6f7b0c89e17ae61f686b92c24933e4ca4b2b8b41 | markdown | Weekly Plan Next Week | PASS | clean | yes | 100% | 29.228 | 88.604 | details |