GripProbe CLI-Agent Compatibility Matrix
This report measures real CLI-agent × model compatibility by executed outcomes (not text-only responses). Rows are agent/version + model/hash + format groups; cells link to measured cases, with color-coded outcomes, normalized Score, Typical Time (median), and Outliers.
Project repository
Scope
CLI Agents aider, codex, continue-cli, gptme, opencode Formats markdown, tool
Hardware Profiles
default1 CPU Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz GPU NVIDIA GeForce RTX 2060 SUPER, 8192 MiB RAM 15.5 GiB
Default profile used when run metadata does not provide hardware_profile_id.
Test descriptions
Failure Colors
PASS
MIXED Group has at least one PASS and at least one failure for the same test cell.
SOFT FAIL Output shape/content mismatch after execution.
BEHAVIORAL FAIL No tool call / tool unsupported / behavior-level policy mismatch.
ERROR FAIL Execution-level failure such as timeout.
CRITICAL FAIL Harness or shell-level error, highest severity.
Aggregate Metrics
Score : normalized weighted pass ratio across visible tests (sanity tests use lower weight than non-sanity).
Typical Time : median measured time across representative results in the row.
Outliers : count of tests in the row where time is above baseline median × 2.5.