Generated 2026-06-13T02:13:09.662Z · window 30d (2026-05-14T02:13:09.662Z → 2026-06-13T02:13:09.662Z) · schema v1.1
Gate 1 build FAILING — fix before dogfood metrics matter
| Criterion | Status | Current | Target |
|---|---|---|---|
| 1. Extraction works | PASS | 79 live organic (0 extract-stored events, 79 total organic entries) | >= 5 organic entries |
| 2. Dedup / hygiene works | PASS | OK | 0 exact duplicates, 0 placeholder entries |
| 3. Interception fires | PASS | 3675 this week | >= 10/week |
| 4. Interception accurate | PASS | 92% precision (11/12 classified surfaced hints, 49/3675 surfaced total) | >= 70% of classified surfaced hints are relevant |
| 5. Non-blocking | PASS | OK (timeout enforced) | 100% < 3s |
| 7. Evolution works | PASS | 1 T0 + 1 probationary T1 + 2 mature T1 | >= 1 principle |
| 8. Memory shrinks | PASS | Yes — evolution reduced entries | Entries decrease after evolution |
| 9. Novel coverage | PASS | 1 principle(s) with novel-case evidence (1/1 holdout matches) | >= 1 principle matches unseen case |
| Criterion | Status | Current | Target |
|---|---|---|---|
| 6. Error recurrence drops | FAIL | Insufficient data (need 2+ weeks of mistake telemetry) | >= 30% reduction |
| 10. Cost stable | FAIL | Insufficient data (need 2+ weeks of cost telemetry) | No material increase (>10%) |

| Item | Status |
|---|---|
| Q1 mistake avoidance | PENDING |
| Q2 novel coverage | PENDING |
| Q3 memory shrinks | PENDING |
| Q4 auto-narrow | PENDING |

Precision computed only over events recorded since the pre-surface relevance gate went live (2026-06-09T14:26:45.929Z). This sidesteps the 7-day rolling window, so the effect of a fix is visible in about 1–2 days.
| Metric | Value |
|---|---|
| Precision | 88.0% |
| Relevant | 88 |
| Irrelevant | 12 |
| Classified | 100 / 30 min |
| Gate drops | 32752 (5251 events) |

| Reason | Count |
|---|---|
| wrong_task | 7 |
| wrong_repo | 3 |
| wrong_language | 2 |
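
The since-gate precision above can be reproduced with a short sketch. It assumes each classified event is available as a `(timestamp, verdict)` pair. That record shape, the function name `precision_since`, and the synthetic event list are illustrative assumptions, not the dashboard's actual schema. Only the cut-off timestamp and the 88/12 counts come from the report.

```python
from datetime import datetime, timezone

# Cut-off taken from the report: when the pre-surface relevance gate went live.
GATE_LIVE = datetime(2026, 6, 9, 14, 26, 45, 929000, tzinfo=timezone.utc)


def precision_since(events, cutoff=GATE_LIVE):
    """Return (relevant, irrelevant, precision) for events at or after cutoff.

    events: iterable of hypothetical (timestamp, verdict) pairs, where verdict
    is "relevant", "irrelevant", or None for an unclassified event.
    """
    relevant = irrelevant = 0
    for ts, verdict in events:
        if ts < cutoff:
            continue  # ignore everything before the gate went live
        if verdict == "relevant":
            relevant += 1
        elif verdict == "irrelevant":
            irrelevant += 1
    classified = relevant + irrelevant
    precision = relevant / classified if classified else None
    return relevant, irrelevant, precision


# Synthetic events shaped to match the counts in the table above.
before = datetime(2026, 6, 1, tzinfo=timezone.utc)
after = datetime(2026, 6, 10, tzinfo=timezone.utc)
events = (
    [(before, "irrelevant")] * 5   # pre-gate events, excluded
    + [(after, "relevant")] * 88
    + [(after, "irrelevant")] * 12
    + [(after, None)] * 3          # unclassified, excluded from precision
)

relevant, irrelevant, precision = precision_since(events)
print(relevant, irrelevant, f"{precision:.1%}")  # 88 12 88.0%
```

Filtering on the go-live timestamp rather than on a fixed trailing window is what lets older, pre-fix events drop out immediately.
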
| Tier | Count | % |
|---|---|---|
| T0 — New (never surfaced) | 44 | 11.5% |
| T1 — Bootstrap (surfaced 1-3x) | 37 | 9.7% |
| T2 — Active (has follows) | 43 | 11.3% |
| T3 — Dying (surfaced >3x, 0 follows) | 257 | 67.5% |
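
The tier labels in the table can be read as a small decision rule over two counters. The sketch below is one possible reading. The function name `classify_tier` and the precedence (any follow makes an entry T2 regardless of how often it was surfaced) are assumptions, since the report gives only the row labels.

```python
def classify_tier(surfaced: int, follows: int) -> str:
    """Map surface/follow counts to a tier label, per the table's descriptions.

    Assumed precedence: "has follows" is checked before the surface count,
    so T1 and T3 only apply to entries with zero follows.
    """
    if surfaced == 0:
        return "T0"  # New: never surfaced
    if follows > 0:
        return "T2"  # Active: has follows
    if surfaced <= 3:
        return "T1"  # Bootstrap: surfaced 1-3x, no follows yet
    return "T3"      # Dying: surfaced more than 3x with 0 follows


print(classify_tier(0, 0))   # T0
print(classify_tier(2, 0))   # T1
print(classify_tier(10, 1))  # T2
print(classify_tier(10, 0))  # T3
```
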
| Type | Count | % |
|---|---|---|
| runtime | 129 | 33.9% |
| other | 71 | 18.6% |
| user-correction | 69 | 18.1% |
| log | 53 | 13.9% |
| test | 46 | 12.1% |
| review | 12 | 3.1% |
| workflow | 1 | 0.3% |
| Metric | Coverage | % |
|---|---|---|
| project_slug | 321/381 | 84.3% |
| structured conditions | 253/381 | 66.4% |
| lang (not "all") | 259/381 | 68.0% |
| judgment | 381/381 | 100.0% |
| Collection | Total | T0 | T1 | T2 | T3 | Top type |
|---|---|---|---|---|---|---|
| principles | 1 | 0 | 0 | 1 | 0 | workflow (1) |
| behavioral | 44 | 1 | 2 | 8 | 33 | user-correction (39) |
| selfqa | 336 | 43 | 35 | 34 | 224 | runtime (128) |
| Band | Surfaced | Followed | Ignored | Noise | Precision |
|---|---|---|---|---|---|
| 0.50-0.65 | 20 | 2 | 1 | 2 | 60.0% |
| 0.65-0.70 | 30 | 8 | 0 | 1 | 88.9% |
| 0.70-0.75 | 93 | 17 | 2 | 0 | 100.0% |
| 0.75-0.80 | 578 | 8 | 4 | 1 | 92.3% |
| 0.85-1.00 | 78 | 0 | 1 | 0 | 100.0% |
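
The Precision column is not defined in the report, but every row above is consistent with counting both followed and ignored hints as relevant and only noise as irrelevant. The sketch below checks that inferred formula against the band rows. The function name `hint_precision` is hypothetical and the formula is an inference from the figures, not a documented definition.

```python
def hint_precision(followed: int, ignored: int, noise: int):
    """Inferred precision: (followed + ignored) / (followed + ignored + noise).

    Returns None when no surfaced hint has been judged at all.
    """
    judged = followed + ignored + noise
    if judged == 0:
        return None
    return (followed + ignored) / judged


# (followed, ignored, noise) copied from the band table above.
bands = {
    "0.50-0.65": (2, 1, 2),
    "0.65-0.70": (8, 0, 1),
    "0.70-0.75": (17, 2, 0),
    "0.75-0.80": (8, 4, 1),
    "0.85-1.00": (0, 1, 0),
}

for band, counts in bands.items():
    print(f"{band}: {hint_precision(*counts):.1%}")
# 0.50-0.65: 60.0%
# 0.65-0.70: 88.9%
# 0.70-0.75: 100.0%
# 0.75-0.80: 92.3%
# 0.85-1.00: 100.0%
```

Under this reading, hints that were surfaced but drew no response are left out of the denominator.
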
| Collection | Surfaced | Followed | Ignored | Noise | Precision |
|---|---|---|---|---|---|
| experience-behavioral | 595 | 8 | 5 | 1 | 92.9% |
| experience-selfqa | 164 | 43 | 4 | 6 | 88.7% |
| experience-principles | 78 | 0 | 1 | 0 | 100.0% |
| static-rules | 0 | 0 | 0 | 2 | 0.0% |
| Framework | Surfaced | Followed | Ignored | Noise | Precision |
|---|---|---|---|---|---|
| unknown | 709 | 41 | 8 | 6 | 89.1% |
| any | 109 | 5 | 2 | 3 | 70.0% |
| react | 8 | 4 | 0 | 0 | 100.0% |
| muonroi-ui-engine | 7 | 1 | 0 | 0 | 100.0% |
| next | 2 | 0 | 0 | 0 | — |
| muonroi-dotnet | 1 | 0 | 0 | 0 | — |
| dotnet | 1 | 0 | 0 | 0 | — |
| Runtime | Surfaced | Followed | Ignored | Noise | Precision |
|---|---|---|---|---|---|
| api | 745 | 22 | 8 | 2 | 93.8% |
| unknown | 56 | 1 | 0 | 2 | 33.3% |
| codex-windows | 26 | 28 | 2 | 5 | 85.7% |
| claude-code | 10 | 0 | 0 | 0 | — |
| Reason | Count |
|---|---|
| wrong_task | 6 |
| wrong_language | 1 |
| wrong_repo | 2 |
| Window | Surfaced | Followed | Ignored | Noise | No response |
|---|---|---|---|---|---|
| 7d | 837 | 51 | 10 | 9 | 767 |
| 30d | 837 | 51 | 10 | 9 | 767 |
| ID | Coll | Tier | Conf | Surf | Ignore% | Framework | Last noise reasons | Principle |
|---|---|---|---|---|---|---|---|---|
| 482890cb | behavioral | T1 | 78.0% | 2045 | 100.0% | — | — | Use Sonnet for GSD execute agents; reserve Opus for verify/r… |
| b74ed6d0 | behavioral | T1 | 78.0% | 1830 | 100.0% | — | — | Sub-agents (Sonnet/Opus general-purpose) sometimes hang at f… |
| 5cd47239 | selfqa | T2 | 70.0% | 1826 | 100.0% | — | — | Human-centric agent personalization — personality model from… |
| a5c9dcf1 | selfqa | T2 | 70.0% | 1168 | 100.0% | — | — | Key findings from v1 pipeline — agents need real tool access… |
| c1463ff1 | behavioral | T1 | 78.0% | 1106 | 100.0% | — | — | User wants GSD skills executed automatically without confirm… |
| ae235b4b | selfqa | T2 | 70.0% | 883 | 100.0% | — | — | Experience Engine current state — Model Router, why+scope sc… |
| 07061c01 | selfqa | T2 | 70.0% | 775 | 100.0% | — | — | Need to create MRuleFlowExecuteController in muonroi-buildin… |
| 630548cd | selfqa | T2 | 70.0% | 678 | 100.0% | — | — | Milestone v1.5 — 2 test projects (Service+Aggregate) with 3 … |
| 597f184e | selfqa | T2 | 70.0% | 581 | 100.0% | — | — | 4-repo open-core SaaS topology under D:\sources\Core — premi… |
| 5bba43ac | selfqa | T2 | 70.0% | 496 | 100.0% | — | — | The nested muonroi-building-block has its OWN .planning (a d… |
| 9df5692b | selfqa | T2 | 70.0% | 319 | 100.0% | — | — | In this GSD workspace, code lives in the nested muonroi-buil… |
| 820ceb97 | behavioral | T1 | 78.0% | 218 | 100.0% | — | — | D:/sources/Core is a workspace folder — each sub-directory i… |
| 0fd9e018 | selfqa | T2 | 70.0% | 193 | 100.0% | — | — | Core Architectural & Domain Decisions - Tab-based Port Area … |
| 270911d9 | behavioral | T1 | 78.0% | 148 | 100.0% | — | — | test gemini rule |
| 6ff5efeb | selfqa | T2 | 70.0% | 131 | 100.0% | — | — | Correct path to edit/commit tcis.tos.common is D:\\sources\\… |
| e9e823bc | selfqa | T2 | 70.0% | 124 | 100.0% | — | — | Muonroi.Pdf product charter — commercial open-core HTML/CSS→… |
| 6520e3c5 | behavioral | T1 | 78.0% | 98 | 100.0% | — | unspecified | Prefer dispatching plan-writing, research, and other large-c… |
| 7cd0dc3f | behavioral | T1 | 78.0% | 93 | 100.0% | — | — | Strict rules for E2E testing muonroi UI engine library via e… |
| 79be8b3c | behavioral | T1 | 78.0% | 82 | 100.0% | — | — | User requires deep understanding of Muonroi ecosystem before… |
| fcde19e6 | selfqa | T2 | 70.0% | 76 | 100.0% | — | — | E2E UI/UX audit findings from Playwright capture — 13 issues… |