Findings (Plain-English)

Market Synthesis was evaluated on 97 historical macro-financial dislocation events.
The system reached 91.8% headline accuracy, rising to 92.8% adjusted accuracy after one label correction.
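The two percentages are consistent with a single event moving from "miss" to "hit" out of 97. The sketch below is only an arithmetic check: the counts 89 and 90 are inferred from the reported percentages, not taken from the results files, and the names `TOTAL_EVENTS` and `accuracy_pct` are illustrative.

```python
TOTAL_EVENTS = 97  # from the evaluation set size stated above


def accuracy_pct(correct, total=TOTAL_EVENTS):
    """Return accuracy as a percentage rounded to one decimal place."""
    return round(100 * correct / total, 1)


# Inferred counts (assumption): 89 is the only whole number of correct
# calls out of 97 that rounds to 91.8%.
headline_correct = 89
adjusted_correct = headline_correct + 1  # one label correction

print(accuracy_pct(headline_correct))  # 91.8
print(accuracy_pct(adjusted_correct))  # 92.8
```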

What worked

  1. Prompt simplification fixed truncation problems.
    Flat JSON with short fields eliminated output cut-offs and improved reliability.
  2. Confidence anchors improved scoring.
    Explicit score bands improved calibration and directional consistency.
  3. Role-specific model assignment improved outcomes.
    Using Haiku as debater and Sonnet as judge outperformed an all-Sonnet configuration in this setup.
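To illustrate the first two points, here is a minimal sketch of what a flat JSON reply and explicit score bands might look like. It is not the project's actual schema or prompt: the field names (`direction`, `confidence`, `rationale`), the band boundaries, and the helper names are all hypothetical, and integer scores from 0 to 100 are assumed.

```python
import json

# Hypothetical score bands (assumption); the real anchors are not listed here.
CONFIDENCE_BANDS = [(0, 39, "low"), (40, 69, "medium"), (70, 100, "high")]


def parse_flat_response(raw):
    """Parse a model reply and reject anything that is not a flat JSON object."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    for key, value in data.items():
        # Nested objects or arrays make replies longer and easier to truncate.
        if isinstance(value, (dict, list)):
            raise ValueError(f"field {key!r} is nested; expected a flat value")
    return data


def confidence_band(score):
    """Map an integer confidence score (0-100) to its named band."""
    for low, high, label in CONFIDENCE_BANDS:
        if low <= score <= high:
            return label
    raise ValueError(f"score {score} is outside the 0-100 range")


# Hypothetical reply, for illustration only.
reply = '{"direction": "down", "confidence": 72, "rationale": "credit spreads widening"}'
parsed = parse_flat_response(reply)
print(confidence_band(parsed["confidence"]))  # high
```

Rejecting nested values at parse time keeps the output contract simple: any reply that grows beyond short, top-level fields fails fast instead of being silently accepted.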

Calibration and risk profile

False positives cluster around near-miss stress events, not fully quiet markets.

Where it struggled

Common miss categories:

For full narrative and event-by-event discussion, see the repository analysis: development/results/FINDINGS.md