Technical Summary
Architecture
Market Synthesis uses adversarial orchestration:
- Advocate (Haiku 4.5) argues for dislocation.
- Counterpoint (Haiku 4.5) argues against.
- Judge (Sonnet 4.5) synthesizes arguments and outputs:
  - direction (`long`, `short`, `neutral`)
  - confidence score (0–100)
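The three-role flow above can be sketched as follows. This is a minimal illustration, not the code in `adversarial_detector.py`: the `ModelFn` callable, the prompt wording, the `Verdict` type, and the `direction,confidence` reply format are all assumptions made for the example, and the model calls are left as injected functions so the sketch runs without any API client.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical signature: (role_prompt, input_text) -> model reply text.
ModelFn = Callable[[str, str], str]

VALID_DIRECTIONS = {"long", "short", "neutral"}


@dataclass
class Verdict:
    direction: str   # one of long, short, neutral
    confidence: int  # 0-100


def parse_verdict(reply: str) -> Verdict:
    """Parse a judge reply of the assumed form 'direction,confidence'."""
    direction, confidence = (part.strip() for part in reply.split(","))
    direction = direction.lower()
    score = int(confidence)
    if direction not in VALID_DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    if not 0 <= score <= 100:
        raise ValueError(f"confidence out of range: {score}")
    return Verdict(direction, score)


def run_debate(event: str, advocate: ModelFn, counterpoint: ModelFn,
               judge: ModelFn) -> Verdict:
    """Run one Advocate / Counterpoint / Judge round for a single event."""
    case_for = advocate("Argue that this event signals a dislocation.", event)
    case_against = counterpoint(
        "Argue that this event does not signal a dislocation.", event
    )
    # The judge sees the event plus both arguments, then issues the verdict.
    briefing = f"EVENT:\n{event}\n\nFOR:\n{case_for}\n\nAGAINST:\n{case_against}"
    reply = judge("Weigh both arguments. Reply as direction,confidence.", briefing)
    return parse_verdict(reply)
```

Passing the models in as plain callables keeps the orchestration testable with stubs; a real client would be wrapped to match the same hypothetical signature.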
Evaluation setup
- Dataset: 97 historical events (1971–2024)
- Additional negatives: 19 quiet-period events
- Thresholding: a prediction counts as actionable only when its confidence falls in the high-confidence range
- Metrics: direction accuracy, threshold accuracy, Brier score, FPR on negatives
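For reference, three of the listed metrics might be computed roughly as below. This is a sketch under stated assumptions rather than the logic in `backtest.py`: the function names are hypothetical, the `threshold` parameter stands in for the unspecified high-confidence cutoff, and the Brier score assumes each confidence score has already been converted to a probability paired with a 0/1 outcome.

```python
from typing import Sequence


def direction_accuracy(predicted: Sequence[str], actual: Sequence[str]) -> float:
    """Fraction of events whose predicted direction matches the realized one."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


def brier_score(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean squared difference between forecast probability and 0/1 outcome."""
    if len(probabilities) != len(outcomes) or not outcomes:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(probabilities, outcomes)) / len(outcomes)


def false_positive_rate(confidences: Sequence[int], threshold: int) -> float:
    """Share of quiet-period (negative) events flagged as actionable.

    `threshold` is an assumed parameter; the source does not state its value.
    """
    if not confidences:
        raise ValueError("confidences must be non-empty")
    return sum(c >= threshold for c in confidences) / len(confidences)
```

Lower is better for both the Brier score and the false positive rate; a forecaster that always outputs 0.5 scores 0.25 on the Brier metric, which gives a rough baseline for reading the reported value.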
Run snapshot
From the latest full-run analysis:
| Metric | Value |
|---|---|
| Headline accuracy | 91.8% |
| Adjusted accuracy | 92.8% |
| Brier score | 0.047 |
| Quiet-period FPR (all 19) | 36.8% |
| Quiet-period FPR (boring subset) | ~15% |
Key technical insights
- Output schema design is performance-critical. Overly nested response formats caused truncation and degraded performance.
- Score rubrics improve confidence calibration. Explicit score anchors reduced under-confidence and improved decision usefulness.
- Model personality-role fit matters. Decisive models for advocacy and nuanced models for adjudication worked best.
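The first two insights can be illustrated with a small sketch: a flat two-key reply format that fails loudly when the output is cut off, and a rubric rendered into prompt lines. Everything here is hypothetical. The source does not give the actual schema, the anchor wording, or the anchor scores, so the `RUBRIC` entries and both function names are placeholders.

```python
import json
from typing import Mapping

# Hypothetical anchors; the actual rubric wording and cut points are not in the source.
RUBRIC = {
    0: "no evidence of dislocation",
    50: "arguments evenly balanced",
    100: "overwhelming, uncontested evidence",
}


def rubric_text(anchors: Mapping[int, str]) -> str:
    """Render score anchors as prompt lines, lowest score first."""
    return "\n".join(
        f"{score}: {meaning}" for score, meaning in sorted(anchors.items())
    )


def parse_flat_reply(raw: str) -> dict:
    """Parse a flat two-key JSON reply; truncated output raises an error."""
    data = json.loads(raw)  # raises json.JSONDecodeError (a ValueError) if cut off
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = {"direction", "confidence"} - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    return {"direction": data["direction"], "confidence": int(data["confidence"])}
```

With only two top-level keys, a reply that stops early tends to fail JSON parsing outright instead of yielding a partially filled nested object, which makes truncation easier to detect and retry.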
Source files
- development/detectors/adversarial_detector.py
- development/shared/bedrock_client.py
- development/tests/backtest.py
- development/tests/experiment_runner.py
- development/results/run3_analysis.md