# Creating Markets That Score Well
A practical overview of what makes a high-quality prediction market definition, and how to consistently achieve top ratings.
## The Core Principle
A well-defined market is one that an LLM agent can resolve unambiguously by reading web pages. Every field in the market definition should work toward that goal: a clear extraction prompt, authoritative sources that will contain the answer, and expected results that cover the realistic outcomes.
## 1. Write Extraction Prompts, Not Yes/No Questions
The prompt should ask the agent to extract a specific piece of data from the source pages. Avoid yes/no framing — it's less verifiable and more ambiguous.
- **Avoid:** "Did the Celtics win the 2024 NBA Finals?" with `expected_results: ["Yes"]` (fragile; no way to verify what "Yes" means)
- **Prefer:** "Who won the 2024 NBA Finals?" with `expected_results: ["Boston Celtics", "Dallas Mavericks"]` (verifiable and exhaustive)
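Putting the preferred framing into a full definition, a minimal sketch might look like the following. Only `expected_results`, `answer_type`, `min_agreement`, `resolution_start`, and `resolution_end` are field names taken from this guide; `"prompt"` and `"sources"` are illustrative stand-ins for however your schema names them.

```python
# Minimal market-definition sketch. Field names "prompt" and "sources"
# are assumptions; the rest are named in this guide.
market = {
    "prompt": "Who won the 2024 NBA Finals?",  # extraction-style, not yes/no
    "expected_results": ["Boston Celtics", "Dallas Mavericks"],
    "answer_type": "string",
    "sources": [
        "https://www.nba.com/playoffs",
        "https://www.espn.com/nba/playoffs",
    ],
    "min_agreement": 2,
    "resolution_start": "2024-06-18T08:00:00Z",
    "resolution_end": "2024-06-25T08:00:00Z",
}
```

Note that the prompt asks *who won*, and `expected_results` lists both finalists, so whichever team wins, the extracted answer has an exact value to match.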
For more detail, see Writing Good Prompts.
## 2. List All Plausible Outcomes
The expected_results field should contain every realistic outcome the market could resolve to. This gives the resolution agent a clear set of values to match against.
| Scenario | Recommended approach |
|---|---|
| Sports final | List both teams: `["Kansas City Chiefs", "San Francisco 49ers"]` |
| Award ceremony | List all nominees: `["Oppenheimer", "Poor Things", "Killers of the Flower Moon", ...]` |
| Numeric threshold | Use `answer_type: "number"` with a comparison operator: `expected_results: [100000]`, `comparison_operator: ">"` |
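For the numeric case, matching works by applying the comparison operator to the value the agent extracts. A sketch of that logic, with the operator strings beyond `">"` assumed rather than documented:

```python
import operator

# Hypothetical helper: only ">" appears in this guide; the other
# operator strings are assumptions.
OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt,
       "<=": operator.le, "==": operator.eq}

def matches_threshold(extracted: float, expected: float,
                      comparison_operator: str) -> bool:
    """Apply the market's comparison_operator to the extracted value."""
    return OPS[comparison_operator](extracted, expected)
```

So with `expected_results: [100000]` and `comparison_operator: ">"`, an extracted value of 120,000 resolves the market to true.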
## 3. Choose Sources That Contain the Answer
Sources must be pages the resolution agent can actually fetch and read. They need to contain the specific information required — not just be "about" the topic in general.
For the full breakdown, see Source Selection.
## 4. Avoid Blocklisted Domains
Some domains are blocklisted due to unreliable data, retroactive content changes, or frequent availability issues. Using a blocklisted source will cause a gate failure and your market will score 0.
Common domains to avoid include sites known for manipulated data, spam domains, and sources with poor reliability track records. See the Source Blocklisted criterion docs for details.
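A pre-submission check against the blocklist can be as simple as comparing each source's domain. The blocklist contents below are placeholders; the real list lives in the Source Blocklisted criterion docs.

```python
from urllib.parse import urlparse

# Stand-in blocklist; substitute the real domains from the criterion docs.
BLOCKLIST = {"example-spam-site.com", "unreliable-data.net"}

def blocklisted_sources(source_urls: list[str]) -> set[str]:
    """Return any blocklisted domains found among the source URLs."""
    hits = set()
    for url in source_urls:
        domain = urlparse(url).netloc.lower().removeprefix("www.")
        if domain in BLOCKLIST:
            hits.add(domain)
    return hits
```

Running this before submission is cheap insurance: a single blocklisted source is a gate failure that scores the whole market 0.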
## 5. Use Sources With Track Records
Sources that have been used successfully in previously resolved markets score higher on the Source History criterion. Sources with 10+ successful resolutions get full marks; new sources start at 0%.
Stick to well-known, established data providers that others have used before. Official league websites, major news outlets, and authoritative data aggregators tend to have the best history scores.
## 6. Verify Sources Are Accessible
Every source URL must be reachable and return readable content. If any source fails the Source Reachability gate, your market scores 0.
- Test each URL in a browser before submitting
- Supported formats: HTML, JSON, plain text, PDF
- No login walls, CAPTCHAs, or JavaScript-only rendering
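These checks can also be scripted. The sketch below treats a 200 response with one of the supported content types as readable; the `check_source` helper and the exact type strings are assumptions, not the gate's actual implementation.

```python
from urllib.request import Request, urlopen

# Assumed mapping of "HTML, JSON, plain text, PDF" to content types.
READABLE_TYPES = ("text/html", "application/json", "text/plain",
                  "application/pdf")

def looks_readable(status: int, content_type: str) -> bool:
    """True if the response status and content type suggest readable content."""
    media_type = content_type.split(";")[0].strip().lower()
    return status == 200 and media_type in READABLE_TYPES

def check_source(url: str) -> bool:
    """Hypothetical pre-flight check for one source URL."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as resp:
        return looks_readable(resp.status, resp.headers.get("Content-Type", ""))
```

A script like this will not catch everything (JavaScript-only pages can still return 200 with `text/html`), so a manual browser check remains worthwhile.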
## 7. Set Realistic Agreement Thresholds
The min_agreement value should balance reliability against practicality. Setting it too high risks resolution failure if one source is down; setting it too low reduces confidence.
| Sources | Recommended min_agreement |
|---|---|
| 2 | 2 |
| 3-4 | 2-3 |
| 5+ | 3-4 (majority) |
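One way to encode the table above as a heuristic: require at least two agreeing sources, and a simple majority once there are five or more. This is one reasonable reading of the recommended ranges, not a prescribed formula.

```python
def recommended_min_agreement(num_sources: int) -> int:
    """Suggest a min_agreement value per the guide's table (one reading of it)."""
    if num_sources < 2:
        raise ValueError("markets need at least two sources")
    if num_sources >= 5:
        return max(2, (num_sources // 2) + 1)  # simple majority
    return 2
```

For example, four sources yields 2 (the low end of the 2-3 range, tolerating two unavailable sources), while six sources yields a majority of 4.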
## 8. Get the Timing Right
The resolution window (resolution_start to resolution_end) must cover the period when sources will actually contain the result.
- Set `resolution_start` after the event concludes, never before it happens
- Allow enough time for sources to publish results (at least a few hours)
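These timing rules can be checked before submission. A sanity-check sketch, assuming datetime inputs and a publication lag of a few hours (the three-hour default is an assumption based on the "at least a few hours" guidance):

```python
from datetime import datetime, timedelta

def validate_window(event_end: datetime,
                    resolution_start: datetime,
                    resolution_end: datetime,
                    publication_lag: timedelta = timedelta(hours=3)) -> list[str]:
    """Return a list of timing problems; empty means the window looks sane."""
    problems = []
    if resolution_start < event_end:
        problems.append("resolution_start is before the event concludes")
    if resolution_start < event_end + publication_lag:
        problems.append("window opens before sources can publish results")
    if resolution_end <= resolution_start:
        problems.append("resolution_end must be after resolution_start")
    return problems
```

Calling this with a window that opens four hours after the event and closes a week later returns no problems; opening the window before the event flags the first two checks.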
Each of these principles maps directly to one or more of Caliber's rating criteria. A market that follows all of them will naturally score well across the board. If you're unsure about a specific aspect, the individual criteria pages explain exactly how each dimension is scored and what thresholds to aim for.