
Creating Markets That Score Well

A practical overview of what makes a high-quality prediction market definition, and how to consistently achieve top ratings.


The Core Principle

A well-defined market is one that an LLM agent can resolve unambiguously by reading web pages. Every field in the market definition should work toward that goal: a clear extraction prompt, authoritative sources that will contain the answer, and expected results that cover the realistic outcomes.

1. Write Extraction Prompts, Not Yes/No Questions

The prompt should ask the agent to extract a specific piece of data from the source pages. Avoid yes/no framing — it's less verifiable and more ambiguous.

Weak

"Did the Celtics win the 2024 NBA Finals?"

expected_results: ["Yes"] — fragile, no way to verify what "Yes" means

Strong

"Who won the 2024 NBA Finals?"

expected_results: ["Boston Celtics", "Dallas Mavericks"] — verifiable, exhaustive

For more detail, see Writing Good Prompts.
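As a rough illustration of the difference, the sketch below flags prompts that look like yes/no questions. It is a heuristic only, not part of any official tooling: the function name `looks_like_yes_no` and the list of opening words are assumptions made for this example.

```python
import re

# Assumption: a prompt that opens with an auxiliary verb is usually a yes/no question.
_YES_NO_OPENERS = re.compile(
    r"^\s*(did|does|do|will|is|are|was|were|has|have|had|can|could|should|would)\b",
    re.IGNORECASE,
)


def looks_like_yes_no(prompt: str, expected_results: list) -> bool:
    """Return True if the prompt or its expected results suggest yes/no framing."""
    answers = {str(r).strip().lower() for r in expected_results}
    only_yes_no = bool(answers) and answers <= {"yes", "no"}
    return bool(_YES_NO_OPENERS.match(prompt)) or only_yes_no
```

Run against the two examples above, the weak prompt is flagged and the strong one is not. A check like this can catch the obvious cases before submission, but it will miss yes/no questions phrased in other ways.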

2. List All Plausible Outcomes

The expected_results field should contain every realistic outcome the market could resolve to. This gives the resolution agent a clear set of values to match against.

Sports final

List both teams: ["Kansas City Chiefs", "San Francisco 49ers"]

Award ceremony

List all nominees: ["Oppenheimer", "Poor Things", "Killers of the Flower Moon", ...]

Numeric threshold

Use answer_type "number" with a comparison operator: expected_results: [100000], comparison_operator: ">"
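To make the numeric case concrete, here is a minimal sketch of how a threshold comparison could be evaluated once a number has been extracted. The field names `expected_results` and `comparison_operator` come from this guide; the function `threshold_met` and every operator other than ">" are assumptions, and the platform's actual evaluation may differ.

```python
import operator

# Assumption: only ">" appears in the guide; the other operators are guesses.
_OPERATORS = {
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
}


def threshold_met(extracted: float, expected_results: list, comparison_operator: str) -> bool:
    """Compare an extracted number against the single threshold in expected_results."""
    if len(expected_results) != 1:
        raise ValueError("this sketch expects exactly one threshold value")
    if comparison_operator not in _OPERATORS:
        raise ValueError(f"unsupported operator: {comparison_operator!r}")
    return _OPERATORS[comparison_operator](extracted, expected_results[0])
```

With `expected_results: [100000]` and `comparison_operator: ">"`, an extracted value of 120000 passes and a value of exactly 100000 does not, which is worth keeping in mind when choosing between strict and inclusive thresholds.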

3. Choose Sources That Contain the Answer

Sources must be pages the resolution agent can actually fetch and read. They need to contain the specific information required — not just be "about" the topic in general.

  • 4+ sources for the highest source count scores
  • No paywalls: the agent cannot log in
  • Diverse providers: don't use 4 sites that all pull from the same API
  • Stable URLs: no session tokens or temporary links

For the full breakdown, see Source Selection.
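Some of these checks can be approximated before submission. The sketch below is one possible pre-flight check, not the scoring logic itself: `source_warnings` and the `SUSPICIOUS_PARAMS` names are hypothetical, and comparing hostnames is only a rough proxy for provider diversity, since two different sites can still pull from the same API.

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical query parameter names that often indicate a temporary or session-bound link.
SUSPICIOUS_PARAMS = {"session", "sessionid", "sid", "token", "expires", "signature"}


def source_warnings(urls: list[str], min_sources: int = 4) -> list[str]:
    """Return human-readable warnings about source count, diversity, and URL stability."""
    warnings = []
    if len(urls) < min_sources:
        warnings.append(f"only {len(urls)} sources; {min_sources}+ score highest on source count")

    # Treat "www.example.com" and "example.com" as the same provider.
    hosts = [(urlparse(u).hostname or "").removeprefix("www.") for u in urls]
    if len(set(hosts)) < len(hosts):
        warnings.append("some sources share a hostname; providers may not be diverse")

    for url in urls:
        query = parse_qs(urlparse(url).query, keep_blank_values=True)
        flagged = sorted({name.lower() for name in query} & SUSPICIOUS_PARAMS)
        if flagged:
            warnings.append(f"{url} has possibly unstable parameters: {', '.join(flagged)}")
    return warnings
```

An empty list means none of these heuristics fired. It does not guarantee the sources actually contain the answer, which still has to be checked by reading each page.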

4. Avoid Blocklisted Domains

Some domains are blocklisted due to unreliable data, retroactive content changes, or frequent availability issues. Using a blocklisted source will cause a gate failure and your market will score 0.

Common domains to avoid include sites known for manipulated data, spam domains, and sources with poor reliability track records. See the Source Blocklisted criterion docs for details.
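If you keep your own copy of the blocklisted domains, a check along the following lines can catch a gate failure early. This is a sketch under assumptions: the function `blocklisted_sources` is hypothetical, the blocklist is supplied by the caller, and matching subdomains of a blocklisted domain is a guess at how the gate behaves.

```python
from urllib.parse import urlparse


def blocklisted_sources(urls: list[str], blocklist: set[str]) -> list[str]:
    """Return the URLs whose host equals, or is a subdomain of, a blocklisted domain."""
    hits = []
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        # Match "spam.example" and "data.spam.example", but not "notspam.example".
        if any(host == d or host.endswith("." + d) for d in blocklist):
            hits.append(url)
    return hits
```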

5. Use Sources With Track Records

Sources that have been used successfully in previously resolved markets score higher on the Source History criterion. Sources with 10+ successful resolutions get full marks; new sources start at 0%.

Stick to well-known, established data providers that others have used before. Official league websites, major news outlets, and authoritative data aggregators tend to have the best history scores.

6. Verify Sources Are Accessible

Every source URL must be reachable and return readable content. If any source fails the Source Reachability gate, your market scores 0.

  • Test each URL in a browser before submitting
  • Supported formats: HTML, JSON, plain text, PDF
  • No login walls, CAPTCHAs, or JavaScript-only rendering
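Part of this check can be scripted once you have a response's status code and Content-Type header. The sketch below assumes the four supported formats map to the MIME types listed in `SUPPORTED_TYPES`; that mapping and the function name `is_readable_response` are assumptions for illustration.

```python
# Assumption: these MIME types stand in for "HTML, JSON, plain text, PDF".
SUPPORTED_TYPES = {"text/html", "application/json", "text/plain", "application/pdf"}


def is_readable_response(status: int, content_type: str) -> bool:
    """True for a 2xx response whose Content-Type is one of the supported formats."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    mime = content_type.split(";", 1)[0].strip().lower()
    return 200 <= status < 300 and mime in SUPPORTED_TYPES
```

A passing result here is necessary but not sufficient: a page can return 200 with `text/html` and still show a CAPTCHA or render its content only through JavaScript, so the manual browser test remains useful.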

7. Set Realistic Agreement Thresholds

The min_agreement value should balance reliability against practicality. Setting it too high risks resolution failure if one source is down; setting it too low reduces confidence.

Sources    Recommended min_agreement
2          2
3-4        2-3
5+         3-4 (majority)
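The recommendations in the table can be expressed as a small lookup. The function name `recommended_min_agreement` is hypothetical, and the sketch simply mirrors the table's ranges rather than any logic the platform is known to use.

```python
def recommended_min_agreement(source_count: int) -> tuple[int, int]:
    """Return the (low, high) recommended min_agreement range for a source count."""
    if source_count < 2:
        raise ValueError("the table starts at 2 sources")
    if source_count == 2:
        return (2, 2)
    if source_count <= 4:
        return (2, 3)
    return (3, 4)
```

For example, a market with 4 sources would use a `min_agreement` of 2 or 3, so that resolution can still succeed if one source is down.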

8. Get the Timing Right

The resolution window (resolution_start to resolution_end) must cover the period when sources will actually contain the result.

  • Start after the event concludes
  • Allow enough time for sources to publish results (at least a few hours)
  • Don't set resolution_start before the event happens
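These timing rules can be checked mechanically. In the sketch below, `resolution_start` and `resolution_end` are the fields named in this guide, while `event_end`, `publish_buffer`, and the function `timing_problems` are hypothetical. The default buffer of 3 hours is an assumed reading of "at least a few hours".

```python
from datetime import datetime, timedelta, timezone


def timing_problems(
    event_end: datetime,
    resolution_start: datetime,
    resolution_end: datetime,
    publish_buffer: timedelta = timedelta(hours=3),  # assumed value for "a few hours"
) -> list[str]:
    """Return a list of problems with the resolution window; empty means none found."""
    problems = []
    if resolution_start < event_end:
        problems.append("resolution_start is before the event concludes")
    if resolution_end <= resolution_start:
        problems.append("resolution_end is not after resolution_start")
    if resolution_end < event_end + publish_buffer:
        problems.append("window closes before sources are likely to have published")
    return problems
```

Use timezone-aware datetimes for all three values, for example `datetime(2024, 6, 18, 3, 0, tzinfo=timezone.utc)`, since comparing aware and naive datetimes raises a `TypeError` in Python.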

Each of these principles maps directly to one or more of Caliber's rating criteria. A market that follows all of them will naturally score well across the board. If you're unsure about a specific aspect, the individual criteria pages explain exactly how each dimension is scored and what thresholds to aim for.