# Caliber

> Institutional-grade ratings for prediction markets

Caliber is a market rating system that evaluates prediction market definitions using both static rules and LLM-based semantic analysis. It assigns credit-rating-style grades (Aaa to C) based on market quality criteria.

## Creating High-Quality Markets

A well-defined market is one that an LLM agent can resolve unambiguously by reading web pages. Follow these principles:

### Write extraction prompts, not yes/no questions

Frame prompts to extract a factual value. List all plausible outcomes in expected_results.

- Bad: "Did the Celtics win the 2024 NBA Finals?" with expected_results: ["Yes"]
- Good: "Who won the 2024 NBA Finals?" with expected_results: ["Boston Celtics", "Dallas Mavericks"]

Yes/no markets are a last resort: they are fragile and hard to verify.

### List all plausible outcomes

expected_results should cover every realistic result:

- Sports final: both teams
- Award ceremony: all nominees
- Numeric threshold: use answer_type "number" with a comparison_operator

### Choose sources that contain the answer

- Provide at least 4 sources; 5 or more earn the maximum source_count score
- Sources must be publicly accessible (no paywalls, no login); source_reachability is a gate criterion, so a failure means a score of 0
- Sources must directly contain the specific data needed, not just be "about" the topic (source_relevancy, 20% weight)
- Use diverse providers (don't use 4 sites pulling from the same upstream API)
- Use stable URLs (no session tokens)
- Supported content types: HTML, JSON, plain text, PDF
- Test each URL before submitting: if any returns an error, the entire market fails

### Avoid blocklisted domains

source_blocklisted is a gate criterion: using a blocklisted domain means a score of 0. Avoid .gov domains, reuters.com, bloomberg.com, wsj.com, nytimes.com (paywalled), wikipedia.org, and any domains known for unreliable or manipulated data.
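Because a single blocklisted source zeroes the score, it can help to screen source URLs locally before submitting. The following is a minimal sketch, not Caliber's actual check: the helper names `is_blocklisted` and `find_blocklisted` are hypothetical, and the domain set contains only the domains named above, so the real blocklist may well be longer.

```python
from urllib.parse import urlparse

# Assumption: only the domains named in this document; the real list may differ.
BLOCKED_DOMAINS = {
    "reuters.com",
    "bloomberg.com",
    "wsj.com",
    "nytimes.com",
    "wikipedia.org",
}
BLOCKED_SUFFIXES = (".gov",)


def is_blocklisted(url: str) -> bool:
    """Return True if the URL's host is a listed domain, a subdomain of one, or ends in .gov."""
    host = (urlparse(url).hostname or "").lower()
    if host.endswith(BLOCKED_SUFFIXES):
        return True
    return any(host == d or host.endswith("." + d) for d in BLOCKED_DOMAINS)


def find_blocklisted(urls: list[str]) -> list[str]:
    """Return the subset of URLs that would trip the local blocklist check."""
    return [u for u in urls if is_blocklisted(u)]
```

Running `find_blocklisted` over a draft's source_urls and replacing every URL it returns is a cheap first pass; it does not test reachability or relevancy.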
### Use sources with track records

Sources that have been used in previously resolved markets score higher on source_history (10% weight). Sources with 10+ successful resolutions get full marks; new or unknown sources start at 0%. Stick to well-known official league sites, major news outlets, and established data aggregators.

### Set realistic agreement thresholds

min_agreement should balance reliability against practicality (source_agreement, 20% weight):

- Score formula: min(100, (agreement% / 80) × 100), so 80% agreement earns a perfect score
- 2 sources: min_agreement 2 (100% agreement → score 100)
- 3-4 sources: min_agreement 2-3
- 5 sources: min_agreement 4 (80% agreement → score 100)

### Get the timing right

temporal_soundness (10% weight) checks whether sources will contain the answer during the resolution window:

- resolution_start should be after the event concludes
- Allow time for sources to publish results
- Use sources with date/time indicators in URLs when possible
- Don't set resolution_start before the event happens

## API

Base URL: https://api.caliberratings.xyz

Full OpenAPI specification: https://api.caliberratings.xyz/openapi.yaml

### Endpoints

- GET /v0/ratings/bands - List rating bands (Aaa to C with score ranges)
- GET /v0/ratings/criteria - List rating criteria and weights
- POST /v0/ratings/markets - Rate a market definition
- GET /v0/ratings/markets/{id} - Get a previously rated market by ID
- GET /v0/ratings/markets/latest - List recent ratings (cursor pagination via ?cursor=&limit=)
- GET /v0/ratings/bands/{bandId}/badge.svg - SVG badge for a rating band (?size, ?theme, ?width, ?height)
- GET /v0/ratings/markets/{id}/badge.svg - SVG badge for a rated market (?size, ?theme, ?width, ?height)

### Rating Bands

| Rating | Score Range | Definition |
|--------|-------------|------------|
| Aaa | 90-100 | Exceptional definition quality; highly reliable and unambiguous |
| Aa | 80-89 | Very strong; minor weaknesses |
| A | 70-79 | Strong; some limitations |
| Baa | 60-69 | Adequate; moderate ambiguity or risk |
| Ba | 50-59 | Speculative; notable weaknesses |
| B | 35-49 | Weak; high risk of poor resolution |
| Caa | 20-34 | Very weak; high likelihood of problematic resolution |
| Ca | 10-19 | Highly unreliable; severe structural issues |
| C | 0-9 | Structurally broken or guaranteed to fail resolution |

### Market Definition Schema

```json
{
  "prompt": "string (required) - The extraction question to resolve",
  "answer_type": "string | number (required)",
  "expected_results": "string[] | number[] (required) - All plausible outcome values, homogeneous type",
  "comparison_operator": "== | != | > | < | >= | <= (optional, default ==)",
  "source_urls": ["string[] (required) - URLs the agent will fetch for resolution data"],
  "min_agreement": "integer (required) - Minimum sources that must agree",
  "resolution_start": "ISO datetime (required) - When resolution period begins",
  "resolution_end": "ISO datetime (required) - When resolution period ends"
}
```

### Rating Criteria

Gate criteria (must pass):

- source_reachability - URLs must be accessible
- source_blocklisted - No blocklisted sources

Score criteria (weighted):

- source_count (20%) - Number of sources provided (5+ for max score)
- source_agreement (20%) - Agreement ratio (80%+ agreement = perfect score)
- source_relevancy (20%) - Whether sources contain the specific data needed
- source_history (10%) - Track record of source domains in previous markets
- prompt_subjectivity (10%) - Prompt clarity, specificity, and objectivity
- temporal_soundness (10%) - Whether sources will have data during resolution window

### Rate Limits

The POST /v0/ratings/markets endpoint is rate limited to 10 requests per minute per IP address. Exceeding this limit returns a 429 status code. Wait at least 60 seconds before retrying. If you need higher limits, contact @Somnia_Network on X (https://x.com/Somnia_Network).
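The retry guidance above can be sketched as a small client wrapper. This is an illustrative sketch using only the Python standard library, not an official client: `send_market` and `rate_with_retry` are hypothetical names, and the choice of three attempts is an assumption. Only the endpoint URL, the 429 status, and the 60-second wait come from this document.

```python
import json
import time
import urllib.error
import urllib.request

API_URL = "https://api.caliberratings.xyz/v0/ratings/markets"


def send_market(market: dict) -> tuple[int, str]:
    """POST one market definition and return (status_code, response_text)."""
    request = urllib.request.Request(
        API_URL,
        data=json.dumps(market).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.status, response.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # Non-2xx responses (including 429) arrive as HTTPError.
        return err.code, err.read().decode("utf-8")


def rate_with_retry(market, send=send_market, max_attempts=3,
                    wait_seconds=60, sleep=time.sleep):
    """Call send(market); on a 429, wait wait_seconds and try again.

    max_attempts=3 is an assumption for illustration. Returns the last
    (status_code, response_text) pair, which may still be a 429.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send(market)
        if status != 429:
            return status, body
        if attempt < max_attempts:
            sleep(wait_seconds)
    return status, body
```

The `send` and `sleep` parameters are injectable so the retry logic can be exercised without network access or real waiting.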
### Example Request

```bash
curl -X POST https://api.caliberratings.xyz/v0/ratings/markets \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Who won the 2026 Australian Grand Prix?",
    "answer_type": "string",
    "expected_results": ["Oscar Piastri", "Max Verstappen", "Lando Norris"],
    "comparison_operator": "==",
    "source_urls": [
      "https://www.formula1.com/en/results/2026/races/1279/australia/race-result",
      "https://racingnews365.com/2026-f1-australian-grand-prix-results",
      "https://www.bbc.com/sport/formula1/2026-australian-grand-prix-results",
      "https://www.espn.com/f1/race/_/id/2026-australia"
    ],
    "min_agreement": 3,
    "resolution_start": "2026-03-23T06:00:00Z",
    "resolution_end": "2026-03-24T00:00:00Z"
  }'
```

## Generating Markets with AI

To generate market definitions with an LLM, use these three components:

### System Prompt

Pass this as the system message:

```
You generate prediction market definitions based on context provided by the user (e.g. a news story, event details, or a topic).

Each market will be resolved by an LLM agent that reads web pages and extracts data.

Key rules:
- **Extraction prompts**: Ask the agent to extract a specific fact. NEVER use yes/no questions.
  Good: "Who won the 2024 NBA Finals?" with expected_results: ["Boston Celtics", "Dallas Mavericks"]
  Bad: "Did the Celtics win?" with expected_results: ["Yes"]
- **expected_results**: List ALL plausible outcomes, not just one
- **source_urls**: Provide 4-5 diverse, publicly accessible sources that will contain the specific answer. No paywalls, no wikipedia.
- **min_agreement**: Set to majority of sources (e.g. 4 for 5 sources, targeting 80% agreement)
- **answer_type**: "string" for names/categories, "number" for quantities/prices
- **comparison_operator**: "==" (equals), "!=" (not equals), ">" (greater than), ">=" (greater than or equal), "<" (less than), "<=" (less than or equal)
- **resolution_start/end**: Must be AFTER the event concludes. Allow time for sources to publish results.
```

### Response Schema

Pass this as the JSON schema for structured output:

```json
{
  "type": "object",
  "properties": {
    "markets": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "prompt": { "type": "string" },
          "answer_type": { "type": "string", "enum": ["string", "number"] },
          "expected_results": {
            "oneOf": [
              { "type": "array", "items": { "type": "string" }, "minItems": 1 },
              { "type": "array", "items": { "type": "number" }, "minItems": 1 }
            ]
          },
          "comparison_operator": {
            "type": "string",
            "enum": ["==", "!=", ">", ">=", "<", "<="]
          },
          "source_urls": { "type": "array", "items": { "type": "string" }, "minItems": 4 },
          "min_agreement": { "type": "number" },
          "resolution_start": { "type": "string" },
          "resolution_end": { "type": "string" }
        },
        "required": [
          "prompt",
          "answer_type",
          "expected_results",
          "comparison_operator",
          "source_urls",
          "min_agreement",
          "resolution_start",
          "resolution_end"
        ]
      }
    }
  },
  "required": ["markets"]
}
```

### User Prompt

Replace the placeholder with your context (a news article, event details, etc.) and send as the user message:

```
Generate one or more prediction market definitions based on the following context. Return them as a JSON array under a "markets" key.

{{PASTE YOUR CONTEXT HERE, e.g. a news article, event details, or topic description}}
```

## Website

The website provides an interactive form to test market definitions and view detailed rating breakdowns with criteria scores and evidence.
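For a rough offline estimate before using the form or the API, the source_agreement formula and the rating band table from earlier in this document can be sketched in Python. The function names are hypothetical. Two points are assumptions rather than documented behavior: that agreement% equals min_agreement divided by the number of sources (consistent with the 2-of-2 and 4-of-5 examples), and that each band's lower bound acts as an inclusive floor for fractional scores.

```python
# Lower bound of each band, highest first, copied from the Rating Bands table.
BANDS = [
    (90, "Aaa"), (80, "Aa"), (70, "A"), (60, "Baa"), (50, "Ba"),
    (35, "B"), (20, "Caa"), (10, "Ca"), (0, "C"),
]


def agreement_score(min_agreement: int, source_count: int) -> float:
    """min(100, (agreement% / 80) * 100), assuming agreement% = min_agreement / source_count."""
    if source_count <= 0:
        raise ValueError("source_count must be positive")
    agreement_pct = 100 * min_agreement / source_count
    return min(100.0, agreement_pct / 80 * 100)


def band_for(score: float) -> str:
    """Map an overall 0-100 score to its rating band (lower bounds treated as inclusive)."""
    for floor, name in BANDS:
        if score >= floor:
            return name
    raise ValueError("score must be between 0 and 100")
```

Under these assumptions, 4 of 5 sources gives a source_agreement score of 100, while 3 of 5 gives 75. Note that `agreement_score` covers only one weighted criterion, whereas `band_for` applies to the overall rating score returned by the API.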