# Caliber

> Institutional-grade ratings for prediction markets

Caliber is a market rating system that evaluates prediction market definitions using both static rules and LLM-based semantic analysis. It assigns credit-rating-style grades (Aaa to C) based on market quality criteria.

## Creating High-Quality Markets

A well-defined market is one an LLM agent can resolve unambiguously by reading web pages. Follow these principles:

### Write extraction prompts, not yes/no questions

Frame prompts to extract a factual value. List all plausible outcomes in expected_results.

- Bad: "Did the Celtics win the 2024 NBA Finals?" with expected_results: ["Yes"]
- Good: "Who won the 2024 NBA Finals?" with expected_results: ["Boston Celtics", "Dallas Mavericks"]

Yes/no markets are a last resort — they are fragile and hard to verify.

### List all plausible outcomes

expected_results should cover every realistic result:
- Sports final: both teams
- Award ceremony: all nominees
- Numeric threshold: use answer_type "number" with a comparison_operator

### Choose sources that contain the answer

- Provide 4+ sources for the best source_count scores (5+ for maximum)
- Sources must be publicly accessible (no paywalls, no login) — source_reachability is a gate criterion, failure = score 0
- Sources must directly contain the specific data needed — not just be "about" the topic (source_relevancy, 20% weight)
- Use diverse providers (don't use 4 sites pulling from the same upstream API)
- Use stable URLs (no session tokens)
- Supported content types: HTML, JSON, plain text, PDF
- Test each URL before submitting — if any returns an error, the entire market fails

### Avoid blocklisted domains

source_blocklisted is a gate criterion — using a blocklisted domain means score 0.
Avoid: .gov domains, reuters.com, bloomberg.com, wsj.com, nytimes.com (paywalled), wikipedia.org, and any domains known for unreliable or manipulated data.

### Use sources with track records

Sources that have been used in previously resolved markets score higher on source_history (10% weight). Sources with 10+ successful resolutions get full marks; new/unknown sources start at 0%. Stick to well-known official league sites, major news outlets, and established data aggregators.

### Set realistic agreement thresholds

min_agreement should balance reliability against practicality (source_agreement, 20% weight):
- Score formula: min(100, (agreement% / 80) × 100) — 80% agreement = perfect score
- 2 sources: min_agreement 2 (100% agreement → score 100)
- 3-4 sources: min_agreement 2-3
- 5 sources: min_agreement 4 (80% agreement → score 100)

### Get the timing right

temporal_soundness (10% weight) checks whether sources will contain the answer during the resolution window:
- resolution_start should be after the event concludes
- Allow time for sources to publish results
- Use sources with date/time indicators in URLs when possible
- Don't set resolution_start before the event happens

## API

Base URL: https://api.caliberratings.xyz
Full OpenAPI specification: https://api.caliberratings.xyz/openapi.yaml

### Endpoints

- GET /v0/ratings/bands - List rating bands (Aaa to C with score ranges)
- GET /v0/ratings/criteria - List rating criteria and weights
- POST /v0/ratings/markets - Rate a market definition
- GET /v0/ratings/markets/{id} - Get a previously rated market by ID
- GET /v0/ratings/markets/latest - List recent ratings (cursor pagination via ?cursor=&limit=)
- GET /v0/ratings/bands/{bandId}/badge.svg - SVG badge for a rating band (?size, ?theme, ?width, ?height)
- GET /v0/ratings/markets/{id}/badge.svg - SVG badge for a rated market (?size, ?theme, ?width, ?height)

### Rating Bands

| Rating | Score Range | Definition |
|--------|-------------|------------|
| Aaa | 90-100 | Exceptional definition quality; highly reliable and unambiguous |
| Aa | 80-89 | Very strong; minor weaknesses |
| A | 70-79 | Strong; some limitations |
| Baa | 60-69 | Adequate; moderate ambiguity or risk |
| Ba | 50-59 | Speculative; notable weaknesses |
| B | 35-49 | Weak; high risk of poor resolution |
| Caa | 20-34 | Very weak; high likelihood of problematic resolution |
| Ca | 10-19 | Highly unreliable; severe structural issues |
| C | 0-9 | Structurally broken or guaranteed to fail resolution |

### Market Definition Schema

```json
{
  "prompt": "string (required) - The extraction question to resolve",
  "answer_type": "string | number (required)",
  "expected_results": "string[] | number[] (required) - All plausible outcome values, homogeneous type",
  "comparison_operator": "== | != | > | < | >= | <= (optional, default ==)",
  "source_urls": ["string[] (required) - URLs the agent will fetch for resolution data"],
  "min_agreement": "integer (required) - Minimum sources that must agree",
  "resolution_start": "ISO datetime (required) - When resolution period begins",
  "resolution_end": "ISO datetime (required) - When resolution period ends"
}
```

### Rating Criteria

Gate criteria (must pass):
- source_reachability - URLs must be accessible
- source_blocklisted - No blocklisted sources

Score criteria (weighted):
- source_count (20%) - Number of sources provided (5+ for max score)
- source_agreement (20%) - Agreement ratio (80%+ agreement = perfect score)
- source_relevancy (20%) - Whether sources contain the specific data needed
- source_history (10%) - Track record of source domains in previous markets
- prompt_subjectivity (10%) - Prompt clarity, specificity, and objectivity
- temporal_soundness (10%) - Whether sources will have data during resolution window

### Rate Limits

The POST /v0/ratings/markets endpoint is rate limited to 10 requests per minute per IP address. Exceeding this limit returns a 429 status code. Wait at least 60 seconds before retrying. If you need higher limits, contact @Somnia_Network on X (https://x.com/Somnia_Network).

### Example Request

```bash
curl -X POST https://api.caliberratings.xyz/v0/ratings/markets \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Who won the 2026 Australian Grand Prix?",
    "answer_type": "string",
    "expected_results": ["Oscar Piastri", "Max Verstappen", "Lando Norris"],
    "comparison_operator": "==",
    "source_urls": [
      "https://www.formula1.com/en/results/2026/races/1279/australia/race-result",
      "https://racingnews365.com/2026-f1-australian-grand-prix-results",
      "https://www.bbc.com/sport/formula1/2026-australian-grand-prix-results",
      "https://www.espn.com/f1/race/_/id/2026-australia"
    ],
    "min_agreement": 3,
    "resolution_start": "2026-03-23T06:00:00Z",
    "resolution_end": "2026-03-24T00:00:00Z"
  }'
```

## Generating Markets with AI

To generate market definitions with an LLM, use these three components:

### System Prompt

Pass this as the system message:

```
You generate prediction market definitions based on context provided by the user (e.g. a news story, event details, or a topic).

Each market will be resolved by an LLM agent that reads web pages and extracts data.

Key rules:
- **Extraction prompts**: Ask the agent to extract a specific fact. NEVER use yes/no questions.
  Good: "Who won the 2024 NBA Finals?" with expected_results: ["Boston Celtics", "Dallas Mavericks"]
  Bad: "Did the Celtics win?" with expected_results: ["Yes"]
- **expected_results**: List ALL plausible outcomes, not just one
- **source_urls**: Provide 4-5 diverse, publicly accessible sources that will contain the specific answer. No paywalls, no wikipedia.
- **min_agreement**: Set to majority of sources (e.g. 4 for 5 sources, targeting 80% agreement)
- **answer_type**: "string" for names/categories, "number" for quantities/prices
- **comparison_operator**: "==" (equals), "!=" (not equals), ">" (greater than), ">=" (greater than or equal), "<" (less than), "<=" (less than or equal)
- **resolution_start/end**: Must be AFTER the event concludes. Allow time for sources to publish results.
```

### Response Schema

Pass this as the JSON schema for structured output:

```json
{
  "type": "object",
  "properties": {
    "markets": {
      "type": "array",
      "items": {
        "type": "object",
        "properties": {
          "prompt": {
            "type": "string"
          },
          "answer_type": {
            "type": "string",
            "enum": [
              "string",
              "number"
            ]
          },
          "expected_results": {
            "oneOf": [
              {
                "type": "array",
                "items": {
                  "type": "string"
                },
                "minItems": 1
              },
              {
                "type": "array",
                "items": {
                  "type": "number"
                },
                "minItems": 1
              }
            ]
          },
          "comparison_operator": {
            "type": "string",
            "enum": [
              "==",
              "!=",
              ">",
              ">=",
              "<",
              "<="
            ]
          },
          "source_urls": {
            "type": "array",
            "items": {
              "type": "string"
            },
            "minItems": 4
          },
          "min_agreement": {
            "type": "number"
          },
          "resolution_start": {
            "type": "string"
          },
          "resolution_end": {
            "type": "string"
          }
        },
        "required": [
          "prompt",
          "answer_type",
          "expected_results",
          "comparison_operator",
          "source_urls",
          "min_agreement",
          "resolution_start",
          "resolution_end"
        ]
      }
    }
  },
  "required": [
    "markets"
  ]
}
```

### User Prompt

Replace the placeholder with your context (a news article, event details, etc.) and send as the user message:

```
Generate one or more prediction market definitions based on the following context. Return them as a JSON array under a "markets" key.

{{PASTE YOUR CONTEXT HERE — e.g. a news article, event details, or topic description}}
```

## Website

The website provides an interactive form to test market definitions and view detailed rating breakdowns with criteria scores and evidence.