Reproducible · versioned · public

Methodology

How we measure which businesses AI assistants recommend, how we compute confidence, and where the limits are.

1. Sampling

For each (city, category) we run a fixed prompt set against four LLMs (Claude, ChatGPT, Gemini, Perplexity), N times per (prompt × model) to measure intra-model variance. Total scan slots = prompts × models × samples. The current active prompt sets are versioned below.
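As a quick illustration of the slot arithmetic, here is a minimal sketch. The function name is ours, and the sample count N = 3 is a hypothetical value chosen only for the example; the 25 prompts and 4 models come from this page.

```python
def total_scan_slots(n_prompts: int, n_models: int, n_samples: int) -> int:
    """Total scan slots = prompts x models x samples."""
    return n_prompts * n_models * n_samples


# 25 prompts (divorce-attorneys v2.0), 4 LLMs, hypothetical N = 3 samples
slots = total_scan_slots(n_prompts=25, n_models=4, n_samples=3)
print(slots)  # 300
```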

2. Intent buckets

Prompts are grouped into intent buckets (broad authority, specific need, process, style/approach, conversational). Aggregating within a bucket and exposing per-bucket counts preserves the signal that "best lawyer for X" and "most aggressive litigator" measure different things — averaging them would flatten useful information.
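One way to keep buckets separate is to count citations per (bucket, business) rather than pooling them. The sketch below assumes each citation arrives as a (bucket, business) pair; that input shape and the function name are illustrative, not the production schema.

```python
from collections import Counter, defaultdict


def per_bucket_counts(citations):
    """Count citations per business, kept separate for each intent bucket.

    `citations` is an iterable of (bucket, business) pairs (assumed shape).
    """
    counts = defaultdict(Counter)
    for bucket, business in citations:
        counts[bucket][business] += 1
    return counts


sample = [
    ("broad authority", "Firm A"),
    ("broad authority", "Firm A"),
    ("style/approach", "Firm B"),
]
counts = per_bucket_counts(sample)
print(dict(counts["broad authority"]))  # {'Firm A': 2}
```

Pooling the same sample would report Firm A at 2 and Firm B at 1 with no indication that they were cited for different kinds of questions.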

3. Citation extraction

Each LLM response is parsed by a small extractor (GPT-4o-mini) into a JSON array of business names. Names are normalized (suffixes like "LLP" / "PC" / "& Associates" stripped) and matched against a pre-harvested business database for the (city, category) via Levenshtein-distance fuzzy match. Mentions that don't match any harvested firm are tracked separately and used to drive Google Places reverse-lookup.
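The normalize-then-match step can be sketched as follows. This is an illustration under assumptions, not the production matcher: the suffix list contains only the three examples named above, the `max_distance` threshold of 2 is hypothetical, and the function names are ours.

```python
import re

# Only the suffixes named in the text; the real list is presumably longer.
SUFFIXES = ("llp", "pc", "& associates")


def normalize(name: str) -> str:
    """Lowercase, drop periods and commas, and strip trailing firm suffixes."""
    s = re.sub(r"[.,]", "", name.lower()).strip()
    stripped = True
    while stripped:  # repeat so stacked suffixes ("& Associates, P.C.") all go
        stripped = False
        for suffix in SUFFIXES:
            if s.endswith(" " + suffix):
                s = s[: -len(suffix) - 1].rstrip()
                stripped = True
    return s


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def match(mention: str, harvested: list, max_distance: int = 2):
    """Return the closest harvested name within max_distance, else None."""
    target = normalize(mention)
    best, best_dist = None, max_distance + 1
    for candidate in harvested:
        dist = levenshtein(target, normalize(candidate))
        if dist < best_dist:
            best, best_dist = candidate, dist
    return best


firms = ["Smith & Jones, LLP", "Doe Law PC"]
print(match("Smyth & Jones LLP", firms))  # Smith & Jones, LLP
print(match("Acme Legal", firms))         # None
```

A mention that returns None here corresponds to the unmatched mentions that are tracked separately.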

4. Ranking + confidence

Share = citation_count / n_slots. The 95% CI uses the Wilson score interval, which behaves better than the normal approximation for binomial proportions at small N. Consensus score = (n_models_cited / 4) × log(citations + 1), where 4 is the number of LLMs tested; it rewards both breadth across LLMs and depth within them. Rank order uses share, with consensus as the tiebreaker.
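The three quantities can be sketched in a few lines. Two assumptions are worth flagging: the logarithm is taken to be the natural log (the base is not stated here), and z = 1.96 is used as the usual 95% critical value. Function names are illustrative.

```python
import math


def share(citation_count: int, n_slots: int) -> float:
    """Fraction of scan slots in which the business was cited."""
    return citation_count / n_slots


def wilson_interval(citation_count: int, n_slots: int, z: float = 1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for ~95%)."""
    if n_slots == 0:
        return (0.0, 0.0)
    p = citation_count / n_slots
    z2 = z * z
    denom = 1 + z2 / n_slots
    centre = (p + z2 / (2 * n_slots)) / denom
    half = z * math.sqrt(p * (1 - p) / n_slots + z2 / (4 * n_slots**2)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def consensus_score(n_models_cited: int, citations: int, n_models: int = 4) -> float:
    """(n_models_cited / n_models) * log(citations + 1); natural log assumed."""
    return (n_models_cited / n_models) * math.log(citations + 1)


lo, hi = wilson_interval(30, 100)
print(share(30, 100), round(lo, 3), round(hi, 3))
print(consensus_score(n_models_cited=2, citations=9))
```

Unlike the normal approximation, the interval never leaves [0, 1] and stays informative when a firm is cited zero times in a small sample.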

Tier classification:

🟢 Consensus: CI lower bound > 10%. Signal clearly above the noise floor.
🟡 Mid-tier: share > 3% AND cited by ≥2 LLMs. Real signal, but the CI spans the noise floor.
⚪ Below threshold: meets neither the Consensus nor the Mid-tier criteria. Published but flagged.
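The tier rules above can be sketched as a single function. This assumes the tiers are checked top-down, so a business that satisfies the Consensus rule is never labelled Mid-tier, and that percentages are passed as fractions (10% as 0.10). The function name and return labels are illustrative.

```python
def classify_tier(share: float, ci_lower: float, n_models_cited: int) -> str:
    """Apply the tier rules in order: Consensus, then Mid-tier, else Below threshold."""
    if ci_lower > 0.10:
        return "consensus"
    if share > 0.03 and n_models_cited >= 2:
        return "mid-tier"
    return "below-threshold"


print(classify_tier(share=0.30, ci_lower=0.22, n_models_cited=4))  # consensus
print(classify_tier(share=0.06, ci_lower=0.02, n_models_cited=2))  # mid-tier
print(classify_tier(share=0.06, ci_lower=0.02, n_models_cited=1))  # below-threshold
```

The last call shows the case where only one of the two Mid-tier conditions holds: the business still falls below threshold.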

5. Immutable snapshots

Each scan run produces a snapshot. Snapshots are never updated or deleted; the leaderboard you're viewing is always the most recent one. Rank trends are reconstructed from the snapshot history. Methodology changes increment the version, and snapshots from different versions are not directly comparable.
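To make the trend reconstruction concrete, here is a minimal sketch. The `Snapshot` fields and the `rank_trend` helper are hypothetical names chosen for illustration; the sketch assumes the history is in chronological order and compares only snapshots that share a methodology version.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class Snapshot:
    run_id: str
    methodology_version: str
    ranks: Tuple[Tuple[str, int], ...]  # (business, rank) pairs


def rank_trend(history: List[Snapshot], business: str, version: str) -> List[Tuple[str, int]]:
    """Return (run_id, rank) for one business, restricted to one methodology version."""
    trend = []
    for snap in history:  # assumed chronological
        if snap.methodology_version != version:
            continue  # snapshots from other versions are not comparable
        rank = dict(snap.ranks).get(business)
        if rank is not None:
            trend.append((snap.run_id, rank))
    return trend


history = [
    Snapshot("run-1", "2.0", (("Firm A", 1), ("Firm B", 2))),
    Snapshot("run-2", "1.0", (("Firm A", 3),)),
    Snapshot("run-3", "2.0", (("Firm B", 1), ("Firm A", 2))),
]
print(rank_trend(history, "Firm A", "2.0"))  # [('run-1', 1), ('run-3', 2)]
```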

6. AI assistants tested

Claude — anthropic/claude-sonnet-4.6
ChatGPT — openai/gpt-4o
Gemini — google/gemini-2.5-pro
Perplexity — perplexity/sonar-pro

Each model receives each prompt independently — no memory, no system instructions, no conversation history. Each prompt is run N times to surface intra-model variance.

7. Active prompt sets

Each active prompt set is published as a versioned record (slug + version + bucket breakdown). Per-bucket prompt counts disclose the sampling design; verbatim prompt text is not published — it's the proprietary editorial calibration that distinguishes the Atlas from a generic GEO scraper.

divorce-attorneys · v2.0 (5 buckets)

Process · 5 prompts
Specific Need · 5 prompts
Conversational · 5 prompts
Style/Approach · 5 prompts
Broad Authority · 5 prompts

8. Archived methodology versions

divorce-attorneys · v1.0 — 10 prompts · created Jun 25, 2026

9. Corrections + takedowns

If your firm is misrepresented or you want to request removal, contact [email protected]. We do not alter rankings on request, but we will correct factual errors (wrong website, wrong name) within 48 hours.

10. Reproducibility

Every citation traces to a verbatim LLM response. Every snapshot points at a versioned prompt set and the source scan run IDs. A read-only JSON API is available at /api/leaderboards/{metro}/{industry}/{category}.json.
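A minimal sketch of calling the endpoint with the Python standard library follows. The host `https://example.com` is a placeholder and the slugs in the usage line are hypothetical; only the path template comes from this page. The shape of the returned JSON is not documented here, so the sketch returns it unparsed beyond `json.load`.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

BASE_URL = "https://example.com"  # placeholder host, not the real one


def leaderboard_url(metro: str, industry: str, category: str, base_url: str = BASE_URL) -> str:
    """Fill the /api/leaderboards/{metro}/{industry}/{category}.json template."""
    metro, industry, category = (quote(part, safe="") for part in (metro, industry, category))
    return f"{base_url}/api/leaderboards/{metro}/{industry}/{category}.json"


def fetch_leaderboard(metro: str, industry: str, category: str, base_url: str = BASE_URL):
    """GET the leaderboard and decode the JSON body."""
    with urlopen(leaderboard_url(metro, industry, category, base_url), timeout=10) as resp:
        return json.load(resp)


# Hypothetical slugs, for illustration only
print(leaderboard_url("austin", "legal", "divorce-attorneys"))
```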