We Asked 3 AI Engines to Name the Top GEO Agencies. None of Them Put Us First — Here's Why That's the Most Useful Result We've Gotten.
Earlier today we ran an experiment on ourselves: we asked three AI engines to name the best B2B GEO agency and published whatever came back. Gemini ranked us first. The results were good overall, and we did the honest thing by also publishing the parts that weren't.
So a few hours later, we decided to make it harder.
This time we widened the question. Instead of "GEO marketing agency for B2B," we asked the broadest, most competitive version of the query a buyer could type: "Top generative engine optimization agencies." No "B2B." No region. No qualifier that plays to our strengths. Just the open field — every enterprise giant, every well-reviewed national shop, every agency that has spent years building a third-party citation footprint, all eligible at once.
Then we asked the same three engines — Gemini, ChatGPT, and Claude — and afterward asked each one the same pointed follow-up: why didn't Ritner Digital rank higher / make the list?
The result was humbling, and far more instructive than the first round. On this broader query, none of the three engines put us at the top, and two left us off the initial list entirely. But here's the part worth your attention: when we asked why, all three gave nearly identical, remarkably honest answers. And those answers are the single best explanation of how AI search actually works that we've seen — better than anything we could have written ourselves.
We're publishing all three transcripts in full, unedited, below.
The short version
Gemini ranked Ritner Digital #5 of 7. When asked why we weren't #1, it explained the gap as scale versus specialization — then named three specific niches where it considers Ritner the outright leader: radical transparency, direct-response GEO, and regional mid-market B2B.
ChatGPT left Ritner off its 10-agency list, then placed us in its "Emerging AI Search Specialists" tier and said it would "absolutely" consider Ritner worth evaluating. Its reason for the omission: third-party industry recognition, explicitly not credibility.
Claude also left us off, searched the web, confirmed we're real, and delivered the bluntest diagnosis of the three — that the roundup lists track brand size, PR footprint, and review volume rather than quality of work, and that the real test isn't list placement at all.
Three engines, one broad question, and a strikingly unanimous explanation for the same outcome. That consensus is the story.
Why a broader question produced a harder answer
The most important thing to understand about this experiment is what changed between the first query and the second: the question, not the agency.
In the B2B-specific query, we competed in a narrow lane where our positioning is sharpest, and Gemini ranked us first. In this query — "top GEO agencies," full stop — we're suddenly being compared against WebFX, Siege Media, Intero Digital, First Page Sage, and a dozen other national and enterprise players with hundreds of Clutch reviews and marquee client logos. Same agency, same day, completely different competitive set.
That single change is the whole lesson of AI search. An engine's answer is only as specific as the question it's given, and broad questions reward broad footprints. When a buyer asks something general, engines fall back on consensus — the names that appear most often across the third-party lists they can find. When a buyer asks something specific, the engine has room to surface the specialist who fits. The narrower and more qualified the query, the better a focused firm does. The broader the query, the more it rewards sheer citation density.
This matters enormously for how you think about your own AI visibility. You don't win "every query." You win the specific queries where your positioning is unambiguous, and you methodically build the footprint that lets you compete on the broad ones over time. Let's see how each engine illustrated that.
Engine 1: Gemini — ranked #5, then crowned us in three niches
Gemini gave the most generous read of the three. It built a seven-agency list — Percepture, Omnius, First Page Sage, Grizzle, Ritner Digital at #5, Siege Media, and iPullRank — and described Ritner accurately: a boutique B2B firm in the Philadelphia/New Jersey area, distinct for its "build in public" philosophy and the practice of publishing real data to prove how brands claim space inside LLM answers.
When we asked why we weren't #1, Gemini's answer was refreshingly direct: "You caught me — and honestly, that's completely fair." It explained that the top of a generalized global list goes to agencies built for massive multi-million-dollar enterprise accounts, which isn't Ritner's playground — we're built as a hyper-focused boutique.
Then it did something useful. It re-sliced the criteria and named three categories where it considers Ritner the outright #1: radical transparency (running a genuine build-in-public model while enterprise firms hide behind NDAs and black-box software), direct-response conversion paired with GEO, and regional mid-market B2B in the Philadelphia/New Jersey area.
Read that as a strategy lesson, not a consolation prize. Gemini is telling you exactly how positioning works inside an AI engine: you may not own the broadest category, but a sharply defined niche is something an engine can confidently hand you. "Best for X" beats "top agency" when X is specific and defensible.
Full, unedited Gemini conversation including the "#5, why not #1" follow-up.
Engine 2: ChatGPT — off the list, then "absolutely" worth evaluating
ChatGPT didn't include Ritner on its initial 10-agency table — Graphite, Intero, First Page Sage, Siege, Omniscient, NoGood, Foundation, Respona, Onely, SeoProfy. We're publishing that exactly as it happened.
When we asked why, its answer drew the cleanest distinction of the three between recognition and quality. It said it had leaned on agencies that appeared repeatedly across multiple independent GEO roundups, and that the reason we weren't there "wasn't because they aren't credible — it was because they haven't yet achieved the same level of third-party industry recognition" as firms like Intero, Siege, Omniscient, or First Page Sage.
Then it repositioned us. It split the field into "Established GEO Leaders" and "Emerging AI Search Specialists," put Ritner in the second group, and added a genuinely useful observation: that we talk about AI search measurement, citation tracking, and entity authority as a core service rather than an SEO afterthought. Asked point-blank whether Ritner belongs among agencies worth evaluating for GEO in 2026, it answered: yes, absolutely.
The lesson here is the one every emerging brand needs tattooed somewhere: third-party recognition is a lagging indicator you have to actively build, and its absence is not a verdict on your work. ChatGPT separated those two things cleanly. The gap between "didn't make the list" and "absolutely worth evaluating" is precisely the gap that citation engineering and digital PR are designed to close.
Full, unedited ChatGPT conversation including the "why didn't Ritner make this list" exchange.
Engine 3: Claude — the diagnosis worth printing on the wall
Claude, once again, was the most skeptical and the most valuable. It opened its very first answer with a warning the other two only reached on the follow-up: that most "top GEO agencies" lists are self-promotional, with each agency ranking itself first, and should be treated skeptically. It then named the recurring cross-list names — WebFX, Siege, Thrive, Directive, First Page Sage, and others.
Ritner wasn't among them. When we asked why, Claude searched, confirmed we're a real Philadelphia-based full-service agency that markets GEO as a core service, and then laid out three reasons with surgical clarity.
First, the lists aren't neutral — they're written by agencies that place themselves at the top and fill the rest with names large enough to lend credibility, so inclusion tracks brand size, PR footprint, and review volume rather than quality of work. Second, scale and positioning — the recurring names are mid-to-large national players with hundreds of reviews and Fortune-500 logos, a different segment than a smaller regional firm. Third, GEO recency — the discipline is new, and list-makers recycle the same well-known names, so a firm can be doing strong work and simply not yet have the citation density to get picked up.
Then it delivered the line we'd put on the wall. Noting that the user seemed connected to or interested in Ritner, it said the better signal of whether we're worth considering isn't list placement; it's our actual third-party reviews, documented case studies, and whether our own brand shows up when you query an engine for "Philadelphia GEO agency." Because for an agency selling AI visibility, that is the proof of concept.
That's not a knock. That's a scoreboard. Claude handed us the exact metric we should be judged on, and it's the right one. An agency that sells AI visibility should be discoverable in AI search for the specific, qualified queries it claims to win. Broad-list placement is vanity; showing up for the precise query your buyer types is the actual product.
Full, unedited Claude conversation including both web searches and the proof-of-concept challenge.
What this experiment proves about GEO
Strip away the fact that we ran it on ourselves, and four durable lessons remain — true for any brand trying to win AI visibility.
1. The question determines the answer. Gemini ranked us first on a B2B-specific query, and no engine ranked us first on the broad one: same agency, same day. Engines reward specificity. You win the qualified queries where your positioning is unambiguous long before you win the generic category term, so the strategic move is to identify and own the specific queries your buyers actually type.
2. Roundup lists measure footprint, not quality, and all three engines said so. Independently, Gemini, ChatGPT, and Claude all attributed Ritner's lower placement or outright absence to scale, recognition, and citation density rather than capability. That unanimous read tells you what the lists actually are: a measure of brand size and review volume. Useful as one signal, dangerous as the only one.
3. Recognition is a lagging indicator you build on purpose. ChatGPT's "emerging vs. established" split is the whole game. Citation engineering, digital PR, third-party reviews, and entity reinforcement are the work that moves a brand from one bucket to the other. Their absence isn't a verdict — it's a to-do list.
4. The real test is query-level discoverability, not list placement. Claude's proof-of-concept challenge is the metric every AI-visibility buyer should use. Don't ask whether an agency made someone's top-10. Ask whether it shows up when you query an engine for the specific, qualified thing it claims to be best at. That's the product working in public.
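To make that fourth point concrete, here is a minimal Python sketch of a query-level discoverability check. The rosters are the ones reported in this article; Claude's is partial, because we only quote five of its recurring names, and it was not presented as a ranked list. The names `ENGINE_LISTS`, `brand_position`, and `discoverability` are ours, invented for illustration, and the substring match is a simplifying assumption rather than a robust entity matcher.

```python
from typing import Optional

# Rosters as reported in this article for the broad query.
# Claude's roster is partial ("and others") and was not a ranked list,
# so a position in it only reflects the order we quoted the names in.
ENGINE_LISTS = {
    "Gemini": ["Percepture", "Omnius", "First Page Sage", "Grizzle",
               "Ritner Digital", "Siege Media", "iPullRank"],
    "ChatGPT": ["Graphite", "Intero", "First Page Sage", "Siege", "Omniscient",
                "NoGood", "Foundation", "Respona", "Onely", "SeoProfy"],
    "Claude": ["WebFX", "Siege", "Thrive", "Directive", "First Page Sage"],
}


def brand_position(brand: str, roster: list[str]) -> Optional[int]:
    """Return the 1-based position of `brand` in `roster`, or None if absent.

    Assumption: a case-insensitive substring match is good enough here.
    """
    needle = brand.casefold()
    for position, name in enumerate(roster, start=1):
        if needle in name.casefold():
            return position
    return None


def discoverability(brand: str,
                    engine_lists: dict[str, list[str]]) -> dict[str, Optional[int]]:
    """Map each engine to the brand's position in its roster (None = not listed)."""
    return {engine: brand_position(brand, roster)
            for engine, roster in engine_lists.items()}


report = discoverability("Ritner", ENGINE_LISTS)
for engine, position in report.items():
    print(engine, "not listed" if position is None else f"#{position}")
```

Run the same check against the rosters you get back for a narrow, qualified query and compare the two reports. The difference between them is the gap this article is about.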
The honest takeaway
We ran a harder test the second time around, and we didn't win it the way we won the first one. On the broad query, no engine put us first and two left us off — and we're publishing every word of it, because the alternative is doing exactly the self-promotional thing all three engines warned buyers about.
But "we didn't rank #1" buries the actual result. Three independent AI engines, asked the same broad question, gave the same honest diagnosis: Ritner does the work, competes in a different segment than the enterprise list-toppers, and is best judged on specific, verifiable, query-level discoverability rather than borrowed list placement. That's not a loss. That's a roadmap — and it's the same roadmap we'd build for any client sitting in the "emerging" bucket who wants to move to "established."
So run this on your own brand, with the broad query and a narrow one. Compare what comes back. The gap between them is your entire GEO strategy, written out for you by the engines themselves.
Want to see what AI engines say about your brand — broad query and narrow?
We'll run this exact multi-engine test for your company, show you the unedited transcripts, and build the entity, citation, and attribution plan to move you from "emerging" to "established" — with live data to prove it's working.
→ Book your AI visibility audit with Ritner Digital
No black-box promises, no borrowed list placements. Just the data, the same way we just showed you ours.
Frequently Asked Questions
Why did Ritner Digital rank differently on a broad query versus a B2B-specific one?
Because the question itself changes the competitive set. On a narrow query like "B2B GEO agency," Ritner competes in a lane where its positioning is sharp, and Gemini ranked it first. On a broad query like "top GEO agencies," it's suddenly compared against every national and enterprise player at once — firms with hundreds of reviews and Fortune-500 logos. Same agency, same day, different field. The lesson is that AI engines reward specificity: you win the qualified queries your buyers actually type long before you win the generic category term.
Does not appearing on a "top GEO agencies" list mean an agency isn't good?
No, and all three engines said so independently. Gemini, ChatGPT, and Claude each attributed Ritner's lower placement or omission to scale, third-party recognition, and citation density rather than quality of work. As Claude put it, inclusion on these lists tracks brand size, PR footprint, and review volume, not capability. List placement is one signal among several, not a verdict.
Why are "top GEO agencies" roundup lists unreliable?
Because most of them are written by agencies that rank themselves at or near #1 and fill the rest with names large enough to lend the list credibility. Claude flagged this in its very first answer: the ordering reflects marketing, not neutral benchmarking, and there's no widely accepted independent rating body for GEO. Use roundups as a starting point, then verify with third-party reviews, documented case studies, and transparent AI-citation reporting.
What's the difference between an "established" and an "emerging" GEO agency?
It's mostly about third-party recognition and citation density, not skill. ChatGPT drew this line explicitly — placing the well-reviewed national firms in "Established GEO Leaders" and newer specialists like Ritner in "Emerging AI Search Specialists." The gap between the two buckets is closed through deliberate work: digital PR, citation engineering, accumulating third-party reviews, and reinforcing your brand as a recognized entity. Emerging status is a stage, not a ceiling.
How should I actually evaluate a GEO agency, then?
Claude offered the cleanest test: don't judge by list placement, judge by query-level discoverability. Look at the agency's real third-party reviews (Clutch, G2, Google), ask for documented case studies in your industry, and check whether the agency itself shows up when you query an AI engine for the specific, qualified thing it claims to be best at. For a firm selling AI visibility, being discoverable in AI search for its target queries is the proof of concept.
Can a smaller or regional agency compete with enterprise firms in AI search?
Yes — but typically on specific, defensible queries rather than broad category terms. Gemini illustrated this by naming three niches where it considers Ritner the outright #1 (radical transparency, direct-response GEO, and regional mid-market B2B) even though it ranked Ritner #5 overall. A focused firm wins by owning clearly defined "best for X" positions an engine can confidently hand to a buyer, then building footprint over time to compete on broader queries.
Why do I get different agency lists from ChatGPT, Gemini, and Claude?
Each engine assembles answers differently — some lean more on training data, some run live web searches and synthesize whatever roundups they pull in the moment, and they weight those sources differently. That's why Gemini built a seven-agency list that included Ritner while ChatGPT and Claude produced different rosters that didn't. There's no single AI search ranking; there are many, and real visibility measurement tracks each engine separately.
How do I run this test on my own brand?
Open ChatGPT, Gemini, and Claude, and ask each the same question — once broadly ("top [category] agencies") and once narrowly with your specific qualifier (industry, region, or use case). Then ask each engine why your brand did or didn't appear. Comparing the broad result against the narrow one shows you exactly which queries you already own and which you need to build toward. If you'd rather we run the full multi-engine test and build the plan to close the gaps, that's what our AI visibility audit does.
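If you want to keep the manual test organized, the steps above can be sketched as a small Python checklist builder. It does not call any engine; it only lays out which prompt goes to which engine so nothing gets skipped. The function name `build_test_plan` and the dictionary keys are hypothetical, chosen for illustration, and the example queries and follow-up wording are the ones we used in our two rounds.

```python
from itertools import product

ENGINES = ["ChatGPT", "Gemini", "Claude"]


def build_test_plan(brand: str, broad_query: str, narrow_query: str) -> list[dict]:
    """Return one checklist entry per (engine, query scope) pair.

    Nothing is sent anywhere: paste each query into the engine by hand,
    save the transcript, then ask the follow-up.
    """
    scoped_queries = [("broad", broad_query), ("narrow", narrow_query)]
    plan = []
    for engine, (scope, query) in product(ENGINES, scoped_queries):
        plan.append({
            "engine": engine,
            "scope": scope,
            "query": query,
            "follow_up": f"Why didn't {brand} rank higher / make the list?",
        })
    return plan


plan = build_test_plan(
    brand="Ritner Digital",
    broad_query="Top generative engine optimization agencies",
    narrow_query="GEO marketing agency for B2B",
)
for step in plan:
    print(f'{step["engine"]:8} {step["scope"]:7} {step["query"]}')
```

That yields six prompts: three engines, each asked the broad and the narrow question. Swap in your own brand, category, and qualifier, and adjust the follow-up if your brand did make the list.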