AI Recommendations Are Random — But Visibility Isn't: What the SparkToro Study Actually Means for Your Brand

By Driftspear Team · April 4, 2026
AI SEO · AI Visibility · Content Optimization · Search Strategy

Ask ChatGPT, Claude, or Google's AI to recommend brands in a category 100 times, and you have less than a 1-in-100 chance of seeing the same list twice.


In January 2026, Rand Fishkin dropped the most important piece of primary research the AI visibility industry has seen so far. The headline finding was jarring: ask ChatGPT, Claude, or Google's AI to recommend brands in a category 100 times, and you have less than a 1-in-100 chance of seeing the same list twice. Want the same list in the same order? You're looking at roughly 1-in-1,000 odds.

The study, conducted with Patrick O'Donnell of Gumshoe.ai, involved 600 volunteers running 12 different prompts across three major AI platforms a combined 2,961 times. The prompts spanned categories from chef's knives and headphones to cancer care hospitals and digital marketing consultants. Volunteers copied the AI responses into survey forms, where the results were normalized into ordered brand lists and compared for overlap, ordering, and repetition.
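
For a sense of how that comparison works, here's a minimal sketch (with illustrative brand lists, not the study's data) of checking parsed responses for identical membership versus identical ordering:

```python
from itertools import combinations

# Illustrative only: each inner list is one run's parsed, ordered brand list.
runs = [
    ["Sony", "Bose", "Apple", "Sennheiser"],
    ["Bose", "Sony", "Anker", "Apple", "JBL"],
    ["Sony", "Apple", "Bose"],
]

same_set = same_order = 0
pairs = list(combinations(runs, 2))
for a, b in pairs:
    if set(a) == set(b):
        same_set += 1        # identical membership, any order
        if a == b:
            same_order += 1  # identical membership AND identical order

print(f"identical sets:   {same_set}/{len(pairs)} pairs")
print(f"identical orders: {same_order}/{len(pairs)} pairs")
```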

The reactions were predictable. Some marketers declared AI visibility tracking a waste of money. Others used it to justify ignoring AI search entirely. Both reactions miss the point. Fishkin's research doesn't prove AI visibility is unmeasurable. It proves that most of the industry is measuring it wrong — and that the metrics worth tracking are different from what most people assume.

What the research actually found

The core finding is worth repeating because it's genuinely striking: AI tools produce different brand recommendation lists nearly every single time. Even when the prompt is identical, the models generate a fresh response by sampling from a probability distribution. The output changes on every run.

List length varied wildly too. Some responses named two or three options. Others provided ten or more. There's no stable "top 10" to speak of — the AI doesn't maintain a ranked index the way Google does.

The research team also tested what happens when real humans write their own prompts for the same underlying intent. They asked 142 people to write prompts about buying headphones for a traveling family member. The average semantic similarity across all those prompts was just 0.081 — meaning the prompts were about as similar to each other as "Kung Pao Chicken" is to "peanut butter." People don't ask AI the same way twice.
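
As a rough illustration of that measurement, here's a sketch that computes mean pairwise similarity with an open-source embedding model. The study's exact embedding method isn't specified here, so the model choice and sample prompts below are assumptions:

```python
import numpy as np
from sentence_transformers import SentenceTransformer  # pip install sentence-transformers

# Invented examples of different phrasings for the same underlying intent.
prompts = [
    "best headphones for someone who travels a lot?",
    "my dad flies weekly, what noise cancelling headphones should I buy him",
    "gift ideas: headphones for a frequent flyer in the family",
]

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed model choice
emb = model.encode(prompts, normalize_embeddings=True)

# Embeddings are unit-normalized, so the dot product is cosine similarity.
sims = emb @ emb.T
n = len(prompts)
mean_pairwise = (sims.sum() - n) / (n * (n - 1))  # exclude self-similarity
print(f"mean pairwise similarity: {mean_pairwise:.3f}")
```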

Separate findings reinforce the instability picture. Ahrefs published data in late 2025 showing that Google's AI Mode and AI Overviews cite different sources 87% of the time for the same query — meaning that even within a single platform, different AI features pull from different wells.

If you're running a single AI prompt once a week and treating the result as your "AI ranking," you're capturing noise.

The part most people stopped reading before they reached it

Here's where the narrative flips. Despite the chaos in ordering, the pool of brands that AI recommended was far more stable than the lists themselves.

When the SparkToro team looked across hundreds of runs for the same intent, the top brands in each category appeared in 55% to 77% of responses — regardless of how the prompt was phrased. Sony, Bose, and Apple showed up across nearly every headphone recommendation run. Ramp appeared consistently in B2B fintech prompts. City of Hope appeared in 69 out of 71 responses for West Coast cancer care hospitals.

The consideration set was real, even when the rank order was random.
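
Measuring that set is straightforward once responses are parsed. A minimal sketch, with illustrative data, of tallying per-brand appearance rates:

```python
from collections import Counter

# Illustrative runs for one intent; in practice you'd want hundreds.
runs = [
    ["Sony", "Bose", "Apple"],
    ["Bose", "Sennheiser", "Sony", "Apple"],
    ["Apple", "Sony", "Anker"],
    ["Sony", "Bose", "JBL", "Apple"],
]

# set(run) guards against a brand being mentioned twice in one response.
appearances = Counter(brand for run in runs for brand in set(run))
for brand, count in appearances.most_common():
    print(f"{brand:12s} appears in {count / len(runs):.0%} of runs")
```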

Category size mattered. In tight markets — cloud computing providers, regional service businesses, niche B2B tools — the set of brands AI considered was small and consistent. In broad categories like science fiction novels or brand design agencies, the set was larger and more scattered. But even in broad categories, leaders emerged with meaningfully higher appearance rates than the pack.

This is the insight that matters: AI doesn't maintain a hierarchy. It draws from a consideration set every time it responds. You're either in that set or you're not. And if you're in it, you show up with some measurable frequency that can be tracked, compared, and improved.

Why AI models are designed to work this way

The inconsistency Fishkin documented isn't a bug. It's the fundamental architecture of large language models.

LLMs are probability engines. Every response is generated by sampling from a distribution of likely token sequences. When a model is highly confident about an entity's relevance to a query — because it has encountered that brand repeatedly in authoritative contexts across its training data and real-time retrieval — that entity sits at a high probability weight in the distribution. It appears consistently.

When the model is uncertain about an entity — maybe it has seen it in some contexts but not others, or the signals are mixed — that entity sits at a lower weight. It gets included in some responses and excluded from others. Not randomly, exactly, but probabilistically. The model doesn't have enough confidence to commit.

This is why thinking of AI responses as a ranked list, the way we think of Google search results, is fundamentally wrong. Google produces a deterministic ranking from a scored index. AI produces a probabilistic sample from a confidence distribution. The underlying mechanism is different, and the metrics we use to evaluate it need to be different too.
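
A toy simulation makes the distinction concrete. The per-brand inclusion weights below are invented, but they show how stable appearance rates emerge from unstable individual lists:

```python
import random

random.seed(42)

# Invented inclusion probabilities standing in for model confidence.
confidence = {"BrandA": 0.95, "BrandB": 0.60, "BrandC": 0.20}

runs = 1000
counts = {brand: 0 for brand in confidence}
for _ in range(runs):
    # Each run independently samples which brands make the list.
    for brand, p in confidence.items():
        if random.random() < p:
            counts[brand] += 1

for brand, c in counts.items():
    print(f"{brand}: appears in {c / runs:.0%} of runs")
# Appearance rates converge on the underlying weights even though
# no two individual lists are guaranteed to match.
```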

An important study from Authoritas, published in late 2025, provides a compelling test of this model. Researchers investigated a case where a UK company created 11 entirely fictional "experts" — made-up names, AI-generated headshots, fabricated credentials. They seeded these personas into more than 600 press articles across UK media. Then they checked: would AI models treat these fake entities as real experts?

The answer was definitive. Across nine AI models and 55 topic-based questions, zero fake experts appeared in any recommendation. Not one. Despite having substantial press coverage (one layer of signal), the fabricated personas had no entity graph presence, no community corroboration, no independent third-party validation. One layer of signal wasn't enough for AI to develop the confidence needed for consistent inclusion.

The brands that appear reliably in AI responses aren't just "more famous." They have dense, corroborated presence across multiple knowledge layers — what they've done, what others say about them, how they're categorized, and whether those signals agree with each other. AI confidence is a function of signal breadth, not just signal volume.

What this means for how you measure AI visibility

The SparkToro research draws a clear line between the metrics that work and the metrics that don't. Here's the practical breakdown.

Ranking position is dead as a metric. Any tool reporting your "position" in AI recommendation lists is providing essentially random data points. Fishkin was blunt about this: claiming to track ranking position in AI is providing statistically meaningless snapshots of probabilistic outputs. If your current AI visibility tool tells you "you're #3 in ChatGPT for project management software," that number will be different the next time the same prompt runs. And the next time. And the next.

The industry has spent an estimated $100 million or more annually on AI tracking products. A significant portion of that spend is going toward rank-position metrics that the data cannot support. Organizations currently paying for those reports should ask hard questions about methodology.

Visibility percentage is the metric that holds up. How often does your brand appear across many runs of similar prompts? This is what survived statistical scrutiny in Fishkin's research. A brand that shows up in 70% of headphone recommendation runs has meaningfully higher visibility than one that shows up in 15% — and that gap is consistent, measurable, and actionable.

The key word is "many." A single prompt run tells you almost nothing. The SparkToro team found that 60-100 runs start producing meaningful patterns, though optimal sample sizes likely vary by category. Narrow categories may stabilize faster; broad ones need more data.
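
The sample-size point follows from ordinary binomial statistics. Here's a rough sketch (the interval math is standard, not SparkToro's) showing how the uncertainty around an observed visibility percentage shrinks as runs accumulate:

```python
import math

def visibility_ci(hits: int, runs: int, z: float = 1.96):
    """95% normal-approximation confidence interval for a visibility rate."""
    p = hits / runs
    margin = z * math.sqrt(p * (1 - p) / runs)
    return p, max(0.0, p - margin), min(1.0, p + margin)

# A brand with a true ~60% visibility rate, measured at different scales.
for runs in (5, 30, 100):
    p, lo, hi = visibility_ci(hits=round(0.6 * runs), runs=runs)
    print(f"{runs:4d} runs: {p:.0%} observed, 95% CI {lo:.0%}-{hi:.0%}")
```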

Sentiment and framing are the second dimension. Showing up is step one. What AI says about you when it mentions you is step two. A brand that appears in 60% of responses but is consistently described with caveats ("good but expensive," "popular but has reliability issues") is in a very different position than one described favorably. Most tracking today focuses on presence alone and ignores the qualitative layer entirely. That's a gap.
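
Even a crude first pass surfaces this layer. The sketch below flags caveat versus praise language in captured mentions using invented phrase lists; a real pipeline would use a proper sentiment model:

```python
# Crude heuristic sketch: keyword flags, not real sentiment analysis.
CAVEATS = ("but expensive", "reliability issues", "downside", "however")
PRAISE = ("top pick", "highly recommended", "excellent", "best-in-class")

# Invented example mentions captured from AI responses.
mentions = [
    "Bose is a top pick for travelers.",
    "Sony is excellent, but expensive.",
    "Apple is popular but has reliability issues in this line.",
]

for m in mentions:
    text = m.lower()
    flags = [phrase for phrase in CAVEATS + PRAISE if phrase in text]
    print(f"{m!r:62s} -> {flags or ['neutral']}")
```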

Platform-by-platform tracking is non-negotiable. Data from Superlines found that the same brand can see citation volumes differ by up to 615x between different AI platforms. A brand dominating ChatGPT mentions might be nearly invisible in Google's AI features or vice versa. Each platform draws from different sources, weights signals differently, and cites at different rates. ChatGPT cites sources in roughly 29% of responses. Gemini does so at about 30%. Claude sits around 9%. Google AI Overviews cite at just 3%. Treating "AI" as a monolith is like treating "social media" as a single channel — the platform-level differences are massive.
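
In practice, that means keeping a per-platform tally rather than one blended number. A minimal sketch with illustrative counts:

```python
# Illustrative per-platform tallies for one brand across repeated runs.
platform_runs = {
    "ChatGPT":      {"appearances": 62, "runs": 100},
    "Claude":       {"appearances": 18, "runs": 100},
    "Gemini":       {"appearances": 44, "runs": 100},
    "AI Overviews": {"appearances": 7,  "runs": 100},
}

# Report each platform separately; a blended average would hide the spread.
for platform, t in sorted(platform_runs.items(),
                          key=lambda kv: kv[1]["appearances"] / kv[1]["runs"],
                          reverse=True):
    rate = t["appearances"] / t["runs"]
    print(f"{platform:13s} {rate:.0%} visibility")
```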

How brands actually get into the consideration set

If ranking position is meaningless but consideration-set inclusion is real and measurable, the strategic question becomes: what determines whether AI includes your brand?

The answer is not prompt engineering. It's not a content hack. It's not adding FAQ schema to your blog posts (though that doesn't hurt). It's building the kind of multi-layered entity authority that gives AI models high confidence about your relevance to a category.

Think of it as three interlocking layers of signal:

Entity graph presence — does AI know what your brand is? This is about structured, consistent identification across knowledge sources. Your website, LinkedIn, Crunchbase, Wikipedia (if applicable), industry directories, and review platforms should all describe your brand consistently. When sources disagree about what you do, what category you belong to, or how you're positioned, AI confidence drops. The brands with the clearest entity definitions get the most consistent inclusion.

Document graph presence — is your brand discussed in the places AI trusts? Roughly 85% of brand mentions in AI answers come from third-party pages, not owned domains. That means earned media coverage, industry publication mentions, analyst roundups, review site profiles, and community discussions carry disproportionate weight. A 2026 analysis found that earned media placements generate significantly more AI citations than equivalent content published on a brand's own properties. Same information, different origin — and AI treats them very differently.

Concept graph encoding — is your brand consistently associated with your category? When AI encounters your brand, does it reliably connect you to the problems your customers care about? This is built through the accumulation of content, conversation, and coverage that links your name to specific use cases, topics, and solutions. Brands that have invested in deep topical authority — comprehensive content, community participation, consistent positioning — build stronger concept-level associations than brands that have spread themselves thin.

The brands with near-universal AI visibility aren't succeeding in one of these layers. They've built presence across all three. That's what creates the confidence level that pushes a brand from "occasionally mentioned" to "reliably included."
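
On the entity graph layer, one concrete step is machine-readable markup that matches your descriptions everywhere else. Here's a sketch that emits schema.org Organization JSON-LD with placeholder values; the sameAs links are what tie your profiles together into one consistent entity:

```python
import json

# Placeholder values. The point is that name, description, and sameAs links
# should agree with what LinkedIn, Crunchbase, and directories say about you.
org = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Example Brand",
    "description": "Noise-cancelling headphones for frequent travelers.",
    "url": "https://www.example.com",
    "sameAs": [
        "https://www.linkedin.com/company/example-brand",
        "https://www.crunchbase.com/organization/example-brand",
    ],
}

print('<script type="application/ld+json">')
print(json.dumps(org, indent=2))
print("</script>")
```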

The right takeaway isn't nihilism — it's precision

When the SparkToro study first circulated, the loudest takes fell into two camps: "AI visibility is unmeasurable, stop wasting money" and "this changes nothing about our approach." Both are wrong.

The study proved that the most common metric in AI tracking — ranking position — is noise. That's a genuine and important finding. Companies spending money to track where they "rank" in AI responses should reallocate.

But the study also showed that the consideration set is real, that visibility percentage is statistically valid, and that some brands appear far more consistently than others for reasons that can be understood and influenced. That's not nihilism. That's a more honest foundation to build a strategy on.

For growth and marketing teams, the practical shifts are:

Change what you measure. Stop asking "where do we rank in AI?" and start asking "how often do we appear, across how many prompt variations, on which platforms, and what do the models say about us when we show up?" This requires more data and a different methodology than most tracking tools currently provide — but it produces numbers you can actually make decisions from.

Change where you invest. The path to consideration-set inclusion runs through brand authority, not content optimization alone. Your website is the foundation, but the signals that create AI confidence live largely off your site — in earned media, third-party reviews, community discussions, and consistent entity descriptions across the web. If your entire AI visibility strategy is on-site content changes, you're working on maybe 15% of the signal surface.

Start with a baseline. You can't improve what you don't measure, and most companies have never systematically checked what AI says about them. Run your top customer prompts across ChatGPT, Claude, Perplexity, Google's AI features, and Gemini. Do it at scale — not once, but dozens of times per prompt. Document where your brand appears, where it doesn't, what's said about you when you do appear, and how you compare to competitors. That's your starting line.
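
Here's a minimal baseline sketch for one platform, assuming the OpenAI Python SDK and an API key in the environment. The brand-extraction step is stubbed with a known-brands list, which a real pipeline would replace with proper entity extraction:

```python
from collections import Counter
from openai import OpenAI  # pip install openai; other platforms need their own clients

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment
PROMPT = "What are the best wireless headphones for frequent travelers?"
RUNS = 50  # dozens per prompt, not one

def extract_brands(text: str) -> set[str]:
    # Stub: a real pipeline needs entity extraction and name normalization.
    known = {"Sony", "Bose", "Apple", "Sennheiser", "Anker", "JBL"}
    return {b for b in known if b.lower() in text.lower()}

tally = Counter()
for _ in range(RUNS):
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": PROMPT}],
    )
    tally.update(extract_brands(resp.choices[0].message.content))

for brand, count in tally.most_common():
    print(f"{brand:12s} {count / RUNS:.0%} visibility")
```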

The SparkToro study did the AI visibility industry a favor by burning down the metrics that don't work. What's left standing — visibility percentage, consideration-set analysis, sentiment tracking, multi-platform comparison — is a more honest and more useful framework. The brands that adopt it now will have a compounding advantage over those still chasing phantom rankings.
