INVISIBLE COMPETITOR · AI Referral Intelligence

How AI Search Works

How AI engines decide who to recommend.

When your customer asks ChatGPT or Gemini “who's the best chauffeur in Southampton” or “which accountant should I use in Bristol”, they get back a short list of names. Three or four businesses, sometimes more. That list isn't random. It's the output of a process most people don't see.

This page is for buyers who want to understand the process before they buy. By the end, you'll know what AI engines actually weigh when they pick businesses, what we've learned from running over 100 scans on small UK businesses, and what's still uncertain.

Three rules for this page: cite the research, name the limits, don't sell where it's not honest. No jargon. Real answers to real questions.

01 · The mechanics

How the engines pick names

When you ask Google something, it returns ten links and lets you pick. When you ask ChatGPT something, it picks for you. That's the core difference, and it changes everything about how businesses get found.

Modern AI engines work in two phases. First, they retrieve relevant information from a mix of sources: their training data, real-time web search, and structured data they can access directly. Then they generate an answer that combines what they've found into a coherent response.

The retrieval phase is where businesses get included or excluded. Engines look for sources they can interpret quickly and trust enough to surface. A business that's machine-readable, well-described, and referenced by other credible sources gets pulled in. A business that's hidden behind decorative websites, missing structured data, and unmentioned elsewhere gets filtered out.

The generation phase is where the engine decides who to feature first. From the businesses it's retrieved, it builds an answer that satisfies the question. Some get named in the body. One usually gets the top spot. The rest get omitted, even though they were technically retrieved.
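As a rough mental model only, the two phases can be pictured as a filter followed by a ranking. Everything in this sketch is an assumption made for illustration: the `Candidate` fields, the `min_mentions` threshold and the `max_named` cut-off are hypothetical, and no engine publishes its logic in this form.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    machine_readable: bool   # hypothetical: site exposes structured, parseable data
    external_mentions: int   # hypothetical: independent sources that name the business
    relevance: float         # hypothetical: 0..1 match between business and question


def retrieve(candidates, min_mentions=1):
    """Phase 1 (toy): keep only businesses that are easy to interpret and verify."""
    return [c for c in candidates
            if c.machine_readable and c.external_mentions >= min_mentions]


def generate(retrieved, max_named=3):
    """Phase 2 (toy): rank what was retrieved and name only the first few."""
    ranked = sorted(retrieved, key=lambda c: c.relevance, reverse=True)
    return [c.name for c in ranked[:max_named]]


candidates = [
    Candidate("Alpha Cars", True, 4, 0.9),
    Candidate("Beta Chauffeurs", True, 2, 0.7),
    Candidate("Gamma Travel", False, 5, 0.95),  # dropped in phase 1: not machine-readable
    Candidate("Delta Drive", True, 0, 0.8),     # dropped in phase 1: no external mentions
    Candidate("Epsilon Exec", True, 1, 0.6),
    Candidate("Zeta Hire", True, 3, 0.5),       # retrieved, but omitted from the answer
]

named = generate(retrieve(candidates))
print(named)
```

The point of the toy is the shape, not the numbers: a highly relevant business can still vanish in phase 1, and a retrieved business can still be left out in phase 2.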

Each engine works slightly differently. ChatGPT and Perplexity lean heavily on real-time web retrieval. Gemini integrates more directly with Google's existing index and knowledge graph. Claude uses its training data plus web search where available. Despite the technical differences, they all share the same fundamental problem: too many businesses to surface, too little time to evaluate, so they reach for the ones easiest to verify.

That's the gap. Not “are you a good business” (the engine can't tell). Not “do you have good reviews” (that's evidence, not the primary signal). The gap is: “can I quickly verify what this business is, what it does, and whether other credible sources agree it's relevant to this question.”

A note on what's changing. AI search is two years old as a real commercial channel. The signals AI engines weight have shifted twice in 2025 alone. The scan you can run on this site reflects what matters in mid-2026, based on the most recent research and our own data. We re-evaluate the signals quarterly. When something material shifts, the scan updates. The page you're reading is updated when the underlying mechanics change.

02 · The signals

What actually gets you named

Eight signals matter. Some matter more than people expect. Some matter less. Here's what AI engines actually weigh, based on published research (Princeton's GEO study, Ahrefs' schema research, SE Ranking's citation analysis) and over 100 scans we've run on real UK small businesses.

Schema markup

Schema is code added to your website that tells AI engines exactly what kind of business you are. Without it, AI has to guess from the text on the page. With it, the engine reads a structured description: “This is a chauffeur service. Located in Southampton. Owned by [X]. Operates these vehicles. Provides these services.”

Organization schema, LocalBusiness schema, FAQPage schema, Article schema, BreadcrumbList schema. Each one solves a specific interpretation problem. Most small business sites have none of these. The ones that do are easier for AI to pick up.
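To make that concrete, here is a minimal sketch of what a LocalBusiness description looks like when generated as JSON-LD. The business name, URL and services are placeholders, the helper `local_business_jsonld` is a hypothetical name, and the choice of properties is an assumption about what a small site might include, not a complete or required set.

```python
import json


def local_business_jsonld(name, description, locality, url, services):
    """Return a schema.org LocalBusiness description as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "LocalBusiness",
        "name": name,
        "description": description,
        "url": url,
        "address": {
            "@type": "PostalAddress",
            "addressLocality": locality,
            "addressCountry": "GB",
        },
        # One Offer per service, each wrapping a schema.org Service.
        "makesOffer": [
            {"@type": "Offer", "itemOffered": {"@type": "Service", "name": s}}
            for s in services
        ],
    }
    return json.dumps(data, indent=2)


# Placeholder values for illustration only.
snippet = local_business_jsonld(
    name="Example Chauffeurs",
    description="Chauffeur service based in Southampton.",
    locality="Southampton",
    url="https://example.com",
    services=["Airport transfers", "Wedding hire"],
)
print(snippet)
```

On a real page the output would normally sit inside a `<script type="application/ld+json">` tag in the HTML.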

llms.txt

A new standard, emerging in late 2024 and now widely accepted by AI tools. It's a text file at the root of your domain that gives AI a structured brief about your business. What you do, what pages matter, what content to prioritise, what to avoid.

Most small business sites don't have one yet. Adding one takes very little technical effort, and the businesses that have it are ahead of the curve in how they're parsed.
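For a sense of what the file contains, the sketch below assembles one in the layout we understand the public llms.txt proposal to describe: a title line, a short summary, then headed lists of links. The function name `build_llms_txt`, the section headings and every URL are placeholders chosen for illustration.

```python
def build_llms_txt(name, summary, sections):
    """Assemble an llms.txt body: title line, quoted summary, headed link lists.

    sections maps a heading to a list of (title, url, note) tuples.
    """
    lines = [f"# {name}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        lines.append("")
        for title, url, note in links:
            lines.append(f"- [{title}]({url}): {note}")
        lines.append("")
    # Trim trailing blank lines and end with a single newline.
    return "\n".join(lines).rstrip() + "\n"


# Placeholder content for illustration only.
llms_txt = build_llms_txt(
    name="Example Chauffeurs",
    summary="Chauffeur service based in Southampton.",
    sections={
        "Key pages": [
            ("Services", "https://example.com/services", "What we offer and where"),
            ("About", "https://example.com/about", "Who runs the business"),
        ],
        "Optional": [
            ("Blog", "https://example.com/blog", "Lower-priority background reading"),
        ],
    },
)
print(llms_txt)
```

The resulting text would be saved as `llms.txt` at the root of the domain.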

FAQ content and question-based pages

AI engines prefer content structured as questions and answers because that's how their users interact with them. A business site with detailed FAQ pages, structured with FAQPage schema, gives the engine ready-made content to cite when answering customer questions.

Research from Frase and SE Ranking shows mixed effects: FAQ schema isn't a magic switch. But FAQ content itself, well-written and clearly answering real buyer questions, does correlate with citation in our scan data.
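A minimal sketch of the FAQPage structure mentioned above, built from question-and-answer pairs. The helper name `faq_page_jsonld` and the sample question are placeholders. As noted, the markup on its own isn't a magic switch; the sketch only shows how the structure wraps content that already answers a real buyer question.

```python
import json


def faq_page_jsonld(pairs):
    """Build schema.org FAQPage JSON-LD from (question, answer) pairs."""
    data = {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }
    return json.dumps(data, indent=2)


# Placeholder question and answer for illustration only.
faq_snippet = faq_page_jsonld([
    ("Do you cover airport transfers from Southampton?",
     "Yes. We run transfers from Southampton to the main UK airports."),
])
print(faq_snippet)
```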

Entity clarity

This is the most underrated signal. AI engines need to know exactly who you are, where you operate, and what you do, with enough confidence to recommend you over a competitor.

Entity clarity comes from three places: a clear About page with specific details (founders, history, credentials), consistent descriptions of your business across the web (your website, Google Business Profile, directories, social profiles), and links to authoritative profiles that confirm your existence (LinkedIn, Companies House, Wikidata where applicable).

In our scan data, entity clarity correlated more strongly with mention rate than schema markup did. AI engines mention businesses they can identify confidently. Without entity clarity, you're a fuzzy candidate.
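The "consistent descriptions across the web" part can be checked mechanically. The sketch below compares the same fields across a few sources and reports any that disagree. The source names, field names and values are all hypothetical, and the normalisation rule (ignore case, spacing and punctuation) is a simplifying assumption rather than how any engine compares records.

```python
import re


def normalise(value):
    """Lowercase and drop everything except letters and digits."""
    return re.sub(r"[^a-z0-9]", "", value.lower())


def inconsistent_fields(profiles):
    """Return {field: {source: value}} for fields whose normalised values differ."""
    fields = set().union(*(p.keys() for p in profiles.values()))
    problems = {}
    for field in sorted(fields):
        seen = {source: p.get(field, "") for source, p in profiles.items()}
        if len({normalise(v) for v in seen.values()}) > 1:
            problems[field] = seen
    return problems


# Hypothetical records for one business as it appears in three places.
profiles = {
    "website": {"name": "Example Chauffeurs Ltd", "phone": "01632 960 001",
                "category": "Chauffeur service"},
    "google_business_profile": {"name": "Example Chauffeurs Ltd.", "phone": "01632960001",
                                "category": "Chauffeur service"},
    "directory": {"name": "example chauffeurs ltd", "phone": "01632 960001",
                  "category": "Taxi service"},
}

problems = inconsistent_fields(profiles)
print(problems)
```

Here the name and phone number match once formatting is ignored, so only the category is flagged as inconsistent.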

Third-party citations and consensus

AI engines weight businesses that other credible sources reference. Press coverage, industry publications, Reddit discussions, directory listings, supplier mentions. The more independent sources confirm your existence and category, the more confident the engine becomes in surfacing you.

This is why Reddit appears in 40% of AI citations across most categories. The engines aren't just reading your site. They're reading what other people say about your category and whether your name comes up.

Content depth and freshness

Engines prefer sites with substantive content over thin ones. Not volume for the sake of it, but real depth on the topics you're known for. A chauffeur service with three short pages gets less weight than one with detailed pages about their service area, their fleet, their booking process, their experience.

Freshness matters less than people think for service businesses. Updating once a quarter is usually enough. Daily blogging isn't necessary unless you're competing in a fast-moving content category.

Crawlability and AI bot access

If AI engines can't read your site, none of the above matters. Many small business sites accidentally block AI crawlers via robots.txt or anti-bot protection. Verifying that GPTBot, ClaudeBot, PerplexityBot, and Google-Extended can access your pages is the prerequisite to everything else.
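The robots.txt half of that check can be scripted with Python's standard library. This sketch parses a robots.txt body and lists which of the four crawler names above are disallowed for a given URL. The sample file and URL are made up, and `blocked_crawlers` is a hypothetical helper name. It only reads robots.txt rules, so it would not detect blocking by a firewall or anti-bot service.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_crawlers(robots_txt, url, agents=AI_CRAWLERS):
    """Return the agents that the given robots.txt text disallows for url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# Made-up robots.txt that blocks one AI crawler and allows everything else.
sample_robots = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Allow: /
"""

blocked = blocked_crawlers(sample_robots, "https://example.com/services")
print(blocked)
```

For a live site, the same function could be fed the text fetched from the site's own `/robots.txt`.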

Search Console and Bing Webmaster Tools

Your site needs to be properly connected to both search engines that feed AI tools (Google for Gemini and ChatGPT search, Bing for Copilot and others). Sitemap submitted. Indexing verified. Errors addressed.

This isn't an AI-specific signal but it's an AI-prerequisite. The sites that get cited are the sites that the underlying search engines can crawl, parse, and index reliably.
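The sitemap that gets submitted to both tools is a small XML file. A minimal sketch of generating one with the standard library follows. The page URLs and dates are placeholders, and `build_sitemap` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """pages: list of (url, lastmod as 'YYYY-MM-DD'). Returns sitemap XML as a string."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder pages for illustration only.
sitemap_xml = build_sitemap([
    ("https://example.com/", "2026-06-01"),
    ("https://example.com/services", "2026-06-01"),
    ("https://example.com/about", "2026-05-20"),
])
print(sitemap_xml)
```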

That's the eight. Some are technical, some are content, some are operational. None of them individually moves the needle. Together, they're what gets a business surfaced when AI is asked who to recommend.

03 · The data

What we've learned from 100+ scans

Most of the writing on AI search optimisation comes from theory or small studies. We've run our own AI Referral Check on over 100 UK small businesses across categories: trades, professional services, clinics, salons, retailers, specialist services. The patterns are consistent enough to share.

Most businesses are invisible

The single most common result is a 0% top-pick rate with 0-15% mention rate. Not a few. Most. Around 70% of UK small businesses we've scanned aren't named in any meaningful way by ChatGPT, Claude, or Gemini for queries they should be relevant to.

This isn't because they're bad businesses. It's because the technical and entity signals required to be picked up aren't in place.
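For readers who want the two rates pinned down, here is one way they could be computed. The definitions are assumptions made for this sketch: mention rate is the share of answers that name the business at all, and top-pick rate is the share where it is named first. The answers are invented, and `scan_rates` is a hypothetical helper name.

```python
def scan_rates(responses, business):
    """responses: list of answers, each a list of business names in the order named."""
    total = len(responses)
    if total == 0:
        return {"mention_rate": 0.0, "top_pick_rate": 0.0}
    mentions = sum(business in names for names in responses)
    top_picks = sum(bool(names) and names[0] == business for names in responses)
    return {"mention_rate": mentions / total, "top_pick_rate": top_picks / total}


# Ten invented answers: the business is named once, and never first.
responses = [["Alpha Cars", "Beta Chauffeurs"]] * 9 + [["Alpha Cars", "Example Chauffeurs"]]

rates = scan_rates(responses, "Example Chauffeurs")
print(rates)
```

Under these definitions, a business named once in ten answers and never first lands at a 10% mention rate and a 0% top-pick rate, which is the band described above.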

Niche specialists outperform generalists

The businesses that do score well share a pattern: they're highly specific about what they do. A chauffeur service that specialises in “Mercedes V-Class luxury wedding hire in Hampshire” tends to get named more than a generalist “Hampshire chauffeur service”, provided the schema and entity work supports it.

AI engines find it easier to recommend a clear specialist than a generic operator. The recommendation has higher confidence because the match is tighter.

Entity clarity beats FAQ presence

Conventional wisdom says FAQ schema is the key to AI citation. Our data didn't confirm that. FAQ presence had a weak correlation with mention rate. Entity clarity, measured by the AI's ability to identify the business from its About page, Google Business Profile, and consistent descriptions, had a much stronger correlation.

This matches Carolyn Shelby's framing at SMX Munich 2026: “AI doesn't discover new brands, it selects from known entities.” The work is to be a known entity first, then make it easy to cite.
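A comparison like the one above comes down to a correlation between a signal and mention rate. The sketch below shows the arithmetic for a Pearson correlation, which is an assumption about the kind of measure involved rather than a description of our scan tooling. The numbers are invented placeholders, not scan results.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Invented placeholder values for six imaginary businesses.
has_faq = [1, 0, 1, 0, 1, 0]                  # 1 = FAQ content present
entity_score = [0.2, 0.3, 0.5, 0.6, 0.8, 0.9]  # hypothetical 0..1 clarity score
mention_rate = [0.0, 0.05, 0.1, 0.2, 0.3, 0.4]

print("FAQ presence vs mention rate:", round(pearson(has_faq, mention_rate), 2))
print("Entity score vs mention rate:", round(pearson(entity_score, mention_rate), 2))
```

A correlation of this kind says two things move together. It does not show that one causes the other.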

Engines disagree more than you'd expect

The same business, scanned across ChatGPT, Claude, Gemini, and Perplexity, often gets very different treatment. One engine might mention you regularly, another might never. This isn't random. It usually reflects which sources each engine prioritises and what data sources it has access to.

A practical consequence: optimising for one engine doesn't optimise for all. The Sprint addresses signals that matter across all four engines rather than tuning for one.
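One simple way to see the disagreement is to compute the mention rate separately for each engine and look at the spread. The answers below are invented, and `mention_rate_by_engine` is a hypothetical helper name.

```python
def mention_rate_by_engine(results, business):
    """results: {engine: list of answers, each a list of named businesses}."""
    rates = {}
    for engine, answers in results.items():
        hits = sum(business in names for names in answers)
        rates[engine] = hits / len(answers) if answers else 0.0
    return rates


hit = ["Alpha Cars", "Example Chauffeurs"]   # an answer that names the business
miss = ["Alpha Cars", "Beta Chauffeurs"]     # an answer that does not

# Invented results: four answers per engine.
results = {
    "ChatGPT": [hit, hit, miss, miss],
    "Claude": [miss, miss, miss, miss],
    "Gemini": [hit, miss, miss, miss],
    "Perplexity": [hit, hit, hit, miss],
}

by_engine = mention_rate_by_engine(results, "Example Chauffeurs")
spread = max(by_engine.values()) - min(by_engine.values())
print(by_engine, spread)
```

A large spread is the pattern described above: one engine names the business regularly while another never does.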

What changed in the last six months

The signals we'd have prioritised in late 2025 aren't quite the same as the ones we prioritise in 2026. llms.txt went from a fringe proposal to a widely adopted standard in eight months. The relative weight of schema markup softened in some categories as engines got better at parsing unstructured content. Entity clarity, by contrast, became more important as AI engines started cross-checking businesses against multiple sources before recommending.

We re-test these assumptions every time we run a fresh batch of scans. The patterns above are what's true at the time of writing. They might not be true in twelve months. The scan reflects current research and current data.

Most fixes are quick

The technical foundations we deploy in a Sprint (schema, llms.txt, About-page entity work, Search Console setup) take hours, not weeks. The content fixes (FAQ pages, About content) take a bit more time because they need to be drafted in voice. But none of this is complex engineering. Most small business sites are 80% of the way to being properly readable by AI. The 20% that's missing is the 20% that matters.

The honest gap: while the technical work is fast, AI engines themselves take 60-90 days to re-crawl, re-evaluate, and update their recommendations. The Sprint deploys what's missing. The engines catch up on their own timeline.

04 · The application

How the Sprint applies this

The £497 AI Referral Sprint is the operational version of everything above.

In three working days we deploy the on-site foundations for the signals above:

Schema markup, applied to the relevant pages
llms.txt, configured and deployed
5 buyer-question pages, drafted in your voice with FAQ schema
About and team pages, structured for AI to identify you clearly
Search Console and Bing Webmaster Tools, set up and verified
AI bot access, verified across the major crawlers
Sitemap, optimised and submitted

Then we wait. AI engines take 60-90 days to re-crawl your site, re-evaluate, and update their recommendations. We re-scan at 90 days so you can see what moved.
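The timeline is simple date arithmetic. As a sketch, assuming the window is counted in calendar days from the day deployment finishes (the function name `rescan_window` and the example date are placeholders):

```python
from datetime import date, timedelta


def rescan_window(deploy_date, earliest_days=60, latest_days=90):
    """Return (earliest, latest) dates for expecting movement after deployment."""
    return (deploy_date + timedelta(days=earliest_days),
            deploy_date + timedelta(days=latest_days))


# Example: work deployed on 11 June 2026.
earliest, latest = rescan_window(date(2026, 6, 11))
print(earliest.isoformat(), latest.isoformat())
```

For that example date, the window runs from 10 August to 9 September 2026, with the re-scan falling at the later end.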

This is the gap between work and outcome. The work is fast and controllable. The outcome is slow and partly outside any provider's control. Anyone telling you they can guarantee citations in 3 days isn't being honest about how AI engines work.

What we promise is the work. What we measure is the outcome. The gap between the two is the engines' update cycle, which we don't control but do account for.

If your business is bigger or more complex than the standard Sprint (multi-site, regulated, agency white-label, ongoing competitive pressure), the same principles apply but the scope extends. We'd talk about that on the walkthrough call, or you can read about Consulting first.

See the AI Visibility Engine →

05 · How we keep up

Tracking what changes

AI search isn't a settled field. The signals that matter today aren't necessarily the signals that mattered last year, and they might not be the signals that matter next year. The scan stays current because we actively track the work across three streams:

Research and publications. We monitor Princeton's GEO research, the major SEO publications, and industry conference output from SMX, BrightonSEO, and the AI search-specific events. New studies get evaluated against our scan data.

Engine announcements. ChatGPT, Claude, Gemini and Perplexity publish irregular updates about how their recommendation systems work. We track these and adjust which signals the scan weights when they matter.

First-party scans. We re-run baseline scans on a sample set of businesses regularly. This is partly verification (does work we've deployed still move the score?), partly research (are the engines now weighting things differently?).

What this means for buyers: the AI Referral Score you get today reflects what's current. The Sprint deploys what's current. The methodology behind both updates as the field does.

06 · The honest bit

What we don't know yet

A few things this page hasn't claimed:

We can't guarantee top-pick status. AI engines weigh dozens of factors, including some that are dynamic (real-time popularity, seasonal shifts, news cycles) and some that are categorical (competitiveness varies hugely by industry and location). The work moves you closer. It doesn't guarantee the top.

The 60-90 day timeline is an average. Some engines update faster. Some categories take longer. Some businesses see movement within 30 days. Others take 120. We re-scan at 90 days because that's the point where you'd reasonably expect meaningful movement, but it's not a hard deadline.

Some businesses don't benefit much from this work. Highly commoditised local services where the buyer doesn't research before contacting (for example, emergency plumbers at 2am) get less from AI search than considered purchases where customers research thoroughly. Honest answer: if your business is commodity emergency response, the Sprint will help but not transform your customer flow.

Our scan dataset has limits. 100+ businesses is meaningful, but it's not a randomised sample. Most scanned businesses came to the tool because they suspected they had a problem. That biases the “most are invisible” finding upward. The pattern holds, but we acknowledge the data isn't perfect.

We're learning too. Every Sprint teaches us something about what moves the score in different categories. The methodology updates as we learn. The 90-day re-scan is partly diagnostic for our own understanding, not just verification for you.

That's the honest position. The work is real, the research is clear, the data backs it up, and the gaps in our knowledge are named openly.

Peter Fraher

Founder, Invisible Competitor

peter@invisiblecompetitor.com

Ready to run the scan?

60 seconds, no signup. See exactly where you stand against your competitors in AI search.

Run the free check →

Last updated: 11 June 2026