I keep seeing the same uncomfortable pattern: the Google dashboard looks fine, but the AI answers are sending buyers somewhere else.
A marketing lead opens Search Console and sees green arrows. Rankings up. Clicks steady. Maybe even a few page-one terms to show in the next leadership meeting.
Then someone asks ChatGPT, Perplexity, or Gemini who to shortlist in the category, and the model names two competitors first. Sometimes you appear near the bottom. Sometimes you don't appear at all.
That gap is what a share-of-answer audit measures. A share-of-answer audit is a manual measurement of how often your brand is named when AI systems like ChatGPT, Perplexity, and Gemini answer category-relevant questions — the AI equivalent of share of voice for organic search.
You're not trying to prove that AI search has replaced Google. You're trying to answer a simpler question: when a buyer asks an AI system who matters in your category, do you show up?
This is the 90-minute version. Spreadsheet, free model accounts, 30 prompts. No platform required. By the end, you'll have a first-pass baseline for your brand, a competitor comparison, and a one-page summary you can bring into the next meeting without waving your hands.
A quick caveat before we start: AI responses are non-deterministic. Two people asking the same question can get different answers, and one prompt can trigger a fan-out of sub-queries behind the scenes. This audit is a directional diagnostic, not a stable measurement. The point is to find the obvious gaps fast, then decide whether the category deserves deeper tracking.

Keep it short on purpose
Most teams don't avoid answer engine optimization (AEO) because it's impossible. They avoid it because the first measurement feels annoying.
Ninety minutes fixes that. It gives you enough time to see the obvious gaps, but not enough time to build a dashboard nobody asked for.
The first pass should feel a little rough. That's fine. You'll learn more from 30 well-chosen prompts than from a 200-prompt spreadsheet filled with vague category terms.
Get the baseline. Find the misses. Decide what deserves a bigger audit later.
What you need
You need a spreadsheet, access to a few AI systems, and one uninterrupted block of time.
For most B2B SaaS audits, I'd start with ChatGPT, Perplexity, and Gemini. Add Claude if your buyers skew technical or if your category gets discussed heavily by developers.
That's the kit.
Don't buy a platform for the first pass. Tools like Peec AI, Profound, and AthenaHQ make sense once you know what you want to track. A manual audit first makes you harder to fool later.
The 90-minute playbook
0–15: Build the prompt set
You're going for 30 prompts across three intent tiers. The mix matters more than the count.
Tier 1: Category discovery. These are buyers who don't know you yet. They're asking about the category. For a fintech API company:
- "What's the best fintech API for KYC in 2026?"
- "Which fintech APIs are compliant with EU AMLD6?"
- "Best alternatives to Plaid for embedded finance in Europe."
- "Who are the leading payment orchestration providers for SaaS?"
- "What fintech API should a Series A startup use?"
Tier 2: Comparison. These are buyers comparing named options.
- "[Your brand] vs [competitor], which is better for B2B SaaS?"
- "[Competitor] vs [competitor], which has better developer experience?"
- "Is [your brand] cheaper than [competitor]?"
- "Best alternatives to [dominant competitor] for European SaaS teams."
- "Which [category] vendor is best for a small marketing team?"
Tier 3: Validation. These are buyers checking risk before they talk to sales.
- "Is [your brand] safe for production use?"
- "What do developers actually say about [your brand]?"
- "Has [your brand] had any security incidents?"
- "What's the pricing structure of [your brand]?"
- "What are the biggest complaints about [your brand]?"
Drop the prompts into your sheet. Keep the wording ugly if that's how buyers would ask it. Polished prompts make polished lies.
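If you'd rather seed the sheet with a small script than type 30 rows by hand, a rough sketch like this works. The prompts are just the fintech examples from above, and the file name and tier labels are my own shorthand, so rename to taste.

```python
import csv

# Illustrative prompt set -- swap in the ugly wording your buyers actually use.
PROMPTS = {
    "discovery": [
        "What's the best fintech API for KYC in 2026?",
        "Which fintech APIs are compliant with EU AMLD6?",
    ],
    "comparison": [
        "[Your brand] vs [competitor], which is better for B2B SaaS?",
        "Best alternatives to [dominant competitor] for European SaaS teams.",
    ],
    "validation": [
        "Is [your brand] safe for production use?",
        "What are the biggest complaints about [your brand]?",
    ],
}

with open("prompt_set.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt_id", "tier", "prompt"])
    prompt_id = 1
    for tier, prompts in PROMPTS.items():
        for prompt in prompts:
            writer.writerow([f"{prompt_id:03d}", tier, prompt])
            prompt_id += 1
```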
15–60: Run the prompts
Run each prompt against three models. One run per model per prompt is enough for a first snapshot, not a stable benchmark. If a prompt matters commercially, plan to rerun it over several weeks before treating the number as real.
You'll still get noise. Across 90 responses, the obvious patterns usually show up.
The first time we did this manually, the spreadsheet got ugly fast. That's normal. Don't try to make the system perfect on pass one. You're hunting for the misses that show up again and again.
Use these columns:
| Column | What to capture |
|---|---|
| Prompt ID | 001, 002, 003 |
| Prompt | The exact question you asked |
| Tier | Discovery, comparison, or validation |
| Model | ChatGPT, Perplexity, Gemini, Claude |
| Run number | 1 for the first pass |
| Date | The day you ran it |
| Brand mentioned? | Yes or no |
| Brand rank | 1 if first, 2 if second, 0 if missing |
| Competitors mentioned | Names in the order they appear |
| Citations | URLs, domains, or "none" |
| Sentiment | Positive, neutral, negative, or wrong |
| Wrong facts | Pricing, market, product, location, ICP |
| Action | Page update, listicle outreach, source correction, monitor |
A sample row might look like this:
| Prompt | Model | Brand mentioned? | Brand rank | Competitors mentioned | Citations | Action |
|---|---|---|---|---|---|---|
| Best KYC API for European SaaS companies | Perplexity | No | 0 | Plaid, Stripe Identity, Onfido | G2, vendor docs, fintech blog | Pitch comparison page and third-party listicles |
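If you prefer a plain CSV over a shared sheet, here's a minimal sketch of the same columns. The file name, field names, and the semicolon-separated lists are my own conventions, not a required schema.

```python
import csv
import os
from datetime import date

FIELDS = [
    "prompt_id", "prompt", "tier", "model", "run_number", "date",
    "brand_mentioned", "brand_rank", "competitors_mentioned",
    "citations", "sentiment", "wrong_facts", "action",
]

# One row per prompt-per-model response, mirroring the sample row above.
row = {
    "prompt_id": "001",
    "prompt": "Best KYC API for European SaaS companies",
    "tier": "discovery",
    "model": "Perplexity",
    "run_number": 1,
    "date": date.today().isoformat(),
    "brand_mentioned": "no",
    "brand_rank": 0,
    "competitors_mentioned": "Plaid; Stripe Identity; Onfido",
    "citations": "G2; vendor docs; fintech blog",
    "sentiment": "neutral",
    "wrong_facts": "",
    "action": "Pitch comparison page and third-party listicles",
}

write_header = not os.path.exists("audit_log.csv")
with open("audit_log.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    if write_header:  # first run: write the header row once
        writer.writeheader()
    writer.writerow(row)
```

Append one row per response as you go; 30 prompts across three models gives you 90 rows.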
A few shortcuts from doing this the messy way first:
- Open one browser profile per model.
- Keep your prompts in a scratch doc.
- Set a 15-minute timer per model.
- Paste the full response only when something looks interesting.
- If you're short on time, skip sentiment and come back to it.
The annoying part nobody tells you
The answers will shift. Citations will disappear. One model will give you a strange answer that makes no sense. Another will invent a product detail you corrected on your site six months ago.
Don't panic. Don't turn the first audit into a debate about whether the model was "right."
Mark the weird response, keep moving, and look for repeated patterns. One bad answer is noise. The same missing brand across five buyer-intent prompts is a problem.
60–75: Score and summarize
Calculate these numbers.
Share of answer. Responses where you were named, divided by total responses across all models. If you appear in 18 of 90 responses, your share of answer is 20%.
Share of answer by tier. Start here. This is usually where the real problem shows up. Most B2B SaaS companies show up on validation prompts because the model can find their own site. The trouble starts in discovery and comparison prompts, where third-party sources matter more.
Competitor share of answer. Run the same calculation for your top competitors. A 20% share could be fine if the category leader sits at 25%. It's painful if the leader sits at 70%. The number needs context before it means anything.
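If the runs live in a CSV like the one sketched earlier, the overall, per-tier, and competitor numbers are a few lines of Python. The column names and the semicolon-separated competitors field are the assumptions I made above, so adjust them to match your own sheet.

```python
import csv
from collections import Counter

with open("audit_log.csv", newline="") as f:
    rows = list(csv.DictReader(f))

total = len(rows)  # e.g. 30 prompts x 3 models = 90 responses

# Overall share of answer: responses that name you / all responses.
mentions = sum(1 for r in rows if r["brand_mentioned"] == "yes")
print(f"Share of answer: {mentions / total:.0%}")  # 18 of 90 -> 20%

# Share of answer by tier.
for tier in ("discovery", "comparison", "validation"):
    tier_rows = [r for r in rows if r["tier"] == tier]
    if tier_rows:
        hits = sum(1 for r in tier_rows if r["brand_mentioned"] == "yes")
        print(f"  {tier}: {hits / len(tier_rows):.0%}")

# Competitor share of answer, counted from the competitors column.
competitor_hits = Counter()
for r in rows:
    for name in r["competitors_mentioned"].split(";"):
        if name.strip():
            competitor_hits[name.strip()] += 1
for name, hits in competitor_hits.most_common(5):
    print(f"{name}: {hits / total:.0%}")
```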
Position-weighted score. A first-place mention is worth more than a last-place mention. Use a simple weighting:
- First mention = 3 points
- Middle mention = 2 points
- Last mention = 1 point
- Missing = 0 points
Then divide your score by the maximum possible score. Don't over-engineer it. You're looking for direction, not a peer-reviewed metric.
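Here's one rough way to turn that weighting into a number, reusing the audit-log columns from earlier. It assumes brand rank 1 means first mention, a rank equal to the number of brands in the answer means last, and 0 means missing; that's my reading of the weighting, not a standard formula.

```python
import csv

def mention_points(brand_rank: int, brands_in_answer: int) -> int:
    """First mention = 3, last = 1, anything in between = 2, missing = 0."""
    if brand_rank == 0:
        return 0
    if brand_rank == 1:
        return 3
    if brand_rank >= brands_in_answer:
        return 1
    return 2

with open("audit_log.csv", newline="") as f:
    rows = list(csv.DictReader(f))

points = []
for r in rows:
    competitors = [c for c in r["competitors_mentioned"].split(";") if c.strip()]
    brands = len(competitors) + (1 if r["brand_mentioned"] == "yes" else 0)
    points.append(mention_points(int(r["brand_rank"]), brands))

# Divide by the maximum possible score: 3 points per response.
print(f"Position-weighted score: {sum(points) / (3 * len(points)):.0%}")
```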
75–90: Write the one-pager
Five bullets. No deck. No 14-tab workbook. Use this structure:
- Our share of answer is X% across 30 prompts in 3 models.
- We're strongest on [tier] at Y% and weakest on [tier] at Z%.
- The top-cited competitor is [competitor] at A%.
- We keep missing from [pattern: best-of listicles, comparison prompts, regional queries, implementation questions].
- Our first 30-day move is [one specific bet].
A finished version might read like this:
Our share of answer is 18% across 30 prompts in ChatGPT, Perplexity, and Gemini.
We show up on validation prompts, but we're nearly invisible on category discovery.
Competitor A appears in 52% of responses and gets cited from three third-party listicles we're missing from.
The first 30-day move is to build one comparison page, update our category page for direct answers, and pitch the three listicles that already appear in AI citations.
Ship the one-pager before you close the laptop. The point of the audit is the decision it forces, and you only force it if the number lands in front of someone who can act on it.

[Chart: Position-Weighted Mention Scoring. Points awarded by mention position. Position weighting surfaces whether you're the default answer or an afterthought. Source: LoudFace share-of-answer methodology.]
How to read the results
A few patterns I see in first audits at LoudFace.
"We show up in Tier 3 but not Tier 1"
You have bottom-of-funnel content, but no category presence. Fix it in two places: your category-level pages and the third-party "best of" pages your buyers already trust. The on-site work is faster. The third-party work usually matters more.
"We show up on Perplexity but not ChatGPT"
Perplexity often surfaces fresh-source gaps faster because it shows citations clearly on every answer. ChatGPT can behave differently depending on whether the response leans on live web retrieval or older patterns from how the web described you months ago.
If the gap stays open after months of fresh third-party coverage, the model still doesn't have a clear picture of who you are, what category you belong in, or why you deserve to appear next to bigger names.
"Our position is always last"
You're known, but you're not the default. Look upstream at how your site describes the company. Your web team can check the schema, but the bigger issue is usually the one-liner. If your description sounds like four competitors stitched together, the model has no reason to rank you higher.
"Our competitor is cited three times more than us"
Pull every response where they appear and you don't. The pattern usually reduces to a finite list of listicles, podcasts, reviews, partner pages, or industry publications. Those are your outreach targets for the next 90 days. Stop guessing about citations and start working the list.
"We get cited, but with wrong facts"
The models learned about you from stale sources. Update your own site first. Then correct owned profiles, directories, partner pages, and public listings you can legitimately update. If Wikipedia is relevant, follow its conflict-of-interest rules. Don't treat it like a company profile you can edit into shape.
Common mistakes in a first audit
A few patterns that derail otherwise good audits.
Vanity prompts are the most common one. If you ask "what's the best [your exact category as you describe it]," the model will cite you because the wording came from your homepage. Use the language buyers use, even when it's uglier.
Running a single model is the next trap. Your story across ChatGPT, Perplexity, and Gemini will differ. One model gives you a clue. Three models give you a pattern.
Skipping competitors is the easiest corner to cut on a tight first audit. Your number means little without theirs. Competitor data tells you whether you have a positioning problem, a coverage problem, or both.
Scoring only on yes or no hides too much. A last-place, neutral mention on a comparison prompt is closer to a loss than a win. Track position and sentiment from day one if you can.
The last trap is treating the first audit as a final answer. One run tells you whether you can appear. Repeated runs tell you how stable that appearance is. Start manual, then decide whether the trend deserves automation.
What to do after the audit
The baseline is only useful if you act on it. In the week after your first audit, work through this list.
Fix the five worst high-intent prompts
Pick the prompts with the strongest commercial intent where you didn't appear. Each one needs one of four fixes:
- a new page
- a rewrite of an existing page
- a third-party placement
- a source correction
Your job is to match the fix to the reason you were missing. Don't write a new page when every cited source is a third-party listicle.
Set up a weekly re-run
Same 30 prompts. Same models. Same scoring. The trendline matters more than the snapshot. One bad week means little. Three bad weeks in a row means something changed upstream.
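If every weekly run goes into the same CSV with the date column filled in, the trendline is a simple group-by on ISO week. A minimal sketch, assuming the column names from the audit log above:

```python
import csv
from collections import defaultdict
from datetime import date

weekly = defaultdict(lambda: [0, 0])  # ISO week -> [mentions, responses]

with open("audit_log.csv", newline="") as f:
    for r in csv.DictReader(f):
        year, week, _ = date.fromisoformat(r["date"]).isocalendar()
        key = f"{year}-W{week:02d}"
        weekly[key][1] += 1
        if r["brand_mentioned"] == "yes":
            weekly[key][0] += 1

for week_key in sorted(weekly):
    hits, total = weekly[week_key]
    print(f"{week_key}: {hits}/{total} responses named us ({hits / total:.0%})")
```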
Scale the prompt set after 30 days
Grow to 50 prompts, then 100. Add regions if you sell internationally. Add run depth if leadership wants a more stable number.
Use these as rough planning bands from our own LoudFace audits, not industry benchmarks. Your category may behave very differently:
- Category leader: 60–80%
- Challenger: 25–45%
- New entrant: 10–20% in the first 90 days, 30%+ inside year one
For reference, one public B2B SaaS audit dataset from VisibleIQ reported the average company appearing in around 16% of buying-intent prompts. Treat that as a sanity check, not a target.
If your number is above the band for your stage, you may be underrating your AEO position in sales calls. If it's below, you finally have a measurable problem instead of a vague fear.
Don't automate too early
If your first instinct is to buy a monitoring tool before you've fixed the obvious gaps, you're paying to watch yourself lose. Clean up the first 30 prompts manually. Then automate.
[Chart: Share-of-Answer Planning Bands by Company Stage. These are rough planning bands from LoudFace audits, not industry benchmarks. Your category may behave differently. Source: LoudFace client audits across fintech, payroll, and developer tools categories.]
[Chart: Average Share of Answer for B2B SaaS. A sanity check, not a target: most companies start well below where they want to be. Source: VisibleIQ public B2B SaaS audit dataset.]
Where this method comes from
This playbook draws on three sources. First, manual audits we've run for LoudFace clients across categories like fintech APIs, payroll, and developer tools. Second, the way current AI visibility platforms (Peec AI, Profound, AthenaHQ) structure their measurement around prompts, models, position, and sentiment. Third, public research on AI citations, including Ahrefs' analysis of 26,283 ChatGPT source URLs (which found "best X" lists were the single most prominent page type, and that brands were more likely to be cited through third-party sources than their own domain) and a follow-up Ahrefs study of 863,000 keywords showing that only 38% of AI Overview citations now come from Google's top 10 results, down from 76% a year earlier.
The takeaway from all three: AI visibility is real, measurable, and only loosely tied to traditional rankings. A manual first audit is a fair way to find out where you stand.
Want a second read on your share of answer? Book a discovery call and we'll run your prompt set against your category and send the report back. For broader context on the discipline, see the complete guide to AEO in 2026 or our ranked list of AEO agencies for B2B SaaS.
[Chart: Google AI Overview Citations from Top-10 Organic Results. Share of AI Overview citations from Google's top 10 (%). AI visibility is only loosely tied to traditional rankings, the gap a share-of-answer audit is designed to expose. Source: Ahrefs study of 863,000 keywords.]






