I keep seeing the same uncomfortable pattern: the Google dashboard looks fine, but the AI answers are sending buyers somewhere else.
A marketing lead opens Search Console and sees green arrows. Rankings up. Clicks steady. Maybe even a few page-one terms to show in the next leadership meeting.
Then someone asks ChatGPT, Perplexity, or Gemini who to shortlist in the category, and the model names two competitors first. Sometimes you appear near the bottom. Sometimes you don't appear at all.
That gap is what a share-of-answer audit measures. A share-of-answer audit is a manual measurement of how often your brand is named when AI systems like ChatGPT, Perplexity, and Gemini answer category-relevant questions — the AI equivalent of share of voice for organic search.
You're not trying to prove that AI search has replaced Google. You're trying to answer a simpler question: when a buyer asks an AI system who matters in your category, do you show up?
This is the 90-minute version. Spreadsheet, free model accounts, 30 prompts. No platform required. By the end, you'll have a first-pass baseline for your brand, a competitor comparison, and a one-page summary you can bring into the next meeting without waving your hands.
A quick caveat before we start: AI responses are non-deterministic. Two people asking the same question can get different answers, and one prompt can trigger a fan-out of sub-queries behind the scenes. This audit is a directional diagnostic, not a stable measurement. The point is to find the obvious gaps fast, then decide whether the category deserves deeper tracking.

Keep it short on purpose
Most teams don't avoid answer engine optimization (AEO) because it's impossible. They avoid it because the first measurement feels annoying.
Ninety minutes fixes that. It gives you enough time to see the obvious gaps, but not enough time to build a dashboard nobody asked for.
The first pass should feel a little rough. That's fine. You'll learn more from 30 well-chosen prompts than from a 200-prompt spreadsheet filled with vague category terms.
Get the baseline. Find the misses. Decide what deserves a bigger audit later.
What you need
You need a spreadsheet, access to a few AI systems, and one uninterrupted block of time.
For most B2B SaaS audits, I'd start with ChatGPT, Perplexity, and Gemini. Add Claude if your buyers skew technical or if your category gets discussed heavily by developers.
That's the kit.
Don't buy a platform for the first pass. Tools like Peec AI, Profound, and AthenaHQ make sense once you know what you want to track. A manual audit first makes you harder to fool later.
The 90-minute playbook
0–15: Build the prompt set
You're going for 30 prompts across three intent tiers. The mix matters more than the count.
Tier 1: Category discovery. These are buyers who don't know you yet. They're asking about the category. For a fintech API company:
- "What's the best fintech API for KYC in 2026?"
- "Which fintech APIs are compliant with EU AMLD6?"
- "Best alternatives to Plaid for embedded finance in Europe."
- "Who are the leading payment orchestration providers for SaaS?"
- "What fintech API should a Series A startup use?"
Tier 2: Comparison. These are buyers comparing named options.
- "[Your brand] vs [competitor], which is better for B2B SaaS?"
- "[Competitor] vs [competitor], which has better developer experience?"
- "Is [your brand] cheaper than [competitor]?"
- "Best alternatives to [dominant competitor] for European SaaS teams."
- "Which [category] vendor is best for a small marketing team?"
Tier 3: Validation. These are buyers checking risk before they talk to sales.
- "Is [your brand] safe for production use?"
- "What do developers actually say about [your brand]?"
- "Has [your brand] had any security incidents?"
- "What's the pricing structure of [your brand]?"
- "What are the biggest complaints about [your brand]?"
Drop the prompts into your sheet. Keep the wording ugly if that's how buyers would ask it. Polished prompts make polished lies.
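If you'd rather seed the sheet with a small script than type 30 rows by hand, a rough sketch like this works. The prompts are just the fintech examples from above, and the file name and tier labels are my own shorthand, so rename to taste.

```python
import csv

# Illustrative prompt set -- swap in the ugly wording your buyers actually use.
PROMPTS = {
    "discovery": [
        "What's the best fintech API for KYC in 2026?",
        "Which fintech APIs are compliant with EU AMLD6?",
    ],
    "comparison": [
        "[Your brand] vs [competitor], which is better for B2B SaaS?",
        "Best alternatives to [dominant competitor] for European SaaS teams.",
    ],
    "validation": [
        "Is [your brand] safe for production use?",
        "What are the biggest complaints about [your brand]?",
    ],
}

with open("prompt_set.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["prompt_id", "tier", "prompt"])
    prompt_id = 1
    for tier, prompts in PROMPTS.items():
        for prompt in prompts:
            writer.writerow([f"{prompt_id:03d}", tier, prompt])
            prompt_id += 1
```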
15–60: Run the prompts
Run each prompt against three models. One run per model per prompt is enough for a first snapshot, not a stable benchmark. If a prompt matters commercially, plan to rerun it over several weeks before treating the number as real.
You'll still get noise. Across 90 responses, the obvious patterns usually show up.
The first time we did this manually, the spreadsheet got ugly fast. That's normal. Don't try to make the system perfect on pass one. You're hunting for the misses that show up again and again.
Use these columns:
| Column | What to capture |
|---|---|
| Prompt ID | 001, 002, 003 |
| Prompt | The exact question you asked |
| Tier | Discovery, comparison, or validation |
| Model | ChatGPT, Perplexity, Gemini, Claude |
| Run number | 1 for the first pass |
| Date | The day you ran it |
| Brand mentioned? | Yes or no |
| Brand rank | 1 if first, 2 if second, 0 if missing |
| Competitors mentioned | Names in the order they appear |
| Citations | URLs, domains, or "none" |
| Sentiment | Positive, neutral, negative, or wrong |
| Wrong facts | Pricing, market, product, location, ICP |
| Action | Page update, listicle outreach, source correction, monitor |
A sample row might look like this:
| Prompt | Model | Brand mentioned? | Brand rank | Competitors mentioned | Citations | Action |
|---|---|---|---|---|---|---|
| Best KYC API for European SaaS companies | Perplexity | No | 0 | Plaid, Stripe Identity, Onfido | G2, vendor docs, fintech blog | Pitch comparison page and third-party listicles |
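If you prefer a plain CSV over a shared sheet, here's a minimal sketch of the same columns. The file name, field names, and the semicolon-separated lists are my own conventions, not a required schema.

```python
import csv
import os
from datetime import date

FIELDS = [
    "prompt_id", "prompt", "tier", "model", "run_number", "date",
    "brand_mentioned", "brand_rank", "competitors_mentioned",
    "citations", "sentiment", "wrong_facts", "action",
]

# One row per prompt-per-model response, mirroring the sample row above.
row = {
    "prompt_id": "001",
    "prompt": "Best KYC API for European SaaS companies",
    "tier": "discovery",
    "model": "Perplexity",
    "run_number": 1,
    "date": date.today().isoformat(),
    "brand_mentioned": "no",
    "brand_rank": 0,
    "competitors_mentioned": "Plaid; Stripe Identity; Onfido",
    "citations": "G2; vendor docs; fintech blog",
    "sentiment": "neutral",
    "wrong_facts": "",
    "action": "Pitch comparison page and third-party listicles",
}

write_header = not os.path.exists("audit_log.csv")
with open("audit_log.csv", "a", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    if write_header:  # first run: write the header row once
        writer.writeheader()
    writer.writerow(row)
```

Append one row per response as you go; 30 prompts across three models gives you 90 rows.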
A few shortcuts from doing this the messy way first:
- Open one browser profile per model.
- Keep your prompts in a scratch doc.
- Set a 15-minute timer per model.
- Paste the full response only when something looks interesting.
- If you're short on time, skip sentiment and come back to it.
The annoying part nobody tells you
The answers will shift. Citations will disappear. One model will give you a strange answer that makes no sense. Another will invent a product detail you corrected on your site six months ago.
Don't panic. Don't turn the first audit into a debate about whether the model was "right."
Mark the weird response, keep moving, and look for repeated patterns. One bad answer is noise. The same missing brand across five buyer-intent prompts is a problem.
60–75: Score and summarize
Calculate these numbers.
Share of answer. Responses where you were named, divided by total responses across all models. If you appear in 18 of 90 responses, your share of answer is 20%.
Share of answer by tier. Start here. This is usually where the real problem shows up. Most B2B SaaS companies show up on validation prompts because the model can find their own site. The trouble starts in discovery and comparison prompts, where third-party sources matter more.
Competitor share of answer. Run the same calculation for your top competitors. A 20% share could be fine if the category leader sits at 25%. It's painful if the leader sits at 70%. The number needs context before it means anything.
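If the runs live in a CSV like the one sketched earlier, the overall, per-tier, and competitor numbers are a few lines of Python. The column names and the semicolon-separated competitors field are the assumptions I made above, so adjust them to match your own sheet.

```python
import csv
from collections import Counter

with open("audit_log.csv", newline="") as f:
    rows = list(csv.DictReader(f))

total = len(rows)  # e.g. 30 prompts x 3 models = 90 responses

# Overall share of answer: responses that name you / all responses.
mentions = sum(1 for r in rows if r["brand_mentioned"] == "yes")
print(f"Share of answer: {mentions / total:.0%}")  # 18 of 90 -> 20%

# Share of answer by tier.
for tier in ("discovery", "comparison", "validation"):
    tier_rows = [r for r in rows if r["tier"] == tier]
    if tier_rows:
        hits = sum(1 for r in tier_rows if r["brand_mentioned"] == "yes")
        print(f"  {tier}: {hits / len(tier_rows):.0%}")

# Competitor share of answer, counted from the competitors column.
competitor_hits = Counter()
for r in rows:
    for name in r["competitors_mentioned"].split(";"):
        if name.strip():
            competitor_hits[name.strip()] += 1
for name, hits in competitor_hits.most_common(5):
    print(f"{name}: {hits / total:.0%}")
```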
Position-weighted score. A first-place mention is worth more than a last-place mention. Use a simple weighting:
- First mention = 3 points
- Middle mention = 2 points
- Last mention = 1 point
- Missing = 0 points
Then divide your score by the maximum possible score. Don't over-engineer it. You're looking for direction, not a peer-reviewed metric.
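Here's one rough way to turn that weighting into a number, reusing the audit-log columns from earlier. It assumes brand rank 1 means first mention, a rank equal to the number of brands in the answer means last, and 0 means missing; that's my reading of the weighting, not a standard formula.

```python
import csv

def mention_points(brand_rank: int, brands_in_answer: int) -> int:
    """First mention = 3, last = 1, anything in between = 2, missing = 0."""
    if brand_rank == 0:
        return 0
    if brand_rank == 1:
        return 3
    if brand_rank >= brands_in_answer:
        return 1
    return 2

with open("audit_log.csv", newline="") as f:
    rows = list(csv.DictReader(f))

points = []
for r in rows:
    competitors = [c for c in r["competitors_mentioned"].split(";") if c.strip()]
    brands = len(competitors) + (1 if r["brand_mentioned"] == "yes" else 0)
    points.append(mention_points(int(r["brand_rank"]), brands))

# Divide by the maximum possible score: 3 points per response.
print(f"Position-weighted score: {sum(points) / (3 * len(points)):.0%}")
```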
75–90: Write the one-pager
Five bullets. No deck. No 14-tab workbook. Use this structure:
- Our share of answer is X% across 30 prompts in 3 models.
- We're strongest on [tier] at Y% and weakest on [tier] at Z%.
- The top-cited competitor is [competitor] at A%.
- We keep missing from [pattern: best-of listicles, comparison prompts, regional queries, implementation questions].
- Our first 30-day move is [one specific bet].
A finished version might read like this:
Our share of answer is 18% across 30 prompts in ChatGPT, Perplexity, and Gemini.
We show up on validation prompts, but we're nearly invisible on category discovery.
Competitor A appears in 52% of responses and gets cited from three third-party listicles we're missing from.
The first 30-day move is to build one comparison page, update our category page for direct answers, and pitch the three listicles that already appear in AI citations.
Ship the one-pager before you close the laptop. The point of the audit is the decision it forces, and you only force it if the number lands in front of someone who can act on it.

[Chart: Position-Weighted Mention Scoring. Points awarded by mention position. Position weighting surfaces whether you're the default answer or an afterthought. Source: LoudFace share-of-answer methodology.]
How to read the results
A few patterns I see in first audits at LoudFace.
"We show up in Tier 3 but not Tier 1"
You have bottom-of-funnel content, but no category presence. Fix it in two places: your category-level pages and the third-party "best of" pages your buyers already trust. The on-site work is faster. The third-party work usually matters more.
"We show up on Perplexity but not ChatGPT"
Perplexity often surfaces fresh-source gaps faster because it shows citations clearly on every answer. ChatGPT can behave differently depending on whether the response leans on live web retrieval or older patterns from how the web described you months ago.
If the gap stays open after months of fresh third-party coverage, the model still doesn't have a clear picture of who you are, what category you belong in, or why you deserve to appear next to bigger names.
"Our position is always last"
You're known, but you're not the default. Look upstream at how your site describes the company. Your web team can check the schema, but the bigger issue is usually the one-liner. If your description sounds like four competitors stitched together, the model has no reason to rank you higher.
"Our competitor is cited three times more than us"
Pull every response where they appear and you don't. The pattern usually reduces to a finite list of listicles, podcasts, reviews, partner pages, or industry publications. Those are your outreach targets for the next 90 days. Stop guessing about citations and start working the list.
"We get cited, but with wrong facts"
The models learned about you from stale sources. Update your own site first. Then correct owned profiles, directories, partner pages, and public listings you can legitimately update. If Wikipedia is relevant, follow its conflict-of-interest rules. Don't treat it like a company profile you can edit into shape.
Common mistakes in a first audit
A few patterns that derail otherwise good audits.
Vanity prompts are the most common one. If you ask "what's the best [your exact category as you describe it]," the model will cite you because the wording came from your homepage. Use the language buyers use, even when it's uglier.
Running a single model is the next trap. Your story across ChatGPT, Perplexity, and Gemini will differ. One model gives you a clue. Three models give you a pattern.
Skipping competitors is the easiest corner to cut on a tight first audit. Your number means little without theirs. Competitor data tells you whether you have a positioning problem, a coverage problem, or both.
Scoring only on yes or no hides too much. A last-place, neutral mention on a comparison prompt is closer to a loss than a win. Track position and sentiment from day one if you can.
The last trap is treating the first audit as a final answer. One run tells you whether you can appear. Repeated runs tell you how stable that appearance is. Start manual, then decide whether the trend deserves automation.
What to do after the audit
The baseline is only useful if you act on it. In the week after your first audit, work through this list.
Fix the five worst high-intent prompts
Pick the prompts with the strongest commercial intent where you didn't appear. Each one needs one of four fixes:
- a new page
- a rewrite of an existing page
- a third-party placement
- a source correction
Your job is to match the fix to the reason you were missing. Don't write a new page when every cited source is a third-party listicle.
Set up a weekly re-run
Same 30 prompts. Same models. Same scoring. The trendline matters more than the snapshot. One bad week means little. Three bad weeks in a row means something changed upstream.
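If every weekly run goes into the same CSV with the date column filled in, the trendline is a simple group-by on ISO week. A minimal sketch, assuming the column names from the audit log above:

```python
import csv
from collections import defaultdict
from datetime import date

weekly = defaultdict(lambda: [0, 0])  # ISO week -> [mentions, responses]

with open("audit_log.csv", newline="") as f:
    for r in csv.DictReader(f):
        year, week, _ = date.fromisoformat(r["date"]).isocalendar()
        key = f"{year}-W{week:02d}"
        weekly[key][1] += 1
        if r["brand_mentioned"] == "yes":
            weekly[key][0] += 1

for week_key in sorted(weekly):
    hits, total = weekly[week_key]
    print(f"{week_key}: {hits}/{total} responses named us ({hits / total:.0%})")
```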
Scale the prompt set after 30 days
Grow to 50 prompts, then 100. Add regions if you sell internationally. Add run depth if leadership wants a more stable number.
Use these as rough planning bands from our own LoudFace audits, not industry benchmarks. Your category may behave very differently:
- Category leader: 60–80%
- Challenger: 25–45%
- New entrant: 10–20% in the first 90 days, 30%+ inside year one
For reference, one public B2B SaaS audit dataset from VisibleIQ reported the average company appearing in around 16% of buying-intent prompts. Treat that as a sanity check, not a target.
If your number is above the band for your stage, you may be underrating your AEO position in sales calls. If it's below, you finally have a measurable problem instead of a vague fear.
Don't automate too early
If your first instinct is to buy a monitoring tool before you've fixed the obvious gaps, you're paying to watch yourself lose. Clean up the first 30 prompts manually. Then automate.
[Chart: Share-of-Answer Planning Bands by Company Stage. These are rough planning bands from LoudFace audits, not industry benchmarks. Your category may behave differently. Source: LoudFace client audits across fintech, payroll, and developer tools categories.]
[Chart: Average Share of Answer for B2B SaaS. A sanity check, not a target: most companies start well below where they want to be. Source: VisibleIQ public B2B SaaS audit dataset.]
Where this method comes from
This playbook draws on three sources. First, manual audits we've run for LoudFace clients across categories like fintech APIs, payroll, and developer tools. Second, the way current AI visibility platforms (Peec AI, Profound, AthenaHQ) structure their measurement around prompts, models, position, and sentiment. Third, public research on AI citations, including Ahrefs' analysis of 26,283 ChatGPT source URLs (which found "best X" lists were the single most prominent page type, and that brands were more likely to be cited through third-party sources than their own domain) and a follow-up Ahrefs study of 863,000 keywords showing that only 38% of AI Overview citations now come from Google's top 10 results, down from 76% a year earlier.
The takeaway from all three: AI visibility is real, measurable, and only loosely tied to traditional rankings. A manual first audit is a fair way to find out where you stand.
Want a second read on your share of answer? Book a discovery call and we'll run your prompt set against your category and send the report back. For broader context on the discipline, see the complete guide to AEO in 2026 or our ranked list of AEO agencies for B2B SaaS.
[Chart: Google AI Overview Citations from Top-10 Organic Results. Share of AI Overview citations from Google's top 10 (%). AI visibility is only loosely tied to traditional rankings, the gap a share-of-answer audit is designed to expose. Source: Ahrefs study of 863,000 keywords.]






