Analysis & Opinion

Why Your GEO Score Is Wrong (And What Block-Level Scoring Fixes)

Published: 26 March 2026 | Author: Cited By AI® | Reading time: 8 min
Originally published on LinkedIn Pulse. This is the canonical version.
Version 1.0 | Last verified: 26 March 2026 | Source: citedbyai.info AI Visibility Intelligence

Most GEO dashboards give you a number. Some give you a trend line. A few give you a sentiment breakdown. None of them tell you which paragraph got cited - or why the one next to it didn't.

That's the gap. And it's not a minor oversight in how AI visibility tools are built. It's a fundamental misunderstanding of how AI retrieval actually works.

The unit of AI retrieval is not your page

When a large language model generates an answer that cites your content, it doesn't read your page the way a human does. It doesn't absorb the narrative arc, pick up the thesis in paragraph one, and carry it through to your conclusion.

It retrieves chunks.

Specifically: RAG (Retrieval-Augmented Generation) systems - the architecture behind Perplexity, ChatGPT's web search mode, Gemini, and most AI search surfaces you care about - break web content into discrete text segments before they ever generate a response. Each segment gets embedded, scored for relevance, and either pulled into the answer or discarded.

The operative chunk length, based on how most production RAG pipelines are tuned and validated against real citation behaviour, sits in the 134–167 word range. That's one strong paragraph, three tight ones, or a complete standalone answer with a fact and a conclusion.

Your page might have twelve of these chunks. Four might be strong. Eight might be invisible to AI retrieval - not because your content is bad, but because those sections aren't structured to survive the chunking process.
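To make the chunking step concrete, here is a deliberately naive sketch in Python. Real pipelines split on document structure, overlap their windows, and embed each segment; the fixed 150-word window below is an illustrative assumption, not any vendor's actual splitter.

```python
def chunk_words(text: str, max_words: int = 150) -> list[str]:
    """Naive illustration: slice text into segments of up to ~150 words.
    Production RAG splitters are structure-aware and overlap chunks;
    this only shows that retrieval operates on segments, not pages."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]

page = ("word " * 1200).strip()  # stand-in for a 1,200-word service page
chunks = chunk_words(page)
print(len(chunks))  # 8 retrieval units, each scored on its own merits
```

A 1,200-word page becomes eight retrieval units. Each one is embedded and scored independently, which is why a single strong thesis paragraph cannot carry the blocks around it.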

The core problem: A page with a Citation Probability Score® (CPS®) of 68 might contain three blocks scoring above 80 and five blocks scoring below 30. The high-scoring ones get cited. The low-scoring ones get ignored. Your dashboard shows you a 68 and tells you things are fine.
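To see how an aggregate hides that split, take hypothetical per-block scores matching the scenario above (three blocks above 80, five below 30). The exact blend a dashboard uses is unknown; it presumably mixes in page-level signals. But any single summary number flattens the distribution that retrieval actually acts on:

```python
# Hypothetical per-block CPS® values: three strong blocks, five weak ones.
block_scores = [84, 82, 81, 29, 27, 26, 24, 22]

mean = sum(block_scores) / len(block_scores)
cited = [s for s in block_scores if s >= 80]   # what retrieval pulls
ignored = [s for s in block_scores if s < 30]  # what retrieval discards

print(round(mean, 1))            # 46.9 — one number, no hint of the split
print(len(cited), len(ignored))  # 3 5
```

Retrieval behaves like a per-block selection, not an average: the three strong blocks get cited whether the page summary reads 47 or 68.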

What page-level scoring actually measures

To be fair to the tools that report page-level scores: they're measuring something real. Domain authority signals, structured data presence, crawlability, freshness - these matter. Getting the technical foundations right is a precondition for citation, not an afterthought.

But page-level scoring is a blunt instrument when the question you're trying to answer is: which parts of my content is AI actually using?

Here's a concrete example. Say you're a B2B SaaS company. You have a 1,200-word service page covering what you do, who you serve, your pricing model, and a case study. A user asks Perplexity: "Which B2B SaaS companies offer transparent usage-based pricing?"

Your page is crawlable. It has schema. It's indexed. Your GEO score is respectable. But the section about pricing is 90 words buried between a header and a CTA. It doesn't open with a direct answer. It references a table that's image-rendered and therefore invisible to the retrieval system. The fact density - named figures, specific percentages - is low.

Perplexity retrieves the chunk. It scores low for the query. Your competitor's shorter, denser, more direct answer gets pulled instead. Your GEO dashboard doesn't show you this. It shows you the page score. You don't know the gap is there.

What the five block-level pillars actually capture

The Citation Probability Score® evaluates each content block - each retrievable chunk - across five pillars.

These aren't abstract criteria. Each one maps to a specific, observable behaviour in how RAG pipelines score and select content. And each one only makes sense measured at the chunk level — because that's where the retrieval decision actually happens.

The specificity gap in practice

The Fact Density pillar is worth dwelling on because the gap between high-scoring and low-scoring content is most visible here.

Low CPS® (vague): "Pricing varies by usage and can be customised for enterprise clients." No verifiable signal. AI retrieval skips it.

High CPS® (specific): "Pricing starts at $0.008 per API call, with volume discounts applied above 500,000 requests monthly." Two verifiable signals. Same information. Cited.

Both may rank on Google. Only the second gets cited by ChatGPT. The difference isn't quality — it's specificity. Adding one named statistic per paragraph is the fastest Fact Density improvement available to most content teams.
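A rough version of that specificity check can be automated. The regex below is a hypothetical heuristic that counts prices, percentages, and bare figures; the real Fact Density pillar is not public, so treat this as an illustration of the idea only:

```python
import re

# Hypothetical heuristic: count concrete, verifiable signals in a block
# (currency amounts, percentages, named figures). Not the real CPS® formula.
SIGNAL = re.compile(r"[$£€]\s?\d[\d,.]*|\d[\d,.]*\s?%|\b\d[\d,.]*\b")

def fact_signals(block: str) -> int:
    return len(SIGNAL.findall(block))

vague = "Pricing varies by usage and can be customised for enterprise clients."
dense = ("Pricing starts at $0.008 per API call, with volume discounts "
         "applied above 500,000 requests monthly.")

print(fact_signals(vague))  # 0
print(fact_signals(dense))  # 2
```

Counting named figures per paragraph gives a content team a sortable backlog in seconds, without any model calls.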

The practical consequence

If you're using a GEO tool that scores your pages, you're optimising for the wrong unit. You might be improving overall page authority while the specific blocks matched to high-value queries stay broken.

This creates a pattern that's hard to diagnose without block-level data: your AI visibility metrics look reasonable, but your citation rate in competitive queries stays flat. You're not missing because you're invisible. You're missing because the wrong paragraphs are doing the work.

The fix isn't a complete content rewrite. Often the lift comes from restructuring three or four underperforming blocks per page: leading with a direct answer, tightening the word count into the optimal range, adding one specific statistic, cutting the cross-references that make the block context-dependent. Those changes don't move your page-level GEO score much. They move your actual citation rate significantly.

Why this matters more as AI search matures

Right now, most brands are optimising for presence — appearing in AI-generated responses at all. That's the right first step. But the category is maturing fast.

Perplexity already shows named citations with source attribution. ChatGPT's web search mode selects sources at the passage level. Google's AI Mode pulls specific excerpts, not whole pages. The precision of retrieval is increasing, which means the margin between a block that gets cited and one that doesn't is narrowing.

Page-level optimisation gets you into the game. Block-level optimisation determines whether you win the specific query that matters — the one a buyer is asking at the moment they're deciding between you and a competitor.

The brands building block-level granularity into their content now are the ones who'll own specific query clusters in AI responses twelve months from now. The ones relying on page-level scores will know their overall GEO health and not much else.

What to do with this

You don't need to rebuild your entire content library. Start with your highest-value pages — the ones that should be cited when someone asks a purchase-intent query in your category. Run each one through a block-level audit. Find the chunks that are underperforming. Fix the structure, density, and self-containment issues in those specific blocks.

Then measure citation rate on those queries, not page authority.

That's the feedback loop that actually tells you whether your GEO work is doing anything. Not a dashboard number. Not a trend line. A specific query, a specific block, a specific citation.

The test: Pick your three most important purchase-intent queries. Ask Perplexity each one. Note which paragraph on your site it cites — if it cites you at all. That paragraph is your highest-performing block. The ones it skips are your audit backlog.
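The audit loop described above can be scripted as a first pass. The 134–167 word window comes from earlier in the article; the single-digit check for a named statistic, and everything else here, is a simplifying assumption rather than the real CPS® evaluation:

```python
import re

NUMBER = re.compile(r"\d")

def audit_blocks(blocks: list[str],
                 lo: int = 134, hi: int = 167) -> list[dict]:
    """Flag blocks that miss the word-count window from the article
    or carry no named figure. Rough triage, not a real CPS® score."""
    report = []
    for i, block in enumerate(blocks):
        words = len(block.split())
        report.append({
            "block": i,
            "words": words,
            "in_window": lo <= words <= hi,
            "has_statistic": bool(NUMBER.search(block)),
        })
    return report

blocks = [
    "Pricing starts at $0.008 per API call. " * 4,   # short but specific
    "We help teams move faster and do more. " * 20,  # on-length but vague
]
for row in audit_blocks(blocks):
    print(row)
```

Blocks that fail either check are candidates for the restructuring described above: lead with the answer, bring the length into range, add one statistic.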

Get a block-level CPS® audit

Free instant check at citedbyai.info. Full audits from £49.

Get Your Free Audit →