We often assume that when Google's AI (via Gemini) answers a query, it “reads” and understands the entirety of the web pages it consults. That’s false. In reality, a large portion of your content is ruthlessly filtered out before it even reaches the language model. According to the latest analyses by Dan Petrovic, an AI SEO expert, Google uses a radical “extractive summarization” process that retains, on average, only one third of your text. Understanding this mechanism is now vital if you hope to appear in the AI’s answers.
Key takeaways:
- The filter is drastic: On average, only 32.16% of a page's content survives the "grounding" process and is submitted to the AI.
- Relevance beats marketing: Factual information (prices, specifications, processes) passes the filter, while marketing fluff, navigation, and legal notices are removed.
- The compression effect: The more sources Google consults for an answer (the number of snippets N), the more it shortens the text taken from each site.
The 5 secret steps of the generation process
To grasp the issue, you need to understand what happens between the moment the user types their query and the moment the answer appears. It’s not a linear reading, but a five-step mechanical process:
- User prompt: The user asks their question.
- The fan-out: Google launches several parallel queries (fan-out queries) to cover the topic.
- The slicing (Grounding): This is the critical step. The system generates shortened, cleaned versions of source pages. This is where content is filtered.
- Sending to the model: These fragments (snippets), and nothing else, are sent to the model as context.
- Generation: The model drafts the final response and adds the citations.
The central point is step 3. If your key information isn’t in that “grounding snippet”, it simply doesn’t exist for the AI.
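To make step 3 more concrete, here is a minimal sketch of what a query-driven extractive filter could look like. It is a toy illustration only: Google's actual scoring is not public, and the function names (`split_sentences`, `tokenize`, `extract_snippet`), the word-overlap score and the `keep_ratio` default of one third are all assumptions made for this example.

```python
import re


def split_sentences(text: str) -> list[str]:
    """Naive splitter: cut after '.', '!' or '?' when followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text: str) -> set[str]:
    """Lowercased set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def extract_snippet(page_text: str, query: str, keep_ratio: float = 0.33) -> list[str]:
    """Keep roughly keep_ratio of the sentences, favoring overlap with the query.

    Assumption: relevance is the number of query terms a sentence contains.
    This is a stand-in for whatever the real system uses.
    """
    sentences = split_sentences(page_text)
    if not sentences:
        return []
    query_terms = tokenize(query)
    scored = [(len(tokenize(s) & query_terms), i) for i, s in enumerate(sentences)]
    n_keep = max(1, round(len(sentences) * keep_ratio))
    # Highest score first; ties go to the earliest sentence on the page.
    top = sorted(scored, key=lambda t: (-t[0], t[1]))[:n_keep]
    # Return the survivors in their original page order.
    return [sentences[i] for i in sorted(i for _, i in top)]


page = (
    "We make custom running jerseys. Up to 50% off today! "
    "Order running jerseys in the 3D configurator. Copyright 2024 Example Ltd. "
    "We also sell hockey sticks. Contact us for help."
)
snippet = extract_snippet(page, "custom running jerseys")
print(snippet)
```

On this made-up page, only the two sentences that mention the queried product survive; the promotion, the copyright line and the off-topic hockey sentence are dropped, which mirrors the behavior described above.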
The anatomy of survival: what remains and what disappears
Comparative analysis across several sites (like Owayo or OlikSport) shows huge disparities. Some sites have nearly 65% of their content preserved, others barely 20%.
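The retention percentages quoted here can be approximated with a simple ratio. The sketch below assumes coverage is measured in whitespace-separated words; the study may count characters or tokens instead, and `coverage_percent` is a hypothetical helper, not part of any tool mentioned in this article.

```python
def coverage_percent(page_text: str, snippet_text: str) -> float:
    """Share of the page's words that survive in the snippet, as a percentage.

    Assumption: length is counted in whitespace-separated words.
    """
    page_words = page_text.split()
    if not page_words:
        return 0.0
    return 100.0 * len(snippet_text.split()) / len(page_words)


# Made-up example: a 10-word page of which 3 words are kept.
page = "one two three four five six seven eight nine ten"
kept = "two five nine"
print(f"{coverage_percent(page, kept):.2f}%")
```

Running the same calculation on your own pages, with the snippet text produced by a grounding tool, gives a rough idea of where you sit between the lowest and highest retention rates observed.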
'Green' content: what AI keeps
To maximize your chances, your content must provide a dense, factual answer. The elements that systematically survive the filter are:
- The main offer: The concrete description of what you sell or offer (e.g., 'manufacture of custom running jerseys').
- Customization options: Details about colors, designs, adding logos or text.
- Processes: The 'step-by-step' approach (e.g., how to use the 3D configurator, how to order).
- Numerical data: Prices, product technical specifications, precise lead times.
- Customer support: Help, contact, or specific FAQ mentions.
'Red' content: what AI removes
Conversely, the algorithm removes everything it deems to be “noise.” The following are systematically excluded:
- Navigation and structure: Menus, footers, and generic section titles.
- Empty promotional copy: Phrases like "Up to 50% off" or purely marketing, non-descriptive slogans.
- Off-topic categories: If the user searches for "running clothes", the AI will remove paragraphs about football or hockey jerseys that are on the same page.
- Verbatim customer reviews: The AI tends not to reproduce customers' exact quotes ("5/5, great product"), preferring to summarize the overall sentiment.
- Legal: Copyright notices, business addresses and terms and conditions.
The theory of compression: more sources, less text
A fascinating finding of this study concerns the engine’s “compression” behavior. There is a mathematical relationship between the number of sources used (N) and the average length of each excerpt (L).
The data show that when Google increases the number of sources used to build its answer (going from 4 to 10 results, for example), it reduces the average length of each excerpt. It follows a power law (with an exponent beta ≈ 0.07).
In simple terms: the AI has an “attention budget.” If it must consult 10 sites to answer, it will read less text on your site than if it consulted only 3. It “compresses” the information to fit it into its context window. That means that the more competitive a topic is and the more sources it requires, the more concise and dense your content needs to be to stand a chance of being selected.
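To get a feel for the size of this effect, here is a small sketch under a loudly stated assumption: that the power law takes the form L(N) = c · N^(−β), where N is the number of sources, L the average excerpt length and β ≈ 0.07. The article only gives the exponent, so the exact functional form, the sign convention and the helper name `relative_snippet_length` are assumptions for illustration.

```python
def relative_snippet_length(n_sources: int, n_reference: int, beta: float = 0.07) -> float:
    """Ratio L(n_sources) / L(n_reference), assuming L(N) = c * N ** (-beta).

    The constant c cancels out, so only the ratio of source counts matters.
    """
    if n_sources < 1 or n_reference < 1:
        raise ValueError("source counts must be at least 1")
    return (n_sources / n_reference) ** (-beta)


# Going from 4 to 10 sources, as in the example above.
ratio = relative_snippet_length(10, 4)
print(f"Each excerpt keeps about {ratio:.1%} of its length at 4 sources")
```

Under these assumptions, moving from 4 to 10 sources shortens each excerpt by roughly 6%. The per-source squeeze is modest, but it compounds with the initial filter, so every sentence that survives has to earn its place.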
Optimizing for the "Grounding"
The main takeaway from this work for SEO is that optimization no longer depends solely on keywords, but also on information density, because Google performs an "extractive summary".
Your goal is not for the AI to "read" your page, but for the right parts of your page to land in the context snippet. As the OlikSport example (64.79% coverage) versus Gobik (20.97%) shows, the structure of information plays a decisive role. Content structured around user questions, free of marketing fluff and rich in factual data, is statistically more likely to pass the "Grounding" filter.
Discover the Grounding Snippet Generator tool by Dan Petrovic
The article “AI Search: why Google ignores 70% of your content (and how to survive the filtering)” was published on the site Abondance.