
Behind the scenes of ChatGPT Search: reverse engineering web.run, fan-outs, and the new visibility rules

When OpenAI changed the default model on March 4, the number of websites cited per answer dropped by one-fifth and never recovered. But this decline is only part of the story. We also reverse-engineered ChatGPT’s internal browsing tools, ran a honeypot-style experiment, reconstructed its system prompt, and released a new version of our ChatGPT Search Capture plugin.

What happened

On March 4, 2026, ChatGPT switched its default model from GPT-4o/5.2 to GPT-5.3 Instant. Result: the average number of unique domains cited per answer fell from 19.1 to 15.2, a drop of more than 20%. The number of unique URLs per answer followed the same trend, going from 24.1 to 19.1.

We tracked 400 daily prompts over 14 weeks, relying on monitoring data provided by Meteoria. All results are published as an eight-part interactive study on think.resoneo.com/chatgpt/5.3-5.4/.

Why it matters

ChatGPT has 900 million weekly active users. The citation surface in each answer hasn't changed, but fewer websites now share it: the same cake, cut into fewer slices. This likely reflects a structural shift toward higher-authority sources, but it also means fewer winners overall. Sites that don't make the cut lose visibility they previously had.

The Bigfoot Effect

We named this phenomenon in reference to the “Bigfoot update” identified by Dr. Pete (from Moz) in 2013, when Google sometimes let a single domain occupy the entire first page…

ChatGPT now retrieves fewer domains per answer, but the URLs-per-domain ratio has remained stable at 1.26. Crawl depth per domain hasn’t changed. What changed is the number of distinct websites that get a seat at the table.

GPT-5.4 Thinking intensifies this concentration further. The model uses site: operators to restrict searches to trusted domains and often distributes each answer across more than 10 "fan-out queries", each targeting a specific source.
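
As an illustration, a batch of 5.4 fan-out queries might look like the following. This is a reconstruction based on the web.run syntax documented below, and the queries and domains themselves are hypothetical:

{
  "search_query": [
    { "q": "site:theverge.com best 3d printers 2026" },
    { "q": "site:tomshardware.com 3d printer reviews" },
    { "q": "site:reddit.com 3d printer recommendations" }
  ],
  "response_length": "short"
}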

Independent log analysis by Jérôme Salomon (Oncrawl) confirms this trend. Crawl volume from the ChatGPT-User bot has stabilized at a lower level since the move to 5.3. Some pages are simply no longer crawled. The cause goes beyond model updates: over 90% of ChatGPT’s weekly users are on a free plan, and the default experience triggers fewer web searches, uses fewer queries, and produces fewer citations.

How ChatGPT Search Really Works

Our study also presents a complete reverse engineering of ChatGPT's internal search system, named web.run. Before 5.3, the model sent compact textual commands with pipe-separated fields of the form fast|query|recency. After 5.3, it sends structured JSON objects with typed parameters. This is not a simple format change: it reflects a different architecture in how the model formulates and distributes its web operations.
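
To make the shift concrete, here is the same hypothetical request in both syntaxes. The pipe fields follow the fast|query|recency pattern we captured; the recency parameter name in the JSON form is our reconstruction.

Before 5.3, one compact command:

fast|best 3d printer 2026|30

After 5.3, a typed JSON object:

{
  "search_query": [
    { "q": "best 3d printer 2026", "recency": 30 }
  ],
  "response_length": "short"
}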

The web tool now supports 12 operations, up from 4 previously (plus a separate widget system, "genui"). Notably, it includes search_query, open, find, click, screenshot, product_query, and specialized widgets for sports, finance, weather, etc. GPT-5.4 can chain 5 to more than 10 search rounds per response, refining its queries based on previous results; GPT-5.3 Instant typically settles for 2 or 3.
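
As an example of the newer operations, a find call can locate a string inside a previously opened page. The ref_id convention matches the open commands shown later in this article; the pattern parameter name is our reconstruction and may differ from the internal schema:

{
  "find": [
    { "ref_id": "turn0open0", "pattern": "print speed" }
  ]
}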

Google's traces remain visible: Google tracking parameters (srsltid) appear in product URLs, and SearchAPI's ID-to-token mappings reveal the backend's reliance on third-party search providers, with Google behind them.

A New Type of Fan-Out for Product Queries

We uncovered a previously undocumented type of fan-out: "Browse rewritten queries". It appears exclusively on product queries, on 5.4 Instant, and is visible in the conversation's underlying code.

When a user asks a question like "best 3D printer to buy in 2026", ChatGPT starts by launching a single rewrite fan-out to build the complete list of candidate products. Then it launches a separate shopping fan-out for each individual product, retrieving specifications, reviews and prices one by one. Before 5.3, product searches were grouped into a single call. Each product now benefits from its own dedicated retrieval command.
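
Reconstructed from captured conversations, the shopping stage therefore looks like a series of product_query commands, one per candidate, sent one after another (the product names are illustrative):

{
  "product_query": [
    { "q": "Bambu Lab P1S" }
  ]
}

{
  "product_query": [
    { "q": "Prusa MK4S" }
  ]
}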

ChatGPT-User Is the Retrieval Agent

Our honeypot experiment confirmed an important detail: when ChatGPT browses the web following a search during a conversation, it is the ChatGPT-User crawler, not OAI-SearchBot, that fetches page content. OpenAI describes OAI-SearchBot as the agent that builds ChatGPT's search index, but in practice the model relies on third-party scraping APIs to obtain search results, then sends ChatGPT-User to retrieve the actual content of the selected URLs.
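
You can check this on your own server. Here is a minimal sketch that counts hits per OpenAI crawler, assuming a standard access log in which the user-agent string appears on each line (the log path is a placeholder):

from collections import Counter

# OpenAI's documented crawler user-agent tokens.
BOTS = ("ChatGPT-User", "OAI-SearchBot", "GPTBot")

counts = Counter()
with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
    for line in log:
        for bot in BOTS:
            if bot in line:  # the token appears in the User-Agent field
                counts[bot] += 1

# If the finding above holds, ChatGPT-User should dominate on pages
# fetched during live conversations.
for bot, n in counts.most_common():
    print(f"{bot}: {n}")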

Namespaces: ChatGPT's Blind Spot

This is perhaps our most surprising discovery.

The trail began with classic reverse engineering. We decompiled the ChatGPT mobile app, dissected the web client source code and sniffed network packets on both platforms. That gave us the names of internal tools and some calling conventions. Armed with these specifics, we could ask ChatGPT the right questions, and we discovered that the model answered them without any restriction.

OpenAI has put real protections around its system prompts, but the internal tool configuration layer has none. ChatGPT's namespaces, the groups of internal tools the model can call during a conversation, are freely describable. As long as you avoid the words "system prompt", the model will disclose tool schemas, lists of operations, output channels and namespace structures with perfect consistency.

We published ready-to-use prompts that anyone can paste into ChatGPT to audit its internal environment. To verify that the model was not hallucinating these descriptions, we conducted a participatory study with dozens of users in separate sessions. Each participant received exactly the same tool names, the same parameter schemas, and the same lists of operations: the model describes its own tooling consistently across independent sessions, which makes these descriptions reliable.

The study also includes a system prompt reconstructed by progressive extraction, accompanied by several notable pieces of information: Reddit is the only domain exempt from copyright-related word limits, there is a granular list of banned products, a “verbosity score” operates on a scale from 1 to 10, and a full advertising policy paragraph governs ad display by subscription level.

Practical Use: Conduct Your Own Crawlability Audit

The web.run syntax we documented is not just a technical curiosity. It works, and it opens a direct way to test how ChatGPT interacts with your content.

Here is a concrete example. You can force ChatGPT to search your domain and read specific pages by pasting JSON commands directly into a conversation.
First, trigger a targeted search on your site; then force it to fetch the first two retrieved results; finally, ask it to return the title, the main topic, and the key points of each page.

The plain-language instruction that accompanies the commands is: "Search for this query, then open the first two results and summarize what you find on each page."

Step 1: Search:

{
  "search_query": [
    { "q": "site:abondance.com seo" }
  ],
  "response_length": "short"
}

Step 2: Open the first two results:

{
  "open": [
    { "ref_id": "turn0search0" },
    { "ref_id": "turn0search1" }
  ]
}

Step 3: Ask for the recap in plain language: "Give me a structured recap of what you found on each URL. For each page: the title, the main topic, and 3-5 key points."

What you get is a view of your content through ChatGPT’s eyes: what it can actually reach, what it extracts from it, and how it interprets your pages. If it can’t access a page, returns confused content, or completely misses your main messages, that’s a signal to act on.
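
If a page turns out to be unreachable, the first thing to check is your robots.txt. The user-agent tokens below are the ones OpenAI documents for its agents; a minimal configuration that lets both of them through looks like this (adapt the rules to your own site):

User-agent: OAI-SearchBot
Allow: /

User-agent: ChatGPT-User
Allow: /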

To go further, RESONEO's Chrome extension "ChatGPT Search Capture" (V3.3, free on the Chrome Web Store) lets you see the exact URLs retrieved during any ChatGPT conversation, including fan-out queries (except on 5.3 Instant, where fan-outs now run server-side and are invisible to the extension), ref_ids, and model metadata. Combined with the manual JSON commands above, you get a lightweight but actionable extractability audit: which URLs it surfaces, and what it actually extracts from them.

Same Model Family, Different Citations

GPT-5.2, 5.3 and 5.4 share the same cutoff date (August 2025) and belong to the same GPT-5 family. Yet the same prompt sent to each produces different fan-out queries, retrieves different sources, and surfaces different passages in the final response.

Several layers of divergence act after pretraining: RLHF reward shaping, supervised fine-tuning data, system prompt configurations, and inference compute budgets. GPT-5.4 Pro explicitly receives more compute to “think more intensely,” and that alone can change which sources are cited.

That is why we recommend testing model by model. A single prompt can produce radically different citations depending on whether the user is on GPT-5.3 Instant, 5.4 Thinking, or 5.4 Extended. Free-plan users may also be silently routed to a slimmed-down model.
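
One way to automate this model-by-model testing is to send the same prompt to each model through the API and compare the cited URLs. The sketch below assumes the OpenAI Responses API with its web search tool and url_citation annotations; the model identifiers are placeholders for whatever your account exposes:

from openai import OpenAI

client = OpenAI()
PROMPT = "best 3D printer to buy in 2026"
MODELS = ["gpt-5.3-instant", "gpt-5.4-thinking"]  # placeholder IDs

for model in MODELS:
    resp = client.responses.create(
        model=model,
        tools=[{"type": "web_search"}],  # tool type name may vary by API version
        input=PROMPT,
    )
    # Collect the URLs cited in the answer's url_citation annotations.
    urls = set()
    for item in resp.output:
        if item.type != "message":
            continue
        for block in item.content:
            for ann in getattr(block, "annotations", None) or []:
                if ann.type == "url_citation":
                    urls.add(ann.url)
    print(model, len(urls), sorted(urls)[:5])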

Two Types of AI Visibility

Our study presents an analysis framework separating parametric visibility (what the model knows from its training data, with search disabled) from dynamic visibility (what it retrieves in real time, with search enabled).

Parametric Visibility: The E-E-A-T of LLMs

Parametric visibility is the E-E-A-T equivalent for large language models. It is the authority encoded across billions of training examples, shaped by press coverage, presence on Wikipedia and other major authoritative sites, and the training corpus as a whole. It is stable and measurable via one-shot audits through an API.
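
In practice, a parametric audit boils down to asking the model what it knows with browsing disabled and repeating the run to average out sampling noise. A minimal sketch, in which the brand, the prompt and the model ID are placeholders; with no tools declared, the answer can only come from parametric memory:

from openai import OpenAI

client = OpenAI()
BRAND = "Acme Printers"  # placeholder brand
PROMPT = "Which brands make the best 3D printers? Answer from memory only."
RUNS = 10

mentions = 0
for _ in range(RUNS):
    resp = client.chat.completions.create(
        model="gpt-5.3-instant",  # placeholder model ID
        messages=[{"role": "user", "content": PROMPT}],
    )
    text = resp.choices[0].message.content or ""
    if BRAND.lower() in text.lower():
        mentions += 1

# The mention rate across runs approximates the brand's parametric visibility.
print(f"{BRAND}: mentioned in {mentions}/{RUNS} runs")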

Dynamic Visibility: Shifting Ground

Dynamic visibility, by contrast, is volatile. It depends on the model and requires continuous monitoring. It is closer to traditional SEO and can collapse overnight with a model update, as illustrated by the Bigfoot effect.

The Link Between the Two

The link between the two matters. The model formulates its web requests by targeting sources it already knows. A brand absent from parametric memory will not even be considered as a search candidate. Being unknown to the model means being invisible before the search even begins.

Cutoff date updates are the LLMs’ equivalent of the "Google Dance." When the cutoff date changes, parametric rankings are redistributed en masse. But this happens only about once a year, because retraining at that scale is extremely costly. The strategic window to influence what the model knows about your brand lies between two cutoff dates.

Dan Petrovic’s AI Brand Authority Index (DEJAN) illustrates large-scale parametric measurement. Our study complements it with a lighter, reproducible testing framework based on five prompts executed multiple times for a one-shot audit.

Further Reading

The full study (reverse-engineered documentation, honeypot experiments, DIY audit prompts and reconstructed system prompt) is available at think.resoneo.com/chatgpt/5.3-5.4/.

In Summary

ChatGPT Search is no longer a black box. This study maps its internal architecture, from the web.run tool that powers each search to the fan-out logic that decides which domains are retrieved and which are ignored.

The 20% drop in cited domains after the move to 5.3 shows how quickly the citation landscape can shift with a single model update. But the underlying problem is structural: ChatGPT concentrates its citations on fewer websites and applies a source-selection logic shaped by training data, post-training fine-tuning and system-prompt rules that vary from one model to another.

Tracking visibility in ChatGPT means understanding two distinct layers (parametric and dynamic), testing across multiple models, and monitoring a system whose internal tools can be documented but whose behavior can change overnight.

The full study provides the data, methodology and tools to get started.

The article "Behind the scenes of ChatGPT Search: reverse engineering web.run, fan-outs, and the new visibility rules" was published on the site Abondance.