LLMs.txt: the file AIs don’t want you to know about

What if the robots.txt file had a cousin dedicated to generative intelligences? That's exactly the idea behind LLMs.txt, a proposal from Jeremy Howard that could reshape the rules governing AI access to web content. No panic, no euphoria, just an evolution to watch closely.

Key takeaways:

  • LLMs.txt is a file intended to regulate generative AIs' access to web content.
  • It gives publishers a way to specify which sections of their site may or may not be accessed by AI crawlers.
  • Inspired by robots.txt, LLMs.txt is specifically aimed at data collectors used to train language models.
  • Although promising, its adoption by AI players, and their willingness to comply with it, remain to be seen.

LLMs.txt: a new signpost for AIs

Why this file is a game-changer

Search engines have their rules. Since the 1990s, the robots.txt file has allowed websites to indicate what they do, or do not, want indexed. It's simple, effective, and a bit old-fashioned. But generative AIs like ChatGPT or Claude? They don't necessarily follow the same conventions.

The LLMs.txt file aims to fill that gap. In short, it would give publishers a way to say: “You can read this, but not that.” Or even: “Do not touch anything.” A kind of digital courtesy agreement, tailored for AI models.

A robots.txt for the LLM era?

The comparison is tempting, but not entirely accurate. Where robots.txt is (more or less) respected by Googlebot and the like, LLMs.txt is addressed directly to AI crawlers, because the datasets used to train language models are different. We're talking about Common Crawl, LAION, or even the data collectors used by OpenAI or Anthropic.

So what does it actually look like?

A simple but effective syntax

The LLMs.txt file would be placed at a site's root, just like its predecessor. Inside, instructions readable by AI crawlers: general information, guidance, and links to detailed Markdown files. Here is a fictional example proposed in Jeremy Howard's documentation:

# Title

> Optional description appears here

Optional details are provided here

## Section name

- [Link title](https://link_url): Optional details about the link

## Optional

- [Link title](https://link_url)

Another example, on Anthropic's website, shows what it looks like in real life.
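To make the structure more concrete, here is a minimal Python sketch of how a tool might read such a file. It is only an illustration: the function names `llms_txt_url` and `parse_llms_txt`, the regular expression, and the shape of the returned dictionary are assumptions made for this example, not part of the proposal, and the sketch only handles the simple layout shown in the sample (hyphen-prefixed links under `##` headings).

```python
import re
from urllib.parse import urljoin

# Matches "- [Link title](https://link_url): Optional details" (details optional).
# This pattern is an assumption covering only the simple layout shown above.
LINK_RE = re.compile(
    r"^-\s*\[(?P<title>[^\]]+)\]\((?P<url>[^)\s]+)\)(?::\s*(?P<notes>.*))?$"
)


def llms_txt_url(site_url):
    """Return the location of the file at the site's root."""
    return urljoin(site_url, "/llms.txt")


def parse_llms_txt(text):
    """Split the file into a title, a summary, and named sections of links."""
    doc = {"title": None, "summary": None, "sections": {}}
    current = None  # name of the section being read, if any
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("## "):
            current = line[3:].strip()
            doc["sections"][current] = []
        elif line.startswith("# ") and doc["title"] is None:
            doc["title"] = line[2:].strip()
        elif line.startswith("> ") and doc["summary"] is None:
            doc["summary"] = line[2:].strip()
        elif current is not None:
            match = LINK_RE.match(line)
            if match:
                doc["sections"][current].append({
                    "title": match["title"],
                    "url": match["url"],
                    "notes": match["notes"] or "",
                })
    return doc


SAMPLE = """# Title

> Optional description appears here

Optional details are provided here

## Section name

- [Link title](https://link_url): Optional details about the link

## Optional

- [Link title](https://link_url)
"""

if __name__ == "__main__":
    parsed = parse_llms_txt(SAMPLE)
    print(parsed["title"], list(parsed["sections"]))
```

Free-form paragraphs such as "Optional details are provided here" are simply skipped in this sketch; a fuller reader would probably want to keep them.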

It's clear, readable, and potentially very useful. But nothing is mandatory at this stage: adoption is entirely voluntary.

This is where things get murkier. This file does not yet have a solid legal status. It's a standard proposed by the tech community (notably via Hugging Face), but compliance will depend on the goodwill of AI players.

So yes, on paper it's appealing. But we've seen how that went with robots.txt: not everyone plays by the rules.
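To see why compliance comes down to goodwill, it helps to look at what "playing by the rules" means on the crawler's side. The sketch below uses Python's standard `urllib.robotparser` module with robots.txt, since that is the convention Python ships a parser for. The rules in `ROBOTS_TXT`, the `example.com` URLs, and the helper name `may_fetch` are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block one AI crawler entirely and keep /private/ off
# limits for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt, user_agent, url):
    """Return True if the given rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("GPTBot", "Googlebot"):
        allowed = may_fetch(ROBOTS_TXT, agent, "https://example.com/blog/post")
        print(agent, "allowed" if allowed else "blocked")
```

Nothing in this check is enforced by the website: a crawler that never calls something like `may_fetch` before downloading a page simply ignores the file. The same would presumably hold for any instruction published in LLMs.txt.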

Towards a new digital social contract?

Who has the right to read what?

This is a big question right now. Publishers are worried. Having their content scraped, digested, remixed without permission — sometimes even without credit — doesn't sit well. And we understand them.

With LLMs.txt, the idea would be to rebalance the scales. Give creators a bit more control. A minimum of consent in an ecosystem that's often too voracious.

Questions without answers (for now)

We're still at the early stages. Who will actually respect this protocol? Will it need to be backed by a legal framework? Will governments follow suit? And above all: how can you verify that your content hasn't been absorbed by a model despite your instructions?

Nothing is decided. But the initiative at least has the merit of laying the groundwork.

Why you should care (at least a little)

Even if you're not a lawyer, developer, or publisher, this topic concerns you. Because it touches a sensitive point: the value of what we publish. On a blog, a newsletter, or an e-commerce site, your words have value. And these files might be the first building blocks of a form of digital respect.

Some things to watch

  • Upcoming protocol updates
  • Positions of the web giants (Google, Meta, OpenAI…)
  • How CMSs like WordPress will incorporate this logic

And honestly, who wants to see their content feed AIs without even a "thank you" in return?

One last thing: it’s going to move fast

There’s no need to rebuild your entire site today. But keeping an eye on the topic isn’t overkill. As with much in the digital world, things progress quietly… then suddenly shift.

LLMs.txt isn’t a magic wand. More a signal. A gentle alert. And maybe the beginning of a fairer relationship between AIs and those who feed the Internet every day — you, us, everyone who writes, shares, and creates.

The article “LLMs.txt: the file AIs don’t want you to know about” was published on the site Abondance.