Joost de Valk, creator of the Yoast SEO plugin for WordPress, has turned the “what should a good website do?” question into The Website Specification: a platform-agnostic checklist that puts HTML basics, SEO, accessibility, security, performance, privacy, internationalization, and agent readiness in one place.
The useful shift is that the AI-facing work is treated as normal website hygiene. Not a separate “AI strategy” project. Not a prompt-engineering side quest. Just another part of making the site understandable to the systems that now read, rank, quote, and retrieve it.
A platform-agnostic specification of the technical features every decent website should have, from <title> to /.well-known/security.txt, from WCAG contrast to llms.txt. Written for humans and agents.
Ten areas, mapped to widely-accepted standards.
Each topic links back to the source standard — WHATWG, W3C, IETF RFCs, WCAG, MDN, and the organisations defining the modern web.
Whether you ship WordPress, Drupal, TYPO3, Next.js, Astro, Hugo, a Django app, or plain HTML, the spec is the spec. Implementation hints follow it, not the other way round.
I like that standards-first posture. A lot of AI advice still treats the web like a pile of pages to be scraped, summarized, and maybe attributed later. De Valk pulls it back toward contracts: stable URLs, explicit policies, structured data, clean source material, and machine-readable ways to discover what matters.
From the Agent Readiness section:
Agent readiness is a loose umbrella term for the choices that make a website legible to AI agents — chat assistants, autonomous browsers, retrieval pipelines, and any other non-human client that reads the web at scale. None of it is a single formal standard. It is a collection of existing web fundamentals plus a few emerging conventions.
Agents read the same HTML as browsers, but they read it differently. They:
- Fetch a page, often without executing JavaScript.
- Strip away navigation, ads, and chrome to extract the main content.
- Follow links, structured data, and well-known endpoints to discover more.
- Cache and quote your content in answers, with or without a link back.
If your content is locked behind client-side rendering, your URLs change every release, or your robots.txt blocks the assistants your customers use, you are invisible in that surface. The pages that win in agent answers are the ones that are easy to fetch, easy to parse, and easy to trust.
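To make that reading behaviour concrete, here is a minimal sketch of an agent-style pass over a page, using only Python's standard library. It does not execute JavaScript, it keeps only the text inside <main>, and it collects any JSON-LD it finds. The class name AgentView and the sample page are hypothetical, and real extraction pipelines are likely far more elaborate than this.

```python
import json
from html.parser import HTMLParser

# Hypothetical static page: what a non-JavaScript client would receive.
PAGE = """\
<html><head><title>Blue Widget</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Product", "name": "Blue Widget"}
</script></head>
<body>
<nav><a href="/">Home</a> <a href="/shop">Shop</a></nav>
<main><h1>Blue Widget</h1><p>A small widget, in blue.</p></main>
<footer>Example Co.</footer>
</body></html>
"""


class AgentView(HTMLParser):
    """Keeps text inside <main> and parses JSON-LD blocks; ignores everything else."""

    def __init__(self) -> None:
        super().__init__()
        self.in_main = False
        self.in_jsonld = False
        self._jsonld_buf: list[str] = []
        self.main_text: list[str] = []
        self.entities: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self.in_main = True
        elif tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.in_jsonld = True
            self._jsonld_buf = []

    def handle_endtag(self, tag):
        if tag == "main":
            self.in_main = False
        elif tag == "script" and self.in_jsonld:
            # Parse the buffered script body once the block is complete.
            self.entities.append(json.loads("".join(self._jsonld_buf)))
            self.in_jsonld = False

    def handle_data(self, data):
        if self.in_jsonld:
            self._jsonld_buf.append(data)
        elif self.in_main and data.strip():
            self.main_text.append(data.strip())


view = AgentView()
view.feed(PAGE)
view.close()
print(view.main_text)
print(view.entities[0]["@type"])
```

The navigation and footer never reach main_text because the page marks its main content semantically. If the same content were injected by client-side JavaScript, this pass would come back empty.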
That’s the part designers should pay attention to. We tend to think of the interface as the thing on the screen. But if agents are part of the audience now, the interface also includes off-screen surfaces: metadata that explains the page, feeds and sitemaps that expose what exists, crawler policies that say what can be read, and curated indexes like llms.txt that tell software what matters.
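As a rough illustration of one of those off-screen surfaces, the sketch below renders a small llms.txt file. The layout (a title heading, a one-line summary as a blockquote, then sections of links) reflects my understanding of the emerging llms.txt proposal, so treat it as an assumption and not a settled format. The Link type, the render_llms_txt function, and the example.com URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Link:
    title: str
    url: str
    note: str = ""


def render_llms_txt(site: str, summary: str, sections: dict[str, list[Link]]) -> str:
    """Build a markdown-style index: title, summary, then one link list per section."""
    lines = [f"# {site}", "", f"> {summary}"]
    for heading, links in sections.items():
        lines += ["", f"## {heading}", ""]
        for link in links:
            suffix = f": {link.note}" if link.note else ""
            lines.append(f"- [{link.title}]({link.url}){suffix}")
    return "\n".join(lines) + "\n"


# Hypothetical site content, for illustration only.
text = render_llms_txt(
    "Example Site",
    "Documentation and pricing for Example Co. widgets.",
    {
        "Docs": [Link("Getting started", "https://example.com/docs/start")],
        "Product": [Link("Pricing", "https://example.com/pricing", "Plans and limits")],
    },
)
print(text)
```

The design choice worth noticing is curation: the file lists what matters most, not everything a sitemap would expose.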
De Valk again:
There is no single switch. The items in this category each cover one part:
- Stable URLs so cached answers stay valid.
- Structured data (JSON-LD) so agents can extract entities without guessing.
- Clean semantic HTML so content extraction does not pull in navigation.
- A robots.txt that names AI crawlers explicitly so your policy is unambiguous.
- /llms.txt as a curated index of your most important content (emerging).
- Machine-readable endpoints — sitemaps, RSS, JSON feeds — where they fit.
- MCP server endpoints for sites that expose tools or actions (emerging).
Most of these also benefit traditional search engines and accessibility. Agent readiness rarely conflicts with the rest of the spec; it just raises the priority of things that have always been good practice.
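The robots.txt item is the easiest one to check mechanically. Below is a minimal sketch using Python's standard urllib.robotparser to ask whether a given crawler may fetch a given URL. The robots.txt content is hypothetical, and the user-agent tokens are only examples, so confirm the current token for each assistant in its vendor's documentation. The helper name allowed_agents is also my own.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical policy that names two AI crawlers explicitly and
# falls back to a general rule for everyone else.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /private/

User-agent: ClaudeBot
Allow: /

User-agent: *
Disallow: /admin/
"""


def allowed_agents(robots_txt: str, agents: list[str], url: str) -> dict[str, bool]:
    """Return, per user agent, whether the policy permits fetching the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


result = allowed_agents(
    ROBOTS_TXT,
    ["GPTBot", "ClaudeBot", "SomeOtherBot"],
    "https://example.com/private/report",
)
print(result)
```

Naming crawlers explicitly means the answer for each one comes from its own rule group and not from whatever the wildcard group happens to say.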
De Valk’s point is simple: agent readiness mostly means doing the old web discipline well enough that agents can actually read and trust the site.

The Website Specification
A platform-agnostic, full specification of the technical features a good website should have. Built in the open under an MIT licence.