PJ Onori built a tool that A/B tests his design system against AI agents, and he’s careful to say it isn’t impressive:

Two groups of agents get spun up, and both are given the same prompt to make an interface. One group’s given the old design system. The other is given our new one. Each agent provides feedback on problems faced after it’s done. Once all agents finish, the builds are evaluated on a bunch of crap and a report is generated.
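
The shape of that harness is easy to sketch. A minimal version, where the `Build` fields and the `run_agent` parameter are hypothetical stand-ins rather than Onori's actual code:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical sketch of the A/B shape Onori describes, not his tool.

@dataclass
class Build:
    lines_of_code: int
    fix_attempts: int
    feedback: str  # problems the agent reports after it's done

PROMPT = "Build a settings screen using the design system provided."

def run_group(run_agent: Callable[[str, str], Build],
              system_docs: str, n: int = 5) -> list[Build]:
    """Spin up n agents against one design system's docs."""
    return [run_agent(PROMPT, system_docs) for _ in range(n)]

def report(old: list[Build], new: list[Build]) -> dict:
    """Compare the two groups on a couple of the measured metrics."""
    avg = lambda xs: sum(xs) / len(xs)
    return {
        "avg_loc": (avg([b.lines_of_code for b in old]),
                    avg([b.lines_of_code for b in new])),
        "avg_fixes": (avg([b.fix_attempts for b in old]),
                      avg([b.fix_attempts for b in new])),
        "feedback": [b.feedback for b in old + new],
    }
```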

The list of what the tool measures is long: timing, lines of code, code variance, fix attempts, components used, accessibility, performance, inline styles, visual diff, token usage, agent feedback. Onori, on the test he ran when he wasn’t sure his documentation was actually doing the work:

I was starting to question if documentation was making things better. Maybe component improvements was doing the heavy lifting–who knows? So, I ran a couple tests without documentation… The documentation was clearly the heavy lifter. […] Documentation is essential for systems that agents don’t have a lot of reps with. I’ve started to add a “For agents” section in the docs. That section is the dumpster for “get it in your silicon head” training.

The “For agents” section is a small idea with a real implication. Documentation has historically been written for one audience. Now there are two, and as Onori says elsewhere in the post, the second one needs “the same damned point” repeated five or six times and doesn’t care if the prose is ugly. His instinct is to wall that off so humans don’t have to read it.

Onori is publishing measurements where most people are publishing takes. That’s the missing piece in the design-system-as-moat argument: somebody actually testing whether agents do better with a well-built system than a worse one, and showing the numbers. Onori, on the closing caution:

There’s a lot of noise in the output, feedback, and analysis–otherwise known as everything. That noise compounds fast. Think of the telephone game–then think about what that’d do to a design system. […] Feedback needs to go through a BS filter. […] The feedback part of the analysis is helpful. Make no mistake. But it needs heavy interpretation.

The telephone game is the right picture. A design system that updates itself based on agent feedback that’s been generated by other agents and analyzed by a third agent is going to drift somewhere strange in a small number of iterations, and nobody on the team will be able to reconstruct why. Onori’s tool stops short of that on purpose: it produces measurements, and a person reads them.

Stippled illustration of a person sitting at a desk, leaning forward and writing or working on something.

Testing agents on design systems

It’s really easy to say agents are able to use a design system. It’s another thing to prove it.

pjonori.blog

Marcus Moretti’s guide to agent-native product management, in Every, is the orchestration shift showing up on the PM side of the team. The guide opens with the 1930s Procter & Gamble origin story: someone owns the product. The job has been rewritten so many times since then that PMs are now expected to be design partners, diplomats, sales people, and statisticians on top of running the 100+ software subscriptions the average company buys. What’s interesting is that the piece is describing the old role, finally legible again now that agents can absorb the administrative debt that piled up on top of it.

Now, much of the interdisciplinary work that goes into product management can be done by an LLM in minutes, sometimes seconds. What used to be a three-hour-long analytics investigation is now a simple back-and-forth with Claude. A product review that used to be a fortnightly chore emerges from a single typo-ridden chat message. This has been my recent experience, at least. I no longer struggle with semicolons in SQL queries or even write tickets. All of my product management work happens in conversation with, in my case, Claude Code. The conversation is the work.

“The conversation is the work” sounds like a description of the new job. Read it next to the 1930s origin story and it’s a description of the old one. The Brand Man at P&G wasn’t writing SQL; he was deciding what the product should be and who it was for. The intervening ninety years of accumulated tooling—agile ceremonies and ticket hygiene, analytics dashboards on top of those—was friction PMs had to push through to get back to the actual work. Moretti’s /ce-strategy command, modeled on Richard Rumelt’s Good Strategy Bad Strategy, isn’t a new artifact either. Strategy documents predate LLMs by decades. What’s new, Moretti says, is the cadence: every few months, the agent re-runs the strategy interview with the accumulated context of everything you’ve shipped.

Writing a strategy document cold is hard. The best way to do it, I’ve found, is to have an agent interview you. The ce-strategy skill does this. It runs through the sections in order and has built-in guidance about what makes a good answer (and what kinds of answers to push back on). […] The interview is deliberately conversational. If the first answer to, “What’s the core problem this product solves” is vague, the agent drills down: “Whose situation specifically? What do they try today, and why doesn’t it work?” The guidance here is taken from personal experience and from the Rumelt book.
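
The drill-down loop is simple enough to sketch. A rough version, where the section list, the follow-ups, and the vagueness check are all illustrative stand-ins for whatever the actual ce-strategy skill encodes in its markdown guidance:

```python
# Rough sketch of the interview loop Moretti describes; everything
# here is a stand-in for the real skill, not its contents.

SECTIONS = [
    "What's the core problem this product solves?",
    "Who is it for, and what do they try today?",
    "What's the guiding policy?",
]

FOLLOW_UPS = [
    "Whose situation specifically?",
    "What do they try today, and why doesn't it work?",
]

def is_vague(answer: str) -> bool:
    # Placeholder heuristic; the real skill presumably asks the model.
    return len(answer.split()) < 15

def interview(ask) -> dict:
    """Run the sections in order, drilling down when an answer is vague."""
    strategy = {}
    for question in SECTIONS:
        answer = ask(question)
        for follow_up in FOLLOW_UPS:
            if not is_vague(answer):
                break
            answer += " " + ask(follow_up)
        strategy[question] = answer
    return strategy

# interview(input) runs it at a terminal; the result becomes strategy.md.
```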

The guide assumes a PM who has the taste to recognize when the agent’s follow-up has exposed a gap. The ones who don’t will end up with a strategy.md full of confident-sounding nonsense, generated quickly and reviewed lightly. Agent-native PM removes the alibi that you were too busy with tickets to do the actual thinking. That maps to a warning from Raj Nandan Sharma: when generation gets cheap, the scarce skill is refusal, knowing what to throw out and why. Moretti’s PM is doing exactly that, sentence by sentence, in the strategy interview.

Moretti closes:

LLMs have allowed our tools to catch up with the multifaceted duties of product managers. For me, product management has been reduced to the interesting parts: dreaming up features, thinking through designs, looking at interesting data, and talking to users. We all feel the economic imperative to embrace AI tools, but the better reason, I think, is to make work more fun.

Hand-drawn letter "G" in black chalk-style script on a light blue background, with a black bookmark icon in the top-left corner.

A Guide to Agent-native Product Management

A step-by-step guide to using agentic capabilities for better product management

every.to

Nick Babich on agents in UX Planet. A useful pair to his earlier writeup on Claude skills, since the two words get used interchangeably and they are not the same thing. Babich opens with the plain-language version:

Think of an AI agent as a program you run when you need to solve a particular problem in design. For example, you can create an AI agent that helps you with usability testing, code review, UI/UX audit, etc.

A program you run is the right mental model. A skill, the way Babich described it in his earlier piece, is a recipe: a markdown file Claude reaches for when a task matches. An agent is what runs once Claude has the recipe in hand. It carries state across steps, picks tools, reports back.

Babich’s four attributes of a well-designed agent get at that distinction without saying it out loud:

  1. Good clarity (intent alignment). A strong agent understands what success looks like, not just the task. This understanding helps it translate vague prompts into clear objectives.
  2. Context awareness. Good agents maintain and use context effectively. Not only do they remember previous steps, constraints, and user preferences (which is well-expected behavior nowadays), but they also adapt output based on the environment (tools, data, stage of workflow).
  3. Tool orchestration. Agents can perform the workflow autonomously, and the ability to use the right tools for the task at hand is what makes an agent so powerful. Well-crafted agents can chain tools together into workflows, and they don’t overuse tools when simple reasoning is enough.
  4. Explainability (transparent reasoning). When you interact with an AI agent, you need to understand why something happened. Thus, an AI agent should provide a rationale behind decisions, and surface assumptions and trade-offs.

Context awareness and tool orchestration are what separate an agent from a prompt template. A skill can ship intent alignment and explainability in plain markdown, but state across steps and the ability to chain tools require a runtime. That’s why Babich’s specs include Boundaries sections and “When Not To Use It” blocks: a stateful, tool-using program needs guardrails that a one-shot prompt does not.
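
You can see the split in code. A skill is inert text; an agent is the loop that carries state and picks tools. A minimal sketch, with every name here hypothetical rather than anything Babich specifies:

```python
from typing import Callable

# Minimal sketch of why an agent needs a runtime. A skill is just the
# markdown string; the state and tool-chaining below are the agent.

class Agent:
    def __init__(self, skill_md: str, tools: dict[str, Callable[[str], str]]):
        self.skill = skill_md        # the recipe: plain markdown
        self.tools = tools           # what it may reach for
        self.memory: list[str] = []  # state carried across steps

    def step(self, pick: Callable[[str, list[str]], tuple[str, str]],
             task: str) -> str:
        """One step: pick a tool given skill + memory, run it, remember."""
        tool_name, tool_input = pick(self.skill + "\n" + task, self.memory)
        result = self.tools[tool_name](tool_input)
        self.memory.append(f"{tool_name}({tool_input!r}) -> {result!r}")
        return result
```

The markdown can say what success looks like; only the runtime can remember the last step or decline to reach for a tool when reasoning is enough.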

If you haven’t built one yet, his five specs—Research Synthesizer, Competitor Intelligence, Problem Definition, Idea Generation, UX Flow Designer—are a clean starter pack. Pick the one closest to a workflow you already do by hand, and notice how much of the spec is about what the agent will not do.

3D illustration of an orange robot head with a maze inside its open skull, glowing circuit lines extending outward to orange cube nodes.

Agentic Product Design

5 design tasks you can automate with AI today

uxplanet.org

Tommy Geoco’s $13,100 OpenClaw harness, ninety days in, is one way to build a personal AI agent. Anton Sten went the other way. He tried OpenClaw and Hermes, found the setup was “days, sometimes weeks, for minutes of return,” and built something smaller. Five Claude Code instances on a Mac mini, named after Suits characters, each handling one role. Architecture is a shared repo and a pile of markdown files. That’s it. Most AI-agent posts pitch what Sten calls “a team of bots that runs your business while you sleep.” His basement firm is the inversion.

Sten on what he actually wanted from his agents:

What I actually wanted was smaller. A handful of tools, each with a narrow job, that I could build in an afternoon and shape around how I actually work. So that’s what I did.

The names of his AI agents are from the show Suits (with Wendy borrowed from Billions), picked so the show’s personalities double as memory aids for each agent’s job. Harvey handles contracts and pricing. Donna takes Harvey’s notes and drafts the emails and follow-ups. Mike stores what Sten would otherwise forget. Louis worries about money. Wendy reads the others’ logs and points out where they’re slipping.
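
The architecture is small enough to sketch. Assuming the shared repo is just directories of markdown (the paths and the `draft` call here are guesses, not Sten’s setup):

```python
from pathlib import Path

# Guess at what a Harvey-to-Donna handoff over a shared repo of
# markdown might look like; layout and names are hypothetical.

REPO = Path("~/basement-firm").expanduser()

def handoff(draft) -> Path:
    """Donna reads Harvey's newest notes and writes a follow-up draft."""
    notes = sorted((REPO / "harvey").glob("*.md"))[-1]
    email = draft(f"Draft the follow-up email for:\n{notes.read_text()}")
    out = REPO / "donna" / f"draft-{notes.stem}.md"
    out.write_text(email)
    return out
```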

Sten on the autonomous-revenue pitch:

The team in my basement isn’t running anything autonomously. They don’t make decisions for me. If I unplugged the Mac mini tomorrow, my business would keep running. The conflation in the current AI conversation — between playing and building a thing that prints money — is the part I find a bit tiring. They’re treated as the same activity, when they’re almost opposites.

Sten’s right that the autonomous-revenue pitch is a fantasy. Less right on the binary that follows. Geoco’s harness is doing meeting prep, ingesting his survey research, and distributing his content across ten platforms while he sleeps. That counts as “running while you sleep,” and his $50,000 in sponsorship revenue from one survey project isn’t trivial. Play and revenue can sit on the same side. What matters is whether the human stays in the loop. Geoco does, and so does Sten.

The shape of what they’re building is also the same. The Harvey-to-Donna handoff Sten uses most and Geoco’s survey-prep loop are both the specialization-is-the-whole-game pattern: narrow specialists, human in the loop, work compounding into the system. Sten calls it play and Geoco calls it work. The architecture underneath does the same job either way.

Sten on practice:

I’d argue this is the business case for designers right now. Not the agents specifically — the playing. Because in a year or two, every job worth having is going to assume you understand how these tools work, and the only way to understand them is to spend time in them when nothing’s on the line.

The people who’ll do interesting work with this stuff in two years are the ones playing with it badly today.

Geoco is what Sten’s last sentence predicts. The person playing badly today is the person doing interesting work in two years. Sten describes that person as hypothetical. Geoco isn’t.

The basement firm

There’s a Mac mini in my basement running a small consulting firm. Five employees, all named after TV characters, none of them human. They take notes, write drafts, remember things I’ve forgotten, argue with my financial instincts, and occasionally tell each other to do better.

antonsten.com

Tommy Geoco spent ninety days and $13,100 tinkering with OpenClaw. His agent runs his capture loop, prepares his meetings, codes the survey for the state-of-prototyping report his studio shipped, and distributes his content across ten platforms. Geoco describes the harness like this:

When you install OpenClaw, it is like a starter kit project car. It is a car frame with a swappable engine. The engine being any AI model you choose to use. It is basically a folder that you install onto your computer that contains about seven markdown files. […] When you stop thinking of a custom agent as just a chatbot and start thinking of it like an operating system, some useful questions are going to start to pop up like where does the memory live? What is the source of truth? How do I enforce my rules better? What should stay manual?

The seven files are plain text. soul.md holds the agent’s voice and judgment, agents.md defines permissions, memory.md handles long-term recall, and four others cover identity, the user, tool instructions, and a heartbeat. Geoco layers an Obsidian vault on top as long-term knowledge and Slack as the chat surface. Geoco on what actually limits an agent:

The agent’s limitations aren’t just about the model. They’re a lot more about the system that you have built around it because you can’t control the quality of the model, but you can control the quality of the system. […] The most important part of my setup is the knowledge vault. This is my alternate memory, and it is built around the work that I actually do.
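
Concretely, a harness like that is mostly context assembly. A sketch of the idea, assuming only the file names mentioned in the video (soul.md, agents.md, memory.md are named; the other four file names here are guesses):

```python
from pathlib import Path

# Sketch of harness-as-operating-system: concatenate the markdown
# files into one system prompt. Only soul.md, agents.md, and
# memory.md are named in the video; the other names are guessed.

HARNESS = Path("~/openclaw").expanduser()
FILES = ["identity.md", "soul.md", "user.md", "agents.md",
         "tools.md", "memory.md", "heartbeat.md"]

def build_system_prompt() -> str:
    """The folder is the car frame; the model slots in as the engine."""
    parts = []
    for name in FILES:
        f = HARNESS / name
        if f.exists():
            parts.append(f"## {name}\n{f.read_text()}")
    return "\n\n".join(parts)
```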

Geoco says curation is what keeps the whole thing from drifting. The agent runs the loops on top of a vault Geoco curates, and the taste lives with him; the model itself is interchangeable. The challenging part is somewhere else entirely:

The most challenging part of this whole thing is the unlearning. Many of us have old habits that have calcified into our brain. It is why my 17-year-old is able to run laps around us. He has no baggage about how things are supposed to work.

Geoco is right that the unlearning is where the difficulty lives. The harness is just markdown and the model is rented; the orchestration skill Benhur Senabathi described as what designers actually picked up in 2025 is what you practice through the unlearning. Geoco closes the video by saying nobody’s harness is right and everybody’s works for them, which sounds about right to me too.

How I Built an AI Agent That Designs Like Me

This is a practical breakdown of what an OpenClaw agent is, and how I use it for my design and media studio.

youtube.com

George Anders, in the Wall Street Journal, makes the case that the 1920s offer a usable template for the AI decade. His strongest evidence is the spillover-jobs data:

By 1930, more than 80,000 people were working as electricians, a profession that hardly existed a decade before. Census data also showed that 168,000 people were working in rubber factories, most of them making tires to accommodate Detroit’s booming production of cars, trucks and buses. Another 450,000 people were building roads, bridges and other structures needed by the ever expanding auto industry.

The ATM parable had the same problem: the version that ends in 2010, with bank-teller employment intact, is the one we love to retell. The version that ends in 2022, with teller jobs cut in half by the iPhone, is the one we leave out. Anders’s 80,000 electricians are real. So is the question of which of them got displaced when the next technology arrived.

Anders does, to his credit, take the costs seriously. He spends a section on the radio fight:

In 1927, H.G. Wells, the British author and intellectual, called radio “inferior” entertainment that should be listened to “only by the sick, the lonely and the suffering.” David Sarnoff, general manager of Radio Corp. of America, shot back that he was trying to improve “the happiness of the nation” by delivering popular music to millions of people. Nearly a century later, that same argument still flares, though now it is more likely to involve TikTok, Reddit or YouTube, instead of dear old radio. The doubters always have a point; with the passage of time, the innovators usually win out.

The early evidence on AI’s job-creation side is thinner than the 1920s comparison flatters: Anthropic’s own researchers find a 14% drop in the job-finding rate for 22-to-25-year-olds in exposed occupations since ChatGPT launched, even as overall unemployment holds. The new electricians of our decade may exist. They just may not be the people getting hired right now.

The safety side of Anders’s case is the one I want to see more of. Cars in 1920 killed at twenty times today’s per-mile rate, and the country chose not to live with that:

Auto safety got better, too, with both industry and government taking action. Better mirrors, better brakes and shatterproof windshields became standard. Cities such as Los Angeles and Detroit installed red-yellow-green traffic lights that governed drivers’ actions on busy streets. New Jersey became the first state to insist on driver’s licenses, with the state’s motor-vehicle commissioner in 1924 declaring: “It is an absolute necessity to do this in order to conserve human life.”

Whether the next century treats our decade as kindly depends on whether we put rearview mirrors and traffic lights on AI before the death rates force us to, or whether we wait and do it under the same kind of duress the 1920s did.

Vintage black-and-white photo of an early automobile displayed in a storefront window with bold striped decorations and a sign reading "Auto Show Jan. 19-25 Auditorium Milwaukee."

What the 1920s Can Teach Us About Surviving the AI Revolution

(Gift link) A century ago, cars and radio upended society just as AI is doing today.

wsj.com

Jake Albaugh wrote a piece on X called “Design is the work” that splits design from the artifacts it produces. Mocks, prototypes, screens, guidelines: those are outputs. Design itself, in his telling, is the upstream act of intent: figuring out what something should be and why, before anyone makes it. Bingo. That distinction matters now because AI is very good at the artifact and unable to do the deciding:

AI cannot do that part. You intend to do something that has not yet happened. You have to bring those parameters to the table to do anything novel. AI doesn’t know your constraints. It doesn’t know your strategy. It doesn’t know what moment in the market you’re in, what your team is trying to prove, or what your customers actually need versus what they’ve said they want. The expectation — the definition of what good looks like — is something only you can provide. AI’s job is to meet that expectation. Not to define it.

The piece makes the case that intentionality is required and that it has to come before execution; AI changes neither requirement. The closer is where it gets interesting. After all that, Albaugh tells the reader he used AI to draft the essay:

It may surprise you to learn that I used AI to write this. The structure, the sentences, a lot of the phrasing — generated. But the argument existed before any of it. I knew what I was trying to say. I knew what examples mattered and which ones were wrong. I knew when a paragraph was close but not quite right, and I revised toward a target I’d already defined. […] That’s the point. The tools changed. The work didn’t. Design is the process. Design is the intentionality.

It’s a risky reveal. Most readers will read it as self-undermining at first. But the argument and the artifact are doing the same job: Albaugh had a target, and he used AI to reach it. The fact that the prose was generated is exactly why it matters that the argument wasn’t. He knew which examples belonged in the piece and which ones to throw out. The model couldn’t have known that either way, because the criteria for “good” didn’t exist anywhere outside his head until he wrote them down.

Karri Saarinen made a version of this same split when he argued that output isn’t design. The hard part is understanding the problem well enough to know what should exist at all.

A presenter stands on stage in front of a green slide reading "What should be automated? What should be left to touch?"

Design is the work.

We’re in a moment where it has never been cheaper or faster to build something convincing. The cost of taking an idea and making it look real, feel functional, or seem finished has collapsed. That is genuinely good news if you already know what you’re building and why. It’s dangerous if you don’t.

x.com

You’ve seen it in all the photos from various No Kings protests. The most-shared peace poster of the year did not start with a client. Daniel John, writing in Creative Bloq, traces Warsaw calligrapher Barbara Galińska’s two-tone “STOP WAR” piece. Galińska, in an artist statement quoted by John, describes where it came from:

My graphic “STOP WAR!” was created as a result of an international calligraphy challenge in November 2023, the main goal of which was to stimulate the creativity of artists. My reaction to the assigned theme “Stop war!” was to move away from typical calligraphy towards a powerful work that addresses the global problem of war. So my main personal challenge was to find a new, original solution to the well-known phrase “Stop war!” and transform it into a graphically powerful universal symbol for peace.

Bold typographic print reading "STOP WAR!" in red letters on a black background, signed by Barbara Galińska, numbered 1/25, displayed against a concrete wall.

Who’s behind the striking ‘Stop War’ poster that’s all over social media

The iconic typographic design is striking a chord.

creativebloq.com

Matt Ström-Awn, writing on his personal site, picks up a three-year-old line from Ted Chiang and turns it inside out:

Three years ago, Ted Chiang described ChatGPT as a blurry JPEG of the web. LLMs are a lossy compression of their training data, which is itself a lossy sample of all the data available to it. But the artifacts we see in AI slop aren’t in the compression. They’re in the decompression.

Every AI-generated output is an extrapolation from that blurry source, vectored toward your prompt, filling in plausible detail where the compression threw information away. The output gets inflated into blog posts and LinkedIn thoughtspam, software platforms, omnichannel advertising campaigns, and movie cameos from dead actors. Chiang compared the gaps and confabulations to compression artifacts.

I think they’re expansion artifacts.

Chiang had the compression metaphor; what we needed was a word for what these tools do on the way back out, and Ström-Awn gave us one.

Ström-Awn lists what expansion artifacts look like across modalities:

  • LLMs produce text stuffed with hedging verbs and fuzzing adjectives (delve, intricate, tapestry, multifaceted). Their paragraphs are structured as miniature essays with setup, payoff, and a signposted takeaway (This matters because…).
  • AI-generated code over-comments the obvious and creates error handlers for operations that can’t logically fail.
  • Image generators have had their own tells: six-fingered hands, symmetrical-but-stylistically-objectionable jewelry, text that looks like text but only if you cross your eyes.
  • Video models struggle with continuity. Limbs appear and disappear, objects clip through each other, and physics sometimes just switches off.

Each of these artifacts is the training distribution leaking through where the model’s confidence runs thin.
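
The metaphor is concrete enough to run. Compress a signal by discarding samples, then decompress by interpolating, and every interpolated value is plausible detail the decompressor invented. A toy sketch of the idea, not Ström-Awn’s:

```python
# Toy version of the metaphor: compression discards, decompression
# invents. The invented values are smooth, confident, and wrong.

signal = [3, 7, 2, 9, 4, 8, 1]

# Lossy "compression": keep every third sample.
kept = signal[::3]  # [3, 9, 1]

# "Decompression": linearly interpolate the gaps back in.
expanded = []
for a, b in zip(kept, kept[1:]):
    expanded += [a, a + (b - a) / 3, a + 2 * (b - a) / 3]
expanded.append(kept[-1])

print(signal)    # [3, 7, 2, 9, 4, 8, 1]  what was actually there
print(expanded)  # [3, 5.0, 7.0, 9, 6.33.., 3.66.., 1]  plausible, invented
```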

Ström-Awn writes about the designer-specific tells too:

Power users of AI website generators (AI-pilled designers) already know how to recognize the tool marks, if only to try to prompt them away: purple gradients are an especially common tell. But as more and more non-designers use tools like Claude Design to prompt their way to fully-functional software products, I expect to see a preference for the aesthetic convergence endemic to the current crop of AI models.

Matt Ström-Awn website header showing the page title "Expansion artifacts" in large bold text on a white background.

Expansion artifacts

Matt Ström-Awn · Designer, leader, and coach focused on building exceptional products and teams.

mattstromawn.com

Cat Wu, Anthropic’s Head of Product for Claude Code, describes the hiring filter on her team in her interview with Lenny Rachitsky:

I think all of the roles are merging. PMs are doing some engineering work. Engineers are doing PM work. Designers are PMing and also landing code. You can either hire a lot more engineers who have great product taste, or you can keep your engineering hiring the same and hire a lot more PMs to help guide some of their work. On our team, we’re pretty focused on hiring engineers with great product taste. This way we can reduce the amount of overhead for shipping any product. Like there are many engineers on our team who are fully able to end to end go from see user feedback on Twitter through to like ship a product at the end of the week with almost no product involvement. And this, I think, is actually like the most efficient way to ship something. So I think like engineer and PM are kind of overlapping and you will get a lot of benefit from having more of either. I think product taste is still a very rare skill to have and we’ll pretty much hire anyone who we feel has demonstrated this strongly.

This is what the Full Stack Builder pattern looks like as a hiring filter. The headline is the merging of roles. Wu’s own background says where the bench comes from:

Yeah, I was an engineer for many years. I was then a VC very briefly before joining Anthropic. And actually almost all the PMs on our team have either been engineers or ship code here on Claude Code. And so that’s one of the things that I think helps build trust with the team and also just enables us to move a lot faster. And then actually our designers also have been front-end engineers before.

So to be clear, Wu doesn’t say that the roles have merged, but what she’s describing is the continued blurring of lines.

How Anthropic’s product team moves faster than anyone else | Cat Wu (Head of Product, Claude Code)

Cat Wu is Head of Product for Claude Code and Cowork at Anthropic, building one of the most important AI products of this generation. Before joining Anthropic, Cat spent years as an engineer and briefly worked in VC. Today, she’s interviewing hundreds of product managers who are trying to break…

youtube.com

Maggie Appleton, staff research engineer at GitHub Next, wrote up her recent talk on agentic AI productivity. (Video here if you’d rather watch.) Her central claim comes early:

I call it this “one man, a two dozen claudes” theory of the future. The pitch here is that one person with a fleet of agents will do the work of an entire team of developers. The main problem with this dream is it assumes software is made by one person. All these tools are single player interfaces. […] Software is not made by one person in a vacuum. It’s a team sport. Everyone building it needs to agree on what they’re building and why.

The single-player critique is the missing piece in most AI productivity takes. Most demos of a coding agent show one engineer at a terminal. Designers face the same situation with AI prompt-to-code tools. Collaborating isn’t as easy as sharing a Figma link. That’s the actual gap in current tooling, and it’s downstream of the single-player assumption.

Appleton’s second move:

Implementation is rapidly becoming a solved problem, right? Writing code is now fast, it’s getting cheap, and quality is going up and to the right. The hard question is no longer how to build it. It’s should we build it. Agreeing on what to build is the new bottleneck. […] When production is cheap, opportunity cost becomes the real cost. You can’t build everything, and whatever you pick comes at the cost of everything else.

When production is cheap, picking what to make becomes the whole job. The cost difference between two engineering paths is now nearly zero, so the choice between them carries all the weight. Teams that miss this will end up shipping volume and mistaking it for productivity.

A talk like this could be about tooling, and Appleton does walk through Ace, GitHub Next’s prototype multiplayer workspace, in some detail. But the more important argument is about what you do with the hours you free up. Going faster is not the prize. Appleton:

We have an opportunity to not just go faster and build a giant pile of the same crappy software. But instead to make much better software through more rigorous critical thinking and better alignment in the planning stage. By doing more exploration, more research, and thinking through problems more deeply than we could have before.

The reclaimed hours are an opportunity, but they are also a test. Do you spend them shipping more, or do you spend them shipping better? The first answer gets you the giant pile. The second takes work the agents cannot do for you.

Appleton closes on craft:

Many people are now realising that in a world of fast, cheap software, quality becomes the new differentiator. The bar is being set much higher. Craftsmanship is what will set you apart from the vibe-coded slop. But craft still costs time and energy. It is not free, and in order to buy the time and energy for it, you need to do fewer things better, which requires strong alignment.

Title card for "One Developer, Two Dozen Agents, Zero Alignment" — a talk about collaborative AI engineering and a tour of Ace, the multiplayer coding workspace.

One Developer, Two Dozen Agents, Zero Alignment

Why we need collaborative AI engineering and a tour of Ace: the multiplayer coding workspace

maggieappleton.com

Andy Matuschak describes two accidental tyrannies that have shaped software for forty years: the application model that traps software in one-size-fits-all packages, and programming as a specialization that crowds out non-programmers from inventing interfaces. He thinks coding agents could break both, and he’s already seeing it happen with the designers he works with:

I’ve been seeing it. I spent 2025 collaborating with two talented designers. Their story with coding agents this past year has been truly wild. I think the impact on my collaborators has been much greater than the impact on me, despite the fact that I’m now building at perhaps ten times the speed.

Unlike me, these two started their careers in design and spent their formative years in the arts culture. They can program a bit, but the process was really slow and difficult enough to pose a significant barrier. At the start of 2025, coding models could implement small one-off design ideas—but their outputs would just fall apart after a couple of iterations. By the end of the year, my collaborators were routinely prototyping novel interface ideas and sustaining that iteration across weeks.

“The impact on my collaborators has been much greater than the impact on me.” Matuschak is moving ten times faster, and he still thinks his designers are the ones whose careers just turned over. That observation is rare from the person on the receiving end of the bigger gain in raw output.

Matuschak’s diagnosis of why the old arrangement was such a trap for designers:

Non-programming designers are trying to invent something in an interactive medium without being able to make something meaningfully interactive. So much of invention is about intimacy with the materials, tight feedback, sensitive observation, and authentic use. So it’s a catch-22: to enter into proper dialogue with their medium, a non-programmer needs to get help from a programmer. That generally requires the idea to be at least somewhat legible and compelling. But if they’re doing something truly novel, they often can’t make it legible and compelling without being in that close dialogue with their medium.

The old design-engineering separation trapped designers in a less obvious way. They often couldn’t even tell whether their ideas were brilliant, because they couldn’t get their hands on the material to find out. You can’t iterate on a feeling. You have to push something around until it pushes back. For most of my career, designers did that pushing in flat mockups and click-through prototypes, working through dynamic behavior they had never actually felt. Of course the technical ideas fell short. The designers themselves hadn’t felt the thing yet either.

That’s the asymmetry coding agents collapse. The loop between “I have an inkling” and “I am tinkering with a working version of the inkling” has finally closed for non-developers. They still can’t and mostly shouldn’t ship production code, but they don’t need to. The prototype is enough to do the design work. Once the gatekeeping melts, the next question is institutional: where does the next generation of interface inventors come from? Matuschak’s answer:

So, what now? We’ve spent decades building HCI programs that mostly look like computer science departments with design electives. But if we’re moving toward a world where invention is bottlenecked more on imagination than on technical expertise, we may have that backwards. We may need programs that look a little more like art school with technical electives—learning to develop ideas from intuition before being able to express them precisely, to discover by playing with the material.

Title slide and content page from Andy Matuschak's MIT HCI Seminar talk "Apps and programming: two accidental tyrannies" dated 2026-03-03, showing a table of contents and lecture notes.

Apps and programming: two accidental tyrannies

On coding agents, malleable software, and the future of interface invention

andymatuschak.org

Humans are the bread in the sandwich, and the AI is in the middle.

That’s Dan Shipper on his podcast AI & I, talking with Every’s Kieran Klaassen, the engineer behind the compound engineering plugin. They’re working out where humans actually belong in an AI-driven workflow. It’s the same split showing up on the design side.

Klaassen, on the polish step at the end of the work:

The other moment comes at the end. Something comes out. How do you validate it? Well, it’s already tested—browser automated testing has clicked through everything, all the requirements are clearly specified, and it says everything works. But the beauty comes in when a human looks at it, clicks around, and has a feel for it: “Oh, this doesn’t feel right. We can polish it. We can make it better. There’s something still missing. We can make the design better.” […] all the way at the end, when everything is done, you can elevate everything and make it even better. And I think we need to do that, because if we don’t, it will all be slop—all the same. It’s very important to make it feel great because the bar is high, and the bar will always get higher.

“It will all be slop” is the line every team should have taped to a monitor. A passing test suite and a green PR don’t tell you whether the thing is actually any good. That judgment still lives with a human at the end of the workflow. Klaassen is correct that the bar keeps moving up, not down, and the teams who treat the polish step as optional are the ones whose products will look interchangeable in twelve months.

Klaassen, on the art-and-ownership argument:

But I do think that in the end, if you ship something—if you make a statement in the world—and you want it to be your own, you have to say yes or no at some point. You cannot fully automate everything. It’s a bit like making art. If you want it to be yours, it needs to come from you or somehow be connected. So I believe having those moments where you decide—where you choose what you enjoy—is so important. That’s why it’s so important to do things you enjoy and love.

Whatever your version of beautiful is, that’s the bread. Everything else is filling.

Cover art for "AI & I" podcast by Every, featuring a smiling man with glasses rendered in gold tones against a purple background.

The AI Sandwich: Where Humans Excel in an AI World

‘AI & I’ with compound engineering creator Kieran Klaassen

every.to

Karri Saarinen, Linear’s co-founder, calls out the confusion that most of the new design tooling is built on top of:

Design keeps being misunderstood in our industry. New tools keep promising to generate interfaces faster, move words to product instantly, or collapse design directly into code. The assumption behind them is clear: that design is the act of producing. That is the misunderstanding. The hard part of design is rarely generating the form. It is understanding the problem well enough to know what and how something should exist at all.

What I appreciate about Saarinen’s argument is that he doesn’t stop at the diagnosis. He reaches for Christopher Alexander’s Notes on the Synthesis of Form and recovers a vocabulary term the industry has been missing:

Christopher Alexander came closer than anyone to naming this clearly. In Notes on the Synthesis of Form, he describes design as the search for a good fit between a form and its context. Context, in his sense, is not a background condition. It is the full set of forces that make a problem what it is: human needs, technical constraints, conflicting requirements, habits, edge cases, and relationships that are easy to miss until you spend time with them. Bad design appears where those forces remain unresolved. Good design appears where those misfits have been worked through carefully.

Context as forces, not background. The current generation of prompt-to-code tools, including Lovable, Figma Make, and Claude Design, is very good at producing a plausible form against a thin slice of context. Saarinen describes the symptom directly:

You can already see the result in products that look polished, ambitious, and impressive at first glance, but begin to unravel the moment you actually use them. They feel brittle, poorly integrated, and full of decisions that were never fully worked through. The form is there. The fit is not.

That same bottleneck shows up on the workflow side: production speeds up, judgment doesn’t.

Saarinen’s closer:

The risk is mistaking generated form for solved problems.

That is the mistake to watch for, in your own work and on your team. Design is what happens when someone takes the time to understand the forces and works the misfits out of the form.

Loose, expressive ink and wash sketch of an abstract architectural structure with dense crosshatching and gestural line work.

Output isn’t design

Design keeps being misunderstood in our industry. New tools keep promising to generate interfaces faster, move words to product instantly, or collapse design directly into code. The assumption behind them is clear: that design is the act of producing.

x.com

Matt Zieger built jobsdata.ai as a weekend project, with the stated goal of being “a single place that synthesizes what we actually know about AI’s impact on economic opportunity.” The site breaks every occupation into its component tasks, then prices the AI compute cost to do one hour of each task and compares it against the human wage. The result is a per-task crossover year: the point when AI gets cheaper than the human at that work. “Evidence Over Narrative,” as Zieger puts it.
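
The arithmetic underneath is a straight comparison. A sketch with made-up inputs (the wage, compute price, and decline rate below are hypothetical, not Zieger’s numbers or his model):

```python
# Sketch of the per-task crossover-year arithmetic; all inputs are
# invented for illustration, not jobsdata.ai's actual data.

def crossover_year(human_wage_per_hour: float,
                   ai_cost_per_task_hour: float,
                   annual_cost_decline: float,
                   start_year: int = 2026) -> int:
    """First year AI is cheaper than the human at this task."""
    year, cost = start_year, ai_cost_per_task_hour
    while cost > human_wage_per_hour:
        cost *= 1 - annual_cost_decline
        year += 1
    return year

# Hypothetical task: $55/hr human, $180 of compute per task-hour today,
# compute costs falling 40% a year.
print(crossover_year(55.0, 180.0, 0.40))  # -> 2029
```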

The UX designer report opens directly:

If you’re an ux designer, this is worth taking seriously. But it’s not too late to get ahead of it.

We’ll be honest with you: a lot of the individual tasks in your job are things AI can already do, and that’s accelerating. But there are real reasons not to panic: when technology has made this kind of work cheaper in the past, people ended up wanting more of it, not less. There’s good reason to think that pattern will hold here too. Your field also tends to adopt new technology faster than most, so it’s worth paying attention now.

Double down on the parts of your work that take real judgment and experience. As AI handles more of the straightforward stuff, demand for what you do will likely grow.

"Early Signals of AI Impact" live tracker dashboard showing four metrics: 9% of US jobs displaced by 2030, -2.5% median wage impact, 40% AI adoption by 2027, and 61% earnings call mentions among S&P 500 companies.

Early Signals of AI Impact

462+ sources, one pattern: AI adoption is accelerating, productivity is climbing, and jobs are changing faster than they’re disappearing.

jobsdata.ai

Obviously, I’ve been pro-AI on this blog, actively trying to understand and figure out how it’s affecting UX design and how to use it for leverage instead of being replaced by it. In Silicon Valley and tech companies everywhere, including BuildOps, we’re racing to incorporate AI into our daily work to increase velocity, and adding it to our products to stay relevant.

Nilay Patel, in a Decoder monologue, lays out the polling that should rattle anyone shipping AI products:

There’s that NBC News poll showing AI with worse favorables than ICE, and only a little bit above the war in Iran and Democrats generally. That’s with nearly two-thirds of respondents saying they’d used ChatGPT or Copilot in the last month. Quinnipiac just found that over half of Americans think AI will do more harm than good. Well, more than 80% of people were either very concerned or somewhat concerned about the technology. Only 35% of people were excited about it. And poll after poll shows that Gen Z uses AI the most and has the most negative feelings about it. A recent Gallup poll found that only 18% of Gen Z was hopeful about AI, down from an already bad 27% last year. At the same time, anger is growing. 31% of those Gen Z respondents said they feel angry about AI, up from 22% last year.

The killer detail is buried halfway through. The Gen Z curve is striking: heaviest users, and yet the fastest to sour. Anger is up nine points in a year. These aren’t non-users reacting to coverage. They’re the daily customers, and the answer is no. Sam Altman has called this AI’s marketing problem. The polling rebuts him: public exposure has grown, public favor has not.

Patel’s title line:

Regular people don’t see the opportunity to write code as an opportunity at all. The people do not yearn for automation. I’m a full-on smart home sicko. The lights and shades and climate controls of this house are automated in dozens of ways, but huge companies like Apple and Google and Amazon have struggled for over a decade now to make regular people care about smart home automation, and they just don’t. AI isn’t gonna fix that.

Patel grounds the title in his own smart-home enthusiasm, and the comparison clicks because the failure pattern is identical: decade-plus of effort, billions in marketing, working products, and persistent indifference. Apple, Google, and Amazon ran that experiment. AI will not crack a problem that smart-home automation hasn’t.

John Gruber connects the same dissonance to the Mos Eisley cantina from Star Wars. Luke walks in with C-3PO and R2-D2. The bartender, Wuher, barks: “We don’t serve their kind here. Your droids. They’ll have to wait outside.” Gruber:

As a kid, I didn’t get it. Why would you not want droids? Star Wars made robots seem so real, so fun. Why would you ban them? That scene has stuck with me for my entire life. I didn’t get why, but I understood what it meant about that galaxy: the underclass deeply resented droids.

Gruber leaves the question open. He says he didn’t get why the droids weren’t welcome. The cantina’s animosity wasn’t arbitrary. Mos Eisley sits in the Outer Rim, where droid armies killed millions and occupied worlds during the Clone Wars. After the war, droids became a subjugated worker class across the galaxy, and Outer Rim spots like Mos Eisley held the line hardest. Wuher’s verdict comes from experience.

That’s the parallel for AI. Public distrust is earned. People have lived with AI overviews getting facts wrong and feeds drowning in slop, while every product asks them to bend a little more toward the database. Patel:

And so the tech industry is rushing forward to put AI everywhere at enormous cost, energy, emissions, manufacturing capacity, the ability to buy RAM locked into the narrow framework of software brain, without realizing they are also asking people to be fundamentally less human. And then they’re sitting around, wondering why everyone hates them. I don’t think a couple haircuts are gonna fix it.

As an industry, we need to continue to show the value of AI by being truly useful, not just market it.

THE PEOPLE DO NOT YEARN FOR AUTOMATION

Today on Decoder, I want to lay out an idea that’s been banging around my head for weeks now as we’ve been reporting on AI and having conversations here on this show. I’ve been calling it software brain, and it’s a particular way of seeing the world that fits everything into algorithms, databases…

youtube.com

In design circles, the AI debate splits into two responses: principled resistance and principled engagement. Dan Cohen offers a third: historical context.

Writing for Humane Ingenuity, Cohen uses Tracy Kidder’s The Soul of a New Machine—the 1981 Pulitzer-winning account of Data General’s minicomputer team—as a mirror for the current moment. He opens with a scene that reads like a 2026 AI company profile before revealing it’s from 1979:

A crack team of hardware and software engineers, inspired by breakthroughs in computer science and electrical engineering, are driven to work 18-hour days, seven days a week, on a revolutionary new system. The system’s capabilities and speed will usher in a new era, one that will bring transformative computing to every workplace. The long hours are necessary: the team knows that every major computer company sees what they see on the horizon, and they too are working around the clock to take advantage of powerful new chips and innovative information architectures.

The team is almost entirely men, men whose affect and social skills cluster in a rather narrow band, although they are led by a charismatic figure who knows how to persuade both computer engineers and capitalists. This is a helpful skill. Money, big money, is flowing into the sector; soon it will overflow. Engineers are constantly poached by rival companies. Hundreds of new competitors arise to build variations on the same system, or to write software or build hardware that can take advantage of this next wave of computing power. Some just want to repackage what the computer vendors produce, or act as consultants to the companies that adopt these new machines.

Sounds a bit like today’s Silicon Valley 996 culture, but that’s Data General in 1979. The team also worried about the Pentagon weaponizing their machine, job displacement, and whether their work might eventually produce true AI and destroy humanity. Those concerns date to 1979.

Cohen’s argument is about scale: the minicomputer moved millions of companies from paper to digital for the very first time; that was a genuine revolution. AI, he argues, is improving workflows that are already digital. His question: is that the same order of disruption?

Carl Alsing, one of the engineers who built the Eagle, told Kidder when asked about artificial intelligence:

“Artificial intelligence takes you away from your own trip. What you want to do is look at the wheels of the machine and if you like them, have fun.”

Cohen closes with the historical outcome:

In the 1980s, most of the minicomputer companies, launched with such excitement in the late 1970s, failed. Data General was acquired for a fraction of the billions it was once worth. The minicomputer, however, was broadly adopted, was transformative, became routine, and then was surpassed by a new new machine, the personal computer.

Later, Data General’s domain name, DG.com, was sold to a chain of discount stores, Dollar General.

Vintage blue terminal keyboard with numeric keypad, featuring keys labeled NEW LINE, CR, DEL, SHIFT, ENTER, VIEW, ON LINE, and READY/FAULT indicators.

The Role of a New Machine

An old book puts today’s new technology in perspective

newsletter.dancohen.org

“Slop cannons” is Darragh Curran’s term for the fear that AI-generated code will degrade craft. The fear is real. The same fear runs through design: AI-generated interfaces will be derivative, generic, indistinguishable from each other. Curran is Intercom’s CTO, and he published a detailed report on what happened when Intercom went agent-first across their entire R&D org. The result: 3x productivity in 16 months, tracked across nine metrics. The code quality results were not what anyone expected.

Curran:

A legitimate worry with the use of coding Agents, is that they won’t write high-quality code and the craft we’ve fought to protect will be undermined by slop cannons. We have a system to rate the structural quality of code contributions using static analysis and various rules/heuristics. It’s clear that prior to agentic coding, this metric would oscillate up and down above the line. As we started to use AI for writing more and more of our code, the overall quality (by this measure) declined. My intuition was that this was inevitable in the short term, but correctable in the medium term, as models and harnesses get better. We are starting to see this and recently had possibly our first ever five-week streak of net positive code quality overall.

Quality did dip. He confirms it. The slop cannon fear describes a real phase: at 93.6% agent-driven PRs, when agent-generated code degrades, the whole org feels it. But there’s a second finding:

There is huge latent potential. Some people are really pushing the limit of what is possible, tokenmaxxing, doing really interesting things, while others have only really made incremental changes to how they’re working and don’t see much change in their personal throughput. Ultimately one of the biggest bottlenecks to progress is with humans; how we work together, how we change behavior, etc.

Intercom’s top 5% of contributors produce 6x the median PR throughput. Those are the people spending over $1,000 a month on tokens. That spread is the real finding from going agent-first. The slop cannon fear is about whether agents can execute well. The 6x gap is about who’s learned to orchestrate them, and Curran’s candid that most of his org is still finding out.

For design, we worry about going too fast, solving the wrong problem, and building the wrong thing. Those are legitimate fears. Nonetheless, if you’re working in startupland as a designer, acceleration and automation are coming.

Illustrated astronaut standing on a mountain peak planting an orange flag, with text reading "2x: 9 months later – Fin/ideas" on a dark background.

2× – nine months later: We did it

You can too.

ideas.fin.ai

Design orgs and publications have been issuing AI bans, calling them principled responses to job displacement, training data theft, and the degradation of craft. The impulse is understandable: AI doesn’t just replace tools; it challenges what made you worth hiring, and the prospect of losing what you’ve built is felt more sharply than any potential gain. Christopher Butler thinks those lines are drawn in the wrong place:

By drawing hard lines against entire categories of tools, we’re mistaking the means for the problem itself, and in doing so, we’re limiting our ability to shape how these technologies integrate into creative work.

Butler doesn’t dismiss the concerns driving those bans: training data problems, corporate consolidation, job displacement. He thinks they’re legitimate and urgent. His objection is to making the tool the target rather than the behavior. Drawing the line at AI, he argues, repeats the mistake designers made at the letterpress and again at paste-up. The technology changed. The question—about authorship, judgment, and what craft actually requires—stayed the same.

Butler’s conclusion:

A designer who uses AI to plagiarize another artist’s style with a simple prompt is engaged in something fundamentally different from one who trains a tool to extend their own creative capacity. A writer who publishes purely generated text as their own work is making a different choice than one who uses AI as a thinking partner and editor while maintaining authorship over their ideas and voice. These distinctions matter more than blanket prohibitions.

Discernment in practice means asking: Am I using this tool to extend my own capabilities or to replicate someone else’s work? Am I shaping the output or simply accepting what’s generated? Does this use serve my creative vision or just expedite a result? These aren’t always easy questions, but they’re the right ones.

Butler himself is the illustration. He spent months training Claude on a 10,000-word skill file—the accumulated context of his subject matter and his voice—building a sounding board and editor that already knows his context. He still writes without it. He says some of his best writing has come from working with it. The output may be indistinguishable to most readers. The difference, he says, is real to him.

The choice isn’t between purity and complicity, between craft and automation. It’s between engagement and abdication—between shaping how these tools develop and how they’re used, or ceding that ground entirely to those with the least interest in protecting what we value about creative work.

Four-panel collage featuring a close-up microchip, a red diagonal line on blue background, an open human hand in black and white, and grid paper partially lit by light.

Red-lining AI - Christopher Butler

Why blanket AI bans mistake the tool for the problem, and how thoughtful integration of automation, ethics, and creative work offers a better path forward.

chrbutler.com

Ant Murphy opens with an eyebrow-raising McKinsey number:

McKinsey reports that 88% of organisations say they “use AI” but only about 1% have mature AI deployments delivering real value.

Murphy’s explanation for the gap is familiar: the diffusion of innovation, Geoffrey Moore’s chasm between early adopters and the majority, now applied to AI. What’s less common in the AI discourse is a behavioral explanation for why the adoption keeps stalling. Murphy:

AI is personal. It’s not another tool, to some it’s viewed as a replacement. “AI attacks our identity in a way that most software doesn’t” — Vikram Sreekanti

That resistance shows up in the record: a friend’s “I didn’t sign up for this”. Claire Vo described designers as the most resistant to change in the EPD triad, vocal AI opponents with little appetite for campaigning for resources. None of it is irrational. Daniel Kahneman and Amos Tversky found that humans weigh losses about twice as heavily as equivalent gains. Years of accumulated craft become our identity. AI doesn’t ask you to learn new tools; it asks you to renegotiate what made you worth hiring in the first place. The reskilling conversation treats that as a capability problem. Identity problems don’t resolve themselves through training on new tools.

Murphy on what that requires:

Surviving a paradigm shift like this is less about what your product does […] Instead it’s about you adapting to the change.

The 88% are held back by what AI is asking them to let go of. Murphy’s argument is that organizations clearing the chasm are doing the internal work first—on process, on how teams function—before it shows up in the product.

There’s an old relationship adage that you can’t be a good partner to someone until you’ve worked out your own stuff first. I think Murphy’s argument is the organizational equivalent.

Diagram labeled "The AI Bubble" with a red arrow pointing to a tiny red dot inside a large circle labeled "Everyone Else," illustrating how small the AI bubble is relative to the general population.

The AI Chasm — Ant Murphy

I challenge the hype around AI and share a more grounded perspective on how adoption actually works. Drawing on real data and firsthand experience, I break down why most companies are still early in the AI journey—and what product leaders should focus on instead.

antmurphy.me iconantmurphy.me

I’ve been pro-prototype: PMs replacing PRDs, designers prototyping interactions in code. Pavel Samsonov, writing at Product Picnic, aims at exactly that position. He opens by borrowing a distinction from Andy Polaine:

Demos and prototypes sit on a continuum, but I consider demos something to help you show a concept to other people in a form that looks and feels like the real thing. Prototypes are things you create to test something you don’t know until you build and test it.

Correct distinction. A demo succeeds on stakeholder approval; a prototype succeeds on learning. Both artifacts can be interactive and polished. What separates them is what counts as success. Samsonov on what happens when teams conflate them:

The only thing these demos are helping you test is whether your stakeholder likes what they see (the first loop) and as soon as they say “yes,” it becomes good enough to ship. Whether that second loop (releases go out, measurements come in) ever gets tracked or not is not something I’d be willing to put money on. Because once the demo is productionized, it goes from the realm of delivery velocity (which gets you shoutouts and promotions) into the realm of maintenance (which tends to be ignored even as it eats up more than half of the team’s bandwidth).

AI makes it easier to produce both, and Samsonov’s read on what happens when teams use the speedup wrong:

Shoving out more prototypes is not a heuristic for success; it is a heuristic for failure because it shows that you don’t know what you are trying to learn.

Agreed. Samsonov goes further:

This is exactly why AI-generated prototypes are not working, and have not helped anyone do anything ever. Some have accused me of going too far with this assertion, but I stand by it, because it is rooted in the very nature of what a prototype is (and is not), and what makes it successful (or does not).

Here’s where I differ. Brian Lovin’s Notion prototype playground exists because static mocks enforce golden-path thinking. The playground surfaces the messy middle of AI chat: follow-ups and latency changes no one mocks up. Édouard Wautier’s Dust team prototypes state changes and motion Figma can’t show. Figma PMs ran five user interviews in two days off an AI-built prototype, which is a textbook closed second loop. All three count as prototype work.

Samsonov’s diagnosis is right. His absolute stance is, well, too absolute. AI-generated prototypes haven’t helped anyone only if every one of them is actually a demo, and that’s exactly the assumption his own distinction warns against.

Product Picnic 64 title card over a vintage black-and-white photo of three people eating and drinking outdoors on rocky terrain.

Designers will never have influence without understanding how organizations learn

We confuse prototypes with demos, and validation with confirmation bias. As a result, we cannot lead — instead, we are led.

productpicnic.beehiiv.com iconproductpicnic.beehiiv.com

In my previous item, I linked to a post by Adi Leviim, who made the case against chat as the AI interface default, reading the 2024 wave of GUI retrofits AI labs shipped—Canvas, Artifacts, Projects, Computer Use, Deep Research—as the industry admitting a text box alone wasn’t enough. Matt Webb, writing on Interconnected, wants every service to ship a CLI instead. Both arguments are about text. They look like they contradict. They don’t. Webb’s case for going headless:

It’s pretty clear that apps and services are all going to have to go headless: that is, they will have to provide access and tools for personal AI agents without any of the visual UI that us humans use today. […] Why? Because using personal AIs is a better experience for users than using services directly (honestly); and headless services are quicker and more dependable for the personal AIs than having them click round a GUI with a bot-controlled mouse.

Webb’s CLI sits on the agent-to-service layer. Leviim’s retrofits sit on the human-to-agent layer. The text on one side is a protocol for machines. The text on the other is a user writing out intent in sentences. Both are text, but the role is different. Webb makes the split explicit when he turns to what it means for design:

So from a usability perspective I see front-end as somewhat sacrificial. AI agents will drive straight through it; users will encounter it only once or twice; it will be customised or personalised; all that work on optimising user journeys doesn’t matter any more. But from a vibe perspective, services are not fungible. […] Understanding that a service is for you is 50% an unconscious process - we call it brand - and I look forward to front-end design for apps and services optimising for brand rather than ease of use.

Interesting, right? Webb believes the need for human-facing UI, and with it user journeys, will shrink. He’s designing for an agent-first world.

Webb goes on:

If I were a bank, I would be releasing a hardened CLI tool like yesterday. There is so much to figure out: […] How does adjacency work? My bank gives me a current account in exchange for putting a “hey, get a loan!” button on the app home screen. How do you make offers to an agent?

The agent becomes the surface designers have to figure out.
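
To make the shape of that concrete, here’s a minimal sketch of what a headless command surface might look like, in TypeScript. Everything in it is invented for illustration—the command names, the fields, the fake data—since Webb’s post doesn’t specify an interface. The point is the contract: scoped commands in, structured JSON out, nothing that assumes a human is looking.

```typescript
#!/usr/bin/env node
// Hypothetical "hardened CLI" in Webb's sense: every command returns
// structured JSON on stdout for an agent to parse. All names and data
// here are invented for illustration.

type Result =
  | { ok: true; data: unknown }
  | { ok: false; error: string };

// A scoped read: the kind of call an agent makes instead of clicking
// around a GUI with a bot-controlled mouse.
function balance(account: string): Result {
  // A real tool would call the bank's API with scoped credentials.
  return { ok: true, data: { account, balance: 1234.56, currency: "USD" } };
}

// Webb's adjacency question: how do you make an offer to an agent?
// One possible answer: as data the agent can weigh, not a banner it
// will never see.
function offers(): Result {
  return { ok: true, data: [{ product: "loan", apr: 5.9, preApproved: true }] };
}

const [, , command, ...args] = process.argv;

const result: Result =
  command === "balance" ? balance(args[0] ?? "checking")
  : command === "offers" ? offers()
  : { ok: false, error: `unknown command: ${command ?? "(none)"}` };

// One JSON object on stdout, a meaningful exit code: that's the whole UI.
console.log(JSON.stringify(result));
process.exitCode = result.ok ? 0 : 1;
```

Invoked as, say, `bank balance checking`, it prints a single JSON object and exits. The agent never sees a screen, and the loan offer arrives as a field it can reason about rather than a button it has to find.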

Abstract illustration of tangled white curved lines forming loose oval shapes against a soft green background with muted circular shadows.

Headless everything for personal AI

It’s pretty clear that apps and services are all going to have to go *headless:* that is, they will have to provide access and tools for personal AI agents without any of the visual UI that us humans use today.

interconnected.org iconinterconnected.org

Every major AI lab spent 2024 bolting GUI surfaces onto chat: Canvas, Artifacts, Projects, Computer Use, Deep Research. That’s seven retrofits across three AI firms in twelve months. Adi Leviim, writing for UX Collective, reads that wave as the industry conceding in public what designers have been saying since Amelia Wattenberger’s 2023 essay on why chatbots aren’t the future of interfaces. His setup for why the default took hold:

Open any AI product launched in the last three years. Ignore the model, the logo, the branding. You will find the same interface: a text input at the bottom of the screen, a send button, and a scrollback of alternating messages. This is not a random convergence. It is the interface that fell out of what large language models could do on day one: pattern-match on text. In 2022 we had a new capability and no time to design around it, so we shipped what was fastest to build and called it conversational AI. Three years later, the fastest thing to build has become the thing everyone builds. That is how defaults calcify.

Wattenberger’s essay ran in 2023; the retrofit wave followed within about a year. Leviim counts the retrofits as evidence the rectangle was always going to need help:

Calling this progress is charitable. It is the industry discovering, retrofit by retrofit, that a text box alone cannot hold a meaningful creative surface. You cannot edit a thousand-line document by asking the bot to re-output it with “line 312 changed to X”. You cannot iterate on a design by describing it. You cannot plan a research project without seeing the plan. The moment the task has a structured output, the chat box becomes the wrong place to work, and the vendors put a canvas, a side panel, an editor, a workspace, or a planner next to it.

“Retrofit by retrofit” is the phrase that carries his argument. Each retrofit is a clickable, scrollable, draggable pattern the chat box had removed. The AI labs are rebuilding what 2015-era UI already had.

Leviim continues, separating intent from chat:

Expressing intent does not require prose. A date picker expresses temporal intent more precisely than any sentence. A pair of sliders expresses a tradeoff more legibly than a paragraph. A file upload expresses “work on this thing” without ambiguity. Every one of these is intent-based. None of them is chat. The chat box is one possible implementation of the paradigm, and by all accessible evidence it is a low-resolution one.

Jakob Nielsen’s 2023 essay, “AI: First New UI Paradigm in 60 Years,” treated chat as the way to express intent. Leviim agrees intent-based interaction is the shift. He argues chat is the wrong way to express it. Date pickers, sliders, file uploads are all intent surfaces, and none of them is chat. Which is where the design work goes next:

the good AI UX work of the next three years will be distributed across a thousand of those scoped surfaces rather than concentrated in one generalized text field.

That’s the brief for anyone designing AI products.
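
A small sketch makes the resolution difference visible. The names here are mine, not Leviim’s: the same trip-planning intent expressed once as a typed structure, where each field maps to a scoped control, and once as the prose a chat box would receive.

```typescript
// Hypothetical illustration of Leviim's point: the same intent,
// expressed as a typed structure versus as prose. Names invented here.

// A scoped surface: every field corresponds to a concrete control.
interface TripIntent {
  destination: string;    // autocomplete text field
  departure: Date;        // date picker: precise temporal intent
  maxBudgetUsd: number;   // slider: a tradeoff made legible
  flexibleDates: boolean; // checkbox: a yes/no made explicit
}

const structured: TripIntent = {
  destination: "Lisbon",
  departure: new Date("2026-03-14"),
  maxBudgetUsd: 900,
  flexibleDates: true,
};

// The chat-box version of the same intent: one low-resolution string
// the system has to parse, with every ambiguity left in place.
const prose =
  "I want to go to Lisbon sometime around mid-March, ideally under " +
  "$900ish, though I'm a bit flexible on dates.";

// The structured version is checked before it ever reaches a model;
// the prose version is checked nowhere.
console.log(structured, prose);
```

The point isn’t that forms beat chat everywhere. It’s that once the intent has structure, a scoped control captures it at higher resolution than a sentence can.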

Side-by-side comparison of a Structured UI with a dropdown, date picker, checkboxes, and range slider versus a minimal AI Chat Interface with a text input and Send button.

The chat box isn’t a UI paradigm. It’s what shipped.

Before LLMs we had direct manipulation, structured forms, and progressive disclosure. Then we collapsed all of it into a text box.

uxdesign.cc iconuxdesign.cc

Showing stakeholders prototypes is often a high-wire act. Back in the old days, that’s why we showed wireframes before high-fidelity comps or mockups. But now, with tools like Lovable or even Claude Design, a prototype that demos really well is easy to mistake for a shippable product. The stakeholder in the room could easily say “ship it.”

That used to be where the Figma-to-code handoff became visible. Now it’s invisible. Greg Kozakiewicz, writing on LinkedIn, wants designers to see it again. He updates an old construction-industry line for the AI era:

We used to confuse the drawing with the building. Now we confuse the prototype with the product. A working prototype also accepts everything. It will let you register, log in, fill out a form, submit something. It all works. In the demo. On a good laptop. With a fast connection. With someone who knows what they’re doing and what the app is supposed to do.

The design-to-code gap didn’t vanish when AI made prototypes interactive. It went underground. Now it shows up as a stakeholder saying “looks great, let’s ship it” to something that couldn’t survive real data or production constraints. Kozakiewicz puts a number on it:

AI gets you to about 60%. A solid, reasonable, generic 60%. The layout makes sense. The flow is logical. The copy is clear enough. It looks like a product that works. And for a lot of people, especially people making decisions about budgets and timelines, 60% looks like 90%. Because the last time they saw a prototype, it was a static Figma file with “Lorem ipsum” everywhere.

A hand lifts a modular glass block from a detailed architectural scale model, revealing illuminated interior floors with tiny figurines inside.

Paper accepts everything. So does a prototype.

There’s an old saying in construction. Paper will accept everything. You can draw anything on paper. A swimming pool on the roof. A spiral staircase made of glass. A cantilever that defies physics. Paper doesn’t argue. Paper doesn’t say “this won’t hold.” Paper just sits there, looking beautiful, full of promise.

linkedin.com iconlinkedin.com

Pointillist-style painting of a formally dressed figure in a black top hat holding a glowing green laptop, surrounded by a crowd of early 20th-century people.

A Sunday Afternoon with Claude Design

It’s really hard to get momentum on a side project when you have a full-time job with lots of travel, an active blog, and a newsletter. But I had to recapture that momentum because this side project is important. It’s a preschool website for my cousin.

Walking into My Little Learning Tree is like stepping into pure warmth. Yes, yes, preschools are inherently fun environments, but the kids and the teachers there create a visceral energy that is simply special. I wanted to capture that specialness in a long-overdue website redesign project.

Looking at my in-progress design, something felt off. I had these long horizontal lines preceding the eyebrows—the small text above a heading that names the section—that didn’t feel right. First, they were straight. Second, the lines only occurred before the text, not also after. I clicked on the Comment button to enter Comment mode, then clicked on the eyebrow and prompted, “These lines aren’t playful enough. Let’s make them squiggles and have them before and after the eyebrow text.”

And then Claude Design did its thing.