355 posts tagged with “ai”

Nearly nine in ten organizations now use AI in at least one business function. Ninety-four percent aren’t seeing significant value from it. Gale Robins, writing for UX Collective, argues that the gap is a framing problem, not an adoption problem. Her earlier piece on discovery judgment made the same case; the new one sharpens it with an anecdote that shows the trap:

A team I spoke with recently had compressed their discovery cycle from six weeks to ten days using AI. They were proud, and the throughput was real. When I asked what the work had taught them that they did not already believe, the answer was: not much. Same questions, faster. Same answers, sooner.

Same questions, faster. Same answers, sooner. Her analogy for the wider pattern is the electric factory one I’ve used before:

When factories first installed electricity, productivity barely moved. Manufacturers replaced steam engines with electric motors and kept the line-shaft layout. The breakthrough came later, when they redesigned the factory around what electricity made possible. The technology was only part of the answer.

Robins maps McKinsey’s three waves of AI value—productivity, differentiation, transaction-cost reduction—and finds most teams stuck in the first one. Robins on where they have to go to get out:

These decisions are upstream of every artifact a team produces. They are also where AI productivity gains help least, and where human judgment compounds the most.

Robins’s evidence undersells her own thesis. She leans on Generative AI at Work—the Stanford-and-MIT customer-support study by economists Erik Brynjolfsson, Danielle Li, and Lindsey Raymond that became the canonical citation for “AI helps novices most”—to argue AI raises the floor, not the ceiling. Novices gained 34%; experienced workers, basically zero. That’s why so many designers who have never coded—like me—are now suddenly shipping with this newfound superpower. It’s the same finding behind the junior designer crisis. But LinkedIn’s Full Stack Builder rollout found the opposite: top performers adopted AI fastest and got the most out of it, because they had the judgment to know what to ask for. The floor-not-ceiling story is only true where the questions are fixed. Once the questions are the work, the pattern inverts. That’s exactly the territory Robins is mapping. If AI rewards the experienced most when the work is judgment-shaped, framing is where the gap between teams widens.

Cover illustration for Gale Robins's UX Collective essay on discovery as the work AI gives back.

Discovery is the work AI gives back

Nine in ten organizations use AI. Ninety-four percent see no significant value. Gale Robins says the gap isn’t about adoption: teams use AI to do the same work faster instead of asking what’s worth building.

uxdesign.cc

Open four agent windows at once and the day disappears in a way that feels productive but isn’t. David Hoang, writing in Proof of Concept, puts it plainly:

At times, HITL [human-in-the-loop] agent orchestration feels addictive like Candy Crush or scrolling social media. Every prompt shows a stream of tokens and visible progress being made. You sit and wait to hit the number 2 or continue prompting. Instead of doom scrolling, you’re doom building; a sense of productivity which leaves you not doing anything else.

To be abundantly clear, I’m not against HITL and it’s a great way to build. What I’m saying is the massive productivity gains take a toll on you. I’ve shipped real work this way; being locked in for entire afternoons and evenings to prompt sessions. Sometimes I get good outputs and other times I don’t get anything valuable.

The orchestration tax is like the coordination tax at work. I’m feeling like I’m building but really air traffic controlling in parallel. You are reading partial outputs, deciding which to merge, which to discard, which to re-prompt. It’s a job, and an important one, but it’s not the deep work in design, writing, or thinking I need to do. That is a real job. It is not, however, the same job as design or writing or thinking. It uses a different part of you and it depletes a different reservoir. By the time I sit down to actually draw something or write a paragraph that matters, the reservoir is empty.

I orchestrated my way out of having anything to say.

Hoang’s analogy to coordination tax, the meeting load that eats the day at any tech company, is exact. Watching a token stream and deciding what to keep is real work. It is not the work you sat down to do. Orchestration draws on the same reservoir that making draws on, and you do not feel the withdrawal until the end of the day, when you go to write the paragraph and there is nothing in the tank. Hoang’s tactical answer is to switch defaults: human-in-the-loop for the few things that benefit from your synchronous attention, human-on-the-loop for everything else, with a real review block on the calendar.

The shift is from watching to bracketing. Agents need start conditions and end conditions, not a babysitter in between.

Header illustration for the Escape from agentic loop essay on Proof of Concept.

Escape from agentic loop

David Hoang on the cognitive cost of orchestrating four agents at once: the productivity feels real, but it depletes the same reservoir you need for design, writing, and thinking. He calls it the orchestration tax.

proofofconcept.pub

I met Jennifer Jerde early in my career. She’s one of the nicest humans I’ve ever encountered in this business, and her firm Elixir Design has been quietly building brands out of San Francisco for 27 years.

Rachel Paese, writing for Design Observer, gets Jerde on the question the rest of the industry is trying to dodge:

We’re in a world right now where everything kind of looks the same. You can go to Canva, you can get Squarespace templates, and you basically just change out the messaging. And it looks pretty handsome. It’s a design of an ad. It’s a design of a website. It’s where the design is a noun. It looks pretty nice, but to make stuff that’s actually true, it looks like listening to a lot of different people.

The thing you ship is the artifact, and the artifact can be produced by anyone with a template and a logo file. What Jerde sells is the listening—27 years of it—and the specific verb that template work can’t perform.

She gives the verb a definition later in the piece:

We do have an observation practice. I would never refer to it that way. We listen like crazy. We’re observing what lives inside clients’ minds and hearts about things. And then we don’t just show them 3 directions, we show them fifteen. Then we observe what happens.

The difference between three directions and fifteen is the difference between a studio and a feed. Three directions is what you present when you already know which one you want the client to pick. Fifteen is what you present when you genuinely don’t know yet, and you’re willing to use the meeting as the instrument that tells you. That’s expensive and it doesn’t scale. It’s the entire reason Elixir has been doing this for 27 years and not 27 months.

Jerde on AI:

I’m both deeply impressed by AI and scared of it. But I feel like companies and brands get exactly what they deserve. They hang out there exactly who they are, in a way. […] And then there are the players who are like, no, no, we really want to reach this audience. We want to show how genuinely great we are in whatever way we are genuinely great. That takes effort.

There’s a caveat in there: the migration to AI is optional. You can opt out. The template will work for the audience that doesn’t require you to have listened. It’s good enough.

But the interesting brands are the ones that don’t accept good enough, that demand that extra effort.

Jennifer Jerde and the Elixir Design team gathered around a chocolate anniversary cake, all wearing black ELIXIR sweatshirts.

Elixir Design founder Jennifer Jerde believes in the human touch

Jennifer Jerde’s San Francisco branding agency Elixir has spent 27 years centered on a single verb: listening. The firm shows fifteen directions instead of three, treating real feedback as the instrument that tells them which one is true.

designobserver.com

Brandon Harwood opens with Picasso’s Guernica. He asks you to look at the painting, then tells you the story behind it—the bombing of the Basque town, the civilian deaths, Picasso’s intention to communicate that horror—and asks you to look again.

If you didn’t know the story of this painting beforehand, now you do, and it might strike a different chord, if just slightly. The details of the painting now have the context that shows us what Picasso was thinking when he painted Guernica. […] It’s this kind of context that drives meaning in art. Guernica is not just a painting. It’s communication.

Harwood uses it to draw a line between what AI can generate (the aesthetics of a thing) and what humans build (the context that makes a thing communicate). His answer: instead of asking AI to make meaning, design around the fact that it can’t.

Meaning Machines are, at their core, “signifiers, randomized into a fixed grammar, and read for new meaning.” […] The randomized signifiers are the contextual data surrounding our creative pursuit, the data the AI is trained on, and the relationships built on that data through its training. These signifiers, the data, are then placed into a fixed grammar through agentive interaction and/or agentic actions, and the user can then interpret the result to stimulate their creativity, build new meaning, or explore ideas they might not have considered before.

Tarot doesn’t know what your week looks like. Oblique Strategies doesn’t know what song you’re stuck on. The cards work because they hand you raw material and you do the interpretation. Harwood’s claim is that an LLM, used right, can sit in that same chair. Provoke the human. Dr. Maya Ackerman calls this same arrangement “humble creative machines”: the AI is not the creator, it’s the prompt the creator responds to.

Harwood breaks co-creative AI into three roles:

The Puller: The AI system gathers information about the context the user is working in through active question generation and passive information collection on the works. […] The Pusher: The AI system uses some/none of this context to synthesize considerations for the user to employ throughout their creative journey. […] The Producer: The AI system creates artifacts for use as elements of the users’ larger creative output.

The Puller / Pusher / Producer vocabulary is what I wish more design teams had before they shipped their first AI feature. Each role is a constraint, a way to keep the human in the chair the work actually belongs in. Most AI tools for creatives flatten all three into one button that produces a finished thing. Harwood’s whole argument is that the finished thing is where the meaning has to originate; it can’t be the destination.

Pablo Picasso's *Guernica*, the black-and-white anti-war mural depicting a bull, a screaming horse, a fallen warrior, and figures in anguish.

Collected consciousness

Brandon Harwood opens with Guernica and argues that AI cannot carry meaning or intention—but constrained to three supporting roles (Puller, Pusher, Producer), it functions as a ‘meaning machine’ that amplifies creative judgment instead of replacing it.

doc.cc

Peter Yang spent the last few months running OpenClaw, Hermes, Claude Code, Codex, and Gemini through ten capabilities he thinks a personal AI agent needs to handle. The headline is in his subtitle: nobody has won yet.

Yang on OpenClaw, an open-source personal-agent platform:

I estimate that 10% of my time with OpenClaw is spent fixing it instead of using it. Examples: It forgot it had access to edit Google Docs. It randomly started using a robot voice instead of the one I like. It breaks half the time after every update.

He has since switched to Hermes (a newer personal-agent platform from Nous Research):

If OpenClaw’s maintenance tax is wearing you down, give Hermes a try. A week in, it’s been more reliable for me.

Yang’s full comparison of Claude Code, Codex, and Gemini—plus the stack he ends up running—is in the post. His advice for the rest of us:

Pick one or two agents that work for you based on the pros and cons above and just commit.

His promise to anyone who picks one and stays:

Once you have an agent that’s available 24/7 and can actually get work done for you, you’ll never go back to a regular AI chat interface again.

Promotional hero illustration for an article comparing OpenClaw, Hermes, Claude Code, Codex, and Gemini as personal AI agents.

The Race to Build a Personal AI Agent (And Why Nobody Has Won Yet)

Everyone wants to build an AI chief of staff. Here’s my honest take on the pros and cons of OpenClaw, Hermes, Claude Code, Codex, and Gemini.

creatoreconomy.so

Luke Wroblewski shared his notes from the Design Futures Assembly, a gathering of about a hundred senior designers and leaders from AI labs, big tech, and startups in San Francisco:

When everyone can ship, you get a different kind of problem. One design leader described it perfectly: they let everyone build and push whatever they wanted. And you could feel it in the product, because nothing made sense together.

This is the part of the AI-in-design story that the toolkit numbers obscure. Wroblewski reports roughly half of designers had shipped AI-generated code to production this year, and that the typical designer’s toolkit had doubled in size over twelve months. Those are real numbers. But once production stops being the bottleneck, the bottleneck moves. A single word surfaced repeatedly:

Several people at the assembly used the word “editorial” to describe where design leadership is heading. Less about making the thing, more about deciding what gets made and ensuring it all holds together. The skill of saying no is becoming one of the most important skills in the profession.

The “saying no” line echoes something Chad Johnson wrote a few weeks back: the designers who shape direction “learn to say no with evidence and to disagree without drama.” The Assembly’s framing makes that posture mandatory at a portfolio level, not just on individual features. One tool company founder, Wroblewski notes, preferred “coherence”: the sense that a product came from one shared point of view. I like that word better too. Coherence describes the thing the user actually feels.

Design Futures Assembly event header image from Luke Wroblewski's notes on the San Francisco gathering.

Design Futures Assembly

Half of designers ship AI-generated code to production. Wroblewski’s notes from the Design Futures Assembly land on a new role: editorial leadership.

lukew.com

Taras Bakusevych closes his walkthrough of ten dying UI patterns on the heuristic that matters:

Execution UI: Interfaces that help humans perform deterministic work — entering data, configuring rules, following process steps, executing repetitive operations. 🟠 Shrinking. As AI automates execution, these surfaces lose their reason to exist.

Judgment UI: Interfaces that help humans evaluate, guide, and correct work done by machines — reviewing outputs, verifying changes, understanding reasoning, intervening at exceptions. 🟢 Growing. As AI takes on more autonomous work, humans need better surfaces to supervise it.

The supervision problem is what Jakob Nielsen called evaluability—the new central UX metric—and Bakusevych is doing the screen-by-screen translation. Every pattern in his list gets re-examined under one question: is this surface helping the human do the work, or helping the human check the work?

The HubSpot quote flow makes the friction concrete:

Creating a single sales quote in HubSpot requires navigating seven sequential screens. The rep manually selects the contact, adds company details, configures line items, chooses signature options, sets payment terms, picks a template, and previews the result — before a single quote reaches the buyer. Each step assumes the system doesn’t know information it already has in the CRM.

Bakusevych’s replacement gives the rep a different role: review what Shopify Sidekick assembled, correct what’s wrong, ship.

That’s the test he leaves you with. Open one screen in your product and ask which job it’s doing. If it’s interrogating the user for context the system could have inferred, it’s on the shrinking side.

Grid of UI pattern cards with a recycling icon at the center, illustrating ten interfaces being remade by AI.

10 UI Patterns That Won’t Survive the AI Shift

Taras Bakusevych walks through ten UI patterns under pressure from AI and lands on the one heuristic worth keeping: execution UI shrinks, judgment UI grows.

syntaxstream.substack.com

Thariq Shehzad, on the Claude Code team at Anthropic, has switched from markdown to HTML as his default agent output format. The reasoning is more honest than a format-war argument would suggest, because it’s about what humans will actually read. He opens by acknowledging what markdown was for:

Markdown has become the dominant file format used by agents to communicate with us. It’s simple, portable, has some rich text capability and is easy for you to edit. Claude has even gotten surprisingly good at using ASCII to make diagrams inside of markdown files. But as agents have become more and more powerful, I have felt that markdown has become a restricting format.

Then the pivot:

As Claude is able to do more complex work, it is also writing larger and larger specs and plans. In practice, I’ve found I tend to not actually read more than a 100-line markdown file, and I certainly am not able to get anyone else in my organization to read it. But HTML documents are much easier to read, Claude can organize the structure visually to be ideal to navigate with tabs, illustrations, links, etc.

When the spec gets long enough that you stop reading it, you’ve quietly moved from review to rubber-stamp. Shehzad’s answer isn’t to ask Claude for shorter specs. It’s to make the artifact something a human will actually open, scroll, and share. A controllable, shareable artifact is most of what made personal computing legible in the first place; HTML is the format that already does it.

He puts the trade-off honestly when the obvious objection comes up:

While markdown often uses fewer tokens, I’ve found that the added expressiveness of HTML and the much higher likelihood of me reading it means I get overall better output. With the 1MM context window in Opus 4.7, the increased token usage is not really noticeable in the context window.

And the close is the real argument:

The real reason I use HTML is that I feel much more in the loop with Claude. I had begun to fear that because I had stopped reading plans in depth I would simply have to leave Claude to make its choices. But I am happy to say instead that I feel more in the loop than ever before when using HTML.

Header image accompanying Thariq Shehzad's post on switching from markdown to HTML for Claude Code agent outputs.

Using Claude Code: The Unreasonable Effectiveness of HTML

Thariq Shehzad on Anthropic’s Claude Code team switched his agent output from markdown to HTML — because what keeps Claude honest is what humans actually read.

x.com

Owen Williams, a design manager at Stripe, sat down with Claire Vo on How I AI to walk through Protodash, the internal prototyping tool he has spent the last eighteen months building. What sticks is what Protodash has done to the handoff. Williams, describing the Radar fraud-detection team:

They literally have a pull request of a prototype that I had I see an engineer working on and I’m like this has never happened ever in my career as a design manager. They’re like “I’ll just use the prototype as the source of truth” and they can just take it and do that. There’s a huge change — not having to red line a Photoshop file or all of that stuff.

That’s the part that matters. The prototype is the code, in the same components, ready to be picked up. Protodash gets there by constraining generation: a bundle of Cursor rules, a router and chrome scaffold, and Stripe’s design system (Sail) exposed via an MCP server. The off-the-shelf tools—v0, Cursor by itself, Claude Design—produce what Williams calls “blurple slop” because they hallucinate components. Wire the generator to the actual system and the output stops looking like a Tailwind demo and starts looking like Stripe.
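One way to picture "wire the generator to the actual system" is a check that flags any generated markup using components the design system does not define. What follows is a minimal Python sketch under loud assumptions: the allowlist, the component names, and the `unknown_components` helper are all hypothetical, and nothing here describes how Protodash, Sail, or the MCP server works. It only illustrates the idea of holding output to known components.

```python
import re

# Hypothetical allowlist standing in for a design system's exported components.
KNOWN_COMPONENTS = {"Button", "Card", "TextField", "Table"}

# Matches opening JSX tags that start with a capital letter, e.g. <Button or <Card.
# Closing tags (</Card>) are skipped because "/" follows the "<".
COMPONENT_TAG = re.compile(r"<([A-Z][A-Za-z0-9]*)")


def unknown_components(jsx_source: str, known: set[str] = KNOWN_COMPONENTS) -> list[str]:
    """Return component names used in the source that the allowlist lacks, sorted."""
    used = set(COMPONENT_TAG.findall(jsx_source))
    return sorted(used - known)


# Made-up generator output: three real components and one invented one.
generated = """
<Card>
  <TextField label="Amount" />
  <GradientHero title="Welcome" />
  <Button>Pay</Button>
</Card>
"""

print(unknown_components(generated))  # ['GradientHero']
```

A non-empty result is the hallucinated-component case: the generator reached for something that looks plausible but does not exist in the system.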

The fidelity jump changes the room, too:

It’s sort of been this very transformative thing because all of a sudden I’m sitting in these design reviews and it’s so convincing that I’m like, is this the real product or am I looking at something fake?

This is what Tara Tan predicted: the moat in AI design tooling is the design-system graph, and whoever makes that graph machine-readable for agents wins the enterprise. Stripe just did it, internally, with a homemade stack, meaning it’s really an uphill battle for anyone trying to make a generic tool for this use case.

The interesting thing is who shows up to use it. Williams says Protodash is now used more by PMs than designers; PMs paste a PRD from Google Docs and get back a working flow before designers are pulled in. That tracks with the Figma Make case studies — PM-led prototyping isn’t theoretical anymore.

Williams is clear-eyed about what the tool can’t do:

How can I make sure that the tool knows enough to be dangerous? It gets to 80%. But like that taste, that craft is like, that’s why designers will always exist, in my opinion. Like they know how to elevate the experience. Like this thing knows how to use the components. The components are well designed, but it’s not going to be perfect. And we are here to steer them.

The internal AI tool that’s transforming how Stripe designs products

How Stripe’s internal AI prototyping tool, Protodash, ties generation to the design system and turns the design-to-engineering handoff into a pull request.

youtube.com

Nathan Beck, a product designer in Amsterdam, opens his essay with the title “The death of design” and an immediate retraction: “LOL only jk design still alive.” Then he spends a few thousand words on why, walking through what AI tools actually do to a working designer’s day and what they conspicuously do not do.

The pivot quote is buried two-thirds in:

If you call yourself a designer and—be honest with yourself—the bulk of your role has been the production of flat pictures of user interfaces, then I’m sorry to break it to you, but you are not designing. You are styling.

That line is the whole post compressed. Beck is not arguing that AI threatens designers. He is arguing that AI threatens styling, and that a lot of people who call themselves designers have been styling for a decade and are now discovering that the part of the job AI is good at was the part they were doing.

What’s left over, in Beck’s telling, is the reflective work: the thing that happens during design, not in the final file. He quotes Kaari Saarinen to make the point that output isn’t design:

In the same way that one writes in order to understand what one is writing, one designs in order to understand what one is designing. As Kaari Saarinen explains, “Working visually keeps me close to the problem and is slow enough [sic] gives me time to think while I work. Moving things around, testing relationships, and refining structure is not separate from the thinking. It is part of how clarity emerges.”

This is the part the “designers are cooked” discourse misses. The understanding accumulated while making the Figma file was the asset all along. The file was the receipt.

Beck has a second argument running underneath the first: AI output, on its own, is aesthetically average. He quotes Nick Foster’s Dezeen piece on what software feels like after a decade of optimization:

The apps I use to hire plumbers look and feel remarkably similar to those I use to watch skiers do backflips. Every brand feels the same, every function feels the same, every interaction feels optimised, streamlined and joyless. By any measure, these pieces of software are miracles of engineering and triumphs of logic, yet they feel profoundly underwhelming to live with.

A designer who only ever produced flat pictures of those interfaces has been replaceable by a model for a while now. The judgment about which of those generic outputs should ship and which should be thrown out and rebuilt is the part no model has managed yet.

Beck closes:

However, I am cautiously optimistic that as we weather this historical conjuncture, and machine intelligence loses its sparkly aura, and weekend vibe coders increasingly learn how substantial the gap is between a prototype and a product, the role of design, however it is redefined, will be just as essential as it ever was.

That unsexy gap is the whole game. Greg Kozakiewicz updated the old construction line: we used to confuse the drawing with the building; now we confuse the prototype with the product. The demo works on a good laptop with someone who knows what the app is supposed to do. The product has to work for the user who doesn’t. Closing that gap is the orchestration job: defining the thresholds and deciding what the system should refuse to do. It is also the job that remains when the weekend demos lose their shine.

Wireframe sketch of nested boxes connected by lines, from Nathan Beck's essay on AI and design.

The Death of Design

Nathan Beck argues AI expands the designer’s role rather than ending it. Production becomes cheap; thinking, taste, and assumption-checking become the job.

nathanbeck.eu

My recent newsletter, “Out of Your Head, Into the File,” made the case for getting taste out of your head: writing it down so it can survive the messy middle of an AI workflow. Mia Kiraki, writing in Robots Ate My Homework, picks up the other half of that problem: how taste erodes when you don’t.

Her central image is the Hansel and Gretel gingerbread house, recast:

AI output is the gingerbread. You’re tired, the deadline is close. The output IS the shelter. You eat it - of course you eat it. That’s what gingerbread is made for, right? […] The Grimms buried a much smarter lesson in the early pages, before the witch even shows up. Hansel drops breadcrumbs to mark his path through the forest and the birds eat every one. He still leaves them, though. Those breadcrumbs are your taste, every little choice you make on the page (this word, this angle, this risk) which leaves a marker of who you are. The forest will always try to erase them.

What’s specific to AI is the environment. It’s now optimized to wear taste down a percent at a time, in ways you can’t feel while it’s happening, until the work you used to do feels like someone else’s.

Kiraki puts the mechanism plainly:

If most of your reading this week was AI slop (which is pretty likely given the state of the internet), you trained your judgment on machine output. […] Each accepted sentence is a small vote for a lower standard. You accept a vague phrase because the deadline is close. You let a soft claim through because rewriting from scratch would cost an hour you don’t have. […] Taste dies slowly, when you wake up someday and read something you wrote six months ago and realize you used to sound so different. You had edges, took risks, made claims, and you sounded like a person who made choices.

Kiraki gets at the failure mode that follows once taste is the only real moat left.

Her counter:

Read work that operates at a higher standard than yours. Work where someone made choices you wouldn’t have made or took risks you would have edited out. Your taste calibrates upward when you expose it to judgment that outclasses your own. […] Practice the explanation. When something in your own work feels wrong, write down the reason. The specificity of your explanation is the weapon. […] Ship work that makes you nervous. If a piece feels comfortable to publish, you probably didn’t push hard enough. The pieces that make your stomach tighten show and prove your taste is working at full capacity.

The middle one is what that newsletter was about: writing down the reasons, not just the verdict. Kiraki adds the bookends. Read above your level so your baseline isn’t drifting toward consensus. Publish the version that scares you a little, because the version that doesn’t is the gingerbread.

Featured illustration for Mia Kiraki's Substack essay on protecting taste in the AI era.

How to bulletproof your taste in the age of AI

How AI output erodes your editorial judgment, four diagnostic prompts to measure the damage, and the only protection that really, truly works.

open.substack.com

Talking to Peter Yang, Ravi Mehta (former CPO of Tinder, now teaching AI prototyping at Reforge) walks through a live demo of building the same Spotify-style genre page three different ways. The first attempt uses a short functional prompt and produces something that, in Mehta’s words, “kind of feels like AI slop.” The third uses what he calls a full-stack context bundle: a functional spec, a 20-minute Figma wireframe, and a JSON file of real album data pulled together in Claude with an MCP server. The difference is night and day.

His definition of the shift:

Context engineering is designing and building systems that provide an AI model with the right information and tools to accomplish the task. And I think a lot of the common mistake I see with prototyping is people don’t think about context within that 360 degree way. And as a result, people just, you know, write a quick prompt or a quick little mini spec and expect the prototype tool to be able to create something as high fidelity as what they used to create before when they had all of these different artifacts that are a critical part of the product lifecycle.

That definition will sound familiar to anyone who saw Philipp Schmid’s framing of context engineering when it first circulated. Same emphasis on “right information and tools.” It’s the working definition the field has settled on. What Mehta adds is the concrete answer to “okay, what are the three things you actually have to assemble?” Functional context (a spec), visual context (a wireframe), and data context (real structured JSON, not lorem ipsum). Skip any of them and the prototype either looks generic, behaves wrong at edge cases, or breaks suspension of disbelief the moment a real customer touches it.
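The three-part bundle can be sketched as a small data structure that refuses to build a prompt when any context type is missing. This is a rough Python illustration, not Mehta's tooling: the `ContextBundle` class, its field names, the wireframe file name, and the sample album record are all assumptions made up for the example.

```python
import json
from dataclasses import dataclass


@dataclass
class ContextBundle:
    """The three kinds of context from the talk; names here are illustrative."""
    functional_spec: str  # what the page should do
    wireframe_path: str   # path to an exported wireframe image
    data: list            # real structured records, not placeholder text

    def missing(self) -> list[str]:
        """Name whichever context types are empty."""
        gaps = []
        if not self.functional_spec.strip():
            gaps.append("functional")
        if not self.wireframe_path.strip():
            gaps.append("visual")
        if not self.data:
            gaps.append("data")
        return gaps

    def to_prompt(self) -> str:
        """Assemble one prompt; raise if any context type is missing."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"missing context: {', '.join(gaps)}")
        return "\n\n".join([
            "## Functional spec\n" + self.functional_spec.strip(),
            "## Visual reference\nUse the attached wireframe: " + self.wireframe_path,
            "## Data\n" + json.dumps(self.data, indent=2),
        ])


bundle = ContextBundle(
    functional_spec="Genre page listing albums with title and artist.",
    wireframe_path="genre-page-wireframe.png",  # hypothetical file name
    data=[{"title": "Example Album", "artist": "Example Artist"}],
)
print(bundle.to_prompt())
```

Raising on a missing piece is a design choice for the sketch: it turns "skip any of them" from a silent quality drop into an explicit failure.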

The piece I want to underline is his defense of visual thinking, because the “designers are obsolete” takes haven’t stopped, and Mehta gives them a clean rebuttal:

So if you start to think differently about the different types of context that are available, you can actually get much more specific and have a lot more control over what gets built and build something that’s a lot more robust. This is functional context. The next level that is really important is visual context. […] And so here, I very quickly in Figma, just taking 20 minutes, done a wireframe, and sort of outlined what I want this interface to look like. […] The prototype needs to have a level of fidelity that’s hard to get with sort of traditional prompting techniques.

Twenty minutes in Figma, then a short prompt that says “use the attached wireframe.” A wireframe does what a 17-page PRD and three rounds of trying to describe a layout in English to the model can’t. The wireframe is part of the input to the deliverable now.

The corollary cuts the other way too. If the wireframe is now an AI briefing document, the people who can produce a decent one in twenty minutes have a real edge over the people who can’t. That’s still designers, still us. It’s just that the wireframe now feeds the model directly, not only the engineer reading the spec next sprint.

Everything You Need to Know About Context Engineering in 40 Minutes

Ravi Mehta builds the same Spotify-style page three times to show how functional spec, visual wireframe, and real data each level up an AI prototype.

youtube.com

I wrote about this whole family of files in my recent newsletter: DESIGN.md, SKILL.md, SOUL.md, the markdown artifacts you write so an agent can read them. Nick Babich has the practitioner walkthrough for the DESIGN.md flavor of it, specifically the version that Google Stitch reads when it generates a screen. He describes the format directly:

DESIGN.md is a markdown file with two layers: YAML front matter that contains machine-readable design tokens (exact hex values, font properties, spacing scales) and Body that features a human-readable design rationale.

The two-layer split is right. The YAML is the part the agent can’t argue with: primary: "#d97706" is #d97706. The body is where you tell the agent why, and it has to be written like prose, not a config file. Babich’s philosophy section is where I’d point a designer who’s about to write their first one:
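A rough sketch of that two-layer split, for anyone who wants to see it as data. It assumes the front matter is fenced by `---` lines and contains only flat `key: value` pairs; a real DESIGN.md with nested tokens would need a proper YAML parser. The `split_design_md` function and the `radius` token are my own illustration, not part of the format Babich describes or of how Stitch reads the file.

```python
def split_design_md(text: str):
    """Split a DESIGN.md string into (tokens, body).

    Assumes front matter is delimited by '---' lines and holds only flat
    `key: value` pairs. Returns an empty token dict if no front matter is found.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text.strip()
    try:
        # Index of the closing '---' line.
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, text.strip()

    tokens = {}
    for line in lines[1:end]:
        # Skip blank lines, comments, and anything that is not a key: value pair.
        if ":" not in line or line.lstrip().startswith("#"):
            continue
        key, _, value = line.partition(":")
        tokens[key.strip()] = value.strip().strip('"').strip("'")

    body = "\n".join(lines[end + 1:]).strip()
    return tokens, body


# Hypothetical file contents for illustration.
SAMPLE = """---
primary: "#d97706"
radius: 8px
---
Use the primary color sparingly, for the one action that matters most.
"""

tokens, body = split_design_md(SAMPLE)
print(tokens["primary"])
print(body)
```

The tokens come back as exact strings the agent cannot reinterpret, while the body stays as prose for it to read.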

Unlike a traditional specification that often has very specific details that designers should follow when crafting a new design, DESIGN.md is less prescriptive in its nature. It creates a solution foundation for AI tools (colors, typography, corner radius) while providing enough freedom to alter the format for domain-specific needs. Another thing is that DESIGN.md is a living artifact, not a static config file. It should evolve as your design evolves.

The “less prescriptive” line is counterintuitive. You’d think the whole point of feeding rules to an agent is to be more prescriptive, not less. But Babich is right about the shape: pin down the tokens, leave the application loose, refine the file as the agent surfaces edge cases you didn’t think about. These files hold what we used to keep in our heads and call taste, and you don’t write taste like a requirements doc. You write it like a brief, and you keep editing it.

Article header illustration for Nick Babich's UX Planet piece on the DESIGN.md format.

What is DESIGN.md and How To Use It

One of the biggest challenges with AI design generators is producing consistent output. Even with detailed instructions, AI can drift away from the spec.

uxplanet.org

Emil Kowalski, a design engineer at Linear, takes the case for designers who can articulate why a choice works one step further. Once you can explain it, you can hand the rule to an agent.

An engineer has never been more leveraged than today thanks to a fleet of agents. But when it comes to more visual work, like animations, coding agents don’t quite know what great feels like.

My way of getting there is to create a skill file for each aspect of the interface. If you know what great feels like, describe the rules, then give them to your agents so they can follow them.

Kowalski shows two animations side by side, one scaling from scale(0) and one from scale(0.95), and walks the reader from “this feels right” to a real-world reason why:

With enough experience, you can not only tell what feels better, but also why. By then you’ve not only built your taste, but also the ability to articulate it.

The correct animation below feels right, because it animates from a higher initial scale value. It makes the movement feel more gentle, natural, and elegant.

scale(0) on the left feels wrong because it looks like the element comes out of nowhere. A higher initial value resembles the real world more. Just like a balloon, even when deflated it has a visible shape, it never disappears completely.
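Once a rule like this is written down, it can also be checked mechanically. Below is a small sketch of that idea: a function that scans a CSS snippet for `scale(...)` values below a threshold. The `find_abrupt_scales` name and the 0.9 cutoff are my assumptions, not Kowalski's; his piece only contrasts `scale(0)` with `scale(0.95)`. The check is deliberately crude and looks at every single-value `scale()` in the snippet, not only the starting keyframe.

```python
import re

# Matches single-value scale() calls such as scale(0), scale(.95), scale(0.95).
SCALE_PATTERN = re.compile(r"scale\(\s*([0-9]*\.?[0-9]+)\s*\)")


def find_abrupt_scales(css: str, min_scale: float = 0.9) -> list:
    """Return the scale values in `css` that fall below `min_scale`.

    `min_scale` is an illustrative threshold, not a published rule.
    """
    values = [float(match) for match in SCALE_PATTERN.findall(css)]
    return [value for value in values if value < min_scale]


harsh = "@keyframes pop { from { transform: scale(0); } to { transform: scale(1); } }"
gentle = "@keyframes pop { from { transform: scale(0.95); } to { transform: scale(1); } }"

print(find_abrupt_scales(harsh))
print(find_abrupt_scales(gentle))
```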

This is what Ian Guisard at Uber does as a design systems lead: encoding expertise, writing agent skills, defining validation rules, deciding what “correct” means. Nick Babich’s piece on agentic product design covers what makes an agent an agent; Kowalski’s piece shows what an agent actually runs on.

That’s the why. There’s no magic involved. Almost every “taste” decision has a logical reason if you look close enough. This applies to any other discipline really.

Of course the more creative part of the job is still up to you, but the more you can package into a skill, the more leverage you can get out of your agents.

Bold text reading "Agents with Taste" on a white background.

Agents with Taste

How to transfer taste into an AI.

emilkowal.ski

PJ Onori built a tool that A/B tests his design system against AI agents, and he’s careful to say it isn’t impressive:

Two groups of agents get spun up, and both are given the same prompt to make an interface. One group’s given the old design system. The other is given our new one. Each agent provides feedback on problems faced after it’s done. Once all agents finish, the builds are evaluated on a bunch of crap and a report is generated.

The list of what the tool measures is long: timing, lines of code, code variance, fix attempts, components used, accessibility, performance, inline styles, visual diff, token usage, agent feedback. Onori, on the test he ran when he wasn’t sure his documentation was actually doing the work:

I was starting to question if documentation was making things better. Maybe component improvements was doing the heavy lifting–who knows? So, I ran a couple tests without documentation… The documentation was clearly the heavy lifter. […] Documentation is essential for systems that agents don’t have a lot of reps with. I’ve started to add a “For agents” section in the docs. That section is the dumpster for “get it in your silicon head” training.

The “For agents” section is a small idea with a real implication. Documentation has historically been written for one audience. Now there are two, and as Onori says elsewhere in the post, the second one needs “the same damned point” repeated five or six times and doesn’t care if the prose is ugly. His instinct is to wall that off so humans don’t have to read it.

Onori is publishing measurements where most people are publishing takes. That’s the missing piece in the design-system-as-moat argument: somebody actually testing whether agents do better with a well-built system than a worse one, and showing the numbers. Onori, on the closing caution:

There’s a lot of noise in the output, feedback, and analysis–otherwise know as everything. That noise compounds fast. Think of the telephone game–then think about what that’d do to a design system. […] Feedback needs to go through a BS filter. […] The feedback part of the analysis is helpful. Make no mistake. But it needs to heavy interpretation.

The telephone game is the right picture. A design system that updates itself based on agent feedback that’s been generated by other agents and analyzed by a third agent is going to drift somewhere strange in a small number of iterations, and nobody on the team will be able to reconstruct why. Onori’s tool stops short of that on purpose: it produces measurements, and a person reads them.
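For a sense of what "produces measurements, and a person reads them" can look like, here is a minimal sketch that averages a few metrics across two groups of runs and prints the difference. It is not Onori's tool: the `compare_groups` function, the metric keys, and every number below are made up for illustration.

```python
from statistics import mean


def compare_groups(old_runs, new_runs, metrics):
    """Return {metric: (old_mean, new_mean, delta)} for each metric name."""
    report = {}
    for metric in metrics:
        old_avg = mean(run[metric] for run in old_runs)
        new_avg = mean(run[metric] for run in new_runs)
        report[metric] = (old_avg, new_avg, new_avg - old_avg)
    return report


# Made-up numbers for illustration only; these are not Onori's results.
old_runs = [
    {"fix_attempts": 4, "inline_styles": 10},
    {"fix_attempts": 6, "inline_styles": 14},
]
new_runs = [
    {"fix_attempts": 2, "inline_styles": 3},
    {"fix_attempts": 2, "inline_styles": 5},
]

report = compare_groups(old_runs, new_runs, ["fix_attempts", "inline_styles"])
for metric, (old_avg, new_avg, delta) in report.items():
    print(f"{metric}: old={old_avg} new={new_avg} delta={delta:+}")
```

Note what the sketch leaves out on purpose: nothing here feeds the report back into the design system. The output is a table for a person to interpret.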

Stippled illustration of a person sitting at a desk, leaning forward and writing or working on something.

Testing agents on design systems

It’s really easy to say agents are able to use a design system. It’s another thing to prove it.

pjonori.blog

Marcus Moretti’s guide to agent-native product management, in Every, is the orchestration shift showing up on the PM side of the team. The guide opens with the 1930s Procter & Gamble origin story: someone owns the product. The job has been rewritten so many times since then that PMs are now expected to be design partners, diplomats, salespeople, and statisticians on top of running the 100+ software subscriptions the average company buys. What’s interesting is that the piece is describing the old role, finally legible again now that agents can absorb the administrative debt that piled up on top of it.

Now, much of the interdisciplinary work that goes into product management can be done by an LLM in minutes, sometimes seconds. What used to be a three-hour-long analytics investigation is now a simple back-and-forth with Claude. A product review that used to be a fortnightly chore emerges from a single typo-ridden chat message. This has been my recent experience, at least. I no longer struggle with semicolons in SQL queries or even write tickets. All of my product management work happens in conversation with, in my case, Claude Code. The conversation is the work.

“The conversation is the work” sounds like a description of the new job. Read it next to the 1930s origin story and it’s a description of the old one. The Brand Man at P&G wasn’t writing SQL; he was deciding what the product should be and who it was for. The intervening ninety years of accumulated tooling—agile ceremonies and ticket hygiene, analytics dashboards on top of those—was friction PMs had to push through to get back to the actual work. Moretti’s /ce-strategy command, modeled on Richard Rumelt’s Good Strategy Bad Strategy, isn’t a new artifact either. Strategy documents predate LLMs by decades. What’s new, Moretti says, is the cadence: every few months, the agent re-runs the strategy interview with the accumulated context of everything you’ve shipped.

Writing a strategy document cold is hard. The best way to do it, I’ve found, is to have an agent interview you. The ce-strategy skill does this. It runs through the sections in order and has built-in guidance about what makes a good answer (and what kinds of answers to push back on). […] The interview is deliberately conversational. If the first answer to, “What’s the core problem this product solves” is vague, the agent drills down: “Whose situation specifically? What do they try today, and why doesn’t it work?” The guidance here is taken from personal experience and from the Rumelt book.

The guide assumes a PM who has the taste to recognize when the agent’s follow-up has exposed a gap. The ones who don’t will end up with a strategy.md full of confident-sounding nonsense, generated quickly and reviewed lightly. Agent-native PM removes the alibi that you were too busy with tickets to do the actual thinking. That maps to a warning from Raj Nandan Sharma: when generation gets cheap, the scarce skill is refusal: knowing what to throw out and why. Moretti’s PM is doing exactly that, sentence by sentence, in the strategy interview.

Moretti closes:

LLMs have allowed our tools to catch up with the multifaceted duties of product managers. For me, product management has been reduced to the interesting parts: dreaming up features, thinking through designs, looking at interesting data, and talking to users. We all feel the economic imperative to embrace AI tools, but the better reason, I think, is to make work more fun.

Hand-drawn letter "G" in black chalk-style script on a light blue background, with a black bookmark icon in the top-left corner.

A Guide to Agent-native Product Management

A step-by-step guide to using agentic capabilities for better product management

every.to

Nick Babich on agents in UX Planet. A useful pair to his earlier writeup on Claude skills, since the two words get used interchangeably and they are not the same thing. Babich opens with the plain-language version:

Think of an AI agent as a program you run when you need to solve a particular problem in design. For example, you can create an AI agent that helps you with usability testing, code review, UI/UX audit, etc.

A program you run is the right mental model. A skill, the way Babich described it in his earlier piece, is a recipe: a markdown file Claude reaches for when a task matches. An agent is what runs once Claude has the recipe in hand. It carries state across steps, picks tools, reports back.

Babich’s four attributes of a well-designed agent get at that distinction without saying it out loud:

  1. Good clarity (intent alignment). A strong agent understands what success looks like, not just the task. This understanding helps it translate vague prompts into clear objectives.
  2. Context awareness. Good agents maintain and use context effectively. Not only do they remember previous steps, constraints, and user preferences (which is well-expected behavior nowadays), but they also adapt output based on the environment (tools, data, stage of workflow).
  3. Tool orchestration. Agents can perform the workflow autonomously and they have the ability to use the right tools for a task at hand is what makes an agent so powerful. Well-crafted agents can chain tools together into workflows, and they don’t overuse tools when simple reasoning is enough.
  4. Explainability (transparent reasoning). When you interact with an AI agent, you need to understand why something happened. Thus, an AI agent should provide a rationale behind decisions surface assumptions, and trade-offs.

Context awareness and tool orchestration are what separate an agent from a prompt template. A skill can ship intent alignment and explainability in plain markdown, but state across steps and the ability to chain tools require a runtime. That’s why Babich’s specs include Boundaries sections and “When Not To Use It” blocks: a stateful, tool-using program needs guardrails that a one-shot prompt does not.
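A toy version of that distinction, as a sketch. The `Agent` class below keeps a history across steps, chains the output of one tool into the next, and refuses any tool that is not on its list, which is the smallest possible "boundary." Everything here is hypothetical: in a real agent the model chooses the tools, whereas this sketch takes a fixed plan so that it stays runnable without any model or API.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical minimal agent: allowed tools plus state carried across steps."""
    tools: dict                                   # tool name -> function(str) -> str
    history: list = field(default_factory=list)   # (tool name, result) per step

    def run(self, plan):
        """Run (tool_name, input) steps; an empty input reuses the previous result."""
        previous = ""
        for tool_name, tool_input in plan:
            if tool_name not in self.tools:
                # Boundary: anything outside the allowed tools is refused and logged.
                self.history.append((tool_name, "refused: unknown tool"))
                continue
            result = self.tools[tool_name](tool_input or previous)
            self.history.append((tool_name, result))
            previous = result
        return previous


agent = Agent(tools={
    "upper": lambda text: text.upper(),
    "exclaim": lambda text: text + "!",
})

final = agent.run([
    ("upper", "users want faster onboarding"),
    ("delete_files", ""),   # not an allowed tool, so it is refused
    ("exclaim", ""),        # chained: uses the output of "upper"
])
print(final)
print(agent.history)
```

A markdown skill could describe all of this, but the history and the refusal only exist because something is running.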

If you haven’t built one yet, his five specs—Research Synthesizer, Competitor Intelligence, Problem Definition, Idea Generation, UX Flow Designer—are a clean starter pack. Pick the one closest to a workflow you already do by hand, and notice how much of the spec is about what the agent will not do.

3D illustration of an orange robot head with a maze inside its open skull, glowing circuit lines extending outward to orange cube nodes.

Agentic Product Design

5 design tasks you can automate with AI today

uxplanet.org

Tommy Geoco’s $13,100 OpenClaw harness, ninety days in, is one way to build a personal AI agent. Anton Sten went the other way. He tried OpenClaw and Hermes, found the setup was “days, sometimes weeks, for minutes of return,” and built something smaller. Five Claude Code instances on a Mac mini, named after Suits characters, each handling one role. Architecture is a shared repo and a pile of markdown files. That’s it. Most AI-agent posts pitch what Sten calls “a team of bots that runs your business while you sleep.” His basement firm is the inversion.

Sten on what he actually wanted from his agents:

What I actually wanted was smaller. A handful of tools, each with a narrow job, that I could build in an afternoon and shape around how I actually work. So that’s what I did.

The names of his AI agents are from the show Suits (with Wendy borrowed from Billions), picked so the show’s personalities double as memory aids for each agent’s job. Harvey handles contracts and pricing. Donna takes Harvey’s notes and drafts the emails and follow-ups. Mike stores what Sten would otherwise forget. Louis worries about money. Wendy reads the others’ logs and points out where they’re slipping.

Sten on the autonomous-revenue pitch:

The team in my basement isn’t running anything autonomously. They don’t make decisions for me. If I unplugged the Mac mini tomorrow, my business would keep running. The conflation in the current AI conversation — between playing and building a thing that prints money — is the part I find a bit tiring. They’re treated as the same activity, when they’re almost opposites.

Sten’s right that the autonomous-revenue pitch is a fantasy. Less right on the binary that follows. Geoco’s harness is doing meeting prep, ingesting his survey research, and distributing his content across ten platforms while he sleeps. That counts as “running while you sleep,” and his $50,000 in sponsorship revenue from one survey project isn’t trivial. Play and revenue can sit on the same side. What matters is whether the human stays in the loop. Geoco does, and so does Sten.

The shape of what they’re building is also the same. The Harvey-to-Donna handoff Sten uses most and Geoco’s survey-prep loop are both the specialization-is-the-whole-game pattern: narrow specialists, human in the loop, work compounding into the system. Sten calls it play and Geoco calls it work. The architecture underneath does the same job either way.

Sten on practice:

I’d argue this is the business case for designers right now. Not the agents specifically — the playing. Because in a year or two, every job worth having is going to assume you understand how these tools work, and the only way to understand them is to spend time in them when nothing’s on the line.

The people who’ll do interesting work with this stuff in two years are the ones playing with it badly today.

Geoco is what Sten’s last sentence predicts. The person playing badly today is the person doing interesting work in two years. Sten describes that person as hypothetical. Geoco isn’t.

The basement firm

There’s a Mac mini in my basement running a small consulting firm. Five employees, all named after TV characters, none of them human. They take notes, write drafts, remember things I’ve forgotten, argue with my financial instincts, and occasionally tell each other to do better.

antonsten.com

Tommy Geoco spent ninety days and $13,100 tinkering with OpenClaw. His agent runs his capture loop, prepares his meetings, codes the survey for the state-of-prototyping report his studio shipped, and distributes his content across ten platforms. Geoco describes the harness like this:

When you install OpenClaw, it is like a starter kit project car. It is a car frame with a swappable engine. The engine being any AI model you choose to use. It is basically a folder that you install onto your computer that contains about seven markdown files. […] When you stop thinking of a custom agent as just a chatbot and start thinking of it like an operating system, some useful questions are going to start to pop up like where does the memory live? What is the source of truth? How do I enforce my rules better? What should stay manual?

The seven files are plain text. soul.md holds the agent’s voice and judgment, agents.md defines permissions, memory.md handles long-term recall, and four others cover identity, the user, tool instructions, and a heartbeat. Geoco layers an Obsidian vault on top as long-term knowledge and Slack as the chat surface. Geoco on what actually limits an agent:

The agent’s limitations aren’t just about the model. They’re a lot more about the system that you have built around it because you can’t control the quality of the model, but you can control the quality of the system. […] The most important part of my setup is the knowledge vault. This is my alternate memory, and it is built around the work that I actually do.

Geoco says curation is what keeps the whole thing from drifting. The agent runs the loops on top of a vault Geoco curates, and the taste lives with him; the model itself is interchangeable. The challenging part is somewhere else entirely:

The most challenging part of this whole thing is the unlearning. Many of us have old habits that have calcified into our brain. It is why my 17-year-old is able to run laps around us. He has no baggage about how things are supposed to work.

Geoco is right that the unlearning is where the difficulty lives. The harness is just markdown and the model is rented; the orchestration skill Benhur Senabathi described as what designers actually picked up in 2025 is what you practice through the unlearning. Geoco closes the video by saying nobody’s harness is right and everybody’s works for them, which sounds about right to me too.

How I Built an AI Agent That Designs Like Me

This is a practical breakdown of what an OpenClaw agent is, and how I use it for my design and media studio.

youtube.com

George Anders, in the Wall Street Journal, makes the case that the 1920s offer a usable template for the AI decade. His strongest evidence is the spillover-jobs data:

By 1930, more than 80,000 people were working as electricians, a profession that hardly existed a decade before. Census data also showed that 168,000 people were working in rubber factories, most of them making tires to accommodate Detroit’s booming production of cars, trucks and buses. Another 450,000 people were building roads, bridges and other structures needed by the ever expanding auto industry.

The ATM parable had the same problem: the version that ends in 2010, with bank-teller employment intact, is the one we love to retell. The version that ends in 2022, with teller jobs cut in half by the iPhone, is the one we leave out. Anders’s 80,000 electricians are real. So is the question of which of them got displaced when the next technology arrived.

Anders does, to his credit, take the costs seriously. He spends a section on the radio fight:

In 1927, H.G. Wells, the British author and intellectual, called radio “inferior” entertainment that should be listened to “only by the sick, the lonely and the suffering.” David Sarnoff, general manager of Radio Corp. of America, shot back that he was trying to improve “the happiness of the nation” by delivering popular music to millions of people. Nearly a century later, that same argument still flares, though now it is more likely to involve TikTok, Reddit or YouTube, instead of dear old radio. The doubters always have a point; with the passage of time, the innovators usually win out.

The early evidence on AI’s job-creation side is thinner than the 1920s comparison flatters: Anthropic’s own researchers find a 14% drop in the job-finding rate for 22-to-25-year-olds in exposed occupations since ChatGPT launched, even as overall unemployment holds. The new electricians of our decade may exist. They just may not be the people getting hired right now.

The safety side of Anders’s case is the one I want to see more of. Cars in 1920 killed at twenty times today’s per-mile rate, and the country chose not to live with that:

Auto safety got better, too, with both industry and government taking action. Better mirrors, better brakes and shatterproof windshields became standard. Cities such as Los Angeles and Detroit installed red-yellow-green traffic lights that governed drivers’ actions on busy streets. New Jersey became the first state to insist on driver’s licenses, with the state’s motor-vehicle commissioner in 1924 declaring: “It is an absolute necessity to do this in order to conserve human life.”

Whether the next century treats our decade as kindly depends on whether we put rearview mirrors and traffic lights on AI before the death rates make us, rather than under the same kind of duress the 1920s did.

Vintage black-and-white photo of an early automobile displayed in a storefront window with bold striped decorations and a sign reading "Auto Show Jan. 19-25 Auditorium Milwaukee."

What the 1920s Can Teach Us About Surviving the AI Revolution

(Gift link) A century ago, cars and radio upended society just as AI is doing today.

wsj.com

Jake Albaugh wrote a piece on X called “Design is the work” that splits design from the artifacts it produces. Mocks, prototypes, screens, guidelines: those are outputs. Design itself, in his telling, is the upstream act of intent: figuring out what something should be and why, before anyone makes it. Bingo. That distinction matters now because AI is very good at the artifact and unable to do the deciding:

AI cannot do that part. You intend to do something that has not yet happened. You have to bring those parameters to the table to do anything novel. AI doesn’t know your constraints. It doesn’t know your strategy. It doesn’t know what moment in the market you’re in, what your team is trying to prove, or what your customers actually need versus what they’ve said they want. The expectation — the definition of what good looks like — is something only you can provide. AI’s job is to meet that expectation. Not to define it.

The piece makes the case that intentionality has to come before execution and that AI changes neither requirement. The closer is where it gets interesting. After all that, Albaugh tells the reader he used AI to draft the essay:

It may surprise you to learn that I used AI to write this. The structure, the sentences, a lot of the phrasing — generated. But the argument existed before any of it. I knew what I was trying to say. I knew what examples mattered and which ones were wrong. I knew when a paragraph was close but not quite right, and I revised toward a target I’d already defined. […] That’s the point. The tools changed. The work didn’t. Design is the process. Design is the intentionality.

It’s a risky reveal. Most readers will read it as self-undermining at first. But the argument and the artifact are doing the same job: Albaugh had a target, and he used AI to reach it. The fact that the prose was generated is exactly why it matters that the argument wasn’t. He knew which examples belonged in the piece and which ones to throw out. The model couldn’t have known that either way, because the criteria for “good” didn’t exist anywhere outside his head until he wrote them down.

Karri Saarinen made a version of this same split when he argued that output isn’t design. The hard part is understanding the problem well enough to know what should exist at all.

A presenter stands on stage in front of a green slide reading "What should be automated? What should be left to touch?"

Design is the work.

We’re in a moment where it has never been cheaper or faster to build something convincing. The cost of taking an idea and making it look real, feel functional, or seem finished has collapsed. That is genuinely good news if you already know what you’re building and why. It’s dangerous if you don’t.

x.com

Matt Ström-Awn, writing on his personal site, picks up a three-year-old line from Ted Chiang and turns it inside out:

Three years ago, Ted Chiang described ChatGPT as a blurry JPEG of the web. LLMs are a lossy compression of their training data, which is itself a lossy sample of all the data available to it. But the artifacts we see in AI slop aren’t in the compression. They’re in the decompression.

Every AI-generated output is an extrapolation from that blurry source, vectored toward your prompt, filling in plausible detail where the compression threw information away. The output gets inflated into blog posts and LinkedIn thoughtspam, software platforms, omnichannel advertising campaigns, and movie cameos from dead actors. Chiang compared the gaps and confabulations to compression artifacts.

I think they’re expansion artifacts.

Chiang had the compression metaphor; what we needed was a word for what these tools do on the way back out, and Ström-Awn gave us one.

Ström-Awn lists what expansion artifacts look like across modalities:

  • LLMs produce text stuffed with hedging verbs and fuzzing adjectives (delve, intricate, tapestry, multifaceted). Their paragraphs are structured as miniature essays with setup, payoff, and a signposted takeaway (This matters because…).
  • AI-generated code over-comments the obvious and creates error handlers for operations that can’t logically fail.
  • Image generators have had their own tells: six-fingered hands, symmetrical-but-stylistically-objectionable jewelry, text that looks like text but only if you cross your eyes.
  • Video models struggle with continuity. Limbs appear and disappear, objects clip through each other, and physics sometimes just switches off.

Each of these artifacts is the training distribution leaking through where the model’s confidence runs thin.

Ström-Awn writes about the designer-specific tells too:

Power users of AI website generators (AI-pilled designers) already know how to recognize the tool marks, if only to try to prompt them away: purple gradients are an especially common tell. But as more and more non-designers use tools like Claude Design to prompt their way to fully-functional software products, I expect to see a preference for the aesthetic convergence endemic to the current crop of AI models.

Matt Ström-Awn website header showing the page title "Expansion artifacts" in large bold text on a white background.

Expansion artifacts

Matt Ström-Awn · Designer, leader, and coach focused on building exceptional products and teams.

mattstromawn.com

Cat Wu, Anthropic’s Head of Product for Claude Code, describes the hiring filter on her team in her interview with Lenny Rachitsky:

I think all of the roles are merging. PMs are doing some engineering work. Engineers are doing PM work. Designers are PMing and also landing code. You can either hire a lot more engineers who have great product taste, or you can keep your engineering hiring the same and hire a lot more PMs to help guide some of their work. On our team, we’re pretty focused on hiring engineers with great product taste. This way we can reduce the amount of overhead for shipping any product. Like there are many engineers on our team who are fully able to end to end go from see user feedback on Twitter through to like ship a product at the end of the week with almost no product involvement. And this, I think, is actually like the most efficient way to ship something. So I think like engineer and PM are kind of overlapping and you will get a lot of benefit from having more of either. I think product taste is still a very rare skill to have and we’ll pretty much hire anyone who we feel has demonstrated this strongly.

This is what the Full Stack Builder pattern looks like as a hiring filter. The headline is the merging of roles. Wu’s own background says where the bench comes from:

Yeah, I was an engineer for many years. I was then a VC very briefly before joining Anthropic. And actually almost all the PMs on our team have either been engineers or ship code here on Claude Code. And so that’s one of the things that I think helps build trust with the team and also just enables us to move a lot faster. And then actually our designers also have been front-end engineers before.

So to be clear, Wu doesn’t say that the roles have merged, but what she’s describing is the continued blurring of lines.

How Anthropic’s product team moves faster than anyone else | Cat Wu (Head of Product, Claude Code)

Cat Wu is Head of Product for Claude Code and Cowork at Anthropic, building one of the most important AI products of this generation. Before joining Anthropic, Cat spent years as an engineer and briefly worked in VC. Today, she’s interviewing hundreds of product managers who are trying to break…

youtube.com

Maggie Appleton, staff research engineer at GitHub Next, wrote up her recent talk on agentic AI productivity. (Video here if you’d rather watch.) Her central claim comes early:

I call it this “one man, a two dozen claudes” theory of the future. The pitch here is that one person with a fleet of agents will do the work of an entire team of developers. The main problem with this dream is it assumes software is made by one person. All these tools are single player interfaces. […] Software is not made by one person in a vacuum. It’s a team sport. Everyone building it needs to agree on what they’re building and why.

The single-player critique is the missing piece in most AI productivity takes. Most demos of a coding agent show one engineer at a terminal. Designers face the same situation with AI prompt-to-code tools. Collaborating isn’t as easy as sharing a Figma link. That’s the actual gap in current tooling, and it’s downstream of the single-player assumption.

Appleton’s second move:

Implementation is rapidly becoming a solved problem, right? Writing code is now fast, it’s getting cheap, and quality is going up and to the right. The hard question is no longer how to build it. It’s should we build it. Agreeing on what to build is the new bottleneck. […] When production is cheap, opportunity cost becomes the real cost. You can’t build everything, and whatever you pick comes at the cost of everything else.

When production is cheap, picking what to make becomes the whole job. The cost difference between two engineering paths is now nearly zero, so the choice between them carries all the weight. Teams that miss this will end up shipping volume and mistaking it for productivity.

A talk like this could be about tooling, and Appleton does walk through Ace, GitHub Next’s prototype multiplayer workspace, in some detail. But the more important argument is about what you do with the hours you free up. Going faster is not the prize. Appleton:

We have an opportunity to not just go faster and build a giant pile of the same crappy software. But instead to make much better software through more rigorous critical thinking and better alignment in the planning stage. By doing more exploration, more research, and thinking through problems more deeply than we could have before.

The reclaimed hours are an opportunity, but they are also a test. Do you spend them shipping more, or do you spend them shipping better? The first answer gets you the giant pile. The second takes work the agents cannot do for you.

Appleton closes on craft:

Many people are now realising that in a world of fast, cheap software, quality becomes the new differentiator. The bar is being set much higher. Craftsmanship is what will set you apart from the vibe-coded slop. But craft still costs time and energy. It is not free, and in order to buy the time and energy for it, you need to do fewer things better, which requires strong alignment.

Title card for "One Developer, Two Dozen Agents, Zero Alignment" — a talk about collaborative AI engineering and a tour of Ace, the multiplayer coding workspace.

One Developer, Two Dozen Agents, Zero Alignment

Why we need collaborative AI engineering and a tour of Ace: the multiplayer coding workspace

maggieappleton.com

Andy Matuschak describes two accidental tyrannies that have shaped software for forty years: the application model that traps software in one-size-fits-all packages, and programming as a specialization that crowds out non-programmers from inventing interfaces. He thinks coding agents could break both, and he’s already seeing it happen with the designers he works with:

I’ve been seeing it. I spent 2025 collaborating with two talented designers. Their story with coding agents this past year has been truly wild. I think the impact on my collaborators has been much greater than the impact on me, despite the fact that I’m now building perhaps ten times the speed.

Unlike me, these two started their careers in design and spent their formative years in the arts culture. They can program a bit, but the process was really slow and difficult enough to pose a significant barrier. At the start of 2025, coding models could implement small one-off design ideas—but their outputs would just fall apart after a couple of iterations. By the end of the year, my collaborators were routinely prototyping novel interface ideas and sustaining that iteration across weeks.

“The impact on my collaborators has been much greater than the impact on me.” Matuschak is moving perhaps ten times faster, and he still thinks his designers are the ones whose careers just turned over. It’s rare to hear that from the person who got the bigger gain in raw output.

Matuschak’s diagnosis of why the old arrangement was such a trap for designers:

Non-programming designers are trying to invent something in an interactive medium without being able to make something meaningfully interactive. So much of invention is about intimacy with the materials, tight feedback, sensitive observation, and authentic use. So it’s a catch-22: to enter into proper dialogue with their medium, a non-programmer needs to get help from a programmer. That generally requires the idea to be at least somewhat legible and compelling. But if they’re doing something truly novel, they often can’t make it legible and compelling without being in that close dialogue with their medium.

The old design-engineering separation trapped designers in a less obvious way. They often couldn’t even tell whether their ideas were brilliant, because they couldn’t get their hands on the material to find out. You can’t iterate on a feeling. You have to push something around until it pushes back. For most of my career, designers did that pushing in flat mockups and click-through prototypes, working through dynamic behavior they had never actually felt. Of course the technical ideas fell short. The designers themselves hadn’t felt the thing yet either.

That’s the asymmetry coding agents collapse. The loop between “I have an inkling” and “I am tinkering with a working version of the inkling” has finally closed for non-developers. They still can’t and mostly shouldn’t ship production code, but they don’t need to. The prototype is enough to do the design work. Once the gatekeeping melts, the next question is institutional: where does the next generation of interface inventors come from? Matuschak’s answer:

So, what now? We’ve spent decades building HCI programs that mostly look like computer science departments with design electives. But if we’re moving toward a world where invention is bottlenecked more on imagination than on technical expertise, we may have that backwards. We may need programs that look a little more like art school with technical electives—learning to develop ideas from intuition before being able to express them precisely, to discover by playing with the material.

Title slide and content page from Andy Matuschak's MIT HCI Seminar talk "Apps and programming: two accidental tyrannies" dated 2026-03-03, showing a table of contents and lecture notes.

Apps and programming: two accidental tyrannies

On coding agents, malleable software, and the future of interface invention

andymatuschak.org