Owen Williams, a design manager at Stripe, sat down with Claire Vo on How I AI to walk through Protodash, the internal prototyping tool he has spent the last eighteen months building. What sticks is what Protodash has done to the handoff. Williams, describing the Radar fraud-detection team:

They literally have a pull request of a prototype that I had. I see an engineer working on it and I’m like, this has never happened ever in my career as a design manager. They’re like, “I’ll just use the prototype as the source of truth,” and they can just take it and do that. There’s a huge change — not having to redline a Photoshop file or all of that stuff.

That’s the part that matters. The prototype is the code, in the same components, ready to be picked up. Protodash gets there by constraining generation: a bundle of Cursor rules, a router and chrome scaffold, and Stripe’s design system (Sail) exposed via an MCP server. The off-the-shelf tools—v0, Cursor by itself, Claude Design—produce what Williams calls “blurple slop” because they hallucinate components. Wire the generator to the actual system and the output stops looking like a Tailwind demo and starts looking like Stripe.
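For readers who haven't written agent rules before, here is a hypothetical sketch of what a constraint bundle in this spirit could look like (every name and instruction below is invented for illustration; Stripe's actual Protodash rules aren't public):

```markdown
<!-- rules/design-system.md (hypothetical example, not Stripe's actual rules) -->
- Only use components exported by the internal design system package. Never
  hand-roll buttons, inputs, modals, or tables in raw HTML.
- Look up component props and usage examples through the design-system MCP
  server before generating any UI code.
- All color, spacing, and type values come from design tokens. Hard-coded hex
  values and pixel sizes are forbidden.
- New screens mount inside the shared router and chrome scaffold; do not
  create standalone pages or navigation.
```

The point of rules like these is exactly what Williams describes: they stop the model from hallucinating components, so the output is built from the same parts engineers ship.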

The fidelity jump changes the room, too:

It’s sort of been this very transformative thing because all of a sudden I’m sitting in these design reviews and it’s so convincing that I’m like, is this the real product or am I looking at something fake?

This is what Tara Tan predicted: the moat in AI design tooling is the design-system graph, and whoever makes that graph machine-readable for agents wins the enterprise. Stripe just did it internally with a homemade stack, which makes it an uphill battle for anyone trying to build a generic tool for this use case.

The interesting thing is who shows up to use it. Williams says Protodash is now used more by PMs than designers; PMs paste a PRD from Google Docs and get back a working flow before designers are pulled in. That tracks with the Figma Make case studies — PM-led prototyping isn’t theoretical anymore.

Williams is clear-eyed about what the tool can’t do:

How can I make sure that the tool knows enough to be dangerous? It gets to 80%. But like that taste, that craft is like, that’s why designers will always exist, in my opinion. Like they know how to elevate the experience. Like this thing knows how to use the components. The components are well designed, but it’s not going to be perfect. And we are here to steer them.

The internal AI tool that’s transforming how Stripe designs products

How Stripe’s internal AI prototyping tool, Protodash, ties generation to the design system and turns the design-to-engineering handoff into a pull request.

youtube.com

Nathan Beck, a product designer in Amsterdam, opens his essay with the title “The death of design” and an immediate retraction: “LOL only jk design still alive.” Then he spends a few thousand words on why, walking through what AI tools actually do to a working designer’s day and what they conspicuously do not do.

The pivot quote is buried two-thirds of the way in:

If you call yourself a designer and—be honest with yourself—the bulk of your role has been the production of flat pictures of user interfaces, then I’m sorry to break it to you, but you are not designing. You are styling.

That line is the whole post compressed. Beck is not arguing that AI threatens designers. He is arguing that AI threatens styling, and that a lot of people who call themselves designers have been styling for a decade and are now discovering that the part of the job AI is good at was the part they were doing.

What’s left over, in Beck’s telling, is the reflective work: the thing that happens during design, not in the final file. He quotes Kaari Saarinen on why output isn’t design:

In the same way that one writes in order to understand what one is writing, one designs in order to understand what one is designing. As Kaari Saarinen explains, “Working visually keeps me close to the problem and is slow enough [sic] gives me time to think while I work. Moving things around, testing relationships, and refining structure is not separate from the thinking. It is part of how clarity emerges.”

This is the part the “designers are cooked” discourse misses. The understanding accumulated while making the Figma file was the asset all along. The file was the receipt.

Beck has a second argument running underneath the first: AI output, on its own, is aesthetically average. He quotes Nick Foster’s Dezeen piece on what software feels like after a decade of optimization:

The apps I use to hire plumbers look and feel remarkably similar to those I use to watch skiers do backflips. Every brand feels the same, every function feels the same, every interaction feels optimised, streamlined and joyless. By any measure, these pieces of software are miracles of engineering and triumphs of logic, yet they feel profoundly underwhelming to live with.

A designer who only ever produced flat pictures of those interfaces has been replaceable by a model for a while now. The judgment about which of those generic outputs should ship and which should be thrown out and rebuilt is the part no model has managed yet.

Beck closes:

However, I am cautiously optimistic that as we weather this historical conjuncture, and machine intelligence loses its sparkly aura, and weekend vibe coders increasingly learn how substantial the gap is between a prototype and a product, the role of design, however it is redefined, will be just as essential as it ever was.

That unsexy gap is the whole game. Greg Kozakiewicz updated the old construction line: we used to confuse the drawing with the building; now we confuse the prototype with the product. The demo works on a good laptop with someone who knows what the app is supposed to do. The product has to work for the user who doesn’t. Closing that gap is the orchestration job—defining the thresholds and deciding what the system should refuse to do—and it’s where the weekend demos lose their shine.

Wireframe sketch of nested boxes connected by lines, from Nathan Beck's essay on AI and design.

The Death of Design

Nathan Beck argues AI expands the designer’s role rather than ending it. Production becomes cheap; thinking, taste, and assumption-checking become the job.

nathanbeck.eu

Scott Berkun lists three portable superpowers most designers underrate in themselves: investigative curiosity, the ability to translate between people who can’t understand each other, and a working grasp of tradeoffs. The first one is where he starts:

If we can spend hours reading about the 16th-century French history behind the beloved font Garamond, or studying the details of the design prototypes Jonathan Ives made to create the first iPhone, we have the rare capacity to discover and digest layers of complex information for practical use in solving problems.

Designers tend to file “I went deep on Garamond’s history” as a hobby or a tic, not a transferable skill. Berkun’s point is that the depth is the skill, and the subject is interchangeable. Aim it at a thing your CEO is worried about and you’re suddenly the person who knows the most about it in the room.

On translation:

Someone who explains things clearly, including through insightful sketches, diagrams, or metaphors, has tremendous value. Explainers help people make sense of each other. Designers are often shy about their ability to explain things, but typically we’re better at this than other professionals, since our work is rooted in communication (even visual design is rooted in semiotics, the study of symbols and their meaning). If we can be curious about our coworkers’ perspectives, objectives, and frustrations, we can be translators.

Berkun has made the curiosity argument before, in the negative, when he listed lack of curiosity as one of the five worst habits a designer can have. Reading this piece next to that one, the two halves connect: the habit he warns against in one post is the superpower he’s asking us to revive in this one.

Featured illustration for Scott Berkun's Substack essay on designer superpowers.

Revive your design superpowers

Scott Berkun names three portable designer superpowers — investigative curiosity, translation between teams, and tradeoff negotiation — that we underrate in ourselves.

whydesignishard.substack.com

My recent newsletter, “Out of Your Head, Into the File,” made the case for getting taste out of your head: writing it down so it can survive the messy middle of an AI workflow. Mia Kiraki, writing in Robots Ate My Homework, picks up the other half of that problem: how taste erodes when you don’t.

Her central image is the Hansel and Gretel gingerbread house, recast:

AI output is the gingerbread. You’re tired, the deadline is close. The output IS the shelter. You eat it - of course you eat it. That’s what gingerbread is made for, right? […] The Grimms buried a much smarter lesson in the early pages, before the witch even shows up. Hansel drops breadcrumbs to mark his path through the forest and the birds eat every one. He still leaves them, though. Those breadcrumbs are your taste, every little choice you make on the page (this word, this angle, this risk) which leaves a marker of who you are. The forest will always try to erase them.

What’s specific to AI is the environment. It’s now optimized to wear taste down a percent at a time, in ways you can’t feel while it’s happening, until the work you used to do feels like someone else’s.

Kiraki puts the mechanism plainly:

If most of your reading this week was AI slop (which is pretty likely given the state of the internet), you trained your judgment on machine output. […] Each accepted sentence is a small vote for a lower standard. You accept a vague phrase because the deadline is close. You let a soft claim through because rewriting from scratch would cost an hour you don’t have. […] Taste dies slowly, when you wake up someday and read something you wrote six months ago and realize you used to sound so different. You had edges, took risks, made claims, and you sounded like a person who made choices.

Kiraki gets at the failure mode that follows once taste is the only real moat left.

Her counter:

Read work that operates at a higher standard than yours. Work where someone made choices you wouldn’t have made or took risks you would have edited out. Your taste calibrates upward when you expose it to judgment that outclasses your own. […] Practice the explanation. When something in your own work feels wrong, write down the reason. The specificity of your explanation is the weapon. […] Ship work that makes you nervous. If a piece feels comfortable to publish, you probably didn’t push hard enough. The pieces that make your stomach tighten show and prove your taste is working at full capacity.

The middle one is what that newsletter was about: writing down the reasons, not just the verdict. Kiraki adds the bookends. Read above your level so your baseline isn’t drifting toward consensus. Publish the version that scares you a little, because the version that doesn’t is the gingerbread.

Featured illustration for Mia Kiraki's Substack essay on protecting taste in the AI era.

How to bulletproof your taste in the age of AI

How AI output erodes your editorial judgment, four diagnostic prompts to measure the damage, and the only protection that really, truly works.

open.substack.com

Talking to Peter Yang, Ravi Mehta—former CPO of Tinder, now teaching AI prototyping at Reforge—walks through a live demo of building the same Spotify-style genre page three different ways. The first attempt uses a short functional prompt and produces something that, in Mehta’s words, “kind of feels like AI slop.” The third uses what he calls a full-stack context bundle: a functional spec, a 20-minute Figma wireframe, and a JSON file of real album data, pulled together in Claude with an MCP server. The output is night and day.

His definition of the shift:

Context engineering is designing and building systems that provide an AI model with the right information and tools to accomplish the task. And I think a common mistake I see with prototyping is people don’t think about context in that 360-degree way. And as a result, people just, you know, write a quick prompt or a quick little mini spec and expect the prototype tool to be able to create something as high fidelity as what they used to create before, when they had all of these different artifacts that are a critical part of the product lifecycle.

That definition will sound familiar to anyone who saw Philipp Schmid’s framing of context engineering when it first circulated. Same emphasis on “right information and tools.” It’s the working definition the field has settled on. What Mehta adds is the concrete answer to “okay, what are the three things you actually have to assemble?” Functional context (a spec), visual context (a wireframe), and data context (real structured JSON, not lorem ipsum). Skip any of them and the prototype either looks generic, behaves wrong at edge cases, or breaks suspension of disbelief the moment a real customer touches it.
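As a concrete illustration of the data-context leg, the JSON Mehta describes would be real structured records rather than lorem ipsum. The fields and values below are placeholders I’ve invented, not his actual file:

```json
{
  "genre": "Indie Rock",
  "albums": [
    {
      "title": "Example Album One",
      "artist": "Example Artist",
      "year": 2021,
      "coverUrl": "https://example.com/covers/one.jpg",
      "trackCount": 11
    },
    {
      "title": "Example Album Two",
      "artist": "Another Artist",
      "year": 2019,
      "coverUrl": "https://example.com/covers/two.jpg",
      "trackCount": 9
    }
  ]
}
```

Real-shaped data matters because it forces the prototype to handle long titles, missing artwork, and plural counts — the edge cases lorem ipsum hides.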

The piece I want to underline is his defense of visual thinking, because the “designers are obsolete” takes haven’t stopped, and Mehta gives them a clean rebuttal:

So if you start to think differently about the different types of context that are available, you can actually get much more specific and have a lot more control over what gets built, and build something that’s a lot more robust. This is functional context. The next level that is really important is visual context. […] And so here, very quickly in Figma, just taking 20 minutes, I’ve done a wireframe and sort of outlined what I want this interface to look like. […] The prototype needs to have a level of fidelity that’s hard to get with sort of traditional prompting techniques.

Twenty minutes in Figma, then a short prompt that says “use the attached wireframe.” A wireframe does what a 17-page PRD and three rounds of trying to describe a layout in English to the model can’t. The wireframe is part of the input to the deliverable now.

The corollary cuts the other way too. If the wireframe is now an AI briefing document, the people who can produce a decent one in twenty minutes have a real edge over the people who can’t. That’s still designers, still us. It’s just that the wireframe now feeds the model directly, not only the engineer reading the spec next sprint.

Everything You Need to Know About Context Engineering in 40 Minutes

Ravi Mehta builds the same Spotify-style page three times to show how functional spec, visual wireframe, and real data each level up an AI prototype.

youtube.com

I wrote about this whole family of files in my recent newsletter: DESIGN.md, SKILL.md, SOUL.md, the markdown artifacts you write so an agent can read them. Nick Babich has the practitioner walkthrough for the DESIGN.md flavor of it, specifically the version that Google Stitch reads when it generates a screen. He describes the format directly:

DESIGN.md is a markdown file with two layers: YAML front matter that contains machine-readable design tokens (exact hex values, font properties, spacing scales) and Body that features a human-readable design rationale.

The two-layer split is right. The YAML is the part the agent can’t argue with: primary: "#d97706" is #d97706. The body is where you tell the agent why, and it has to be written like prose, not a config file. Babich’s philosophy section is where I’d point a designer who’s about to write their first one:

Unlike a traditional specification that often has very specific details that designers should follow when crafting a new design, DESIGN.md is less prescriptive in its nature. It creates a solution foundation for AI tools (colors, typography, corner radius) while providing enough freedom to alter the format for domain-specific needs. Another thing is that DESIGN.md is a living artifact, not a static config file. It should evolve as your design evolves.

The “less prescriptive” line is counterintuitive. You’d think the whole point of feeding rules to an agent is to be more prescriptive, not less. But Babich is right about the shape: pin down the tokens, leave the application loose, refine the file as the agent surfaces edge cases you didn’t think about. These files hold what we used to keep in our heads and call taste, and you don’t write taste like a requirements doc. You write it like a brief, and you keep editing it.
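A minimal sketch of the two-layer shape Babich describes. Only the primary hex comes from the article; every other token and the rationale prose are invented to show the format:

```markdown
---
colors:
  primary: "#d97706"    # from Babich's example; the rest are invented
  surface: "#ffffff"
typography:
  body: { family: "Inter", size: 16, lineHeight: 1.5 }
spacing: [4, 8, 16, 24, 32]
radius: 8
---

## Design rationale

Warm amber is the accent because the product should feel approachable, not
clinical. Use it sparingly: one primary action per screen. When a
domain-specific view needs denser layout, compress the spacing scale rather
than inventing new values.
```

The front matter is the part the agent can’t argue with; the body is the brief you keep editing.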

Article header illustration for Nick Babich's UX Planet piece on the DESIGN.md format.

What is DESIGN.md and How To Use It

One of the biggest challenges with AI design generators is producing consistent output. Even with detailed instructions, AI can drift away from the spec.

uxplanet.org

There is a small genre of data-visualization writing whose whole purpose is to give readers vocabulary for the moves bad-faith presenters make. Nathan Yau’s “Defense Against Dishonest Charts” is a must-read. It is interactive, taxonomic, and names its villains: the Damper, the Cherrypicker, the Base Stealer, the Storyteller. Read it once and you start seeing the moves everywhere.

Then watch Hank Green take apart a Reason video that argues climate change is real but not worth doing anything about. Green opens with the diagnosis:

This video is a master class in like very subtle manipulations. A lot of the internet is really brute force here. But this video just appears to be a calm, collected guy helping you understand the world better. […] But if you look closely, if you follow this closely, you see the subtle manipulations in a way that makes it so clear that this is bad faith and that he is making an argument, not because he believes it, but because he has an agenda.

For Green, calmness is the manipulation. Yau is mostly cataloguing chart geometry: axes truncated, scales squeezed, slices reordered. Green is cataloguing the verbal layer that wraps around the geometry. A steady tone primes you to trust a graph that’s doing the lying. He spends a long beat on how the Reason presenter introduces his experts:

We have Michael Mann who is a climate activist. And then we have Steven [Koonin] who is a theoretical physicist. […] Like you could say former oil industry executive Steven [Koonin]. You could say lead climate contrarian Steven [Koonin]. Like you could call Steven [Koonin] a lot of things and theoretical physicist is certainly one of those things, but you’ve picked which one you’re going to call him whereas you’ve picked what you’re going to call Michael Mann. Honestly, if you didn’t do these little things, I would believe that you believe your BS. But you do these little things and it makes it very clear that you don’t believe your BS. You’re trying to manipulate me.

Same person, two truthful titles, picked to do opposite work on the viewer’s belief. Yau’s taxonomy doesn’t cover that one. The dishonesty lives in the labels around the chart, in the small choices nobody flags because each one, on its own, is technically true. Same kind of move as a Cherrypicker, though, in a different layer of the presentation. The ratio-graph segment is closer to Yau’s home turf:

How would you correct for that? Well, you just take a ratio every year and see how that changes year-over-year. Oh my god, what an elegant solution. But apparently the only reason you could possibly do that is cuz you wanted to SHOW A SCARY GRAPH. WHAT ARE YOU TALKING ABOUT, GUY? HAVE YOU NEVER BEEN AROUND A GRAPH?

The dishonest move there is the inversion: a valid statistical adjustment, presented to the audience as proof of motive. The graph is fine. The taxonomy of the attack on the graph is what’s missing.

Read Yau first, then watch Green. You will have names for almost every trick the Reason presenter pulls, and a clearer view of the ones the field still needs to name.

A Masterclass in Manipulation

This video came across my feed recently, and I watch a lot of climate stuff on YouTube: Just Have a Think, Undecided with Matt Ferrell, ClimateAdam, Zenuro. These are all channels you should watch that are trying in different ways to help people understand the very real and complicated problem of climate change and the very real and complicated solutions we’re working on. So if YouTube is handing me this video, it’s not because it’s a big video for climate skeptics to get really jazzed about. That’s a different kind of video, and this isn’t that kind of video. So I clicked on it in part because it’s strange that YouTube surfaced it to me.

youtube.com

The terminal’s return as a serious surface for new tools (Claude Code, Codex, Omarchy) has mostly been read as a developer aesthetic story. Alcides Fonseca reads it as the receipt for thirty years of GUI toolkit churn. He walks the platforms one by one (Windows, Linux, macOS), then through Electron, then through the failed restarts (Google’s Flutter, Zed’s GPUI), and ends on TUIs as the place developers go when none of the layers above hold up.

Fonseca on macOS:

Apple used to be a one-book religion. Apple’s Human Interface Guidelines used to be cited by every User Interface course over the world. Xerox PARC and Apple were the two institutions that studied what it means to have a good human interface. Fast forward a few decades, and Apple is doing the best worst it can to break all the guidelines and consistency it was known for.

This isn’t a nostalgia complaint. Fonseca lists the live breaks (Fitts’ law getting ignored, the Tahoe window-resizing saga that didn’t stay fixed, the icons cluttering Apple menus) and treats them as the same class of failure as Microsoft’s WinForms-WPF-Silverlight-WinUI-MAUI parade. The mechanism differs but the outcome is the same: the platform stops being a place a designer can rely on.

Fonseca on Electron:

Looking at my dock, I have 8 native apps (text mate and macOS system utilities) and 6 electron apps (Slack, Discord, Mattermost, VScode, Cursor, Plexampp). And that’s from someone who really wishes he could avoid having any electron app at all. […] These are actions that should be the same across every macOS application, and even if there are shortcuts, they are not announced in the menus.

The dock count is the right way to measure it. RAM is the visible cost of Electron; the invisible cost is that every Electron app becomes its own little keyboard regime, with shortcuts that often don’t match the rest of the system and aren’t announced in menus when they do exist. Fonseca’s Cursor example (can you move by keyboard from the agent panel to the agent list and archive an item?) is the kind of question any pre-Electron Mac app would have answered yes to. Most Electron apps answer maybe, with a shortcut their vendor invented.

His prescription that follows (make HCI mandatory in CS curricula, fail student projects with bad UIs, push OS vendors to invest in toolkits developers want to use) is correct in shape and probably wrong about leverage. Students aren’t the bottleneck. Apple and Microsoft have already read Norman. TUIs are back because the platforms quit, and the curriculum can’t fix that.

Fonseca’s diagnosis is right. The prescription is narrower. The TUI escape hatch works for developers because their work is text. Designers don’t get the same exit when the canvas is the medium itself.

Bonus: Speaking of TUIs, TUIStudio is a macOS app for designing terminal UIs, just like Figma!

Linux desktop split between a terminal showing an `ls` directory listing, a lazygit interface with recent commits, and btop system monitor displaying CPU, memory, disk, network, and process stats.

Why TUIs are back

Terminal User Interfaces (TUIs) are making a comeback. DHH’s Omarchy is made of three types of user interfaces: TUIs, for immediate feedback and bonus geek points; webapps, because 37signals (his company) sells SaaS web applications; and the unavoidable GNOME-style native applications that really do not fit well in the style of the distro.

wiki.alcidesfonseca.com

In product orgs, the word “autonomy” tends to get attached to seniority and titles. Sara Paul, writing for Nielsen Norman Group, puts the bar somewhere else:

Our research shows that autonomy is about becoming sufficiently informed to credibly shape shared product decisions.

You’ve earned design autonomy when you’ve collected enough context to make a recommendation that holds up under scrutiny. Until then, you haven’t. Low-autonomy designers, in Paul’s terms, “execute predefined solutions.” High-autonomy designers shape what gets prioritized, because they know things their stakeholders don’t.

The four-part pipeline is the practitioner half:

The designers who achieved high autonomy kept information flowing to them from all sources within their organization. Their pipelines consisted of four parts: (1) Gathering information from across teams and channels, (2) Building relationships with people who provide information, (3) Creating crossfunctional spaces for information to be shared, (4) Synthesizing information to form a “big picture” of context that empowered credible recommendations.

Paul’s examples are specific enough to put to use. The opening one is a lead designer at an online review platform whose ad-setup experience lived across mobile, desktop, and web. Three teams owned different parts of the experience and the whole was nobody’s job. Here’s how the story ends:

She saw the problem, took the initiative to gather the information she needed, and synthesized it into a recommendation that boosted her influence over what got built. This is design autonomy.

None of this required a new title. It required a tracker, a few standing meetings, and the willingness to do the synthesis work nobody assigned.

The designers I want—and have—on my team are the ones who can fill in for a PM when they’re on vacation. Paul’s article is the mechanism for getting there. The PM-shaped skill is holding the information context that lets you make a defensible call.

Title card reading "Boost Design Autonomy with an Information Pipeline" from NN/G, with six icons illustrating documents, collaboration, scheduling, workflows, UI review, and process pipelines.

Boost Design Autonomy with an Information Pipeline

A four-step framework for building influence over product direction by closing the information gaps that large, complex organizations create.

nngroup.com

Alex Dapunt, VP Design and Brand at Moonfare, opens with a research session in which a senior client laid out exactly what to build next, with the roadmap, rationale, and feature list ready inside a minute. The client was wrong, Dapunt writes, but not because he was stupid. He was wrong because he had been asked the wrong question and his instinct was to answer it anyway.

The smarter your users, the more convincing their wrong answers. A user says they want ice cream. While they say they want ice cream, what they need is to cool down. Their body wants sugar. It’s hot. There’s a memory somewhere in there, a summer ritual, something cold in their hand. The want closes off options. The need opens them. Take “I want ice cream” at face value and you sell them ice cream. Understand the need and you can sell them a popsicle, a cold drink, air conditioning, a swim in the sea.

The want-versus-need split is older than this piece. Dapunt credits Jared Spool for it. The part Dapunt adds is about who tends to give you the worst version of a want. He argues the failure intensifies in premium and B2B contexts, where the people you most want to talk to are the people most trained to produce confident answers under pressure.

The Moonfare client wasn’t an outlier. I think a lot about why this happens. Part of the answer, I think, is that the people we were interviewing had been trained, explicitly, to produce answers. At Bain, where I spent time earlier in my career, the core discipline is what’s called the answer-first approach, or the A1. You lead with the answer. Then you work backwards. […] It’s a disastrous way to sit in a research session as a user. An executive trained that way walks in and the instinct takes over. They feel the absence of an answer as pressure. They want to be useful. They want to look smart. They give you the A1, and it’s precise and articulate because producing precise articulate answers is what they are paid to do.

Dapunt’s observation about ambiguity is worth carrying into the next interview transcript you read. When a regular user says “I dunno, maybe?” he argues, the fuzziness is signal that the question is wrong. The executive doesn’t give you that signal, so you have to know to discount the clarity.

Dapunt then turns the same lens on metrics. His version of the metrics-as-avoidance failure mode is more specific: the wrong moment, not just the wrong number.

At Moonfare we tracked logins. More logins looks good on a dashboard. Looks like engagement. But private equity is a 5-to-10 year product. For most of that time nothing is supposed to happen. […] The right moment isn’t a platform question. It’s a life question. When does this person have cashflow? When’s bonus season? What does their portfolio look like right now, and is there a product we offer that fits the gap? The real need isn’t log in more. It’s be present when a decision is being made. Five well-timed touchpoints in a year beat fifty random ones.

The piece closes on the part of research practice that gets least attention.

Research is intake. You take it in. You synthesise. Then someone has to make the call and own it. […] In practice I’ve watched it produce three biases averaged into a consensus nobody owns. Someone has to own the interpretation. It can be a researcher, a designer, a founder, a PM. But it’s one person’s job, and it comes with the accountability for the call that follows. The alternative is research-as-stalling.

Dapunt is careful here. He likes continuous discovery, he likes the product trio in theory, and he is not making a contrarian case against any of it. His point is narrower. A team can run all the right research rituals and still end up with a process whose actual function is to ensure no single person has to take responsibility for being wrong.

“dir14” text overlaid on a medieval-style painting depicting a crowd of figures in colorful robes gathered outdoors near a castle.

Users own the present. You own the future.

A few years ago I sat in a research session at Moonfare. Since private equity is a premium product, our clients are mostly C-level executives, founders or people who have spent decades being the person in the room with the answer. He was one of them.

dir14.com

Emil Kowalski, a design engineer at Linear, takes the case for designers who can articulate why a choice works one step further. Once you can explain it, you can hand the rule to an agent.

An engineer has never been more leveraged than today thanks to a fleet of agents. But when it comes to more visual work, like animations, coding agents don’t quite know what great feels like.

My way of getting there is to create a skill file for each aspect of the interface. If you know what great feels like, describe the rules, then give them to your agents so they can follow them.

Kowalski shows two animations side by side, one scaling from scale(0) and one from scale(0.95), and walks the reader from “this feels right” to a real-world reason why:

With enough experience, you can not only tell what feels better, but also why. By then you’ve not only built your taste, but also the ability to articulate it.

The correct animation below feels right, because it animates from a higher initial scale value. It makes the movement feel more gentle, natural, and elegant.

scale(0) on the left feels wrong because it looks like the element comes out of nowhere. A higher initial value resembles the real world more. Just like a balloon, even when deflated it has a visible shape, it never disappears completely.
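Kowalski’s point is that an articulated rule can be handed to an agent. A minimal sketch of what that might look like in practice, where the threshold, regex, and function names are my own illustration, not his actual skill file:

```python
import re

# Hypothetical encoding of the rule Kowalski articulates: enter animations
# should start from a visible scale, never from nothing.
MIN_INITIAL_SCALE = 0.9

def check_enter_scale(css: str) -> list[str]:
    """Flag scale() values below the minimum a gentle enter allows."""
    problems = []
    for match in re.finditer(r"scale\((\d*\.?\d+)\)", css):
        value = float(match.group(1))
        if value < MIN_INITIAL_SCALE:
            problems.append(
                f"scale({value}) starts too small; use >= {MIN_INITIAL_SCALE} "
                "so the element never disappears completely"
            )
    return problems
```

Encoded this way, “feels right” becomes checkable: `check_enter_scale("scale(0)")` flags the comes-out-of-nowhere case, while `check_enter_scale("scale(0.95)")` passes clean.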

This is what Ian Guisard, a design systems lead at Uber, does: encoding expertise, writing agent skills, defining validation rules, deciding what “correct” means. Nick Babich’s piece on agentic product design covers what makes an agent an agent; Kowalski’s piece shows what an agent actually runs on.

That’s the why. There’s no magic involved. Almost every “taste” decision has a logical reason if you look close enough. This applies to any other discipline really.

Of course the more creative part of the job is still up to you, but the more you can package into a skill, the more leverage you can get out of your agents.

Bold text reading "Agents with Taste" on a white background.

Agents with Taste

How to transfer taste into an AI.

emilkowal.ski iconemilkowal.ski

PJ Onori built a tool that A/B tests his design system against AI agents, and he’s careful to say it isn’t impressive:

Two groups of agents get spun up, and both are given the same prompt to make an interface. One group’s given the old design system. The other is given our new one. Each agent provides feedback on problems faced after it’s done. Once all agents finish, the builds are evaluated on a bunch of crap and a report is generated.

The list of what the tool measures is long: timing, lines of code, code variance, fix attempts, components used, accessibility, performance, inline styles, visual diff, token usage, agent feedback. Onori, on the test he ran when he wasn’t sure his documentation was actually doing the work:

I was starting to question if documentation was making things better. Maybe component improvements were doing the heavy lifting–who knows? So, I ran a couple tests without documentation… The documentation was clearly the heavy lifter. […] Documentation is essential for systems that agents don’t have a lot of reps with. I’ve started to add a “For agents” section in the docs. That section is the dumpster for “get it in your silicon head” training.

The “For agents” section is a small idea with a real implication. Documentation has historically been written for one audience. Now there are two, and as Onori says elsewhere in the post, the second one needs “the same damned point” repeated five or six times and doesn’t care if the prose is ugly. His instinct is to wall that off so humans don’t have to read it.

Onori is publishing measurements where most people are publishing takes. That’s the missing piece in the design-system-as-moat argument: somebody actually testing whether agents do better with a well-built system than a worse one, and showing the numbers. Onori closes with a caution:

There’s a lot of noise in the output, feedback, and analysis–otherwise known as everything. That noise compounds fast. Think of the telephone game–then think about what that’d do to a design system. […] Feedback needs to go through a BS filter. […] The feedback part of the analysis is helpful. Make no mistake. But it needs heavy interpretation.

The telephone game is the right picture. A design system that updates itself based on agent feedback that’s been generated by other agents and analyzed by a third agent is going to drift somewhere strange in a small number of iterations, and nobody on the team will be able to reconstruct why. Onori’s tool stops short of that on purpose: it produces measurements, and a person reads them.
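The shape of that measurements-only harness can be sketched in a few lines. The group names, metric keys, and numbers below are hypothetical, not Onori’s actual tool, but they show the core move: aggregate, report, and let a person do the interpreting.

```python
from statistics import mean

# Hypothetical per-build metrics in the spirit of Onori's harness; the
# keys and numbers are mine, not his tool's output.
def summarize(builds: list[dict]) -> dict:
    """Aggregate one group's builds into numbers a person reads."""
    return {
        "avg_seconds": mean(b["seconds"] for b in builds),
        "avg_fix_attempts": mean(b["fix_attempts"] for b in builds),
        "avg_inline_styles": mean(b["inline_styles"] for b in builds),
    }

def compare(old_system: list[dict], new_system: list[dict]) -> dict:
    """Side-by-side report. Deliberately takes no automatic action on it."""
    return {"old": summarize(old_system), "new": summarize(new_system)}

old = [{"seconds": 210, "fix_attempts": 3, "inline_styles": 12},
       {"seconds": 180, "fix_attempts": 2, "inline_styles": 9}]
new = [{"seconds": 150, "fix_attempts": 1, "inline_styles": 2},
       {"seconds": 140, "fix_attempts": 0, "inline_styles": 1}]
report = compare(old, new)
```

The design choice worth noticing: `compare` returns a report and stops. Nothing in the loop feeds agent feedback back into the system, which is exactly the firewall against the telephone game.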

Stippled illustration of a person sitting at a desk, leaning forward and writing or working on something.

Testing agents on design systems

It’s really easy to say agents are able to use a design system. It’s another thing to prove it.

pjonori.blog iconpjonori.blog

Marcus Moretti’s guide to agent-native product management, in Every, is the orchestration shift showing up on the PM side of the team. The guide opens with the 1930s Procter & Gamble origin story: someone owns the product. The job has been rewritten so many times since then that PMs are now expected to be design partners, diplomats, salespeople, and statisticians on top of running the 100+ software subscriptions the average company buys. What’s interesting is that the piece is describing the old role, finally legible again now that agents can absorb the administrative debt that piled up on top of it.

Now, much of the interdisciplinary work that goes into product management can be done by an LLM in minutes, sometimes seconds. What used to be a three-hour-long analytics investigation is now a simple back-and-forth with Claude. A product review that used to be a fortnightly chore emerges from a single typo-ridden chat message. This has been my recent experience, at least. I no longer struggle with semicolons in SQL queries or even write tickets. All of my product management work happens in conversation with, in my case, Claude Code. The conversation is the work.

“The conversation is the work” sounds like a description of the new job. Read it next to the 1930s origin story and it’s a description of the old one. The Brand Man at P&G wasn’t writing SQL; he was deciding what the product should be and who it was for. The intervening ninety years of accumulated tooling—agile ceremonies and ticket hygiene, analytics dashboards on top of those—was friction PMs had to push through to get back to the actual work. Moretti’s /ce-strategy command, modeled on Richard Rumelt’s Good Strategy Bad Strategy, isn’t a new artifact either. Strategy documents predate LLMs by decades. What’s new, Moretti says, is the cadence: every few months, the agent re-runs the strategy interview with the accumulated context of everything you’ve shipped.

Writing a strategy document cold is hard. The best way to do it, I’ve found, is to have an agent interview you. The ce-strategy skill does this. It runs through the sections in order and has built-in guidance about what makes a good answer (and what kinds of answers to push back on). […] The interview is deliberately conversational. If the first answer to, “What’s the core problem this product solves” is vague, the agent drills down: “Whose situation specifically? What do they try today, and why doesn’t it work?” The guidance here is taken from personal experience and from the Rumelt book.
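The drill-down behavior Moretti describes can be sketched as a tiny loop. The vagueness heuristic and cutoff below are my own toy stand-ins, not the actual ce-strategy skill; the follow-up questions are the ones quoted from his guidance.

```python
# Crude markers of a weak answer; the real skill's guidance is richer.
VAGUE_MARKERS = {"everyone", "better", "somehow", "various", "things"}

FOLLOW_UPS = [
    "Whose situation specifically?",
    "What do they try today, and why doesn't it work?",
]

def is_vague(answer: str) -> bool:
    """Stand-in for the skill's judgment about answers to push back on."""
    words = [w.strip(".,!?").lower() for w in answer.split()]
    return len(words) < 8 or any(w in VAGUE_MARKERS for w in words)

def next_prompt(answer: str, depth: int = 0) -> str:
    """Drill down on a vague answer; otherwise advance the interview."""
    if is_vague(answer) and depth < len(FOLLOW_UPS):
        return FOLLOW_UPS[depth]
    return "NEXT_SECTION"
```

Fed “Make things better for everyone,” a loop like this drills down; fed a specific answer about a specific user, it moves to the next section.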

The guide assumes a PM who has the taste to recognize when the agent’s follow-up has exposed a gap. The ones who don’t will end up with a strategy.md full of confident-sounding nonsense, generated quickly and reviewed lightly. Agent-native PM removes the alibi that you were too busy with tickets to do the actual thinking. That maps to a warning from Raj Nandan Sharma: when generation gets cheap, the scarce skill is refusal: knowing what to throw out and why. Moretti’s PM is doing exactly that, sentence by sentence, in the strategy interview.

Moretti closes:

LLMs have allowed our tools to catch up with the multifaceted duties of product managers. For me, product management has been reduced to the interesting parts: dreaming up features, thinking through designs, looking at interesting data, and talking to users. We all feel the economic imperative to embrace AI tools, but the better reason, I think, is to make work more fun.

Hand-drawn letter "G" in black chalk-style script on a light blue background, with a black bookmark icon in the top-left corner.

A Guide to Agent-native Product Management

A step-by-step guide to using agentic capabilities for better product management

every.to iconevery.to

Nick Babich on agents in UX Planet. A useful pair to his earlier writeup on Claude skills, since the two words get used interchangeably and they are not the same thing. Babich opens with the plain-language version:

Think of an AI agent as a program you run when you need to solve a particular problem in design. For example, you can create an AI agent that helps you with usability testing, code review, UI/UX audit, etc.

A program you run is the right mental model. A skill, the way Babich described it in his earlier piece, is a recipe: a markdown file Claude reaches for when a task matches. An agent is what runs once Claude has the recipe in hand. It carries state across steps, picks tools, reports back.

Babich’s four attributes of a well-designed agent get at that distinction without saying it out loud:

  1. Good clarity (intent alignment). A strong agent understands what success looks like, not just the task. This understanding helps it translate vague prompts into clear objectives.
  2. Context awareness. Good agents maintain and use context effectively. Not only do they remember previous steps, constraints, and user preferences (which is well-expected behavior nowadays), but they also adapt output based on the environment (tools, data, stage of workflow).
  3. Tool orchestration. Agents can perform the workflow autonomously, and the ability to use the right tools for the task at hand is what makes an agent so powerful. Well-crafted agents can chain tools together into workflows, and they don’t overuse tools when simple reasoning is enough.
  4. Explainability (transparent reasoning). When you interact with an AI agent, you need to understand why something happened. Thus, an AI agent should provide a rationale behind decisions, and surface assumptions and trade-offs.

Context awareness and tool orchestration are what separate an agent from a prompt template. A skill can ship intent alignment and explainability in plain markdown, but state across steps and the ability to chain tools require a runtime. That’s why Babich’s specs include Boundaries sections and “When Not To Use It” blocks: a stateful, tool-using program needs guardrails that a one-shot prompt does not.

If you haven’t built one yet, his five specs—Research Synthesizer, Competitor Intelligence, Problem Definition, Idea Generation, UX Flow Designer—are a clean starter pack. Pick the one closest to a workflow you already do by hand, and notice how much of the spec is about what the agent will not do.
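A spec with a refusal gate is the part worth sketching. Every field name and keyword below is my own illustration of the “Boundaries” and “When Not To Use It” blocks Babich describes, not content from his article.

```python
# Hypothetical spec for a Research Synthesizer-style agent.
SPEC = {
    "name": "research-synthesizer",
    "does": "cluster interview notes into themes with supporting quotes",
    "boundaries": [
        "never invent quotes that are not in the source notes",
        "never recommend a product decision",
    ],
    # Inputs this agent should refuse; illustrative keywords only.
    "refuse_keywords": ["survey", "analytics"],
}

def should_run(task: str, spec: dict) -> bool:
    """The guardrail: a tool-using agent refuses work outside its spec."""
    task_lower = task.lower()
    return not any(key in task_lower for key in spec["refuse_keywords"])
```

This is the distinction in miniature: a one-shot prompt answers whatever it is given, while an agent with a spec can decline the task before touching a tool.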

3D illustration of an orange robot head with a maze inside its open skull, glowing circuit lines extending outward to orange cube nodes.

Agentic Product Design

5 design tasks you can automate with AI today

uxplanet.org iconuxplanet.org

Tommy Geoco’s $13,100 OpenClaw harness, ninety days in, is one way to build a personal AI agent. Anton Sten went the other way. He tried OpenClaw and Hermes, found the setup was “days, sometimes weeks, for minutes of return,” and built something smaller. Five Claude Code instances on a Mac mini, named after Suits characters, each handling one role. Architecture is a shared repo and a pile of markdown files. That’s it. Most AI-agent posts pitch what Sten calls “a team of bots that runs your business while you sleep.” His basement firm is the inversion.

Sten on what he actually wanted from his agents:

What I actually wanted was smaller. A handful of tools, each with a narrow job, that I could build in an afternoon and shape around how I actually work. So that’s what I did.

The names of his AI agents are from the show Suits (with Wendy borrowed from Billions), picked so the show’s personalities double as memory aids for each agent’s job. Harvey handles contracts and pricing. Donna takes Harvey’s notes and drafts the emails and follow-ups. Mike stores what Sten would otherwise forget. Louis worries about money. Wendy reads the others’ logs and points out where they’re slipping.

Sten on the autonomous-revenue pitch:

The team in my basement isn’t running anything autonomously. They don’t make decisions for me. If I unplugged the Mac mini tomorrow, my business would keep running. The conflation in the current AI conversation — between playing and building a thing that prints money — is the part I find a bit tiring. They’re treated as the same activity, when they’re almost opposites.

Sten’s right that the autonomous-revenue pitch is a fantasy. Less right on the binary that follows. Geoco’s harness is doing meeting prep, ingesting his survey research, and distributing his content across ten platforms while he sleeps. That counts as “running while you sleep,” and his $50,000 in sponsorship revenue from one survey project isn’t trivial. Play and revenue can sit on the same side. What matters is whether the human stays in the loop. Geoco does, and so does Sten.

The shape of what they’re building is also the same. The Harvey-to-Donna handoff Sten uses most and Geoco’s survey-prep loop are both the specialization-is-the-whole-game pattern: narrow specialists, human in the loop, work compounding into the system. Sten calls it play and Geoco calls it work. The architecture underneath does the same job either way.

Sten on practice:

I’d argue this is the business case for designers right now. Not the agents specifically — the playing. Because in a year or two, every job worth having is going to assume you understand how these tools work, and the only way to understand them is to spend time in them when nothing’s on the line.

The people who’ll do interesting work with this stuff in two years are the ones playing with it badly today.

Geoco is what Sten’s last sentence predicts. The person playing badly today is the person doing interesting work in two years. Sten describes that person as hypothetical. Geoco isn’t.

The basement firm

There’s a Mac mini in my basement running a small consulting firm. Five employees, all named after TV characters, none of them human. They take notes, write drafts, remember things I’ve forgotten, argue with my financial instincts, and occasionally tell each other to do better.

antonsten.com iconantonsten.com

Tommy Geoco spent ninety days and $13,100 tinkering with OpenClaw. His agent runs his capture loop, prepares his meetings, codes the survey for the state-of-prototyping report his studio shipped, and distributes his content across ten platforms. Geoco describes the harness like this:

When you install OpenClaw, it is like a starter kit project car. It is a car frame with a swappable engine. The engine being any AI model you choose to use. It is basically a folder that you install onto your computer that contains about seven markdown files. […] When you stop thinking of a custom agent as just a chatbot and start thinking of it like an operating system, some useful questions are going to start to pop up like where does the memory live? What is the source of truth? How do I enforce my rules better? What should stay manual?

The seven files are plain text. soul.md holds the agent’s voice and judgment, agents.md defines permissions, memory.md handles long-term recall, and four others cover identity, the user, tool instructions, and a heartbeat. Geoco layers an Obsidian vault on top as long-term knowledge and Slack as the chat surface. Geoco on what actually limits an agent:

The agent’s limitations aren’t just about the model. They’re a lot more about the system that you have built around it because you can’t control the quality of the model, but you can control the quality of the system. […] The most important part of my setup is the knowledge vault. This is my alternate memory, and it is built around the work that I actually do.
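The folder-of-markdown architecture is simple enough to sketch. The names soul.md, agents.md, and memory.md come from the video; the remaining four file names and the assembly logic are my assumptions, not OpenClaw’s actual code.

```python
from pathlib import Path

# Assumed file layout for a Geoco-style harness folder.
HARNESS_FILES = ["soul.md", "agents.md", "memory.md", "identity.md",
                 "user.md", "tools.md", "heartbeat.md"]

def build_context(harness_dir: Path) -> str:
    """Stitch the harness files into one system context, in a fixed order."""
    parts = []
    for name in HARNESS_FILES:
        path = harness_dir / name
        if path.exists():
            parts.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(parts)
```

The point of the sketch is how little machinery there is: the “operating system” is an ordered concatenation of text files, and the model that reads it is the swappable engine.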

Geoco says curation is what keeps the whole thing from drifting. The agent runs the loops on top of a vault Geoco curates, and the taste lives with him; the model itself is interchangeable. The challenging part is somewhere else entirely:

The most challenging part of this whole thing is the unlearning. Many of us have old habits that have calcified into our brain. It is why my 17-year-old is able to run laps around us. He has no baggage about how things are supposed to work.

Geoco is right that the unlearning is where the difficulty lives. The harness is just markdown and the model is rented; the orchestration skill Benhur Senabathi described as what designers actually picked up in 2025 is what you practice through the unlearning. Geoco closes the video by saying nobody’s harness is right and everybody’s works for them, which sounds about right to me too.

How I Built an AI Agent That Designs Like Me

This is a practical breakdown of what an OpenClaw agent is, and how I use it for my design and media studio.

youtube.com iconyoutube.com

George Anders, in the Wall Street Journal, makes the case that the 1920s offer a usable template for the AI decade. His strongest evidence is the spillover-jobs data:

By 1930, more than 80,000 people were working as electricians, a profession that hardly existed a decade before. Census data also showed that 168,000 people were working in rubber factories, most of them making tires to accommodate Detroit’s booming production of cars, trucks and buses. Another 450,000 people were building roads, bridges and other structures needed by the ever expanding auto industry.

The ATM parable had the same problem: the version that ends in 2010, with bank-teller employment intact, is the one we love to retell. The version that ends in 2022, with teller jobs cut in half by the iPhone, is the one we leave out. Anders’s 80,000 electricians are real. So is the question of which of them got displaced when the next technology arrived.

Anders does, to his credit, take the costs seriously. He spends a section on the radio fight:

In 1927, H.G. Wells, the British author and intellectual, called radio “inferior” entertainment that should be listened to “only by the sick, the lonely and the suffering.” David Sarnoff, general manager of Radio Corp. of America, shot back that he was trying to improve “the happiness of the nation” by delivering popular music to millions of people. Nearly a century later, that same argument still flares, though now it is more likely to involve TikTok, Reddit or YouTube, instead of dear old radio. The doubters always have a point; with the passage of time, the innovators usually win out.

The early evidence on AI’s job-creation side is thinner than the 1920s comparison flatters: Anthropic’s own researchers find a 14% drop in the job-finding rate for 22-to-25-year-olds in exposed occupations since ChatGPT launched, even as overall unemployment holds. The new electricians of our decade may exist. They just may not be the people getting hired right now.

The safety side of Anders’s case is the one I want to see more of. Cars in 1920 killed at twenty times today’s per-mile rate, and the country chose not to live with that:

Auto safety got better, too, with both industry and government taking action. Better mirrors, better brakes and shatterproof windshields became standard. Cities such as Los Angeles and Detroit installed red-yellow-green traffic lights that governed drivers’ actions on busy streets. New Jersey became the first state to insist on driver’s licenses, with the state’s motor-vehicle commissioner in 1924 declaring: “It is an absolute necessity to do this in order to conserve human life.”

Whether the next century treats our decade as kindly depends on whether we put rearview mirrors and traffic lights on AI before the death rates make us, and whether we do it under the same kind of duress the 1920s did.

Vintage black-and-white photo of an early automobile displayed in a storefront window with bold striped decorations and a sign reading "Auto Show Jan. 19-25 Auditorium Milwaukee.

What the 1920s Can Teach Us About Surviving the AI Revolution

(Gift link) A century ago, cars and radio upended society just as AI is doing today.

wsj.com iconwsj.com

Jake Albaugh wrote a piece on X called “Design is the work” that splits design from the artifacts it produces. Mocks, prototypes, screens, guidelines: those are outputs. Design itself, in his telling, is the upstream act of intent: figuring out what something should be and why, before anyone makes it. Bingo. That distinction matters now because AI is very good at the artifact and unable to do the deciding:

AI cannot do that part. You intend to do something that has not yet happened. You have to bring those parameters to the table to do anything novel. AI doesn’t know your constraints. It doesn’t know your strategy. It doesn’t know what moment in the market you’re in, what your team is trying to prove, or what your customers actually need versus what they’ve said they want. The expectation — the definition of what good looks like — is something only you can provide. AI’s job is to meet that expectation. Not to define it.

The piece makes the case that intentionality has to come before execution, and that AI changes neither requirement. The closer is where it gets interesting. After all that, Albaugh tells the reader he used AI to draft the essay:

It may surprise you to learn that I used AI to write this. The structure, the sentences, a lot of the phrasing — generated. But the argument existed before any of it. I knew what I was trying to say. I knew what examples mattered and which ones were wrong. I knew when a paragraph was close but not quite right, and I revised toward a target I’d already defined. […] That’s the point. The tools changed. The work didn’t. Design is the process. Design is the intentionality.

It’s a risky reveal. Most readers will read it as self-undermining at first. But the argument and the artifact are doing the same job: Albaugh had a target, and he used AI to reach it. The fact that the prose was generated is exactly why it matters that the argument wasn’t. He knew which examples belonged in the piece and which ones to throw out. The model couldn’t have known that either way, because the criteria for “good” didn’t exist anywhere outside his head until he wrote them down.

Karri Saarinen made a version of this same split when he argued that output isn’t design. The hard part is understanding the problem well enough to know what should exist at all.

A presenter stands on stage in front of a green slide reading "What should be automated? What should be left to touch?

Design is the work.

We’re in a moment where it has never been cheaper or faster to build something convincing. The cost of taking an idea and making it look real, feel functional, or seem finished has collapsed. That is genuinely good news if you already know what you’re building and why. It’s dangerous if you don’t.

x.com iconx.com

You’ve seen it in all the photos from various No Kings protests. The most-shared peace poster of the year did not start with a client. Daniel John, writing in Creative Bloq, traces Warsaw calligrapher Barbara Galińska’s two-tone “STOP WAR” piece. Galińska, in an artist statement quoted by John, describes where it came from:

My graphic “STOP WAR!” was created as a result of an international calligraphy challenge in November 2023, the main goal of which was to stimulate the creativity of artists. My reaction to the assigned theme “Stop war!” was to move away from typical calligraphy towards a powerful work that addresses the global problem of war. So my main personal challenge was to find a new, original solution to the well-known phrase “Stop war!” and transform it into a graphically powerful universal symbol for peace.

Bold typographic print reading "STOP WAR!" in red letters on a black background, signed by Barbara Galińska, numbered 1/25, displayed against a concrete wall.

Who’s behind the striking ‘Stop War’ poster that’s all over social media

The iconic typographic design is striking a chord.

creativebloq.com iconcreativebloq.com

Matt Ström-Awn, writing on his personal site, picks up a three-year-old line from Ted Chiang and turns it inside out:

Three years ago, Ted Chiang described ChatGPT as a blurry JPEG of the web. LLMs are a lossy compression of their training data, which is itself a lossy sample of all the data available to it. But the artifacts we see in AI slop aren’t in the compression. They’re in the decompression.

Every AI-generated output is an extrapolation from that blurry source, vectored toward your prompt, filling in plausible detail where the compression threw information away. The output gets inflated into blog posts and LinkedIn thoughtspam, software platforms, omnichannel advertising campaigns, and movie cameos from dead actors. Chiang compared the gaps and confabulations to compression artifacts.

I think they’re expansion artifacts.

Chiang had the compression metaphor; what we needed was a word for what these tools do on the way back out, and Ström-Awn gave us one.

Ström-Awn lists what expansion artifacts look like across modalities:

  • LLMs produce text stuffed with hedging verbs and fuzzing adjectives (delve, intricate, tapestry, multifaceted). Their paragraphs are structured as miniature essays with setup, payoff, and a signposted takeaway (This matters because…).
  • AI-generated code over-comments the obvious and creates error handlers for operations that can’t logically fail.
  • Image generators have had their own tells: six-fingered hands, symmetrical-but-stylistically-objectionable jewelry, text that looks like text but only if you cross your eyes.
  • Video models struggle with continuity. Limbs appear and disappear, objects clip through each other, and physics sometimes just switches off.

Each of these artifacts is the training distribution leaking through where the model’s confidence runs thin.

Ström-Awn writes about the designer-specific tells too:

Power users of AI website generators (AI-pilled designers) already know how to recognize the tool marks, if only to try to prompt them away: purple gradients are an especially common tell. But as more and more non-designers use tools like Claude Design to prompt their way to fully-functional software products, I expect to see a preference for the aesthetic convergence endemic to the current crop of AI models.

Matt Ström-Awn website header showing the page title "Expansion artifacts" in large bold text on a white background.

Expansion artifacts

Matt Ström-Awn · Designer, leader, and coach focused on building exceptional products and teams.

mattstromawn.com iconmattstromawn.com

Cat Wu, Anthropic’s Head of Product for Claude Code, describes the hiring filter on her team in her interview with Lenny Rachitsky:

I think all of the roles are merging. PMs are doing some engineering work. Engineers are doing PM work. Designers are PMing and also landing code. You can either hire a lot more engineers who have great product taste, or you can keep your engineering hiring the same and hire a lot more PMs to help guide some of their work. On our team, we’re pretty focused on hiring engineers with great product taste. This way we can reduce the amount of overhead for shipping any product. Like there are many engineers on our team who are fully able to end to end go from see user feedback on Twitter through to like ship a product at the end of the week with almost no product involvement. And this, I think, is actually like the most efficient way to ship something. So I think like engineer and PM are kind of overlapping and you will get a lot of benefit from having more of either. I think product taste is still a very rare skill to have and we’ll pretty much hire anyone who we feel has demonstrated this strongly.

This is what the Full Stack Builder pattern looks like as a hiring filter. The headline is the merging of roles. Wu’s own background says where the bench comes from:

Yeah, I was an engineer for many years. I was then a VC very briefly before joining Anthropic. And actually almost all the PMs on our team have either been engineers or ship code here on Claude Code. And so that’s one of the things that I think helps build trust with the team and also just enables us to move a lot faster. And then actually our designers also have been front-end engineers before.

So to be clear, Wu doesn’t say that the roles have merged, but what she’s describing is the continued blurring of lines.

How Anthropic’s product team moves faster than anyone else | Cat Wu (Head of Product, Claude Code)

Cat Wu is Head of Product for Claude Code and Cowork at Anthropic, building one of the most important AI products of this generation. Before joining Anthropic, Cat spent years as an engineer and briefly worked in VC. Today, she’s interviewing hundreds of product managers who are trying to break…

youtube.com iconyoutube.com

Maggie Appleton, staff research engineer at GitHub Next, wrote up her recent talk on agentic AI productivity. (Video here if you’d rather watch.) Her central claim comes early:

I call it this “one man, a two dozen claudes” theory of the future. The pitch here is that one person with a fleet of agents will do the work of an entire team of developers. The main problem with this dream is it assumes software is made by one person. All these tools are single player interfaces. […] Software is not made by one person in a vacuum. It’s a team sport. Everyone building it needs to agree on what they’re building and why.

The single-player critique is the missing piece in most AI productivity takes. Most demos of a coding agent show one engineer at a terminal. Designers face the same situation with AI prompt-to-code tools. Collaborating isn’t as easy as sharing a Figma link. That’s the actual gap in current tooling, and it’s downstream of the single-player assumption.

Appleton’s second move:

Implementation is rapidly becoming a solved problem, right? Writing code is now fast, it’s getting cheap, and quality is going up and to the right. The hard question is no longer how to build it. It’s should we build it. Agreeing on what to build is the new bottleneck. […] When production is cheap, opportunity cost becomes the real cost. You can’t build everything, and whatever you pick comes at the cost of everything else.

When production is cheap, picking what to make becomes the whole job. The cost difference between two engineering paths is now nearly zero, so the choice between them carries all the weight. Teams that miss this will end up shipping volume and mistaking it for productivity.

A talk like this could be about tooling, and Appleton does walk through Ace, GitHub Next’s prototype multiplayer workspace, in some detail. But the more important argument is about what you do with the hours you free up. Going faster is not the prize. Appleton:

We have an opportunity to not just go faster and build a giant pile of the same crappy software. But instead to make much better software through more rigorous critical thinking and better alignment in the planning stage. By doing more exploration, more research, and thinking through problems more deeply than we could have before.

The reclaimed hours are an opportunity, but they are also a test. Do you spend them shipping more, or do you spend them shipping better? The first answer gets you the giant pile. The second takes work the agents cannot do for you.

Appleton closes on craft:

Many people are now realising that in a world of fast, cheap software, quality becomes the new differentiator. The bar is being set much higher. Craftsmanship is what will set you apart from the vibe-coded slop. But craft still costs time and energy. It is not free, and in order to buy the time and energy for it, you need to do fewer things better, which requires strong alignment.

Title card for "One Developer, Two Dozen Agents, Zero Alignment" — a talk about collaborative AI engineering and a tour of Ace, the multiplayer coding workspace.

One Developer, Two Dozen Agents, Zero Alignment

Why we need collaborative AI engineering and a tour of Ace: the multiplayer coding workspace

maggieappleton.com iconmaggieappleton.com

Andy Matuschak describes two accidental tyrannies that have shaped software for forty years: the application model that traps software in one-size-fits-all packages, and programming as a specialization that crowds out non-programmers from inventing interfaces. He thinks coding agents could break both, and he’s already seeing it happen with the designers he works with:

I’ve been seeing it. I spent 2025 collaborating with two talented designers. Their story with coding agents this past year has been truly wild. I think the impact on my collaborators has been much greater than the impact on me, despite the fact that I’m now building at perhaps ten times the speed.

Unlike me, these two started their careers in design and spent their formative years in the arts culture. They can program a bit, but the process was really slow and difficult enough to pose a significant barrier. At the start of 2025, coding models could implement small one-off design ideas—but their outputs would just fall apart after a couple of iterations. By the end of the year, my collaborators were routinely prototyping novel interface ideas and sustaining that iteration across weeks.

“The impact on my collaborators has been much greater than the impact on me.” Matuschak is moving ten times faster, and he still thinks his designers are the ones whose careers just turned over. That observation rarely comes from the person who received the bigger gain in raw output.

Matuschak’s diagnosis of why the old arrangement was such a trap for designers:

Non-programming designers are trying to invent something in an interactive medium without being able to make something meaningfully interactive. So much of invention is about intimacy with the materials, tight feedback, sensitive observation, and authentic use. So it’s a catch-22: to enter into proper dialogue with their medium, a non-programmer needs to get help from a programmer. That generally requires the idea to be at least somewhat legible and compelling. But if they’re doing something truly novel, they often can’t make it legible and compelling without being in that close dialogue with their medium.

The old design-engineering separation trapped designers in a less obvious way. They often couldn’t even tell whether their ideas were brilliant, because they couldn’t get their hands on the material to find out. You can’t iterate on a feeling. You have to push something around until it pushes back. For most of my career, designers did that pushing in flat mockups and click-through prototypes, working through dynamic behavior they had never actually felt. Of course the technical ideas fell short. The designers themselves hadn’t felt the thing yet either.

That’s the asymmetry coding agents collapse. The loop between “I have an inkling” and “I am tinkering with a working version of the inkling” has finally closed for non-developers. They still can’t and mostly shouldn’t ship production code, but they don’t need to. The prototype is enough to do the design work. Once the gatekeeping melts, the next question is institutional: where does the next generation of interface inventors come from? Matuschak’s answer:

So, what now? We’ve spent decades building HCI programs that mostly look like computer science departments with design electives. But if we’re moving toward a world where invention is bottlenecked more on imagination than on technical expertise, we may have that backwards. We may need programs that look a little more like art school with technical electives—learning to develop ideas from intuition before being able to express them precisely, to discover by playing with the material.

Title slide and content page from Andy Matuschak's MIT HCI Seminar talk "Apps and programming: two accidental tyrannies" dated 2026-03-03, showing a table of contents and lecture notes.

Apps and programming: two accidental tyrannies

On coding agents, malleable software, and the future of interface invention

andymatuschak.org

Humans are the bread in the sandwich, and the AI is in the middle.

That’s Dan Shipper on his podcast AI & I, talking with Every’s Kieran Klaassen, the engineer behind the compound engineering plugin. They’re working out where humans actually belong in an AI-driven workflow. It’s the same split showing up on the design side.

Klaassen, on the polish step at the end of the work:

The other moment comes at the end. Something comes out. How do you validate it? Well, it’s already tested—browser automated testing has clicked through everything, all the requirements are clearly specified, and it says everything works. But the beauty comes in when a human looks at it, clicks around, and has a feel for it: “Oh, this doesn’t feel right. We can polish it. We can make it better. There’s something still missing. We can make the design better.” […] all the way at the end, when everything is done, you can elevate everything and make it even better. And I think we need to do that, because if we don’t, it will all be slop—all the same. It’s very important to make it feel great because the bar is high, and the bar will always get higher.

“It will all be slop” is the line every team should have taped to a monitor. A passing test suite and a green PR don’t tell you whether the thing is actually any good. That judgment still lives with a human at the end of the workflow. Klaassen is correct that the bar keeps moving up, not down, and the teams who treat the polish step as optional are the ones whose products will look interchangeable in twelve months.

Klaassen, on the art-and-ownership argument:

But I do think that in the end, if you ship something—if you make a statement in the world—and you want it to be your own, you have to say yes or no at some point. You cannot fully automate everything. It’s a bit like making art. If you want it to be yours, it needs to come from you or somehow be connected. So I believe having those moments where you decide—where you choose what you enjoy—is so important. That’s why it’s so important to do things you enjoy and love.

Whatever your version of beautiful is, that’s the bread. Everything else is filling.

Cover art for "AI & I" podcast by Every, featuring a smiling man with glasses rendered in gold tones against a purple background.

The AI Sandwich: Where Humans Excel in an AI World

‘AI & I’ with compound engineering creator Kieran Klaassen

every.to

Karri Saarinen, Linear’s co-founder, calls out the confusion that most of the new design tooling is built on top of:

Design keeps being misunderstood in our industry. New tools keep promising to generate interfaces faster, move words to product instantly, or collapse design directly into code. The assumption behind them is clear: that design is the act of producing. That is the misunderstanding. The hard part of design is rarely generating the form. It is understanding the problem well enough to know what and how something should exist at all.

What I appreciate about Saarinen’s argument is that he doesn’t stop at the diagnosis. He reaches for Christopher Alexander’s Notes on the Synthesis of Form and recovers a vocabulary term the industry has been missing:

Christopher Alexander came closer than anyone to naming this clearly. In Notes on the Synthesis of Form, he describes design as the search for a good fit between a form and its context. Context, in his sense, is not a background condition. It is the full set of forces that make a problem what it is: human needs, technical constraints, conflicting requirements, habits, edge cases, and relationships that are easy to miss until you spend time with them. Bad design appears where those forces remain unresolved. Good design appears where those misfits have been worked through carefully.

Context as forces, not background. The current generation of prompt-to-code tools, including Lovable, Figma Make, and Claude Design, is very good at producing a plausible form against a thin slice of context. Saarinen describes the symptom directly:

You can already see the result in products that look polished, ambitious, and impressive at first glance, but begin to unravel the moment you actually use them. They feel brittle, poorly integrated, and full of decisions that were never fully worked through. The form is there. The fit is not.

That same bottleneck shows up on the workflow side: production speeds up, judgment doesn’t.

Saarinen’s closer:

The risk is mistaking generated form for solved problems.

That is the mistake to watch for, in your own work and on your team. Design is what happens when someone takes the time to understand the forces and works the misfits out of the form.

Loose, expressive ink and wash sketch of an abstract architectural structure with dense crosshatching and gestural line work.

Output isn’t design
x.com