
151 posts tagged with “process”

Dan Carey leads product at Anthropic Labs, the team behind Claude Code and Claude Design. In a talk on how a three-person team shipped Claude Design in ten weeks, he describes what happened to everyone else after their engineers got fast:

And so once Claude Code took off, the bottleneck moved. The bottleneck moved from building the feature to figuring out the right things to be building for your users, in a lot of cases. So the option was either skip those early steps, just try and decide on the fly, and potentially build the wrong thing really fast, or try to find ways for the rest of us to speed up. So our designers, our PMs, were having trouble keeping up. We needed our own accelerator tool.

Carey’s account moves the bottleneck onto the exact work designers and PMs own: figuring out what’s worth building. That’s product discovery becoming the real constraint. When building gets cheap, what’s left to get right is the decision about what to build at all.

How does the team make that call? Not by writing it down:

So we like to use prototypes because documents are imprecise. It’s so easy for two people to look at the same doc and have two different products in mind about what the experience should be. […] Prototypes are more concrete, more visceral. They let you get hands on with the thing and really feel the experience yourself.

They skipped the PRD and the vision docs entirely. A working prototype immediately aligns people, and it doubles as the discovery tool: you build the rough thing to find out what the right thing is.

And it helped that the team was small enough to skip coordination entirely. Here’s Carey:

Everyone on the team does everything. The engineers talk to users, PMs write code, designers do data analysis. All of these things are enabled in part with Claude. And the lines between the roles on this team, they have essentially dissolved at this point. You do have your specialization, you do have the unique perspective and diversity that you bring to a team, but at any moment, any one of these people on this team can talk to 10 users, you can realize what the underlying problem is, you can design a solution to it, you can ship it to users, you can listen for feedback, you can keep iterating solo if you need to.

On Carey’s team, the designer who spots the problem also builds the solution and ships it. That’s the kind of role a lot of designers are now being asked to grow into, and it looks less like a handoff between specialists than one person carrying an idea from problem to finished screen.

Speed doesn’t guarantee you build the right thing, though, and Carey is candid about the team’s misses. They built a set of advanced, fine-grained controls for power users. A few vocal testers loved them—I know I would have. But the usage showed everyone else hated them, and the team pulled the controls in a week. Two lessons came out of it:

So this taught us a couple of things. One, this taught us that we should be a tool that lifts the level of craft for everybody, not just the ceiling on power users. It also taught us that we want to be as open as possible, because there will be users that we never meet the full needs of. There’s going to be some power user out there who wants to do something very specific that we’re not going to support. And that’s what convinced us that we wanted this to be a very open tool. That’s why if you export from it, you get HTML, CSS, JavaScript.

Designing with Claude: From prompt to production

Claude Design lets you describe what you want in plain language and get production-quality outputs. Learn how a small team built a design tool that ships in your brand, from prompt to production.

youtube.com

Addy Osmani makes a clean separation that most of the “is AI making us dumber” discourse keeps glossing over. He reports on Anthropic’s randomized trial of engineers learning a new Python library:

Engineers who used AI to ask conceptual questions scored above 65%. Engineers who copy-pasted the generated code scored under 40%. The tool didn’t determine the outcome. The posture did.

Osmani is writing for engineers, but most of that translates to designers picking up Figma Make, Lovable, or v0. Ship-without-comprehension scales beautifully right up until the moment you have to debug, redesign, or defend a choice you didn’t really make.

He ends on a ritual any designer can adopt verbatim:

I’ve started ending coding sessions with a simple question: did I learn anything today, or did I just close tickets? Sometimes the honest answer is “I just closed issues” and that’s fine. If it becomes the answer for months in a row, cognitive debt is accumulating in the background. Ship and learn are two separate metrics.

Workslop is the companion failure mode: where skipped learning costs your future self, workslop pushes the cost onto your coworkers.


Don’t Outsource the Learning

Right now, it’s too easy to let AI write the code while you skip the learning. The bug gets fixed. Your mental model doesn’t move. We are silently trading future capability for present-day speed.

addyosmani.com

The artifact-to-intent argument has been working its way through design writing for a while now. What Jakob Nielsen adds to it, writing in UX Tigers, is a name for the failure mode that comes with the territory:

We used to accumulate design debt when teams shipped inconsistent components or patched over poor flows. Now we will accumulate intent debt: undocumented assumptions, vague brand guidance, missing escalation rules, untested agent permissions, and research insights that never become usable by the systems doing the work. Intent debt will be harder to see than visual inconsistency, but it will be more damaging because it compounds invisibly through every generated output.

Nielsen’s prior writing on intent-based UX argued that evaluation has become the new bottleneck for the user. A chat completes the task in seconds, and you spend the next half hour checking whether it actually did what you meant. Intent debt extends that bottleneck to the organization. The team ships ten variants in an afternoon, and nobody can tell which ones violated a brand rule that was never written down, or bypassed an escalation path that only lived in a senior designer’s head.

Nielsen puts the failure plainly:

The new danger is that AI will produce many adequate screens that all seem defensible in isolation and incoherent in aggregate. Mediocrity will arrive well-dressed. The designer’s role is to prevent the organization from drowning in plausible options.

Which is why the design system has to grow up:

The design system thus stops being a component library and becomes an operating system for taste. Tokens, components, and usage rules are only the visible layer. Underneath must be a deeper set of instructions about brand behavior, interaction philosophy, accessibility standards, motion logic, content tone, escalation patterns, and product judgment. The system must know not only which button to use, but when not to add a button at all.

Developer Mark Anthony Cianfrani has argued that LLMs finally let us ship the reasoning behind a token alongside the token. Nielsen draws the consequence of skipping that work: a weak design system in the AI era becomes an active liability. Agents will faithfully build with whatever’s encoded, and faithfully invent the rest.


Design Changing from Artifact-Production to Intent-Shaping

AI is changing the object of design itself. The UX profession’s most valuable contribution stops being UI production and becomes the design of intent: defining what good means, encoding judgment into live systems.

uxtigers.com

Last week I linked Ravi Mehta on the three layers of context engineering for AI prototyping: functional spec, visual wireframe, structured data. Karo Zieminski, an AI PM writing Product with Attitude, makes the same case at the product scale and cites Mehta directly. Mehta wrote about prototyping one screen; Zieminski writes about designing the whole product around an agent.

Zieminski puts it in one line:

Prompt engineering is deciding what and how to ask the model. Context engineering is deciding what the model knows when it answers.

Then the asymmetry:

A well-crafted prompt in a poorly engineered context still fails. A poorly crafted prompt in a well-engineered context often succeeds.

That asymmetry is the argument for treating context as the underlying system.

If that asymmetry is real—and a year of using these tools tells me it is—then most teams are still optimizing the wrong layer. The visible artifact is the prompt. The work that actually decides the output is everything around it.

The piece I want to underline is who owns the work:

PMs define what goes in each context layer. Engineers build the infrastructure to fetch and store it… If the PM isn’t doing this, one of two things happens. Either an engineer makes the product decision by default, or nobody makes it and the agent gets every available signal dumped into the window.

Zieminski calls skipping this work abdication. I think she’s right, and I also think most PM job descriptions in 2026 haven’t caught up. The hiring filter still selects for ticket-shaping and roadmap maintenance, not for “decide what the model should know about the user, what should age out, what should never get re-fetched.” Those are product decisions about how memory is organized, and the people best positioned to make them—PMs who understand the product and the user—are often the ones least equipped to talk about retrieval and eviction. The gap is one of vocabulary and authority.

Both write for PMs, but the work is also design work. The context an agent sees is a designed surface: what gets included, what gets hidden, what should age out, what should persist between sessions. Mehta’s three-layer brief (a functional spec, a twenty-minute Figma wireframe, and real data as JSON) is daily prototyping practice for designers working with agents now. Zieminski’s architecture is the system those prototypes live inside. If designers don’t show up here, PMs and engineers will design this surface for us.
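To make “designed surface” concrete, here is a minimal Python sketch of those four decisions expressed as data. It assumes a hypothetical `ContextItem` record with a layer name, an optional time-to-live, and a visibility flag. Neither Zieminski nor Mehta specifies a schema, so every name here is mine, for illustration only, and a real agent stack would fetch and store this very differently.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ContextItem:
    """One thing an agent might know (hypothetical schema, for illustration only)."""
    layer: str                 # which context layer it belongs to, e.g. "product", "user", "session"
    text: str                  # the information itself
    created: datetime          # when it entered the store
    ttl: Optional[timedelta]   # None = persists between sessions; otherwise it ages out
    visible: bool = True       # False = a deliberate decision to hide it from the agent


def is_live(item: ContextItem, now: datetime) -> bool:
    """An item is live if it never expires or its time-to-live has not elapsed."""
    return item.ttl is None or now - item.created < item.ttl


def build_context(items: List[ContextItem], now: datetime, layer_order: List[str]) -> str:
    """Assemble what the agent sees: drop hidden and aged-out items, group them by layer."""
    sections = []
    for layer in layer_order:
        lines = [i.text for i in items if i.layer == layer and i.visible and is_live(i, now)]
        if lines:
            sections.append(f"## {layer}\n" + "\n".join(f"- {t}" for t in lines))
    return "\n\n".join(sections)


# Usage: four candidate items, only two of which should reach the agent.
now = datetime(2026, 1, 15)
items = [
    ContextItem("product", "Never send email without user confirmation", datetime(2025, 1, 1), ttl=None),
    ContextItem("user", "Prefers dark mode", datetime(2025, 6, 1), ttl=None),
    ContextItem("session", "Was editing the pricing page", datetime(2026, 1, 1), ttl=timedelta(days=7)),
    ContextItem("user", "Full billing history", datetime(2026, 1, 14), ttl=None, visible=False),
]
window = build_context(items, now, ["product", "user", "session"])
print(window)
```

Every field in that sketch is a product call rather than an infrastructure one: which layer an item belongs to, how long it lives, and whether the agent sees it at all.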


An Illustrated Guide to Context Engineering, Prompt Engineering, and The Future of Both

Karo Zieminski, an AI PM writing Product with Attitude, draws the line between prompt engineering (what you ask) and context engineering (what the model knows when it answers). She argues PMs—not engineers—own the context architecture.

karozieminski.substack.com

Most agent-velocity hype rests on one premise: that writing code was the slow part. .txt, the team behind the structured-generation library, takes a saw to that assumption. The point goes back to two foundational software-engineering texts—Fred Brooks’s The Mythical Man-Month (1975) and Gerald Weinberg’s The Psychology of Computer Programming (1971)—and .txt puts it like this:

Software is what’s left over after a group of humans finishes negotiating with each other about what the system should do. The code matters, but it is the residue of the harder work, not the work itself.

Code as residue. That inversion reorganizes the whole conversation. The tools and processes we’ve built around software for fifty years—IDEs, wireframes, mockups, code review, even pair programming—have been about lowering the cost of producing the residue. Once that cost approaches zero, what’s left to slow you down is the negotiation underneath. And that negotiation has not gotten any cheaper.

What that layer actually consists of, in practice:

What slows down a team where agents do the implementation is the production of specifications precise enough for an agent to pick up and run. Roadmap, written down. Acceptance criteria, written down. The “what we actually want” forced into precision, be it via a test suite, a ticket, or a written design.

The bottleneck moves from people writing code to people deciding what code should exist. .txt calls that work management, and I’d put it a little wider; it’s also product, design, and anyone whose job description includes the phrase “what we’re building.” A spec precise enough for an agent is a falsifiable description of the outcome, with the trade-offs already made.

.txt on what runs underneath the spec:

Context is the commodity an organization runs on. It is the shared understanding of what we are building, why it matters, what has been tried, who decided what, what is load-bearing and what is vestigial. Humans on a team accrete it by osmosis. By being in the room, by reading the same Slack channel, by debugging the same outage at two in the morning. Most of it is never written down. When a senior engineer reviews a PR and says “this’ll break the migration,” they are drawing on context that has no document. Agents cannot do osmosis.

“Agents cannot do osmosis” is the line. Specs are the formal surface; context is what’s underneath, and teams absorb it without writing it down. The post closes here:

The companies that win the next decade will not necessarily have the best models or the best agent infrastructure. It will be the companies whose fifty people, then two hundred, then two thousand, can stay aligned on a shrinking set of decisions while shipping more output per head. They will be the ones that already knew, before agents arrived, that their hardest problem was coherence. That is a culture and management problem. Always has been.


The bottleneck was never the code

.txt revisits Brooks and Weinberg’s old observation: software is what’s left over after humans negotiate what to build. With agents writing code cheaply, the negotiation is now the bottleneck. Coherence is the moat.

thetypicalset.com

Jess Eddy reaches back to the 19th-century pessimist Arthur Schopenhauer for the distinction senior creatives may worry about:

Talent is like the marksman who hits a target which others cannot reach; genius is like the marksman who hits a target which others cannot even see.

Eddy borrows a line from Jack Grapes, the poet and writing teacher: “Make me look good, and I’ll keep you on the payroll.” That’s the trap. The longer you’ve been at it, the more reliably your talent delivers, and the more expensive it gets to walk away from what works. Most career advice says lean into your strengths. Eddy says your strengths keep you aimed at targets you already know how to hit.

For experienced designers, those targets are getting harder to find. AI is changing what counts as design work and what tools do it, and the ground under the profession is moving with it. The reflex when the ground moves is to double down on the move you’ve already mastered. But the mastered move hits the visible target. The targets that come next won’t be visible yet.

Eddy doesn’t let anyone skip the mastery step: the 5–10-year window of building the craft isn’t optional. But once you’ve put in the time, you have to walk away from the talent that made you reliable. Eddy closes with Grapes:

Talent does what it can, genius does what it must.


Genius vs. Talent: Why playing it safe holds you back

Jess Eddy on Schopenhauer’s line: talent hits a target others can’t reach; genius hits a target others can’t even see. Your reliable skills are the trap. The longer they’ve worked, the more expensive it gets to walk away from them.

everydayux.net

Nearly nine in ten organizations now use AI in at least one business function. Ninety-four percent aren’t seeing significant value from it. Gale Robins, writing for UX Collective, argues that the gap is a framing problem, not an adoption problem. Her earlier piece on discovery judgment made the same case; the new one sharpens it with an anecdote that shows the trap:

A team I spoke with recently had compressed their discovery cycle from six weeks to ten days using AI. They were proud, and the throughput was real. When I asked what the work had taught them that they did not already believe, the answer was: not much. Same questions, faster. Same answers, sooner.

Same questions, faster. Same answers, sooner. Her analogy for the wider pattern is the electric factory one I’ve used before:

When factories first installed electricity, productivity barely moved. Manufacturers replaced steam engines with electric motors and kept the line-shaft layout. The breakthrough came later, when they redesigned the factory around what electricity made possible. The technology was only part of the answer.

Robins maps McKinsey’s three waves of AI value—productivity, differentiation, transaction-cost reduction—and finds most teams stuck in the first. Here she is on where they have to go to get out:

These decisions are upstream of every artifact a team produces. They are also where AI productivity gains help least, and where human judgment compounds the most.

Robins’s evidence undersells her own thesis. She leans on Generative AI at Work—the Stanford-and-MIT customer-support study by economists Erik Brynjolfsson, Danielle Li, and Lindsey Raymond that became the canonical citation for “AI helps novices most”—to argue AI raises the floor, not the ceiling. Novices gained 34%; experienced workers, basically zero. That’s why so many designers who have never coded—like me—are now suddenly shipping with this newfound superpower. It’s the same finding behind the junior designer crisis. But LinkedIn’s Full Stack Builder rollout found the opposite: top performers adopted AI fastest and got the most out of it, because they had the judgment to know what to ask for. The floor-not-ceiling story is only true where the questions are fixed. Once the questions are the work, the pattern inverts. That’s exactly the territory Robins is mapping. If AI rewards the experienced most when the work is judgment-shaped, framing is where the gap between teams widens.


Discovery is the work AI gives back

Nine in ten organizations use AI. Ninety-four percent see no significant value. Gale Robins says the gap isn’t about adoption: teams use AI to do the same work faster instead of asking what’s worth building.

uxdesign.cc

Open four agent windows at once and the day disappears in a way that feels productive but isn’t. David Hoang, writing in Proof of Concept, puts it plainly:

At times, HITL [human-in-the-loop] agent orchestration feels addictive like Candy Crush or scrolling social media. Every prompt shows a stream of tokens and visible progress being made. You sit and wait to hit the number 2 or continue prompting. Instead of doom scrolling, you’re doom building; a sense of productivity which leaves you not doing anything else.

To be abundantly clear, I’m not against HITL and it’s a great way to build. What I’m saying is the massive productivity gains take a toll on you. I’ve shipped real work this way; being locked in for entire afternoons and evenings to prompt sessions. Sometimes I get good outputs and other times I don’t get anything valuable.

The orchestration tax is like the coordination tax at work. I’m feeling like I’m building but really air traffic controlling in parallel. You are reading partial outputs, deciding which to merge, which to discard, which to re-prompt. It’s a job, and an important one, but it’s not the deep work in design, writing, or thinking I need to do. That is a real job. It is not, however, the same job as design or writing or thinking. It uses a different part of you and it depletes a different reservoir. By the time I sit down to actually draw something or write a paragraph that matters, the reservoir is empty.

I orchestrated my way out of having anything to say.

Hoang’s analogy to the coordination tax—the meeting load that eats the day at any tech company—is exact. Watching a token stream and deciding what to keep is real work. It is not the work you sat down to do. Orchestration draws on the same reservoir that making does, and you do not feel the withdrawal until the end of the day, when you go to write the paragraph and there is nothing in the tank. Hoang’s tactical answer is to switch defaults: human-in-the-loop for the few things that benefit from your synchronous attention, human-on-the-loop for everything else, with a real review block on the calendar.

The shift is from watching to bracketing. Agents need start conditions and end conditions, not a babysitter in between.


Escape from agentic loop

David Hoang on the cognitive cost of orchestrating four agents at once: the productivity feels real, but it depletes the same reservoir you need for design, writing, and thinking. He calls it the orchestration tax.

proofofconcept.pub

When I wrote about the forward-deployed designer squad model earlier this year, I was working from the outside in: what the model should look like, who it serves, why it matters. Ron Bronson ran it for four years as director of a 40-person design division at 18F, the US government’s now-defunct in-house digital services agency. His post is the inside view, and he diagnoses why most orgs never get there:

The real reasons that design roles aren’t being considered for this is the ways orgs constrain how designers show up on cross-functional teams. If your designers are only good for handoffs, you’re not going to invest in the headcount.

The people are the key, but you have to be opinionated about what you’re looking for your designers to do. If you’re looking for pixel-perfect, portfolio polish then you’re doing it wrong. Due to the quirks of federal hiring rules, we weren’t allowed to consider portfolios. It didn’t mean we couldn’t look at them, they just couldn’t be part of the criteria someone got an offer or not.

Take the portfolio rule: federal hiring restrictions sound like the kind of constraint that makes a practice worse, and instead they forced 18F to evaluate designers on the things that actually predict forward-deployed performance—ambiguity tolerance, collaboration, low ego, willingness to work in the open. The portfolio gauntlet that dominates tech-industry design hiring optimizes for the opposite skill: producing pixel-perfect artifacts in isolation. Bronson’s team got better signal because they were prevented from looking at the worse one.

Bronson on the multidisciplinary bar:

hired designers who can do more than one thing. Some impressive UX researchers would show up on our doorstep often, and if they talked to me, I’d be very direct with them about how we worked and that our designers often had to wear more than one hat out of necessity. The other constraint? Headcount. Design often has to justify itself more than other practices, so we couldn’t afford people who were too “special” to be staffed to a broad array of partner engagements. What this meant in practice? Designers who could code, researchers with content strategy & information architecture chops, service designers who could lead and/or PM projects, and every designer being a strategist on some level.

Generalist breadth in this context is a structural requirement of the engagement. That’s what Bronson means by “wear more than one hat out of necessity.” You can’t deploy a specialist into a six-week problem-scoping sprint and expect them to be useful for more than one week of it.

Bronson on where designers should sit:

As I explained in Design as Repair at IxDA Oslo last September: we need designers embedded where problems happened, not downstream after it’s been scoped, broken and all the framing has been done and asked to execute.

Most design orgs are structurally downstream: invited in after PM and engineering have already decided what’s being built, given a brief that pre-resolves the questions design should be asking. Bronson’s 18F was built to refuse that posture by default, which is why the model worked there before it had a name.


What Forward Deployed Design Actually Looks Like

Ron Bronson on what made forward-deployed design work at 18F: multidisciplinary hiring, upstream embedding, and the organizational constraint that determines whether designers ever get invited into the room.

blog.ronbronson.com

Thariq Shehzad, on the Claude Code team at Anthropic, has switched from markdown to HTML as his default agent output format. The reasoning is more honest than a format-war argument would suggest, because it’s about what humans will actually read. He opens by acknowledging what markdown was for:

Markdown has become the dominant file format used by agents to communicate with us. It’s simple, portable, has some rich text capability and is easy for you to edit. Claude has even gotten surprisingly good at using ASCII to make diagrams inside of markdown files. But as agents have become more and more powerful, I have felt that markdown has become a restricting format.

Then the pivot:

As Claude is able to do more complex work, it is also writing larger and larger specs and plans. In practice, I’ve found I tend to not actually read more than a 100-line markdown file, and I certainly am not able to get anyone else in my organization to read it. But HTML documents are much easier to read, Claude can organize the structure visually to be ideal to navigate with tabs, illustrations, links, etc.

When the spec gets long enough that you stop reading it, you’ve quietly moved from review to rubber-stamp. Shehzad’s answer isn’t to ask Claude for shorter specs. It’s to make the artifact something a human will actually open, scroll, and share. A controllable, shareable artifact is most of what made personal computing legible in the first place; HTML is the format that already does it.

He puts the trade-off honestly when the obvious objection comes up:

While markdown often uses fewer tokens, I’ve found that the added expressiveness of HTML and the much higher likelihood of me reading it means I get overall better output. With the 1MM context window in Opus 4.7, the increased token usage is not really noticeable in the context window.

And the close is the real argument:

The real reason I use HTML is that I feel much more in the loop with Claude. I had begun to fear that because I had stopped reading plans in depth I would simply have to leave Claude to make its choices. But I am happy to say instead that I feel more in the loop than ever before when using HTML.


Using Claude Code: The Unreasonable Effectiveness of HTML

Thariq Shehzad, on Anthropic’s Claude Code team, switched his agent output from markdown to HTML, because what keeps Claude honest is what humans actually read.

x.com

My recent newsletter, “Out of Your Head, Into the File,” made the case for getting taste out of your head: writing it down so it can survive the messy middle of an AI workflow. Mia Kiraki, writing in Robots Ate My Homework, picks up the other half of that problem: how taste erodes when you don’t.

Her central image is the Hansel and Gretel gingerbread house, recast:

AI output is the gingerbread. You’re tired, the deadline is close. The output IS the shelter. You eat it - of course you eat it. That’s what gingerbread is made for, right? […] The Grimms buried a much smarter lesson in the early pages, before the witch even shows up. Hansel drops breadcrumbs to mark his path through the forest and the birds eat every one. He still leaves them, though. Those breadcrumbs are your taste, every little choice you make on the page (this word, this angle, this risk) which leaves a marker of who you are. The forest will always try to erase them.

What’s specific to AI is the environment. It’s now optimized to wear taste down a percent at a time, in ways you can’t feel while it’s happening, until the work you used to do feels like someone else’s.

Kiraki puts the mechanism plainly:

If most of your reading this week was AI slop (which is pretty likely given the state of the internet), you trained your judgment on machine output. […] Each accepted sentence is a small vote for a lower standard. You accept a vague phrase because the deadline is close. You let a soft claim through because rewriting from scratch would cost an hour you don’t have. […] Taste dies slowly, when you wake up someday and read something you wrote six months ago and realize you used to sound so different. You had edges, took risks, made claims, and you sounded like a person who made choices.

Kiraki gets at the failure mode that follows once taste is the only real moat left.

Her counter:

Read work that operates at a higher standard than yours. Work where someone made choices you wouldn’t have made or took risks you would have edited out. Your taste calibrates upward when you expose it to judgment that outclasses your own. […] Practice the explanation. When something in your own work feels wrong, write down the reason. The specificity of your explanation is the weapon. […] Ship work that makes you nervous. If a piece feels comfortable to publish, you probably didn’t push hard enough. The pieces that make your stomach tighten show and prove your taste is working at full capacity.

The middle one is what that newsletter was about: writing down the reasons, not just the verdict. Kiraki adds the bookends. Read above your level so your baseline isn’t drifting toward consensus. Publish the version that scares you a little, because the version that doesn’t is the gingerbread.


How to bulletproof your taste in the age of AI

How AI output erodes your editorial judgment, four diagnostic prompts to measure the damage, and the only protection that really, truly works.

open.substack.com

Talking to Peter Yang, Ravi Mehta—former CPO of Tinder, now teaching AI prototyping at Reforge—walks through a live demo of building the same Spotify-style genre page three different ways. The first attempt uses a short functional prompt and produces something that, in Mehta’s words, “kind of feels like AI slop.” The third uses what he calls a full-stack context bundle: a functional spec, a 20-minute Figma wireframe, and a JSON file of real album data pulled together in Claude with an MCP server. The difference in output is night and day.

His definition of the shift:

Context engineering is designing and building systems that provide an AI model with the right information and tools to accomplish the task. And I think a lot of the common mistake I see with prototyping is people don’t think about context within that 360 degree way. And as a result, people just, you know, write a quick prompt or a quick little mini spec and expect the prototype tool to be able to create something as high fidelity as what they used to create before when they had all of these different artifacts that are a critical part of the product lifecycle.

That definition will sound familiar to anyone who saw Philipp Schmid’s framing of context engineering when it first circulated. Same emphasis on “right information and tools.” It’s the working definition the field has settled on. What Mehta adds is the concrete answer to “okay, what are the three things you actually have to assemble?” Functional context (a spec), visual context (a wireframe), and data context (real structured JSON, not lorem ipsum). Skip any of them and the prototype either looks generic, behaves wrong at edge cases, or breaks suspension of disbelief the moment a real customer touches it.
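As a sketch of that checklist, here is a small Python function that refuses to build a prototype prompt until all three layers are present. The `ContextBundle` fields, the wireframe file name, and the prompt wording are my own assumptions, not Mehta’s. In his demo the assembly happens in Claude with an MCP server, which this does not attempt to reproduce; it only shows the “all three or nothing” discipline.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ContextBundle:
    """Mehta's three layers as plain fields (hypothetical structure, for illustration only)."""
    functional_spec: str   # functional context: what the screen must do
    wireframe_path: str    # visual context: path to an exported wireframe image
    data_json: str         # data context: real structured records as JSON, not lorem ipsum


def missing_layers(bundle: ContextBundle) -> List[str]:
    """Return the names of any context layers that are empty or unusable."""
    missing = []
    if not bundle.functional_spec.strip():
        missing.append("functional")
    if not bundle.wireframe_path.strip():
        missing.append("visual")
    try:
        records = json.loads(bundle.data_json)
    except json.JSONDecodeError:
        records = None
    # Treat anything other than a non-empty list of records as missing data context.
    if not isinstance(records, list) or not records:
        missing.append("data")
    return missing


def build_prompt(bundle: ContextBundle) -> str:
    """Assemble a prototype prompt, failing loudly if any layer is absent."""
    gaps = missing_layers(bundle)
    if gaps:
        raise ValueError("missing context layers: " + ", ".join(gaps))
    records = json.loads(bundle.data_json)
    return (
        "Build a prototype of the screen described in the spec.\n\n"
        f"Spec:\n{bundle.functional_spec}\n\n"
        f"Use the attached wireframe for layout: {bundle.wireframe_path}\n\n"
        f"Populate it with these {len(records)} records:\n"
        + json.dumps(records, indent=2)
    )


# Usage: a full bundle builds; a prompt-only bundle is flagged.
full = ContextBundle(
    functional_spec="Genre page: header, grid of top albums, play button on each card.",
    wireframe_path="genre_page_wireframe.png",
    data_json='[{"title": "Kind of Blue", "artist": "Miles Davis", "year": 1959}]',
)
prompt = build_prompt(full)
thin = ContextBundle(functional_spec="Make a genre page.", wireframe_path="", data_json="")
gaps = missing_layers(thin)  # ["visual", "data"]
```

Failing loudly is the design choice here: a bundle that is only a prompt is exactly the first, slop-producing attempt from the demo.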

The piece I want to underline is his defense of visual thinking, because the “designers are obsolete” takes haven’t stopped, and Mehta gives them a clean rebuttal:

So if you start to think differently about the different types of context that are available, you can actually get much more specific and have a lot more control over what gets built and build something that’s a lot more robust. This is functional context. The next level that is really important is visual context. […] And so here, I very quickly in Figma, just taking 20 minutes, done a wireframe, and sort of outlined what I want this interface to look like. […] The prototype needs to have a level of fidelity that’s hard to get with sort of traditional prompting techniques.

Twenty minutes in Figma, then a short prompt that says “use the attached wireframe.” A wireframe does what a 17-page PRD and three rounds of trying to describe a layout in English to the model can’t. The wireframe is part of the input to the deliverable now.

The corollary cuts the other way too. If the wireframe is now an AI briefing document, the people who can produce a decent one in twenty minutes have a real edge over the people who can’t. That’s still designers, still us. It’s just that the wireframe now feeds the model directly, not only the engineer reading the spec next sprint.

Everything You Need to Know About Context Engineering in 40 Minutes

Ravi Mehta builds the same Spotify-style page three times to show how functional spec, visual wireframe, and real data each level up an AI prototype.

youtube.com

In product orgs, the word “autonomy” tends to get attached to seniority and titles. Sara Paul, writing for Nielsen Norman Group, puts the bar somewhere else:

Our research shows that autonomy is about becoming sufficiently informed to credibly shape shared product decisions.

You’ve earned design autonomy when you’ve collected enough context to make a recommendation that holds up under scrutiny. Until then, you haven’t. Low-autonomy designers, in Paul’s terms, “execute predefined solutions.” High-autonomy designers shape what gets prioritized, because they know things their stakeholders don’t.

The four-part pipeline is the practitioner half:

The designers who achieved high autonomy kept information flowing to them from all sources within their organization. Their pipelines consisted of four parts: (1) Gathering information from across teams and channels, (2) Building relationships with people who provide information, (3) Creating crossfunctional spaces for information to be shared, (4) Synthesizing information to form a “big picture” of context that empowered credible recommendations.

Paul’s examples are specific enough to put to use. The opening one is a lead designer at an online review platform whose ad-setup experience lived across mobile, desktop, and web. Three teams owned different parts of the experience and the whole was nobody’s job. Here’s how the story ends:

She saw the problem, took the initiative to gather the information she needed, and synthesized it into a recommendation that boosted her influence over what got built. This is design autonomy.

None of this required a new title. It required a tracker, a few standing meetings, and the willingness to do the synthesis work nobody assigned.

The designers I want—and have—on my team are the ones who can fill in for a PM when they’re on vacation. Paul’s article is the mechanism for getting there. The PM-shaped skill is holding the information context that lets you make a defensible call.

Title card reading "Boost Design Autonomy with an Information Pipeline" from NN/G, with six icons illustrating documents, collaboration, scheduling, workflows, UI review, and process pipelines.

Boost Design Autonomy with an Information Pipeline

A four-step framework for building influence over product direction by closing the information gaps that large, complex organizations create.

nngroup.com

Alex Dapunt, VP Design and Brand at Moonfare, opens with a research session in which a senior client laid out exactly what to build next, with the roadmap, rationale, and feature list ready inside a minute. The client was wrong, Dapunt writes, but not because he was stupid. He was wrong because he had been asked the wrong question and his instinct was to answer it anyway.

The smarter your users, the more convincing their wrong answers. A user says they want ice cream. While they say they want ice cream, what they need is to cool down. Their body wants sugar. It’s hot. There’s a memory somewhere in there, a summer ritual, something cold in their hand. The want closes off options. The need opens them. Take “I want ice cream” at face value and you sell them ice cream. Understand the need and you can sell them a popsicle, a cold drink, air conditioning, a swim in the sea.

The want-versus-need split is older than this piece. Dapunt credits Jared Spool for it. The part Dapunt adds is about who tends to give you the worst version of a want. He argues the failure intensifies in premium and B2B contexts, where the people you most want to talk to are the people most trained to produce confident answers under pressure.

The Moonfare client wasn’t an outlier. I think a lot about why this happens. Part of the answer, I think, is that the people we were interviewing had been trained, explicitly, to produce answers. At Bain, where I spent time earlier in my career, the core discipline is what’s called the answer-first approach, or the A1. You lead with the answer. Then you work backwards. […] It’s a disastrous way to sit in a research session as a user. An executive trained that way walks in and the instinct takes over. They feel the absence of an answer as pressure. They want to be useful. They want to look smart. They give you the A1, and it’s precise and articulate because producing precise articulate answers is what they are paid to do.

Dapunt's observation about ambiguity is worth carrying into the next interview transcript you read. When a regular user says "I dunno, maybe?" he argues, the fuzziness is a signal that the question is wrong. The executive doesn't give you that signal, so you have to know to discount the clarity.

Dapunt then turns the same lens on metrics. His version of the metrics-as-avoidance failure mode is more specific: the wrong moment, not just the wrong number.

At Moonfare we tracked logins. More logins looks good on a dashboard. Looks like engagement. But private equity is a 5-to-10 year product. For most of that time nothing is supposed to happen. […] The right moment isn’t a platform question. It’s a life question. When does this person have cashflow? When’s bonus season? What does their portfolio look like right now, and is there a product we offer that fits the gap? The real need isn’t log in more. It’s be present when a decision is being made. Five well-timed touchpoints in a year beat fifty random ones.

The piece closes on the part of research practice that gets the least attention.

Research is intake. You take it in. You synthesise. Then someone has to make the call and own it. […] In practice I’ve watched it produce three biases averaged into a consensus nobody owns. Someone has to own the interpretation. It can be a researcher, a designer, a founder, a PM. But it’s one person’s job, and it comes with the accountability for the call that follows. The alternative is research-as-stalling.

Dapunt is careful here. He likes continuous discovery, he likes the product trio in theory, and he is not making a contrarian case against any of it. His point is narrower. A team can run all the right research rituals and still end up with a process whose actual function is to ensure no single person has to take responsibility for being wrong.

"dir14" text overlaid on a medieval-style painting depicting a crowd of figures in colorful robes gathered outdoors near a castle.

Users own the present. You own the future.

A few years ago I sat in a research session at Moonfare. Since private equity is a premium product, our clients are mostly C-level executives, founders or people who have spent decades being the person in the room with the answer. He was one of them.

dir14.com

PJ Onori built a tool that A/B tests his design system against AI agents, and he’s careful to say it isn’t impressive:

Two groups of agents get spun up, and both are given the same prompt to make an interface. One group’s given the old design system. The other is given our new one. Each agent provides feedback on problems faced after it’s done. Once all agents finish, the builds are evaluated on a bunch of crap and a report is generated.

The list of what the tool measures is long: timing, lines of code, code variance, fix attempts, components used, accessibility, performance, inline styles, visual diff, token usage, agent feedback. Onori, on the test he ran when he wasn’t sure his documentation was actually doing the work:

I was starting to question if documentation was making things better. Maybe component improvements was doing the heavy lifting–who knows? So, I ran a couple tests without documentation… The documentation was clearly the heavy lifter. […] Documentation is essential for systems that agents don’t have a lot of reps with. I’ve started to add a “For agents” section in the docs. That section is the dumpster for “get it in your silicon head” training.

The “For agents” section is a small idea with a real implication. Documentation has historically been written for one audience. Now there are two, and as Onori says elsewhere in the post, the second one needs “the same damned point” repeated five or six times and doesn’t care if the prose is ugly. His instinct is to wall that off so humans don’t have to read it.

Onori is publishing measurements where most people are publishing takes. That’s the missing piece in the design-system-as-moat argument: somebody actually testing whether agents do better with a well-built system than a worse one, and showing the numbers. Onori, on the closing caution:

There’s a lot of noise in the output, feedback, and analysis–otherwise know as everything. That noise compounds fast. Think of the telephone game–then think about what that’d do to a design system. […] Feedback needs to go through a BS filter. […] The feedback part of the analysis is helpful. Make no mistake. But it needs to heavy interpretation.

The telephone game is the right picture. A design system that updates itself based on agent feedback that’s been generated by other agents and analyzed by a third agent is going to drift somewhere strange in a small number of iterations, and nobody on the team will be able to reconstruct why. Onori’s tool stops short of that on purpose: it produces measurements, and a person reads them.
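Onori doesn't publish the tool's code, so the following is only a guess at the shape of its reporting step: a Python sketch that averages a few per-run metrics for each group and tabulates the difference. The metric names, the assumption that lower is better for every one of them, and the sample numbers are made up for illustration. In keeping with Onori's caution, it stops at a table a person has to read.

```python
from statistics import mean

# Hypothetical subset of metrics; Onori's tool tracks more (accessibility, visual diff, ...).
METRICS = ("seconds", "lines_of_code", "fix_attempts", "inline_styles")


def compare_groups(old_runs, new_runs, metrics=METRICS):
    """Average each metric per group and compute new minus old.

    Assumes lower is better for every metric listed, so a negative delta
    favors the new design system.
    """
    report = {}
    for m in metrics:
        old_avg = mean(r[m] for r in old_runs)
        new_avg = mean(r[m] for r in new_runs)
        report[m] = {"old": old_avg, "new": new_avg, "delta": new_avg - old_avg}
    return report


def format_report(report):
    """Render the comparison as a plain-text table for a human to review."""
    lines = [f"{'metric':<15}{'old':>10}{'new':>10}{'delta':>10}"]
    for m, row in report.items():
        lines.append(f"{m:<15}{row['old']:>10.1f}{row['new']:>10.1f}{row['delta']:>+10.1f}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up runs: one agent per dict, two agents per group.
    old = [
        {"seconds": 100, "lines_of_code": 200, "fix_attempts": 3, "inline_styles": 4},
        {"seconds": 120, "lines_of_code": 220, "fix_attempts": 5, "inline_styles": 6},
    ]
    new = [
        {"seconds": 80, "lines_of_code": 150, "fix_attempts": 1, "inline_styles": 0},
        {"seconds": 90, "lines_of_code": 170, "fix_attempts": 1, "inline_styles": 2},
    ]
    print(format_report(compare_groups(old, new)))
```

Averages over a handful of runs are noisy, which is exactly Onori's point: the table tells you where to look, not what to change.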

Stippled illustration of a person sitting at a desk, leaning forward and writing or working on something.

Testing agents on design systems

It’s really easy to say agents are able to use a design system. It’s another thing to prove it.

pjonori.blog

Marcus Moretti's guide to agent-native product management, in Every, is the orchestration shift showing up on the PM side of the team. The guide opens with the 1930s Procter & Gamble origin story: someone owns the product. The job has been rewritten so many times since then that PMs are now expected to be design partners, diplomats, salespeople, and statisticians on top of running the 100+ software subscriptions the average company buys. What's interesting is that the piece is describing the old role, finally legible again now that agents can absorb the administrative debt that piled up on top of it.

Now, much of the interdisciplinary work that goes into product management can be done by an LLM in minutes, sometimes seconds. What used to be a three-hour-long analytics investigation is now a simple back-and-forth with Claude. A product review that used to be a fortnightly chore emerges from a single typo-ridden chat message. This has been my recent experience, at least. I no longer struggle with semicolons in SQL queries or even write tickets. All of my product management work happens in conversation with, in my case, Claude Code. The conversation is the work.

“The conversation is the work” sounds like a description of the new job. Read it next to the 1930s origin story and it’s a description of the old one. The Brand Man at P&G wasn’t writing SQL; he was deciding what the product should be and who it was for. The intervening ninety years of accumulated tooling—agile ceremonies and ticket hygiene, analytics dashboards on top of those—was friction PMs had to push through to get back to the actual work. Moretti’s /ce-strategy command, modeled on Richard Rumelt’s Good Strategy Bad Strategy, isn’t a new artifact either. Strategy documents predate LLMs by decades. What’s new, Moretti says, is the cadence: every few months, the agent re-runs the strategy interview with the accumulated context of everything you’ve shipped.

Writing a strategy document cold is hard. The best way to do it, I’ve found, is to have an agent interview you. The ce-strategy skill does this. It runs through the sections in order and has built-in guidance about what makes a good answer (and what kinds of answers to push back on). […] The interview is deliberately conversational. If the first answer to, “What’s the core problem this product solves” is vague, the agent drills down: “Whose situation specifically? What do they try today, and why doesn’t it work?” The guidance here is taken from personal experience and from the Rumelt book.

The guide assumes a PM who has the taste to recognize when the agent's follow-up has exposed a gap. The ones who don't will end up with a strategy.md full of confident-sounding nonsense, generated quickly and reviewed lightly. Agent-native PM removes the alibi that you were too busy with tickets to do the actual thinking. That maps to a warning from Raj Nandan Sharma: when generation gets cheap, the scarce skill is refusal, knowing what to throw out and why. Moretti's PM is doing exactly that, sentence by sentence, in the strategy interview.

Moretti closes:

LLMs have allowed our tools to catch up with the multifaceted duties of product managers. For me, product management has been reduced to the interesting parts: dreaming up features, thinking through designs, looking at interesting data, and talking to users. We all feel the economic imperative to embrace AI tools, but the better reason, I think, is to make work more fun.

Hand-drawn letter "G" in black chalk-style script on a light blue background, with a black bookmark icon in the top-left corner.

A Guide to Agent-native Product Management

A step-by-step guide to using agentic capabilities for better product management

every.to

Nick Babich on agents in UX Planet. A useful pair to his earlier writeup on Claude skills, since the two words get used interchangeably and they are not the same thing. Babich opens with the plain-language version:

Think of an AI agent as a program you run when you need to solve a particular problem in design. For example, you can create an AI agent that helps you with usability testing, code review, UI/UX audit, etc.

A program you run is the right mental model. A skill, the way Babich described it in his earlier piece, is a recipe: a markdown file Claude reaches for when a task matches. An agent is what runs once Claude has the recipe in hand. It carries state across steps, picks tools, reports back.

Babich’s four attributes of a well-designed agent get at that distinction without saying it out loud:

  1. Good clarity (intent alignment). A strong agent understands what success looks like, not just the task. This understanding helps it translate vague prompts into clear objectives.
  2. Context awareness. Good agents maintain and use context effectively. Not only do they remember previous steps, constraints, and user preferences (which is well-expected behavior nowadays), but they also adapt output based on the environment (tools, data, stage of workflow).
  3. Tool orchestration. Agents can perform the workflow autonomously and they have the ability to use the right tools for a task at hand is what makes an agent so powerful. Well-crafted agents can chain tools together into workflows, and they don’t overuse tools when simple reasoning is enough.
  4. Explainability (transparent reasoning). When you interact with an AI agent, you need to understand why something happened. Thus, an AI agent should provide a rationale behind decisions surface assumptions, and trade-offs.

Context awareness and tool orchestration are what separate an agent from a prompt template. A skill can ship intent alignment and explainability in plain markdown, but state across steps and the ability to chain tools require a runtime. That’s why Babich’s specs include Boundaries sections and “When Not To Use It” blocks: a stateful, tool-using program needs guardrails that a one-shot prompt does not.
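A toy Python sketch of that difference, with no model involved: a hard-coded plan stands in for the steps a model would choose, three hypothetical tools share one state dict, and an allowlist plays the part of a Boundaries section. None of this is Babich's spec format or any framework's API. It only shows state carried across steps, tools chained together, out-of-bounds calls refused, and a log that explains what ran.

```python
# Hypothetical tools; in a real runtime these would wrap search, files, APIs, etc.
def collect_notes(state):
    state["notes"] = ["checkout is confusing", "search is slow", "checkout is confusing"]
    return state


def dedupe(state):
    # Reads what the previous step left in state: this is the "state across steps" part.
    state["themes"] = sorted(set(state["notes"]))
    return state


def summarize(state):
    state["summary"] = f"{len(state['themes'])} themes from {len(state['notes'])} notes"
    return state


TOOLS = {"collect_notes": collect_notes, "dedupe": dedupe, "summarize": summarize}


def run_agent(plan, allowed):
    """Run tools in plan order, carrying one state dict between them.

    `allowed` acts like a Boundaries section: any step outside it (or unknown)
    is refused and logged instead of executed. The log is the explainability trail.
    """
    state = {"log": []}
    for step in plan:
        if step not in allowed or step not in TOOLS:
            state["log"].append(f"refused {step}")
            continue
        state = TOOLS[step](state)
        state["log"].append(f"ran {step}")
    return state


if __name__ == "__main__":
    result = run_agent(
        ["collect_notes", "dedupe", "send_email", "summarize"],
        allowed={"collect_notes", "dedupe", "summarize"},
    )
    print(result["summary"])
    print(result["log"])
```

A one-shot prompt template has nowhere to keep `state` between calls and no loop to refuse a step in, which is why the guardrails only start to matter once there is a runtime.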

If you haven’t built one yet, his five specs—Research Synthesizer, Competitor Intelligence, Problem Definition, Idea Generation, UX Flow Designer—are a clean starter pack. Pick the one closest to a workflow you already do by hand, and notice how much of the spec is about what the agent will not do.

3D illustration of an orange robot head with a maze inside its open skull, glowing circuit lines extending outward to orange cube nodes.

Agentic Product Design

5 design tasks you can automate with AI today

uxplanet.org

Tommy Geoco’s $13,100 OpenClaw harness, ninety days in, is one way to build a personal AI agent. Anton Sten went the other way. He tried OpenClaw and Hermes, found the setup was “days, sometimes weeks, for minutes of return,” and built something smaller. Five Claude Code instances on a Mac mini, named after Suits characters, each handling one role. Architecture is a shared repo and a pile of markdown files. That’s it. Most AI-agent posts pitch what Sten calls “a team of bots that runs your business while you sleep.” His basement firm is the inversion.

Sten on what he actually wanted from his agents:

What I actually wanted was smaller. A handful of tools, each with a narrow job, that I could build in an afternoon and shape around how I actually work. So that’s what I did.

The names of his AI agents are from the show Suits (with Wendy borrowed from Billions), picked so the show’s personalities double as memory aids for each agent’s job. Harvey handles contracts and pricing. Donna takes Harvey’s notes and drafts the emails and follow-ups. Mike stores what Sten would otherwise forget. Louis worries about money. Wendy reads the others’ logs and points out where they’re slipping.

Sten on the autonomous-revenue pitch:

The team in my basement isn’t running anything autonomously. They don’t make decisions for me. If I unplugged the Mac mini tomorrow, my business would keep running. The conflation in the current AI conversation — between playing and building a thing that prints money — is the part I find a bit tiring. They’re treated as the same activity, when they’re almost opposites.

Sten’s right that the autonomous-revenue pitch is a fantasy. Less right on the binary that follows. Geoco’s harness is doing meeting prep, ingesting his survey research, and distributing his content across ten platforms while he sleeps. That counts as “running while you sleep,” and his $50,000 in sponsorship revenue from one survey project isn’t trivial. Play and revenue can sit on the same side. What matters is whether the human stays in the loop. Geoco does, and so does Sten.

The shape of what they’re building is also the same. The Harvey-to-Donna handoff Sten uses most and Geoco’s survey-prep loop are both the specialization-is-the-whole-game pattern: narrow specialists, human in the loop, work compounding into the system. Sten calls it play and Geoco calls it work. The architecture underneath does the same job either way.

Sten on practice:

I’d argue this is the business case for designers right now. Not the agents specifically — the playing. Because in a year or two, every job worth having is going to assume you understand how these tools work, and the only way to understand them is to spend time in them when nothing’s on the line.

The people who’ll do interesting work with this stuff in two years are the ones playing with it badly today.

Geoco is what Sten’s last sentence predicts. The person playing badly today is the person doing interesting work in two years. Sten describes that person as hypothetical. Geoco isn’t.

The basement firm

There’s a Mac mini in my basement running a small consulting firm. Five employees, all named after TV characters, none of them human. They take notes, write drafts, remember things I’ve forgotten, argue with my financial instincts, and occasionally tell each other to do better.

antonsten.com

Tommy Geoco spent ninety days and $13,100 tinkering with OpenClaw. His agent runs his capture loop, prepares his meetings, codes the survey for the state-of-prototyping report his studio shipped, and distributes his content across ten platforms. Geoco describes the harness like this:

When you install OpenClaw, it is like a starter kit project car. It is a car frame with a swappable engine. The engine being any AI model you choose to use. It is basically a folder that you install onto your computer that contains about seven markdown files. […] When you stop thinking of a custom agent as just a chatbot and start thinking of it like an operating system, some useful questions are going to start to pop up like where does the memory live? What is the source of truth? How do I enforce my rules better? What should stay manual?

The seven files are plain text. soul.md holds the agent's voice and judgment, agents.md defines permissions, memory.md handles long-term recall, and four others cover identity, the user, tool instructions, and a heartbeat. He layers an Obsidian vault on top as long-term knowledge and Slack as the chat surface. Geoco on what actually limits an agent:

The agent’s limitations aren’t just about the model. They’re a lot more about the system that you have built around it because you can’t control the quality of the model, but you can control the quality of the system. […] The most important part of my setup is the knowledge vault. This is my alternate memory, and it is built around the work that I actually do.

Geoco says curation is what keeps the whole thing from drifting. The agent runs the loops on top of a vault Geoco curates, and the taste lives with him; the model itself is interchangeable. The challenging part is somewhere else entirely:

The most challenging part of this whole thing is the unlearning. Many of us have old habits that have calcified into our brain. It is why my 17-year-old is able to run laps around us. He has no baggage about how things are supposed to work.

Geoco is right that the unlearning is where the difficulty lives. The harness is just markdown and the model is rented; the orchestration skill Benhur Senabathi described as what designers actually picked up in 2025 is what you practice through the unlearning. Geoco closes the video by saying nobody’s harness is right and everybody’s works for them, which sounds about right to me too.

How I Built an AI Agent That Designs Like Me

This is a practical breakdown of what an OpenClaw agent is, and how I use it for my design and media studio.

youtube.com

Jake Albaugh wrote a piece on X called “Design is the work” that splits design from the artifacts it produces. Mocks, prototypes, screens, guidelines: those are outputs. Design itself, in his telling, is the upstream act of intent: figuring out what something should be and why, before anyone makes it. Bingo. That distinction matters now because AI is very good at the artifact and unable to do the deciding:

AI cannot do that part. You intend to do something that has not yet happened. You have to bring those parameters to the table to do anything novel. AI doesn’t know your constraints. It doesn’t know your strategy. It doesn’t know what moment in the market you’re in, what your team is trying to prove, or what your customers actually need versus what they’ve said they want. The expectation — the definition of what good looks like — is something only you can provide. AI’s job is to meet that expectation. Not to define it.

The piece makes the case that intentionality has to come before execution, and that AI doesn't change that. The closer is where it gets interesting. After all that, Albaugh tells the reader he used AI to draft the essay:

It may surprise you to learn that I used AI to write this. The structure, the sentences, a lot of the phrasing — generated. But the argument existed before any of it. I knew what I was trying to say. I knew what examples mattered and which ones were wrong. I knew when a paragraph was close but not quite right, and I revised toward a target I’d already defined. […] That’s the point. The tools changed. The work didn’t. Design is the process. Design is the intentionality.

It’s a risky reveal. Most readers will read it as self-undermining at first. But the argument and the artifact are doing the same job: Albaugh had a target, and he used AI to reach it. The fact that the prose was generated is exactly why it matters that the argument wasn’t. He knew which examples belonged in the piece and which ones to throw out. The model couldn’t have known that either way, because the criteria for “good” didn’t exist anywhere outside his head until he wrote them down.

Karri Saarinen made a version of this same split when he argued that output isn’t design. The hard part is understanding the problem well enough to know what should exist at all.

A presenter stands on stage in front of a green slide reading "What should be automated? What should be left to touch?"

Design is the work.

We’re in a moment where it has never been cheaper or faster to build something convincing. The cost of taking an idea and making it look real, feel functional, or seem finished has collapsed. That is genuinely good news if you already know what you’re building and why. It’s dangerous if you don’t.

x.com

Maggie Appleton, staff research engineer at GitHub Next, wrote up her recent talk on agentic AI productivity. (Video here if you’d rather watch.) Her central claim comes early:

I call it this “one man, a two dozen claudes” theory of the future. The pitch here is that one person with a fleet of agents will do the work of an entire team of developers. The main problem with this dream is it assumes software is made by one person. All these tools are single player interfaces. […] Software is not made by one person in a vacuum. It’s a team sport. Everyone building it needs to agree on what they’re building and why.

The single-player critique is the missing piece in most AI productivity takes. Most demos of a coding agent show one engineer at a terminal. Designers face the same situation with AI prompt-to-code tools. Collaborating isn’t as easy as sharing a Figma link. That’s the actual gap in current tooling, and it’s downstream of the single-player assumption.

Appleton’s second move:

Implementation is rapidly becoming a solved problem, right? Writing code is now fast, it’s getting cheap, and quality is going up and to the right. The hard question is no longer how to build it. It’s should we build it. Agreeing on what to build is the new bottleneck. […] When production is cheap, opportunity cost becomes the real cost. You can’t build everything, and whatever you pick comes at the cost of everything else.

When production is cheap, picking what to make becomes the whole job. The cost difference between two engineering paths is now nearly zero, so the choice between them carries all the weight. Teams that miss this will end up shipping volume and mistaking it for productivity.

A talk like this could be about tooling, and Appleton does walk through Ace, GitHub Next’s prototype multiplayer workspace, in some detail. But the more important argument is about what you do with the hours you free up. Going faster is not the prize. Appleton:

We have an opportunity to not just go faster and build a giant pile of the same crappy software. But instead to make much better software through more rigorous critical thinking and better alignment in the planning stage. By doing more exploration, more research, and thinking through problems more deeply than we could have before.

The reclaimed hours are an opportunity, but they are also a test. Do you spend them shipping more, or do you spend them shipping better? The first answer gets you the giant pile. The second takes work the agents cannot do for you.

Appleton closes on craft:

Many people are now realising that in a world of fast, cheap software, quality becomes the new differentiator. The bar is being set much higher. Craftsmanship is what will set you apart from the vibe-coded slop. But craft still costs time and energy. It is not free, and in order to buy the time and energy for it, you need to do fewer things better, which requires strong alignment.

Title card for "One Developer, Two Dozen Agents, Zero Alignment" — a talk about collaborative AI engineering and a tour of Ace, the multiplayer coding workspace.

One Developer, Two Dozen Agents, Zero Alignment

Why we need collaborative AI engineering and a tour of Ace: the multiplayer coding workspace

maggieappleton.com

Andy Matuschak describes two accidental tyrannies that have shaped software for forty years: the application model that traps software in one-size-fits-all packages, and programming as a specialization that shuts non-programmers out of inventing interfaces. He thinks coding agents could break both, and he's already seeing it happen with the designers he works with:

I’ve been seeing it. I spent 2025 collaborating with two talented designers. Their story with coding agents this past year has been truly wild. I think the impact on my collaborators has been much greater than the impact on me, despite the fact that I’m now building perhaps ten times the speed.

Unlike me, these two started their careers in design and spent their formative years in the arts culture. They can program a bit, but the process was really slow and difficult enough to pose a significant barrier. At the start of 2025, coding models could implement small one-off design ideas—but their outputs would just fall apart after a couple of iterations. By the end of the year, my collaborators were routinely prototyping novel interface ideas and sustaining that iteration across weeks.

“The impact on my collaborators has been much greater than the impact on me.” Matuschak is moving ten times faster, and he still thinks his designers are the ones whose careers just turned over. That observation is rare from the person on the receiving end of the bigger gain in raw output.

Matuschak’s diagnosis of why the old arrangement was such a trap for designers:

Non-programming designers are trying to invent something in an interactive medium without being able to make something meaningfully interactive. So much of invention is about intimacy with the materials, tight feedback, sensitive observation, and authentic use. So it’s a catch-22: to enter into proper dialogue with their medium, a non-programmer needs to get help from a programmer. That generally requires the idea to be at least somewhat legible and compelling. But if they’re doing something truly novel, they often can’t make it legible and compelling without being in that close dialogue with their medium.

The old design-engineering separation trapped designers in a less obvious way. They often couldn’t even tell whether their ideas were brilliant, because they couldn’t get their hands on the material to find out. You can’t iterate on a feeling. You have to push something around until it pushes back. For most of my career, designers did that pushing in flat mockups and click-through prototypes, working through dynamic behavior they had never actually felt. Of course the technical ideas fell short. The designers themselves hadn’t felt the thing yet either.

That’s the asymmetry coding agents collapse. The loop between “I have an inkling” and “I am tinkering with a working version of the inkling” has finally closed for non-developers. They still can’t and mostly shouldn’t ship production code, but they don’t need to. The prototype is enough to do the design work. Once the gatekeeping melts, the next question is institutional: where does the next generation of interface inventors come from? Matuschak’s answer:

So, what now? We’ve spent decades building HCI programs that mostly look like computer science departments with design electives. But if we’re moving toward a world where invention is bottlenecked more on imagination than on technical expertise, we may have that backwards. We may need programs that look a little more like art school with technical electives—learning to develop ideas from intuition before being able to express them precisely, to discover by playing with the material.

Title slide and content page from Andy Matuschak's MIT HCI Seminar talk "Apps and programming: two accidental tyrannies" dated 2026-03-03, showing a table of contents and lecture notes.

Apps and programming: two accidental tyrannies

On coding agents, malleable software, and the future of interface invention

andymatuschak.org

Humans are the bread in the sandwich, and the AI is in the middle.

That’s Dan Shipper on his podcast AI & I, talking with Every’s Kieran Klaassen, the engineer behind the compound engineering plugin. They’re working out where humans actually belong in an AI-driven workflow. It’s the same split showing up on the design side.

Klaassen, on the polish step at the end of the work:

The other moment comes at the end. Something comes out. How do you validate it? Well, it’s already tested—browser automated testing has clicked through everything, all the requirements are clearly specified, and it says everything works. But the beauty comes in when a human looks at it, clicks around, and has a feel for it: “Oh, this doesn’t feel right. We can polish it. We can make it better. There’s something still missing. We can make the design better.” […] all the way at the end, when everything is done, you can elevate everything and make it even better. And I think we need to do that, because if we don’t, it will all be slop—all the same. It’s very important to make it feel great because the bar is high, and the bar will always get higher.

“It will all be slop” is the line every team should have taped to a monitor. A passing test suite and a green PR don’t tell you whether the thing is actually any good. That judgment still lives with a human at the end of the workflow. Klaassen is correct that the bar keeps moving up, not down, and the teams who treat the polish step as optional are the ones whose products will look interchangeable in twelve months.

Klaassen, on the art-and-ownership argument:

But I do think that in the end, if you ship something—if you make a statement in the world—and you want it to be your own, you have to say yes or no at some point. You cannot fully automate everything. It’s a bit like making art. If you want it to be yours, it needs to come from you or somehow be connected. So I believe having those moments where you decide—where you choose what you enjoy—is so important. That’s why it’s so important to do things you enjoy and love.

Whatever your version of beautiful is, that’s the bread. Everything else is filling.

Cover art for "AI & I" podcast by Every, featuring a smiling man with glasses rendered in gold tones against a purple background.

The AI Sandwich: Where Humans Excel in an AI World

‘AI & I’ with compound engineering creator Kieran Klaassen

every.to

Karri Saarinen, Linear’s co-founder, calls out the confusion that most of the new design tooling is built on top of:

Design keeps being misunderstood in our industry. New tools keep promising to generate interfaces faster, move words to product instantly, or collapse design directly into code. The assumption behind them is clear: that design is the act of producing. That is the misunderstanding. The hard part of design is rarely generating the form. It is understanding the problem well enough to know what and how something should exist at all.

What I appreciate about Saarinen’s argument is that he doesn’t stop at the diagnosis. He reaches for Christopher Alexander’s Notes on the Synthesis of Form and recovers a vocabulary term the industry has been missing:

Christopher Alexander came closer than anyone to naming this clearly. In Notes on the Synthesis of Form, he describes design as the search for a good fit between a form and its context. Context, in his sense, is not a background condition. It is the full set of forces that make a problem what it is: human needs, technical constraints, conflicting requirements, habits, edge cases, and relationships that are easy to miss until you spend time with them. Bad design appears where those forces remain unresolved. Good design appears where those misfits have been worked through carefully.

Context as forces, not background. The current generation of prompt-to-code tools, including Lovable, Figma Make, and Claude Design, is very good at producing a plausible form against a thin slice of context. Saarinen describes the symptom directly:

You can already see the result in products that look polished, ambitious, and impressive at first glance, but begin to unravel the moment you actually use them. They feel brittle, poorly integrated, and full of decisions that were never fully worked through. The form is there. The fit is not.

That same bottleneck shows up on the workflow side: production speeds up, judgment doesn’t.

Saarinen’s closer:

The risk is mistaking generated form for solved problems.

That is the mistake to watch for, in your own work and on your team. Design is what happens when someone takes the time to understand the forces and works the misfits out of the form.

Loose, expressive ink and wash sketch of an abstract architectural structure with dense crosshatching and gestural line work.

Output isn’t design


x.com

Pointillist-style painting of a formally dressed figure in a black top hat holding a glowing green laptop, surrounded by a crowd of early 20th-century people.

A Sunday Afternoon with Claude Design

It’s really hard to get momentum on a side project when you have a full-time job with lots of travel, an active blog, and a newsletter. But I had to recapture that momentum, because this side project matters: it’s a preschool website I’m building for my cousin.

Walking into My Little Learning Tree is like stepping into pure warmth. Yes, yes, preschools are inherently fun environments, but the kids and the teachers there create a visceral energy that is simply special. I wanted to capture that specialness in a long-overdue website redesign project.

Looking at my in-progress design, something felt off. Long horizontal lines preceded the eyebrows (the small text above a heading that names the section), and they didn’t feel right. First, they were straight. Second, they appeared only before the text, not after it as well. I clicked the Comment button to enter Comment mode, then clicked on the eyebrow and prompted, “These lines aren’t playful enough. Let’s make them squiggles and have them before and after the eyebrow text.”

And then Claude Design did its thing.