Alt-Text: The PM's Cut
"Content creators don't lack words; they lack bandwidth."
In my WebAI Summit 2025 talk, I described the “AI Feature Trap”: shipping loud, separate AI experiences that force users to learn new behaviors or abandon their main quest. Because of that trap, I fear we are stuck in a product-market-fit soul-searching phase, where the promise of AI is high but the production value remains tepid.
I was recently reminded of this trap by an implementation of AI-assisted alt-text. It had all the symptoms: a standard description field with a “Generate with AI” button bolted to the side. This “afterthought” approach is risky. Because it fails to integrate with the user’s existing workflow, it is often ignored or perceived as an unwelcome detour.
The issue is the physics of friction. We often assume people skip alt-text because they don’t know what to say, but the reality is simpler: they lack the incentives or the cognitive bandwidth to leave their ‘Main Quest.’ By asking them to go out of their way to click a button and wait for an inference to finish, you haven’t lowered the friction—you’ve introduced a different tax. Worse, if a user generates three versions and realizes the first one was the best, they often find it’s gone forever, overwritten by the “helpful” magic.
I built an AI-Assisted Alt-Text Demo to explore an alternative. As I mentioned in my talk, I call this approach Artfully Applied AI because we are still in the early stages of learning and validating what works: it’s an art, not yet a science. Take this deconstruction as food for thought, not a perfect recipe for all occasions. My hope is to help you move away from the “bolted-on AI” trap and toward experiences designed for intent, not just user action. This is a shift toward Omotenashi, the Japanese mindset of anticipating a user’s needs before they even have to express them.
1. Empathy: Or how to design doors
In the spirit of Omotenashi, we have to start by seeing the experience through the user’s eyes. This starts by being honest about who we are designing for.
In practice, this means identifying where our guests—the creators—experience the most friction. Those who already prioritize accessibility will appreciate the added efficiency, but they aren’t the ones we need to convince. Our real targets are the creators who haven’t built the habit yet. For them, alt-text is a high-friction detour. The “bolted-on AI” model fails because it ignores User Inertia. When a creator is in the flow of their “Main Quest,” any extra step feels like an energy drain. We shouldn’t ask them to change direction; we should design for their existing Momentum.
To reach the unconverted, we need an architecture that embraces momentum just like a push-bar on an exit door does. In the ‘PM’s Cut,’ the UX is designed to nudge users along their existing direction of travel:
- The Mindset Shift: People like to tell stories to other people. Writing a description for a concept (accessibility) doesn’t ring the same. I decided to frame the alt-text as a “short story” for an audience: “Every image tells a short story. Share it for those who can’t see the pixels…”.
- Zero-Click Drafts: Instead of hoping for a new habit to form, I seek to act in the user’s best interest by default. If a user moves to draft their post without providing alt-text, the app automatically populates the field using a background AI-generated result. This also helps reset expectations about AI latency or quality and doubles as an automatic tutorial of the feature.
- Focus is King: The app is obsessive about respecting the user’s “Main Quest.” If the app renders a draft but detects the user has already moved on to writing the post content, it refuses to steal the focus back to the alt text. Features, AI or not, should always be in service of the user, not fighting for their attention.
- The Human Fallback: When the AI fails to “see” the image, the placeholder pivots: “Even AI can’t see these pixels. Tell the story for everyone—and everything—who can’t see pixels…”. If the tech is entirely missing, it switches to: “Local AI is a no-show for this setup. Tell the story […]”. This reminds the user that their input is the ultimate, unbreakable fallback. It also highlights the inward angle: alt-text also benefits the Machine Audience (SEO crawlers, indexers, and recommendation engines) so that they can truly know the story you intended to tell, and help you reach more people.
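The Zero-Click Draft and Focus is King rules above boil down to a small decision: fill the field only when the user left it empty, and only take the cursor when they have not moved on. Here is a minimal sketch of that decision as a pure function. Every name in it (`DraftContext`, `decideAutoDraft`, the focus labels) is hypothetical and chosen for illustration; it is not the demo's actual code, and the choice to move focus when the user is elsewhere is my assumption.

```typescript
// Hypothetical names throughout: none of these come from the demo's source.
type FocusTarget = "altText" | "postBody" | "other";

interface DraftContext {
  altText: string;                // current content of the alt-text field
  backgroundDraft: string | null; // AI result prepared ahead of time, if any
  focus: FocusTarget;             // where the user's cursor is right now
}

interface DraftDecision {
  fill: boolean;      // populate the empty field with the background draft
  moveFocus: boolean; // pull the cursor into the alt-text field
}

function decideAutoDraft(ctx: DraftContext): DraftDecision {
  const userWroteSomething = ctx.altText.trim().length > 0;
  const draftReady =
    ctx.backgroundDraft !== null && ctx.backgroundDraft.trim().length > 0;

  // Never overwrite human input, and never fill the field with nothing.
  if (userWroteSomething || !draftReady) {
    return { fill: false, moveFocus: false };
  }

  // Fill quietly. Only take focus if the user has not started the post body
  // (assumption: stealing focus from anywhere else is acceptable).
  return { fill: true, moveFocus: ctx.focus !== "postBody" };
}
```

Keeping the rule free of DOM calls makes it easy to unit-test; the caller would translate the decision into setting the field value and, when allowed, focusing the field.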
2. Performance: Give it a head start
If the “first impression” of an AI feature is a long spinning loader, the user will come to expect a performance tax that discourages future use. The device landscape and AI models will continue to evolve, but we must optimize for the hardware of today. In this demo, we lean into a unique advantage of client-side AI: inference is effectively free. Unlike cloud APIs, there is no per-token monetary penalty for being proactive, allowing us to hide the “seams” of the technology by working ahead of the user.
- Prewarming: While a developer’s high-end workstation might initialize an AI model quickly, we must design for the “everyday” device: the mid-range laptops that comprise the bulk of the web. On these machines, cold start latency is a conversion killer. I trigger a prewarming task the moment the user lands on the page to ensure the engine is hot and ready to serve by the time they reach for it. Always call create() with initialPrompts. This ensures the model isn’t just loaded into memory but also primed for the task ahead. You should also consider going one step further by kicking off a plausible task, such as generating alt-text for the sample image.
- One Step Ahead: I don’t wait for the user to ask for another alt-text. While the user is busy reading the first draft, the app proactively triggers a second generation in the background. If they click the Generate with AI button, the result is already there or well on its way.
- Anticipate: I use intermediary signals, like hovering over a “Refine” button, as triggers to start a background task for our best guess as to what the user will need next. You could imagine a more advanced implementation where the triggers include a long pause after a manual edit and other heuristics. It’s all about optimizing a prediction engine.
- Temporal Illusions: Sometimes, the best UX is a white lie that cares more about the user’s mental model than about raw technical throughput. Even if a result is ready, the app adds a “thinking” delay and uses a typewriter effect to stream the response manually. Because the output is short, I skip promptStreaming() in favor of the simpler prompt(). I did this for three reasons: it gives me control over the experience regardless of device speed; it buys us time to give the next speculative task a solid head start; and it builds trust in the output. Indeed, research into the “Labor Illusion” shows that for complex tasks, users value a result more when they see the “effort” of the work.
- Heuristic Aborting & In-Flight Hand-off: The Prompt API is a FIFO queue. As we lean heavily into proactive patterns as noted above, we also risk clogging the pipe when it matters the most. To avoid this, I use AbortController to clear background tasks the moment the user expresses a specific need that isn’t already at the top of the queue. This ensures their actual intent is never stuck in line behind a hypothetical request. Conversely, if intent matches a task already in flight, the app ‘adopts’ the work in progress. This ensures no compute cycles are wasted on identical requests.
- The Kill Switch: Every user-facing AI feature must provide an exit strategy. Whether a user has “commit remorse” and wants to tweak their manual input, or a model exceeds the user’s patience or enters a repetitive loop, they shouldn’t be forced to wait it out. In this demo, I replaced the standard spinner with a shimmering “Stop” icon that reveals a clear abort button on hover. This grants the user immediate agency to kill a task they no longer want, and pivot back to their own writing or a previous history entry without friction.
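To make the prewarming, hand-off, and kill-switch patterns above concrete, here is a minimal sketch. It assumes a session object that exposes `prompt(input, { signal })` and a factory whose `create()` accepts `initialPrompts`, which is how I understand the Prompt API today; verify both against your browser. The interfaces, the `SpeculativeRunner` class, and task keys such as "generate" are hypothetical names for illustration, not the demo's code.

```typescript
// Assumed minimal shapes; the real browser objects expose more than this.
interface PromptSession {
  prompt(input: string, options?: { signal?: AbortSignal }): Promise<string>;
}
interface ModelFactory {
  create(options: {
    initialPrompts: { role: "system"; content: string }[];
  }): Promise<PromptSession>;
}

/** Prewarm: create the session early and prime it for the task ahead. */
function prewarm(factory: ModelFactory, systemPrompt: string): Promise<PromptSession> {
  return factory.create({
    initialPrompts: [{ role: "system", content: systemPrompt }],
  });
}

type Generate = (task: string, signal: AbortSignal) => Promise<string>;

interface InFlight {
  controller: AbortController;
  result: Promise<string>;
}

class SpeculativeRunner {
  private readonly generate: Generate;
  private readonly inFlight = new Map<string, InFlight>();

  constructor(generate: Generate) {
    this.generate = generate;
  }

  /** Start a best-guess task in the background (for example, on hover). */
  speculate(task: string): void {
    if (!this.inFlight.has(task)) this.start(task);
  }

  /** The user explicitly asked for `task`. */
  request(task: string): Promise<string> {
    const existing = this.inFlight.get(task);
    if (existing) return existing.result; // adopt the work already in progress
    this.abortAll(); // real intent must not wait behind hypothetical work
    return this.start(task).result;
  }

  /** Kill switch: abort everything that is currently running. */
  abortAll(): void {
    this.inFlight.forEach((entry) => entry.controller.abort());
    this.inFlight.clear();
  }

  private start(task: string): InFlight {
    const controller = new AbortController();
    const result = this.generate(task, controller.signal);
    const entry: InFlight = { controller, result };
    this.inFlight.set(task, entry);
    const forget = () => {
      if (this.inFlight.get(task) === entry) this.inFlight.delete(task);
    };
    // Handle both outcomes so an aborted speculative task never surfaces
    // as an unhandled rejection.
    result.then(forget, forget);
    return entry;
  }
}
```

In a browser, `generate` would typically await the prewarmed session and call `session.prompt(input, { signal })`. This sketch only tracks tasks that are still running; serving a speculative result that has already finished would need a small cache on top.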
3. Semantic UI: Typewriter vs. Morph
The “UI Handshake” is where the hidden orchestration meets the user. Since we have two distinct modes of operation—generation and refinement—we need visual signatures that signal exactly what the AI is doing:
- The Typewriter: When the AI generates a “short story” from scratch, I use a typewriter effect. This signals the arrival of new information and respects the Labor Illusion: it looks like the system is “composing” the thought. It also creates a rhythm that allows the user to begin processing the result as it appears, while the app is already busy working on the next speculative task in the background.
- The Word-Morph: When a user edits their own text or an AI-generated draft, the AI icon pivots to a “Refine” mode. I avoided re-using the typewriter effect here; forcing re-reading without context feels like regression. The semantic morph effect signals that the system is reformulating an existing idea rather than starting over.
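One plausible way to drive a word-morph is to work out which words survive the refinement and animate only the part that changed. The sketch below is an assumption on my part, not how the demo does it: it keeps the shared leading and trailing words and treats everything in between as the region to morph. `planWordMorph` and `transitionFor` are hypothetical names.

```typescript
type Mode = "generate" | "refine";
type Transition = "typewriter" | "morph";

/** New text gets the typewriter; reformulated text gets the morph. */
function transitionFor(mode: Mode): Transition {
  return mode === "generate" ? "typewriter" : "morph";
}

interface WordMorph {
  prefix: string[];  // leading words that stay on screen
  removed: string[]; // words to fade out
  added: string[];   // words to fade in
  suffix: string[];  // trailing words that stay on screen
}

function planWordMorph(before: string, after: string): WordMorph {
  const a = before.split(/\s+/).filter(Boolean);
  const b = after.split(/\s+/).filter(Boolean);

  // Count matching words from the start.
  let start = 0;
  while (start < a.length && start < b.length && a[start] === b[start]) {
    start++;
  }

  // Count matching words from the end, without overlapping the prefix.
  let end = 0;
  while (
    end < a.length - start &&
    end < b.length - start &&
    a[a.length - 1 - end] === b[b.length - 1 - end]
  ) {
    end++;
  }

  return {
    prefix: a.slice(0, start),
    removed: a.slice(start, a.length - end),
    added: b.slice(start, b.length - end),
    suffix: a.slice(a.length - end),
  };
}
```

For example, refining "A dog runs on the beach" into "A golden retriever runs on the beach" keeps "A" and "runs on the beach" in place and morphs only "dog" into "golden retriever". A full word-level diff would handle several scattered edits better; this version trades that for simplicity.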
4. Playfulness: The Collaborative Vibe
Efficiency gets users to use a feature; personality gets them to like it. In the “PM’s Cut,” I use small, intentional moments of playfulness to signal that the AI isn’t just a script, but a partner in the creative process.
- The Humanized Typewriter: I use a typewriter effect that mimics “super-human” thought rather than a sterile data dump. It features jittered burst speeds: typing faster in the middle of words and pausing slightly in between words. I even added probabilistic “typos” where the AI makes a mistake, pauses, backspaces, and corrects itself. It’s a subtle “UI wink” that helps build the Labor Illusion while making the wait feel active and alive.
- Deduplication Double-Take: AI can be repetitive, especially with the same input and short outputs. To keep the UI high-signal, I implemented deduplication logic. If the model generates a result that already exists in the history, the app doesn’t push a new, redundant entry. Instead, it performs a “double-take” animation on the icon and shows a witty tooltip like “Nailed it, twice” or “Deja vu?”. It’s a bit of theater: we’re pretending a rare duplicate is a sign of the AI’s “confidence” in a previous suggestion, adding a moment of levity while keeping the history stack clean.
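Both effects above can be planned as data before anything touches the screen. The following is a rough sketch, assuming made-up timing values and a made-up typo rate; `planTypewriter`, `replay`, and `isDuplicate` are hypothetical names, and the whitespace and case normalization in the duplicate check is my assumption about what counts as "the same" result.

```typescript
type Keystroke =
  | { kind: "type"; char: string; delayMs: number }
  | { kind: "backspace"; delayMs: number };

const WRONG_KEYS = "asdfghjkl"; // pool of plausible slips (illustrative)

/** Plan a jittered typing sequence with occasional self-corrected typos. */
function planTypewriter(
  text: string,
  random: () => number = Math.random, // injectable for deterministic tests
  typoRate = 0.03,                    // assumed value
): Keystroke[] {
  const steps: Keystroke[] = [];
  for (const char of text) {
    const betweenWords = char === " ";
    // Short bursts inside a word, a slightly longer pause between words.
    const delayMs = (betweenWords ? 90 : 25) + Math.floor(random() * 40);
    if (!betweenWords && random() < typoRate) {
      // Hit a wrong key, hesitate, then erase it before the real character.
      const index = Math.floor(random() * WRONG_KEYS.length);
      steps.push({ kind: "type", char: WRONG_KEYS.charAt(index), delayMs });
      steps.push({ kind: "backspace", delayMs: 220 });
    }
    steps.push({ kind: "type", char, delayMs });
  }
  return steps;
}

/** Apply the keystrokes to an empty string; the result is the final text. */
function replay(steps: Keystroke[]): string {
  let out = "";
  for (const step of steps) {
    out = step.kind === "type" ? out + step.char : out.slice(0, -1);
  }
  return out;
}

/** True if the candidate already exists in history, ignoring case and spacing. */
function isDuplicate(history: string[], candidate: string): boolean {
  const normalize = (s: string) => s.trim().replace(/\s+/g, " ").toLowerCase();
  const target = normalize(candidate);
  return history.some((entry) => normalize(entry) === target);
}
```

A renderer would walk the keystrokes with a timer, and a `true` from `isDuplicate` would trigger the double-take animation instead of a new history entry. Because every typo is immediately followed by a backspace, replaying the plan always lands on the original text.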
5. Iteration: Safety in the Stack
"Preference is discovered, not declared. Success is a frictionless path to a version the user is proud to ship."
We are wired for discovery through variation. There is a certain creative dopamine in seeing a “take” that isn’t exactly what we expected, but reveals an angle we hadn’t considered. While the “Variable Reward” mechanism is often weaponized for addiction, here it is used for intent triangulation: we often only know what we want after seeing a few versions of what we don’t.
AI rarely nails “perfect” on the first try because perfection is subjective; it would require a “context dump” no user has the bandwidth to provide. By embracing our natural drive to tweak, we shift the goalpost: success isn’t about a single magical generation, but about providing a frictionless iterative path to a version the user is proud to ship.
- The Non-Destructive Stack: Because client-side inference is effectively free, we can afford to let users be as prolific as they like. Every time a user clicks the AI icon, the app doesn’t overwrite the previous result. Instead, it treats every generation as a new draft. Users can cycle through their history using a selector widget, or common keyboard shortcuts (like arrow keys from the top and bottom rows), ensuring that the first version is never more than a few keystrokes away.
- The “Safety Net” Undo: The app maintains a clear distinction between user-edited text and AI-generated drafts. Even when refining a manual edit, the app preserves the user’s original text as a distinct entry in the stack. This lowers the stakes of experimentation: when the cost of ‘undo’ is near zero, users are more likely to play with the tool.
"While compute is free, human attention is a luxury. Our job is to wisely spend the former to spare the latter."
- The Case for Constraints: Despite the appeal of unlimited co-creation, I cap the history stack. An infinite history increases UI complexity and triggers choice paralysis. By maintaining a finite history and automatically pruning empty or redundant entries, I keep the user focused on the goal: finding a satisfying description and moving on.
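The stack described in this section can be sketched as a small class: it appends instead of overwriting, tags each entry as user or AI text, skips empty and duplicate entries, and caps its size. The `DraftHistory` name, the default cap of 5, and the eviction policy (drop the oldest AI draft first so the user's own text survives longest) are all assumptions of mine, not details from the demo.

```typescript
type Source = "user" | "ai";

interface Draft {
  text: string;
  source: Source;
}

class DraftHistory {
  private readonly drafts: Draft[] = [];
  private readonly limit: number;
  private cursor = -1;

  constructor(limit = 5) {
    this.limit = limit; // assumed cap; tune for your UI
  }

  /** Append without overwriting. Returns false for empty or duplicate text. */
  push(text: string, source: Source): boolean {
    const clean = text.trim();
    if (clean.length === 0) return false;

    const existing = this.drafts.findIndex((d) => d.text === clean);
    if (existing !== -1) {
      this.cursor = existing; // point at the entry we already have
      return false;
    }

    this.drafts.push({ text: clean, source });
    if (this.drafts.length > this.limit) {
      // Assumed policy: evict the oldest AI draft (not the one just added),
      // falling back to the oldest entry overall.
      const newest = this.drafts.length - 1;
      const oldestAi = this.drafts.findIndex(
        (d, i) => d.source === "ai" && i < newest,
      );
      this.drafts.splice(oldestAi === -1 ? 0 : oldestAi, 1);
    }
    this.cursor = this.drafts.length - 1;
    return true;
  }

  /** Move through history, clamped at both ends (e.g. bound to arrow keys). */
  step(delta: -1 | 1): Draft | undefined {
    if (this.drafts.length === 0) return undefined;
    const next = this.cursor + delta;
    this.cursor = Math.min(this.drafts.length - 1, Math.max(0, next));
    return this.drafts[this.cursor];
  }

  current(): Draft | undefined {
    return this.drafts[this.cursor];
  }

  size(): number {
    return this.drafts.length;
  }
}
```

A `false` return from `push` is the natural hook for a duplicate animation, and the `source` tag lets the UI show whether the user is looking at their own words or a generated draft.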
Field Notes
- Cloning: Use session.clone() for every independent inference to prevent “hallucination drift” from session history.
- Metrics: Track “Time Saved” vs. “Time Lost” to make the value of anticipatory design tangible.
- Polling: Harden availability() with background polling and exponential backoff to maximize reach while browser implementations stabilize.
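The cloning and polling notes can be sketched together. This assumes `availability()` resolves to one of "unavailable", "downloadable", "downloading", or "available", and that a session offers `clone()` and `destroy()`, which matches my reading of the Prompt API; verify against your target browser. The helper names, the injected `check` and `sleep` functions, and the delay values are hypothetical.

```typescript
// Assumed shapes, loosely modeled on the Prompt API.
type Availability = "unavailable" | "downloadable" | "downloading" | "available";

interface SessionLike {
  prompt(input: string): Promise<string>;
  clone(): Promise<SessionLike>;
  destroy(): void;
}

/** Retry delays: base, base*2, base*4, ... capped at maxMs. */
function backoffDelays(baseMs: number, maxMs: number, attempts: number): number[] {
  const delays: number[] = [];
  for (let i = 0; i < attempts; i++) {
    delays.push(Math.min(maxMs, baseMs * Math.pow(2, i)));
  }
  return delays;
}

/** Poll until the model reports "available" or the delays run out. */
async function waitUntilAvailable(
  check: () => Promise<Availability>,
  sleep: (ms: number) => Promise<void>,
  delays: number[] = backoffDelays(500, 30000, 8), // assumed schedule
): Promise<boolean> {
  for (const delay of delays) {
    if ((await check()) === "available") return true;
    await sleep(delay);
  }
  return (await check()) === "available";
}

/** Run one independent inference on a throwaway clone of the base session. */
async function promptOnClone(base: SessionLike, input: string): Promise<string> {
  const scratch = await base.clone(); // earlier answers cannot leak into this one
  try {
    return await scratch.prompt(input);
  } finally {
    scratch.destroy(); // release the clone whether the prompt succeeded or not
  }
}
```

In a page, `check` would wrap the browser's availability call and `sleep` would wrap `setTimeout`. Injecting both keeps the polling logic testable without a real model or real waiting.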
Artfully Applied AI is still a frontier. I want to see your “UI winks” and your solutions for the physics of friction. Let’s keep writing the #WebAIPaybook together.