A less capable AI agent that shows its work beats a powerful one that operates in silence. That is the central finding from “Mapping the Design Space of User Experience for Computer Use Agents,” a study published on February 12, 2026 by four Apple researchers: Ruijia Cheng, Jenny T. Liang, Eldon Schoop, and Jeffrey Nichols. They tested 20 participants using a Wizard-of-Oz methodology in which researchers simulated agent behavior in real time, deliberately introducing failures and ambiguous situations. The participants consistently rated transparency, predictability, and the ability to intervene as more important than raw task-completion ability.

This finding directly contradicts the current industry trajectory. OpenAI, Google, and Microsoft have spent the past year competing on agent capability: faster execution, broader tool access, more autonomous decision-making. Apple’s research suggests they are optimizing for the wrong variable. The question is not “can the agent do it?” but “does the user trust the agent enough to let it?”

Related: Human-in-the-Loop AI Agents: When to Let Agents Act and When to Hit Pause

The Study: How Apple Tested Agent UX

The research ran in two phases. In Phase 1, the team analyzed nine existing AI agents, including Claude Computer Use, OpenAI Operator, and Google’s Project Mariner, and interviewed eight UX and AI practitioners at a large technology company. From this, they built a taxonomy of 55 UX features organized into four categories with 21 subcategories.

In Phase 2, they ran a hands-on study with 20 experienced AI users. Participants interacted with a chat-based interface to request tasks like booking vacation rentals or shopping for products. Behind the scenes, researchers manually executed the tasks in real time, simulating how a computer-use agent would behave. The researchers deliberately introduced errors, ambiguous choices, and risky actions to observe how users reacted.

This Wizard-of-Oz approach stripped away the variability of actual AI performance. Every participant experienced the same agent behaviors, making the results about user expectations rather than model capabilities.

The Four UX Categories That Matter

The resulting taxonomy covers four areas that any team building computer-use agents needs to address:

User Query: How users communicate tasks to the agent. Natural language prompts, structured inputs, or hybrid approaches. Participants wanted to give high-level goals (“find me a vacation rental under $150/night near the beach”) without specifying every click.

Explainability: How the agent communicates what it is doing and why. This is where the study’s strongest findings emerged. Users wanted running commentary on agent actions, not just final results. They wanted to see which websites the agent visited, what options it considered, and why it made specific choices.

User Control: How users intervene, redirect, or override the agent. Participants wanted easy mechanisms to pause execution, undo actions, and modify the agent’s approach mid-task. The study found that too many confirmation prompts made the agent feel useless, while too few eroded trust entirely.

Mental Models: How users conceptualize what the agent is and how it should behave. This category produced some of the most revealing findings.

The Mental Model Split: Assistant vs. Tool

Participants defaulted to one of two mental models when interacting with the agent, and which model they used changed everything about their expectations.

The Assistant Model: Users who thought of the agent as an assistant expected it to exercise discretion, handle ambiguity gracefully, and proactively surface relevant information. If the agent found a vacation rental that was slightly over budget but had significantly better reviews, these users wanted the agent to mention it. They expected the agent to learn their preferences over time and make increasingly independent decisions.

The Tool Model: Users who thought of the agent as a tool expected precise, literal execution. If they said “under $150/night,” they meant it. These users became frustrated when the agent deviated from instructions, even when the deviation was arguably helpful. They wanted predictable behavior and explicit confirmation before any action that was not directly specified.

Here is the problem for agent developers: the same user often switches between these models depending on the task and the stakes involved. Someone might want assistant-level discretion when browsing vacation options but tool-level precision when entering payment details. As the Computerworld analysis noted, agent designs need to accommodate both mental models simultaneously.

This maps directly to what we see in production agent design. The best implementations use progressive disclosure of autonomy: starting with tightly supervised, reversible actions and gradually expanding the agent’s independence as it builds a track record with each user.

Familiarity Changes Everything

One of the study’s most practical findings: user expectations shift dramatically based on how familiar they are with the interface the agent is controlling.

When participants were unfamiliar with a website or application, they wanted maximum transparency. They asked for intermediate steps, explanations of what the agent was doing, confirmation pauses before actions, and the ability to see the screen the agent was interacting with. This held true even for low-risk scenarios. A user who had never used a particular travel booking site wanted confirmation before the agent clicked “search,” not because searching is risky, but because they could not predict what would happen next.

When participants knew the interface well, their tolerance for autonomous action increased sharply. They were comfortable with the agent executing multi-step sequences without interruption because they could predict the outcomes. If something went wrong, they knew how to fix it.

The implication for agent builders: transparency requirements are not static. A one-size-fits-all approach to confirmations and explanations will either annoy experienced users or frighten novice ones. The KPMG Trust in AI global study, which surveyed 48,000 people across 47 countries, found that only 46% of people who use AI regularly actually trust it. Interface familiarity is one of the strongest levers for closing that gap.
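To make that implication concrete, here is a minimal sketch of a familiarity-adaptive confirmation policy. The `AgentAction` type, the `familiarity` score, and the 0.5 threshold are all illustrative assumptions, not anything from the study:

```python
from dataclasses import dataclass

@dataclass
class AgentAction:
    description: str
    reversible: bool  # can the user undo this action after the fact?

def needs_confirmation(action: AgentAction, familiarity: float) -> bool:
    """Decide whether the agent should pause for explicit user approval.

    familiarity ranges from 0.0 (user has never seen this interface)
    to 1.0 (user knows it well). The 0.5 cutoff is an illustrative
    placeholder, not a value measured in the study.
    """
    if not action.reversible:
        return True  # irreversible actions always pause, regardless of familiarity
    # On unfamiliar interfaces, even low-risk steps get a confirmation,
    # matching the study's "click 'search'" finding.
    return familiarity < 0.5

# A first-time visitor to a booking site gets a pause before a harmless search;
# a user who knows the site does not.
print(needs_confirmation(AgentAction("click 'search'", reversible=True), familiarity=0.1))  # True
print(needs_confirmation(AgentAction("click 'search'", reversible=True), familiarity=0.9))  # False
```

The point of the sketch is that confirmation frequency becomes a function of two inputs, reversibility and familiarity, rather than a single global setting.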

Related: AI Agent Guardrails: How to Stop Hallucinations Before They Hit Production

High Stakes Demand High Control

The study confirmed what most designers intuit but rarely measure: users demand significantly more control when an agent’s actions carry real-world consequences.

Actions that triggered the strongest control demands:

  • Financial transactions: Making purchases, changing payment details, processing refunds
  • Communication on behalf of the user: Sending emails, posting messages, contacting other people
  • Account modifications: Changing passwords, updating personal information, modifying subscription settings
  • Irreversible actions: Deleting files, canceling reservations, submitting applications

Trust broke down fastest when agents made silent assumptions during these high-stakes actions. A participant might tolerate the agent choosing a mid-range hotel room without asking, but the moment the agent entered credit card information without explicit confirmation, trust collapsed. Rebuilding that trust took significantly longer than building it in the first place.

The Apple researchers recommended that “agent designs could intentionally embrace ‘seamfulness,’” prioritizing user understanding and preserving users’ agency to intervene, particularly in situations involving ambiguity and uncertainty. “Seamfulness” here means deliberately visible seams in the interaction: moments where the agent makes its reasoning visible and hands control back to the user rather than optimizing for smooth, invisible execution.

This aligns with what we documented in our coverage of rogue AI agents in enterprise settings. The Gravitee 2026 report found that 88% of organizations experienced or suspected an AI agent security incident, and the root cause was almost always the same: agents acting without adequate human checkpoints.

What This Means for the Industry

Apple has not shipped a general-purpose AI agent yet. Siri has gained features and Apple Intelligence handles on-device tasks, but nothing in Apple’s lineup is comparable to OpenAI’s Operator or Anthropic’s Claude Computer Use. This study reads like a design brief for what comes next: Apple systematically mapping what users actually want before building it.

That approach stands in contrast to the rest of the industry. Google’s Project Mariner, Microsoft’s Copilot agents, and OpenAI’s Operator all launched capability-first, then iterated on trust and transparency based on user feedback. Apple’s research suggests starting from trust and building capability on top of it.

Three concrete design principles emerge from the study:

1. Progressive disclosure of autonomy. Start agents in a supervised mode where they explain every action and request confirmation frequently. As the user gains familiarity with both the agent and the target interface, gradually reduce interruptions and expand autonomous action scope. This is not a new concept in UX design, but applying it to AI agents is.
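One way to sketch progressive disclosure is a simple autonomy ladder driven by the agent’s per-user track record. The level names, thresholds, and reset-on-override rule below are illustrative assumptions, not part of the study:

```python
def autonomy_level(track_record: int) -> str:
    """Map a per-user track record (tasks completed without an override)
    to a supervision level. The thresholds 5 and 20 are placeholders."""
    if track_record < 5:
        return "confirm_every_action"   # fully supervised starting mode
    if track_record < 20:
        return "confirm_risky_only"     # interruptions reduced
    return "act_then_report"            # broad autonomous scope

def record_outcome(track_record: int, user_overrode: bool) -> int:
    """Grow trust slowly, one task at a time; a user override sets the
    agent back to close supervision, mirroring the study's finding that
    rebuilding trust takes longer than building it."""
    return 0 if user_overrode else track_record + 1

# Trust expands gradually, then collapses on a single intervention:
level = autonomy_level(12)              # "confirm_risky_only"
after_override = record_outcome(12, user_overrode=True)   # back to 0
```

The asymmetry in `record_outcome` (slow growth, sharp reset) is the design choice that encodes the study’s observation about trust collapse.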

2. Dual mental model support. Agent interfaces need to accommodate users who think of the agent as an assistant and users who think of it as a tool, sometimes within the same session. Practically, this means offering both proactive suggestions (assistant mode) and strict instruction-following (tool mode), with clear signals about which mode is active.
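A minimal sketch of dual-mode behavior, using the study’s vacation-rental example: in tool mode the budget constraint is literal, while in assistant mode slightly-over-budget options are surfaced as separate suggestions rather than silently dropped or silently booked. The `Mode` enum and the 10% suggestion band are illustrative assumptions:

```python
from enum import Enum

class Mode(Enum):
    ASSISTANT = "assistant"   # may proactively surface near-miss alternatives
    TOOL = "tool"             # strict, literal instruction-following

def search_rentals(listings: list[dict], max_price: float, mode: Mode):
    """Return (matches, suggestions). Suggestions are always empty in
    TOOL mode; in ASSISTANT mode, listings up to 10% over budget are
    offered separately, never acted on without the user's say-so."""
    matches = [l for l in listings if l["price"] <= max_price]
    if mode is Mode.TOOL:
        return matches, []
    suggestions = [l for l in listings
                   if max_price < l["price"] <= max_price * 1.1]
    return matches, suggestions

listings = [{"name": "Beach Cottage", "price": 140},
            {"name": "Cliff House", "price": 160}]
within, extras = search_rentals(listings, max_price=150, mode=Mode.ASSISTANT)
# within -> Beach Cottage; extras -> Cliff House (surfaced, not booked)
```

Returning suggestions in a separate channel, with the active mode as an explicit signal, is the “clear signals about which mode is active” requirement made concrete.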

3. Seamful design for high-stakes actions. Rather than optimizing for invisible, frictionless execution in every scenario, intentionally create visible checkpoints for consequential actions. The cost of one extra confirmation dialog before a payment is negligible. The cost of an unauthorized payment is not.
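The seamful checkpoint can be sketched as a gate that routes consequential actions through an explicit confirmation callback. The action-type names and the gate itself are illustrative assumptions; the high-stakes categories mirror the list the study identified:

```python
# Categories from the study's findings: financial transactions, outbound
# communication, account modifications, and irreversible actions.
HIGH_STAKES = {"payment", "send_message", "account_change", "irreversible"}

def run_action(action_type: str, summary: str, confirm) -> str:
    """Seamful checkpoint: consequential actions surface a visible
    confirmation seam instead of executing invisibly. `confirm` is a
    callback that shows `summary` to the user and returns True only
    on explicit approval."""
    if action_type in HIGH_STAKES and not confirm(summary):
        return f"aborted: {summary}"
    return f"executed: {summary}"

# A payment never proceeds without an explicit yes:
result = run_action("payment", "charge $412 to card ending in ****",
                    confirm=lambda summary: False)
# Low-stakes actions flow through without interruption:
browsed = run_action("browse", "open listing page",
                     confirm=lambda summary: False)
```

The design choice is that the seam lives in the execution path itself, so no silent assumption can route a payment around it.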

The broader signal: the agent UX problem may be harder than the agent capability problem. Models will keep getting more capable. But KPMG’s finding that trust in AI has actually declined as usage increased shows that capability alone does not generate trust. The industry needs to solve for transparency and control with the same engineering rigor it applies to benchmarks and token throughput.

Related: Agentic AI Observability: Why It Is the New Control Plane

Frequently Asked Questions

What did Apple’s AI agent UX study find?

Apple’s February 2026 study “Mapping the Design Space of User Experience for Computer Use Agents” found that users prefer transparent AI agents over capable ones. Participants rated transparency, predictability, and the ability to intervene as more important than raw task-completion ability. The study tested 20 users with a Wizard-of-Oz methodology and identified four critical UX categories: user query, explainability, user control, and mental models.

Why do users prefer transparent AI agents?

Users prefer transparent AI agents because they need to understand what the agent is doing to trust it. The Apple study found that trust breaks down quickly when agents make silent assumptions or errors. Users want running commentary on agent actions, the ability to intervene at any point, and clear explanations for agent decisions, especially during high-stakes tasks like financial transactions or communications sent on their behalf.

What is the assistant vs. tool mental model in AI agents?

Apple’s study found users switch between two mental models. In the assistant model, users expect the agent to exercise discretion, handle ambiguity, and proactively suggest alternatives. In the tool model, users expect precise literal execution of instructions with no deviation. The same user often switches between these models depending on task complexity and stakes, meaning agent interfaces need to support both simultaneously.

How does user familiarity affect AI agent trust?

The Apple study found that familiarity with the interface the agent controls dramatically changes trust levels. Users unfamiliar with a website or application demanded maximum transparency, including confirmation pauses even for low-risk actions. Users who knew the interface well tolerated much more autonomous agent behavior. This means transparency requirements are not static and agents should adapt their confirmation frequency to user experience levels.

What is progressive disclosure of autonomy for AI agents?

Progressive disclosure of autonomy means starting AI agents in a supervised mode where they explain every action and request frequent confirmation, then gradually reducing interruptions as the user gains familiarity. Apple’s study supports this design approach because it found that user trust builds over time through positive interactions, and that experienced users want fewer interruptions while new users need more transparency.
