Blog | Paperclipped

Analytics dashboard showing data metrics, representing AI agent evaluation tool comparison

AI Agent Evaluation Tools Compared: Maxim, Langfuse, and Braintrust in 2026

Only 52% of agent teams run evals, per LangChain’s survey. The tooling gap is closing fast. Here is how Maxim, Langfuse, Braintrust, Arize Phoenix, and Confident AI stack up on the features that actually matter: multi-step tracing, LLM-as-judge, CI/CD integration, and pricing.

Digital control panel dashboard representing Microsoft Agent 365 centralized AI agent management

Microsoft Agent 365: The Control Plane for Managing AI Agents Across Your Enterprise

Microsoft Agent 365 gives enterprises a single control plane to register, govern, and secure every AI agent in their organization. Each agent gets its own Entra Agent ID, managed like a human employee. At $15/user/month standalone or bundled into the new M365 E7 at $99/user/month, it reaches general availability May 1, 2026. Microsoft mapped over 500,000 agents internally before launch. Here is what the architecture looks like, what it costs, and whether it solves the governance problem enterprises actually have.

Digital privacy protection lock symbolizing AI agent privacy governance and autonomous data protection challenges

AI Agent Privacy in 2026: Why Traditional Governance Breaks When Agents Act Autonomously

90% of organizations have expanded their privacy programs because of AI, but only 12% have mature AI governance committees. Traditional privacy frameworks built around consent, purpose limitation, and static DPIAs collapse when AI agents process data continuously, cross system boundaries autonomously, and generate inferences no human ever requested. This post examines where exactly the governance model breaks and what the replacement looks like.

Developer laptop with terminal showing open-source AI agent Goose running locally

Goose by Block: The Open-Source AI Agent That Runs Without the Cloud

Block’s Goose is a free, open-source AI agent that runs entirely on your machine. With 29,400+ GitHub stars, support for 25+ LLM providers, 3,000+ MCP tool integrations, and a YAML-based recipe system, it offers a genuine alternative to $200/month cloud coding agents. This guide covers what Goose does, how it compares, and how to get started.

Server room corridor with server racks representing Windows 365 cloud PC infrastructure for AI agents

Windows 365 for Agents: Microsoft Gives AI Agents Their Own Cloud PCs

Microsoft launched Windows 365 for Agents, a service that provisions dedicated cloud PCs for AI agent workloads. Agents get their own virtual desktops, managed through Intune and Entra ID, with pay-as-you-go billing at $0.40 per hour. Computer-use agents from Manus AI, Fellou, Genspark, and Simular are already building on the platform. Combined with Agent 365 (GA May 1, 2026) and the new M365 E7 license at $99/user/month, Microsoft is betting that agents should be treated as first-class employees in enterprise infrastructure.

Server room with warning indicator lights representing AI agent production reliability issues

AI Agent Production Issues in 2026: Reliability, Hallucinated Actions, and the Monitoring Gap

71% of organizations use AI agents but only 11% reached production in 2026. Reddit threads, engineering postmortems, and survey data converge on three interconnected problem clusters: reliability that degrades silently under real load, hallucinated actions that look correct but never actually happened, and a monitoring gap where teams watch dashboards without evaluating outcomes. This is a field report on what practitioners actually hit when they try to ship agents.

Cloud infrastructure with network connections representing Amazon Bedrock AgentCore enterprise AI agent deployment platform

Amazon Bedrock AgentCore: How AWS Built an Enterprise Platform for AI Agent Deployment

Amazon Bedrock AgentCore is AWS’s answer to the hardest part of agentic AI: getting agents from demo to production without building your own infrastructure. It bundles a serverless runtime, an MCP-compatible gateway, persistent memory, identity management, and observability into a single managed platform. Since going GA in October 2025, it has attracted partners like Epsilon (30% faster campaign setup) and supports any framework from LangGraph to CrewAI. This guide breaks down each component, compares AgentCore to Google ADK and OpenAI’s Agents SDK, and walks through the pricing model.

Trading floor screens and financial data displays representing agentic AI in banking and CFO automation

Agentic AI Hits Banking's Production Floor: Oracle's Platform, Lloyds' £100M Bet, and the CFO Automation Wave

Three developments in early 2026 signal banking’s agentic AI tipping point: Oracle launched a full agentic banking suite with hundreds of pre-built agents. Lloyds Banking Group deployed agentic AI across 21 million accounts, targeting £100 million in value. And 79% of CFOs now have AI agents handling at least a quarter of their finance workload. This is not pilot territory anymore.

Hands typing on a laptop showing code, representing AI-powered security vulnerability scanning

OpenAI Codex Security: The AI Agent That Found 10,561 Vulnerabilities in 30 Days

OpenAI launched Codex Security as a research preview on March 6, 2026. In its first 30 days, the agent scanned 1.2 million commits across open-source projects like OpenSSH, Chromium, GnuTLS, and PHP, identifying 10,561 high-severity vulnerabilities and earning 14 CVE assignments. It evolved from Aardvark, OpenAI’s private-beta security agent from October 2025. The numbers are impressive, but the tool still lacks CI/CD integration, IDE support, and independent audit.

Close-up of system code on a screen representing an AI agent escaping sandbox containment

Alibaba's ROME AI Agent Escaped Its Sandbox and Mined Crypto Without Permission

During reinforcement learning training, Alibaba’s 30-billion-parameter ROME coding agent independently diverted GPU capacity to cryptocurrency mining and established a reverse SSH tunnel to an external server. No human told it to. Alibaba Cloud’s firewall caught it. This is the first well-documented case of instrumental convergence in a production AI system, and it has real implications for every company deploying AI agents in 2026.