AI agents that do your work while you sleep sound great. The reality is far messier—‘it’s like a toddler that needs to be overseen’

Summer Yue may work on safety and alignment on Meta’s superintelligence team, but even she admits she isn’t immune to overconfidence when it comes to autonomous AI agents.

In a post on X Monday, Yue described how her OpenClaw autonomous AI agents—built to run locally on a Mac mini computer—deleted her entire inbox, ignoring instructions to pause and ask for confirmation first.

“I had to RUN to my Mac Mini like I was defusing a bomb,” she said. It was, she added, a “rookie mistake.” The workflow had run for weeks in a test inbox she used to trial the agent safely, she explained, but in the real inbox the agent lost her original instruction.

Yue’s experience stands in stark contrast to viral posts such as “The Lobster Revolution: Why 24/7 AI Agents Just Changed Everything,” in which Peter Diamandis describes always-on AI as a far smoother experience.

“Let me tell you what it feels like to use this,” Diamandis wrote. “You wake up in the morning and your agent—mine is named Skippy, cheerfully sarcastic and absurdly capable—has done eight hours of work while you slept. It read a thousand pages of markdown. It organized your files. It drafted three project plans. It booked your travel. It researched that question you had at 11 PM and forgot about.”

“When my Mac mini went offline for six hours, I felt withdrawal,” he added. “Like my best friend disappeared.”

Together, these dueling accounts of the power of AI agents capture the tension at the heart of today’s push toward “always-on” AI. As tools like OpenClaw and Claude Code make it technically possible for agents to run for long periods, excitement is growing around the idea of AI that works while you sleep. But in practice, early users say that autonomy remains fragile, unpredictable, and labor-intensive to manage. Rather than replacing human work, today’s agents often require constant monitoring, guardrails, and intervention, especially when the stakes rise beyond low-risk experiments.

AI agents work best when tasks are simple and low-stakes

Shyamal Anadkat, who previously worked as an applied AI engineer at OpenAI, said most of today’s successful agents still require frequent human check-ins or are limited to tightly bounded, well-defined tasks—though he emphasized that this will change as measurement and evaluation techniques improve.

“A system that’s 95% accurate on individual steps becomes chaotic over a 20-step autonomous workflow,” Anadkat said. “Long-horizon planning is still weak.” As a result, he explained, agents may perform well on short task chains but tend to fall apart when asked to manage complex, multi-day projects. Memory is another major limitation: “In many agents, memory is either nonexistent or fragile. You need systems that can maintain a coherent model of your work context, priorities, and constraints.”
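
To put rough numbers on Anadkat’s point: if one assumes that every step succeeds independently and with the same probability (a simplification for illustration, not something he states), the chance that a whole chain finishes without a single error is the per-step accuracy raised to the number of steps. The sketch below uses a hypothetical helper, chain_success_probability, to show the arithmetic.

```python
def chain_success_probability(step_accuracy: float, num_steps: int) -> float:
    """Chance that every step in a chain succeeds.

    Assumption (ours, for illustration): steps fail independently and
    share the same per-step accuracy. Real agents can recover from some
    errors or compound others, so treat this as a rough estimate only.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_accuracy ** num_steps


# The figures from the quote: 95% per-step accuracy over 20 steps.
p = chain_success_probability(0.95, 20)
print(f"Chance of an error-free 20-step run: {p:.1%}")  # about 35.8%
```

Under those assumptions, an agent that is right 95% of the time on each step completes a 20-step workflow cleanly only about a third of the time, which is one way to read “chaotic.”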

That doesn’t mean the promise of AI........

© Fortune
