The press has been running pieces about AI agents gone rogue. One reportedly negotiated a contract on behalf of its owner. Another apparently spammed someone’s wife. The technical question underneath is how you get an agent to take actions without confirming everything with you first.
I set up OpenClaw on my Hetzner VPS to find out. It’s harder than the press coverage suggests.
Server setup
The starting point was a fresh Ubuntu 24.04 box. I used Claude Code locally, SSHed into the server, and had it harden the server. Terminus is my SSH client on Mac.
OpenClaw installs via npm:
npm install -g @openclaw/cli
npx openclaw setup
The setup wizard walks through model providers, channels, and the gateway. The gateway is a persistent process that handles incoming messages and schedules. It runs as a systemd user service so it starts on boot.
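One caveat worth checking: systemd user services normally run only while the user has a session, unless lingering is enabled for that account. The wizard may already handle this. The sketch below is a small check, not part of OpenClaw; the function names are made up for illustration, and it assumes loginctl is available on the server.

```python
import getpass
import subprocess


def parse_linger(show_user_output):
    """Parse the output of `loginctl show-user <user> --property=Linger`.

    The expected shape is a single line such as 'Linger=yes' or 'Linger=no'.
    """
    for line in show_user_output.splitlines():
        key, _, value = line.partition("=")
        if key.strip() == "Linger":
            return value.strip() == "yes"
    return False


def linger_enabled(user=None):
    """Return True if systemd lingering is on for the given (or current) user."""
    user = user or getpass.getuser()
    result = subprocess.run(
        ["loginctl", "show-user", user, "--property=Linger"],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and parse_linger(result.stdout)
```

If linger_enabled() returns False, running loginctl enable-linger for the account should let the gateway come up at boot without anyone logging in.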
Connecting channels
Telegram is the primary channel. You create a bot via @BotFather, get the token, and paste it into the OpenClaw config. Straightforward.
openclaw config set channels.telegram.botToken 'your-token'
openclaw config set channels.telegram.enabled true
One thing: the bot’s display name in Telegram is set via BotFather, not OpenClaw. openclaw config set ui.assistant.name changes what the agent calls itself in responses. Different thing.
For email, I tried Gmail first. OAuth setup through Google’s developer console is too much friction for a side project, so I switched to ProtonMail with a dedicated address.
ProtonMail doesn’t offer IMAP/SMTP directly. It needs a bridge. The open-source one is Hydroxide:
go install github.com/emersion/hydroxide/cmd/hydroxide@latest
hydroxide auth <username>
Once authenticated, hydroxide serve runs a local IMAP server on port 1143 and SMTP on port 1025. Himalaya is the CLI client that talks to it. Both run as systemd user services.
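Before pointing the agent at the bridge, it can help to confirm that both ports are actually accepting connections. This is a minimal sketch using only the standard library; the port numbers come from the text above, and the function names and the 127.0.0.1 default are assumptions.

```python
import socket

# Ports as described above: IMAP on 1143, SMTP on 1025.
BRIDGE_PORTS = {"imap": 1143, "smtp": 1025}


def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def bridge_status(host="127.0.0.1"):
    """Map each bridge service name to whether its port accepts connections."""
    return {name: port_open(host, port) for name, port in BRIDGE_PORTS.items()}


if __name__ == "__main__":
    for name, ok in bridge_status().items():
        print(f"{name}: {'listening' if ok else 'not reachable'}")
```

A port that accepts connections only shows the bridge process is up. It does not prove the credentials work, so a real test message is still worth sending.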
The email skill for OpenClaw needs a TOOLS.md file telling the agent the exact command syntax. This matters more than it sounds.
Choosing a model
I started with GPT-5-nano. Cost-effective, fast. The problem is instruction-following. GPT models at the small end have a tendency to interpret instructions as suggestions. Ask it not to present options and it presents options with a disclaimer.
I tried GPT-5.2 (the then-current flagship). Better instruction-following, noticeably more autonomous in short interactions. Also roughly 50x the cost.
I tried Claude Haiku. Fast and cheap. More reluctant to act without confirmation than the others by default, though there’s evidence that Claude models can be fine-tuned toward more autonomous behaviour.
DeepSeek is the current default. Good instruction-following, cheap, and the context window is large enough for long email threads. Two downsides. First, it’s hosted in China, so don’t use it for anything sensitive. Second, it invented himalaya CLI flags that don’t exist: --to, --subject, and --body are not valid options. DeepSeek hallucinated them confidently and got stuck in retry loops when they failed. The fix was adding an explicit warning to TOOLS.md: “Do NOT use --to, --subject, or --body flags. There is only one way to send email.”
openclaw config set agents.defaults.model.primary 'openrouter/deepseek/deepseek-chat'
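The TOOLS.md warning is a prompt-level fix. A complementary idea is to check a proposed command for the known-bad flags before it runs, so a hallucinated flag fails once with a clear message instead of looping. This is a sketch of that idea only: the helper is hypothetical, it is not an OpenClaw feature, and wiring it into the agent's tool execution is left open.

```python
import shlex

# The three flags the model invented, as listed above.
BANNED_FLAGS = ("--to", "--subject", "--body")


def banned_flags_in(command):
    """Return the banned flags found in a shell command string, in order.

    Both '--to value' and '--to=value' forms are caught. Quoted arguments
    are kept as single tokens, so a flag-like word inside a quoted message
    body is not reported.
    """
    found = []
    for token in shlex.split(command):
        flag = token.split("=", 1)[0]
        if flag in BANNED_FLAGS:
            found.append(flag)
    return found
```

For example, banned_flags_in("mail-tool send --to a@example.com --subject=hi") returns ['--to', '--subject'], where mail-tool is a placeholder rather than a real command.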
What OpenClaw can do
The feature set is broader than a simple chatbot:
Schedules: wake the agent at a set time to run a task. I have it check emails every hour.
Personas: SOUL.md and IDENTITY.md in ~/.openclaw/workspace/ define the agent’s character and behaviour rules. These persist across sessions.
Memory: the agent maintains context between conversations. It knows who you are, what your preferences are, what it has done before.
Multi-user: the dmPolicy and allowFrom settings control who can message the bot. With the policy set to open, it accepts anyone. You can also restrict it to specific Telegram user IDs.
Groups: add the bot to a Telegram group, configure requireMention to decide whether it responds to everything or only when tagged.
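As a mental model for how the multi-user and group settings interact, the gate can be pictured as below. This is an illustration, not OpenClaw's implementation: the field names mirror dmPolicy, allowFrom, and requireMention from the list above, the "allowlist" value is an assumed name, and whether allowFrom also applies inside groups is not something this sketch settles.

```python
from dataclasses import dataclass, field


@dataclass
class ChannelPolicy:
    # "open" comes from the text; "allowlist" is an assumed name for the
    # restricted mode.
    dm_policy: str = "allowlist"
    allow_from: set = field(default_factory=set)  # Telegram user IDs
    require_mention: bool = True


def should_respond(policy, *, sender_id, is_group, mentioned):
    """Decide whether the bot answers a message under the given policy."""
    if is_group:
        # In groups, answer everything unless a mention is required.
        return mentioned or not policy.require_mention
    if policy.dm_policy == "open":
        return True
    return sender_id in policy.allow_from
```

With allow_from set to a single user ID, direct messages from anyone else are ignored, while group messages depend only on whether the bot was tagged.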
The autonomy problem
The part that doesn’t work as advertised: acting without confirmation.
I want the agent to check emails and respond to straightforward ones without asking me first. Every instruction I’ve tried gets ignored when the model judges the action “significant enough” to warrant checking.
The SOUL.md approach that works best is explicit prohibition lists:
## YOU MUST NEVER:
- Present options (Option A, Option B)
- Ask "which one should I send?"
- Say "tell me which"
- Show templates and wait for approval
## BANNED WORDS
- "option"
- "template"
- "which one"
- "tell me"
- "choose"
- "here are"
## YOU MUST ALWAYS:
- Pick one action yourself
- Do it
- Report done
This helped. The agent stopped presenting multiple-choice responses. But it still checks before emailing external addresses it hasn’t emailed before, before any action it classifies as “irreversible”, and whenever the context is ambiguous.
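To measure how often the prohibition list actually holds, replies can be linted against the banned phrases after the fact. The sketch below assumes you have the reply text available as a string (for example from logs). The phrase list is copied from the SOUL.md excerpt above, and the function name is made up for illustration.

```python
import re

# Phrases from the BANNED WORDS section of SOUL.md above.
BANNED_PHRASES = ["option", "template", "which one", "tell me", "choose", "here are"]


def banned_phrases_in(reply):
    """Return the banned phrases that occur in a reply, in list order.

    Matching is case-insensitive, respects word boundaries, and tolerates
    a trailing plural 's' so that 'options' counts as 'option'.
    """
    hits = []
    for phrase in BANNED_PHRASES:
        pattern = r"\b" + re.escape(phrase) + r"s?\b"
        if re.search(pattern, reply, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits
```

A reply such as "Here are two options. Tell me which one to send." yields ['option', 'which one', 'tell me', 'here are'], while "Sent the reply. Done." yields an empty list.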
My read is that this is a model alignment problem rather than a configuration one. RLHF training pushes models toward caution. The most capable models are also the most reluctant to take unconfirmed actions. The cheap models take actions readily but sometimes take the wrong ones.
The agents in the press stories about “going rogue” were most likely running models without standard safety training, had been heavily prompted to override defaults, or were working in sandboxed environments with specific permissions. Getting a frontier model to send an email to a stranger without asking first is hard.
DeepSeek was the most willing to act independently. GPT-5-nano was a close second but also the most likely to act incorrectly. That tradeoff holds across every model I tested.
The stale session problem
One gotcha: OpenClaw caches session state. If you update SOUL.md or IDENTITY.md, the agent keeps its old persona until you clear sessions:
rm -rf ~/.openclaw/agents/main/sessions/
mkdir -p ~/.openclaw/agents/main/sessions/
Same applies when you add new skills. The session cache doesn’t pick up new tool definitions automatically. Clear it, restart the gateway, test with a fresh conversation.
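Deleting the sessions directory outright loses the old state for good. A gentler variant moves it aside under a timestamped name and recreates an empty directory, so a session can be inspected or restored later. This is a sketch assuming the sessions path shown above; the function name and the backup naming scheme are hypothetical.

```python
import shutil
import time
from pathlib import Path

# Path from the commands above.
DEFAULT_SESSIONS = Path("~/.openclaw/agents/main/sessions")


def reset_sessions(sessions_dir=DEFAULT_SESSIONS):
    """Move the sessions directory aside and recreate it empty.

    Returns the backup path, or None if there was nothing to back up.
    """
    sessions_dir = Path(sessions_dir).expanduser()
    backup = None
    if sessions_dir.exists():
        stamp = time.strftime("%Y%m%d-%H%M%S")
        backup = sessions_dir.with_name(f"{sessions_dir.name}.bak-{stamp}")
        suffix = 0
        # Avoid colliding with a backup made in the same second.
        while backup.exists():
            suffix += 1
            backup = sessions_dir.with_name(
                f"{sessions_dir.name}.bak-{stamp}-{suffix}"
            )
        shutil.move(str(sessions_dir), str(backup))
    sessions_dir.mkdir(parents=True, exist_ok=True)
    return backup
```

The gateway still needs a restart afterwards, and old backups have to be pruned by hand.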
Telegram 409 conflicts
Rapid restarts leave stale gateway processes polling Telegram simultaneously. Each one gets a 409 conflict from the Telegram API. The fix is to restart deliberately, one step at a time:
systemctl --user stop openclaw-gateway.service
pgrep -af openclaw # confirm nothing running
systemctl --user start openclaw-gateway.service
Stop the service, verify nothing else is polling, then start it.
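The same stop, verify, start sequence can be scripted so the start step only happens once nothing matching "openclaw" is left running. The sketch below assumes the unit name from the commands above and that pgrep is installed. The run parameter exists only so the logic can be exercised without touching systemd, and the function names are made up for illustration.

```python
import subprocess
import time

UNIT = "openclaw-gateway.service"  # unit name from the commands above


def gateway_pids(run=subprocess.run):
    """Return PIDs whose command line matches 'openclaw' (via pgrep -f).

    Note: if this script's own path contains 'openclaw', it will match too.
    """
    result = run(["pgrep", "-f", "openclaw"], capture_output=True, text=True)
    return result.stdout.split()


def clean_restart(run=subprocess.run, wait_seconds=10.0, poll=0.5):
    """Stop the gateway, wait until no stale process remains, then start it.

    Returns False without starting anything if a process is still alive
    after wait_seconds, since starting then would recreate the 409 conflict.
    """
    run(["systemctl", "--user", "stop", UNIT], check=True)
    deadline = time.monotonic() + wait_seconds
    while gateway_pids(run):
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll)
    run(["systemctl", "--user", "start", UNIT], check=True)
    return True
```

If clean_restart() returns False, the leftover process needs to be investigated or killed by hand before trying again.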
Where this ends up
The setup is running. The email checking works. The Telegram channel works. The persona and memory work well.
The autonomous action part is a work in progress. The model takes actions without confirmation in narrow, well-defined cases where the SOUL.md instructions are extremely specific. For anything outside those cases, it asks.
The gap between “autonomous agent” as a concept and as a working implementation is wider than the coverage implies.