Two tools to rule them all
The tools provided to an agent encode assumptions about how things should be done. A search_web → get_reviews → check_availability pipeline implies a specific strategy, and it limits the model's ability to figure out its own way to reach the goal.
Bash and the filesystem, in contrast, are universal tools that agent SDKs have chosen to treat as a given. In this part, I'll look at why that is, and at how those tools change the game.
The limits of predefined tools
Tools define what the agent can do. If you give it search_web, read_file, and send_email, those are its capabilities. Nothing more.
Every capability must be anticipated and implemented in advance:
- Want the agent to compress a file? You need a compress_file tool.
- Want it to resize an image? You need a resize_image tool.
- Want it to check disk space, parse a CSV, or ping a server? Each one requires a tool.
Even slight changes in the task require updating the tool set. Say you built a send_email(to, subject, body) tool. Now the user wants to attach a file — you need an attachments parameter. Then they want to CC someone — another parameter. Each small requirement change means updating the tool's schema and implementation.
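Contrast that with an agent that drives an existing command-line mail client, as the next section argues for: the flags already exist, so each new requirement costs nothing. A sketch assuming the mutt client is installed and configured:

```bash
# Each new requirement is just another flag, not a schema change
echo "Report attached." | mutt -s "Q3 report" alice@example.com
echo "Report attached." | mutt -s "Q3 report" -a report.pdf -- alice@example.com
echo "Report attached." | mutt -s "Q3 report" -c bob@example.com -a report.pdf -- alice@example.com
```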
Designing an effective tool list is a hard balance to strike. Anthropic's guidance on tool design puts it directly: "Too many tools or overlapping tools can distract agents from pursuing efficient strategies." But too few tools, or tools that are too narrow, can prevent the agent from solving the problem at all.
Bash as the universal tool
Bash is a Unix shell: a command-line interface that has been around since 1989. It is the standard way to interact with Unix-like systems (Linux, macOS). You type commands, the shell executes them, you see the output.
Consider a task like: "find all log files from this week, check which ones contain errors, and count the number of errors in each."
- With predefined tools, you would need list_files with date filtering, search_file to find matches, and count_matches per file — three separate tools, plus the logic to combine the results.
- With bash: three commands. No tool definitions, no schema changes if the task evolves.
```bash
# Find log files from the last 7 days
find . -name "*.log" -mtime -7

# Which ones contain errors
grep -l "ERROR" $(find . -name "*.log" -mtime -7)

# Count errors in each
for f in $(find . -name "*.log" -mtime -7); do
  echo "$f: $(grep -c 'ERROR' "$f") errors"
done
```
Why does bash matter for agents?
Bash scripts can replace specialized tools:
- Giving an agent bash access is giving it access to the entire Unix environment: file operations, network requests, text processing, program execution.
- And the ability to combine them in ways you did not anticipate.
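For example, the agent can mix network access, JSON processing, and local file comparison in a single pipeline that no tool designer planned for. A hypothetical sketch (the API URL and paths are made up):

```bash
# Which packages reported by a registry API are missing locally?
curl -s https://api.example.com/packages \
  | jq -r '.packages[].name' \
  | sort \
  | comm -23 - <(ls /opt/packages | sort)
```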
Vercel achieved a 100% success rate, 3.5x faster, with 37% fewer tokens:
- Their text-to-SQL agent d0 had 17 specialized tools (query builders, schema inspectors, result formatters) and achieved an 80% success rate.
- Then they "deleted most of it and stripped the agent down to a single tool: execute arbitrary bash commands."
- The result: one general-purpose tool outperformed seventeen specialized ones.
Bash is not just more flexible — it is also faster.
Each tool call means an additional inference pass, so calling many tools is expensive:
- Remember the two-step pattern: the model requests a tool call, the system executes it, the result feeds back.
- For a task requiring 10 tool calls, that is 10 inference passes.
With bash, the agent can write a script that chains multiple operations together, saving on the intermediate inferences (see the sketch after this list):
- The CodeAct research paper (ICML 2024) found code-based actions achieved up to 20% higher success rates than JSON-based tool calls.
- Manus adopted a similar approach from launch: fewer than 20 atomic functions, with the real work offloaded to generated scripts running inside a sandbox.
- Anthropic and Cloudflare's Code Mode experiments confirmed that writing code beats tool calling.
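To make the savings concrete, here is a sketch of a single generated script covering what would otherwise be four tool calls and four inference passes (the URL and the CSV column layout are illustrative):

```bash
# One inference pass emits this script; one execution returns every result
curl -sL https://example.com/events.csv -o events.csv          # fetch
awk -F, '$3 == "error"' events.csv > errors.csv                # filter
wc -l < errors.csv                                             # count errors
cut -d, -f2 errors.csv | sort | uniq -c | sort -rn | head -3   # top 3 sources
```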
The filesystem as the universal persistence layer
To persist information (a user-facing artifact, a plan, intermediate results), an agent needs a tool and a storage mechanism.
Predefined persistence tools have the same problem as predefined action tools:
- A save_note(title, content) tool works for text notes. But what about images? JSON structures? Binary files? A directory of related files?
- The tool's schema defines and limits what can be stored. Each storage mechanism has its own interface, its own constraints.
The filesystem has no predefined schema or constraints:
- A file can contain anything: Markdown, JSON, images, binaries, code. A directory can organize files however makes sense.
- The agent decides where to put it, what to write, what to name it, how to structure it.
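In practice, the agent can simply invent a layout that fits the task at hand. A sketch where every name is the agent's own choice, not a required convention:

```bash
# Free-form organization: notes, structured state, and artifacts side by side
mkdir -p workspace/{notes,state,artifacts}
echo "## Open questions" > workspace/notes/todo.md
echo '{"step": 3, "status": "awaiting_review"}' > workspace/state/progress.json
cp draft.md workspace/artifacts/report_v1.md    # assumes a draft.md produced earlier
```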
The filesystem allows the agent to communicate with itself:
- The agent can store information that it may need further down the road. Manus describes this as "File System as Extended Memory": "unlimited in size, persistent by nature, and directly operable by the agent itself."
- The filesystem also allows the agent to share memories between sessions, removing the need for elaborate memorization/retrieval tools.
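A minimal sketch of cross-session memory built from nothing but files (the path and format are illustrative):

```bash
# End of a session: append what was learned
MEMORY=~/.agent/memory.md
mkdir -p "$(dirname "$MEMORY")"
echo "- $(date +%F): user prefers CSV exports over JSON" >> "$MEMORY"

# Start of the next session: read it back into context
cat "$MEMORY"
```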
What to keep in mind
- Bash is a universal tool. Instead of anticipating every capability and implementing a specific tool, you give the agent access to the Unix environment. It can compose arbitrary operations from basic primitives — and LLMs are already trained on how to do this.
- The filesystem is universal persistence. Instead of defining schemas for what the agent can store, you give it a directory. It can write any file type, organize however makes sense, and the files persist across sessions for free.
- All major agent SDKs assume both. The Claude Agent SDK, OpenCode, and Codex all ship bash and filesystem tools as built-in. Pi SDK is a notable exception — it can work without filesystem access.
- This has architectural consequences. Bash and filesystem access require a runtime that provides them (a minimal container sketch follows this list).
- An alternative is emerging: reimplement the interpreter. Vercel's just-bash is a bash interpreter written in TypeScript: 75+ Unix commands reimplemented with a virtual in-memory filesystem. No real shell, no real filesystem, no container needed. Pydantic's monty does the same for Python: a subset interpreter written in Rust, where open(), subprocess, and exec() simply do not exist.
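On the runtime point above: a minimal sketch of a sandbox that provides both universal tools, assuming Docker is available (the image choice and mount path are illustrative):

```bash
# Disposable sandbox: real bash and a real filesystem, but only the agent's
# working directory is mounted, and the network is cut off
docker run --rm -it \
  -v "$PWD/workspace:/workspace" \
  -w /workspace \
  --network none \
  ubuntu:24.04 bash
```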