What's an agent, anyway?
An agent is an LLM running tools in a loop:
- Simon Willison's one-liner — "An LLM agent runs tools in a loop to achieve a goal" — has become the closest thing to a consensus definition.
- Harrison Chase (LangChain) said the same thing differently: "The core algorithm is actually the same — it's an LLM running in a loop calling tools."
So an agent has three ingredients:
- An LLM
- Tools
- A loop
Let's unpack each one.
What an LLM does
An LLM is a text-completion machine. You send it a sequence of text. It predicts the most probable next token, then the next, until it stops.
When you ask a question, the most probable continuation is usually a sentence that reads like an answer to it.
An LLM can only produce text. It cannot browse the web. It cannot run a calculation using a program. It cannot read a file or call an API.
What a tool is
A tool gives an LLM capabilities it does not have natively. Tools enable:
- Things LLMs cannot do: access the internet, query a database, execute code.
- Things LLMs do badly: arithmetic, finding exact matches in a document...
The LLM cannot execute tools on its own though:
- The LLM returns a structured object specifying which tool to call.
- The host program runs the tool.
- The host program passes the result back to the LLM.
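The host-side part of this handshake can be a small dispatch table mapping tool names to functions. A minimal sketch, where `get_weather` and `calculator` are made-up tools standing in for real implementations:

```python
# Hypothetical tools the host program exposes to the model.
def get_weather(city: str) -> str:
    # A real tool would call a weather API; this is a stub.
    return f"18°C and cloudy in {city}"

def calculator(expression: str) -> str:
    # Toy arithmetic evaluator with builtins disabled.
    return str(eval(expression, {"__builtins__": {}}))

TOOLS = {"get_weather": get_weather, "calculator": calculator}

def execute(name: str, arguments: dict) -> str:
    """Run the tool the model asked for and return its result as text."""
    if name not in TOOLS:
        # Guard against the model requesting a tool that was never provided.
        return f"Error: unknown tool '{name}'"
    return TOOLS[name](**arguments)
```

The result string is what gets appended to the conversation and sent back to the model on the next call.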
How LLMs learned to call tools
We said that an LLM can only produce text. So how does it ask for a tool to be called? Does it return text saying "I need to run the calculator", or something like that?
To call a tool, the LLM returns a JSON object that says which tool it wants to run, and with which parameters.
For example, if the LLM wants to check the weather in Paris, instead of responding with text, it returns something like:
{
  "type": "tool_use",
  "name": "get_weather",
  "input": { "city": "Paris" }
}
But how did the LLM learn to generate such JSON objects as the most likely sequence of tokens in the middle of a conversation in plain English?
Ordinary web text contains essentially no tool-calling examples. Nobody writes "output a JSON object to invoke a calculator function" on the internet.
LLMs learn when and how to use tools through fine-tuning on tool-use transcripts:
- The models are trained on many examples of conversations where the assistant produces structured function invocations, receives results, and continues. OpenAI shipped this first commercially (June 2023, GPT-3.5/GPT-4), and other providers followed.
- The model does not learn each specific tool. It learns the general pattern: when to invoke, how to format the call, how to integrate the result.
- The specific tools available are described in the prompt — the model reads their names, descriptions, and parameter schemas as text.
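A tool description is just structured text in the prompt. The shape below follows the JSON-Schema style most providers use; the exact field names vary by API, so treat this as illustrative:

```python
# A hypothetical tool definition, as the host program would pass it to the model.
get_weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "input_schema": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Paris'"}
        },
        "required": ["city"],
    },
}
```

The model never executes this schema; it only reads it as text and uses it to format its tool-call requests.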
Tool hallucination is a consequence of tool training. The model can generate calls to tools that were never provided, or fabricate parameters. UC Berkeley's Gorilla project (Berkeley Function-Calling Leaderboard) has documented this systematically — it is one reason agent frameworks invest in validation and error handling.
The two-step pattern
When you call an LLM with tools enabled, two things can happen:
- The model responds with text — it has enough information to answer directly.
- The model responds with a tool-call request — a structured object specifying which tool to call and what arguments to pass.
If the model requests a tool call, your code executes it. You send the result back as a follow-up message. The model uses that result to formulate its answer — or to request yet another tool call.
Tool use always involves at least two model calls:
- The first call returns a tool-call request.
- The second call receives the conversation plus the result of the tool call.
messages = [system_prompt, user_message]

# LLM Call 1 — send the conversation + list of available tools
response = llm(messages, tools=available_tools)

# Did the model respond with text, or with a tool-call request?
if response.has_tool_call:
    tool_call = response.tool_call
    result = execute(tool_call.name, tool_call.arguments)
    messages.append(tool_call)
    messages.append(tool_result(tool_call.id, result))

    # LLM Call 2 — send the conversation again, now including the tool result
    response = llm(messages, tools=available_tools)
The agentic loop
Many tasks require more than one tool call. A coding assistant might read a file, edit it, run the tests, check output, fix a failing test — all in sequence. The model cannot know in advance how many steps it will need.
The solution: wrap the two-step pattern in a loop.
messages = [system_prompt, user_message]

loop:
    response = llm(messages, tools=available_tools)
    if response.has_tool_calls:
        for call in response.tool_calls:
            result = execute(call.name, call.arguments)
            messages.append(call)
            messages.append(tool_result(call.id, result))
        continue
    break  # no tool calls — done

print(response.text)
The model keeps looping, calling tools, receiving results, and deciding what to do next, until it produces text instead of another tool call.
In practice, you add guardrails: a maximum number of iterations, a cost budget, validation checks. But the core mechanism is the same.
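Here is a guardrailed version of the loop, sketched with a scripted stand-in for the model so it runs without an API key. Every name here (`scripted_llm`, `run_agent`, the message shapes) is illustrative, not any provider's real API:

```python
MAX_ITERATIONS = 10  # guardrail: bound the number of loop turns

def scripted_llm(messages, tools=None):
    """Stand-in for a real model: requests one tool call, then answers with text."""
    tool_turns = sum(1 for m in messages if m.get("role") == "tool")
    if tool_turns == 0:
        return {"tool_calls": [{"id": "1", "name": "get_time", "arguments": {}}]}
    return {"tool_calls": [], "text": "It is 12:00."}

def run_agent(llm, user_message, tools):
    messages = [{"role": "user", "content": user_message}]
    for _ in range(MAX_ITERATIONS):
        response = llm(messages, tools=tools)
        if not response["tool_calls"]:
            return response["text"]  # text response ends the loop
        for call in response["tool_calls"]:
            result = tools[call["name"]](**call["arguments"])
            messages.append({"role": "assistant", "tool_call": call})
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": result})
    raise RuntimeError("Agent exceeded iteration budget")

answer = run_agent(scripted_llm, "What time is it?", {"get_time": lambda: "12:00"})
```

Swapping `scripted_llm` for a real model client leaves the loop structure unchanged; only the message format needs to match the provider's API.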
What this looks like in practice. Here is a simplified trace of an agent booking a restaurant. Each block is one iteration of the loop:
User: "Find me a good Italian restaurant near the office
for Friday dinner, 4 people."
Agent: [tool: search_web("Italian restaurants near 123 Main St")]
→ 3 results: Trattoria Roma, Pasta House, Il Giardino
Agent: [tool: get_reviews("Trattoria Roma", "Pasta House", "Il Giardino")]
→ Trattoria Roma: 4.7★, Pasta House: 3.9★, Il Giardino: 4.5★
Agent: [tool: check_availability("Trattoria Roma", friday, party=4)]
→ available at 7:30 PM and 8:00 PM
Agent: "Trattoria Roma is the best rated (4.7★) and has two
slots Friday for 4: 7:30 PM or 8:00 PM.
Want me to book one?"
Four loop iterations. Three tool calls, then a text response that ends the loop. The agent decided which restaurants to look up, which one to check availability for first (the highest rated), and when it had enough information to stop. The program just ran the tools and passed results back.
What to keep in mind
- An agent is an LLM + tools + a loop. Every agent framework — PydanticAI, LangGraph, Claude Agent SDK, OpenAI Agents SDK — implements some version of this loop. They differ in what they build around it.
- Tool calling is a two-step pattern. The model requests, your code executes, the result feeds back.
- The model decides when to stop. In the simplest case, it stops when it produces text instead of a tool call. In real systems, stop conditions can also be enforced through budgets, validation, user acceptance, or timeouts.