Tool Use and Function Calling

How Function Calling Works

In a standard chat completion, the model outputs text. With function calling, the model can instead output a structured tool call - a JSON object specifying which function to run and with what arguments. Your code runs the function and sends the result back. The model then continues from there.

This is not magic. The model has been trained to recognize when a task requires a tool and to output a specific JSON schema instead of prose. You define what tools are available. The model decides when and how to use them.
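To make the shape concrete, here is a minimal sketch of a tool call as it appears in an OpenAI-style chat completion response, and how application code might unpack it. The `id` and the argument values are invented for illustration, and `parse_tool_call` is a hypothetical helper, not part of any SDK. Note that `arguments` arrives as a JSON-encoded string rather than a parsed object.

```python
import json

# Illustrative tool call in the OpenAI chat-completions shape.
# The id and argument values are invented for this example.
tool_call = {
    "id": "call_abc123",
    "type": "function",
    "function": {
        "name": "get_weather",
        "arguments": "{\"city\": \"London\"}",  # a JSON string, not a dict
    },
}

def parse_tool_call(call: dict) -> tuple[str, dict]:
    """Return (function name, decoded arguments) for one tool call."""
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"])
    return name, args

name, args = parse_tool_call(tool_call)
print(name, args)
```

Because the arguments are model-generated text, `json.loads` can fail or return unexpected keys, which is why the full example later in this lesson wraps the decode in a `try` block.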

Function Calling Protocol

sequenceDiagram
  participant App as Your App
  participant LLM as LLM (OpenAI)
  participant Tool as Your Tool

  App->>LLM: messages + tool definitions
  LLM-->>App: tool_call {name, arguments}
  App->>Tool: execute function(args)
  Tool-->>App: result
  App->>LLM: messages + tool_result
  LLM-->>App: final text response

  note over App,LLM: Round 1: model requests tool
  note over App,LLM: Round 2: model uses result to answer

The critical thing: you run the function, not the model. The model only outputs a structured request. This is intentional - it gives you full control over what tools can actually do, their side effects, and their failure modes.
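The two rounds can be pictured as a growing message list. The sketch below assumes the OpenAI chat-completions message format; the call id and weather values are made up, and `unanswered_call_ids` is a hypothetical helper that checks every requested call has a matching result before the history is sent back.

```python
import json

# Hypothetical transcript; ids and values are invented for illustration.
messages = [
    {"role": "user", "content": "What's the weather in London?"},
    # Round 1: the model replies with a tool call instead of text.
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "type": "function",
            "function": {
                "name": "get_weather",
                "arguments": json.dumps({"city": "London"}),
            },
        }],
    },
    # Your app ran get_weather itself and reports the result, keyed by call id.
    {"role": "tool", "tool_call_id": "call_1", "content": json.dumps({"temp_c": 14})},
]

def unanswered_call_ids(history: list[dict]) -> set[str]:
    """Ids of tool calls that have no matching tool result yet."""
    requested = {c["id"] for m in history for c in m.get("tool_calls") or []}
    answered = {m["tool_call_id"] for m in history if m["role"] == "tool"}
    return requested - answered

# An empty set means the history is complete and ready for round 2.
print(unanswered_call_ids(messages))
```

Nothing in this list executed code on the model's side: the `tool` message is written by your application after it decided to run the function.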

Defining Tools as JSON Schema

Every tool you give the model needs a JSON Schema definition. This is how the model knows:

What the function is called
What arguments it expects
What each argument means
Which arguments are required

{
  "type": "function",
  "function": {
    "name": "get_weather",
    "description": "Get the current weather for a city. Returns the temperature in the requested unit.",
    "parameters": {
      "type": "object",
      "properties": {
        "city": {
          "type": "string",
          "description": "The city name, e.g. 'London' or 'Tokyo'"
        },
        "units": {
          "type": "string",
          "enum": ["celsius", "fahrenheit"],
          "description": "Temperature unit. Defaults to celsius."
        }
      },
      "required": ["city"]
    }
  }
}

The description fields matter enormously. The model uses them to decide when to call the function and how to form the arguments. Vague descriptions produce incorrect calls; precise descriptions produce correct ones.

Parallel vs. Sequential Tool Calls

Many current models can issue parallel tool calls: several tool call objects returned in a single response. When the tools are independent, this saves a model round trip for each extra call compared with requesting them one at a time.

Parallel vs Sequential Tool Calls

flowchart TD
  subgraph seq["Sequential (slow)"]
      S1[Call weather London] --> S2[Get result] --> S3[Call weather Tokyo] --> S4[Get result] --> S5[Answer]
  end

  subgraph par["Parallel (fast)"]
      P1[Call weather London]
      P2[Call weather Tokyo]
      P1 --> P3[Get both results] --> P4[Answer]
      P2 --> P3
  end

  style seq fill:#fef2f2,stroke:#ef4444
  style par fill:#f0fdf4,stroke:#22c55e

Sequential is the right choice when tool B depends on tool A’s result; parallel is the right choice when the tools can run independently. Most APIs return all parallel tool calls in one response object. You run them concurrently, collect the results, and send them all back in a single follow-up request, with one tool result per call, matched by call ID.
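One way to run independent calls concurrently is a thread pool. The sketch below is an assumption-laden illustration: the tool calls are faked with `SimpleNamespace` objects shaped like the SDK's, the ids are invented, `get_weather` is a stand-in for a slow read-only lookup, and `run_one` and `run_parallel` are hypothetical helper names.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from types import SimpleNamespace

def get_weather(city: str) -> dict:
    # Stand-in for a slow, independent, read-only lookup.
    temps = {"london": 14, "tokyo": 22}
    return {"city": city, "temp_c": temps.get(city.lower(), 20)}

def run_one(tool_call) -> dict:
    """Execute one call and wrap the result as a tool message."""
    args = json.loads(tool_call.function.arguments)
    return {
        "role": "tool",
        "tool_call_id": tool_call.id,
        "content": json.dumps(get_weather(**args)),
    }

def run_parallel(tool_calls) -> list[dict]:
    """Run independent calls concurrently; map() preserves input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(tool_calls))) as pool:
        return list(pool.map(run_one, tool_calls))

# Fake tool calls shaped like the SDK objects (ids invented for illustration).
calls = [
    SimpleNamespace(
        id=f"call_{i}",
        function=SimpleNamespace(
            name="get_weather", arguments=json.dumps({"city": city})
        ),
    )
    for i, city in enumerate(["London", "Tokyo"])
]
tool_messages = run_parallel(calls)
print(tool_messages)
```

Threads suit I/O-bound tools such as HTTP lookups. Each result keeps its own `tool_call_id`, so the model can match results to requests regardless of which call finished first.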

Build It: Multi-Tool Agent with Weather and Calculator

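This example defines two tools as JSON schemas, runs the tool-calling loop, and handles responses that contain several tool calls at once. For simplicity, the loop executes those calls one after another. The weather tool is mocked, so it needs no weather API key; the model calls themselves still require an OpenAI API key.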

Multi-Tool Agent: Weather + Calculator

Example code (static). Copy and run locally in your own environment.

import json
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))

# --- TOOL DEFINITIONS ---

TOOL_SCHEMAS = [
  {
      "type": "function",
      "function": {
          "name": "get_weather",
          "description": "Get current weather for a city. Returns temperature in Celsius and conditions.",
          "parameters": {
              "type": "object",
              "properties": {
                  "city": {
                      "type": "string",
                      "description": "The city name, e.g. 'London'"
                  }
              },
              "required": ["city"]
          }
      }
  },
  {
      "type": "function",
      "function": {
          "name": "calculator",
          "description": "Evaluate a math expression. Supports +, -, *, /, parentheses.",
          "parameters": {
              "type": "object",
              "properties": {
                  "expression": {
                      "type": "string",
                      "description": "The math expression to evaluate, e.g. '(15 + 20) / 2'"
                  }
              },
              "required": ["expression"]
          }
      }
  }
]

# --- TOOL IMPLEMENTATIONS ---

def get_weather(city: str) -> dict:
  """Mock weather data  -  replace with a real weather API."""
  mock_data = {
      "london": {"temp_c": 14, "conditions": "cloudy"},
      "tokyo": {"temp_c": 22, "conditions": "sunny"},
      "new york": {"temp_c": 18, "conditions": "partly cloudy"},
  }
  data = mock_data.get(city.lower(), {"temp_c": 20, "conditions": "unknown"})
  return {"city": city, **data}

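def calculator(expression: str) -> dict:
  """Evaluate basic arithmetic. The character allowlist is a demo-level guard, not a sandbox."""
  allowed = set("0123456789+-*/()., ")
  # Reject "**" as well: the allowlist admits it, and huge powers can hang the process.
  if "**" in expression or not all(c in allowed for c in expression):
      return {"error": "unsafe expression"}
  try:
      result = eval(expression, {"__builtins__": {}})
      return {"expression": expression, "result": result}
  except Exception as e:
      return {"error": str(e)}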

TOOLS = {"get_weather": get_weather, "calculator": calculator}

# --- TOOL CALL EXECUTOR ---

def execute_tool_call(tool_call) -> str:
  name = tool_call.function.name
  try:
      args = json.loads(tool_call.function.arguments)
  except json.JSONDecodeError:
      return json.dumps({"error": "invalid arguments JSON"})

  if name not in TOOLS:
      return json.dumps({"error": f"unknown tool: {name}"})

  result = TOOLS[name](**args)
  return json.dumps(result)

# --- AGENT LOOP ---

def run_tool_agent(user_message: str, max_iterations: int = 10) -> str:
  messages = [{"role": "user", "content": user_message}]
  print(f"User: {user_message}\n")

  for i in range(max_iterations):
      response = client.chat.completions.create(
          model="gpt-4o-mini",
          messages=messages,
          tools=TOOL_SCHEMAS,
          tool_choice="auto",
      )
      msg = response.choices[0].message

      # No tool calls  -  we have the final answer
      if not msg.tool_calls:
          print(f"Assistant: {msg.content}")
          return msg.content

      # Execute every requested tool call (the model may request several at once)
      print(f"[Tool calls requested: {len(msg.tool_calls)}]")
      messages.append(msg)  # Add assistant message with tool_calls

      for tc in msg.tool_calls:
          result = execute_tool_call(tc)
          print(f"  {tc.function.name}({tc.function.arguments}) → {result}")
          messages.append({
              "role": "tool",
              "tool_call_id": tc.id,
              "content": result,
          })

  return "Max iterations reached."

# Example: triggers parallel tool calls for two cities
answer = run_tool_agent(
  "What's the weather in London and Tokyo right now? "
  "And what's the average of those two temperatures?"
)

Given this prompt, the model will typically issue both get_weather calls in a single response. Once it has both results, it calls calculator with an averaging expression such as (14 + 22) / 2, then gives the final answer.

Handling Tool Errors Gracefully

When a tool fails, don’t let the agent silently produce wrong answers. Return structured error information so the model can adapt:

def execute_tool_call_safe(tool_call) -> str:
    try:
        result = execute_tool_call(tool_call)
        return result
    except Exception as e:
        # Return the error as a tool result  -  the model will handle it
        return json.dumps({
            "error": str(e),
            "tool": tool_call.function.name,
            "hint": "The tool failed. Inform the user or try a different approach."
        })

A well-prompted model will acknowledge the failure rather than hallucinate a result when it receives an error object.

Forcing a Specific Tool

Sometimes you want to guarantee the model uses a particular tool rather than letting it choose. Use tool_choice to force it:

# Force the model to call get_weather
tool_choice = {"type": "function", "function": {"name": "get_weather"}}

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=messages,
    tools=TOOL_SCHEMAS,
    tool_choice=tool_choice,
)

This is useful for structured extraction tasks where you always want a specific output format.
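With a forced tool, the structured output you want is the arguments of the returned tool call. The following sketch shows one way to read it; `read_forced_call` is a hypothetical helper, and `fake_message` is a stand-in for `response.choices[0].message` with invented values so the snippet runs without an API call.

```python
import json
from types import SimpleNamespace

def read_forced_call(message, expected_tool: str) -> dict:
    """Decode the arguments of the forced tool call; raise if it is missing."""
    calls = message.tool_calls or []
    if not calls or calls[0].function.name != expected_tool:
        raise ValueError(f"expected a call to {expected_tool}")
    return json.loads(calls[0].function.arguments)

# Stand-in for response.choices[0].message (values invented).
fake_message = SimpleNamespace(tool_calls=[
    SimpleNamespace(
        id="call_1",
        function=SimpleNamespace(
            name="get_weather", arguments='{"city": "Tokyo"}'
        ),
    )
])
print(read_forced_call(fake_message, "get_weather"))
```

If you force a tool inside an agent loop, consider doing so only on the first request. Forcing it on every iteration means the model keeps returning tool calls and never produces a plain-text answer.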

⚙️ For Developers

Validate tool call arguments before executing. The model will occasionally produce arguments that don’t match your schema - especially for optional parameters or enum values. Use Pydantic or simple assertion checks to validate arguments before running the function. A validation error returned as a tool result is safer than an exception crashing your agent loop.
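As an illustration of the "simple checks" option, here is a small validator that covers only the schema features used in this lesson: required keys, string types, and enums. It is a sketch, not a full JSON Schema implementation; `validate_args` and `WEATHER_PARAMS` are names chosen for this example.

```python
def validate_args(args: dict, schema: dict) -> list[str]:
    """Return a list of problems; an empty list means the arguments pass."""
    errors = []
    props = schema.get("properties", {})
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in props:
            errors.append(f"unexpected argument: {key}")
            continue
        spec = props[key]
        if spec.get("type") == "string" and not isinstance(value, str):
            errors.append(f"{key} must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(f"{key} must be one of {spec['enum']}")
    return errors

# The "parameters" object from the get_weather definition, descriptions omitted.
WEATHER_PARAMS = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "units": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

print(validate_args({"city": "London"}, WEATHER_PARAMS))
print(validate_args({"units": "kelvin"}, WEATHER_PARAMS))
```

When the list is non-empty, returning it to the model as a tool result (for example `{"error": errors}`) gives it a chance to correct the call on the next turn.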

Production Gotcha

Tool definitions consume tokens. Ten tools with verbose descriptions can cost roughly 2,000 tokens before the user’s message even starts. Keep tool descriptions concise: one sentence for the function, one sentence per parameter. Send only the tools relevant to the current context rather than the full registry on every request. A task management agent doesn’t need the database migration tool available during a casual lookup.
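A rough way to keep an eye on this cost is to estimate the size of the schemas you are about to send and to filter the registry per request. The sketch below assumes about four characters per token, which is only a coarse heuristic; use the provider's tokenizer for real numbers. The registry contents, `estimate_tokens`, and `select_tools` are all hypothetical.

```python
import json

def estimate_tokens(tool_schemas: list[dict]) -> int:
    """Very rough size estimate: about 4 characters per token (assumption)."""
    return len(json.dumps(tool_schemas)) // 4

def select_tools(registry: dict[str, dict], needed: set[str]) -> list[dict]:
    """Return only the schemas whose names are relevant to this request."""
    return [schema for name, schema in registry.items() if name in needed]

def make_schema(name: str, description: str) -> dict:
    # Minimal tool definition with no parameters, for illustration only.
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {"type": "object", "properties": {}},
        },
    }

# Hypothetical registry for a task management agent.
REGISTRY = {
    "list_tasks": make_schema("list_tasks", "List open tasks for the current user."),
    "create_task": make_schema("create_task", "Create a new task."),
    "migrate_database": make_schema("migrate_database", "Run a schema migration."),
}

lookup_tools = select_tools(REGISTRY, {"list_tasks"})
print(estimate_tokens(list(REGISTRY.values())), estimate_tokens(lookup_tools))
```

How you decide which tools are "needed" is application-specific; routing by conversation state or user role are two common options.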

What’s Next

Tool use is how agents act on the world. But how do you know if your agents are acting correctly? The next tutorial covers building an eval suite that actually catches problems - the foundation of every reliable AI application.

Interview Notes: Tool Runtime Controls

Function calling is a protocol for structured tool requests, not permission to execute anything. Validate every argument, authorize every call, and attach idempotency keys to writes.

const toolPolicy = {
  "crm.lookup": { risk: "read", approval: false },
  "ticket.create": { risk: "write", approval: false, idempotent: true },
  "refund.issue": { risk: "regulated", approval: true, idempotent: true }
};

Also be ready to discuss parallel tool calls: they reduce latency for independent reads, but side-effecting writes should usually be sequenced behind policy checks.
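To show how such a policy table might be enforced, here is a Python sketch that mirrors the table above. Everything beyond the policy entries is an assumption made for illustration: `guarded_call`, `idempotency_key`, the in-memory `_completed` store, and the `create_ticket` stub are hypothetical, and a real system would use a durable store and a real approval workflow.

```python
import hashlib
import json

TOOL_POLICY = {
    "crm.lookup": {"risk": "read", "approval": False},
    "ticket.create": {"risk": "write", "approval": False, "idempotent": True},
    "refund.issue": {"risk": "regulated", "approval": True, "idempotent": True},
}

_completed: dict[str, dict] = {}  # in-memory stand-in for a durable store

def idempotency_key(name: str, args: dict) -> str:
    """Derive a stable key from the tool name and its arguments."""
    payload = json.dumps({"tool": name, "args": args}, sort_keys=True)
    return hashlib.sha256(payload.encode()).hexdigest()

def guarded_call(name: str, args: dict, run, approved: bool = False) -> dict:
    """Check policy first, then run the tool at most once per key."""
    policy = TOOL_POLICY.get(name)
    if policy is None:
        return {"error": f"unknown tool: {name}"}
    if policy["approval"] and not approved:
        return {"error": "approval required"}
    if policy.get("idempotent"):
        key = idempotency_key(name, args)
        if key not in _completed:
            _completed[key] = run(**args)
        return _completed[key]  # a retry returns the stored result
    return run(**args)

calls_made = []

def create_ticket(title: str) -> dict:
    calls_made.append(title)
    return {"ticket_id": len(calls_made), "title": title}

first = guarded_call("ticket.create", {"title": "Printer down"}, create_ticket)
retry = guarded_call("ticket.create", {"title": "Printer down"}, create_ticket)
blocked = guarded_call("refund.issue", {"amount": 50}, lambda amount: {"refunded": amount})
print(first == retry, len(calls_made), blocked)
```

Deriving the key from the arguments alone would also deduplicate two legitimately identical requests, so production keys usually include a request or conversation identifier as well.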

Interview Practice

What is function calling in an LLM API?
Why must tool arguments be validated even if the model produced them?
When are parallel tool calls safe?
How do idempotency keys protect write operations?
What belongs in a tool description?
How would you test a tool-calling workflow?

How to Use This Lesson

Hands-On Lab