Tool Use & Function Calling

The bridge between language models and real-world actions. Tool use enables agents to interact with external systems, APIs, and databases.

What is Tool Use?

Tool use (also called function calling) allows language models to request the execution of external functions. Instead of just generating text, the model can:

  • Query databases and APIs
  • Perform calculations
  • Read and write files
  • Execute code
  • Interact with external services

Key Insight

Tools transform LLMs from text generators into agents that can take action in the world. The model decides when to use a tool, which tool to use, and what arguments to pass.

How Tool Calling Works

Tool Calling Flow
User Request
     │
     ▼
┌─────────────────────────────────────┐
│           Language Model            │
│   (with tool definitions loaded)    │
└─────────────────────────────────────┘
     │
     │ Model decides to call tool
     ▼
┌─────────────────────────────────────┐
│         Tool Call Response          │
│  { name: "get_weather",             │
│    arguments: { "location": "NYC" } │
│  }                                  │
└─────────────────────────────────────┘
     │
     │ Application executes tool
     ▼
┌─────────────────────────────────────┐
│          Tool Execution             │
│   get_weather("NYC") → result       │
└─────────────────────────────────────┘
     │
     │ Result sent back to model
     ▼
┌─────────────────────────────────────┐
│           Language Model            │
│    (generates final response)       │
└─────────────────────────────────────┘
     │
     ▼
Final Response to User

Basic Tool Execution

Simple Tool Calling Pattern (pseudocode)
function executeWithTools(userRequest, tools):
    # Format tools for the model
    toolDefinitions = formatToolDefinitions(tools)

    # Send request with tool definitions
    response = llm.generate(
        messages: [userRequest],
        tools: toolDefinitions
    )

    # Check if model wants to call a tool
    if response.hasToolCall:
        toolName = response.toolCall.name
        toolArgs = response.toolCall.arguments

        # Execute the tool
        result = tools[toolName].execute(toolArgs)

        # Send result back to model for final response
        return llm.generate(
            messages: [userRequest, response, toolResult(result)]
        )

    return response.content
Python (LangChain)
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage

# Define tools using the @tool decorator
@tool
def get_weather(location: str, unit: str = "celsius") -> str:
    """Get current weather for a location.

    Args:
        location: City name, e.g. 'San Francisco'
        unit: Temperature unit (celsius or fahrenheit)
    """
    # Implementation-specific: weather_api is your weather client
    return weather_api.get_current(location, unit)

# Create LLM with tools bound
llm = ChatOpenAI(model="gpt-4")
llm_with_tools = llm.bind_tools([get_weather])

def execute_with_tools(user_message: str) -> str:
    messages = [HumanMessage(content=user_message)]

    # Get response (may include tool calls)
    response = llm_with_tools.invoke(messages)

    if response.tool_calls:
        # Execute each tool call
        messages.append(response)

        for tool_call in response.tool_calls:
            # Only one tool here; with multiple tools, dispatch on tool_call["name"]
            result = get_weather.invoke(tool_call["args"])
            messages.append(
                ToolMessage(content=result, tool_call_id=tool_call["id"])
            )

        # Get final response with tool results
        return llm_with_tools.invoke(messages).content

    return response.content
C# (Microsoft Agent Framework)
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using System.ComponentModel;
using Azure.AI.OpenAI;
using Azure.Identity;

// Define tool as a function with descriptions
[Description("Get current weather for a location")]
static string GetWeather(
    [Description("City name")] string location,
    [Description("Temperature unit")] string unit = "celsius")
{
    return weatherService.GetCurrent(location, unit);
}

// Create agent with tools
AIAgent agent = new AzureOpenAIClient(
    new Uri("https://your-resource.openai.azure.com"),
    new AzureCliCredential())
    .GetChatClient("gpt-4o")
    .AsAIAgent(
        instructions: "You are a helpful weather assistant",
        tools: [AIFunctionFactory.Create(GetWeather)]
    );

// Run the agent - it automatically calls tools as needed
var result = await agent.RunAsync("What's the weather in Tokyo?");

Console.WriteLine(result);

Tool Definition Formats

Different LLM providers use slightly different formats for defining tools:

Provider    | Format        | Key Features
OpenAI      | JSON Schema   | Parallel calls, strict mode, function descriptions
Anthropic   | JSON Schema   | Tool use blocks, detailed descriptions encouraged
Google      | OpenAPI-style | Function declarations with protobuf types
Open Models | Varies        | Often use Hermes or ChatML format

Tool definition formats vary by provider but share common elements
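
For concreteness, here is what the `get_weather` tool from the earlier example looks like as an OpenAI-style definition with JSON Schema parameters (a minimal sketch, expressed as a Python dict; field names follow the OpenAI Chat Completions format):

# OpenAI-style tool definition with JSON Schema parameters
get_weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location. "
                       "Use for questions about current conditions.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                },
            },
            "required": ["location"],
        },
    },
}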

Best Practice

Write detailed tool descriptions. Models use these descriptions to decide when to call a tool. Include examples of valid inputs and explain edge cases.
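
For example, compare a vague description with one that gives the model enough to decide (a sketch using a hypothetical `lookup_order` tool):

from langchain_core.tools import tool

# Vague: the model cannot tell when this tool applies
@tool
def lookup(q: str) -> str:
    """Look up data."""
    ...

# Detailed: purpose, input format, and an edge case are spelled out
@tool
def lookup_order(order_id: str) -> str:
    """Look up the status of a customer order.

    Args:
        order_id: Order identifier in the form 'ORD-12345'. If the user
            provides only digits, prefix them with 'ORD-'.

    Returns 'not found' if no such order exists.
    """
    ...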

Parallel Tool Execution

Modern LLMs can request multiple tool calls in a single response. Executing those independent calls concurrently can dramatically reduce latency for complex tasks:

Executing Multiple Tools Concurrently (pseudocode)
function executeParallelTools(userRequest, tools):
    response = llm.generate(
        messages: [userRequest],
        tools: tools,
        parallelToolCalls: true
    )

    if response.hasMultipleToolCalls:
        # Execute all tool calls concurrently
        results = parallel.map(response.toolCalls, (call) =>
            tools[call.name].execute(call.arguments)
        )

        # Collect all results
        toolResults = zip(response.toolCalls, results)

        return llm.generate(
            messages: [userRequest, response, ...toolResults]
        )

    return response.content
Python (LangChain)
import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage

@tool
async def get_weather(location: str) -> str:
    """Get current weather for a location."""
    return await weather_api.get_async(location)

@tool
async def search_web(query: str) -> str:
    """Search the web for information."""
    return await search_api.search_async(query)

@tool
async def get_stock_price(symbol: str) -> str:
    """Get current stock price."""
    return await stock_api.get_price_async(symbol)

# Bind multiple tools
llm = ChatOpenAI(model="gpt-4")
tools = [get_weather, search_web, get_stock_price]
llm_with_tools = llm.bind_tools(tools)

async def agent_with_parallel_tools(query: str) -> str:
    messages = [HumanMessage(content=query)]

    response = await llm_with_tools.ainvoke(messages)

    if response.tool_calls:
        messages.append(response)

        # Execute all tools in parallel
        tool_tasks = []
        for tool_call in response.tool_calls:
            tool_fn = next(t for t in tools if t.name == tool_call["name"])
            tool_tasks.append(tool_fn.ainvoke(tool_call["args"]))

        results = await asyncio.gather(*tool_tasks)

        # Add all results
        for tool_call, result in zip(response.tool_calls, results):
            messages.append(
                ToolMessage(content=result, tool_call_id=tool_call["id"])
            )

        return (await llm_with_tools.ainvoke(messages)).content

    return response.content
C# (Microsoft Agent Framework)
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using System.ComponentModel;

// Define multiple tools as functions
[Description("Get weather for a location")]
static async Task<string> GetWeather(string location)
    => await weatherService.GetAsync(location);

[Description("Search the web")]
static async Task<string> SearchWeb(string query)
    => await searchService.SearchAsync(query);

[Description("Get stock price")]
static async Task<string> GetStockPrice(string symbol)
    => await stockService.GetPriceAsync(symbol);

// Create agent with multiple tools
AIAgent agent = new AzureOpenAIClient(endpoint, credentials)
    .GetChatClient("gpt-4o")
    .AsAIAgent(
        instructions: "You are a helpful assistant",
        tools: [
            AIFunctionFactory.Create(GetWeather),
            AIFunctionFactory.Create(SearchWeb),
            AIFunctionFactory.Create(GetStockPrice)
        ]
    );

// Agent automatically handles parallel tool execution
var result = await agent.RunAsync(
    "What's the weather in NYC and Tokyo, and Apple's stock price?"
);

Consideration

Not all tool calls should be parallelized. If tools have dependencies (e.g., create record then update it), they must be executed sequentially.
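
A minimal sketch of making that decision explicit (assumes `tools` maps names to the async LangChain tools defined above; the caller flags batches whose calls depend on each other):

import asyncio

async def execute_tool_calls(tool_calls, tools, sequential=False):
    """Run independent calls concurrently, dependent ones in order."""
    if sequential:
        # e.g. create_record then update_record: the second call must
        # see the first call's side effects, so run one at a time
        return [await tools[c["name"]].ainvoke(c["args"]) for c in tool_calls]
    # Independent calls are safe to run concurrently
    return await asyncio.gather(
        *(tools[c["name"]].ainvoke(c["args"]) for c in tool_calls)
    )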

Error Handling & Retries

Robust tool execution requires handling failures gracefully. The model should be informed of errors so it can try alternative approaches:

Resilient Tool Execution (pseudocode)
function executeWithRetry(toolCall, maxRetries = 3):
    for attempt in range(maxRetries):
        try:
            result = tools[toolCall.name].execute(toolCall.args)
            return { success: true, data: result }
        catch error:
            if isRetryable(error) and attempt < maxRetries - 1:
                wait(exponentialBackoff(attempt))
                continue
            else:
                return {
                    success: false,
                    error: formatError(error)
                }

function agentLoop(userRequest):
    while not complete:
        response = llm.generate(messages, tools)

        if response.hasToolCall:
            result = executeWithRetry(response.toolCall)

            if not result.success:
                # Let the model know about the failure
                messages.append(toolError(result.error))
                # Model can try a different approach
            else:
                messages.append(toolResult(result.data))
        else:
            return response.content
Python (LangChain)
from langchain_core.tools import tool, ToolException
from tenacity import Retrying, stop_after_attempt, wait_exponential

# Tools can handle their own errors gracefully
@tool
def search_database(query: str) -> str:
    """Search the database with automatic error handling."""
    try:
        return db.search(query)
    except DatabaseError as e:
        # Returned to the model as the tool's output (not raised),
        # because handle_tool_error is set below
        raise ToolException(f"Database error: {e}")

search_database.handle_tool_error = True

# Custom retry wrapper for tools
class RetryableTool:
    def __init__(self, tool_fn, max_retries=3):
        self.tool = tool_fn
        self.max_retries = max_retries

    def invoke(self, args: dict) -> str:
        # Retry transient failures with exponential backoff,
        # honoring the configured max_retries
        retryer = Retrying(
            stop=stop_after_attempt(self.max_retries),
            wait=wait_exponential(multiplier=1, min=1, max=10),
            reraise=True,
        )
        return retryer(self.tool.invoke, args)

    def safe_invoke(self, args: dict) -> dict:
        try:
            result = self.invoke(args)
            return {"success": True, "data": result}
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "error_type": type(e).__name__
            }

# Using LangGraph for an agent with error recovery
from langgraph.prebuilt import create_react_agent, ToolNode

# ToolNode turns tool exceptions into ToolMessages instead of raising
agent = create_react_agent(
    llm,
    ToolNode(tools, handle_tool_errors=True)
)

# The agent will receive error messages and can adapt
result = agent.invoke({"messages": [("user", query)]})
C# (Microsoft Agent Framework)
using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using System.ComponentModel;
using Polly;

// Create retry policy
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .Or<TimeoutException>()
    .WaitAndRetryAsync(
        3,
        attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
        onRetry: (ex, time, attempt, ctx) =>
            Console.WriteLine($"Retry {attempt}: {ex.Message}")
    );

// Define tool with built-in error handling
[Description("Search with automatic retry")]
async Task<string> SearchWithRetry(string query)
{
    try
    {
        return await retryPolicy.ExecuteAsync(async () =>
        {
            return await searchService.SearchAsync(query);
        });
    }
    catch (Exception ex)
    {
        // Return error info so agent can adapt
        return $"Error: {ex.Message}. Try a different approach.";
    }
}

// Create agent with resilient tools
AIAgent agent = new AzureOpenAIClient(endpoint, credentials)
    .GetChatClient("gpt-4o")
    .AsAIAgent(
        instructions: "You are a helpful assistant",
        tools: [AIFunctionFactory.Create(SearchWithRetry)]
    );

// Agent receives error messages and can adapt its approach
var result = await agent.RunAsync(prompt);

Trade-offs & Approaches

Approach          | Pros                              | Cons
Static tool list  | Simple, predictable, easy to test | Context bloat with many tools
Dynamic discovery | Scales to many tools              | Additional latency, complexity
Tool clustering   | Balance of both approaches        | Routing logic complexity
Skills pattern    | Massive token savings (98%+)      | Requires filesystem access
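
To illustrate dynamic discovery, here is a minimal sketch that binds only the tools most relevant to the current query (the `embed` function and precomputed `tool_embeddings` are assumptions):

import numpy as np

def select_tools(query, tools, tool_embeddings, embed, k=5):
    """Pick the k tools whose description embeddings best match the query."""
    q = embed(query)  # assumed embedding function returning a 1-D array
    scores = tool_embeddings @ q / (
        np.linalg.norm(tool_embeddings, axis=1) * np.linalg.norm(q)
    )
    return [tools[i] for i in np.argsort(scores)[-k:]]

# Bind only the relevant subset for this turn to keep context small:
# llm_with_tools = llm.bind_tools(select_tools(query, tools, tool_embeddings, embed))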

Related Topic

For large tool libraries (50+ tools), see the Skills Pattern which can reduce context usage from 150K to 2K tokens.

Evaluation Approach

Measuring tool use quality requires evaluating multiple dimensions:

Metric                  | What it Measures                      | How to Calculate
Tool Selection Accuracy | Did the model pick the right tool?    | Compare selected tool vs ground truth
Argument Accuracy       | Were the parameters correct?          | Exact match or semantic similarity of args
Unnecessary Calls       | Did model call tools when not needed? | Count tool calls for simple queries
Latency Impact          | Time added by tool execution          | End-to-end time minus model inference
Error Recovery          | Can model handle tool failures?       | Success rate after simulated failures

Key metrics for evaluating tool calling capabilities
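
As an illustration, a minimal sketch of scoring tool selection and argument accuracy against a labeled test set (the dataset format here is hypothetical):

def evaluate_tool_calls(test_cases, llm_with_tools):
    """test_cases: [{"query": ..., "expected_tool": ..., "expected_args": ...}]"""
    correct_tool = correct_args = 0
    for case in test_cases:
        response = llm_with_tools.invoke(case["query"])
        call = response.tool_calls[0] if response.tool_calls else None
        if call and call["name"] == case["expected_tool"]:
            correct_tool += 1
            # Exact match; swap in semantic similarity for free-text args
            if call["args"] == case["expected_args"]:
                correct_args += 1
    n = len(test_cases)
    return {"tool_selection_accuracy": correct_tool / n,
            "argument_accuracy": correct_args / n}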

Evaluation Frameworks

  • DeepEval - ToolCorrectnessMetric, ArgumentCorrectnessMetric
  • ToolBench - Large-scale tool use benchmark
  • API-Bank - API call evaluation dataset
  • Custom datasets - Domain-specific tool calling tests

Common Pitfalls

Insufficient Tool Descriptions

Vague descriptions lead to incorrect tool selection. Always include examples and edge cases in your tool definitions.

Missing Error Information

When a tool fails, send the error details back to the model. Without this, the model can't reason about what went wrong.

Unbounded Tool Loops

Always set a maximum iteration limit. Without it, agents can get stuck in infinite tool-calling loops.
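
A minimal sketch of a bounded loop, reusing the LangChain message types from earlier (`tools_by_name` maps tool names to tools):

from langchain_core.messages import ToolMessage

MAX_ITERATIONS = 10  # hard cap on tool-calling rounds

def bounded_agent_loop(messages, llm_with_tools, tools_by_name):
    for _ in range(MAX_ITERATIONS):
        response = llm_with_tools.invoke(messages)
        if not response.tool_calls:
            return response.content  # final answer, no more tools
        messages.append(response)
        for call in response.tool_calls:
            result = tools_by_name[call["name"]].invoke(call["args"])
            messages.append(ToolMessage(content=result, tool_call_id=call["id"]))
    return "Stopped: iteration limit reached without a final answer."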

Security: Unvalidated Arguments

Never pass tool arguments directly to system commands. Validate and sanitize all inputs to prevent injection attacks.
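
For example, a sketch of validating arguments before they reach a subprocess (the `GrepArgs` schema and sandbox rule are illustrative assumptions):

import subprocess
from pydantic import BaseModel, field_validator

class GrepArgs(BaseModel):
    pattern: str
    path: str

    @field_validator("path")
    @classmethod
    def path_must_stay_in_sandbox(cls, v: str) -> str:
        if ".." in v or v.startswith("/"):
            raise ValueError("path must stay inside the sandbox")
        return v

def grep_tool(raw_args: dict) -> str:
    args = GrepArgs(**raw_args)  # reject malformed or malicious input early
    # Argument list, never shell=True: nothing gets shell-interpreted
    out = subprocess.run(
        ["grep", "-r", args.pattern, args.path],
        capture_output=True, text=True, timeout=10,
    )
    return out.stdout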
