Tool Use & Function Calling
The bridge between language models and real-world actions. Tool use enables agents to interact with external systems, APIs, and databases.
What is Tool Use?
Tool use (also called function calling) allows language models to request the execution of external functions. Instead of just generating text, the model can:
- Query databases and APIs
- Perform calculations
- Read and write files
- Execute code
- Interact with external services
How Tool Calling Works
             User Request
                   │
                   ▼
┌─────────────────────────────────────┐
│            Language Model           │
│   (with tool definitions loaded)    │
└─────────────────────────────────────┘
                   │
                   │ Model decides to call tool
                   ▼
┌─────────────────────────────────────┐
│         Tool Call Response          │
│  { name: "get_weather",             │
│    arguments: { "location": "NYC" } │
│  }                                  │
└─────────────────────────────────────┘
                   │
                   │ Application executes tool
                   ▼
┌─────────────────────────────────────┐
│            Tool Execution           │
│     get_weather("NYC") → result     │
└─────────────────────────────────────┘
                   │
                   │ Result sent back to model
                   ▼
┌─────────────────────────────────────┐
│            Language Model           │
│      (generates final response)     │
└─────────────────────────────────────┘
                   │
                   ▼
        Final Response to User

Basic Tool Execution
function executeWithTools(userRequest, tools):
    # Format tools for the model
    toolDefinitions = formatToolDefinitions(tools)

    # Send request with tool definitions
    response = llm.generate(
        messages: [userRequest],
        tools: toolDefinitions
    )

    # Check if model wants to call a tool
    if response.hasToolCall:
        toolName = response.toolCall.name
        toolArgs = response.toolCall.arguments

        # Execute the tool
        result = tools[toolName].execute(toolArgs)

        # Send result back to model for final response
        return llm.generate(
            messages: [userRequest, response, toolResult(result)]
        )

    return response.content

In Python with LangChain, the same flow looks like this:

from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage
# Define tools using the @tool decorator
@tool
def get_weather(location: str, unit: str = "celsius") -> str:
    """Get current weather for a location.

    Args:
        location: City name, e.g. 'San Francisco'
        unit: Temperature unit (celsius or fahrenheit)
    """
    # Implementation specific
    return weather_api.get_current(location, unit)
# Create LLM with tools bound
llm = ChatOpenAI(model="gpt-4")
llm_with_tools = llm.bind_tools([get_weather])
def execute_with_tools(user_message: str) -> str:
    messages = [HumanMessage(content=user_message)]

    # Get response (may include tool calls)
    response = llm_with_tools.invoke(messages)

    if response.tool_calls:
        messages.append(response)
        # Execute each tool call (only get_weather is bound here, so it is
        # invoked directly) and return its output as a ToolMessage
        for tool_call in response.tool_calls:
            result = get_weather.invoke(tool_call["args"])
            messages.append(
                ToolMessage(content=result, tool_call_id=tool_call["id"])
            )
        # Get final response with tool results
        return llm_with_tools.invoke(messages).content

    return response.content

The same agent in C# with Microsoft.Agents.AI, which runs the tool loop automatically:

using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using System.ComponentModel;
using Azure.AI.OpenAI;
using Azure.Identity;
// Define tool as a function with descriptions
[Description("Get current weather for a location")]
static string GetWeather(
    [Description("City name")] string location,
    [Description("Temperature unit")] string unit = "celsius")
{
    return weatherService.GetCurrent(location, unit);
}

// Create agent with tools
AIAgent agent = new AzureOpenAIClient(
        new Uri("https://your-resource.openai.azure.com"),
        new AzureCliCredential())
    .GetChatClient("gpt-4o")
    .AsAIAgent(
        instructions: "You are a helpful weather assistant",
        tools: [AIFunctionFactory.Create(GetWeather)]
    );

// Run the agent - it automatically calls tools as needed
var result = await agent.RunAsync("What's the weather in Tokyo?");
Console.WriteLine(result);

Tool Definition Formats
Different LLM providers use slightly different formats for defining tools:
| Provider | Format | Key Features |
|---|---|---|
| OpenAI | JSON Schema | Parallel calls, strict mode, function descriptions |
| Anthropic | JSON Schema | Tool use blocks, detailed descriptions encouraged |
| Google (Gemini) | OpenAPI-style | Function declarations with protobuf-style types |
| Open Models | Varies | Often use Hermes or ChatML format |
Tool definition formats vary by provider but share common elements
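
To make the comparison concrete, here is roughly how the get_weather tool from the examples above would be declared in OpenAI's JSON Schema format (approximately what LangChain's bind_tools produces behind the scenes); Anthropic's format is similar but nests the schema under an input_schema key. The variable name below is just for illustration.

```python
# Illustrative OpenAI-style function definition for get_weather,
# expressed as the JSON Schema payload sent with the request
get_weather_definition = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "City name, e.g. 'San Francisco'",
                },
                "unit": {
                    "type": "string",
                    "enum": ["celsius", "fahrenheit"],
                    "description": "Temperature unit",
                },
            },
            "required": ["location"],
        },
    },
}
```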
Parallel Tool Execution
Modern LLMs can request multiple tool calls simultaneously, dramatically reducing latency for complex tasks:
function executeParallelTools(userRequest, tools):
    response = llm.generate(
        messages: [userRequest],
        tools: tools,
        parallelToolCalls: true
    )

    if response.hasMultipleToolCalls:
        # Execute all tool calls concurrently
        results = parallel.map(response.toolCalls, (call) =>
            tools[call.name].execute(call.arguments)
        )
        # Collect all results
        toolResults = zip(response.toolCalls, results)
        return llm.generate(
            messages: [userRequest, response, ...toolResults]
        )

    return response.content

With LangChain, async tools make the concurrent fan-out explicit:

import asyncio
from langchain_openai import ChatOpenAI
from langchain_core.tools import tool
from langchain_core.messages import HumanMessage, ToolMessage
@tool
async def get_weather(location: str) -> str:
    """Get current weather for a location."""
    return await weather_api.get_async(location)

@tool
async def search_web(query: str) -> str:
    """Search the web for information."""
    return await search_api.search_async(query)

@tool
async def get_stock_price(symbol: str) -> str:
    """Get current stock price."""
    return await stock_api.get_price_async(symbol)
# Bind multiple tools
llm = ChatOpenAI(model="gpt-4")
tools = [get_weather, search_web, get_stock_price]
llm_with_tools = llm.bind_tools(tools)
async def agent_with_parallel_tools(query: str) -> str:
    messages = [HumanMessage(content=query)]
    response = await llm_with_tools.ainvoke(messages)

    if response.tool_calls:
        messages.append(response)

        # Execute all tools in parallel
        tool_tasks = []
        for tool_call in response.tool_calls:
            tool_fn = next(t for t in tools if t.name == tool_call["name"])
            tool_tasks.append(tool_fn.ainvoke(tool_call["args"]))
        results = await asyncio.gather(*tool_tasks)

        # Add all results
        for tool_call, result in zip(response.tool_calls, results):
            messages.append(
                ToolMessage(content=result, tool_call_id=tool_call["id"])
            )
        return (await llm_with_tools.ainvoke(messages)).content

    return response.content

With Microsoft.Agents.AI, the agent dispatches the parallel tool calls itself:

using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using System.ComponentModel;
// Define multiple tools as functions
[Description("Get weather for a location")]
static async Task<string> GetWeather(string location)
    => await weatherService.GetAsync(location);

[Description("Search the web")]
static async Task<string> SearchWeb(string query)
    => await searchService.SearchAsync(query);

[Description("Get stock price")]
static async Task<string> GetStockPrice(string symbol)
    => await stockService.GetPriceAsync(symbol);

// Create agent with multiple tools
AIAgent agent = new AzureOpenAIClient(endpoint, credentials)
    .GetChatClient("gpt-4o")
    .AsAIAgent(
        instructions: "You are a helpful assistant",
        tools: [
            AIFunctionFactory.Create(GetWeather),
            AIFunctionFactory.Create(SearchWeb),
            AIFunctionFactory.Create(GetStockPrice)
        ]
    );

// Agent automatically handles parallel tool execution
var result = await agent.RunAsync(
    "What's the weather in NYC and Tokyo, and Apple's stock price?"
);

Error Handling & Retries
Robust tool execution requires handling failures gracefully. The model should be informed of errors so it can try alternative approaches:
function executeWithRetry(toolCall, maxRetries = 3):
    for attempt in range(maxRetries):
        try:
            result = tools[toolCall.name].execute(toolCall.args)
            return { success: true, data: result }
        catch error:
            if isRetryable(error) and attempt < maxRetries - 1:
                wait(exponentialBackoff(attempt))
                continue
            else:
                return {
                    success: false,
                    error: formatError(error)
                }

function agentLoop(userRequest):
    while not complete:
        response = llm.generate(messages, tools)
        if response.hasToolCall:
            result = executeWithRetry(response.toolCall)
            if not result.success:
                # Let the model know about the failure
                messages.append(toolError(result.error))
                # Model can try a different approach
            else:
                messages.append(toolResult(result.data))
        else:
            return response.content

With LangChain, errors can be handled inside the tool, in a retry wrapper, or by the agent loop:

from langchain_core.tools import tool, ToolException
from langchain_core.runnables import RunnableConfig
from tenacity import retry, stop_after_attempt, wait_exponential
# Tools can handle their own errors gracefully
@tool(handle_tool_error=True)
def search_database(query: str) -> str:
    """Search the database with automatic error handling."""
    try:
        return db.search(query)
    except DatabaseError as e:
        # Raise ToolException so the error text is returned to the model
        # as the tool's output instead of aborting the run
        raise ToolException(f"Database error: {e}")
# Custom retry wrapper for tools
class RetryableTool:
    def __init__(self, tool_fn):
        self.tool = tool_fn

    # Retry transient failures with exponential backoff (up to 3 attempts)
    @retry(
        stop=stop_after_attempt(3),
        wait=wait_exponential(multiplier=1, min=1, max=10)
    )
    def invoke(self, args: dict) -> str:
        return self.tool.invoke(args)

    def safe_invoke(self, args: dict) -> dict:
        try:
            result = self.invoke(args)
            return {"success": True, "data": result}
        except Exception as e:
            return {
                "success": False,
                "error": str(e),
                "error_type": type(e).__name__
            }
# Using LangGraph for an agent with error recovery
from langgraph.prebuilt import create_react_agent, ToolNode

agent = create_react_agent(
    llm,
    # Wrapping the tools in a ToolNode with handle_tool_errors=True returns
    # tool errors to the model as messages, so it can try alternatives
    ToolNode(tools, handle_tool_errors=True)
)

# The agent will receive error messages and can adapt
result = agent.invoke({"messages": [("user", query)]})

The C# version pairs the tool with a Polly retry policy:

using Microsoft.Agents.AI;
using Microsoft.Extensions.AI;
using System.ComponentModel;
using Polly;
// Create retry policy
var retryPolicy = Policy
    .Handle<HttpRequestException>()
    .Or<TimeoutException>()
    .WaitAndRetryAsync(
        3,
        attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt)),
        onRetry: (ex, time, attempt, ctx) =>
            Console.WriteLine($"Retry {attempt}: {ex.Message}")
    );

// Define tool with built-in error handling
[Description("Search with automatic retry")]
async Task<string> SearchWithRetry(string query)
{
    try
    {
        return await retryPolicy.ExecuteAsync(async () =>
        {
            return await searchService.SearchAsync(query);
        });
    }
    catch (Exception ex)
    {
        // Return error info so agent can adapt
        return $"Error: {ex.Message}. Try a different approach.";
    }
}

// Create agent with resilient tools
AIAgent agent = new AzureOpenAIClient(endpoint, credentials)
    .GetChatClient("gpt-4o")
    .AsAIAgent(
        instructions: "You are a helpful assistant",
        tools: [AIFunctionFactory.Create(SearchWithRetry)]
    );

// Agent receives error messages and can adapt its approach
var result = await agent.RunAsync(prompt);

Trade-offs & Approaches
| Approach | Pros | Cons |
|---|---|---|
| Static tool list | Simple, predictable, easy to test | Context bloat with many tools |
| Dynamic discovery | Scales to many tools | Additional latency, complexity |
| Tool clustering | Balance of both approaches | Routing logic complexity |
| Skills pattern | Massive token savings (98%+) | Requires filesystem access |
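
As a rough illustration of the dynamic-discovery row, the sketch below selects a small subset of tools per request before binding them, so the model never sees the full registry. The keyword-overlap scoring and the select_tools / run_with_dynamic_tools helpers are hypothetical placeholders; a production system would more likely rank tool descriptions by embedding similarity.

```python
# Hypothetical sketch of dynamic tool discovery: score every registered tool
# against the query and bind only the best matches, keeping context small.
def select_tools(query: str, registry: list, top_k: int = 3) -> list:
    """Pick the top_k tools whose descriptions best overlap with the query."""
    query_words = set(query.lower().split())

    def score(tool) -> int:
        return len(query_words & set(tool.description.lower().split()))

    return sorted(registry, key=score, reverse=True)[:top_k]

def run_with_dynamic_tools(llm, registry: list, query: str):
    # Only the selected subset is exposed to the model for this request
    relevant = select_tools(query, registry)
    return llm.bind_tools(relevant).invoke(query)
```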
Evaluation Approach
Measuring tool use quality requires evaluating multiple dimensions:
| Metric | What it Measures | How to Calculate |
|---|---|---|
| Tool Selection Accuracy | Did the model pick the right tool? | Compare selected tool vs ground truth |
| Argument Accuracy | Were the parameters correct? | Exact match or semantic similarity of args |
| Unnecessary Calls | Did the model call tools when not needed? | Count tool calls for simple queries |
| Latency Impact | Time added by tool execution | End-to-end time minus model inference |
| Error Recovery | Can model handle tool failures? | Success rate after simulated failures |
Key metrics for evaluating tool calling capabilities
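
A minimal harness for the first two metrics might look like the sketch below. It assumes a hand-labeled dataset of cases with prompt, expected_tool, and expected_args fields (a hypothetical format), and reuses a tool-bound LangChain model like the ones above; argument matching here is exact, though semantic comparison is often more forgiving.

```python
# Hypothetical evaluation harness for tool selection and argument accuracy
def evaluate_tool_calling(llm_with_tools, dataset: list[dict]) -> dict:
    tool_correct = 0
    args_correct = 0

    for case in dataset:
        response = llm_with_tools.invoke(case["prompt"])
        calls = response.tool_calls or []

        # Tool Selection Accuracy: did the first call name the expected tool?
        if calls and calls[0]["name"] == case["expected_tool"]:
            tool_correct += 1
            # Argument Accuracy: exact match against the labeled arguments
            if calls[0]["args"] == case["expected_args"]:
                args_correct += 1

    n = len(dataset)
    return {
        "tool_selection_accuracy": tool_correct / n,
        "argument_accuracy": args_correct / n,
    }
```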
Evaluation Frameworks
- DeepEval - ToolCorrectnessMetric, ArgumentCorrectnessMetric
- ToolBench - Large-scale tool use benchmark
- API-Bank - API call evaluation dataset
- Custom datasets - Domain-specific tool calling tests