Tool Integration & Prompt Engineering

Tool Integration

Function Calling (OpenAI Style)

Define JSON schemas for functions. The LLM decides when and how to call them.

Tool schema:

{
  "name": "search_flights",
  "description": "Search available flights between cities",
  "parameters": {
    "type": "object",
    "properties": {
      "from": {"type": "string", "description": "IATA departure city code"},
      "to":   {"type": "string", "description": "IATA destination city code"},
      "date": {"type": "string", "description": "Date in YYYY-MM-DD format"}
    },
    "required": ["from", "to", "date"]
  }
}

Call flow:

1. App passes tool schemas via the `tools` API parameter (separate from the system prompt)
2. LLM returns a tool call: {"name": "search_flights", "arguments": {"from":"NYC","to":"CHI","date":"2026-04-05"}} (the API delivers `arguments` as a JSON-encoded string, which the app must parse)
3. Executor calls the real function
4. Result is returned to the LLM as a "tool" role message
5. LLM continues reasoning with the new data
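
Steps 2 to 4 of the call flow can be sketched without any network access. This is a minimal illustration, assuming the Chat Completions message shapes (a tool call carrying an `id` and a JSON-encoded `arguments` string, answered by a `tool` role message with a matching `tool_call_id`). The `search_flights` stub, the `TOOLS` mapping, the `run_tool_call` helper, and the id `call_1` are hypothetical.

```python
import json

def search_flights(args):
    # Hypothetical stub standing in for a real flight-search API.
    return [{"price": 189, "airline": "United"}]

# Tools take one dict because "from" is a reserved word in Python
# and cannot be used as a named parameter.
TOOLS = {"search_flights": search_flights}

def run_tool_call(tool_call):
    """Execute one model-issued tool call and build the 'tool' role reply."""
    name = tool_call["function"]["name"]
    # Arguments arrive as a JSON-encoded string, so decode them first.
    args = json.loads(tool_call["function"]["arguments"])
    result = TOOLS[name](args)
    return {
        "role": "tool",
        "tool_call_id": tool_call["id"],  # links the result to the request
        "content": json.dumps(result),
    }

tool_call = {
    "id": "call_1",  # hypothetical id
    "type": "function",
    "function": {
        "name": "search_flights",
        "arguments": '{"from": "NYC", "to": "CHI", "date": "2026-04-05"}',
    },
}
tool_message = run_tool_call(tool_call)
```

Appending `tool_message` to the conversation and calling the model again corresponds to step 5.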

Python example (OpenAI SDK 2026):

import json
import openai

client = openai.OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "calc",
            "description": "Evaluate a math expression",
            "parameters": {
                "type": "object",
                "properties": {"expr": {"type": "string"}},
                "required": ["expr"]
            }
        }
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What is 42 * 17?"}],
    tools=tools,
    tool_choice="auto"
)

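message = response.choices[0].message
if message.tool_calls:
    call = message.tool_calls[0]
    # Arguments arrive as a JSON-encoded string.
    args = json.loads(call.function.arguments)
    # WARNING: eval() executes arbitrary code taken from model output.
    # Swap in a restricted evaluator before any production use.
    result = eval(args["expr"])
    print(f"Result: {result}")
else:
    # The model answered directly without calling the tool.
    print(message.content)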
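
The `eval` call in the example runs whatever the model sends. One possible replacement is sketched below, assuming the calculator only needs `+`, `-`, `*`, and `/` over numeric literals. The name `safe_eval` is a hypothetical helper, not part of any SDK. It walks the parsed syntax tree and rejects every node type that is not explicitly whitelisted.

```python
import ast
import operator

# Whitelisted operators; every other node type is rejected.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}
_UNARY_OPS = {ast.USub: operator.neg, ast.UAdd: operator.pos}

def safe_eval(expr: str):
    """Evaluate +, -, *, / over numeric literals without calling eval()."""
    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        # type() rather than isinstance() so that booleans are rejected.
        if isinstance(node, ast.Constant) and type(node.value) in (int, float):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
            return _BINARY_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
            return _UNARY_OPS[type(node.op)](_eval(node.operand))
        raise ValueError("unsupported expression")
    return _eval(ast.parse(expr, mode="eval"))

print(safe_eval("42 * 17"))  # prints 714
```

Function calls, attribute access, and names all raise `ValueError`, so an input such as `__import__('os')` is refused rather than executed.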

Tool Registry

A centralized list of available tools:

TOOL_REGISTRY = {
    "search_flights": search_flights_fn,
    "search_hotels":  search_hotels_fn,
    "calculator":     safe_calculator_fn,
    "web_search":     web_search_fn,
}

Tool schemas are passed through the API's `tools` parameter; the system prompt can optionally carry a short tool-usage policy
The agent chooses from a known, fixed set of actions
New tools can be added without changing the agent's core logic
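
One way to get that extensibility is to let each tool register itself. The sketch below assumes a decorator-based variant of the registry; `register_tool` and `dispatch` are hypothetical helpers, and the calculator body is a stand-in that only multiplies two numbers.

```python
from typing import Callable, Dict

TOOL_REGISTRY: Dict[str, Callable[[dict], object]] = {}

def register_tool(name: str):
    """Decorator that adds a function to the registry under `name`."""
    def decorator(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return decorator

@register_tool("calculator")
def safe_calculator_fn(args: dict) -> float:
    # Hypothetical stand-in: multiplies two numbers.
    return args["a"] * args["b"]

def dispatch(name: str, args: dict):
    """Look up a tool by name; unknown names fail loudly instead of guessing."""
    if name not in TOOL_REGISTRY:
        raise KeyError(f"unknown tool: {name}")
    return TOOL_REGISTRY[name](args)

print(dispatch("calculator", {"a": 42, "b": 17}))  # prints 714
```

Adding a tool then means writing one decorated function; the dispatch code stays unchanged.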

Sandboxing and Tool Security

Requirement	Implementation
Code isolation	Container sandbox (gVisor, Firecracker)
Rate limits	Max calls per task + token budget
Input validation	JSON schema validation before execution
Output validation	Type check + range check on results
Least privilege	Read-only DB credentials if writes are not needed

Never: give the agent shell access without a sandbox. Always: validate inputs before each tool call and outputs after it.
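
Two rows of the table, input validation and rate limits, can be sketched in plain Python. This is a hand-rolled check covering only the `required` and `type` keywords, shown for illustration; a real deployment would more likely use a full JSON Schema validator. `validate_args` and `CallBudget` are hypothetical names.

```python
SCHEMA = {
    "type": "object",
    "properties": {
        "from": {"type": "string"},
        "to": {"type": "string"},
        "date": {"type": "string"},
    },
    "required": ["from", "to", "date"],
}

# Maps a small subset of JSON Schema type names to Python types.
_JSON_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}

def validate_args(args: dict, schema: dict) -> list:
    """Return a list of problems; an empty list means the arguments pass."""
    errors = []
    for key in schema.get("required", []):
        if key not in args:
            errors.append(f"missing required field: {key}")
    for key, value in args.items():
        spec = schema["properties"].get(key)
        if spec is None:
            errors.append(f"unexpected field: {key}")
        elif not isinstance(value, _JSON_TYPES[spec["type"]]):
            errors.append(f"wrong type for {key}")
    return errors

class CallBudget:
    """Caps the number of tool calls allowed within one task."""
    def __init__(self, max_calls: int):
        self.remaining = max_calls

    def spend(self):
        if self.remaining <= 0:
            raise RuntimeError("tool call budget exhausted")
        self.remaining -= 1
```

Calling `budget.spend()` and `validate_args(...)` before every tool execution keeps a misbehaving agent from looping indefinitely or passing malformed arguments to the real function.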

Prompt Engineering Strategies

1. Chain-of-Thought (CoT)

Ask the LLM to reason step by step.

System: "Think step-by-step before answering."

User: "Plan a 3-day trip to Chicago under $1500."

LLM response:

Step 1: Estimate budget breakdown — flights ~$300, hotel ~$450, activities ~$200, food ~$300
Step 2: Search flights NYC→CHI for April 5-8
Step 3: Filter hotels downtown under $150/night
Step 4: Build daily itinerary with free/paid attractions
Answer: [structured plan]

When to use: Math, multi-step reasoning tasks, complex decisions.
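
With step-by-step output, the application usually has to separate the reasoning from the final answer. A minimal sketch, assuming the model is instructed to end with a line that starts with `Answer:` as in the response above; `extract_answer` is a hypothetical helper.

```python
def extract_answer(response: str, marker: str = "Answer:"):
    """Return the text after the last line starting with `marker`, or None."""
    for line in reversed(response.splitlines()):
        stripped = line.strip()
        if stripped.startswith(marker):
            return stripped[len(marker):].strip()
    return None

# Messages for a CoT request; the extra instruction makes the answer parseable.
messages = [
    {
        "role": "system",
        "content": "Think step-by-step before answering. "
                   "End with a line starting with 'Answer:'.",
    },
    {"role": "user", "content": "Plan a 3-day trip to Chicago under $1500."},
]

sample = "Step 1: Estimate budget breakdown\nStep 2: Search flights\nAnswer: [structured plan]"
print(extract_answer(sample))  # prints [structured plan]
```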

2. ReAct (Reason + Act)

Alternates Thought: (reasoning), Action: (tool call), and Observation: (tool result) in a loop until the model emits a Final Answer.

System Prompt:
  You have access to tools: search_flights, search_hotels, calculator.
  Format:
    Thought: [reasoning]
    Action: tool_name({"param": "value"})
    Observation: [tool result, injected by system]
  Repeat until:
    Final Answer: [conclusion]

---

Thought: I need flights from NYC to Chicago on April 5.
Action: search_flights({"from":"NYC","to":"CHI","date":"2026-04-05"})
Observation: [{"price":189,"airline":"United"},{"price":210,"airline":"Delta"}]

Thought: United is cheapest. Now check hotels.
Action: search_hotels({"city":"Chicago","checkin":"2026-04-05","nights":3})
Observation: [{"name":"Marriott","price_per_night":120}]

Final Answer: United flight $189 + Marriott 3 nights $360 = $549 total.

Best for: Single-agent loops with multiple tools.
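
The loop behind that transcript can be sketched as follows, assuming the text format from the system prompt above. The model is replaced by a scripted stand-in so the example runs offline; `react_loop`, `make_scripted_llm`, and the stub tools are hypothetical.

```python
import json
import re

ACTION_RE = re.compile(r"^Action:\s*(\w+)\((.*)\)\s*$", re.MULTILINE)
FINAL_RE = re.compile(r"^Final Answer:\s*(.+)$", re.MULTILINE)

def react_loop(llm, tools, question, max_steps=5):
    """Alternate model turns and tool calls until a final answer or the step cap."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(transcript)
        transcript += output + "\n"
        final = FINAL_RE.search(output)
        if final:
            return final.group(1).strip()
        action = ACTION_RE.search(output)
        if action is None:
            raise ValueError("model produced neither an Action nor a Final Answer")
        name, raw_args = action.group(1), action.group(2)
        observation = tools[name](json.loads(raw_args))
        # The system, not the model, injects the observation.
        transcript += f"Observation: {json.dumps(observation)}\n"
    raise RuntimeError("step limit reached without a final answer")

def make_scripted_llm(turns):
    """Scripted stand-in for a model: replays fixed turns in order."""
    remaining = iter(turns)
    return lambda transcript: next(remaining)

tools = {
    "search_flights": lambda args: [{"price": 189, "airline": "United"}],
    "search_hotels": lambda args: [{"name": "Marriott", "price_per_night": 120}],
}
llm = make_scripted_llm([
    'Thought: I need flights.\n'
    'Action: search_flights({"from":"NYC","to":"CHI","date":"2026-04-05"})',
    'Thought: Now check hotels.\n'
    'Action: search_hotels({"city":"Chicago","checkin":"2026-04-05","nights":3})',
    "Final Answer: United flight $189 + Marriott 3 nights $360 = $549 total.",
])
answer = react_loop(llm, tools, "Plan flights and a hotel for Chicago.")
```

The `max_steps` cap is the explicit stop condition: a model that never emits a Final Answer ends in an error rather than an endless loop.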

3. Tree-of-Thoughts (ToT)

Explores several reasoning paths instead of one, scoring each and expanding the most promising.

Goal: "Find cheapest route"
                │
    ┌───────────┼───────────┐
  Path A      Path B      Path C
 (drive)      (fly)      (train)
   $320        $189        $210
    │           │
  Score        Score
   0.4         0.9  ← expand
                │
          [book flight]

Algorithm:

1. Generate N thought candidates (e.g. 3)
2. Evaluate / score each (LLM self-evaluation or heuristic)
3. Expand the most promising candidate path
4. Repeat until final answer
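
The algorithm can be sketched as a greedy search that keeps only the best candidate at each level. This is a simplification under stated assumptions: fuller variants keep several candidates and can backtrack. The toy data reuses the prices from the diagram, the score is simply the negated price (a heuristic chosen for this example, not the diagram's 0.4 / 0.9 values), and all function names are hypothetical.

```python
from typing import Callable, List, Tuple

def tree_of_thoughts(
    root: str,
    generate: Callable[[str], List[str]],
    score: Callable[[str], float],
    is_final: Callable[[str], bool],
    max_depth: int = 3,
) -> Tuple[str, float]:
    """Greedy ToT: at each level, score all candidates and expand only the best."""
    current, current_score = root, score(root)
    for _ in range(max_depth):
        if is_final(current):
            break
        candidates = generate(current)          # step 1: generate candidates
        if not candidates:
            break
        scored = [(score(c), c) for c in candidates]  # step 2: score each
        current_score, current = max(scored, key=lambda pair: pair[0])  # step 3
    return current, current_score

# Toy problem: prices taken from the diagram, lower price means higher score.
COSTS = {"drive": 320, "fly": 189, "train": 210}
CHILDREN = {"root": ["drive", "fly", "train"], "fly": ["fly > book flight"]}

def generate(thought: str) -> List[str]:
    return CHILDREN.get(thought, [])

def score(thought: str) -> float:
    mode = thought.split(" > ")[0]
    return -COSTS.get(mode, float("inf"))

best, best_score = tree_of_thoughts(
    "root", generate, score, is_final=lambda t: t.endswith("book flight")
)
print(best, best_score)  # prints fly > book flight -189
```

In a real agent, `generate` and `score` would each be LLM calls, which is where the higher token cost of this strategy comes from.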

When to use: complex optimization where one CoT path is not enough.

Strategy Comparison

Strategy	Complexity	Token Cost	Quality	When to use
Zero-shot	Minimal	Lowest	Basic	Simple tasks
CoT	Low	Medium	Good	Reasoning, math
ReAct	Medium	Medium	High	Tool-using agents
Tree-of-Thoughts	High	High	Highest	Optimization, planning

System Prompt Best Practices

System prompt = identity + capabilities + rules
User message = specific task
Explicit stop conditions prevent infinite loops
Test prompt variants; simpler prompts often generalize better
For irreversible actions (for example, booking), require user confirmation
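
The confirmation rule can also be enforced in code rather than left to the prompt alone. A minimal sketch, assuming a hypothetical `book_flight` tool and a `confirm` callback that asks the user; neither name comes from a real library.

```python
IRREVERSIBLE_TOOLS = {"book_flight"}  # hypothetical tool name

def execute_with_confirmation(name, args, tools, confirm):
    """Run a tool, but ask `confirm` first when the action cannot be undone."""
    if name in IRREVERSIBLE_TOOLS and not confirm(name, args):
        return {"status": "cancelled", "reason": "user declined"}
    return {"status": "ok", "result": tools[name](args)}

tools = {
    "book_flight": lambda args: {"booking_id": "demo"},  # stub, books nothing
    "search_flights": lambda args: [{"price": 189, "airline": "United"}],
}

# The user declines, so the booking function is never called.
declined = execute_with_confirmation(
    "book_flight", {"flight": "United"}, tools, confirm=lambda name, args: False
)
print(declined["status"])  # prints cancelled
```

Read-only tools such as `search_flights` skip the prompt entirely, so the extra step only appears where a mistake would be costly.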