Building an AI Agent with LangChain and OpenAI

January 16, 2024 — #Python #AI #LangChain #OpenAI #Machine Learning #Agents

Introduction

AI agents represent a significant leap forward in how we interact with AI systems. Unlike traditional chatbots that simply generate responses, agents can reason about tasks, select appropriate tools, execute actions, and synthesize results. In this tutorial, we'll explore how to build a simple but powerful AI agent using LangChain and OpenAI that can answer questions using Wikipedia and perform mathematical calculations.

The agent we'll examine uses the ReAct (Reasoning + Acting) pattern, which allows it to:

Reason about what tool to use for a given question
Act by calling the selected tool
Observe the tool's output
Respond with a synthesized answer

Architecture Overview

The AI agent follows a modular architecture:

Language Model (LLM): OpenAI's GPT model that powers the agent's reasoning
Agent Framework: LangChain's zero-shot ReAct agent that orchestrates tool selection
Tools: Custom tools that the agent can use (Wikipedia search and calculator)
Tool Tracking: Callback system to monitor agent behavior and statistics

Core Components

1. Agent Initialization

The agent is set up using LangChain's initialize_agent function with a zero-shot ReAct description agent type:

from langchain_openai import OpenAI
from langchain.agents import initialize_agent, Tool, AgentType
from langchain.tools import BaseTool

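# Track usage counts (ToolTracker is defined in section 4 below)
stats = {"openai_queries": 0, "wiki_queries": 0, "calculator_calls": 0}
tool_tracker = ToolTracker(stats)

# Initialize the LLM
llm = OpenAI(temperature=0, callbacks=[tool_tracker])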

# Wrap custom tools into LangChain's Tool format
wiki_tool = WikipediaTool()
calc_tool = CalculatorTool()

tools = [
    Tool(
        name=wiki_tool.name,
        func=wiki_tool.run,
        description=wiki_tool.description
    ),
    Tool(
        name=calc_tool.name,
        func=calc_tool.run,
        description=calc_tool.description
    ),
]

# Create the agent
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True,
    handle_parsing_errors=True,
    callbacks=[tool_tracker]
)

Key Components Explained:

AgentType.ZERO_SHOT_REACT_DESCRIPTION: This agent type uses the ReAct pattern without requiring examples. It reads tool descriptions and decides which tool to use based on the question.
temperature=0: Set to 0 for more consistent, repeatable responses (output is not guaranteed to be fully deterministic)
verbose=True: Enables detailed logging of the agent's reasoning process
handle_parsing_errors=True: Gracefully handles cases where the agent's output doesn't match expected format

How ReAct Works:

The ReAct pattern follows this loop:

Thought: The agent analyzes the question and decides what to do
Action: The agent selects a tool and prepares the input
Observation: The tool executes and returns a result
Thought: The agent processes the observation
Final Answer: The agent synthesizes the information and responds
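
To make the loop concrete, here is a minimal sketch of the same Thought/Action/Observation cycle in plain Python. It is not LangChain's implementation: react_loop and scripted_llm are hypothetical names, the regular expressions are a simplified stand-in for the real output parser, and the "LLM" just replays canned replies so the example runs without an API key.

```python
import re
from typing import Callable, Dict, List

# Matches "Action: <tool>" followed by "Action Input: <input>" in model output.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.+)")
FINAL_RE = re.compile(r"Final Answer:\s*(.+)", re.DOTALL)


def react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Repeat Thought/Action/Observation steps until the model gives a final answer."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(scratchpad)
        scratchpad += output + "\n"

        final = FINAL_RE.search(output)
        if final:
            return final.group(1).strip()

        action = ACTION_RE.search(output)
        if not action:
            # Similar in spirit to handle_parsing_errors: report the problem and retry.
            scratchpad += "Observation: could not parse an action\n"
            continue

        tool_name, tool_input = action.group(1).strip(), action.group(2).strip()
        tool = tools.get(tool_name)
        observation = tool(tool_input) if tool else f"unknown tool '{tool_name}'"
        scratchpad += f"Observation: {observation}\n"
    return "Stopped: step limit reached"


def scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for a real model: returns canned replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


llm = scripted_llm([
    "Thought: this is arithmetic.\nAction: calculator\nAction Input: 17 * 19",
    "Thought: I have the result.\nFinal Answer: The result is 323.",
])
answer = react_loop(llm, {"calculator": lambda expr: "323"}, "Calculate 17 * 19")
print(answer)  # The result is 323.
```

Because the loop keeps running until a Final Answer appears or the step limit is hit, nothing in the pattern itself restricts the agent to a single tool call per question.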

2. Wikipedia Tool

The Wikipedia tool allows the agent to search Wikipedia for general knowledge questions:

from langchain.tools import BaseTool
import wikipedia

class WikipediaTool(BaseTool):
    name: str = "wikipedia"
    description: str = (
        "Useful for answering general knowledge questions. "
        "Input should be a single string; it returns the summary of the top page."
    )

    def _run(self, query: str) -> str:
        """
        Searches Wikipedia for the given query and returns a summary.

        Args:
            query: The search term to look up on Wikipedia

        Returns:
            str: A two-sentence summary of the Wikipedia page or an error message
        """
        try:
            # Get a concise 2-sentence summary
            summary = wikipedia.summary(query, sentences=2)
            return summary
        except Exception as e:
            return f"❗ Could not fetch Wikipedia page for '{query}'. Error: {e}"

How It Works:

Inheritance: Inherits from BaseTool, which provides the interface LangChain expects
Name & Description: These are crucial, because the agent uses them to decide when to use this tool
_run Method: This is where the actual tool logic lives. It:
- Takes a search query string
- Uses the wikipedia library to fetch a summary
- Returns a concise 2-sentence summary
- Handles errors gracefully

Why Two Sentences?

Limiting to 2 sentences keeps responses concise and reduces token usage. If the agent needs more detail, it can call the tool again with a more specific query.

3. Calculator Tool

The calculator tool safely evaluates mathematical expressions without using eval(), which would be a security risk:

import re
import ast
import operator
from typing import Dict, Any, Union, Type, Callable

class CalculatorTool(BaseTool):
    name: str = "calculator"
    description: str = (
        "Useful for math questions. Input should be a valid math expression "
        "(e.g., 16 * 4 + 3). For square roots, use 'x ** 0.5' instead of 'sqrt(x)'. "
        "Do not use quotes around the expression."
    )

    def _run(self, expression: str) -> str:
        """Evaluates a simple math expression and returns the result."""
        try:
            # Strip any quotes from the expression
            expression = expression.strip("'\"")

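            # Handle square root requests (e.g., "sqrt(144)" or "square root of 2.5")
            if "square root" in expression.lower() or "sqrt" in expression.lower():
                # Match integers and decimals so that 2.5 is not truncated to 2
                match = re.search(r'\d+(?:\.\d+)?', expression)
                if match:
                    number = float(match.group())
                    return str(number ** 0.5)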
                else:
                    return "❗ Could not identify a number to calculate square root."

            # Validate expression - only allow safe characters
            if not re.fullmatch(r"[0-9+\-*/().\s]+", expression):
                return "❗ Invalid characters in expression."

            # Parse the expression into an Abstract Syntax Tree (AST)
            node = ast.parse(expression, mode='eval')

            # Define allowed operators (whitelist approach)
            operators: Dict[Type[ast.operator], Callable] = {
                ast.Add: operator.add,
                ast.Sub: operator.sub,
                ast.Mult: operator.mul,
                ast.Div: operator.truediv,
                ast.USub: operator.neg,  # Unary minus
                ast.Pow: operator.pow,   # Power operator
            }

            # Recursively evaluate the AST
            def eval_expr(node: Any) -> Union[int, float]:
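                if isinstance(node, ast.Constant):  # Number literal (ast.Num was removed in Python 3.12)
                    if not isinstance(node.value, (int, float)):
                        raise ValueError("Only numeric literals are allowed.")
                    return node.value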
                elif isinstance(node, ast.BinOp):  # Binary operation (a + b)
                    if type(node.op) not in operators:
                        raise ValueError(f"Unsupported operation: {type(node.op).__name__}")
                    return operators[type(node.op)](
                        eval_expr(node.left),
                        eval_expr(node.right)
                    )
                elif isinstance(node, ast.UnaryOp):  # Unary operation (-a)
                    if type(node.op) not in operators:
                        raise ValueError(f"Unsupported operation: {type(node.op).__name__}")
                    return operators[type(node.op)](eval_expr(node.operand))
                elif isinstance(node, ast.Expression):  # Root expression node
                    return eval_expr(node.body)
                else:
                    raise ValueError(f"Unsupported node type: {type(node).__name__}")

            result = eval_expr(node)
            return str(result)
        except Exception as e:
            return f"❗ Error evaluating expression: {e}"

Security Features:

Input Validation: Uses regex to ensure only safe characters are present
AST Parsing: Converts the expression to an Abstract Syntax Tree instead of using eval()
Operator Whitelisting: Only allows specific operations (add, subtract, multiply, divide, power, unary minus)
Recursive Evaluation: Safely traverses the AST, only executing whitelisted operations
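
One gap these checks leave open: an expression such as 9 ** 9 ** 9 passes both the character filter and the operator whitelist, yet evaluating it can consume a great deal of CPU and memory. A possible mitigation, sketched below, is to map ast.Pow to a bounded power function instead of operator.pow. The name bounded_pow and the limits are assumptions for illustration, not part of the original project.

```python
import operator
from typing import Union

Number = Union[int, float]

# Illustrative limits, not values from the original project.
MAX_BASE = 10_000
MAX_EXPONENT = 100


def bounded_pow(base: Number, exponent: Number) -> Number:
    """Like operator.pow, but rejects operands large enough to stall the process."""
    if abs(base) > MAX_BASE or abs(exponent) > MAX_EXPONENT:
        raise ValueError("Power operands are too large to evaluate safely.")
    return operator.pow(base, exponent)


print(bounded_pow(144, 0.5))  # 12.0
print(bounded_pow(2, 10))     # 1024
```

With this in place, the operators table would use ast.Pow: bounded_pow, and oversized inputs would surface as the tool's usual error message rather than hanging the agent.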

Why Not Use eval()?

Using eval() on user input is extremely dangerous. A malicious user could execute arbitrary Python code:

# DANGEROUS - Never do this!
result = eval(user_input)  # Could execute: __import__('os').system('rm -rf /')

# SAFE - Our approach
node = ast.parse(user_input, mode='eval')
result = eval_expr(node)  # Only evaluates whitelisted operations

AST Evaluation Process:

Parse: ast.parse() converts the string into a tree structure
Traverse: Recursively walk the tree
Evaluate: For each node type, apply the corresponding operation
Return: The final computed value
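
To see what the parse step produces, the short sketch below inspects the tree for the expression used later in Example 2. It assumes Python 3.8 or newer, where numeric literals appear as ast.Constant nodes.

```python
import ast

tree = ast.parse("17 * (24 - 5)", mode="eval")

# The root is an Expression node; its body is the outermost operation.
root = tree.body
print(type(root).__name__, type(root.op).__name__)              # BinOp Mult
print(type(root.left).__name__, root.left.value)                # Constant 17
print(type(root.right).__name__, type(root.right.op).__name__)  # BinOp Sub

# Parentheses leave no node of their own; they only shape the tree.
binops = [type(n).__name__ for n in ast.walk(tree) if isinstance(n, ast.BinOp)]
print(binops)  # ['BinOp', 'BinOp']
```

The evaluator only ever sees these node types, which is why anything outside the whitelist (names, calls, attribute access) can be rejected before it runs.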

4. Tool Tracking with Callbacks

The agent includes a callback system to track tool usage and statistics:

from langchain.callbacks.base import BaseCallbackHandler
from typing import List, Optional, Any, Dict
from uuid import UUID

class ToolTracker(BaseCallbackHandler):
    def __init__(self, stats):
        super().__init__()
        self.stats = stats

    def on_tool_start(
        self,
        serialized: Dict[str, Any],
        input_str: str,
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any
    ) -> None:
        tool_name = serialized.get("name", "")
        if tool_name == "wikipedia":
            self.stats["wiki_queries"] += 1
        elif tool_name == "calculator":
            self.stats["calculator_calls"] += 1

    def on_llm_start(
        self,
        serialized: Dict[str, Any],
        prompts: List[str],
        *,
        run_id: UUID,
        parent_run_id: Optional[UUID] = None,
        tags: Optional[List[str]] = None,
        metadata: Optional[Dict[str, Any]] = None,
        **kwargs: Any
    ) -> None:
        self.stats["openai_queries"] += 1

How Callbacks Work:

on_tool_start: Called whenever a tool is invoked. We track which tool is being used.
on_llm_start: Called whenever the LLM makes a request. We count API calls for cost tracking.
Statistics: The stats dictionary tracks usage patterns, which is useful for:
- Monitoring costs
- Understanding agent behavior
- Optimizing tool selection

5. Error Handling

The agent includes comprehensive error handling for various OpenAI API errors:

import openai

try:
    answer = agent.invoke({"input": question})["output"]
    print(f"Answer:\n{answer}")
except openai.AuthenticationError:
    print("Error: Authentication failed. Check your OpenAI API key.")
except openai.RateLimitError:
    print("Error: OpenAI API rate limit exceeded. Please try again later.")
except openai.APIConnectionError:
    # Listed before APIError because it is a subclass of APIError
    print("Error: Could not reach the OpenAI API. Please try again later.")
except openai.APIError as e:
    print(f"OpenAI API Error: {e}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")

Error Types Handled:

AuthenticationError: Invalid or missing API key
RateLimitError: Too many requests (should implement retry with backoff in production)
APIConnectionError: Network connectivity issues; caught before APIError because it is a subclass
APIError: Any other OpenAI API error
Generic Exception: Catches any unexpected errors
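
The retry with backoff mentioned for RateLimitError could look roughly like the sketch below. The helper name retry_with_backoff and its defaults are assumptions, and the demonstration uses a stand-in exception and a no-op sleep so it runs instantly without calling any API.

```python
import time
from typing import Callable, List, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func, retrying on the given exceptions with delays of 1s, 2s, 4s, ..."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # Out of attempts: let the caller's except blocks handle it.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError("max_attempts must be at least 1")


class FakeRateLimitError(Exception):
    """Stand-in for a rate limit error, used only in this demonstration."""


calls = {"count": 0}


def flaky() -> str:
    # Fails twice, then succeeds.
    calls["count"] += 1
    if calls["count"] < 3:
        raise FakeRateLimitError()
    return "ok"


delays: List[float] = []
result = retry_with_backoff(flaky, retry_on=(FakeRateLimitError,), sleep=delays.append)
print(result, delays)  # ok [1.0, 2.0]
```

In the agent, the wrapped call would be something like lambda: agent.invoke({"input": question}) with retry_on=(openai.RateLimitError,). Adding random jitter to the delay is a common refinement when many clients retry at once.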

Example Interactions

Let's see how the agent handles different types of questions:

Example 1: General Knowledge Question

question = "Who was Ada Lovelace and why is she important?"

# Agent's reasoning process (with verbose=True):
# Thought: I need to find information about Ada Lovelace. This is a general knowledge question.
# Action: Use Wikipedia tool
# Action Input: Ada Lovelace
# Observation: Ada Lovelace was an English mathematician and writer...
# Thought: I have enough information to answer the question.
# Final Answer: Ada Lovelace was an English mathematician and writer...

answer = agent.invoke({"input": question})["output"]

What Happens:

Agent recognizes this is a factual question
Selects the Wikipedia tool
Searches for "Ada Lovelace"
Receives a summary from Wikipedia
Synthesizes the answer

Example 2: Mathematical Calculation

question = "Calculate 17 * (24 - 5)"

# Agent's reasoning process:
# Thought: This is a mathematical calculation. I should use the calculator tool.
# Action: Use calculator tool
# Action Input: 17 * (24 - 5)
# Observation: 323
# Thought: The calculation is complete.
# Final Answer: The result is 323.

answer = agent.invoke({"input": question})["output"]

What Happens:

Agent identifies this as a math problem
Selects the calculator tool
Passes the expression to the calculator
Calculator safely evaluates: 17 * (24 - 5) = 17 * 19 = 323
Returns the result

Example 3: Square Root Calculation

question = "Calculate the square root of 144"

# The calculator tool handles this specially:
# - Detects "square root" in the expression
# - Extracts the number (144)
# - Calculates: 144 ** 0.5 = 12.0
# - Returns: "12.0"

Running the Agent

Here's how to set up and run the agent:

1. Installation

# Clone the repository
git clone https://github.com/nyandiekaFelix/ai-agent
cd ai-agent

# Create virtual environment
python3 -m venv myvenv
source myvenv/bin/activate  # On Windows: myvenv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

2. Configuration

Create a .env file with your OpenAI API key:

OPENAI_API_KEY="sk-your_openai_api_key_here"

Or export it in your shell:

export OPENAI_API_KEY="sk-your_openai_api_key_here"
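
Note that Python does not read a .env file on its own; a loader such as python-dotenv's load_dotenv() has to copy it into the environment first, and whether the project does so is not shown here. Either way, it can help to fail fast with a clear message when the key is missing. The following is a minimal sketch, with require_api_key as a hypothetical helper name.

```python
import os
from typing import Mapping, Optional


def require_api_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY, or raise a clear error if it is missing or blank."""
    source = os.environ if env is None else env
    key = source.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; add it to .env or export it.")
    return key


# Passing a mapping explicitly makes the check easy to try without touching the real environment.
print(require_api_key({"OPENAI_API_KEY": "sk-your_openai_api_key_here"}))
```

Calling require_api_key() with no arguments at startup would check the real environment before any request is made.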

3. Execution

python -m simple_agent.agent_demo

Output Example:

Simple Agent Demo (LangChain with OpenAI Compatible LLM)

--------------------------------------------------------------------------------

Question: Who was Ada Lovelace and why is she important?
Answer:
Ada Lovelace was an English mathematician and writer, chiefly known for her work on Charles Babbage's proposed mechanical general-purpose computer, the Analytical Engine. She is considered by many to be the first computer programmer because she wrote the first algorithm intended to be processed by a machine.

--------------------------------------------------------------------------------

Question: Calculate 17 * (24 - 5)
Answer:
The result is 323.

--------------------------------------------------------------------------------

Statistics:
- Runtime..............: 12.45 seconds
- OpenAI API calls.....: 8
- Wikipedia tool calls.: 2
- Calculator tool calls: 2
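
A report in this shape can be produced from the stats dictionary with a small formatting helper. The sketch below is one way to do it; format_stats is a hypothetical name, and the demo's actual reporting code may differ.

```python
from typing import Dict, List


def format_stats(stats: Dict[str, int], runtime_seconds: float) -> str:
    """Render usage counters as a dotted, aligned report."""
    rows = [
        ("Runtime", f"{runtime_seconds:.2f} seconds"),
        ("OpenAI API calls", str(stats["openai_queries"])),
        ("Wikipedia tool calls", str(stats["wiki_queries"])),
        ("Calculator tool calls", str(stats["calculator_calls"])),
    ]
    lines: List[str] = ["Statistics:"]
    # Pad each label with dots to the width of the longest label (21 characters).
    lines += [f"- {label:.<21}: {value}" for label, value in rows]
    return "\n".join(lines)


report = format_stats(
    {"openai_queries": 8, "wiki_queries": 2, "calculator_calls": 2}, 12.45
)
print(report)
```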

Extending the Agent

You can easily extend the agent by adding new tools:

Step 1: Create a New Tool

# tools.py

class WeatherTool(BaseTool):
    name: str = "weather"
    description: str = (
        "Useful for getting current weather information. "
        "Input should be a city name."
    )

    def _run(self, city: str) -> str:
        # Your weather API logic here
        try:
            # Call weather API (get_weather is a placeholder for your own client function)
            weather_data = get_weather(city)
            return f"The weather in {city} is {weather_data['temperature']}°C"
        except Exception as e:
            return f"❗ Could not fetch weather for {city}. Error: {e}"

Step 2: Register the Tool

# agent_demo.py

from .tools import WikipediaTool, CalculatorTool, WeatherTool

# ... existing code ...

weather_tool = WeatherTool()

tools = [
    Tool(name=wiki_tool.name, func=wiki_tool.run, description=wiki_tool.description),
    Tool(name=calc_tool.name, func=calc_tool.run, description=calc_tool.description),
    Tool(name=weather_tool.name, func=weather_tool.run, description=weather_tool.description),
]

# ... rest of the code ...

The agent can now select the new tool without any retraining, because each tool's name and description are included in the prompt it reasons over.

Best Practices

1. Tool Descriptions

Write clear, specific tool descriptions. The agent uses these to decide which tool to use:

# Good description
description = "Useful for math questions. Input should be a valid math expression."

# Bad description (too vague)
description = "Does calculations."
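
The wording matters because a zero-shot agent sees its tools only as text in the prompt. The sketch below approximates that step; it is not LangChain's actual prompt template, and ToolInfo and render_tool_list are hypothetical names used for illustration.

```python
from typing import List, NamedTuple


class ToolInfo(NamedTuple):
    name: str
    description: str


def render_tool_list(tools: List[ToolInfo]) -> str:
    """Build the tool section of a prompt: one 'name: description' line per tool."""
    return "\n".join(f"{tool.name}: {tool.description}" for tool in tools)


tools = [
    ToolInfo("wikipedia", "Useful for answering general knowledge questions."),
    ToolInfo("calculator", "Useful for math questions. Input should be a valid math expression."),
]
prompt_section = "You have access to the following tools:\n" + render_tool_list(tools)
print(prompt_section)
```

With a description like "Does calculations.", the model has nothing to tell it what input format the tool expects, which is where malformed tool calls tend to come from.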

2. Error Handling

Always handle errors gracefully in tools:

def _run(self, query: str) -> str:
    try:
        result = do_work(query)  # Placeholder for the tool's actual logic
        return result
    except Exception as e:
        return f"❗ Error: {e}"  # Return error message, don't raise

3. Input Validation

Validate inputs in tools to prevent errors:

def _run(self, expression: str) -> str:
    if not expression or not expression.strip():
        return "❗ Empty expression provided."
    # ... rest of logic

4. Cost Monitoring

Use callbacks to track API usage:

# Keys must match the ones ToolTracker increments
stats = {"openai_queries": 0, "wiki_queries": 0, "calculator_calls": 0}
tracker = ToolTracker(stats)
llm = OpenAI(callbacks=[tracker])

Limitations and Future Improvements

Current Limitations

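Single Tool per Question: In the examples here, the agent uses one tool per question. The ReAct loop can repeat its Action and Observation steps, but queries that chain several tools are not demonstrated or tested, and complex ones may need a more sophisticated agent.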
No Memory: The agent doesn't remember previous conversations. Each question is independent.
LangChain Deprecation: The legacy agent API used here (initialize_agent) is deprecated, and LangGraph is the recommended path for new agents.

Future Improvements

LangGraph Migration: Migrate to LangGraph for more flexible agent workflows
Conversation Memory: Add memory to maintain context across multiple turns
Tool Chaining: Enable the agent to use multiple tools in sequence
Streaming Responses: Stream agent responses for better user experience
Retry Logic: Add exponential backoff for rate limit errors

Dependencies Breakdown

langchain: Core framework for building LLM applications
langchain-openai: OpenAI integration for LangChain
openai: Official OpenAI Python client
wikipedia: Python wrapper for Wikipedia API

Conclusion

This AI agent demonstrates the power of combining LLMs with tools. By giving the agent access to Wikipedia and a calculator, we've created a system that can answer both factual and mathematical questions accurately.

Key Takeaways:

Tool Descriptions Matter: The agent relies heavily on tool descriptions to make decisions
Security First: Always validate and sanitize inputs, especially when evaluating code
Error Handling: Graceful error handling improves user experience
Monitoring: Track tool usage and API calls to understand costs and behavior
Extensibility: The modular design makes it easy to add new capabilities

The agent pattern is powerful because it combines the reasoning capabilities of LLMs with the precision of specialized tools. As you extend this agent, consider:

What tools would be useful for your use case?
How can you improve tool descriptions?
What error cases need handling?
How can you optimize for cost and performance?

For production use, consider migrating to LangGraph, which offers more flexibility, better state management, and support for complex agent workflows.