No frameworks needed — build a complete, production-ready agent in pure Python to truly understand how they work from the ground up.
A research agent that can autonomously:
```
┌────────────────────────────────────────────┐
│ INPUT: "Research quantum computing 2025"   │
└────────────────────┬───────────────────────┘
                     │
        ┌────────────▼────────────┐
        │  LLM DECIDES NEXT STEP  │
        │ (ReAct: Thought/Action) │
        └────────────┬────────────┘
                     │
    ┌────────────────┼────────────────┐
    │                │                │
    ▼                ▼                ▼
 SEARCH          READ_PAGE       WRITE_FILE
web search       read URL        save output
    │                │                │
    └────────────────┼────────────────┘
                     │
         ┌───────────▼───────────┐
         │    FEEDBACK TO LLM    │
         │   (Add to messages)   │
         └───────────┬───────────┘
                     │
           Has goal been reached?
           ├─ Yes → Return answer
           └─ No → Loop back to LLM decision
```
The foundation is simple: a while loop that reads model output, parses tool calls, executes them, and feeds results back to the model. Everything else is implementation details.
```python
import anthropic

client = anthropic.Anthropic()

TOOLS = {}  # Will populate this below with tool schemas


def run_agent(user_goal: str) -> str:
    """Run the agent loop until the goal is reached."""
    messages = [{"role": "user", "content": user_goal}]

    iteration = 0
    while iteration < 10:
        iteration += 1

        # 1. Ask the model what to do next
        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=2048,
            tools=list(TOOLS.values()),
            messages=messages,
        )

        # 2. Parse the response
        if response.stop_reason == "tool_use":
            # Echo the assistant turn once, then answer every tool call in it
            messages.append({"role": "assistant", "content": response.content})
            tool_results = []
            for block in response.content:
                if block.type == "tool_use":
                    # 3. Execute the tool
                    result = execute_tool(block.name, block.input)
                    tool_results.append({
                        "type": "tool_result",
                        "tool_use_id": block.id,
                        "content": result,
                    })
            # 4. Feed the results back to the model
            messages.append({"role": "user", "content": tool_results})
        else:
            # Model finished; return the final answer
            return response.content[0].text

    return "Max iterations reached"
```
The agent never calls tools directly. The model reasons in prose, the API's tool_use blocks identify which tool to call and with what arguments, and our loop parses and executes them. Each result goes back as a tool_result block. This keeps the LLM's reasoning readable and auditable.
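Concretely, one round trip of that exchange has the following shape. The IDs and text here are illustrative placeholders, not real API output; in practice you use `block.id` from the response:

```python
# Shape of one tool-call round trip (illustrative values).
# The assistant turn echoes the model's content blocks verbatim;
# the next user turn carries the matching tool_result.
assistant_turn = {
    "role": "assistant",
    "content": [
        {"type": "text", "text": "I should search for recent sources first."},
        {
            "type": "tool_use",
            "id": "toolu_01A",  # hypothetical ID; taken from block.id in practice
            "name": "web_search",
            "input": {"query": "quantum computing 2025"},
        },
    ],
}

user_turn = {
    "role": "user",
    "content": [
        {
            "type": "tool_result",
            "tool_use_id": "toolu_01A",  # must match the tool_use id above
            "content": "Top results:\n1. ...",
        }
    ],
}
```

The only invariant that matters is the pairing: every `tool_use` block must be answered by a `tool_result` with the same ID, or the API rejects the next request.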
Each tool has two parts: a JSON schema (tells the model what the tool does) and an implementation (does the work).
```python
TOOLS = {
    "web_search": {
        "name": "web_search",
        "description": "Search the internet for current information",
        "input_schema": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "Search query (e.g., 'AI agents 2025')"
                }
            },
            "required": ["query"]
        }
    },
    "read_page": {
        "name": "read_page",
        "description": "Read the text content of a web page",
        "input_schema": {
            "type": "object",
            "properties": {
                "url": {
                    "type": "string",
                    "description": "URL to read (must start with http)"
                }
            },
            "required": ["url"]
        }
    },
    "write_file": {
        "name": "write_file",
        "description": "Save text content to a file",
        "input_schema": {
            "type": "object",
            "properties": {
                "filename": {"type": "string"},
                "content": {"type": "string"}
            },
            "required": ["filename", "content"]
        }
    }
}
```
```python
import requests
from bs4 import BeautifulSoup


def web_search(query: str) -> str:
    """Search DuckDuckGo's instant-answer API and return top results."""
    try:
        headers = {"User-Agent": "Research Agent"}
        # Use a free search API or implement a custom one
        resp = requests.get(
            "https://api.duckduckgo.com/",
            params={"q": query, "format": "json"},  # params handles URL encoding
            headers=headers,
            timeout=5,
        )
        results = resp.json()
        formatted = "Top results:\n"
        for i, r in enumerate(results.get("Results", [])[:3]):
            formatted += f"{i+1}. {r.get('Title', 'N/A')}\n"
            formatted += f"   URL: {r.get('FirstURL', 'N/A')}\n"
        return formatted
    except Exception as e:
        return f"Search failed: {str(e)}"


def read_page(url: str) -> str:
    """Read and extract text from a web page."""
    try:
        resp = requests.get(url, timeout=10)
        soup = BeautifulSoup(resp.content, "html.parser")
        # Remove script and style tags
        for tag in soup(["script", "style"]):
            tag.decompose()
        text = soup.get_text(separator="\n")
        return text[:2000]  # Limit to 2000 chars
    except Exception as e:
        return f"Failed to read page: {str(e)}"


def write_file(filename: str, content: str) -> str:
    """Write content to a file."""
    try:
        with open(filename, "w") as f:
            f.write(content)
        return f"Successfully wrote {len(content)} chars to {filename}"
    except Exception as e:
        return f"Write failed: {str(e)}"


def execute_tool(name: str, args: dict) -> str:
    """Execute a tool by name."""
    if name == "web_search":
        return web_search(args["query"])
    elif name == "read_page":
        return read_page(args["url"])
    elif name == "write_file":
        return write_file(args["filename"], args["content"])
    else:
        return f"Unknown tool: {name}"
```
The system prompt shapes how the agent thinks and acts. Include instructions for goal-oriented behavior, tool use guidelines, and error handling.
```
You are a Research Agent. Your goal is to gather current information and
synthesize it into a comprehensive report.

INSTRUCTIONS:
1. For any research topic, systematically search for current information using web_search
2. Read the most promising pages using read_page
3. Synthesize findings into a structured report
4. Save the final report using write_file

GUIDELINES:
- Search multiple sources to get diverse perspectives
- Prioritize recent information (2025 is current)
- When you have enough information, synthesize and write the report. Do not over-research.
- Be concise but thorough. Reports should be 800-1200 words.
- If a page fails to read, try another source

SUCCESS CRITERIA:
Your task is complete when you have:
1. Searched for relevant information
2. Read at least 2-3 quality sources
3. Written a comprehensive report and saved it

Always work towards completing the task efficiently.
```
Be explicit about the goal, tool usage guidelines, and success criteria. Tell the agent when to stop (avoid infinite loops). Provide constraints (e.g., "max 3 searches per topic"). Guide quality (e.g., "prioritize recent information").
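One practical detail: with the Anthropic API, this prompt goes in the top-level `system` parameter of `messages.create`, not into the messages list. A minimal sketch, where the `build_request` helper and the shortened prompt string are illustrative additions, not code from the agent above:

```python
# Sketch: assemble keyword arguments for client.messages.create().
# The system prompt steers the agent from outside the conversation transcript.
SYSTEM_PROMPT = (
    "You are a Research Agent. Your goal is to gather current information "
    "and synthesize it into a comprehensive report."
)

def build_request(messages, tools, model="claude-opus-4-6", max_tokens=2048):
    """Return the kwargs dict for one model call, with the system prompt attached."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "system": SYSTEM_PROMPT,  # top-level parameter, not a message
        "tools": tools,
        "messages": messages,
    }
```

You would then call `client.messages.create(**build_request(messages, list(TOOLS.values())))` inside the loop, so every iteration carries the same instructions without bloating the transcript.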
A production agent needs safeguards: iteration limits, token budgets, tool approval, and error recovery.
```python
class GuardedAgent:
    def __init__(self):
        self.max_iterations = 10
        self.token_budget = 10000
        self.tokens_used = 0
        self.failed_tools = []

    def run(self, goal: str) -> str:
        messages = [{"role": "user", "content": goal}]

        for iteration in range(self.max_iterations):
            # Check token budget
            if self.tokens_used > self.token_budget:
                return "Token budget exceeded"

            # Call the model
            response = client.messages.create(
                model="claude-opus-4-6",
                max_tokens=2048,
                tools=list(TOOLS.values()),
                messages=messages,
            )

            # Track token usage
            self.tokens_used += response.usage.input_tokens
            self.tokens_used += response.usage.output_tokens

            if response.stop_reason == "tool_use":
                messages.append({"role": "assistant", "content": response.content})
                tool_results = []
                for block in response.content:
                    if block.type == "tool_use":
                        # Check if the tool has been disabled
                        if block.name in self.failed_tools:
                            result = "Tool disabled due to repeated failures"
                        else:
                            try:
                                result = execute_tool(block.name, block.input)
                            except Exception as e:
                                result = f"Tool error: {str(e)}"
                                self.failed_tools.append(block.name)
                        tool_results.append({
                            "type": "tool_result",
                            "tool_use_id": block.id,
                            "content": result,
                        })
                messages.append({"role": "user", "content": tool_results})
            else:
                return response.content[0].text

        return "Max iterations reached"
```
Common failure modes to guard against:

- Infinite loops: always set max_iterations.
- Cost explosions: track token usage against a budget.
- Tool cascades: one failing tool causes a chain of failures. Add circuit breakers.
- Silent failures: log everything. You'll debug with logs, not hunches.
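A circuit breaker can be as small as a per-tool failure counter. Here is one possible sketch (the `CircuitBreaker` class and its threshold are assumptions, not part of the agent above); unlike the single-strike `failed_tools` list, it only disables a tool after several consecutive failures and re-arms it on success:

```python
from collections import defaultdict

class CircuitBreaker:
    """Disable a tool after `threshold` consecutive failures (hypothetical helper)."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold
        self.failures = defaultdict(int)  # tool name -> consecutive failure count

    def allow(self, tool_name: str) -> bool:
        # A tool stays usable until it crosses the failure threshold
        return self.failures[tool_name] < self.threshold

    def record(self, tool_name: str, success: bool) -> None:
        if success:
            self.failures[tool_name] = 0   # reset on any success
        else:
            self.failures[tool_name] += 1
```

In the loop you would check `breaker.allow(block.name)` before executing and call `breaker.record(...)` afterwards, in place of the `failed_tools` membership test.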
Here's how to run the agent on a real task:
```python
if __name__ == "__main__":
    agent = GuardedAgent()

    goal = """Research AI agents in 2025. Search for:
1. Latest agent frameworks and libraries
2. New agent applications in production
3. Recent breakthroughs or challenges

Synthesize findings into a comprehensive report and save to 'ai_agents_report_2025.md'."""

    result = agent.run(goal)
    print(f"Agent completed. Tokens used: {agent.tokens_used}")
    print(f"\nFinal output:\n{result}")
```
The agent will search for "AI agents 2025", read pages, synthesize findings, and save a markdown report. On subsequent iterations, it adds depth by searching for specific frameworks, production use cases, and recent papers. It stops when confident it has answered the research question.
Agents are notoriously hard to debug because behavior emerges from the LLM. Here are practical strategies:
- Log everything: print every tool call, result, and message exchange. You'll understand agent behavior through logs.
- Test tools in isolation: verify each tool independently before integrating. Confirm web_search works, then read_page, then write_file.
- Iterate on cheaper models: start with smaller, faster models (Claude 3.5 Sonnet) to iterate quickly, then switch to Opus for final runs.
- Check success criteria explicitly: did it search? Did it read pages? Did it save the file? Check each.
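For example, the `write_file` tool can be exercised in isolation with plain asserts before the agent ever touches it. The test functions below are illustrative; `write_file` is repeated here so the snippet runs standalone:

```python
import os
import tempfile

# Same implementation as in the tools section, repeated for a self-contained test.
def write_file(filename: str, content: str) -> str:
    try:
        with open(filename, "w") as f:
            f.write(content)
        return f"Successfully wrote {len(content)} chars to {filename}"
    except Exception as e:
        return f"Write failed: {str(e)}"

def test_write_file_roundtrip():
    # Happy path: the file exists afterwards and contains exactly what we wrote
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "report.md")
        result = write_file(path, "# Report")
        assert "Successfully wrote 8 chars" in result
        with open(path) as f:
            assert f.read() == "# Report"

def test_write_file_bad_path():
    # A missing directory should surface an error *string*, not raise:
    # the agent loop expects a string result either way.
    result = write_file("/nonexistent_dir_xyz/report.md", "x")
    assert result.startswith("Write failed")
```

The same pattern applies to `web_search` and `read_page`: call them directly with known inputs and assert on the shape of the returned string, so tool bugs never masquerade as agent-reasoning bugs.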
```python
import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)


def run_agent_with_logging(goal):
    messages = [{"role": "user", "content": goal}]
    logger.info(f"Agent goal: {goal}")

    for i in range(10):
        logger.debug(f"Iteration {i}: {len(messages)} messages in context")

        response = client.messages.create(
            model="claude-opus-4-6",
            max_tokens=2048,
            tools=list(TOOLS.values()),
            messages=messages,
        )
        logger.debug(f"Response stop_reason: {response.stop_reason}")

        for block in response.content:
            if block.type == "tool_use":
                logger.info(f"Tool call: {block.name} with {block.input}")
                result = execute_tool(block.name, block.input)
                logger.info(f"Tool result: {result[:200]}...")
            elif block.type == "text":
                logger.debug(f"Text response: {block.text[:100]}...")

    logger.info("Agent completed")
```
1. What does the agent loop do each iteration?
2. Why do tools need both a JSON schema and an implementation?
3. What's the purpose of the max_iterations guardrail?
4. How does the agent know when to stop?
You've now built a production-quality agent from scratch. Key takeaways:

- The agent loop is simple: ask the LLM what to do → execute the tool → feed the result back → repeat.
- Tools need two parts: a JSON schema (for the LLM) and an implementation (to do the work).
- System prompts guide agent behavior: be explicit about goals, tool use, and success criteria.
- Guardrails are essential: max_iterations, token budgets, circuit breakers, error logging.
- Debugging agents requires logging everything; you understand agent behavior through its trace.

This foundation lets you use frameworks like LangChain or CrewAI with deep understanding.
Next up → Topic 16: Multi-Agent Systems
Now you'll orchestrate multiple specialized agents working together on complex problems.