PHASE 3 ← Back to Course
16 / 23
🏛️

Agent Architecture

Deep dive into the technical foundations: planning, memory, tool use, and the reasoning loop that makes agents work at scale. Understand how to build robust, production-ready agent systems.

1

The Four Pillars of Agent Architecture

Every functional agent rests on four core components. These aren't optional — without any one of them, your agent will fail or behave unpredictably. Let's build a mental model of how they interact.

🎯

Planning & Reasoning

The agent's ability to break down a complex goal into subtasks and decide the next action. This is where the LLM's reasoning shines. It's the "thinking" phase.

🧠

Memory

Short-term (conversation), long-term (vector database), working (scratchpad). Agents need memory to avoid infinite loops and maintain context.

⚙️

Tools

External functions the agent can invoke: APIs, databases, calculators, web search. Tools ground the agent in reality.

🔄

Action Execution

The framework that calls tools, parses results, and feeds them back to the agent. The "doing" phase.

💡

Architecture Diagram

User Input → Planning (LLM decides what to do) → Tool Selection → Action Execution → Result Processing → Memory Update → Loop back to Planning (if goal not met). This cycle is the agent.
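This cycle can be sketched as a minimal loop. The names below (`plan`, `execute`, `goal_met`) are illustrative stand-ins for the real components, not a framework API:

```python
def agent_cycle(goal, plan, execute, goal_met, max_steps=10):
    """Minimal agent loop: plan -> act -> observe -> update memory -> repeat."""
    memory = []
    for _ in range(max_steps):
        action = plan(goal, memory)           # Planning: decide what to do
        observation = execute(action)         # Action execution: invoke the tool
        memory.append((action, observation))  # Memory update
        if goal_met(observation):             # Loop back only if goal not met
            break
    return memory

# Toy run: the "tool" increments a counter until the goal value 3 is reached.
counter = {"value": 0}
def increment(action):
    counter["value"] += 1
    return counter["value"]

trace = agent_cycle(
    goal=3,
    plan=lambda goal, memory: "increment",
    execute=increment,
    goal_met=lambda obs: obs >= 3,
)
# trace records three plan -> act -> observe iterations
```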

2

Planning & Reasoning — The ReAct Pattern

ReAct stands for "Reasoning + Acting". It's a proven pattern where the agent generates a reasoning step (what to do), then an action step (which tool to use), then observes the result. This cycle is remarkably effective.

Thought
Agent's internal reasoning: "I need to search for current AI agent frameworks to answer this question accurately."
Action
Tool to invoke: web_search("AI agent frameworks 2025")
Observation
Tool result: [search results with links to LangChain, Claude SDK, CrewAI docs]
Python — ReAct Pattern Implementation
def react_loop(user_input: str) -> str:
    # Tools the agent may invoke
    available_tools = ["web_search", "calculator", "code_executor"]

    # Keep track of the thought-action-observation history
    history = []
    goal_reached = False

    while not goal_reached:
        # 1. THOUGHT: What should I do?
        thought = llm.generate(
            f"Given {user_input} and my tools {available_tools}, what's my next step?"
        )
        history.append({"type": "thought", "content": thought})

        # 2. ACTION: Execute the chosen tool
        tool_name, tool_args = parse_action(thought)
        result = execute_tool(tool_name, tool_args)
        history.append({"type": "action", "tool": tool_name, "args": tool_args})

        # 3. OBSERVATION: What happened?
        history.append({"type": "observation", "content": result})

        # Check if we're done
        goal_reached = check_goal(result)

    return synthesize_final_answer(history)
🔑

Why ReAct Works

By separating thought from action, the model becomes more explicit about its reasoning. This makes debugging easier and helps the model avoid mistakes. It's slower than single-shot responses but far more reliable for complex tasks.

3

Memory Systems — Three Layers

Agents need different types of memory to function effectively. Think of it like human memory: short-term (what you're thinking right now), long-term (facts you learned long ago), and working (your notepad).

💭

Short-Term Memory

The conversation history: recent messages and tool results. Limited by the model's context window (roughly 4K-200K tokens, depending on the model). Cleared each session.

📚

Long-Term Memory

Persistent knowledge. Vector databases for semantic search. Facts learned across sessions. Can be enormous (millions of documents).

📝

Working Memory

Scratchpad for current task. Intermediate results, reasoning chains, state variables. Kept in short-term but separate from conversation.

⚙️

Examples

Short: Last 10 messages. Long: Customer history in vector DB. Working: Current subtask list, iteration count.

Python — Memory Architecture Pattern
class AgentMemory:
    def __init__(self):
        # Short-term: conversation history
        self.conversation = []

        # Long-term: vector database
        self.vector_db = VectorDatabase()

        # Working: current task state
        self.scratchpad = {
            "current_task": None,
            "subtasks_remaining": [],
            "iteration_count": 0,
            "max_iterations": 10
        }

    def add_observation(self, text: str):
        self.conversation.append({"role": "observation", "content": text})
        self.vector_db.add(text)  # Also store for long-term retrieval

    def get_context(self, query: str) -> str:
        # Combine short-term + relevant long-term
        recent = self.conversation[-10:]  # Last 10 messages
        relevant = self.vector_db.search(query, top_k=5)
        return format_context(recent + relevant)
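The `VectorDatabase` used above is assumed to exist. As a toy stand-in, the idea behind semantic retrieval can be shown with cosine similarity over bag-of-words vectors (real systems use learned embeddings instead):

```python
import math
from collections import Counter

class ToyVectorDatabase:
    """Illustrative long-term store: bag-of-words vectors + cosine similarity."""

    def __init__(self):
        self.documents = []

    def add(self, text: str):
        self.documents.append(text)

    @staticmethod
    def _similarity(a: str, b: str) -> float:
        va, vb = Counter(a.lower().split()), Counter(b.lower().split())
        dot = sum(va[w] * vb[w] for w in va)
        norm = math.sqrt(sum(c * c for c in va.values())) * \
               math.sqrt(sum(c * c for c in vb.values()))
        return dot / norm if norm else 0.0

    def search(self, query: str, top_k: int = 5) -> list[str]:
        # Rank all stored documents by similarity to the query
        ranked = sorted(self.documents,
                        key=lambda d: self._similarity(query, d),
                        reverse=True)
        return ranked[:top_k]

db = ToyVectorDatabase()
db.add("The customer prefers email contact")
db.add("Quantum computing uses qubits")
db.add("Email support hours are 9-5")
results = db.search("customer email", top_k=2)
```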
4

Tool Integration — Dynamic Tool Selection

The agent needs to know what tools are available and when to use each one. This is solved with a tool registry pattern where each tool is described with a JSON schema.

Python — Tool Registry Pattern
TOOL_REGISTRY = {
    "web_search": {
        "description": "Search the web for current information",
        "parameters": {
            "query": {"type": "string", "description": "Search terms"}
        },
        "function": web_search_impl
    },
    "execute_code": {
        "description": "Run Python code safely",
        "parameters": {
            "code": {"type": "string", "description": "Python code to execute"}
        },
        "function": execute_code_impl
    }
}

def execute_tool(tool_name: str, args: dict) -> str:
    if tool_name not in TOOL_REGISTRY:
        raise ValueError(f"Tool {tool_name} not found")

    func = TOOL_REGISTRY[tool_name]["function"]
    try:
        result = func(**args)
        return result
    except Exception as e:
        return f"Error: {str(e)}"
💡

Declarative Tool Definitions

By defining tools in JSON, the agent can read the descriptions and decide which tool to use. This is how Claude's tool use works — the model sees the schema and chooses intelligently.
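As a sketch, a registry entry can be converted into the JSON tool-schema shape that tool-use APIs expect. Exact field names vary by provider; the `name`/`description`/`input_schema` layout below is a common convention and is illustrative, not a guaranteed API contract:

```python
def registry_to_schema(name: str, entry: dict) -> dict:
    """Convert one registry entry into a JSON schema the model can read."""
    return {
        "name": name,
        "description": entry["description"],
        "input_schema": {
            "type": "object",
            "properties": entry["parameters"],
            "required": list(entry["parameters"]),  # assume all params required
        },
    }

# Example registry entry (implementation function omitted for brevity)
entry = {
    "description": "Search the web for current information",
    "parameters": {
        "query": {"type": "string", "description": "Search terms"}
    },
}
schema = registry_to_schema("web_search", entry)
```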

5

State Management — Tracking Agent Progress

As an agent works through a task, it needs to track state: what's been done, what's pending, error counts, etc. This is critical for resumability and debugging.

Python — State Dictionary Pattern
agent_state = {
    "goal": "Write a report on quantum computing",
    "status": "in_progress",  # pending, in_progress, completed, failed
    "iteration": 3,
    "max_iterations": 10,
    "current_step": "searching_for_recent_papers",
    "completed_steps": ["plan_outline", "search_quantum_basics"],
    "pending_steps": ["synthesize_findings", "write_draft", "review"],
    "errors": {
        "search_failed": 1,
        "rate_limited": 0
    },
    "results": [],
    "memory_tokens_used": 3456,
    "started_at": "2025-02-17T10:30:00Z",
    "updated_at": "2025-02-17T10:35:22Z"
}
⚠️

State Persistence

Long-running agents should persist state to disk or database. If the agent crashes, it can resume from where it left off. This is crucial for production systems handling expensive operations.

6

Error Recovery — When Things Go Wrong

Production agents will fail. Tools fail, networks are unreliable, models make mistakes. Great agents have strategies to recover gracefully.

✅ Retry Strategy
Tool fails → wait (exponential backoff) → retry up to 3 times → fall back to a different tool or a human
✅ Re-plan Strategy
Tool returns unexpected result → agent re-reasons → chooses a different tool or approach
✅ Escalation
Tool fails 3+ times → mark step as failed → skip it → try the next step or ask a human
✅ Circuit Breaker
Tool fails repeatedly → disable it temporarily → use fallback tools → monitor for recovery
Python — Error Recovery Pattern
import time

def execute_with_recovery(tool_name: str, args, max_retries=3):
    retry_count = 0
    wait_time = 1  # seconds; doubled after each failure (exponential backoff)

    while retry_count < max_retries:
        try:
            result = execute_tool(tool_name, args)
            return {"success": True, "result": result}

        except ToolError as e:  # ToolError: exception raised by failing tools
            retry_count += 1
            if retry_count < max_retries:
                time.sleep(wait_time)
                wait_time *= 2  # exponential backoff
            else:
                return {
                    "success": False,
                    "error": str(e),
                    "fallback_tool": "manual_review"
                }
🔑

Degradation vs. Failure

Good agents degrade gracefully. If a search tool fails, try a cached version. If vision fails, ask the user. If code execution times out, simplify the problem. Graceful degradation keeps agents useful even when things break.
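Graceful degradation can be expressed as a fallback chain: try each alternative in order and fail only after every option is exhausted. The tool functions here are hypothetical stand-ins:

```python
def with_fallbacks(primary, *fallbacks):
    """Try the primary callable, then each fallback; return the first success."""
    last_error = None
    for attempt in (primary, *fallbacks):
        try:
            return attempt()
        except Exception as e:
            last_error = e  # remember why this option failed, try the next
    raise RuntimeError(f"All options failed: {last_error}")

# Hypothetical example: live search fails, so a cached copy answers instead.
def live_search():
    raise TimeoutError("search API unreachable")

def cached_search():
    return "cached results from last successful run"

result = with_fallbacks(live_search, cached_search)
```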

7

Putting It Together — Complete Architecture

Let's see how all four pillars connect in a complete, functional example. This is the mental model you need to build production agents.

Python — Full Agent Architecture
class ReasoningAgent:
    def __init__(self, llm):
        self.llm = llm  # the language model client used for reasoning
        self.memory = AgentMemory()
        self.tools = TOOL_REGISTRY
        self.state = {"status": "idle"}
        self.iteration_count = 0

    def run(self, user_goal: str):
        self.state["goal"] = user_goal
        self.state["status"] = "in_progress"

        while (self.iteration_count < 10
               and self.state["status"] == "in_progress"):

            # 1. PERCEIVE: Gather context
            context = self.memory.get_context(user_goal)

            # 2. REASON: What to do next?
            thought = self.llm.generate(
                f"Goal: {user_goal}\nContext: {context}\nWhat's next?"
            )

            # 3. ACT: Execute tool (with retry and fallback handling)
            tool_name, tool_args = parse_tool(thought)
            outcome = execute_with_recovery(tool_name, tool_args)
            result = outcome["result"] if outcome["success"] else outcome["error"]

            # 4. LEARN: Update memory and state
            self.memory.add_observation(str(result))
            self.iteration_count += 1

            # Check if goal is reached
            done = self.llm.judge(
                f"Is this goal achieved? {user_goal}\nEvidence: {result}"
            )
            if done:
                self.state["status"] = "completed"

        return synthesize_answer(self.memory)
💡

Key Insights

This simple architecture includes everything: planning (thought), tool use (action), error recovery, state tracking, and goal checking. Real production systems add more layers (logging, monitoring, caching), but this core loop is universal.

Check Your Understanding

Quick Quiz — 4 Questions

1. What does ReAct stand for?

2. Which memory type is best for semantic search across millions of documents?

3. What is a tool registry used for?

4. In the state management pattern, what does "completed_steps" track?

Topic 16 Summary

Agent architecture rests on four pillars: Planning & Reasoning (the ReAct pattern), Memory (short-term/long-term/working), Tools (a registry of available functions), and Action Execution (the framework that calls tools safely). ReAct (Reasoning + Acting) separates the thinking phase from the doing phase, making agents more reliable. Memory systems prevent infinite loops and enable knowledge reuse. Tool registries let agents choose what to do next intelligently. State management enables resumability and debugging. Error recovery strategies (retry, replan, escalate, circuit breaker) keep agents functional under stress.

Next up → Topic 17: Agent Frameworks
Now you'll explore existing frameworks (LangChain, CrewAI, Claude SDK) that implement these architectures for you.

← Topic 12 Topic 16 of 23 Next: Memory & Context →