Apply everything from Phase 2 — build a complete prompt-powered application from scratch.
You're going to build a CLI tool that reviews Python code using Claude. Along the way, you'll learn how to integrate RAG, tool use, and templating into a real application, how to build a production-like system that's more than a single API call, and how all the Phase 2 concepts work together in practice.

Here's how the system flows:

- **Tools:** file reader, git diff reader
- **RAG:** embedding store for coding standards
- **Templates:** system prompt, output format
- **Main loop:** orchestrates all components
Let's walk through how a system prompt evolves through iteration.
Version 1: Basic
```
You are a code reviewer.
Review the provided Python code and give feedback.
```
Version 2: More Specific
```
You are a senior Python code reviewer.

Review code for:
1. Correctness (bugs, edge cases)
2. Style (PEP 8 compliance)
3. Performance (inefficiencies)
4. Security (vulnerabilities)
5. Documentation (clarity, completeness)

Format: ## [Category] header for each, list issues with severity.
```
Version 3: Production Grade (Final)
```
You are a senior Python developer at a tech company.
Your job is to review code for quality, correctness, and adherence to standards.

REVIEW CRITERIA:
1. **Correctness**: Does it work? Handle edge cases? Any bugs?
2. **Style**: PEP 8, naming conventions, readability
3. **Performance**: Inefficient algorithms, unnecessary loops
4. **Security**: SQL injection, unvalidated input, hardcoded secrets
5. **Testing**: Are there tests? Are they sufficient?
6. **Documentation**: Docstrings, comments, clarity

STANDARDS TO APPLY (provided below):
{standards_context}

OUTPUT FORMAT:

## Summary
[1-2 sentence overview of code quality]

## Issues Found
[Numbered list with severity level (🔴 Critical, 🟡 Warning, 🔵 Info)]

## Positive Notes
[What the code does well]

## Recommendations
[Top 3 actionable improvements]

## Overall Rating
[One of: Excellent / Good / Needs Review / Critical Issues]

Be constructive, specific, and cite the standards where applicable.
If code is correct but could be improved, suggest alternatives with examples.
```
Notice how each iteration adds: specificity (what to review), format (how to output), standards (what to apply against), and tone (constructive, not dismissive). Start simple, then iterate based on actual output quality.
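The `{standards_context}` slot in the final version is a template variable, filled at review time with whatever standards were retrieved. As a minimal sketch (the truncated template and `render_system_prompt` helper below are illustrative, not part of the tool), it can be rendered with Python's `str.format`:

```python
# Illustrative fragment of the production prompt; the real template
# is the full Version 3 text above.
SYSTEM_TEMPLATE = """You are a senior Python developer at a tech company.

STANDARDS TO APPLY (provided below):
{standards_context}"""

def render_system_prompt(standards: list[str]) -> str:
    """Fill the {standards_context} slot with retrieved standards."""
    return SYSTEM_TEMPLATE.format(standards_context="\n".join(standards))

prompt = render_system_prompt(["Use type hints.", "Validate all user input."])
```

Keeping the template as a module-level constant means prompt changes never touch the orchestration code.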
Define the tools Claude can use to access files and get context.
```python
tools = [
    {
        "name": "read_file",
        "description": "Read a Python file from disk",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "File path (absolute or relative)"
                }
            },
            "required": ["path"]
        }
    },
    {
        "name": "get_coding_standards",
        "description": "Retrieve relevant coding standards via RAG",
        "input_schema": {
            "type": "object",
            "properties": {
                "topic": {
                    "type": "string",
                    "description": "Topic (error handling, testing, security, etc.)"
                }
            },
            "required": ["topic"]
        }
    }
]
```
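These schemas tell Claude which tools exist; your code is still responsible for executing them when the model responds with `tool_use` content blocks, and for sending the outputs back as `tool_result` blocks in the next user message. A minimal sketch of that dispatch step, working with plain dicts for clarity (the SDK returns typed content blocks, and the fake `toolu_01` id and stub executor below are illustrative assumptions):

```python
def build_tool_results(content_blocks: list[dict], execute_tool) -> list[dict]:
    """Turn the model's tool_use blocks into the tool_result blocks
    that go back to the API in the next user message."""
    results = []
    for block in content_blocks:
        if block.get("type") == "tool_use":
            # Run the requested tool and pair the output with the call's id
            output = execute_tool(block["name"], block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": output,
            })
    return results

# Demo with a fake assistant turn and a stub executor
fake_turn = [
    {"type": "text", "text": "Let me read that file."},
    {"type": "tool_use", "id": "toolu_01", "name": "read_file",
     "input": {"path": "example.py"}},
]
results = build_tool_results(fake_turn, lambda name, inp: f"<{name} output>")
```

Passing the executor in as a parameter keeps the dispatch logic testable without touching the filesystem or the API.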
Here's the core logic that orchestrates everything.
```python
import anthropic

client = anthropic.Anthropic()

def read_file_tool(path: str) -> str:
    """Read file from filesystem"""
    with open(path, "r") as f:
        return f.read()

def get_standards_tool(topic: str) -> str:
    """Mock RAG retrieval of coding standards"""
    standards = {
        "error_handling": "Always use specific exceptions. Never bare except.",
        "testing": "Every function should have at least one test case.",
        "naming": "Use descriptive names. Avoid abbreviations.",
        "security": "Validate all user input. Use parameterized queries.",
    }
    return standards.get(topic, "No standards found")

def execute_tool(name: str, input_dict: dict) -> str:
    """Execute tool and return result"""
    if name == "read_file":
        return read_file_tool(input_dict["path"])
    elif name == "get_coding_standards":
        return get_standards_tool(input_dict["topic"])
    return "Unknown tool"

def review_code(file_path: str) -> str:
    """Main review pipeline"""
    # Step 1: Read the code
    code = read_file_tool(file_path)

    # Step 2: Get standards (join every topic; "general" is not a key)
    standards = "\n".join(
        get_standards_tool(topic)
        for topic in ("error_handling", "testing", "naming", "security")
    )

    # Step 3: Build augmented prompt
    system_prompt = """You are a senior Python code reviewer..."""
    user_message = f"""
<standards>
{standards}
</standards>

<code>
{code}
</code>

Please review this code."""

    # Step 4: Call Claude
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text

# Usage:
review = review_code("example.py")
print(review)
```
Instead of hardcoding standards, embed them and use retrieval.
```python
import chromadb

# Initialize Chroma for vector storage
chroma_client = chromadb.Client()
standards_collection = chroma_client.create_collection(
    name="coding_standards"
)

# Embed company coding standards
standards_docs = [
    "Always validate user input to prevent injection attacks.",
    "Use type hints for all function signatures.",
    "Write tests for all public functions.",
    "Avoid nested loops; optimize with built-ins like map/filter.",
    "Use descriptive variable names, avoid single letters except loop counters.",
]
standards_collection.add(
    documents=standards_docs,
    ids=[f"std_{i}" for i in range(len(standards_docs))]
)

def retrieve_standards(code: str) -> str:
    """RAG: Retrieve relevant standards for code"""
    results = standards_collection.query(
        query_texts=[code],  # Chroma embeds automatically
        n_results=3          # Top 3 most relevant standards
    )
    return "\n".join(results["documents"][0])
```
Instead of including ALL standards in every review (which adds tokens and noise), retrieve only the relevant ones. If code has database queries, retrieve security standards. If code lacks tests, retrieve testing standards.
Here's the full working code (~100 lines) that ties everything together.
```python
#!/usr/bin/env python3
"""AI-Powered Code Review Tool"""
import anthropic
import sys
import chromadb

client = anthropic.Anthropic()
chroma_client = chromadb.Client()

# Setup RAG
collection = chroma_client.create_collection(name="standards")
collection.add(
    documents=[
        "Use type hints for all functions",
        "Validate all user input",
        "Write tests for critical paths",
        "Use descriptive variable names",
    ],
    ids=["std_1", "std_2", "std_3", "std_4"],
)

SYSTEM_PROMPT = """You are a senior Python code reviewer.
Review for: correctness, style, performance, security, testing.

Standards:
{standards}

Format:
## Summary
[1 sentence overview]

## Issues Found
[List with severity: 🔴 Critical, 🟡 Warning, 🔵 Info]

## Rating
[Excellent/Good/Needs Review/Critical]"""

def review(file_path: str):
    """Review a Python file"""
    # Read code
    with open(file_path) as f:
        code = f.read()

    # Retrieve relevant standards via RAG
    results = collection.query(query_texts=[code], n_results=2)
    standards = "\n".join(results["documents"][0])

    # Call Claude
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=800,
        system=SYSTEM_PROMPT.format(standards=standards),
        messages=[{
            "role": "user",
            "content": f"""Review this code:

<code>
{code}
</code>""",
        }],
    )
    print(response.content[0].text)

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python solution.py <file.py>")
        sys.exit(1)
    review(sys.argv[1])
```
Usage:
```bash
pip install anthropic chromadb
export ANTHROPIC_API_KEY="..."
python solution.py mycode.py
```
Here are ways to extend the tool and make it production-grade:
- Connect to the GitHub API, run reviews on PRs automatically, and post the reviews as comments.
- Have Claude generate fixed code snippets, not just critiques, so users can apply patches.
- Track code quality scores over time, identify the most common issues, and build dashboards.
- Create test cases with expected reviews, measure review quality, and iterate on the system prompt.
- Add support for multiple languages (JS, Go, etc.) with language-specific standards.
- Implement a "compliance checker" that ensures code meets security/privacy standards.
- Build a web UI where teams can upload code and get reviews interactively.
- Create a feedback loop where developers can rate reviews, improving the system over time.
Congratulations! You've completed Phase 2: Applied Prompt Engineering. Across five topics, you've learned RAG, tool use, templating, chaining, and evaluation.
Phase 3 covers advanced topics: Agents (autonomous decision-making), Memory (context management), Multimodal (handling images, audio), and Safety (preventing misuse). But the foundation you've built in Phase 2 applies everywhere.
1. In the code review tool, what is RAG used for?
2. Why does the system prompt evolve through multiple versions?
3. What are the main components of the complete code review tool?
You've learned the core patterns of production AI systems: RAG for knowledge, tool use for action, templates for reusability, chains for complexity, and evaluation for improvement. You've applied them in a real code review tool that integrates all these concepts. The tool reads files, retrieves coding standards, builds a dynamic prompt, calls Claude, and returns a structured review.
The key insight: Production AI systems are rarely just a single API call. They're orchestrations of multiple patterns working together — RAG + tools + templates + evaluation. Master this combination, and you can build almost anything.
Ready for Phase 3?
Topics 12-20 cover Agents, Advanced Prompting, Memory, Multimodal, and Safety. You've built a strong foundation.