Apply everything from Phase 2 — build a complete prompt-powered application from scratch.
You're going to build a CLI tool that reviews Python code using Claude. Along the way, you'll learn how to integrate RAG, tool use, and templating into a real application, how to build a production-like system that's more than a single API call, and how all the Phase 2 concepts work together in practice.

Here's how the system flows:

- **Tools:** file reader, git diff reader
- **RAG:** embedding store for coding standards
- **Templates:** system prompt, output format
- **Main loop:** orchestrates all components
Let's walk through how a system prompt evolves through iteration.
Version 1: Basic
```
You are a code reviewer.
Review the provided Python code and give feedback.
```
Version 2: More Specific
```
You are a senior Python code reviewer.

Review code for:
1. Correctness (bugs, edge cases)
2. Style (PEP 8 compliance)
3. Performance (inefficiencies)
4. Security (vulnerabilities)
5. Documentation (clarity, completeness)

Format: ## [Category] header for each, list issues with severity.
```
Version 3: Production Grade (Final)
```
You are a senior Python developer at a tech company.
Your job is to review code for quality, correctness, and adherence to standards.

REVIEW CRITERIA:
1. **Correctness**: Does it work? Handle edge cases? Any bugs?
2. **Style**: PEP 8, naming conventions, readability
3. **Performance**: Inefficient algorithms, unnecessary loops
4. **Security**: SQL injection, unvalidated input, hardcoded secrets
5. **Testing**: Are there tests? Are they sufficient?
6. **Documentation**: Docstrings, comments, clarity

STANDARDS TO APPLY (provided below):
{standards_context}

OUTPUT FORMAT:

## Summary
[1-2 sentence overview of code quality]

## Issues Found
[Numbered list with severity level (🔴 Critical, 🟡 Warning, 🔵 Info)]

## Positive Notes
[What the code does well]

## Recommendations
[Top 3 actionable improvements]

## Overall Rating
[One of: Excellent / Good / Needs Review / Critical Issues]

Be constructive, specific, and cite the standards where applicable.
If code is correct but could be improved, suggest alternatives with examples.
```
Notice how each iteration adds: specificity (what to review), format (how to output), standards (what to apply against), and tone (constructive, not dismissive). Start simple, then iterate based on actual output quality.
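The `{standards_context}` slot in the final version is a template variable, filled at review time with whatever standards were retrieved. As a minimal sketch (the truncated template and `render_system_prompt` helper below are illustrative, not part of the tool), it can be rendered with Python's `str.format`:

```python
# Illustrative fragment of the production prompt; the real template
# is the full Version 3 text above.
SYSTEM_TEMPLATE = """You are a senior Python developer at a tech company.

STANDARDS TO APPLY (provided below):
{standards_context}"""

def render_system_prompt(standards: list[str]) -> str:
    """Fill the {standards_context} slot with retrieved standards."""
    return SYSTEM_TEMPLATE.format(standards_context="\n".join(standards))

prompt = render_system_prompt(["Use type hints.", "Validate all user input."])
```

Keeping the template as a module-level constant means prompt changes never touch the orchestration code.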
Define the tools Claude can use to access files and get context.
```python
tools = [
    {
        "name": "read_file",
        "description": "Read a Python file from disk",
        "input_schema": {
            "type": "object",
            "properties": {
                "path": {
                    "type": "string",
                    "description": "File path (absolute or relative)"
                }
            },
            "required": ["path"]
        }
    },
    {
        "name": "get_coding_standards",
        "description": "Retrieve relevant coding standards via RAG",
        "input_schema": {
            "type": "object",
            "properties": {
                "topic": {
                    "type": "string",
                    "description": "Topic (error handling, testing, security, etc.)"
                }
            },
            "required": ["topic"]
        }
    }
]
```
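These schemas tell Claude which tools exist; your code is still responsible for executing them when the model responds with `tool_use` content blocks, and for sending the outputs back as `tool_result` blocks in the next user message. A minimal sketch of that dispatch step, working with plain dicts for clarity (the SDK returns typed content blocks, and the fake `toolu_01` id and stub executor below are illustrative assumptions):

```python
def build_tool_results(content_blocks: list[dict], execute_tool) -> list[dict]:
    """Turn the model's tool_use blocks into the tool_result blocks
    that go back to the API in the next user message."""
    results = []
    for block in content_blocks:
        if block.get("type") == "tool_use":
            # Run the requested tool and pair the output with the call's id
            output = execute_tool(block["name"], block["input"])
            results.append({
                "type": "tool_result",
                "tool_use_id": block["id"],
                "content": output,
            })
    return results

# Demo with a fake assistant turn and a stub executor
fake_turn = [
    {"type": "text", "text": "Let me read that file."},
    {"type": "tool_use", "id": "toolu_01", "name": "read_file",
     "input": {"path": "example.py"}},
]
results = build_tool_results(fake_turn, lambda name, inp: f"<{name} output>")
```

Passing the executor in as a parameter keeps the dispatch logic testable without touching the filesystem or the API.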
Here's the core logic that orchestrates everything.
```python
import anthropic

client = anthropic.Anthropic()

def read_file_tool(path: str) -> str:
    """Read file from filesystem"""
    with open(path, "r") as f:
        return f.read()

def get_standards_tool(topic: str) -> str:
    """Mock RAG retrieval of coding standards"""
    standards = {
        "error_handling": "Always use specific exceptions. Never bare except.",
        "testing": "Every function should have at least one test case.",
        "naming": "Use descriptive names. Avoid abbreviations.",
        "security": "Validate all user input. Use parameterized queries.",
    }
    return standards.get(topic, "No standards found")

def execute_tool(name: str, input_dict: dict) -> str:
    """Execute tool and return result"""
    if name == "read_file":
        return read_file_tool(input_dict["path"])
    elif name == "get_coding_standards":
        return get_standards_tool(input_dict["topic"])
    return "Unknown tool"

def review_code(file_path: str) -> str:
    """Main review pipeline"""
    # Step 1: Read the code
    code = read_file_tool(file_path)

    # Step 2: Get standards (join every topic; "general" is not a key)
    standards = "\n".join(
        get_standards_tool(topic)
        for topic in ("error_handling", "testing", "naming", "security")
    )

    # Step 3: Build augmented prompt
    system_prompt = """You are a senior Python code reviewer..."""
    user_message = f"""
<standards>
{standards}
</standards>

<code>
{code}
</code>

Please review this code."""

    # Step 4: Call Claude
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=1024,
        system=system_prompt,
        messages=[{"role": "user", "content": user_message}],
    )
    return response.content[0].text

# Usage:
review = review_code("example.py")
print(review)
```
Instead of hardcoding standards, embed them and use retrieval.
```python
import chromadb

# Initialize Chroma for vector storage
chroma_client = chromadb.Client()
standards_collection = chroma_client.create_collection(
    name="coding_standards"
)

# Embed company coding standards
standards_docs = [
    "Always validate user input to prevent injection attacks.",
    "Use type hints for all function signatures.",
    "Write tests for all public functions.",
    "Avoid nested loops; optimize with built-ins like map/filter.",
    "Use descriptive variable names, avoid single letters except loop counters.",
]
standards_collection.add(
    documents=standards_docs,
    ids=[f"std_{i}" for i in range(len(standards_docs))]
)

def retrieve_standards(code: str) -> str:
    """RAG: Retrieve relevant standards for code"""
    results = standards_collection.query(
        query_texts=[code],  # Chroma embeds automatically
        n_results=3          # Top 3 most relevant standards
    )
    return "\n".join(results["documents"][0])
```
Instead of including ALL standards in every review (which adds tokens and noise), retrieve only the relevant ones. If code has database queries, retrieve security standards. If code lacks tests, retrieve testing standards.
Here's the full working code (~100 lines) that ties everything together.
```python
#!/usr/bin/env python3
"""AI-Powered Code Review Tool"""
import anthropic
import sys
import chromadb

client = anthropic.Anthropic()
chroma_client = chromadb.Client()

# Setup RAG
collection = chroma_client.create_collection(name="standards")
collection.add(
    documents=[
        "Use type hints for all functions",
        "Validate all user input",
        "Write tests for critical paths",
        "Use descriptive variable names",
    ],
    ids=["std_1", "std_2", "std_3", "std_4"],
)

SYSTEM_PROMPT = """You are a senior Python code reviewer.
Review for: correctness, style, performance, security, testing.

Standards:
{standards}

Format:
## Summary
[1 sentence overview]

## Issues Found
[List with severity: 🔴 Critical, 🟡 Warning, 🔵 Info]

## Rating
[Excellent/Good/Needs Review/Critical]"""

def review(file_path: str):
    """Review a Python file"""
    # Read code
    with open(file_path) as f:
        code = f.read()

    # Retrieve relevant standards via RAG
    results = collection.query(query_texts=[code], n_results=2)
    standards = "\n".join(results["documents"][0])

    # Call Claude
    response = client.messages.create(
        model="claude-opus-4-6",
        max_tokens=800,
        system=SYSTEM_PROMPT.format(standards=standards),
        messages=[{
            "role": "user",
            "content": f"""Review this code:

<code>
{code}
</code>""",
        }],
    )
    print(response.content[0].text)

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python solution.py <file.py>")
        sys.exit(1)
    review(sys.argv[1])
```
Usage:
```bash
pip install anthropic chromadb
export ANTHROPIC_API_KEY="..."
python solution.py mycode.py
```
Here are ways to extend the tool and make it production-grade:
- Connect to the GitHub API, run reviews on PRs automatically, and post the reviews as comments.
- Have Claude generate fixed code snippets, not just critiques, so users can apply patches.
- Track code quality scores over time, identify the most common issues, and build dashboards.
- Create test cases with expected reviews, measure review quality, and iterate on the system prompt.
- Add support for multiple languages (JS, Go, etc.) with language-specific standards.
- Implement a "compliance checker" that ensures code meets security/privacy standards.
- Build a web UI where teams can upload code and get reviews interactively.
- Create a feedback loop where developers can rate reviews, improving the system over time.
Congratulations! You've completed Phase 2: Applied Prompt Engineering. Across five topics, you've learned RAG, tool use, templating, chaining, and evaluation.
Phase 3 covers advanced topics: Agents (autonomous decision-making), Memory (context management), Multimodal (handling images, audio), and Safety (preventing misuse). But the foundation you've built in Phase 2 applies everywhere.
1. In the code review tool, what is RAG used for?
2. Why does the system prompt evolve through multiple versions?
3. What are the main components of the complete code review tool?
You've learned the core patterns of production AI systems: RAG for knowledge, tool use for action, templates for reusability, chains for complexity, and evaluation for improvement. You've applied them in a real code review tool that integrates all these concepts. The tool reads files, retrieves coding standards, builds a dynamic prompt, calls Claude, and returns a structured review.
The key insight: Production AI systems are rarely just a single API call. They're orchestrations of multiple patterns working together — RAG + tools + templates + evaluation. Master this combination, and you can build almost anything.
Ready for Phase 3?
Topics 12-20 cover Agents, Advanced Prompting, Memory, Multimodal, and Safety. You've built a strong foundation.