Autonomous AI agents go beyond simple chatbots. They plan, execute tools, remember context, and self-correct, all without human intervention. In this guide, you'll build one from scratch using Python, any LLM provider, and a simple ReAct loop.
1. Architecture Overview
The ReAct Agent Loop (diagram): a reasoning engine (the LLM) drives tools (code, search, files) and a vector store for memory, looping until it can emit a final answer.
The agent follows a Reason → Act → Observe cycle until the task is complete. On each iteration, the LLM decides whether to call a tool or return a final answer.
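Concretely, a single iteration can be represented as two chat messages. The shape below follows the common OpenAI-style tool-calling format; the tool name and arguments are illustrative:

```python
# One Reason -> Act -> Observe iteration, shown as chat messages.
# The tool name and arguments here are illustrative.
iteration = [
    # Reason + Act: the model chooses a tool instead of answering
    {
        "role": "assistant",
        "content": None,
        "tool_calls": [{
            "id": "call_1",
            "function": {
                "name": "web_search",
                "arguments": '{"query": "AI agent frameworks 2026"}',
            },
        }],
    },
    # Observe: the tool result goes back into the history as a message
    {
        "role": "tool",
        "tool_call_id": "call_1",
        "content": '{"results": ["..."]}',
    },
]
```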
2. Core Agent Class
Start with a minimal agent that manages conversation history and tool execution:
import json
import httpx

class Agent:
    def __init__(self, model="gpt-5", tools=None):
        self.model = model
        self.tools = tools or []
        self.history = []
        self.max_iterations = 10

    async def run(self, task: str) -> str:
        self.history.append({"role": "user", "content": task})
        for _ in range(self.max_iterations):
            response = await self._call_llm()
            # Keep the assistant turn (including any tool calls) in context
            self.history.append(response)
            if response.get("tool_calls"):
                for call in response["tool_calls"]:
                    result = await self._execute_tool(call)
                    self.history.append({
                        "role": "tool",
                        "tool_call_id": call["id"],
                        "content": json.dumps(result),
                    })
            else:
                return response["content"]
        return "Max iterations reached"
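The class above leaves `_call_llm` undefined. Here is one possible sketch, written as standalone functions, against an OpenAI-compatible chat-completions endpoint; the URL, auth handling, and response shape are assumptions to adapt for your provider:

```python
import os

def build_payload(model: str, history: list, tools: list) -> dict:
    """Assemble the request body. The private _handler key is stripped
    so only the JSON-schema half of each tool is sent to the API."""
    return {
        "model": model,
        "messages": history,
        "tools": [{k: v for k, v in t.items() if k != "_handler"}
                  for t in tools],
    }

async def call_llm(model: str, history: list, tools: list) -> dict:
    # httpx imported lazily so build_payload stays dependency-free
    import httpx
    async with httpx.AsyncClient(timeout=60) as client:
        resp = await client.post(
            "https://api.openai.com/v1/chat/completions",
            headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
            json=build_payload(model, history, tools),
        )
        resp.raise_for_status()
        return resp.json()["choices"][0]["message"]
```

Stripping `_handler` matters because handlers are Python callables and not JSON-serializable.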
3. Tool Registration
Tools are Python functions with JSON Schema descriptions. The LLM uses these schemas to decide when and how to call each tool:
def register_tool(name, description, parameters, handler):
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": parameters,
        },
        "_handler": handler,
    }
# Example: web search tool (api.search.io is a placeholder endpoint)
async def web_search(query: str) -> dict:
    async with httpx.AsyncClient() as client:
        # Pass the query via params so it is URL-encoded correctly
        resp = await client.get("https://api.search.io", params={"q": query})
        resp.raise_for_status()
        return resp.json()

search_tool = register_tool(
    "web_search",
    "Search the web for current information",
    {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "Search query"}
        },
        "required": ["query"],
    },
    web_search,
)
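The agent's `_execute_tool` (also left undefined earlier) can dispatch on the tool name and invoke the registered `_handler`. A minimal sketch as a standalone function, assuming the OpenAI-style tool-call shape:

```python
import json

async def execute_tool(call: dict, tools: list) -> dict:
    """Dispatch a tool call to its registered handler.
    `call` follows the {"id", "function": {"name", "arguments"}} shape."""
    name = call["function"]["name"]
    args = json.loads(call["function"]["arguments"] or "{}")
    for tool in tools:
        if tool["function"]["name"] == name:
            return await tool["_handler"](**args)
    # Returning an error dict (instead of raising) lets the LLM
    # see the failure and recover on the next iteration
    return {"error": f"Unknown tool: {name}"}
```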
4. Adding Persistent Memory
Without memory, your agent forgets everything between sessions. Use a vector database to store and retrieve relevant context:
import hashlib
import chromadb

class AgentMemory:
    def __init__(self, persist_dir="./agent_memory"):
        self.client = chromadb.PersistentClient(path=persist_dir)
        self.collection = self.client.get_or_create_collection("agent_kb")

    def store(self, text: str, metadata: dict = None):
        # Use a stable content hash; Python's built-in hash() is
        # randomized per process, which would break deduplication
        doc_id = f"mem_{hashlib.sha256(text.encode()).hexdigest()[:16]}"
        self.collection.upsert(
            documents=[text],
            ids=[doc_id],
            metadatas=[metadata or {}],
        )

    def recall(self, query: str, n=5) -> list:
        results = self.collection.query(query_texts=[query], n_results=n)
        return results["documents"][0]
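One way to wire memory into the agent is to prepend recalled context to the task before the loop starts, then persist the result afterwards. The prompt format here is just one reasonable choice, not a fixed convention:

```python
async def run_with_memory(agent, memory, task: str) -> str:
    """Prepend relevant memories to the task, run the agent,
    then store the outcome for future sessions."""
    recalled = memory.recall(task, n=3)
    if recalled:
        context = "\n".join(f"- {m}" for m in recalled)
        task = f"Relevant past context:\n{context}\n\nTask: {task}"
    result = await agent.run(task)
    memory.store(result, {"task": task[:100]})  # persist the outcome
    return result
```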
5. Self-Healing Error Recovery
Production agents must handle failures gracefully. Wrap tool execution in retry logic with exponential backoff:
import asyncio

async def safe_execute(tool_fn, args, max_retries=3):
    for attempt in range(max_retries):
        try:
            return await tool_fn(**args)
        except Exception as e:
            if attempt == max_retries - 1:
                return {"error": str(e)}
            wait = 2 ** attempt
            # Agent can learn from failures
            print(f"⚠️ Retry {attempt + 1}/{max_retries} in {wait}s: {e}")
            await asyncio.sleep(wait)
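To see the retry behavior in action, here is a self-contained demo with a deliberately flaky tool; the failure pattern is contrived for illustration, and the retry function is repeated (without logging) so the snippet runs on its own:

```python
import asyncio

async def safe_execute(tool_fn, args, max_retries=3):
    # Same retry-with-backoff logic as above, repeated here so the
    # example is self-contained
    for attempt in range(max_retries):
        try:
            return await tool_fn(**args)
        except Exception as e:
            if attempt == max_retries - 1:
                return {"error": str(e)}
            await asyncio.sleep(2 ** attempt)

calls = {"n": 0}

async def flaky_search(query: str) -> dict:
    # Fails on the first call, succeeds on the second
    calls["n"] += 1
    if calls["n"] < 2:
        raise ConnectionError("temporary network error")
    return {"results": [query]}

result = asyncio.run(safe_execute(flaky_search, {"query": "agents"}))
# The first attempt fails, the retry succeeds
```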
Important: always set a max_iterations limit on your agent loop. Without one, runaway agents can burn through API credits in minutes.
6. Multi-Model Arbitrage
Don't lock into one provider. Route requests based on complexity and cost:
MODEL_TIERS = {
    "simple": {"model": "gemini-2.5-flash", "cost": 0.001},
    "medium": {"model": "gpt-5-mini", "cost": 0.01},
    "complex": {"model": "claude-4.5", "cost": 0.05},
}

def select_model(task_complexity: str) -> dict:
    return MODEL_TIERS.get(task_complexity, MODEL_TIERS["medium"])
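How do you decide the complexity in the first place? A toy heuristic is sketched below; the keywords and length thresholds are illustrative only, and production routers typically use a small classifier model or historical performance data instead:

```python
def classify_task(task: str) -> str:
    """Toy heuristic mapping a task to a complexity tier.
    Keywords and thresholds here are illustrative, not tuned."""
    hard_words = ("analyze", "compare", "architect", "prove")
    if any(w in task.lower() for w in hard_words) or len(task) > 500:
        return "complex"
    if len(task) > 100:
        return "medium"
    return "simple"

# Usage: select_model(classify_task("Compare pricing across 3 frameworks"))
```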
This is exactly what MoltBot Cloud's arbitrage router does: routing across 10+ models to minimize cost while maximizing quality.
7. Putting It All Together
async def main():
    memory = AgentMemory()
    # file_tool and code_tool would be registered the same way as search_tool
    agent = Agent(
        model="gpt-5",
        tools=[search_tool, file_tool, code_tool],
    )
    agent.memory = memory
    result = await agent.run(
        "Research the top 3 AI agent frameworks in 2026, "
        "compare their pricing, and write a summary report."
    )
    print(result)
    memory.store(result, {"task": "framework_comparison"})

asyncio.run(main())
8. Production Checklist
- ✅ Set max_iterations (10-20 for most tasks)
- ✅ Add retry logic with exponential backoff
- ✅ Use persistent vector memory (ChromaDB, Pinecone)
- ✅ Implement cost tracking per request
- ✅ Add structured logging for debugging
- ✅ Rate-limit tool calls to prevent abuse
- ✅ Monitor token usage across providers
- ✅ Use model arbitrage to cut costs 60-80%
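Several of these items (cost tracking, token monitoring) can start as a simple in-process tracker before you reach for a full observability stack. A minimal sketch; the per-1K-token prices are placeholders, so substitute your providers' real rates:

```python
from collections import defaultdict

class CostTracker:
    """Track token usage and spend per model.
    Prices (USD per 1K tokens) are placeholders, not real rates."""
    PRICES = {"gpt-5": 0.01, "gpt-5-mini": 0.001}

    def __init__(self):
        self.tokens = defaultdict(int)

    def record(self, model: str, prompt_tokens: int, completion_tokens: int):
        self.tokens[model] += prompt_tokens + completion_tokens

    def total_cost(self) -> float:
        # Unknown models fall back to the most expensive assumed rate
        return sum(self.PRICES.get(m, 0.01) / 1000 * n
                   for m, n in self.tokens.items())

tracker = CostTracker()
tracker.record("gpt-5", 1200, 300)
tracker.record("gpt-5-mini", 4000, 1000)
```

Call `record` after every LLM response (most APIs return token counts in a usage field) and alert when `total_cost()` crosses a budget threshold.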
Skip the Build: Deploy in 60 Seconds
MoltBot Cloud gives you production-ready autonomous agents with built-in memory, tool use, multi-model arbitrage, and Stripe billing. From $49/month.
Start Free Trial →