Front Matter · Why This Book Exists
5 entries
- F1 Why This Book Exists
  Agency spans sixty years of AI ideas, from STRIPS planners to GPT-powered coding agents; this book teaches them as one connected story.
  front-matter/foreword.html
- F2 What This Book Covers
  The four-part arc: foundations, learning, LLM agents, and deployed systems.
  front-matter/what-this-book-covers.html
- F3 How to Use This Book
  Reading paths for engineers, researchers, and self-study learners; how the parts depend on each other.
  front-matter/how-to-use.html
- F4 About the Authors
  Who wrote this book and how.
  front-matter/about-authors.html
- F5 About the Hands-On AI Science Series
  The nine-book Hands-On AI Science series and where this volume fits.
  front-matter/about-the-series.html
Part I · Foundations of Agency
7 chapters
What agents are, how they represent preferences as utility, how they plan toward goals, and how they reason with knowledge and uncertainty. Closes with the classical agent tooling stack.
- 0 The Agent Development Stack
  Set up Python, Gymnasium, the Anthropic SDK, LangGraph, Qdrant, and Stable-Baselines3, then wire them into a minimal agent loop that runs end-to-end by the chapter's close.
  part-1-foundations-of-agency/module-00-agent-development-stack/
- 1 What Is an Agent?
  The PEAS framework, the agent-environment loop, rationality, and the taxonomy of task environments define what it means for a system to act intelligently in the world.
  part-1-foundations-of-agency/module-01-what-is-an-agent/
- 2 Agent Architectures
  Reactive, deliberative, and hybrid architectures; the BDI model; subsumption; the sense-plan-act loop. The design choices here echo across every chapter that follows.
  part-1-foundations-of-agency/module-02-agent-architectures/
- 3 Utility and Decision Theory
  Expected utility, von Neumann-Morgenstern axioms, risk attitudes, and multi-attribute utility. The utility function introduced here becomes the reward signal in Chapter 7.
  part-1-foundations-of-agency/module-03-utility-decision-theory/
- 4 Planning as Goal Pursuit
  Classical planning in STRIPS and PDDL, state-space search, A*, relaxation heuristics, and hierarchical task networks. The search tree here returns as MCTS in Chapter 8 and chain-of-thought in Chapter 18.
  part-1-foundations-of-agency/module-04-planning-goal-pursuit/
- 5 Knowledge and Reasoning in Agents
  Propositional and first-order logic, knowledge bases, inference, ontologies, and Bayesian networks for reasoning under uncertainty. The knowledge base returns as agentic RAG in Chapter 20.
  part-1-foundations-of-agency/module-05-knowledge-reasoning/
- 6 Tools of the Trade: Classical Agent Stack
  Comparative guide to unified-planning, py-pddl, owlready2, pgmpy, networkx, and Mesa. Code templates and decision guide carried forward through the book.
  part-1-foundations-of-agency/module-06-tools-classical-agent-stack/
Part II · Learning Agents
9 chapters
Agents that improve from experience: Markov decision processes, tabular and deep reinforcement learning, model-based and hierarchical agents, memory systems, lifelong learning, and meta-learning.
- 7 Markov Decision Processes
  The MDP formalism, Bellman equations, value functions, policy evaluation, policy iteration, value iteration, and the POMDP extension. The reward function operationalizes the utility function from Chapter 3.
  part-2-learning-agents/module-07-markov-decision-processes/
- 8 Reinforcement Learning Foundations
  Exploration-exploitation, multi-armed bandits, Monte Carlo methods, TD learning, Q-learning, and SARSA, with convergence guarantees for each.
  part-2-learning-agents/module-08-reinforcement-learning-foundations/
- 9 Deep Reinforcement Learning
  DQN and its extensions, the policy gradient theorem, REINFORCE, actor-critic, PPO, SAC, and TD3. Stability analysis and the transition from tabular to function approximation.
  part-2-learning-agents/module-09-deep-reinforcement-learning/
- 10 Model-Based Agents
  World models, the Dyna architecture, MBPO, Dreamer, and planning in latent space. Agents that learn to imagine before they act.
  part-2-learning-agents/module-10-model-based-agents/
- 11 Hierarchical Agents
  The options framework, semi-MDPs, feudal RL, goal-conditioned RL, temporal abstraction, and subgoal discovery. Agents that decompose long horizons into manageable steps.
  part-2-learning-agents/module-11-hierarchical-agents/
- 12 Agent Memory Systems
  The full taxonomy of agent memory: sensory, working, episodic, semantic, and procedural. Indexing, retrieval, and forgetting, built with Qdrant and mem0. Returns as agentic RAG in Chapter 20.
  part-2-learning-agents/module-12-agent-memory-systems/
- 13 Lifelong and Meta-Learning Agents
  Catastrophic forgetting, EWC, progressive networks, and replay buffers for continual learning. MAML, Reptile, and in-context learning as meta-learning for few-shot adaptation.
  part-2-learning-agents/module-13-lifelong-meta-learning/
- 14 Self-Reflective Agents
  Metacognition, introspective monitoring, uncertainty estimation, calibration, and the Reflexion architecture. Agents that monitor and critique their own reasoning before committing to an action.
  part-2-learning-agents/module-14-self-reflective-agents/
- 15 Tools of the Trade: Learning Agent Stack
  Comparative guide to Gymnasium, Stable-Baselines3, CleanRL, RLlib, Avalanche, learn2learn, and Dreamer-v3. Training loop templates and experiment tracking patterns.
  part-2-learning-agents/module-15-tools-learning-agent-stack/
Part III · LLM-Powered Agents
11 chapters
Language models as policies: prompt engineering, extended thinking, tool use and MCP, agentic RAG, autonomous task agents, computer use, self-improvement, and multi-modal agents.
- 16 Foundations of LLM Agents
  Emergent capabilities, in-context learning, instruction following, and the LLM as a conditional policy over actions. A minimal agent loop with the Anthropic SDK.
  part-3-llm-powered-agents/module-16-foundations-llm-agents/
- 17 Prompt Engineering for Agents
  Zero-shot and few-shot prompting, chain-of-thought, ReAct, Tree of Thoughts, Reflexion, self-consistency, and automatic prompt optimization with DSPy and TextGrad.
  part-3-llm-powered-agents/module-17-prompt-engineering-agents/
- 18 Reasoning Agents
  Process reward models, outcome reward models, test-time compute scaling and extended thinking, and MCTS over reasoning steps. Formal analysis of why longer chains improve accuracy.
  part-3-llm-powered-agents/module-18-reasoning-agents/
- 19 Tool-Using Agents
  Tool selection policies, function calling, the Model Context Protocol architecture, parallel tool use, and error recovery. Tool use from Chapter 4 returns here at LLM scale.
  part-3-llm-powered-agents/module-19-tool-using-agents/
- 20 Memory-Augmented Agents and Agentic RAG
  RAG, dense and sparse retrieval, hybrid search. Agentic RAG: the retriever as a policy, query rewriting, multi-hop reasoning, and confidence-gated termination. Knowledge base meets memory system.
  part-3-llm-powered-agents/module-20-memory-augmented-agentic-rag/
- 21 Autonomous Task Agents
  Task decomposition, plan-and-execute architectures, long-horizon completion, and failure recovery. Evaluated on GAIA, AgentBench, and tau-bench.
  part-3-llm-powered-agents/module-21-autonomous-task-agents/
- 22 Computer-Use and GUI Agents
  Pixel-level observation spaces, OS-level action spaces, GUI grounding, web navigation, and the accessibility tree versus screenshot tradeoff. Benchmarked on OSWorld and WebArena.
  part-3-llm-powered-agents/module-22-computer-use-gui-agents/
- 23 Self-Improving Agents
  STaR, Constitutional AI, RLHF, DPO, and iterative self-play. Agents that bootstrap their own training signal and the safety constraints that keep self-modification bounded.
  part-3-llm-powered-agents/module-23-self-improving-agents/
- 24 Multi-Modal Agents
  Vision-language models as policies, audio-language agents, cross-modal reasoning, and tokenizing non-text modalities. Benchmarked on MMBench, MMMU, and MathVista.
  part-3-llm-powered-agents/module-24-multimodal-agents/
- 25 Agent Evaluation
  Static versus dynamic evaluation, benchmark contamination, LLM-as-judge, process versus outcome metrics. A practical harness for evaluating any agent you build in this book.
  part-3-llm-powered-agents/module-25-agent-evaluation/
- 26 Tools of the Trade: LLM Agent Stack
  Comparative guide to LangGraph, AutoGen/AG2, CrewAI, smolagents, and the OpenAI and Anthropic agent SDKs. LangSmith for tracing, evaluation, and prompt management.
  part-3-llm-powered-agents/module-26-tools-llm-agent-stack/
Part IV · Multi-Agent and Deployed Systems
12 chapters
Agents working together and at scale: game theory, MARL, swarms, software engineering agents, research agents, observability and cost, embodied and robotics agents, safety, and governance.
- 27 Foundations of Multi-Agent Systems
  Game theory, Nash equilibrium, mechanism design, communication protocols, and the Dec-POMDP formulation of cooperative multi-agent problems.
  part-4-multi-agent-deployed-systems/module-27-foundations-multi-agent-systems/
- 28 Coordination and Negotiation
  Contract nets, combinatorial auctions, coalition formation, DCOP, and debate and critique protocols in LLM multi-agent systems.
  part-4-multi-agent-deployed-systems/module-28-coordination-negotiation/
- 29 Multi-Agent Reinforcement Learning
  CTDE, QMIX, MADDPG, MAPPO, non-stationarity, and credit assignment. Benchmarked on SMAC and Google Research Football.
  part-4-multi-agent-deployed-systems/module-29-multi-agent-reinforcement-learning/
- 30 Swarm Intelligence and Emergence
  Ant colony optimization, particle swarms, boid flocking, self-organization, and emergence as a formal information-theoretic concept. From biological swarms to LLM populations.
  part-4-multi-agent-deployed-systems/module-30-swarm-intelligence-emergence/
- 31 Software Engineering Agents
  Code generation as search, program synthesis, test-driven development by agents, repository-level reasoning, and the CodeAct action space. Benchmarked on SWE-bench Verified. The book's own codebase is the working example.
  part-4-multi-agent-deployed-systems/module-31-software-engineering-agents/
- 32 Research and Scientific Agents
  The scientific method as an agent loop, hypothesis generation, experiment design, automated literature review, novelty detection, and the AI Scientist architecture.
  part-4-multi-agent-deployed-systems/module-32-research-scientific-agents/
- 33 AI Teams and Orchestration
  Role specialization, debate and critique, majority voting, mixture-of-agents, and orchestrator-worker patterns. A multi-agent software company built with AutoGen.
  part-4-multi-agent-deployed-systems/module-33-ai-teams-orchestration/
- 34 Observability, Tracing, and Cost Engineering
  Span-based tracing, token accounting, latency attribution. Token economy, model routing by difficulty, KV-cache reuse, and the latency-quality Pareto frontier.
  part-4-multi-agent-deployed-systems/module-34-observability-tracing-cost-engineering/
- 35 Embodied and Robotics Agents
  Kinematics, motion planning, manipulation, the sim-to-real gap, and vision-language-action models including OpenVLA and pi-zero. Benchmarked on LIBERO and Meta-World.
  part-4-multi-agent-deployed-systems/module-35-embodied-robotics-agents/
- 36 Agent Safety and Red-Teaming
  Threat model for agentic systems: prompt injection, goal hijacking, reward hacking. Alignment, corrigibility, Constitutional AI, scalable oversight, and debate. Benchmarked on AgentHarm and HarmBench.
  part-4-multi-agent-deployed-systems/module-36-agent-safety-red-teaming/
- 37 Governance and Alignment
  EU AI Act, NIST AI RMF, human oversight requirements, liability in autonomous systems, auditing, and international coordination for agentic AI.
  part-4-multi-agent-deployed-systems/module-37-governance-alignment/
- 38 Tools of the Trade: Deployed Agent Stack
  Modal for serverless agent execution, Prefect for orchestration, LangSmith for observability, Qdrant for vector memory, and Docker plus API gateway for agent services. Architecture of a production agentic system.
  part-4-multi-agent-deployed-systems/module-38-tools-deployed-agent-stack/
Appendices & Capstone
5 appendices & a capstone
- A Mathematical Foundations
  Probability, linear algebra for policies, dynamic programming, graph theory, information theory, and convex optimization.
  appendices/appendix-a-mathematical-foundations/
- B Agent Framework Ecosystem
  LangChain, LangGraph, AutoGen/AG2, CrewAI, Semantic Kernel, Haystack, OpenAI Agents SDK, Anthropic Claude Code SDK, smolagents, Google ADK.
  appendices/appendix-b-framework-ecosystem/
- C Simulation and Environment Platforms
  Gymnasium, Isaac Lab, MuJoCo, CARLA, NetLogo, Mesa, PettingZoo, BrowserGym. Hardware requirements and cloud alternatives.
  appendices/appendix-c-simulation-platforms/
- D Research Frontiers Index
  Living table of open problems mapped to ArXiv IDs, benchmarks, and key research groups, organized by chapter.
  appendices/appendix-d-research-frontiers/
- E Capstone Projects
  Ten end-to-end agentic AI projects with starter code, architecture diagrams, grading rubrics, and AGENT_PROMPT.md files for AI-assisted extension.
  capstone/