Table of Contents

The theory and practice of autonomous AI: from goal-directed reasoning and planning to multi-agent coordination and real-world deployment.

First Edition · 2026

4 parts · 39 chapters, plus front matter, 5 appendices, and a capstone.

Front Matter · Why This Book Exists

5 entries
  1. F1
    Why This Book Exists Agency spans sixty years of AI ideas, from STRIPS planners to GPT-powered coding agents; this book teaches them as one connected story.
    front-matter/foreword.html
  2. F2
    What This Book Covers The four-part arc: foundations, learning, LLM agents, and deployed systems.
    front-matter/what-this-book-covers.html
  3. F3
    How to Use This Book Reading paths for engineers, researchers, and self-study learners; how the parts depend on each other.
    front-matter/how-to-use.html
  4. F4
    About the Authors Who wrote this book and how.
    front-matter/about-authors.html
  5. F5
    About the Hands-On AI Science Series The nine-book Hands-On AI Science series and where this volume fits.
    front-matter/about-the-series.html

Part I · Foundations of Agency

7 chapters

What agents are, how they represent preferences as utility, how they plan toward goals, and how they reason with knowledge and uncertainty. Closes with the classical agent tooling stack.

  1. 0
    The Agent Development Stack Set up Python, Gymnasium, the Anthropic SDK, LangGraph, Qdrant, and Stable-Baselines3, then wire them into a minimal agent loop that runs end-to-end by the chapter's close.
    part-1-foundations-of-agency/module-00-agent-development-stack/
  2. 1
    What Is an Agent? The PEAS framework, the agent-environment loop, rationality, and the taxonomy of task environments define what it means for a system to act intelligently in the world.
    part-1-foundations-of-agency/module-01-what-is-an-agent/
  3. 2
    Agent Architectures Reactive, deliberative, and hybrid architectures; the BDI model; subsumption; the sense-plan-act loop. The design choices here echo across every chapter that follows.
    part-1-foundations-of-agency/module-02-agent-architectures/
  4. 3
    Utility and Decision Theory Expected utility, von Neumann-Morgenstern axioms, risk attitudes, and multi-attribute utility. The utility function introduced here becomes the reward signal in Chapter 7.
    part-1-foundations-of-agency/module-03-utility-decision-theory/
  5. 4
    Planning as Goal Pursuit Classical planning in STRIPS and PDDL, state-space search, A*, relaxation heuristics, and hierarchical task networks. The search tree here returns as MCTS in Chapter 8 and chain-of-thought in Chapter 18.
    part-1-foundations-of-agency/module-04-planning-goal-pursuit/
  6. 5
    Knowledge and Reasoning in Agents Propositional and first-order logic, knowledge bases, inference, ontologies, and Bayesian networks for reasoning under uncertainty. The knowledge base returns as agentic RAG in Chapter 20.
    part-1-foundations-of-agency/module-05-knowledge-reasoning/
  7. 6
    Tools of the Trade: Classical Agent Stack Comparative guide to unified-planning, py-pddl, owlready2, pgmpy, networkx, and Mesa. Code templates and decision guide carried forward through the book.
    part-1-foundations-of-agency/module-06-tools-classical-agent-stack/

Part II · Learning Agents

9 chapters

Agents that improve from experience: Markov decision processes, tabular and deep reinforcement learning, model-based and hierarchical agents, memory systems, lifelong learning, and meta-learning.

  1. 7
    Markov Decision Processes The MDP formalism, Bellman equations, value functions, policy evaluation, policy iteration, value iteration, and the POMDP extension. The reward function operationalizes the utility function from Chapter 3.
    part-2-learning-agents/module-07-markov-decision-processes/
  2. 8
    Reinforcement Learning Foundations Exploration-exploitation, multi-armed bandits, Monte Carlo methods, TD learning, Q-learning, and SARSA, with convergence guarantees for each.
    part-2-learning-agents/module-08-reinforcement-learning-foundations/
  3. 9
    Deep Reinforcement Learning DQN and its extensions, the policy gradient theorem, REINFORCE, actor-critic, PPO, SAC, and TD3. Stability analysis and the transition from tabular to function approximation.
    part-2-learning-agents/module-09-deep-reinforcement-learning/
  4. 10
    Model-Based Agents World models, the Dyna architecture, MBPO, Dreamer, and planning in latent space. Agents that learn to imagine before they act.
    part-2-learning-agents/module-10-model-based-agents/
  5. 11
    Hierarchical Agents The options framework, semi-MDPs, feudal RL, goal-conditioned RL, temporal abstraction, and subgoal discovery. Agents that decompose long horizons into manageable steps.
    part-2-learning-agents/module-11-hierarchical-agents/
  6. 12
    Agent Memory Systems The full taxonomy of agent memory: sensory, working, episodic, semantic, and procedural. Indexing, retrieval, and forgetting, built with Qdrant and mem0. Returns as agentic RAG in Chapter 20.
    part-2-learning-agents/module-12-agent-memory-systems/
  7. 13
    Lifelong and Meta-Learning Agents Catastrophic forgetting, EWC, progressive networks, and replay buffers for continual learning. MAML, Reptile, and in-context learning as meta-learning for few-shot adaptation.
    part-2-learning-agents/module-13-lifelong-meta-learning/
  8. 14
    Self-Reflective Agents Metacognition, introspective monitoring, uncertainty estimation, calibration, and the Reflexion architecture. Agents that monitor and critique their own reasoning before committing to an action.
    part-2-learning-agents/module-14-self-reflective-agents/
  9. 15
    Tools of the Trade: Learning Agent Stack Comparative guide to Gymnasium, Stable-Baselines3, CleanRL, RLlib, Avalanche, learn2learn, and Dreamer-v3. Training loop templates and experiment tracking patterns.
    part-2-learning-agents/module-15-tools-learning-agent-stack/

Part III · LLM-Powered Agents

11 chapters

Language models as policies: prompt engineering, extended thinking, tool use and MCP, agentic RAG, autonomous task agents, computer use, self-improvement, and multi-modal agents.

  1. 16
    Foundations of LLM Agents Emergent capabilities, in-context learning, instruction following, and the LLM as a conditional policy over actions. A minimal agent loop with the Anthropic SDK.
    part-3-llm-powered-agents/module-16-foundations-llm-agents/
  2. 17
    Prompt Engineering for Agents Zero-shot and few-shot prompting, chain-of-thought, ReAct, Tree of Thoughts, Reflexion, self-consistency, and automatic prompt optimization with DSPy and TextGrad.
    part-3-llm-powered-agents/module-17-prompt-engineering-agents/
  3. 18
    Reasoning Agents Process reward models, outcome reward models, test-time compute scaling and extended thinking, and MCTS over reasoning steps. Formal analysis of why longer chains improve accuracy.
    part-3-llm-powered-agents/module-18-reasoning-agents/
  4. 19
    Tool-Using Agents Tool selection policies, function calling, the Model Context Protocol architecture, parallel tool use, and error recovery. Tool use from Chapter 4 returns here at LLM scale.
    part-3-llm-powered-agents/module-19-tool-using-agents/
  5. 20
    Memory-Augmented Agents and Agentic RAG RAG, dense and sparse retrieval, hybrid search. Agentic RAG: the retriever as a policy, query rewriting, multi-hop reasoning, and confidence-gated termination. Knowledge base meets memory system.
    part-3-llm-powered-agents/module-20-memory-augmented-agentic-rag/
  6. 21
    Autonomous Task Agents Task decomposition, plan-and-execute architectures, long-horizon completion, and failure recovery. Evaluated on GAIA, AgentBench, and tau-bench.
    part-3-llm-powered-agents/module-21-autonomous-task-agents/
  7. 22
    Computer-Use and GUI Agents Pixel-level observation spaces, OS-level action spaces, GUI grounding, web navigation, and the accessibility tree versus screenshot tradeoff. Benchmarked on OSWorld and WebArena.
    part-3-llm-powered-agents/module-22-computer-use-gui-agents/
  8. 23
    Self-Improving Agents STaR, Constitutional AI, RLHF, DPO, and iterative self-play. Agents that bootstrap their own training signal and the safety constraints that keep self-modification bounded.
    part-3-llm-powered-agents/module-23-self-improving-agents/
  9. 24
    Multi-Modal Agents Vision-language models as policies, audio-language agents, cross-modal reasoning, and tokenizing non-text modalities. Benchmarked on MMBench, MMMU, and MathVista.
    part-3-llm-powered-agents/module-24-multimodal-agents/
  10. 25
    Agent Evaluation Static versus dynamic evaluation, benchmark contamination, LLM-as-judge, process versus outcome metrics. A practical harness for evaluating any agent you build in this book.
    part-3-llm-powered-agents/module-25-agent-evaluation/
  11. 26
    Tools of the Trade: LLM Agent Stack Comparative guide to LangGraph, AutoGen/AG2, CrewAI, smolagents, and the OpenAI and Anthropic agent SDKs. LangSmith for tracing, evaluation, and prompt management.
    part-3-llm-powered-agents/module-26-tools-llm-agent-stack/

Part IV · Multi-Agent and Deployed Systems

12 chapters

Agents working together and at scale: game theory, MARL, swarms, software engineering agents, research agents, observability and cost, embodied and robotics agents, safety, and governance.

  1. 27
    Foundations of Multi-Agent Systems Game theory, Nash equilibrium, mechanism design, communication protocols, and the Dec-POMDP formulation of cooperative multi-agent problems.
    part-4-multi-agent-deployed-systems/module-27-foundations-multi-agent-systems/
  2. 28
    Coordination and Negotiation Contract nets, combinatorial auctions, coalition formation, DCOP, and debate and critique protocols in LLM multi-agent systems.
    part-4-multi-agent-deployed-systems/module-28-coordination-negotiation/
  3. 29
    Multi-Agent Reinforcement Learning CTDE, QMIX, MADDPG, MAPPO, non-stationarity, and credit assignment. Benchmarked on SMAC and Google Research Football.
    part-4-multi-agent-deployed-systems/module-29-multi-agent-reinforcement-learning/
  4. 30
    Swarm Intelligence and Emergence Ant colony optimization, particle swarms, boid flocking, self-organization, and emergence as a formal information-theoretic concept. From biological swarms to LLM populations.
    part-4-multi-agent-deployed-systems/module-30-swarm-intelligence-emergence/
  5. 31
    Software Engineering Agents Code generation as search, program synthesis, test-driven development by agents, repository-level reasoning, and the CodeAct action space. Benchmarked on SWE-bench Verified. The book's own codebase is the working example.
    part-4-multi-agent-deployed-systems/module-31-software-engineering-agents/
  6. 32
    Research and Scientific Agents The scientific method as an agent loop, hypothesis generation, experiment design, automated literature review, novelty detection, and the AI Scientist architecture.
    part-4-multi-agent-deployed-systems/module-32-research-scientific-agents/
  7. 33
    AI Teams and Orchestration Role specialization, debate and critique, majority voting, mixture-of-agents, and orchestrator-worker patterns. A multi-agent software company built with AutoGen.
    part-4-multi-agent-deployed-systems/module-33-ai-teams-orchestration/
  8. 34
    Observability, Tracing, and Cost Engineering Span-based tracing, token accounting, latency attribution. Token economy, model routing by difficulty, KV-cache reuse, and the latency-quality Pareto frontier.
    part-4-multi-agent-deployed-systems/module-34-observability-tracing-cost-engineering/
  9. 35
    Embodied and Robotics Agents Kinematics, motion planning, manipulation, the sim-to-real gap, and vision-language-action models including OpenVLA and pi-zero. Benchmarked on LIBERO and Meta-World.
    part-4-multi-agent-deployed-systems/module-35-embodied-robotics-agents/
  10. 36
    Agent Safety and Red-Teaming Threat model for agentic systems: prompt injection, goal hijacking, reward hacking. Alignment, corrigibility, Constitutional AI, scalable oversight, and debate. Benchmarked on AgentHarm and HarmBench.
    part-4-multi-agent-deployed-systems/module-36-agent-safety-red-teaming/
  11. 37
    Governance and Alignment EU AI Act, NIST AI RMF, human oversight requirements, liability in autonomous systems, auditing, and international coordination for agentic AI.
    part-4-multi-agent-deployed-systems/module-37-governance-alignment/
  12. 38
    Tools of the Trade: Deployed Agent Stack Modal for serverless agent execution, Prefect for orchestration, LangSmith for observability, Qdrant for vector memory, and Docker plus API gateway for agent services. Architecture of a production agentic system.
    part-4-multi-agent-deployed-systems/module-38-tools-deployed-agent-stack/

Appendices & Capstone

5 appendices & a capstone
  1. A
    Mathematical Foundations Probability, linear algebra for policies, dynamic programming, graph theory, information theory, and convex optimization.
    appendices/appendix-a-mathematical-foundations/
  2. B
    Agent Framework Ecosystem LangChain, LangGraph, AutoGen/AG2, CrewAI, Semantic Kernel, Haystack, OpenAI Agents SDK, Anthropic Claude Code SDK, smolagents, Google ADK.
    appendices/appendix-b-framework-ecosystem/
  3. C
    Simulation and Environment Platforms Gymnasium, Isaac Lab, MuJoCo, CARLA, NetLogo, Mesa, PettingZoo, BrowserGym. Hardware requirements and cloud alternatives.
    appendices/appendix-c-simulation-platforms/
  4. D
    Research Frontiers Index Living table of open problems mapped to arXiv IDs, benchmarks, and key research groups, organized by chapter.
    appendices/appendix-d-research-frontiers/
  5. E
    Capstone Projects Ten end-to-end agentic AI projects with starter code, architecture diagrams, grading rubrics, and AGENT_PROMPT.md files for AI-assisted extension.
    capstone/