# RAG vs AI Agents vs Agentic RAG
/// caption
RAG vs AI Agents vs Agentic RAG — drop rag-vs-agents.png into docs/assets/images/ to display. Original credit: Rakesh Gohel (@rakeshgohel01).
///
Comparing three GenAI architectures at the current frontier. The terms sound similar, but each solves a different level of problem.

Simple mental model:

- RAG → find information
- Agents → use tools and reason
- Agentic RAG → coordinate multiple systems to solve complex problems
## 1. RAG (Retrieval-Augmented Generation)
The foundation layer. Instead of relying only on what the model learned during training, the system retrieves external information first, then generates an answer grounded in that information.
### Workflow
```mermaid
flowchart LR
    U([👤 User]) -->|Query| E[Embedding]
    E --> V[(Vector DB)]
    V -->|Retrieved Docs| A[Augmented Prompt]
    U -.->|Query| A
    A --> L[🤖 LLM]
    L -->|Output| O([Response])
    style U fill:#e3f2fd,stroke:#1976d2,color:#000
    style V fill:#fff3e0,stroke:#f57c00,color:#000
    style A fill:#fce4ec,stroke:#c2185b,color:#000
    style L fill:#e8f5e9,stroke:#2e7d32,color:#000
    style O fill:#e3f2fd,stroke:#1976d2,color:#000
```
### Step-by-step
- User asks a question
- Query is converted into an embedding vector
- Vector database returns the documents whose embeddings (computed ahead of time, at indexing) are most similar to the query's
- Retrieved info is added to the prompt (augmentation)
- LLM generates the final answer (generation)
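The five steps above can be sketched in a few lines of dependency-free Python. This is a minimal illustration under stated assumptions: a bag-of-words counter stands in for a real embedding model, a brute-force cosine scan stands in for the vector database, and the corpus and function names (`embed`, `retrieve`, `build_prompt`) are hypothetical. The LLM call itself is left out; the sketch stops at the augmented prompt.

```python
import math
import re
from collections import Counter

# Hypothetical corpus; in a real system these are chunks already embedded into a vector DB.
DOCS = [
    "Refunds are processed within 5 business days.",
    "Our office is closed on public holidays.",
    "Password resets are done from the account settings page.",
]


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts. A real system calls an embedding model here."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Indexing happens ahead of time: every document is embedded once.
INDEX = [(doc, embed(doc)) for doc in DOCS]


def retrieve(query: str, k: int = 1) -> list[str]:
    # Embed the query, then return the k most similar documents (brute-force "vector DB" scan).
    q = embed(query)
    ranked = sorted(INDEX, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]


def build_prompt(query: str, context: list[str]) -> str:
    # Augmentation: retrieved passages go into the prompt ahead of the question.
    passages = "\n".join(f"- {c}" for c in context)
    return f"Answer using only this context:\n{passages}\n\nQuestion: {query}"


question = "How long do refunds take?"
print(build_prompt(question, retrieve(question)))  # this prompt would then go to the LLM
```

Swapping `embed` for a real embedding model and `INDEX` for a vector database changes the quality of retrieval, not the shape of the pipeline.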
### Best for
- Internal knowledge bases
- Documentation search
- Company-data assistants
- Customer-support bots over a fixed corpus
### Key property
The model doesn't act. It only answers with better context.
## 2. AI Agents
Agents go a step further than RAG. Instead of just retrieving information, an agent can reason about a task and take actions in the world.
### Workflow
```mermaid
flowchart LR
    U([👤 User]) -->|Query| AG[🤖 Agent]
    M[🧠 Memory] --> AG
    P[📋 Planning<br/>ReAct/CoT] --> AG
    AG <-->|Use| T[🔧 Tools]
    T --> D[(Data Sources)]
    AG -->|Output| O([Response])
    style U fill:#e3f2fd,stroke:#1976d2,color:#000
    style M fill:#fce4ec,stroke:#c2185b,color:#000
    style P fill:#fff3e0,stroke:#f57c00,color:#000
    style AG fill:#e8f5e9,stroke:#2e7d32,color:#000
    style T fill:#f3e5f5,stroke:#7b1fa2,color:#000
    style D fill:#fff8e1,stroke:#fbc02d,color:#000
    style O fill:#e3f2fd,stroke:#1976d2,color:#000
```
### Step-by-step
- User sends a query
- Agent analyses the task
- Agent uses memory and planning frameworks (ReAct, Reflexion, Chain-of-Thought)
- Agent calls external tools or APIs
- Agent synthesises results into a response
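A ReAct-style loop is the core of these steps: the model emits a thought plus either a tool call or a final answer, the runtime executes the tool, and the observation is appended to a scratchpad the model sees on the next turn. The sketch below assumes a scripted function (`scripted_llm`) in place of a real model and two toy tools (`get_price`, `multiply`); all names are hypothetical and no agent framework's API is implied.

```python
from typing import Callable


# Hypothetical tools. Real agents would wrap search, APIs, databases, file systems, etc.
def get_price(item: str) -> float:
    return {"widget": 2.5, "gadget": 4.0}[item]


def multiply(a: float, b: float) -> float:
    return a * b


TOOLS: dict[str, Callable[..., float]] = {"get_price": get_price, "multiply": multiply}


def scripted_llm(question: str, scratchpad: list[tuple]) -> dict:
    """Stand-in for the model's reasoning step: returns a tool call or a final answer.
    A real agent would prompt an LLM with the question plus the scratchpad."""
    if not scratchpad:
        return {"thought": "I need the unit price.", "tool": "get_price", "args": {"item": "widget"}}
    if len(scratchpad) == 1:
        price = scratchpad[0][2]
        return {"thought": "Multiply by quantity.", "tool": "multiply", "args": {"a": price, "b": 3}}
    return {"thought": "I have the total.", "final": scratchpad[-1][2]}


def run_agent(question: str, max_steps: int = 5) -> tuple[object, list[tuple]]:
    scratchpad: list[tuple] = []  # short-term memory: (tool, args, observation) per step
    for _ in range(max_steps):
        step = scripted_llm(question, scratchpad)  # reason
        if "final" in step:
            return step["final"], scratchpad
        observation = TOOLS[step["tool"]](**step["args"])  # act
        scratchpad.append((step["tool"], step["args"], observation))  # observe
    raise RuntimeError("step budget exhausted without a final answer")


answer, trace = run_agent("What do 3 widgets cost?")
print(answer)  # 7.5
```

The `max_steps` budget matters in practice: without it, a model that never emits a final answer would loop indefinitely.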
### Best for
- Automating workflows
- Multi-step reasoning tasks
- Tool-based execution (search, APIs, databases, file systems)
- Anything where the answer requires acting, not just looking up
### Key property
Agents can think and act, not just retrieve.
### Planning frameworks worth knowing
| Framework | What It Does |
|---|---|
| ReAct (Reasoning + Acting) | Interleaves reasoning steps with tool actions |
| Reflexion | Adds a self-critique loop — agent reviews its own output and retries |
| Chain-of-Thought (CoT) | Explicit step-by-step reasoning before answering |
| Tree-of-Thoughts | Explores a tree of alternative reasoning steps, scoring and pruning branches via breadth- or depth-first search |
| Plan-and-Execute | Separates a planning step from individual action steps |
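Reflexion's retry loop can be sketched as generate, evaluate, reflect, retry. In the published method the model writes its own verbal self-reflection from the evaluator's feedback; for brevity this sketch lets the evaluator supply the critique directly, and a toy `attempt` function stands in for the LLM. All function names are hypothetical.

```python
from typing import Any, Callable, Optional


def reflexion_loop(
    attempt: Callable[[list[str]], Any],
    evaluate: Callable[[Any], Optional[str]],
    max_trials: int = 3,
) -> tuple[Any, list[str], bool]:
    """Generate -> evaluate -> reflect -> retry.

    `attempt` receives all reflections so far; `evaluate` returns None on success
    or a critique string on failure. Critiques accumulate as memory across trials.
    """
    reflections: list[str] = []
    candidate: Any = None
    for _ in range(max_trials):
        candidate = attempt(reflections)
        critique = evaluate(candidate)
        if critique is None:
            return candidate, reflections, True
        reflections.append(critique)
    return candidate, reflections, False


def toy_attempt(reflections: list[str]) -> list[int]:
    # Stand-in for an LLM: it only changes its answer once a critique mentions "descending".
    data = [3, 1, 2]
    if any("descending" in r for r in reflections):
        return sorted(data, reverse=True)
    return sorted(data)


def toy_evaluate(candidate: list[int]) -> Optional[str]:
    # Stand-in for an external check such as a unit test; None means "passed".
    return None if candidate == [3, 2, 1] else "Output must be in descending order."


print(reflexion_loop(toy_attempt, toy_evaluate))
# ([3, 2, 1], ['Output must be in descending order.'], True)
```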
## 3. Agentic RAG
This is where the two ideas combine. Agentic RAG pairs retrieval with autonomous agents: instead of a single retrieval pipeline, multiple agents handle different retrieval tasks, coordinated by an aggregator agent.
### Workflow
```mermaid
flowchart LR
    U([👤 User]) -->|Query| AGG[🎯 Aggregator<br/>Agent]
    M[🧠 Memory<br/>ST + LT] --> AGG
    P[📋 Planning<br/>ReAct/CoT] --> AGG
    AGG -->|Plan| A1[🤖 Agent 1]
    AGG -->|Plan| A2[🤖 Agent 2]
    AGG -->|Plan| A3[🤖 Agent 3]
    A1 -->|MCP| S1[💾 Local Data<br/>Server]
    A2 -->|MCP| S2[🔍 Search<br/>Engine]
    A3 -->|MCP| S3[☁️ Cloud<br/>AWS / Azure]
    A1 -.->|Results| GEN[✨ Generative<br/>Model]
    A2 -.->|Results| GEN
    A3 -.->|Results| GEN
    GEN -->|Output| O([Response])
    style U fill:#e3f2fd,stroke:#1976d2,color:#000
    style AGG fill:#fff3e0,stroke:#f57c00,color:#000,stroke-width:3px
    style M fill:#fce4ec,stroke:#c2185b,color:#000
    style P fill:#fce4ec,stroke:#c2185b,color:#000
    style A1 fill:#e8f5e9,stroke:#2e7d32,color:#000
    style A2 fill:#e8f5e9,stroke:#2e7d32,color:#000
    style A3 fill:#e8f5e9,stroke:#2e7d32,color:#000
    style S1 fill:#f3e5f5,stroke:#7b1fa2,color:#000
    style S2 fill:#f3e5f5,stroke:#7b1fa2,color:#000
    style S3 fill:#f3e5f5,stroke:#7b1fa2,color:#000
    style GEN fill:#fff8e1,stroke:#fbc02d,color:#000
    style O fill:#e3f2fd,stroke:#1976d2,color:#000
```
### Step-by-step
- User query goes to a coordinator / aggregator agent
- The aggregator plans how to retrieve information
- Tasks are distributed to specialised retrieval agents (each owns a domain or source)
- Agents gather data from multiple sources in parallel
- Results are merged, deduplicated, and ranked
- The generative model produces the final answer
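The plan, fan-out, deduplicate, and rank steps above can be sketched with `asyncio`. Everything here is a stand-in under stated assumptions: the three agents return hard-coded hits instead of querying real sources, `plan` is a keyword rule rather than an LLM planning step, and scores from different sources are assumed to be directly comparable (in practice they usually need normalisation or a re-ranker). The final generation step is omitted.

```python
import asyncio

Hit = tuple[str, float, str]  # (doc_id, relevance score, text)


# Hypothetical specialised agents, each owning one source (hard-coded hits stand in for real queries).
async def local_data_agent(query: str) -> list[Hit]:
    await asyncio.sleep(0.01)  # simulated I/O
    return [("policy-7", 0.82, "Internal refund policy v7"), ("faq-2", 0.40, "Refund FAQ")]


async def search_agent(query: str) -> list[Hit]:
    await asyncio.sleep(0.01)
    return [("faq-2", 0.55, "Refund FAQ"), ("blog-9", 0.30, "Blog post on refunds")]


async def cloud_agent(query: str) -> list[Hit]:
    await asyncio.sleep(0.01)
    return [("ticket-113", 0.61, "Support ticket about refunds")]


AGENTS = {"local": local_data_agent, "search": search_agent, "cloud": cloud_agent}


def plan(query: str) -> list[str]:
    # Aggregator planning placeholder: a real aggregator would ask an LLM which sources to use.
    return ["local", "search", "cloud"] if "refund" in query.lower() else ["search"]


async def aggregate(query: str, top_k: int = 3) -> list[Hit]:
    chosen = plan(query)
    # Distribute: run the chosen agents concurrently.
    per_agent = await asyncio.gather(*(AGENTS[name](query) for name in chosen))
    # Deduplicate by doc_id, keeping the best score seen for each document.
    best: dict[str, Hit] = {}
    for hits in per_agent:
        for hit in hits:
            if hit[0] not in best or hit[1] > best[hit[0]][1]:
                best[hit[0]] = hit
    # Rank; the top_k hits would then go into the generative model's prompt (not shown).
    return sorted(best.values(), key=lambda h: h[1], reverse=True)[:top_k]


print(asyncio.run(aggregate("What is the refund policy?")))
```

Adding a source here means writing one more agent and registering it in `AGENTS`; the aggregation code does not change, which is the modularity property listed below.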
### Best for
- Enterprise AI assistants spanning many data systems
- Research copilots that combine web + internal + structured data
- Legal AI tools that draw on case law + statute + internal precedent
- Complex workflow automation across multiple tools
### Key properties
- Dynamic — the path through the system adapts per query
- Parallel — agents work concurrently across sources
- Specialised — each agent can be optimised for its source
- Modular — add a new source by adding an agent + MCP server, no pipeline rewrite
### MCP's role here
In many Agentic RAG deployments, the retrieval agents reach external systems through MCP (Model Context Protocol) servers: local data, search engines, and cloud services can each expose their capabilities through an MCP server. See mcp-servers-faq.md for protocol details.
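At the wire level, MCP messages are JSON-RPC 2.0, and an agent invokes a server's tool with a `tools/call` request carrying the tool `name` and its `arguments`. The sketch below builds such a request and extracts text from a result shaped like the specification's `content` list with an `isError` flag. Transport (stdio or HTTP), session initialisation, and `tools/list` discovery are omitted; the tool name `search_docs` and the response are made up for illustration, so check the current MCP specification for exact field definitions.

```python
import json
from itertools import count

_request_ids = count(1)  # simple incrementing JSON-RPC request ids


def tools_call_request(name: str, arguments: dict) -> str:
    """Serialise an MCP `tools/call` request (JSON-RPC 2.0). Sending it is left to the transport."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def text_from_result(raw: str) -> str:
    """Collect the text items from a `tools/call` response, raising on either kind of error."""
    msg = json.loads(raw)
    if "error" in msg:  # protocol-level JSON-RPC error
        raise RuntimeError(msg["error"].get("message", "unknown error"))
    result = msg["result"]
    texts = [item["text"] for item in result.get("content", []) if item.get("type") == "text"]
    if result.get("isError"):  # the tool ran but reported a failure inside a normal result
        raise RuntimeError(" ".join(texts) or "tool error")
    return "\n".join(texts)


# Hypothetical tool name and hand-written response, for illustration only.
print(tools_call_request("search_docs", {"query": "refund policy", "limit": 3}))
fake_response = json.dumps({
    "jsonrpc": "2.0",
    "id": 1,
    "result": {"content": [{"type": "text", "text": "Refunds take 5 days."}], "isError": False},
})
print(text_from_result(fake_response))  # Refunds take 5 days.
```

Keeping the two error paths separate is useful for testing: a protocol error means the call never reached the tool, while `isError` means the tool ran and failed.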
## Side-by-Side Comparison
| Dimension | RAG | AI Agents | Agentic RAG |
|---|---|---|---|
| Primary action | Retrieve + generate | Reason + act | Coordinate + retrieve + generate |
| Tool use | No | Yes | Yes — distributed across agents |
| Multi-step? | No (single retrieval → answer) | Yes | Yes, parallel across agents |
| Sources | Usually one vector DB | One or many tools | Many, accessed via specialised agents |
| Planning | Implicit in prompt | ReAct, Reflexion, CoT | Multi-level — aggregator + per-agent |
| Memory | Typically none (stateless per query) | Short-term + sometimes long-term | Short-term + long-term, shared and per-agent |
| Latency | Lowest | Medium (each tool call adds a round trip) | Higher, though parallel fan-out mitigates it |
| Cost | Lowest per query | Medium | Highest per query |
| Failure modes | Bad retrieval; hallucination on top of bad context | Tool misuse; cascading errors across steps | All of the above + coordination failures |
| When to choose | Q&A over a corpus | Task automation needing actions | Complex tasks spanning multiple sources/domains |
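The latency row follows from simple arithmetic. If the aggregator plans, then calls its agents, then generates, sequential agent calls add up while a parallel fan-out is bounded by the slowest agent. The function names and timings below are illustrative placeholders, not measurements, and the model ignores queuing and rate limits.

```python
def sequential_latency_ms(plan_ms: float, agent_ms: list[float], gen_ms: float) -> float:
    # Agents called one after another: their latencies add up.
    return plan_ms + sum(agent_ms) + gen_ms


def parallel_latency_ms(plan_ms: float, agent_ms: list[float], gen_ms: float) -> float:
    # Agents fanned out concurrently: the slowest agent sets the pace.
    return plan_ms + max(agent_ms) + gen_ms


# Illustrative placeholder timings, not measurements.
agents = [300.0, 450.0, 700.0]
print(sequential_latency_ms(400, agents, 900))  # 2750.0
print(parallel_latency_ms(400, agents, 900))    # 2000.0
```

Parallelism cuts wall-clock time but not spend: every agent still runs and still consumes tokens and tool calls, which is why the cost row stays highest for Agentic RAG.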
## Why This Matters
Traditional RAG = static retrieval pipeline. Agentic RAG = dynamic, intelligent retrieval system.
That's why many advanced AI products now use multi-agent RAG architectures:

- Enterprise AI assistants
- Research copilots
- Legal AI tools
- Complex workflow automation
- Customer-support escalation systems
- Software-engineering assistants (read code + search docs + run tests + open PRs)
This pattern is increasingly common as the backbone of AI systems that must reason across many data sources.
## Testing Implications by Architecture
For QE/AI testers, the test surface expands sharply with each step up the architectural ladder.
| Architecture | Test Surface |
|---|---|
| RAG | Retrieval quality (precision/recall), generation faithfulness, hallucination, citation validity, context-window handling |
| Agents | All of the above + tool-call correctness (right tool, right args, right order), trace-level assertions, latency/cost budgets, error handling on tool failure, authority/authorisation checks |
| Agentic RAG | All of the above + coordination correctness (aggregator decisions), parallel-execution determinism, conflict resolution between agents, end-to-end latency under concurrency, cascading-failure handling |
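Two of the checks in the table are easy starting points: precision/recall at k for retrieval quality, and an exact-order tool-call assertion over a recorded trace. The sketch assumes a hypothetical trace format (a list of dicts with `tool`, `args`, and `latency_ms` keys); real tracing tools record richer spans, so adapt the field names to whatever your stack emits.

```python
def precision_recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> tuple[float, float]:
    """Precision@k (relevant share of the top k) and recall@k (share of relevant docs found)."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in retrieved[:k] if doc in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def assert_tool_calls(trace: list[dict], expected: list[tuple[str, dict]]) -> None:
    """Trace-level assertion: right tool, right arguments, right order."""
    actual = [(step["tool"], step["args"]) for step in trace]
    assert actual == expected, f"tool calls differ:\n expected {expected}\n actual   {actual}"


# Hypothetical trace recorded from one agent run.
trace = [
    {"tool": "get_price", "args": {"item": "widget"}, "latency_ms": 120},
    {"tool": "multiply", "args": {"a": 2.5, "b": 3}, "latency_ms": 2},
]
assert_tool_calls(trace, [("get_price", {"item": "widget"}), ("multiply", {"a": 2.5, "b": 3})])
assert sum(step["latency_ms"] for step in trace) <= 500  # latency budget (illustrative threshold)

print(precision_recall_at_k(["d1", "d7", "d3", "d5"], {"d1", "d3"}, k=4))  # (0.5, 1.0)
```

Exact-order matching is deliberately strict. For the parallel agents in Agentic RAG, where call order is not deterministic, comparing the calls as an unordered collection is usually the better assertion.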
Cross-references:
- RAG eval specifics → ragas-faq.md
- Agent/tool testing → mcp-servers-faq.md
- Adversarial/red-team coverage → red-blue-purple-team-ai-faq.md
## Interview Sound-Bites
- "RAG retrieves, agents act, Agentic RAG coordinates — they're layers, not alternatives. Most production systems start at RAG and grow into Agentic RAG as the use case demands more sources and more reasoning."
- "The interesting testing problem in Agentic RAG isn't any single component — it's the coordination. The aggregator's decisions are where most subtle quality regressions show up, and trace-level assertions are how you catch them."
- "MCP is what makes Agentic RAG practical at enterprise scale: without a standard protocol, every new source means a bespoke integration; with MCP it's just another server the agents can discover."
## Credits
Visual reference: original diagram by Rakesh Gohel (@rakeshgohel01) — see his newsletter for more.