Autonomous QA Multi-Agent Pipeline: Real-World Case Study
Source: Sumanth Reddy (LinkedIn, May 2026) — QA Automation Engineer, 8+ years
Stack: TypeScript · Python · Kafka · Playwright · MongoDB · Splunk · Jira
Pattern: Fully autonomous multi-agent pipeline for zero-intervention QA
Overview
This case study describes a real-world implementation of a 7-agent autonomous QA system built for a logistics platform. The pipeline runs end-to-end from Jira ticket ingestion through test generation, evidence verification, and gap reporting, with no manual intervention required.
Key outcomes:
- 80% reduction in manual QA effort
- Self-healing tests (up to 3 retries with automated diagnosis)
- 10,000+ production log events auto-classified
- Jira tickets automatically created with supporting evidence
The 7-Agent Pipeline
Jira Intake Agent
│
▼
Test Finder Agent
│
▼
Dev Code Inspector
│
▼
Test Author
│
▼
Evidence Verifier
│
▼
Gap Reporter
│
▼
QA Orchestrator ←──── runs the full pipeline end-to-end
Agent Breakdown
1. Jira Intake Agent
Role: Entry point — fetches Jira tickets and extracts structured requirements
Inputs: Jira project/sprint/ticket ID
Outputs: Structured requirements object { ticket_id, summary, acceptance_criteria, story_points }
Testing considerations:
- Does it correctly parse tickets with missing or malformed acceptance criteria?
- Does it handle Jira API rate limits gracefully?
- Are extracted requirements semantically complete?
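The first consideration, tolerating missing or malformed acceptance criteria, can be sketched as a small normalisation step. This is a minimal illustration only: the raw ticket shape, the `Requirements` class, the `parse_ticket` function and the `warnings` field are all assumptions made for the example, not the author's implementation or the Jira API's real payload format.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Requirements:
    """Structured requirements object (field names follow the Outputs line above)."""
    ticket_id: str
    summary: str
    acceptance_criteria: list[str]
    story_points: Optional[float]
    warnings: list[str] = field(default_factory=list)  # hypothetical extra field


def parse_ticket(raw: dict[str, Any]) -> Requirements:
    """Normalise a raw ticket dict (hypothetical shape) without raising on bad input."""
    warnings: list[str] = []

    # Treat each non-empty line as one criterion and strip leading list markers.
    criteria_text = raw.get("acceptance_criteria") or ""
    criteria = [
        line.strip().lstrip("-*").strip()
        for line in str(criteria_text).splitlines()
    ]
    criteria = [c for c in criteria if c]
    if not criteria:
        warnings.append("no acceptance criteria found")

    # Keep story points only when they are genuinely numeric.
    points = raw.get("story_points")
    if isinstance(points, bool) or not isinstance(points, (int, float)):
        if points is not None:
            warnings.append(f"ignoring non-numeric story_points: {points!r}")
        points = None

    return Requirements(
        ticket_id=str(raw.get("ticket_id", "")),
        summary=str(raw.get("summary", "")).strip(),
        acceptance_criteria=criteria,
        story_points=points,
        warnings=warnings,
    )
```

Recording problems as warnings rather than raising lets downstream agents decide whether a ticket with no criteria should stop the run or be flagged for review.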
2. Test Finder Agent
Role: Searches existing test coverage to avoid duplication
Inputs: Extracted requirements from Jira Intake Agent
Outputs: List of existing tests that cover (fully or partially) the requirement
Testing considerations:
- Semantic similarity matching — does it find tests that cover the same behaviour described differently?
- False negative rate — how often does it miss existing coverage and trigger redundant test authoring?
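To make the matching step concrete, here is a rough sketch that scores existing test titles against a requirement. It uses token-overlap (Jaccard) similarity purely as a stand-in: the source does not say how matching is done, and a real semantic matcher would more plausibly use embeddings. The function names and the 0.3 threshold are assumptions for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for a semantic representation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct tokens that the two texts have in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_covering_tests(
    requirement: str,
    test_titles: list[str],
    threshold: float = 0.3,  # hypothetical cut-off; tune against labelled pairs
) -> list[tuple[str, float]]:
    """Return (title, score) pairs at or above the threshold, best match first."""
    scored = [(title, jaccard(requirement, title)) for title in test_titles]
    matches = [(title, score) for title, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

A lexical measure like this misses paraphrases that share no words, which is exactly the false-negative risk listed above, so it is better read as a baseline to measure a semantic matcher against.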
3. Dev Code Inspector
Role: Reads Java source code and maps implementation behaviour
Inputs: Repository path, changed files from the relevant PR/branch
Outputs: Behavioural summary — what the code actually does, edge cases, data flows
Testing considerations:
- Does it correctly infer intent from code, not just structure?
- Does it flag code paths with no corresponding test coverage (gap detection)?
- Handles complex inheritance / polymorphism in Java?
4. Test Author
Role: Generates TypeScript test cases (Playwright) based on requirements + code behaviour
Inputs: Requirements (from Intake Agent) + behavioural map (from Code Inspector) + coverage gaps (from Test Finder)
Outputs: TypeScript / Playwright test files
Testing considerations:
- Are generated tests executable without modification?
- Do they test behaviour, not implementation detail?
- Self-healing on failure — up to 3 retries with automated diagnosis of failure reason
- Test quality: are assertions meaningful, not just "page loaded"?
5. Evidence Verifier
Role: Queries MongoDB and Splunk to gather proof that functionality works in production
Inputs: Test assertions + production query parameters
Outputs: Evidence bundle { db_records, log_entries, pass/fail verdict }
Testing considerations:
- Does it query the correct data sources for each assertion type?
- Are evidence queries scoped correctly (not pulling unrelated records)?
- How does it handle evidence that contradicts the test result?
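One way to handle the last question, evidence that contradicts the test result, is to make the disagreement an explicit third verdict instead of forcing pass or fail. The sketch below assumes a deliberately simple rule (evidence supports the feature when at least one DB record exists and no log line contains `ERROR`); that rule, the `CONFLICT` verdict and all names are illustrative assumptions, not taken from the source.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    FAIL = "fail"
    CONFLICT = "conflict"  # hypothetical: test result and evidence disagree


@dataclass
class EvidenceBundle:
    db_records: list[dict]
    log_entries: list[str]
    verdict: Verdict


def reconcile(
    test_passed: bool,
    db_records: list[dict],
    log_entries: list[str],
) -> EvidenceBundle:
    """Combine a test result with production evidence into a single verdict."""
    error_logs = [entry for entry in log_entries if "ERROR" in entry]
    # Assumed rule: evidence is supportive if records exist and no errors were logged.
    evidence_ok = bool(db_records) and not error_logs

    if test_passed and evidence_ok:
        verdict = Verdict.PASS
    elif not test_passed and not evidence_ok:
        verdict = Verdict.FAIL
    else:
        verdict = Verdict.CONFLICT  # surface for human review rather than guess

    return EvidenceBundle(db_records, log_entries, verdict)
```

Keeping the conflict case separate means a passing test with no supporting production records is reported as such, rather than silently counted as a pass.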
6. Gap Reporter
Role: Produces coverage reports identifying what is and isn't tested
Inputs: Full pipeline output — requirements, existing tests, generated tests, evidence
Outputs: Coverage report + list of uncovered acceptance criteria
Testing considerations:
- Completeness: does it identify all genuine gaps, or only obvious ones?
- Does it auto-create Jira tickets with evidence for uncovered gaps?
- Report format suitable for both engineers and non-technical stakeholders?
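The core of the report, the list of uncovered acceptance criteria, reduces to a set difference once each criterion is mapped to the tests that cover it. The sketch below assumes that mapping already exists as a dict; the `find_gaps` name and the report keys are hypothetical.

```python
def find_gaps(
    acceptance_criteria: list[str],
    coverage: dict[str, list[str]],  # criterion -> names of tests covering it
) -> dict:
    """Summarise which acceptance criteria have no covering test."""
    # A criterion counts as uncovered if it is absent or mapped to an empty list.
    uncovered = [c for c in acceptance_criteria if not coverage.get(c)]
    total = len(acceptance_criteria)
    covered = total - len(uncovered)
    return {
        "covered": covered,
        "total": total,
        "coverage_ratio": covered / total if total else 0.0,
        "uncovered": uncovered,
    }
```

The arithmetic is the easy part; the completeness risk noted above lives in how the coverage mapping is built, since a criterion wrongly mapped to a test never appears as a gap.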
7. QA Orchestrator
Role: Runs the full pipeline end-to-end; coordinates the other six agents
Inputs: Jira project/ticket reference
Outputs: Complete test suite + gap report + evidence bundle + auto-created Jira tickets
Testing considerations:
- Error handling — what happens when one agent in the chain fails?
- Retry and fallback strategy per agent
- Observability — can you trace which agent produced which output?
- Idempotency — re-running the orchestrator on the same ticket should not create duplicate tickets/tests
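The considerations above (failure handling, traceability, idempotency) can be illustrated with a small sequential runner. This is a sketch under stated assumptions: stages are plain callables over a shared context dict, a failure stops the run and is recorded in a trace, and idempotency is approximated by a hash of the ticket ID and criteria held in an in-memory set. None of these names or choices come from the source.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

Stage = Callable[[dict[str, Any]], dict[str, Any]]


@dataclass
class RunResult:
    run_key: str
    context: dict[str, Any]
    trace: list[tuple[str, str]] = field(default_factory=list)  # (agent, status)
    failed_at: Optional[str] = None


def make_run_key(ticket_id: str, criteria: list[str]) -> str:
    """Stable fingerprint of the input, used to detect repeat runs."""
    payload = json.dumps({"ticket": ticket_id, "criteria": sorted(criteria)})
    return hashlib.sha256(payload.encode()).hexdigest()[:16]


def run_pipeline(
    ticket_id: str,
    criteria: list[str],
    stages: list[tuple[str, Stage]],
    seen_keys: set[str],  # hypothetical store; a real one would be persistent
) -> Optional[RunResult]:
    key = make_run_key(ticket_id, criteria)
    if key in seen_keys:
        return None  # identical input already processed; create nothing new

    result = RunResult(key, {"ticket_id": ticket_id, "criteria": criteria})
    for name, stage in stages:
        try:
            result.context.update(stage(result.context))
            result.trace.append((name, "ok"))
        except Exception as exc:
            # Stop at the first failure and say clearly which agent failed.
            result.trace.append((name, f"failed: {exc}"))
            result.failed_at = name
            return result  # key not recorded, so the run can be repeated

    seen_keys.add(key)
    return result
```

Recording the key only after a fully successful run means a failed run can be retried, while a completed one is skipped on repeat; whether partial output should also be kept is a design decision this sketch leaves open.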
Key Capabilities
Self-Healing Tests
When a generated test fails:
1. Failure is captured with full diagnostic context (stack trace, screenshot, logs)
2. Agent re-analyses the failure: code change? environment issue? flaky selector?
3. Test is auto-corrected and retried (up to 3 attempts)
4. If still failing after 3 retries, escalation ticket created in Jira with full evidence
Production Alert Auto-Triage
- 10,000+ log events (Splunk) classified per run
- Events mapped to: known issue / new issue / noise
- New issues generate Jira tickets with log evidence attached
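The three-way mapping above can be sketched with simple pattern rules. The source does not describe how events are actually classified, so this rule-based version is only a stand-in to show the shape of the output; the patterns, function names and sample log lines are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical patterns; a real deployment would maintain these from past incidents.
KNOWN_ISSUE_PATTERNS = [re.compile(r"carrier api timeout", re.I)]
NOISE_PATTERNS = [re.compile(r"\b(DEBUG|health ?check)\b", re.I)]


def classify(event: str) -> str:
    """Map one log line to 'noise', 'known issue' or 'new issue'."""
    if any(p.search(event) for p in NOISE_PATTERNS):
        return "noise"
    if any(p.search(event) for p in KNOWN_ISSUE_PATTERNS):
        return "known issue"
    return "new issue"  # anything unrecognised is treated as new


def triage(events: list[str]) -> tuple[Counter, list[str]]:
    """Return label counts plus the events that would need a ticket."""
    labels = [classify(e) for e in events]
    new_issues = [e for e, label in zip(events, labels) if label == "new issue"]
    return Counter(labels), new_issues
```

Defaulting unrecognised events to "new issue" errs on the side of over-reporting, so the noise patterns need to be good enough that ticket volume stays manageable.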
Automatic Jira Ticket Creation
Gap Reporter and Alert Triage both create Jira tickets with:
- Description of the gap or issue
- Evidence (DB records, log excerpts, test failure traces)
- Acceptance criteria for the fix
Architectural Patterns
Pipeline vs Graph
This is a linear pipeline rather than a branching graph: each agent feeds the next sequentially. That suits ticket-level QA workflows where each stage depends on the previous stage's output.
Compare this with a LangGraph-style multi-agent design (such as the Test Cases Generator reference), where agents can run in parallel subgraphs.
Tool-Use Agents
Each agent is a tool-calling LLM that invokes:
- Jira REST API (Intake, Gap Reporter)
- Code repository / file system (Code Inspector)
- MongoDB queries (Evidence Verifier)
- Splunk search API (Evidence Verifier, Alert Triage)
- Test file writer (Test Author)
Self-Healing Pattern
A retry-with-diagnosis loop:
test_fails → capture_diagnostic → analyse_failure → patch_test → retry
│
(up to 3 attempts)
│
escalate_to_jira
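The loop in the diagram can be written out as a short function. This sketch reads "up to 3 attempts" as one initial run followed by at most three diagnose-patch-retry cycles, which is an interpretation of the source rather than something it states. The four callables (`run_test`, `diagnose`, `patch`, `escalate`) are hypothetical hooks standing in for the test runner, the LLM diagnosis step, the test rewriter and Jira ticket creation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Outcome:
    passed: bool
    retries_used: int
    diagnoses: list[str] = field(default_factory=list)
    escalated: bool = False


def self_heal(
    test_source: str,
    run_test: Callable[[str], Optional[str]],  # returns failure text, None on pass
    diagnose: Callable[[str], str],            # failure text -> diagnosis
    patch: Callable[[str, str], str],          # (source, diagnosis) -> new source
    escalate: Callable[[str, list[str]], None],
    max_retries: int = 3,
) -> Outcome:
    outcome = Outcome(passed=False, retries_used=0)
    failure = run_test(test_source)

    # Diagnose before every retry so each attempt runs a patched test.
    while failure is not None and outcome.retries_used < max_retries:
        diagnosis = diagnose(failure)
        outcome.diagnoses.append(diagnosis)
        test_source = patch(test_source, diagnosis)
        outcome.retries_used += 1
        failure = run_test(test_source)

    if failure is None:
        outcome.passed = True
    else:
        # Retries exhausted: hand over the last failure and the diagnosis history.
        escalate(failure, outcome.diagnoses)
        outcome.escalated = True
    return outcome
```

Passing the full diagnosis history to the escalation hook keeps the evidence trail intact, so whoever picks up the ticket can see what the agent already tried.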
Interview Sound-Bites
"The future of QA isn't just automation — it's intelligent automation that thinks, investigates, and acts. This pipeline demonstrates what that looks like in practice."
"Self-healing tests don't just retry blind — they diagnose first. The agent reads the failure, understands whether it's a code change, a flaky selector, or a genuine regression, and patches accordingly."
"Evidence-based testing changes the conversation. Every Jira ticket the system creates comes with MongoDB records and Splunk log excerpts as proof — no more 'works on my machine'."
"80% reduction in manual QA effort doesn't mean 80% fewer QA engineers — it means QA engineers spend their time on judgment calls, exploratory testing, and risk assessment rather than writing boilerplate."
"The Orchestrator is the test of the whole system's resilience — if the Code Inspector fails, does the Test Author degrade gracefully or does the whole pipeline collapse?"
QA Testing This System (Meta-Level)
How would you test this autonomous QA pipeline itself?
| Agent | Key Test | Risk |
|---|---|---|
| Jira Intake | Malformed / incomplete tickets | Garbage in → garbage out downstream |
| Test Finder | Semantic miss (false negative) | Duplicate tests generated |
| Code Inspector | Misread inheritance / DI patterns | Wrong behavioural map → wrong tests |
| Test Author | Generates syntactically broken tests | Pipeline halts at execution |
| Evidence Verifier | Queries wrong data scope | False evidence → incorrect verdict |
| Gap Reporter | Misses genuine gaps | Coverage appears complete when it isn't |
| Orchestrator | Agent N fails mid-pipeline | Partial output with no clear error signal |
Related Reference
- Test Cases Generator AI Agent — parallel multi-agent QA pattern (LangGraph)
- RAG vs Agents vs Agentic RAG — agent architecture patterns
- LLM & Agent Evaluation Matrix — how to evaluate agent outputs
- LLM Testing Lifecycle — lifecycle framing for AI system QA