
Autonomous QA Multi-Agent Pipeline — Real-World Case Study

Source: Sumanth Reddy (LinkedIn, May 2026) — QA Automation Engineer, 8+ years
Stack: TypeScript · Python · Kafka · Playwright · MongoDB · Splunk · Jira
Pattern: Fully autonomous multi-agent pipeline for zero-intervention QA


Overview

A real-world implementation of a 7-agent autonomous QA system built for a logistics platform. The pipeline runs end-to-end from Jira ticket ingestion through to test generation, evidence verification, and gap reporting — with no manual intervention required.

Key outcomes:
  • 80% reduction in manual QA effort
  • Self-healing tests (up to 3 retries with automated diagnosis)
  • 10,000+ production log events auto-classified
  • Jira tickets automatically created with supporting evidence


The 7-Agent Pipeline

Jira Intake Agent
        │
        ▼
Test Finder Agent
        │
        ▼
Dev Code Inspector
        │
        ▼
Test Author
        │
        ▼
Evidence Verifier
        │
        ▼
Gap Reporter
        │
        ▼
QA Orchestrator ←──── runs the full pipeline end-to-end


Agent Breakdown

1. Jira Intake Agent

Role: Entry point — fetches Jira tickets and extracts structured requirements
Inputs: Jira project/sprint/ticket ID
Outputs: Structured requirements object { ticket_id, summary, acceptance_criteria, story_points }
Testing considerations:
  • Does it correctly parse tickets with missing or malformed acceptance criteria?
  • Does it handle Jira API rate limits gracefully?
  • Are extracted requirements semantically complete?
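The first testing consideration can be made concrete with a minimal sketch of the intake step. The payload keys (`key`, `summary`, `acceptance_criteria`, `story_points`) are a simplified assumption for illustration, not the real Jira REST response schema, and the `warnings` field is a hypothetical addition to the requirements object described above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Requirements:
    """Structured intake output; field names follow the case study, except warnings."""
    ticket_id: str
    summary: str
    acceptance_criteria: list[str] = field(default_factory=list)
    story_points: Optional[float] = None
    warnings: list[str] = field(default_factory=list)  # hypothetical extra field


def parse_ticket(raw: dict) -> Requirements:
    """Turn a raw ticket payload into Requirements, flagging gaps instead of failing."""
    warnings = []
    criteria_text = raw.get("acceptance_criteria") or ""
    # Treat each non-empty line as one criterion and strip common bullet prefixes.
    criteria = [line.strip().lstrip("-*• ").strip() for line in criteria_text.splitlines()]
    criteria = [c for c in criteria if c]
    if not criteria:
        warnings.append("missing acceptance criteria")

    points = raw.get("story_points")
    if points is not None and not isinstance(points, (int, float)):
        warnings.append(f"story_points not numeric: {points!r}")
        points = None

    return Requirements(
        ticket_id=raw.get("key", ""),
        summary=(raw.get("summary") or "").strip(),
        acceptance_criteria=criteria,
        story_points=points,
        warnings=warnings,
    )
```

Returning warnings rather than raising lets downstream agents decide whether an incomplete ticket should stop the run or only be reported.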


2. Test Finder Agent

Role: Searches existing test coverage to avoid duplication
Inputs: Extracted requirements from Jira Intake Agent
Outputs: List of existing tests that cover (fully or partially) the requirement
Testing considerations:
  • Semantic similarity matching: does it find tests that cover the same behaviour described differently?
  • False negative rate: how often does it miss existing coverage and trigger redundant test authoring?
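To illustrate why the false negative rate matters, here is a rough sketch that uses Jaccard token overlap as a purely lexical stand-in for semantic matching. The case study does not say how the agent compares requirements to tests; the function names and the 0.3 threshold are assumptions for illustration.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercased word set; a crude lexical substitute for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two texts have in common."""
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_covering_tests(
    requirement: str, test_titles: list[str], threshold: float = 0.3
) -> list[tuple[str, float]]:
    """Return (title, score) pairs at or above the threshold, best match first."""
    scored = [(title, jaccard(requirement, title)) for title in test_titles]
    matches = [(title, score) for title, score in scored if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)
```

With the requirement "user can track parcel status", the title "shows parcel status to user" is matched, but "customer sees where their package is" scores 0.0 despite describing the same behaviour. That is exactly the kind of miss a semantic matcher has to be tested against.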


3. Dev Code Inspector

Role: Reads Java source code and maps implementation behaviour
Inputs: Repository path, changed files from the relevant PR/branch
Outputs: Behavioural summary — what the code actually does, edge cases, data flows
Testing considerations:
  • Does it correctly infer intent from code, not just structure?
  • Does it flag code paths with no corresponding test coverage (gap detection)?
  • Does it handle complex inheritance and polymorphism in Java?


4. Test Author

Role: Generates TypeScript test cases (Playwright) based on requirements + code behaviour
Inputs: Requirements (from Intake Agent) + behavioural map (from Code Inspector) + coverage gaps (from Test Finder)
Outputs: TypeScript / Playwright test files
Testing considerations:
  • Are generated tests executable without modification?
  • Do they test behaviour, not implementation detail?
  • Self-healing on failure: up to 3 retries with automated diagnosis of the failure reason
  • Test quality: are assertions meaningful, not just "page loaded"?


5. Evidence Verifier

Role: Queries MongoDB and Splunk to gather proof that functionality works in production
Inputs: Test assertions + production query parameters
Outputs: Evidence bundle { db_records, log_entries, pass/fail verdict }
Testing considerations:
  • Does it query the correct data sources for each assertion type?
  • Are evidence queries scoped correctly (not pulling unrelated records)?
  • How does it handle evidence that contradicts the test result?
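One possible answer to the last question is to surface contradictions explicitly rather than trusting either side. The sketch below assumes the MongoDB records and Splunk entries have already been fetched and are passed in as plain Python values. The third verdict value, "conflict", and the rule "records exist and no ERROR lines" are assumptions for illustration; the source describes only a pass/fail verdict.

```python
from dataclasses import dataclass


@dataclass
class EvidenceBundle:
    db_records: list[dict]
    log_entries: list[str]
    verdict: str  # "pass", "fail", or the hypothetical "conflict"


def build_bundle(
    test_passed: bool, db_records: list[dict], log_entries: list[str]
) -> EvidenceBundle:
    """Combine a test result with production evidence into a single verdict."""
    has_error_logs = any("ERROR" in entry for entry in log_entries)
    # Assumed rule: evidence supports a pass when records exist and no errors were logged.
    evidence_supports_pass = bool(db_records) and not has_error_logs

    if test_passed and evidence_supports_pass:
        verdict = "pass"
    elif not test_passed and not evidence_supports_pass:
        verdict = "fail"
    else:
        # The test and the evidence disagree, so flag the case for a human.
        verdict = "conflict"
    return EvidenceBundle(db_records, log_entries, verdict)
```

A passing test with no matching records yields "conflict", which gives a reviewer a prompt to check whether the evidence query was scoped incorrectly.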


6. Gap Reporter

Role: Produces coverage reports identifying what is and isn't tested
Inputs: Full pipeline output — requirements, existing tests, generated tests, evidence
Outputs: Coverage report + list of uncovered acceptance criteria
Testing considerations:
  • Completeness: does it identify all genuine gaps, or only obvious ones?
  • Does it auto-create Jira tickets with evidence for uncovered gaps?
  • Is the report format suitable for both engineers and non-technical stakeholders?
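The core of the coverage report can be sketched as a set difference between acceptance criteria and the criteria that tests claim to cover. The `test_links` mapping (test name to covered criteria) is a hypothetical input shape; the case study does not describe how the agent links tests to criteria.

```python
def coverage_report(criteria: list[str], test_links: dict[str, list[str]]) -> dict:
    """Summarise which acceptance criteria have no linked test.

    test_links maps a test name to the criteria it claims to cover.
    """
    covered = {criterion for linked in test_links.values() for criterion in linked}
    # Preserve the original order so the report reads like the ticket.
    uncovered = [c for c in criteria if c not in covered]
    ratio = (len(criteria) - len(uncovered)) / len(criteria) if criteria else 0.0
    return {
        "total": len(criteria),
        "uncovered": uncovered,
        "coverage": round(ratio, 2),
    }
```

Note that this only detects criteria with no linked test at all. A criterion that is linked but weakly asserted still counts as covered, which is the "only obvious gaps" risk raised above.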


7. QA Orchestrator

Role: Runs the full pipeline end-to-end; coordinates the other 6 agents
Inputs: Jira project/ticket reference
Outputs: Complete test suite + gap report + evidence bundle + auto-created Jira tickets
Testing considerations:
  • Error handling: what happens when one agent in the chain fails?
  • Retry and fallback strategy per agent
  • Observability: can you trace which agent produced which output?
  • Idempotency: re-running the orchestrator on the same ticket should not create duplicate tickets/tests
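The error-handling and observability concerns can be sketched as a sequential runner that records a per-stage trace and stops with an explicit failure signal. This is an illustrative sketch, not the author's orchestrator: `Stage`, `PipelineResult`, and `run_pipeline` are hypothetical names, and each agent is modelled as a plain function.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Optional

# A stage is a (name, function) pair; each function takes the previous output.
Stage = tuple[str, Callable[[Any], Any]]


@dataclass
class PipelineResult:
    output: Any = None
    failed_stage: Optional[str] = None
    error: Optional[str] = None
    trace: list[str] = field(default_factory=list)


def run_pipeline(stages: list[Stage], initial: Any) -> PipelineResult:
    """Run stages in order, stopping at the first failure with a clear signal."""
    result = PipelineResult(output=initial)
    for name, stage in stages:
        try:
            result.output = stage(result.output)
            result.trace.append(f"{name}: ok")
        except Exception as exc:
            # Keep the last good output and record which stage broke and why.
            result.failed_stage = name
            result.error = str(exc)
            result.trace.append(f"{name}: failed")
            break
    return result
```

Stopping at the first failure is the simplest policy. A per-agent retry or fallback strategy would wrap the `stage(...)` call instead of breaking out of the loop.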


Key Capabilities

Self-Healing Tests

When a generated test fails:
  1. Failure is captured with full diagnostic context (stack trace, screenshot, logs)
  2. Agent re-analyses the failure: code change? environment issue? flaky selector?
  3. Test is auto-corrected and retried (up to 3 attempts)
  4. If still failing after 3 retries, an escalation ticket is created in Jira with full evidence

Production Alert Auto-Triage

  • 10,000+ log events (Splunk) classified per run
  • Events mapped to: known issue / new issue / noise
  • New issues generate Jira tickets with log evidence attached
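The three-way mapping can be sketched with simple pattern rules. This is a rule-based stand-in only: the source does not say how events are classified, and the known-issue key `LOG-101` and all patterns below are invented for illustration. The input is assumed to be alert-level events, so anything that is neither noise nor a known signature counts as new.

```python
import re
from collections import Counter

# Hypothetical signatures; a real deployment would load these from configuration.
KNOWN_ISSUES = {"LOG-101": re.compile(r"timeout contacting carrier api", re.I)}
NOISE = [re.compile(r"health ?check", re.I), re.compile(r"\bDEBUG\b")]


def classify(event: str) -> str:
    """Map one log event to 'noise', 'known issue', or 'new issue'."""
    if any(pattern.search(event) for pattern in NOISE):
        return "noise"
    if any(pattern.search(event) for pattern in KNOWN_ISSUES.values()):
        return "known issue"
    return "new issue"


def triage(events: list[str]) -> Counter:
    """Count events per category for a run-level summary."""
    return Counter(classify(event) for event in events)
```

Checking noise first keeps routine traffic from ever reaching the ticket-creation path, which matters at the volume quoted above.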

Automatic Jira Ticket Creation

Gap Reporter and Alert Triage both create Jira tickets with:
  • Description of the gap or issue
  • Evidence (DB records, log excerpts, test failure traces)
  • Acceptance criteria for the fix
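Because two components create tickets automatically, the idempotency concern raised for the Orchestrator applies here as well. One way to approach it, sketched under assumptions, is to derive a stable fingerprint from the source ticket and the gap text and to skip creation when that fingerprint has been seen. The `qa-fp-` label prefix, the in-memory `existing` set, and the payload keys are hypothetical and do not reflect the Jira issue schema.

```python
import hashlib
from typing import Optional


def ticket_fingerprint(source_ticket: str, gap: str) -> str:
    """Stable short hash that ignores case and whitespace differences."""
    normalized = f"{source_ticket.strip().upper()}|{' '.join(gap.lower().split())}"
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()[:16]


def build_ticket(
    existing: set[str],
    source_ticket: str,
    gap: str,
    evidence: list[str],
    acceptance_criteria: list[str],
) -> Optional[dict]:
    """Return a ticket payload, or None if an equivalent ticket was already built."""
    fingerprint = ticket_fingerprint(source_ticket, gap)
    if fingerprint in existing:
        return None
    existing.add(fingerprint)
    return {
        "description": gap,
        "evidence": evidence,
        "acceptance_criteria": acceptance_criteria,
        "labels": [f"qa-fp-{fingerprint}"],  # hypothetical label used for lookups
    }
```

In practice the set of seen fingerprints would need to persist between runs, for example by searching for the label before creating a ticket.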


Architectural Patterns

Pipeline vs Graph

This is a linear pipeline (not a graph/DAG) — each agent feeds the next sequentially. Suitable for ticket-level QA workflows where each stage depends on the previous output.

Compare this with a LangGraph-style multi-agent design (such as the Test Cases Generator reference), where agents can run in parallel subgraphs.

Tool-Use Agents

Each agent is a tool-calling LLM that invokes:
  • Jira REST API (Intake, Gap Reporter)
  • Code repository / file system (Code Inspector)
  • MongoDB queries (Evidence Verifier)
  • Splunk search API (Evidence Verifier, Alert Triage)
  • Test file writer (Test Author)

Self-Healing Pattern

A retry-with-diagnosis loop:

test_fails → capture_diagnostic → analyse_failure → patch_test → retry
                                                                   │
                                                          (up to 3 attempts)
                                                                   │
                                                           escalate_to_jira
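The loop can be sketched as a small driver with the agent's capabilities injected as callables. This is a sketch under assumptions: "up to 3 attempts" is read as three runs in total, and `run_test`, `diagnose`, `patch`, and `escalate` are hypothetical hooks standing in for the test runner, the LLM diagnosis step, the test rewriter, and Jira ticket creation.

```python
from typing import Callable

MAX_ATTEMPTS = 3  # assumed to mean three runs in total


def self_heal(
    test_source: str,
    run_test: Callable[[str], tuple[bool, str]],
    diagnose: Callable[[str], str],
    patch: Callable[[str, str], str],
    escalate: Callable[[str, list[str]], None],
) -> bool:
    """Run a test, diagnosing and patching between attempts; escalate if it never passes."""
    diagnoses: list[str] = []
    for attempt in range(1, MAX_ATTEMPTS + 1):
        passed, diagnostic = run_test(test_source)
        if passed:
            return True
        # Diagnose before patching so the fix targets the actual cause.
        cause = diagnose(diagnostic)
        diagnoses.append(f"attempt {attempt}: {cause}")
        if attempt < MAX_ATTEMPTS:
            test_source = patch(test_source, cause)
    # Every attempt failed: hand the history over for a Jira escalation.
    escalate(test_source, diagnoses)
    return False
```

Passing the full diagnosis history to `escalate` means the resulting ticket shows what was tried on each attempt as well as the final failure.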


Interview Sound-Bites

"The future of QA isn't just automation — it's intelligent automation that thinks, investigates, and acts. This pipeline demonstrates what that looks like in practice."

"Self-healing tests don't just retry blind — they diagnose first. The agent reads the failure, understands whether it's a code change, a flaky selector, or a genuine regression, and patches accordingly."

"Evidence-based testing changes the conversation. Every Jira ticket the system creates comes with MongoDB records and Splunk log excerpts as proof — no more 'works on my machine'."

"80% reduction in manual QA effort doesn't mean 80% fewer QA engineers — it means QA engineers spend their time on judgment calls, exploratory testing, and risk assessment rather than writing boilerplate."

"The Orchestrator is the test of the whole system's resilience — if the Code Inspector fails, does the Test Author degrade gracefully or does the whole pipeline collapse?"


QA Testing This System (Meta-Level)

How would you test this autonomous QA pipeline itself?

| Agent | Key Test | Risk |
|---|---|---|
| Jira Intake | Malformed / incomplete tickets | Garbage in → garbage out downstream |
| Test Finder | Semantic miss (false negative) | Duplicate tests generated |
| Code Inspector | Misread inheritance / DI patterns | Wrong behavioural map → wrong tests |
| Test Author | Generates syntactically broken tests | Pipeline halts at execution |
| Evidence Verifier | Queries wrong data scope | False evidence → incorrect verdict |
| Gap Reporter | Misses genuine gaps | Coverage appears complete when it isn't |
| Orchestrator | Agent N fails mid-pipeline | Partial output with no clear error signal |