AI QA Agents Catalogue — The 5 Essential Agents¶

"The future of QA isn't just automation. It's intelligent automation that thinks, investigates, and acts."
Practical, deployable AI agents every QA engineer should know — and be able to build or evaluate.

Why AI Agents in QA?¶

Traditional automation executes instructions. AI QA agents make decisions — they read context, reason about it, and take targeted action. The result: less time on repetitive triage, more time on judgment-intensive testing.

Traditional Automation	AI QA Agent
Follows a fixed script	Adapts to context
Fails on unexpected output	Classifies and explains the failure
Generates boilerplate	Generates intent-driven test cases
Requires manual input preparation	Sources its own inputs from artefacts
Reports pass/fail	Reports why and what to do next

Agent 1 — Test Case Generation Agent¶

What It Does¶

Reads user stories, acceptance criteria, or feature descriptions and generates structured, comprehensive test cases — including functional, negative, edge case, and boundary scenarios.

Saves: roughly 3–4 hours per sprint for most teams, from the first sprint of use.

Inputs¶

Jira ticket / user story text
Acceptance criteria
(Optional) existing test coverage for deduplication

Outputs¶

Test Case ID: TC-001
Title: Valid login with correct credentials
Type: Functional / Positive
Precondition: User account exists and is active
Steps:
  1. Navigate to /login
  2. Enter valid email and password
  3. Click "Sign In"
Expected: Redirect to dashboard; session token issued
Priority: High
Coverage: AC-1 (user can authenticate)

Key Design Considerations¶

Coverage mapping — every test case traces to an acceptance criterion
Deduplication — checks existing test suite before generating (avoid redundancy)
Test type routing — intent-driven: if AC mentions "error handling", negative tests generated
Format flexibility — output to Jira, Gherkin, markdown, or custom template
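The deduplication consideration above can be sketched with a simple string-similarity check from the standard library. This is a minimal sketch, assuming test cases are compared by title only and that a similarity ratio of 0.85 is a sensible cut-off. The `is_duplicate` and `filter_new_cases` helpers and the threshold are hypothetical; a production agent would more likely compare embeddings or full step lists.

```python
from difflib import SequenceMatcher


def is_duplicate(candidate: str, existing: list[str], threshold: float = 0.85) -> bool:
    """Return True if the candidate title closely matches any existing title."""
    normalised = candidate.strip().lower()
    for title in existing:
        ratio = SequenceMatcher(None, normalised, title.strip().lower()).ratio()
        if ratio >= threshold:
            return True
    return False


def filter_new_cases(generated: list[str], existing: list[str]) -> list[str]:
    """Keep only generated titles that do not duplicate the existing suite."""
    return [title for title in generated if not is_duplicate(title, existing)]
```

Lowering the threshold suppresses more near-duplicates at the cost of occasionally dropping a genuinely new case, which is why the deduplication accuracy metric needs human-labelled samples.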

QA Evaluation Metrics¶

Metric	How to Measure
Coverage completeness	% of ACs with at least one test case
False positive rate	Human review: % of generated tests that are irrelevant
Deduplication accuracy	% of existing tests correctly identified and skipped
Executability	% of generated tests runnable without modification
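As an illustration of the first metric, coverage completeness can be computed directly from the coverage mapping. A minimal sketch, assuming each generated test case is a dict with a hypothetical `coverage` list of acceptance-criterion IDs such as `AC-1`:

```python
def coverage_completeness(acceptance_criteria: list[str], test_cases: list[dict]) -> float:
    """Percentage of acceptance criteria traced by at least one test case."""
    if not acceptance_criteria:
        return 0.0
    covered = {ac for case in test_cases for ac in case.get("coverage", [])}
    hits = sum(1 for ac in acceptance_criteria if ac in covered)
    return 100.0 * hits / len(acceptance_criteria)


def uncovered_criteria(acceptance_criteria: list[str], test_cases: list[dict]) -> list[str]:
    """Acceptance criteria that no test case traces to, in their original order."""
    covered = {ac for case in test_cases for ac in case.get("coverage", [])}
    return [ac for ac in acceptance_criteria if ac not in covered]
```

The list of uncovered criteria is usually more actionable than the percentage, since it tells the agent (or a reviewer) exactly what to generate next.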

Agent 2 — Regression Triage Agent¶

What It Does¶

Runs the regression suite, identifies new failures, and separates them from known flaky tests — so engineers start the day knowing exactly which failures need attention.

Saves: the morning stand-up time otherwise spent explaining which failures matter.

Inputs¶

Current regression run results (JUnit XML, pytest output, etc.)
Historical test run data (baseline failure patterns)
Flaky test registry
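The first input can be read with the standard library. A minimal sketch, assuming the common JUnit XML layout in which each `<testcase>` element has a `name` attribute and a failing case contains a `<failure>` or `<error>` child. The `failed_tests` helper and the sample report are hypothetical, and real reports vary by test runner.

```python
import xml.etree.ElementTree as ET


def failed_tests(junit_xml: str) -> list[str]:
    """Return the names of test cases that contain a <failure> or <error> child."""
    root = ET.fromstring(junit_xml.strip())
    failed = []
    # iter() walks the whole tree, so <testsuites> and <testsuite> roots both work.
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(case.get("name", ""))
    return failed


# Illustrative report, not output from a real run.
SAMPLE_REPORT = """
<testsuite name="regression" tests="3" failures="1" errors="1">
  <testcase classname="checkout" name="test_checkout_payment_declined">
    <failure message="assert 402 == 200"/>
  </testcase>
  <testcase classname="profile" name="test_user_profile_update">
    <error message="ConnectionError"/>
  </testcase>
  <testcase classname="auth" name="test_login"/>
</testsuite>
"""
```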

Outputs¶

```
Regression Run: build-2847
─────────────────────────────────────────────
NEW FAILURES (action required):
  ✗ test_checkout_payment_declined: NEW · likely code regression
  ✗ test_user_profile_update: NEW · likely code regression

KNOWN FLAKY (monitor, no action):
  ✗ test_email_delivery_timing: flaky (failed 12/30 recent runs)

UNCHANGED PASS: 847 tests
─────────────────────────────────────────────
Verdict: 2 genuine regressions. Assign to dev team.
```

Key Design Considerations¶

Flaky detection — statistical model on historical run data (fail rate, variance)
Root cause hypothesis — correlate failure with recent commits/deployments
Zero false negatives — err on the side of flagging; missing a real failure is worse than a false alarm
CI/CD gate — blocks deployment on new failures; passes on known flaky
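The flaky-detection and gating rules above can be sketched as follows. This is a simplified illustration, not a full statistical model: it uses only the historical fail rate (no variance), and the `triage` function, the `Triage` class, and the 0.2 to 0.8 "flaky band" are assumptions chosen for the example. A test that fails outside the band, including one with no history, is flagged as new so that the agent errs on the side of flagging.

```python
from dataclasses import dataclass


@dataclass
class Triage:
    new_failures: list[str]
    known_flaky: list[str]

    @property
    def gate_passes(self) -> bool:
        # The gate blocks only on failures not explained by known flakiness.
        return not self.new_failures


def fail_rate(history: list[bool]) -> float:
    """Share of recent runs that failed (True means the run failed)."""
    return sum(history) / len(history) if history else 0.0


def triage(failed: list[str], history: dict[str, list[bool]],
           flaky_band: tuple[float, float] = (0.2, 0.8)) -> Triage:
    """Split the current failures into new failures and known flaky tests."""
    low, high = flaky_band
    result = Triage(new_failures=[], known_flaky=[])
    for name in failed:
        rate = fail_rate(history.get(name, []))
        if low <= rate <= high:
            result.known_flaky.append(name)
        else:
            # Rarely failing, always failing, or unseen tests are escalated.
            result.new_failures.append(name)
    return result
```

With the figures from the sample report, a test that failed 12 of its last 30 runs has a fail rate of 0.4 and lands in the flaky band, while a test with a clean history is escalated.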

QA Evaluation Metrics¶

Metric	How to Measure
True positive rate	% of genuine regressions correctly flagged
False positive rate	% of flaky tests incorrectly escalated
Classification latency	Time from run completion to triage report
Noise reduction	% reduction in manual triage time

Agent 3 — Bug Report Enrichment Agent¶

What It Does¶

Takes a raw bug report and automatically enriches it — pulling relevant logs, attaching screenshots, mapping to affected components, and formatting it for the development team.

Saves: the back-and-forth between QA and dev that starts with "can you attach the logs?"

Inputs¶

Raw bug description (free text, screenshot, or ticket)
Log sources (Splunk, CloudWatch, Datadog, local log files)
Screenshot / screen recording path or URL

Outputs¶

```
Bug Report: BUG-4421: Payment timeout on checkout
─────────────────────────────────────────────────────
Summary:     Payment gateway times out after 30s on /checkout
Severity:    P1 (revenue impacting)
Environment: Staging · Build 2847 · Chrome 124

Reproduction:
  1. Add item to cart
  2. Proceed to checkout
  3. Enter card details and submit
  4. Observe: spinner runs for 30s, then "Payment failed" error

Relevant Logs (auto-attached):
  [ERROR] 2026-05-27 14:22:11 payment-service: gateway timeout after 30000ms
  [WARN]  2026-05-27 14:22:08 payment-service: retry 3/3 exhausted

Screenshots: [attached: checkout_timeout_01.png]
Affected Component: payment-service, morphe-gateway
Related Tickets: BUG-4388 (similar timeout, Jan 2026, resolved)
```

Key Design Considerations¶

Log scoping — time-window and service-scope filtering to avoid noise
PII scrubbing — strip sensitive data (card numbers, emails) from attached logs
Similar bug linkage — semantic search over historical bugs to surface related issues
Auto-severity classification — payment failure = P1; cosmetic = P3
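Log scoping and PII scrubbing can be illustrated together. This is a minimal sketch, assuming log entries have already been fetched and parsed into dicts with hypothetical `ts`, `level`, `service`, and `message` keys. The two regular expressions are deliberately simple examples and would not catch every email or card format, so they are no substitute for a dedicated PII scanner.

```python
import re
from datetime import datetime, timedelta

# Illustrative patterns only: a basic email shape and 13-19 digit card-like runs.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
CARD_RE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def scrub_pii(message: str) -> str:
    """Mask email addresses and card-like digit runs in a log message."""
    message = EMAIL_RE.sub("[EMAIL]", message)
    return CARD_RE.sub("[CARD]", message)


def scope_logs(entries: list[dict], service: str, incident_time: datetime,
               window: timedelta = timedelta(minutes=2)) -> list[str]:
    """Keep one service's entries within the time window, oldest first, scrubbed."""
    selected = sorted(
        (e for e in entries
         if e["service"] == service and abs(e["ts"] - incident_time) <= window),
        key=lambda e: e["ts"],
    )
    return [
        "[{}] {} {}: {}".format(
            e["level"],
            e["ts"].strftime("%Y-%m-%d %H:%M:%S"),
            e["service"],
            scrub_pii(e["message"]),
        )
        for e in selected
    ]
```

Only the message text is scrubbed, so timestamps and service names are never mistaken for card-like digit runs.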

QA Evaluation Metrics¶

Metric	How to Measure
Log relevance precision	% of attached logs actually relevant to the bug
PII leakage rate	Must be zero — automated scan
Enrichment completeness	% of required fields populated without manual input
Dev team time-to-understand	Before/after enrichment: time for dev to reproduce

Agent 4 — API Response Validation Agent¶

What It Does¶

Monitors API responses in real time — flagging anomalies in payload structure, field types, response times, and status codes. Alerts before users notice and before on-call engineers get paged at 2am.

Catches: Breaking schema changes, silent nulls, latency regressions, unexpected status codes.

What It Monitors¶

For each API response:
✓ Status code matches contract (200, 201, 4xx as expected)
✓ Response schema matches OpenAPI spec (no missing/extra fields)
✓ Field types correct (no string where int expected)
✓ Required fields present (no unexpected nulls)
✓ Response time within SLA (p95 < threshold)
✓ Payload size within bounds (no runaway responses)
✓ Pagination structure correct (next/prev links, total count)
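A few of these checks can be sketched for a single response. This is a minimal illustration, assuming the contract has already been reduced to a hypothetical `CONTRACT` dict mapping each field to an expected type and a required flag. A real agent would derive this from the OpenAPI spec, and would evaluate latency as a p95 over many calls instead of the per-response limit used here.

```python
# Hypothetical, hand-written contract for illustration only.
CONTRACT = {
    "expected_status": {200},
    "max_latency_ms": 500,
    "fields": {
        "id": (int, True),
        "status": (str, True),
        "shipping_postcode": (str, False),  # legitimately nullable
    },
}


def validate_response(status: int, latency_ms: float, body: dict,
                      contract: dict = CONTRACT) -> list[str]:
    """Return a list of human-readable issues; an empty list means the response conforms."""
    issues = []
    if status not in contract["expected_status"]:
        issues.append(f"unexpected status {status}")
    if latency_ms > contract["max_latency_ms"]:
        issues.append(f"latency {latency_ms}ms exceeds {contract['max_latency_ms']}ms")

    fields = contract["fields"]
    for name, (expected_type, required) in fields.items():
        value = body.get(name)
        if value is None:
            if required:
                issues.append(f"missing required field {name}")
            continue
        # bool is a subclass of int in Python, so reject it explicitly for int fields.
        wrong_type = not isinstance(value, expected_type) or (
            expected_type is int and isinstance(value, bool)
        )
        if wrong_type:
            issues.append(
                f"{name}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
    for name in body:
        if name not in fields:
            issues.append(f"unexpected field {name}")
    return issues
```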

Alert Output Example¶

ANOMALY DETECTED: /api/orders/{id}
──────────────────────────────────────
Type:      Schema drift
Endpoint:  GET /api/orders/12345
Field:     shipping_address.postcode
Issue:     Expected string, received null (was populated 99.8% of calls)
Since:     Build 2844 (deployed 14:30 today)
Frequency: 234 occurrences in last 10 minutes
Action:    Review Order Service PR #892 merged at 14:25

Key Design Considerations¶

Contract-first — validates against OpenAPI/Swagger spec, not hardcoded expectations
Statistical anomaly detection — flags unusual nulls, not all nulls (some fields are legitimately nullable)
Deployment correlation — links anomaly onset to recent deploy/commit
Noise suppression — known acceptable deviations suppressed to avoid alert fatigue
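The distinction between unusual nulls and all nulls can be sketched with a one-sided z-test on the null rate. This is one simple way to do it, not necessarily the model a given agent uses: the `null_rate_anomaly` function, the threshold of 3 standard errors, and the window sizes in the usage note are assumptions for illustration.

```python
import math


def null_rate_anomaly(baseline_null_rate: float, window_nulls: int, window_total: int,
                      z_threshold: float = 3.0) -> bool:
    """Flag a field whose null rate in the current window is far above its baseline."""
    if window_total == 0:
        return False
    observed = window_nulls / window_total
    # Standard error of a proportion, assuming the baseline rate still holds.
    std_err = math.sqrt(baseline_null_rate * (1 - baseline_null_rate) / window_total)
    if std_err == 0:
        # Baseline of exactly 0 or 1: any increase above it is an anomaly.
        return observed > baseline_null_rate
    z_score = (observed - baseline_null_rate) / std_err
    return z_score > z_threshold
```

For a field that was populated 99.8% of the time (baseline null rate 0.002), 234 nulls in a hypothetical window of 1,000 calls is flagged, whereas a field that is null about 40% of the time by design is not flagged at 410 nulls in 1,000.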

QA Evaluation Metrics¶

Metric	How to Measure
Detection rate	% of real schema breaks caught before user report
False positive rate	% of anomaly alerts that turned out to be benign
Mean time to alert	Time from anomaly onset to alert firing
Latency SLA coverage	% of endpoints with defined and monitored SLAs

Agent 5 — Test Data Generation Agent¶

What It Does¶

Creates realistic, edge-case-rich test data sets on demand — seeded to the test database with one command. No more blocking sprints on "I need good test data."

Covers: Happy-path data, boundary values, nulls and empty strings, Unicode/special chars, max-length strings, referential integrity across tables.

Inputs¶

Schema definition (database schema, Pydantic models, OpenAPI spec)
Generation profile: { volume: 100, edge_case_ratio: 0.2, locale: "en-GB" }
Domain rules: { min_age: 18, email_must_be_unique: true }

Output Example¶

```python
# Generated test dataset: users table
[
    # Happy path
    {"id": 1, "name": "Alice Johnson", "email": "alice@example.com", "age": 34},

    # Edge cases (20% of set)
    {"id": 2, "name": "Ø̈", "email": "unicode@tëst.com", "age": 18},  # min age
    {"id": 3, "name": "A" * 255, "email": "maxlen@example.com", "age": 99},  # max name length
    {"id": 4, "name": "O'Brien-MacDonald", "email": "sql'@test.com", "age": 25},  # injection chars
    {"id": 5, "name": " Leading spaces ", "email": "ws@test.com", "age": 21},  # whitespace
]
```

Generation Profiles¶

Profile	Use Case
`happy_path`	Standard functional testing
`boundary`	Min/max values, just-inside/just-outside limits
`negative`	Invalid types, out-of-range values, missing required fields
`edge_case`	Unicode, special chars, SQL injection patterns, XSS strings
`volume`	Large datasets for performance/load testing
`referential`	Maintains FK integrity across related tables
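As an example of what the `boundary` profile might produce, the sketch below derives just-inside and just-outside values from a pair of limits. The `boundary_values` and `boundary_strings` helpers are hypothetical, and the upper age limit of 120 in the usage note is an assumption (the domain rules above only give `min_age: 18`).

```python
def boundary_values(minimum: int, maximum: int) -> dict[str, list[int]]:
    """Integers at and just inside the limits (valid) and just outside them (invalid)."""
    valid = sorted({minimum, minimum + 1, maximum - 1, maximum})
    return {"valid": valid, "invalid": [minimum - 1, maximum + 1]}


def boundary_strings(max_length: int, fill: str = "A") -> dict[str, list[str]]:
    """Strings just under, at, and just over a maximum length."""
    return {
        "valid": [fill * (max_length - 1), fill * max_length],
        "invalid": [fill * (max_length + 1)],
    }
```

Calling `boundary_values(18, 120)` gives the valid ages 18, 19, 119, and 120 and the invalid ages 17 and 121; the invalid values feed the `negative` profile as much as the `boundary` one.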

Key Design Considerations¶

Schema-driven — derives rules from the actual data model, not manual config
Referential integrity — parent records created before child records
Deterministic seed — same seed = same dataset (reproducible test runs)
PII-safe — generates fake-but-realistic data, never uses real customer data
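The deterministic-seed consideration can be shown with a private random number generator. This is a minimal sketch under stated assumptions: the `generate_users` function, the name pools, and the upper age of 90 are invented for the example, and the parameters mirror the `volume`, `edge_case_ratio`, and `min_age` settings from the inputs above. All values are synthetic, in keeping with the PII-safe rule.

```python
import random

# Hypothetical edge-case names, echoing the output example above.
EDGE_NAMES = ["O'Brien-MacDonald", " Leading spaces ", "A" * 255]
FIRST_NAMES = ["Alice", "Bilal", "Chen", "Dara"]
LAST_NAMES = ["Johnson", "Khan", "Li", "Murphy"]


def generate_users(volume: int, edge_case_ratio: float, seed: int,
                   min_age: int = 18, max_age: int = 90) -> list[dict]:
    """Generate synthetic users; the same seed always yields the same dataset."""
    rng = random.Random(seed)  # private RNG, so global random state is untouched
    edge_count = round(volume * edge_case_ratio)
    users = []
    for i in range(1, volume + 1):
        if i <= volume - edge_count:
            name = f"{rng.choice(FIRST_NAMES)} {rng.choice(LAST_NAMES)}"
            age = rng.randint(min_age, max_age)
        else:
            name = rng.choice(EDGE_NAMES)
            age = rng.choice([min_age, max_age])  # boundary ages for edge rows
        # Emails are unique by construction because they embed the row id.
        users.append({"id": i, "name": name, "email": f"user{i}@example.com", "age": age})
    return users
```

Recording the seed alongside each test run is what makes a failing run reproducible later.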

QA Evaluation Metrics¶

Metric	How to Measure
Schema conformance	% of generated records that pass schema validation
Edge case coverage	% of boundary conditions represented in generated set
Referential integrity	FK violations in generated dataset = 0
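The referential-integrity rule and its metric can be sketched with the standard library's `graphlib` module (Python 3.9+). The table names, the `seeding_order` and `fk_violations` helpers, and the row layout are assumptions for illustration.

```python
from graphlib import TopologicalSorter


def seeding_order(dependencies: dict[str, set[str]]) -> list[str]:
    """Order tables so that every parent table is seeded before its children.

    `dependencies` maps each table to the set of tables it references.
    """
    # static_order() yields a node only after all of its predecessors.
    return list(TopologicalSorter(dependencies).static_order())


def fk_violations(children: list[dict], fk_field: str,
                  parents: list[dict], pk_field: str = "id") -> int:
    """Count child rows whose foreign key matches no parent primary key."""
    parent_keys = {row[pk_field] for row in parents}
    return sum(1 for row in children if row[fk_field] not in parent_keys)
```

A generated dataset meets the metric when `fk_violations` returns 0 for every parent and child pair.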
Generation latency	Time to generate and seed 1,000 records

The 5 Agents as a System¶

Run together, these agents cover the full QA lifecycle:

Sprint Planning
      │
      ▼
[1] Test Case Generation Agent    ← User stories in → test cases out
      │
      ▼
[5] Test Data Generation Agent    ← Schema in → test database seeded
      │
      ▼
Test Execution (automated suite)
      │
      ▼
[2] Regression Triage Agent       ← Run results in → prioritised failures out
      │
      ▼
[3] Bug Report Enrichment Agent   ← Raw bug in → enriched ticket out → dev team
      │
      ▼
Production Monitoring
      │
      ▼
[4] API Validation Agent          ← Live traffic → anomalies flagged proactively

Interview Sound-Bites¶

"Test case generation agents save 3–4 hours per sprint. That's not just efficiency — it's redirecting QA effort from boilerplate generation to judgment-intensive exploratory and adversarial testing."

"Regression triage is where most QA teams lose the most time. An agent that separates genuine regressions from flaky noise before stand-up means the team starts the day with signal, not noise."

"Bug report enrichment closes the QA-dev handoff loop. The most common dev response to a bug report is 'can you add the logs?' An enrichment agent answers that question before it's asked."

"API validation agents catch schema drift between deploys — the class of bug that's invisible in unit tests, invisible in E2E tests, but immediately visible to users. Monitoring live traffic is the only reliable detection layer."

"Test data generation unblocks QA from the most common sprint dependency: 'I can't test this until I have good test data.' On-demand, schema-driven generation removes that dependency entirely."

Related Reference¶

Test Cases Generator AI Agent: full architecture of a production test-gen agent (Jira + RAG + LangGraph)
Autonomous QA Multi-Agent Pipeline: 7-agent pipeline implementing several of these patterns
LLM & Agent Evaluation Matrix: how to evaluate agent output quality
RAG Automation Testing Roadmap: relevant if the agents use RAG internally
