Review and Future of AI in QA


You've worked through the brief and the walkthrough. This final lesson does three things: gives you a self-assessment checklist to validate your own pilot design, suggests stretch goals for teams ready to push further, and looks at where AI in QA is heading over the next few years. The point isn't to predict the future — it's to make sure you're investing in skills that compound rather than ones the next model release makes obsolete.

Self-assessment checklist

Before you call your pilot design "done," make sure you can answer each of these honestly:

  • Can you state the specific pain point each tool addresses? (If you can't, you bought a solution looking for a problem.)
  • Do you have baseline metrics captured before any tool was introduced? (Without baselines, you can't tell if the pilot worked.)
  • Is each tool's kill criterion written down? (Tools that miss their kill criteria need to be dropped, not extended.)
  • Have you addressed the team-resistance question? (Engineers who feel threatened by AI don't adopt AI well.)
  • Is your governance one page or shorter? (Longer governance documents don't get read or followed.)
  • Does your prompt library have a named owner? (Unowned prompt libraries decay within three months.)
  • Can you describe the pilot to the CTO in two minutes? (If not, the framing isn't sharp enough yet.)
  • Is the rollback plan specific? (If AI tools cause a regression, what's the explicit response?)

If you stalled on any of these, that's the part to revisit before launch.
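A written kill criterion is easiest to honour when it is expressed as a check against the baseline rather than as a sentence open to reinterpretation. The sketch below is one possible shape, not a prescribed format: the `KillCriterion` class, the tool name, the metric, and every number are hypothetical placeholders for whatever your pilot design records.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KillCriterion:
    """One tool's written-down kill criterion (all values here are hypothetical)."""
    tool: str
    metric: str
    baseline: float         # captured before the tool was introduced
    required_change: float  # relative change, e.g. -0.20 means "must drop by at least 20%"


def verdict(criterion: KillCriterion, observed: float) -> str:
    """Return 'keep' if the observed metric meets the criterion, else 'drop'."""
    if criterion.baseline == 0:
        raise ValueError("baseline must be non-zero to compute a relative change")
    change = (observed - criterion.baseline) / criterion.baseline
    if criterion.required_change < 0:
        met = change <= criterion.required_change  # metric should go down
    else:
        met = change >= criterion.required_change  # metric should go up
    return "keep" if met else "drop"


# Hypothetical example: a self-healing tool must cut weekly flaky tests by 20%.
flake = KillCriterion(
    tool="self-healing locators",
    metric="flaky tests per week",
    baseline=40.0,
    required_change=-0.20,
)
print(verdict(flake, observed=30.0))  # 25% reduction
print(verdict(flake, observed=36.0))  # 10% reduction
```

With these made-up numbers the first call prints `keep` and the second prints `drop`. The design choice worth noting is that the function has no "extend the pilot" outcome, which mirrors the checklist's point that a tool missing its criterion is dropped.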

Reflection — what worked, what didn't

Once a real pilot runs, you'll be writing your own version of this section. The honest pattern from teams that have done this:

  • Coding assistants almost always pay back. This is the safest bet in the AI-for-QA toolbox. Teams that adopt one rarely go back.
  • Chat assistants pay back in unexpected ways. The biggest wins are often not in writing tests but in test design, triage, and onboarding new engineers.
  • Self-healing tools deliver less than the demos suggest. They reduce some maintenance burden, but rarely as much as marketing claims, and they introduce new silent-failure modes.
  • Visual AI is high-leverage where the surface area is big, low-leverage otherwise. Teams with consumer-facing UIs love it; teams with internal tools find the cost hard to justify.
  • MCP-based exploration is genuinely new. The teams that figure out how to integrate it into their release flow get a real advantage; the teams that treat it as a curiosity see no value.
  • AI test platforms (Mabl, Testim) are great for some teams and wrong for others. The decision turns on whether the team wants to outsource test infrastructure. Engineering-led teams typically don't.

Stretch goals — beyond the 90-day pilot

For teams that complete a successful pilot and want to push further:

  1. Build a custom internal MCP server. Expose FlexBank-specific test data, fixture creation, environment provisioning, and database queries to the AI. The AI can then drive your testing loop with deep knowledge of your domain.
  2. Train a fine-tuned model on your codebase. For very large QA organisations, a model fine-tuned on your conventions, your terminology, and your domain produces noticeably better suggestions than a generic model — at the cost of training infrastructure.
  3. Implement ChatOps-triggered AI testing. A Slack bot that lets non-engineers say "run an exploratory loop on the new checkout flow" and get a report back. Lowers the barrier to ad-hoc testing.
  4. Build a unified QA AI dashboard. Centralise metrics from Cursor usage, Applitools runs, Healenium healing reports, MCP token spend, and flake rate. Turns "we adopted AI" into "AI is moving these specific numbers."
  5. Contribute back to open-source AI testing tools. Healenium, Playwright MCP, and similar projects benefit from real-world feedback. Teams that contribute often get features prioritised.

These are real organisations' next moves, not hypotheticals. Each represents a year-plus of work for a serious team.
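As a rough illustration of the unified dashboard idea in item 4, the sketch below rolls weekly samples from several sources into one row per metric. It assumes each tool's numbers have already been exported into a common tuple shape; the source names, metric names, and values are invented for the example and do not reflect any vendor's export format.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical weekly samples as (source, metric, week, value) tuples.
SAMPLES = [
    ("ci", "flake_rate_pct", 1, 8.0),
    ("ci", "flake_rate_pct", 2, 6.0),
    ("ci", "flake_rate_pct", 3, 5.0),
    ("mcp", "token_spend_usd", 1, 120.0),
    ("mcp", "token_spend_usd", 2, 150.0),
    ("mcp", "token_spend_usd", 3, 180.0),
]


def rollup(samples):
    """Group samples by (source, metric); report first, latest, mean, and percent change."""
    series = defaultdict(list)
    for source, metric, week, value in samples:
        series[(source, metric)].append((week, value))

    rows = {}
    for key, points in series.items():
        points.sort()  # order by week so "first" and "latest" are meaningful
        first, latest = points[0][1], points[-1][1]
        change = None if first == 0 else round((latest - first) / first * 100, 1)
        rows[key] = {
            "first": first,
            "latest": latest,
            "mean": mean(value for _, value in points),
            "change_pct": change,
        }
    return rows


for (source, metric), row in rollup(SAMPLES).items():
    pct = "n/a" if row["change_pct"] is None else f"{row['change_pct']:+.1f}%"
    print(f"{source} {metric}: {row['first']} -> {row['latest']} ({pct})")
```

Reporting the change against the first recorded week is what lets the dashboard say which specific numbers moved; a real version would read the first value from the pre-pilot baseline rather than from the first sample.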

Where AI in QA is heading

Future of AI in QA:

  • Autonomous testing agents. AI explores apps independently, designs tests, runs them, and reports findings. Closer for greenfield, far for legacy.
  • Models that are smaller, faster, and fine-tuned. Better at domain-specific reasoning, and self-hostable for regulated environments.
  • Tests that detect their own quality issues. Auto-refactor for clarity and auto-prune redundant coverage.
  • Production-test integration. AI links production incidents to test gaps, auto-suggests new tests post-incident, and closes the observability-to-coverage loop.
  • Multimodal testing. Voice interfaces, video flows, AR/VR. AI handles modalities scripts can't.
  • Compliance automation. AI ensures coverage of regulatory requirements (GDPR, PCI-DSS, SOX, HIPAA) and produces audit-ready evidence trails.

Several of these are already shipping in early form. Autonomous testing agents — AI that plans, runs, and triages tests with minimal human input — work on simple greenfield apps and struggle on real legacy ones. Production-test integration is appearing in observability platforms. Compliance automation is being built into enterprise tooling. Multimodal testing is in early academic and research stages. The skills you build now apply across all of them — the principles outlast the specific products.
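To make the observability-to-coverage loop concrete, here is a minimal sketch of its mechanical core: given incidents tagged with the feature they hit and tests tagged with the features they exercise, list the features that had incidents but no tests. Everything here is assumed for illustration, including the tag scheme, the test names, and the incident IDs. In the tools described above, the AI's contribution would be inferring those feature tags from incident reports and drafting the missing tests, which this sketch does not attempt.

```python
# Hypothetical data: features each test exercises, and the feature each incident hit.
TEST_TAGS = {
    "test_login_happy_path": {"login"},
    "test_transfer_between_accounts": {"transfers"},
    "test_statement_download": {"statements"},
}

INCIDENTS = [
    {"id": "INC-1", "feature": "transfers"},
    {"id": "INC-2", "feature": "card-freeze"},
    {"id": "INC-3", "feature": "card-freeze"},
]


def coverage_gaps(test_tags, incidents):
    """Return {feature: [incident ids]} for incident features that no test is tagged with."""
    covered = set().union(*test_tags.values()) if test_tags else set()
    gaps = {}
    for incident in incidents:
        feature = incident["feature"]
        if feature not in covered:
            gaps.setdefault(feature, []).append(incident["id"])
    return gaps


print(coverage_gaps(TEST_TAGS, INCIDENTS))
```

With this sample data the only gap reported is `card-freeze`, with both of its incidents attached as evidence for why a new test is worth writing.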

Skills that matter more in an AI-augmented world

  • Strategic thinking. What to test, what to skip, where the risk lives. Becomes more valuable, not less.
  • System design awareness. Architectures that are testable. AI can write tests; it can't redesign your monolith.
  • Code review and judgement. Catching subtle issues in AI-generated code is the highest-leverage QA skill of the next few years.
  • Domain expertise. Knowing why a banking transaction matters, what regulations apply, where the customer pain lives. AI doesn't have this.
  • Communication. Explaining complex bugs to engineers, customers, and stakeholders. Pure human skill.
  • Tool curation. Choosing tools, integrating them, dropping them when they stop earning their keep.

Skills that matter less

  • Memorising syntax. AI handles it. Stop drilling on expect() matchers.
  • Hand-typing boilerplate. Page objects, fixtures, basic assertions — AI generates passable first drafts in seconds.
  • Generating simple test data. AI does it faster than you can type.
  • Mechanical regression test writing. AI scaffolds this; humans spend their time on harder problems.

If your career identity is built around the second list, the next few years will feel uncomfortable. If it's built around the first list, they'll be the most exciting years your career has had.

Where to go next

This course was deliberately broad: a survey rather than a deep dive. Adjacent areas worth exploring:

  • Playwright and Cypress courses on this site. Deep on the frameworks AI assistants help you write tests in.
  • API Testing Masterclass. Where AI is most effective today.
  • Test Automation Frameworks. Broader principles that apply regardless of AI.
  • CI/CD for QA Engineers. Where AI tooling slots into your delivery pipeline.
  • The Model Context Protocol spec (modelcontextprotocol.io). If MCP-based testing intrigues you, the spec itself is short and worth reading.
  • Prompt engineering literature. OpenAI, Anthropic, and Google all publish prompt-engineering guides — the principles transfer across providers.

A closing thought on career relevance

AI is not coming for QA jobs. It's coming for the parts of QA jobs that nobody enjoyed anyway — boilerplate, locator maintenance, manual triage, regression typing. The QA engineers who lean in, build prompt fluency, and master the review-and-curate skill set are the most valuable hires of the next few years.

The opportunity isn't theoretical and it isn't far away. It's this quarter, this team, this codebase, this pilot. The teams that move now have a meaningful head start on the ones that wait for the dust to settle. The dust isn't settling.

Good luck with your pilot. The course is over; the work is starting.
