Cost, Latency, and When NOT to Use AI

8 min read

The honest economics: AI-driven testing has costs that hand-written tests don't, and pretending otherwise is how teams end up with surprise bills and slow CI. This lesson is about putting numbers on those costs, naming the workloads where Playwright MCP is the wrong tool, and building the mental model that decides deterministic Playwright vs MCP by reflex. By the end you should be able to glance at a proposed use case and predict, within a factor of two, what it would cost in a month.

The headline pattern: AI shines on low-frequency, high-variability work and loses on high-frequency, low-variability work. The four chapters before this one were almost entirely the first quadrant. This lesson is about not crossing into the second.

What an MCP session actually costs

Three numbers compound:

  • Tokens per turn. A snapshot of a moderate page is a few KB of text — call it 1–3K tokens including overhead. A vision-mode screenshot can hit 5–15K image-equivalent tokens. Add the model's reasoning output on top.
  • Turns per session. A login flow is 5–10 turns. A bug-reproduction-with-variation session is 30–60. An exploratory charter can hit 100+.
  • Sessions per day across the team. Five testers each running three sessions a day is 75 sessions a week.

Multiply through and a typical mid-sized team's MCP usage runs $200–$1,500 a month at current Claude pricing. That number is fine if it replaces hours of manual work; it's wasteful if it's running in CI on every commit. What matters is the kind of work the sessions do, not how many there are: five exploratory charters a week deliver more value than five hundred regression runs.
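The multiplication above can be sketched as a small estimator. This is a rough model under stated assumptions, not a billing calculator: the price per million tokens is a placeholder rather than a quoted rate, and every turn is treated as costing the same number of tokens, which ignores re-sent history and the model's output.

```typescript
// Back-of-envelope MCP cost model: tokens per turn x turns per session x sessions.
// ASSUMPTION: usdPerMillionTokens is a placeholder; substitute your provider's current rate.
interface UsageEstimate {
  tokensPerTurn: number;       // e.g. 1000-3000 for a snapshot of a moderate page
  turnsPerSession: number;     // e.g. 5-10 for a login flow, 30-60 for bug reproduction
  sessionsPerWeek: number;     // e.g. 5 testers x 3 sessions x 5 days = 75
  usdPerMillionTokens: number; // hypothetical rate, for illustration only
}

function weeklyTokens(u: UsageEstimate): number {
  return u.tokensPerTurn * u.turnsPerSession * u.sessionsPerWeek;
}

function monthlyCostUsd(u: UsageEstimate, weeksPerMonth: number = 52 / 12): number {
  return (weeklyTokens(u) / 1000000) * u.usdPerMillionTokens * weeksPerMonth;
}

const example: UsageEstimate = {
  tokensPerTurn: 2000,
  turnsPerSession: 40,
  sessionsPerWeek: 5 * 3 * 5,
  usdPerMillionTokens: 3,
};

console.log(weeklyTokens(example)); // 6000000
console.log(monthlyCostUsd(example).toFixed(2));
```

Because the model leaves out re-sent history and output tokens, treat its result as a floor and expect real bills to land higher.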

The other cost is latency. Every tool call adds 1–3 seconds for the server round-trip plus the model's thinking time. A 10-step flow that runs in 2 seconds in Playwright therefore takes 10–30 seconds or more via MCP. Acceptable for interactive sessions; unacceptable for the hot path.
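The latency arithmetic can be written out as follows. It is a minimal sketch that assumes one tool call per step and a constant overhead per call; real sessions vary with page size and model load.

```typescript
// Rough wall-clock estimate for a flow driven through MCP tool calls.
// ASSUMPTION: each step is exactly one tool call with a fixed per-call overhead.
function mcpFlowSeconds(
  steps: number,
  overheadPerCallSeconds: number,
  baselineSeconds: number
): number {
  return baselineSeconds + steps * overheadPerCallSeconds;
}

const low = mcpFlowSeconds(10, 1, 2);  // 12
const high = mcpFlowSeconds(10, 3, 2); // 32
console.log(`10-step flow via MCP: ${low}-${high} s, versus 2 s in plain Playwright`);
```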

And then there's non-determinism. Same prompt, same app, occasionally different paths. Acceptable for exploration where you'd accept several plausible answers. Not acceptable for regression where you need exactly this answer, every time.

Five workloads where MCP is the wrong tool

  1. CI regression suites. Slow, costly, non-deterministic: three strikes against the hot path. Use deterministic Playwright code; let MCP generate the tests, but never let it be the tests.
  2. High-volume runs. A thousand test runs a day at $1 a session is $30,000 a month. The same suite as deterministic Playwright is essentially free. The economics aren't close.
  3. Performance testing. AI sessions add their own latency on top of the system under test. Numbers come out meaningless for SLA work. Reach for k6 or JMeter (covered in their own qa.codes courses).
  4. Headless cron jobs. A scheduled health check runs the same script every five minutes for years. Determinism is a feature, not a limitation. Write the cron job as a Playwright script.
  5. Tasks that take seconds with code. "Click this button on this URL" — three lines of Playwright. Crafting a prompt and waiting for tool calls is slower than just typing it.
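For a sense of scale, the "click this button on this URL" case from item 5 might look like the sketch below. It is typed against a minimal structural interface so it compiles without the Playwright package installed; a real Playwright `Page` has `goto` and `locator(...).click()` and fits this shape. The helper name, URL, and selector are hypothetical.

```typescript
// Minimal structural type covering the two Playwright calls used below.
// A real Playwright `Page` satisfies this shape, so the helper accepts it as-is.
interface PageLike {
  goto(url: string): Promise<unknown>;
  locator(selector: string): { click(): Promise<void> };
}

// Deterministic version of the task: navigate, locate the button, click it.
async function clickButtonOnUrl(
  page: PageLike,
  url: string,
  selector: string
): Promise<void> {
  await page.goto(url);
  await page.locator(selector).click();
}

// Hypothetical usage inside a Playwright test, with `page` supplied by the test fixture:
//   await clickButtonOnUrl(page, "https://example.com/settings", "#save");
```

The same helper runs identically on every invocation, which is the property the cron-job and CI items above depend on.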

Six workloads where MCP earns its bill

  1. One-off investigations. "Why does this one button not work for that one user?" Single session, sharp answer, gone.
  2. Exploratory charters. Boundary-value fuzzing, input-space probing — the work humans skip from boredom and the AI doesn't.
  3. Flow understanding. "How does this app actually work?" — for an unfamiliar codebase or a vendor product, an AI walkthrough is faster than reading docs.
  4. Test and POM scaffolding. Generate the first draft, harden by hand, commit once, run as deterministic code thereafter.
  5. Mysterious failure debugging. A flake that passes locally but fails in CI — the agent re-runs against the live env and writes back a verdict.
  6. Bug reproduction from vague descriptions. The triage workflow from Chapter 4 — the highest-ROI use case in the course.

The pattern: each of these is occasional and information-rich. You're paying for insight, not throughput.
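The frequency-versus-variability rule of thumb can be captured as a tiny classifier. This sketch encodes only the two quadrants the lesson names explicitly; how to treat the two mixed quadrants is an assumption here, so they are returned as a judgement call rather than a firm answer.

```typescript
type Level = "low" | "high";
type Recommendation = "mcp" | "deterministic-playwright" | "judgement-call";

// Low-frequency, high-variability work suits MCP; the opposite suits plain Playwright.
// ASSUMPTION: the mixed quadrants are left undecided rather than guessed.
function recommendTool(frequency: Level, variability: Level): Recommendation {
  if (frequency === "low" && variability === "high") return "mcp";
  if (frequency === "high" && variability === "low") return "deterministic-playwright";
  return "judgement-call";
}

console.log(recommendTool("low", "high")); // mcp (e.g. a one-off investigation)
console.log(recommendTool("high", "low")); // deterministic-playwright (e.g. a CI regression suite)
```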

Where each workload sits in the cost/value matrix

The whole adoption strategy is in that bottom-right cell. Anything else and the cost-benefit falls apart.

Cost-optimisation habits

If you've decided MCP is the right tool, four habits keep the bill predictable:

  • Stay in snapshot mode unless vision is genuinely required. A vision-mode screenshot runs 5–15K image-equivalent tokens against 1–3K for a text snapshot. Take screenshots opportunistically rather than leaving --vision always on.
  • Avoid huge pages. A 50-screen-long marketing page produces a giant snapshot that fills the context window for the rest of the session. If the assistant only needs the header, navigate to a narrower URL or scroll/scope the snapshot.
  • End sessions promptly. A long, drifting chat re-sends the entire history with each turn. Start a fresh chat for unrelated work; keep ongoing sessions tightly focused.
  • Generate once, run forever. The single highest-ROI move in this course is converting a successful AI session into a deterministic Playwright test. The session pays its bill once; the test pays back across every CI run thereafter.
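Why ending sessions promptly matters can be shown with a little arithmetic. The sketch below assumes every turn adds the same number of tokens and that the full history is re-sent on each turn with no caching or truncation, which is a simplification.

```typescript
// Total input tokens for a session in which every turn re-sends all earlier turns.
// ASSUMPTION: each turn adds the same number of tokens; nothing is cached or truncated.
function cumulativeInputTokens(turns: number, tokensPerTurn: number): number {
  // Turn k sends k * tokensPerTurn, so the total is tokensPerTurn * (1 + 2 + ... + turns).
  return (tokensPerTurn * turns * (turns + 1)) / 2;
}

const oneLongSession = cumulativeInputTokens(60, 2000);       // 3660000
const twoShortSessions = 2 * cumulativeInputTokens(30, 2000); // 1860000
console.log(oneLongSession, twoShortSessions);
```

Under these assumptions, splitting the same 60 turns into two fresh sessions roughly halves the input tokens, because cost grows with the square of session length.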

Quick budget arithmetic

Three back-of-envelope numbers worth carrying:

  • Bug-reproduction session, average flow: ~$0.30–$1. Triaging 30 vague tickets a week is roughly $9–$30 a week, replacing 15+ hours of manual triage.
  • Test-generation session for a new feature: ~$0.50–$2. A team generating 5 new tests a week spends $2.50–$10 a week, replacing 5–10 hours of authoring.
  • Vision-mode visual review across two URLs: ~$0.50–$2 per comparison. A daily staging-vs-prod check is $15–$60 a month.

The pattern: hundreds of dollars a month for hundreds of saved hours. The arithmetic only fails when teams use MCP for the workloads where deterministic code is far cheaper. Avoid those, and the ROI story is straightforward.
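The three estimates above are all the same calculation: a per-session cost range multiplied by a session count. A minimal sketch, working in cents so the arithmetic stays exact; the per-session figures come from the list above and the helper names are illustrative.

```typescript
// Scale a per-session cost range by a session count. Costs are in cents.
interface CentsRange {
  low: number;
  high: number;
}

function scale(perSession: CentsRange, sessions: number): CentsRange {
  return { low: perSession.low * sessions, high: perSession.high * sessions };
}

function formatUsd(r: CentsRange): string {
  return `$${(r.low / 100).toFixed(2)}-$${(r.high / 100).toFixed(2)}`;
}

console.log(formatUsd(scale({ low: 30, high: 100 }, 30))); // $9.00-$30.00 per week (30 bug-repro sessions)
console.log(formatUsd(scale({ low: 50, high: 200 }, 5)));  // $2.50-$10.00 per week (5 test-generation sessions)
console.log(formatUsd(scale({ low: 50, high: 200 }, 30))); // $15.00-$60.00 per month (daily visual check, 30 days)
```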

⚠️ Common mistakes

  • Treating MCP as free because "the test ran." Each session bills tokens whether or not it found a bug. Run pointless sessions long enough and the bill catches up. Track which sessions produce committed artefacts; cull workflows that don't.
  • Putting MCP in the CI hot path. "Let's just have the AI run our smoke tests on every push" is the most expensive mistake on this list. Per-commit MCP usage at scale runs into thousands of dollars a month and adds non-determinism to a suite whose entire job is to be deterministic.
  • Ignoring the generate once, run forever pivot. A workflow you run weekly should not be a weekly AI session — it should be a deterministic Playwright test the AI helped you write once. The pivot is the difference between MCP as a useful tool and MCP as a recurring tax.

🎯 Practice task

Audit a month of (real or hypothetical) MCP usage. 30 minutes.

  1. List every place your team currently uses — or is considering using — Playwright MCP. Be specific: per-commit smoke, daily staging-vs-prod, ad-hoc bug repro, weekly POM generation, etc.
  2. For each, mark the quadrant from the matrix above. High-frequency low-variability items are red flags.
  3. Estimate cost: tokens per turn × turns per session × sessions per month × price per token. Don't aim for accuracy; aim for order of magnitude. "$5 a month" and "$5,000 a month" are different decisions.
  4. For any item in the wrong quadrant, write down what the deterministic alternative looks like — "replace daily AI smoke with a Playwright cron job seeded from one AI session per quarter."
  5. Stretch: for one workload that is in the right quadrant, write a generate-once-run-forever plan: how would you turn this from a recurring AI session into a one-off generation step plus a deterministic test? File it as a backlog ticket and revisit at the next planning round.

The last lesson of the chapter is the security-and-privacy envelope — the constraints that should shape which environments and which credentials you point all of this at.
