Walkthrough Part 1 — Performance and Security Plans

This walkthrough shows one way to approach the ShopRight non-functional test plan. It is not the only correct answer — reasonable engineers make different tool choices, prioritise differently, and draw scope boundaries differently. What matters is the thinking behind each decision: what risk is being mitigated, what constraint is being respected, and what trade-off is being made.

This lesson covers the Performance and Security sections of the plan. Part 2 covers Accessibility, Compatibility, Localisation, and Reliability.

The three-month testing timeline

Before diving into individual areas, the plan needs a spine: when does each type of testing happen? The timeline shapes everything else.

Month 1 — Foundation and Early Integration

Performance baselines against partial environment. Security: SAST and dependency scanning wired into CI. Accessibility: axe-core in CI, WCAG audit of first completed screens. Compatibility: smoke tests on tier-1 browsers. No load testing yet — staging environment not at production scale.

The timeline is not balanced: Month 2 carries the heaviest testing load because the full staging environment is not available until then. This is normal — but it creates a risk that issues found in Month 2 leave insufficient time for remediation before the Month 3 sign-off. The mitigation is to front-load whatever testing is possible in Month 1 (SAST, accessibility of early screens, smoke compatibility).

Performance testing plan

Scope

In scope: the web application. Out of scope in Month 2: the iOS and Android apps — the agency handover is in Month 2, and mobile performance testing requires native tooling (Instruments on iOS, Android Profiler) that is outside the current engagement scope. Flag this as a risk: app performance is untested before launch.

User journeys under load

Performance tests are not useful if they test the wrong things. For ShopRight, three journeys represent the highest business risk under load:

Journey 1: Browse and search — user searches for a product category, applies filters, scrolls through results, opens a product detail page. This covers the catalogue service, the search index, and CDN-served assets. Likely the most common journey by volume.

Journey 2: Add to basket and checkout — user adds a product, proceeds through the checkout flow, enters shipping details, and completes payment via Stripe. This journey touches the basket service, the order management system, and the Stripe API integration. Revenue-critical; any performance failure here directly costs money.

Journey 3: Account actions — user logs in, views their order history, accesses their wishlist. Lower volume than browsing but important for user retention; order history queries can become expensive at scale if not properly indexed.

Load profiles

Four test types, each with a defined user count and shape:

Load test — 10,000 virtual users (baseline daily traffic), sustained for 30 minutes after a 5-minute ramp-up. Validates that the system handles normal traffic without degradation. Run in Month 2 week 1.

Stress test — ramp from 10,000 to 60,000 virtual users over 20 minutes, hold for 10 minutes, ramp down. 60,000 is 20% above the peak target, providing headroom. Identifies the breaking point and whether the system fails gracefully. Run in Month 2 week 2.

Soak test — 15,000 virtual users (moderate sustained load), run for 8 hours. Identifies memory leaks, connection pool exhaustion, and gradual degradation that short tests miss. Run in Month 2 week 3.

Spike test — jump from 5,000 to 50,000 users in 2 minutes (Black Friday pattern), hold for 15 minutes, drop back. Validates auto-scaling response time and whether the system can absorb sudden demand. Run in Month 3 week 2.
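The profiles above map naturally onto k6's `stages` option, where each stage ramps the virtual-user count to a `target` over a `duration`. What follows is a minimal sketch, written as plain JavaScript data so it can be inspected outside k6. The names `profiles`, `minutes`, `totalMinutes` and `peakUsers` are hypothetical, and every stage marked "assumed" (initial ramp-ups and ramp-down lengths) is not specified in the plan.

```javascript
// Load profiles in the shape of k6's `stages` option:
// each stage ramps linearly to `target` virtual users over `duration`.
const profiles = {
  load: [
    { duration: '5m', target: 10000 },  // ramp-up
    { duration: '30m', target: 10000 }, // sustained baseline
  ],
  stress: [
    { duration: '5m', target: 10000 },  // assumed initial ramp to baseline
    { duration: '20m', target: 60000 }, // ramp 10,000 -> 60,000
    { duration: '10m', target: 60000 }, // hold
    { duration: '5m', target: 0 },      // ramp down (assumed length)
  ],
  soak: [
    { duration: '10m', target: 15000 }, // assumed ramp-up
    { duration: '8h', target: 15000 },  // sustained moderate load
  ],
  spike: [
    { duration: '5m', target: 5000 },   // assumed warm-up to 5,000
    { duration: '2m', target: 50000 },  // the spike itself
    { duration: '15m', target: 50000 }, // hold
    { duration: '2m', target: 5000 },   // drop back (assumed length)
  ],
};

// Convert a duration string such as '30m' or '8h' to minutes.
function minutes(duration) {
  const match = /^(\d+)([smh])$/.exec(duration);
  if (!match) throw new Error(`Unsupported duration: ${duration}`);
  const value = Number(match[1]);
  return { s: value / 60, m: value, h: value * 60 }[match[2]];
}

// Total run time of a profile in minutes, useful when booking test windows.
function totalMinutes(stages) {
  return stages.reduce((sum, stage) => sum + minutes(stage.duration), 0);
}

// Highest virtual-user count a profile reaches.
function peakUsers(stages) {
  return Math.max(...stages.map((stage) => stage.target));
}
```

In a k6 script the chosen array would be supplied as `export const options = { stages: profiles.load }`. Here `peakUsers(profiles.stress)` is 60,000, which is 1.2 times the 50,000 users reached in the spike test.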

Tool choice: k6 with k6 Cloud

k6 is chosen over JMeter for three reasons specific to ShopRight: scripts are written in JavaScript (matching the team's existing skills), k6 Cloud is already approved and eliminates infrastructure setup, and k6 has native Grafana integration for real-time metric dashboards that the DevOps engineer can observe alongside infrastructure metrics during the tests.

Test scripts live in the repository under tests/performance/ alongside application code, not in a separate system. This ensures they stay in version control and can be updated when the application changes.

Acceptance criteria

Metric	Target
p95 response time (browse/search)	< 2 seconds
p95 response time (checkout submission)	< 3 seconds
Error rate across all journeys	< 0.1%
Throughput degradation at peak vs baseline	< 10%
Memory growth during 8-hour soak	< 200 MB
Auto-scaling time on spike (0 to peak capacity)	< 3 minutes

These numbers are not arbitrary. The 2-second p95 is based on Google's published research linking page load time to conversion rate; the 3-second checkout threshold reflects that users tolerate slightly longer waits for payment processing. The 0.1% error-rate ceiling means 99.9% of requests succeed, in line with a 99.9% availability target.
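The response-time and error-rate targets can be written as k6 threshold expressions, so that a run fails automatically when one is breached. The sketch below assumes requests carry a hypothetical `journey` tag with values `browse` and `checkout`. Throughput degradation, memory growth and auto-scaling time come from infrastructure monitoring rather than from k6, so they are checked here by a hypothetical helper, `failedCriteria`, whose field names are also assumptions.

```javascript
// k6 threshold expressions for the criteria k6 can measure itself.
// The `journey` tag values are assumptions; durations are in milliseconds.
const thresholds = {
  'http_req_duration{journey:browse}': ['p(95)<2000'],
  'http_req_duration{journey:checkout}': ['p(95)<3000'],
  http_req_failed: ['rate<0.001'], // 0.1% error rate
};

// Criteria measured outside k6 (infrastructure metrics), as upper limits.
const limits = {
  throughputDegradationPct: 10,
  soakMemoryGrowthMb: 200,
  autoScalingMinutes: 3,
};

// Return the names of criteria whose measured value is not below its limit.
// A missing measurement counts as a failure.
function failedCriteria(measured) {
  return Object.keys(limits).filter((name) => !(measured[name] < limits[name]));
}
```

In a k6 script the first object would be passed as `options.thresholds`; the second check would run against whatever the monitoring stack reports after the test.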

Risks

The staging database must be seeded with a realistic product catalogue (100,000 products with typical query distributions) before load testing begins. Testing against an empty or small database produces misleading results. This is a dependency that must be resolved with the DevOps engineer before Month 2 week 1.
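One way to make seed data less uniform is to skew it so that a few categories hold most of the products. The sketch below uses a fixed-seed random generator so that every seeding run produces the same catalogue. The category names, the 1/rank weighting and the product fields are all assumptions for illustration, not ShopRight's schema.

```javascript
// Small deterministic PRNG (mulberry32-style) so seeding is reproducible.
function seededRandom(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // value in [0, 1)
  };
}

// Hypothetical categories, most popular first; rank k gets weight 1/k.
const categories = ['clothing', 'home', 'electronics', 'toys', 'garden'];
const weights = categories.map((_, index) => 1 / (index + 1));
const totalWeight = weights.reduce((sum, w) => sum + w, 0);

// Pick a category with probability proportional to its weight.
function pickCategory(random) {
  let r = random() * totalWeight;
  for (let i = 0; i < categories.length; i++) {
    r -= weights[i];
    if (r < 0) return categories[i];
  }
  return categories[categories.length - 1]; // guard against rounding
}

// Generate `count` products; the same seed always yields the same list.
function generateProducts(count, seed = 1) {
  const random = seededRandom(seed);
  const products = [];
  for (let id = 1; id <= count; id++) {
    products.push({
      id,
      name: `Product ${id}`,
      category: pickCategory(random),
      pricePence: 100 + Math.floor(random() * 9900), // 100 to 9,999
    });
  }
  return products;
}
```

Calling `generateProducts(100000)` would give the catalogue size named above; the rows would still need to be written to the staging database by whatever loader the team uses.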

Stripe's sandbox environment has rate limits — load test scripts must mock the Stripe payment call rather than hitting Stripe's test endpoint at scale. This is a known limitation: checkout performance under load excludes the Stripe API response time.
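A sketch of how the payment call might be swapped for a stub during load tests. The `loadTest` configuration flag and both function names are assumptions. The returned object mimics only a few fields of a Stripe PaymentIntent and is not Stripe's full response shape.

```javascript
// Stub that stands in for the Stripe call during load tests.
// Returns a simplified PaymentIntent-like object with no network I/O.
function stubCreatePaymentIntent({ amount, currency }) {
  return {
    id: 'pi_loadtest_stub', // hypothetical placeholder id
    object: 'payment_intent',
    amount,
    currency,
    status: 'succeeded',
  };
}

// Choose the payment implementation from configuration.
// `realCreatePaymentIntent` is whatever function wraps the real Stripe client.
function selectPaymentClient(config, realCreatePaymentIntent) {
  return config.loadTest ? stubCreatePaymentIntent : realCreatePaymentIntent;
}
```

Because the stub responds immediately, checkout timings measured this way are a lower bound on what users would see with the real Stripe round trip included.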

Security testing plan

Scope

In scope: the web application and its API endpoints. The Stripe integration is out of scope — card data never touches ShopRight servers (Stripe Elements handles it entirely), so the PCI obligation is significantly reduced. The ShopRight application's responsibility is to handle the Stripe payment intent correctly, not to secure raw card data.

Out of scope: mobile app security testing in Month 2 (same reason as performance — agency handover timing). Flag for post-launch assessment.

SAST and dependency scanning (Month 1)

Static analysis and dependency scanning must run in CI from the start of Month 1 — these do not require a running environment and provide immediate value.

SonarQube is configured against the Node.js API and the React frontend. Rules enabled: injection vulnerabilities (SQL, XSS), hardcoded secrets, insecure cryptography, missing authentication checks. Any finding rated Critical or High blocks the pull request merge.
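The merge rule itself is simple enough to state as code. In practice it would be enforced by the analysis tool's own gate; this sketch only makes the policy explicit. The finding objects and severity labels follow this plan's wording, not SonarQube's export format, and the function names are hypothetical.

```javascript
// Severities that block a pull request merge under the plan's policy.
const BLOCKING_SEVERITIES = new Set(['CRITICAL', 'HIGH']);

// Return only the findings that would block the merge.
function blockingFindings(findings) {
  return findings.filter((finding) =>
    BLOCKING_SEVERITIES.has(finding.severity.toUpperCase())
  );
}

// A pull request may merge only when no blocking findings remain.
function canMerge(findings) {
  return blockingFindings(findings).length === 0;
}
```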

Dependabot (GitHub's built-in dependency scanner) flags packages with known CVEs. High-severity CVE PRs are reviewed within 24 hours; critical CVEs within 4 hours. This is a team process agreement, not just a tool configuration.
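Because the review windows are a team agreement rather than a tool setting, it can help to compute the deadline explicitly, for example in a chat reminder or dashboard. A minimal sketch, assuming hypothetical function names and that each alert records when it was opened:

```javascript
// Agreed review windows in hours, keyed by CVE severity.
const REVIEW_HOURS = new Map([
  ['critical', 4],
  ['high', 24],
]);

// Return the review deadline, or null when no window has been agreed.
function reviewDeadline(severity, openedAt) {
  const hours = REVIEW_HOURS.get(severity.toLowerCase());
  if (hours === undefined) return null;
  return new Date(openedAt.getTime() + hours * 60 * 60 * 1000);
}

// True when the agreed review window has already passed.
function isOverdue(severity, openedAt, now) {
  const deadline = reviewDeadline(severity, openedAt);
  return deadline !== null && now.getTime() > deadline.getTime();
}
```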

DAST with OWASP ZAP (Month 2)

OWASP ZAP runs in authenticated scan mode against the staging environment in Month 2. The scan covers:

All API endpoints (derived from the OpenAPI specification)
Authentication and session handling
All form inputs (search, checkout, account creation, contact forms)

ZAP is configured with the credentials of a ShopRight user account so that it can reach authenticated-only endpoints. Scan results are exported and triaged: Critical and High findings must be remediated before the external pen test; Medium findings are scheduled for remediation with the engineering team.

External penetration test (Month 2 week 3)

A professional penetration testing firm is engaged for a one-week test against staging. The scope includes:

Authentication: brute force protection, account enumeration, password reset flow, session fixation
Authorisation: IDOR testing across order IDs, address IDs, and wishlist IDs
Injection: SQL injection, XSS, command injection in search and filter inputs
Business logic: coupon code abuse, negative quantity in basket, order total manipulation
GDPR mechanics: data export endpoint, data deletion request, cookie consent bypass

The external test is timed for Month 2 week 3 specifically so that the ZAP DAST findings (week 2) have been remediated before the pen testers arrive. Sending pen testers in before automated scanning is wasteful: they will report the same issues an automated scanner would have found, billed at consultant rates.

The pen test deliverable is a report with CVSS-scored findings. ShopRight's acceptance criterion: zero Critical findings and zero unmitigated High findings at launch. Medium findings require a documented remediation plan with a target date.
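The launch gate can be sketched as a function over the report's findings. The score bands are the CVSS v3.x qualitative severity ratings. The finding fields (`cvss`, `fixed`, `mitigated`, `remediationDate`) are hypothetical, and treating a fixed finding as never blocking is an interpretation of "at launch", not something the plan states.

```javascript
// CVSS v3.x qualitative severity rating for a base score (0.0 to 10.0).
function cvssSeverity(score) {
  if (!(score >= 0 && score <= 10)) {
    throw new RangeError(`CVSS score out of range: ${score}`);
  }
  if (score === 0) return 'None';
  if (score < 4) return 'Low';
  if (score < 7) return 'Medium';
  if (score < 9) return 'High';
  return 'Critical';
}

// Return the findings that would block launch under the acceptance criterion.
function launchBlockers(findings) {
  return findings.filter((finding) => {
    if (finding.fixed) return false; // assumption: fixed findings never block
    const severity = cvssSeverity(finding.cvss);
    if (severity === 'Critical') return true;             // must be zero
    if (severity === 'High') return !finding.mitigated;   // zero unmitigated
    if (severity === 'Medium') return !finding.remediationDate; // needs a dated plan
    return false;
  });
}
```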

Acceptance criteria

Area	Threshold
SonarQube Critical/High findings in CI	Zero (blocks merge)
Known CVEs (Critical) in production dependencies	Zero
External pen test: Critical findings at launch	Zero
External pen test: High findings at launch	Zero unmitigated
OWASP ZAP authenticated scan: Critical findings	Zero at launch
Authentication: lockout after failed attempts	Yes, configurable threshold

Risks

The external pen test requires the staging environment to be stable and fully integrated before the testers begin. If Month 2 is delayed or integration issues mean staging is not production-equivalent, the pen test either gets postponed (reducing Month 3 remediation time) or runs against a partial system (producing misleading results). This is the highest scheduling risk in the security plan.

GDPR testing requires working implementations of the data export and data deletion endpoints. If these are not built by Month 2, GDPR mechanics cannot be pen tested before launch. Flag as a dependency on the engineering team's delivery schedule.

The next lesson continues the walkthrough with Accessibility, Compatibility, Localisation, and Reliability — plus a worked example of the Resources section that lists every tool and environment dependency in one place.