This lesson continues the ShopRight plan walkthrough. Part 1 covered Performance and Security. Here the focus is on Accessibility, Compatibility, Localisation, and Reliability — followed by a worked Resources section that consolidates every tool, environment, and team dependency.
Accessibility testing plan
Regulatory obligation
For ShopRight, WCAG 2.1 Level AA is not a recommendation: it is the recognised benchmark for meeting the company's legal duties under the UK Equality Act 2010. The accessibility plan must be designed to produce evidence of compliance, not just to fix the most obvious issues. Evidence matters: if a complaint is filed post-launch, ShopRight needs documentation showing that systematic testing was conducted.
Automated scanning (Month 1 onwards)
axe-core is integrated into the Playwright end-to-end test suite from Month 1. Every automated E2E test run includes an axe scan of the pages it traverses. Any new critical or serious violation introduced by a pull request fails the build.
Automated scanning covers approximately 30–40% of WCAG 2.1 AA criteria, specifically the machine-detectable ones: missing alt text, insufficient colour contrast, form inputs without labels, and a missing language attribute. The remaining 60–70%, including keyboard traps, focus order, and the quality of alt text, requires manual testing.
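The build gate described above can be sketched as a small filter over scan results. This is a minimal sketch, assuming the results arrive as an array shaped like axe-core's `violations` output (`id`, `impact`, `nodes`). The `newBlockingViolations` helper and the baseline of already-known rule ids are hypothetical: they are one way to approximate "new violation introduced by a pull request", not ShopRight's actual pipeline.

```typescript
// Subset of an axe-core violation that the gate needs.
type Impact = "minor" | "moderate" | "serious" | "critical";

interface AxeViolation {
  id: string;            // axe rule id, e.g. "image-alt"
  impact: Impact | null; // axe-core can report a null impact
  nodes: unknown[];      // affected elements
}

const BLOCKING_IMPACTS: ReadonlySet<Impact> = new Set<Impact>(["critical", "serious"]);

// Hypothetical helper: keeps violations that are critical or serious
// and whose rule id is not already in the accepted baseline.
function newBlockingViolations(
  violations: AxeViolation[],
  baselineRuleIds: ReadonlySet<string> = new Set<string>(),
): AxeViolation[] {
  return violations.filter(
    (v) =>
      v.impact !== null &&
      BLOCKING_IMPACTS.has(v.impact) &&
      !baselineRuleIds.has(v.id),
  );
}

// Illustrative scan result for one page.
const scanResult: AxeViolation[] = [
  { id: "color-contrast", impact: "serious", nodes: [{}] },
  { id: "region", impact: "moderate", nodes: [{}] },
  { id: "image-alt", impact: "critical", nodes: [{}, {}] },
];

const blocking = newBlockingViolations(scanResult, new Set(["color-contrast"]));
if (blocking.length > 0) {
  console.log(`Build should fail: ${blocking.map((v) => v.id).join(", ")}`);
}
```

In a real suite the failing branch would throw or fail the test rather than log, so that the pull request check goes red.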
Manual WCAG audit (Month 2)
A structured manual audit is conducted against all user-facing screens in Month 2, following the WCAG 2.1 AA success criteria checklist. Priority screens:
- Homepage and navigation
- Product listing and search results
- Product detail page
- Basket and checkout (all steps)
- Account creation and login
- Order history and tracking
- Mobile app equivalent screens (post-agency handover)
The audit uses the WebAIM WCAG 2.1 checklist as a reference. Each success criterion is marked Pass, Fail, or Not Applicable. Fails are filed as bugs with severity aligned to the WCAG impact: Level A failures are Critical; Level AA failures are High.
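The severity rule above is simple enough to encode, which keeps bug filing consistent across auditors. A minimal sketch, assuming audit results are recorded as rows; the `AuditRow` shape and the `bugsFromAudit` helper are hypothetical names used only for illustration.

```typescript
type WcagLevel = "A" | "AA";
type AuditOutcome = "Pass" | "Fail" | "Not Applicable";
type BugSeverity = "Critical" | "High";

interface AuditRow {
  criterion: string; // e.g. "1.4.3 Contrast (Minimum)"
  level: WcagLevel;
  screen: string;
  outcome: AuditOutcome;
}

// Level A failures are Critical; Level AA failures are High.
function severityFor(level: WcagLevel): BugSeverity {
  return level === "A" ? "Critical" : "High";
}

// Only failed criteria become bugs; Pass and Not Applicable are skipped.
function bugsFromAudit(rows: AuditRow[]): { title: string; severity: BugSeverity }[] {
  return rows
    .filter((row) => row.outcome === "Fail")
    .map((row) => ({
      title: `WCAG ${row.criterion} fails on ${row.screen}`,
      severity: severityFor(row.level),
    }));
}

const auditRows: AuditRow[] = [
  { criterion: "1.1.1 Non-text Content", level: "A", screen: "Product detail page", outcome: "Fail" },
  { criterion: "1.4.3 Contrast (Minimum)", level: "AA", screen: "Basket", outcome: "Fail" },
  { criterion: "2.1.2 No Keyboard Trap", level: "A", screen: "Checkout", outcome: "Pass" },
];

const auditBugs = bugsFromAudit(auditRows);
console.log(auditBugs);
```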
Screen reader testing (Month 2)
Two screen readers are tested: NVDA with Firefox on Windows (a widely used screen reader and browser combination among assistive technology users) and VoiceOver with Safari on macOS and iOS.
The test protocol is journey-based, not criteria-based — a screen reader user must be able to complete the same three journeys used in performance testing (browse and search, checkout, account actions) without sighted assistance. Any journey that cannot be completed end-to-end is a Critical finding.
Acceptance criteria
| Criterion | Target |
|---|---|
| axe-core Critical or Serious violations in CI | Zero (blocks merge from Month 1) |
| WCAG Level A violations at launch | Zero |
| WCAG Level AA violations at launch | Zero |
| Screen reader: checkout journey completable (NVDA) | Yes |
| Screen reader: checkout journey completable (VoiceOver iOS) | Yes |
| Colour contrast ratio (normal text) | ≥ 4.5:1 |
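The 4.5:1 target in the table comes from the WCAG 2.1 contrast ratio formula, (L1 + 0.05) / (L2 + 0.05), where L1 and L2 are the relative luminances of the lighter and darker colours. The sketch below works through that arithmetic for 8-bit sRGB colours; the function names are illustrative, and in practice axe-core performs this check automatically.

```typescript
type Rgb = [number, number, number]; // 8-bit sRGB channels, 0 to 255

// Converts one 8-bit sRGB channel to linear light, per the WCAG 2.1 definition.
function channelToLinear(value8bit: number): number {
  const c = value8bit / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance: weighted sum of the linearised channels.
function relativeLuminance([r, g, b]: Rgb): number {
  return (
    0.2126 * channelToLinear(r) +
    0.7152 * channelToLinear(g) +
    0.0722 * channelToLinear(b)
  );
}

// Contrast ratio between two colours, from 1 (identical) to 21 (black on white).
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// Level AA threshold for normal-size text.
function meetsAaNormalText(foreground: Rgb, background: Rgb): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}

const white: Rgb = [255, 255, 255];
console.log(contrastRatio([0, 0, 0], white).toFixed(2));      // black on white
console.log(meetsAaNormalText([118, 118, 118], white));       // #767676 on white
console.log(meetsAaNormalText([119, 119, 119], white));       // #777777 on white
```

The two greys differ by a single step per channel, yet one passes and the other fails, which is why contrast is worth checking numerically rather than by eye.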
Compatibility testing plan
Browser and device matrix
The ShopRight user analytics are not available yet (the legacy platform did not capture detailed browser data). The plan defaults to StatCounter UK data as a proxy until real analytics are available post-launch.
Based on UK market data, the tier-1 browsers are Chrome (desktop and Android), Safari (macOS and iOS), and Edge. Firefox and Samsung Internet are tier-2.
Test execution strategy — three tiers of coverage
Smoke (Every PR)
- Chrome latest (desktop)
- Safari latest (macOS)
- iOS Safari (iPhone 15 class)
- 3 critical paths only: Login → Browse → Checkout
- Runs in ~5 minutes via BrowserStack
Regression (Every PR, Chromium)
- Chrome latest (desktop)
- Chrome on Android (Pixel 8 class)
- Full Playwright test suite
- All user journeys
- All form interactions
- Fast: one browser only
Full Matrix (Weekly)
- Chrome, Edge, Firefox (desktop)
- Safari macOS, Safari iOS
- Android Chrome, Samsung Internet
- Tablet: iPad, Galaxy Tab
- Viewport drag: 320px → 1920px
- All Playwright tests × all browsers
This tiered strategy prevents the matrix from becoming a bottleneck. Running the full browser matrix on every pull request would slow down the pipeline unacceptably — smoke tests on the two most important browsers give fast feedback, while the weekly full matrix catches regressions before they accumulate.
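The three tiers can be captured as data so that the CI pipeline selects work by trigger instead of hardcoding browser lists in several places. This is a sketch under assumptions: the `Tier` shape, the `tiersFor` helper, and the trigger names are hypothetical, and the targets are copied from the tiers listed above.

```typescript
type Trigger = "pull_request" | "weekly";

interface Tier {
  name: string;
  trigger: Trigger;
  targets: string[]; // browser and device combinations
  scope: string;
}

const TIERS: Tier[] = [
  {
    name: "Smoke",
    trigger: "pull_request",
    targets: ["Chrome latest (desktop)", "Safari latest (macOS)", "iOS Safari"],
    scope: "3 critical paths",
  },
  {
    name: "Regression",
    trigger: "pull_request",
    targets: ["Chrome latest (desktop)", "Chrome on Android"],
    scope: "Full Playwright suite",
  },
  {
    name: "Full Matrix",
    trigger: "weekly",
    targets: [
      "Chrome (desktop)", "Edge (desktop)", "Firefox (desktop)",
      "Safari macOS", "Safari iOS", "Android Chrome",
      "Samsung Internet", "iPad", "Galaxy Tab",
    ],
    scope: "All Playwright tests",
  },
];

// Tiers that should run for a given pipeline trigger.
function tiersFor(trigger: Trigger): Tier[] {
  return TIERS.filter((tier) => tier.trigger === trigger);
}

// Number of browser/device targets a trigger fans out to.
function targetCount(trigger: Trigger): number {
  return tiersFor(trigger).reduce((sum, tier) => sum + tier.targets.length, 0);
}

console.log(tiersFor("pull_request").map((tier) => tier.name));
console.log(targetCount("pull_request"), targetCount("weekly"));
```

Counting targets per trigger makes the trade-off visible: every pull request fans out to a handful of targets, while the weekly run carries the wide matrix.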
Responsive testing protocol
Responsive testing is incorporated into the Playwright suite using per-test viewport configuration. Three viewport configurations are tested in the full regression suite:
- Mobile: 375×812 (iPhone 13 mini equivalent)
- Tablet: 768×1024 (iPad portrait)
- Desktop: 1280×800
In addition to the automated viewport tests, manual responsive testing is conducted during Month 2 using Chrome DevTools: dragging the viewport from 1400px to 320px slowly on each priority screen to catch layout breaks at transition points.
Acceptance criteria
| Criterion | Target |
|---|---|
| Critical paths pass on Chrome latest | Yes |
| Critical paths pass on Safari latest (macOS) | Yes |
| Critical paths pass on iOS Safari | Yes |
| No horizontal scroll at 375px viewport | Yes |
| Touch targets meet 44×44px minimum | Yes |
| No content hidden behind iOS safe area insets | Yes |
Localisation testing plan
Year 1 scope: i18n foundation only
ShopRight launches in English only. No translations exist in Year 1. The Year 1 localisation plan is entirely about i18n foundation — building the internationalisation infrastructure so that Year 2 translations can be added without touching the React components.
This is the right approach. Retrofitting i18n into a codebase that shipped hardcoded strings requires changing every string in the application. Starting from day one with externalised strings costs one engineer a few weeks. Doing it later costs the entire team several months.
i18n foundation requirements (verified in testing):
- All user-visible strings fetched from translation files via `t()` calls, with no hardcoded English in component JSX
- Dates formatted via `Intl.DateTimeFormat` with a locale parameter, never `new Date().toLocaleDateString()` without a locale
- Numbers and currencies formatted via `Intl.NumberFormat`, never manual decimal/comma insertion
- Currency display uses ISO 4217 codes via the formatting API, not hardcoded GBP symbols
- RTL layout support is not required for Year 1 (none of the Year 2 locales are RTL), but CSS logical properties should be used in new components from Month 1
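The date and currency requirements above can be illustrated with two thin wrappers around the standard `Intl` APIs. This is a minimal sketch: `formatOrderDate` and `formatPrice` are hypothetical helper names, and pinning `timeZone` to UTC is an assumption made so that the example is deterministic.

```typescript
// Locale is always an explicit parameter, never the runtime default.
function formatOrderDate(date: Date, locale: string): string {
  return new Intl.DateTimeFormat(locale, {
    day: "numeric",
    month: "long",
    year: "numeric",
    timeZone: "UTC", // assumption: fixed zone keeps the example reproducible
  }).format(date);
}

// Currency is passed as an ISO 4217 code; the API chooses symbol and separators.
function formatPrice(amount: number, locale: string, currencyCode: string): string {
  return new Intl.NumberFormat(locale, {
    style: "currency",
    currency: currencyCode,
  }).format(amount);
}

const orderDate = new Date(Date.UTC(2025, 0, 31));

console.log(formatOrderDate(orderDate, "en-GB"));
console.log(formatOrderDate(orderDate, "de-DE"));
console.log(formatPrice(1234.5, "en-GB", "GBP")); // "£1,234.50"
console.log(formatPrice(1234.5, "de-DE", "EUR"));
```

Because the locale and currency code are parameters, adding a Year 2 locale changes the arguments passed in, not the component code that calls these helpers.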
Pseudo-localisation sweep (Month 2)
A pseudo-localisation pass is run in Month 2: the English translation file is processed by a script that replaces ASCII characters with accented variants and pads strings by 40%. The result is loaded into the staging environment and every screen is reviewed for:
- Strings still appearing in unmodified English (hardcoded, not using `t()`)
- Buttons, labels, or navigation items that clip or overflow at expanded string length
- Translation keys appearing as raw strings (`user.profile.title` instead of a displayable string)
- Encoding errors: accented characters not rendering correctly
This test does not require any French, German, or Spanish content. It catches i18n infrastructure bugs now, before translations are commissioned.
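The pseudo-localisation script described above can be quite small. The sketch below is one possible version, not the script ShopRight uses: the character map, the `~` padding character, and the `{{placeholder}}` interpolation syntax are all assumptions chosen for illustration.

```typescript
// Accented stand-ins for a few ASCII letters (single code unit each).
const ACCENTED = new Map<string, string>([
  ["a", "á"], ["e", "é"], ["i", "í"], ["o", "ó"], ["u", "ú"],
  ["A", "Á"], ["E", "É"], ["I", "Í"], ["O", "Ó"], ["U", "Ú"],
  ["c", "ç"], ["n", "ñ"],
]);

// Replaces letters with accented variants and pads the string by `expansion`.
// Assumed {{placeholder}} tokens are left untouched so interpolation still works.
function pseudoLocalise(source: string, expansion = 0.4): string {
  // Splitting on a capturing group puts placeholders at odd indices.
  const parts = source.split(/(\{\{[^}]*\}\})/);
  const transformed = parts
    .map((part, index) =>
      index % 2 === 1
        ? part
        : Array.from(part, (ch) => ACCENTED.get(ch) ?? ch).join(""),
    )
    .join("");
  const padding = "~".repeat(Math.ceil(source.length * expansion));
  return transformed + padding;
}

// Applies the transform to every entry of a flat translation file.
function pseudoLocaliseFile(messages: Record<string, string>): Record<string, string> {
  const result: Record<string, string> = {};
  for (const key of Object.keys(messages)) {
    result[key] = pseudoLocalise(messages[key]);
  }
  return result;
}

const pseudoMessages = pseudoLocaliseFile({
  "basket.add": "Add to basket",
  "account.greeting": "Hello {{name}}",
});
console.log(pseudoMessages);
```

Any string on screen that shows neither accents nor trailing padding did not pass through the translation layer, which is exactly the hardcoded-string case the sweep is looking for.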
Year 2 preview: what testing will look like
When fr-FR, de-DE, and es-ES are added in Year 2, the localisation test plan expands to include:
- Native speaker review of translations before any locale ships (machine translation is not acceptable for release)
- Date and currency format verification for each locale (de-DE inverts decimal and thousands separators)
- German string expansion: German strings run 30–40% longer than English; any layout that is tight in English will overflow in German
- Email template localisation: transactional emails (order confirmation, shipping notification, password reset) must be localised alongside the UI — these are commonly missed
Acceptance criteria (Year 1)
| Criterion | Target |
|---|---|
| Hardcoded English strings in production build | Zero |
| Pseudo-localisation: strings failing to render via t() | Zero |
| Pseudo-localisation: elements clipping at 140% length | Zero |
| Date formatting using Intl.DateTimeFormat | All date displays |
| Currency formatting using Intl.NumberFormat | All currency displays |
Reliability testing plan
Failover and the RTO/RPO targets
ShopRight has defined an RTO of 30 minutes and an RPO of 15 minutes. These targets must be validated by actual tests, not assumed from the AWS Multi-AZ architecture documentation.
Multi-AZ is configured and theoretically failover should be automatic. The word "theoretically" is the reason for testing.
Failover test 1 — RDS primary instance failure: Terminate the primary RDS instance using AWS Fault Injection Simulator (FIS). Measure time from termination to the standby promotion completing and the application accepting writes. Acceptance criterion: application recovers and write traffic is processed within 30 minutes.
Failover test 2 — Availability zone failure simulation: Using AWS FIS, simulate an AZ becoming unavailable. Observe that traffic routes to instances in the remaining two AZs, health checks detect the failure, and user-facing error rates during the failover event do not exceed 1% for longer than 5 minutes.
Failover test 3 — Application server failure: Terminate two-thirds of the ECS task instances simultaneously. Observe that the remaining instances continue handling traffic, ECS replaces the terminated tasks, and auto-scaling adds capacity within 3 minutes.
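Both timing criteria in these failover tests reduce to arithmetic over timestamped observations. The sketch below assumes a synthetic probe that attempts a write every few seconds and a series of error-rate samples exported from monitoring; the `WriteProbe` and `ErrorRateSample` shapes and both function names are hypothetical.

```typescript
interface WriteProbe {
  timestampMs: number;
  writeSucceeded: boolean;
}

interface ErrorRateSample {
  timestampMs: number;
  errorRate: number; // fraction of failed requests, 0 to 1
}

// Time from fault injection to the first successful write after an outage.
// Returns 0 if no write ever failed, and null if writes never recovered.
function recoveryTimeMs(faultInjectedAtMs: number, probes: WriteProbe[]): number | null {
  const ordered = [...probes].sort((a, b) => a.timestampMs - b.timestampMs);
  const firstFailure = ordered.findIndex(
    (p) => p.timestampMs >= faultInjectedAtMs && !p.writeSucceeded,
  );
  if (firstFailure === -1) return 0;
  const recovered = ordered.slice(firstFailure).find((p) => p.writeSucceeded);
  return recovered ? recovered.timestampMs - faultInjectedAtMs : null;
}

// Longest continuous period with the error rate above the threshold.
// Assumes samples are already in time order.
function longestBreachMs(samples: ErrorRateSample[], threshold = 0.01): number {
  let longest = 0;
  let breachStart: number | null = null;
  for (const sample of samples) {
    if (sample.errorRate > threshold) {
      if (breachStart === null) breachStart = sample.timestampMs;
      longest = Math.max(longest, sample.timestampMs - breachStart);
    } else if (breachStart !== null) {
      // The breach ends at the first sample back under the threshold.
      longest = Math.max(longest, sample.timestampMs - breachStart);
      breachStart = null;
    }
  }
  return longest;
}

const RTO_MS = 30 * 60 * 1000;
const MAX_BREACH_MS = 5 * 60 * 1000;

// Illustrative data: writes fail between 15 s and 85 s after the fault.
const probes: WriteProbe[] = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95].map((s) => ({
  timestampMs: s * 1000,
  writeSucceeded: s < 15 || s > 85,
}));
const recovery = recoveryTimeMs(0, probes);
console.log(recovery !== null && recovery <= RTO_MS ? "RTO met" : "RTO missed");

const rates: ErrorRateSample[] = [0, 0.05, 0.03, 0.02, 0.005, 0].map((rate, minute) => ({
  timestampMs: minute * 60 * 1000,
  errorRate: rate,
}));
console.log(longestBreachMs(rates) <= MAX_BREACH_MS ? "Error budget met" : "Error budget exceeded");
```

Measuring from the moment of fault injection, rather than from the first observed error, keeps detection delay inside the measured recovery time.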
Backup restoration test
The RPO of 15 minutes requires point-in-time recovery (PITR) to be enabled on the RDS instance and validated. In Month 2, a backup restoration test is run:
- Take note of a specific data state (a known order ID, a known user record)
- Trigger an RDS snapshot
- Restore the snapshot to a separate temporary RDS instance
- Verify the known data state is present and consistent in the restored instance
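- Verify that the restored data is no more than 15 minutes behind the recorded state (the RPO), and record how long the restore took against the 30-minute RTO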
This is the most commonly skipped reliability test and the most important. A backup that has never been restored is unverified.
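When recording the result of the restoration drill, it helps to keep two measurements apart: the RPO bounds how much recent data may be lost, while the time the restore itself takes counts against the RTO. A minimal sketch of that evaluation follows; the `RestoreDrill` shape and the `evaluateDrill` helper are hypothetical, and the timestamps would come from the drill log.

```typescript
const MINUTE_MS = 60 * 1000;

interface RestoreDrill {
  incidentAtMs: number;          // simulated moment of data loss
  latestRestorableAtMs: number;  // most recent point the restore could reach
  restoreCompletedAtMs: number;  // restored instance available and verified
  knownRecordsPresent: boolean;  // the known order and user record were found
}

interface DrillResult {
  dataLossMinutes: number;
  restoreMinutes: number;
  rpoMet: boolean;
  rtoMet: boolean;
  passed: boolean;
}

// Compares the drill against the targets (defaults: RPO 15 min, RTO 30 min).
function evaluateDrill(drill: RestoreDrill, rpoMinutes = 15, rtoMinutes = 30): DrillResult {
  const dataLossMinutes = (drill.incidentAtMs - drill.latestRestorableAtMs) / MINUTE_MS;
  const restoreMinutes = (drill.restoreCompletedAtMs - drill.incidentAtMs) / MINUTE_MS;
  const rpoMet = dataLossMinutes <= rpoMinutes;
  const rtoMet = restoreMinutes <= rtoMinutes;
  return {
    dataLossMinutes,
    restoreMinutes,
    rpoMet,
    rtoMet,
    passed: rpoMet && rtoMet && drill.knownRecordsPresent,
  };
}

// Illustrative drill: 4 minutes of data lost, restore verified 27 minutes after the incident.
const drillResult = evaluateDrill({
  incidentAtMs: 100 * MINUTE_MS,
  latestRestorableAtMs: 96 * MINUTE_MS,
  restoreCompletedAtMs: 127 * MINUTE_MS,
  knownRecordsPresent: true,
});
console.log(drillResult);
```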
Soak test connection to reliability
The 8-hour soak test defined in the performance plan also serves as a reliability test. Metrics monitored during the soak specifically for reliability signals: RDS connection pool utilisation, Lambda/ECS memory growth over time, CloudWatch error rate trends, and queue depth if SQS is used for order processing. Any upward trend in these metrics that does not plateau within 2 hours is a reliability finding.
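"An upward trend that does not plateau within 2 hours" can be made testable by fitting a line to the samples taken after the 2-hour mark and comparing its slope with a tolerance. The sketch below uses an ordinary least-squares slope; the `MetricSample` shape, the function names, and the tolerance value are assumptions, since the plan does not define how steep a trend has to be before it counts.

```typescript
interface MetricSample {
  minutesIntoSoak: number;
  value: number; // e.g. memory in MB, or pool utilisation in percent
}

// Ordinary least-squares slope of value against time, in units per minute.
function slopePerMinute(samples: MetricSample[]): number {
  if (samples.length < 2) return 0;
  const meanX = samples.reduce((sum, s) => sum + s.minutesIntoSoak, 0) / samples.length;
  const meanY = samples.reduce((sum, s) => sum + s.value, 0) / samples.length;
  let covariance = 0;
  let variance = 0;
  for (const s of samples) {
    covariance += (s.minutesIntoSoak - meanX) * (s.value - meanY);
    variance += (s.minutesIntoSoak - meanX) ** 2;
  }
  return variance === 0 ? 0 : covariance / variance;
}

// True when the metric is still climbing after the plateau allowance.
// maxGrowthPerHour is an assumed tolerance, to be agreed per metric.
function isReliabilityFinding(
  samples: MetricSample[],
  maxGrowthPerHour: number,
  plateauAllowanceMinutes = 120,
): boolean {
  const afterAllowance = samples.filter((s) => s.minutesIntoSoak >= plateauAllowanceMinutes);
  return slopePerMinute(afterAllowance) * 60 > maxGrowthPerHour;
}

// Illustrative 8-hour series sampled every 30 minutes.
const sampleMinutes = Array.from({ length: 17 }, (_, i) => i * 30);
const leakingMemory = sampleMinutes.map((m) => ({ minutesIntoSoak: m, value: 500 + 0.5 * m }));
const plateauingMemory = sampleMinutes.map((m) => ({
  minutesIntoSoak: m,
  value: Math.min(500 + 2 * m, 700),
}));

console.log(isReliabilityFinding(leakingMemory, 5));    // steady growth for 8 hours
console.log(isReliabilityFinding(plateauingMemory, 5)); // warms up, then flat
```

Ignoring the first two hours avoids flagging normal warm-up behaviour such as caches filling and connection pools reaching their steady size.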
Acceptance criteria
| Criterion | Target |
|---|---|
| RDS failover: writes resume within | 30 minutes |
| AZ failure: error rate spike duration | < 5 minutes above 1% |
| Backup restoration test: data integrity verified | Yes |
| Backup restoration test: restored data within RPO (15 min) of recorded state | Yes |
| 8-hour soak: memory growth in API service | < 200 MB |
| 8-hour soak: error rate trend | Flat (no upward trend) |
Resources summary
A plan without a resources section is incomplete. Every tool and environment dependency collected across all eight plan areas:
| Resource | Purpose | When needed | Owner |
|---|---|---|---|
| k6 Cloud subscription | Load and spike tests | Month 2–3 | DevOps engineer |
| BrowserStack Automate | Cross-browser/device testing | Month 1 onwards | QA lead |
| OWASP ZAP | DAST scanning | Month 2 | QA lead |
| SonarQube | SAST in CI | Month 1 | DevOps engineer |
| Dependabot | Dependency CVE scanning | Month 1 | DevOps engineer |
| External pen test firm | Professional penetration test | Month 2 week 3 | QA lead (coordinate) |
| AWS FIS | Fault injection for failover tests | Month 2 | DevOps engineer |
| NVDA (Windows VM) | Screen reader testing | Month 2 | QA lead |
| Staging environment (production-equivalent) | All load, security, and failover tests | Available Month 2 | DevOps engineer |
| Seeded test database (100K products) | Realistic load test data | Month 2 before load tests | DevOps engineer |
The two highest-risk dependencies: the production-equivalent staging environment (without it, Month 2 tests cannot run as designed) and the seeded database (without realistic data volume, load test results are misleading). Both dependencies have the DevOps engineer as owner — verify availability before Month 2 begins, not after.