Testing GraphQL APIs — API Testing Masterclass

You've seen what GraphQL is and how queries, mutations, and subscriptions are shaped. This lesson is about turning that knowledge into a concrete test plan: which assertions to write, what to negative-test, and the GraphQL-specific bugs you should actively look for. The mental model carries over from REST testing — auth, validation, errors, performance — but each shifts in subtle ways that, if you miss them, leave gaps.

The three-step assertion pattern

Every GraphQL test should answer three questions in this order:

Did the HTTP request succeed? (Status code 200.)
Did the GraphQL operation succeed? (errors is absent, null, or empty.)
Is the data correct? (Specific field assertions on data.)

Send POST /graphql with body { query, variables }.
HTTP 200? If not, parse error or transport issue.
errors empty? If errors present, fail the test with de…
Assert on data: check shape and values.

Skipping step 2 is the single biggest GraphQL testing mistake. Many test suites assert on the data without ever checking errors, so they happily report success on a partially-failed response.

A reusable helper covers the first two steps:

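import requests

GRAPHQL_URL = "https://api.example.com/graphql"  # placeholder: your endpoint
token = "..."  # placeholder: a valid bearer token


def gql(query: str, variables: dict | None = None):
    """POST a GraphQL operation, fail on HTTP or GraphQL errors, return data."""
    response = requests.post(
        GRAPHQL_URL,
        json={"query": query, "variables": variables or {}},
        headers={"Authorization": f"Bearer {token}"},
        timeout=5,
    )
    # Step 1: the HTTP request itself succeeded.
    assert response.status_code == 200, f"HTTP {response.status_code}: {response.text}"
    body = response.json()
    # Step 2: the GraphQL operation reported no errors.
    if body.get("errors"):
        raise AssertionError(f"GraphQL errors: {body['errors']}")
    # Step 3 (asserting on the data) is left to the calling test.
    return body["data"]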

Tests then call data = gql("{ user(id: 42) { email } }") and assert against the parsed data.

Testing queries

A query test should cover:

Happy path — valid arguments, expected fields populated.
Field selection — request a subset of fields, verify only those come back.
Nested data — request user { orders { items } }, verify deep structure.
Arguments — user(id: 42) returns user 42; user(id: 43) returns user 43.
Empty results — user(id: 999999) returns data.user: null (not an error in most schemas).
Invalid field — { user(id: 42) { nonExistentField } } → 400 or 200 with errors describing the unknown field.
Wrong argument type — user(id: true) → validation error.
Required argument missing — user { name } (no id) → validation error.
Authorisation — anonymous request to a protected query → error with code: UNAUTHENTICATED.

A subtlety: data.user: null and an errors entry mean different things. In GraphQL, null on its own means "the field resolved successfully and the value happens to be null." An entry in errors means the field couldn't be resolved. Treat them differently in your assertions.
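The distinction can be made explicit in a small helper. The sketch below is one way to do it, assuming the usual response shape of a data object plus an optional errors list whose entries may carry a path. The name field_outcome and its three labels are illustrative, not part of any library.

```python
def field_outcome(body: dict, field: str) -> str:
    """Classify a top-level field as 'value', 'null', or 'error'.

    Assumes the usual GraphQL response shape: a "data" object plus an
    optional "errors" list whose entries may carry a "path".
    """
    for err in body.get("errors") or []:
        path = err.get("path") or []
        # An error whose path starts at this field means it failed to resolve.
        if path and path[0] == field:
            return "error"
    data = body.get("data")
    # No data at all (e.g. a validation failure): the field never resolved.
    if data is None or field not in data:
        return "error"
    if data[field] is None:
        return "null"  # resolved successfully; the value is null
    return "value"
```

A "user not found" test can then assert field_outcome(body, "user") == "null", while an authorisation test asserts "error". Both need the raw response body rather than the return value of gql().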

Testing mutations

Mutations need the same rigour as REST POST/PUT/DELETE endpoints:

Happy path — valid input → mutation succeeds, returned fields match.
Missing required input → validation error before the mutation runs.
Invalid input values → resolver-level error (e.g. duplicate email → code: CONFLICT).
Authentication — no token → unauthenticated error.
Authorisation — token with wrong scope/role → forbidden error.
Idempotency — calling the same mutation twice. Does it create two records, or detect the duplicate?
Side effects — verify the change actually happened (DB read or follow-up query).

The "follow-up query" pattern is not unique to GraphQL, but it is especially cheap here:

data = gql(
    "mutation Create($input: UserInput!) { createUser(input: $input) { id } }",
    variables={"input": {"name": "Alice", "email": "alice@test.com"}}
)
new_id = data["createUser"]["id"]
 
data = gql("query Get($id: ID!) { user(id: $id) { email } }", {"id": new_id})
assert data["user"]["email"] == "alice@test.com"

Two operations, end-to-end verification, all over the same /graphql endpoint.

Errors in the response body

A typical GraphQL error response:

{
  "data": { "user": null },
  "errors": [
    {
      "message": "User not found",
      "path": ["user"],
      "extensions": { "code": "NOT_FOUND" }
    }
  ]
}

Each error has a message, a path indicating which field in the query failed, and extensions holding structured metadata (often an error code). When asserting on errors, prefer the extensions.code over the human message — codes are stable; messages change wording.

errors = body.get("errors", [])
assert len(errors) == 1
assert errors[0]["extensions"]["code"] == "NOT_FOUND"

A frequent cause of confusion: a GraphQL response can have both data and errors populated. If your query asks for ten things and three fail, the response includes the seven that succeeded plus three error entries. Test for this partial success explicitly when it matters.
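One way to test partial success is to split the response into the fields that resolved and the fields that failed. The sketch below assumes each error carries a path and an extensions.code. The helper name split_partial, the sample fields (me, orders), and the SERVICE_UNAVAILABLE code are hypothetical.

```python
def split_partial(body: dict) -> tuple[dict, dict]:
    """Split top-level fields into (succeeded, failed).

    `failed` maps a field name to the list of error codes reported under it.
    Errors without a "path" are grouped under the key None.
    """
    failed: dict = {}
    for err in body.get("errors") or []:
        path = err.get("path") or [None]
        code = (err.get("extensions") or {}).get("code")
        failed.setdefault(path[0], []).append(code)
    data = body.get("data") or {}
    succeeded = {k: v for k, v in data.items() if k not in failed}
    return succeeded, failed


# Hypothetical response: one field resolved, one failed.
sample = {
    "data": {"me": {"id": "1"}, "orders": None},
    "errors": [
        {
            "message": "Orders service unavailable",
            "path": ["orders"],
            "extensions": {"code": "SERVICE_UNAVAILABLE"},
        }
    ],
}
succeeded, failed = split_partial(sample)
assert succeeded == {"me": {"id": "1"}}
assert failed == {"orders": ["SERVICE_UNAVAILABLE"]}
```

Asserting on both halves catches two bugs at once: a failure that wipes out data that should have survived, and a failure that is silently swallowed.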

Introspection

GraphQL servers expose a meta-query that returns the entire schema:

query {
  __schema {
    types { name kind }
  }
}

Useful in development; risky in production. Many teams disable introspection on production to make API surface reconnaissance harder for attackers. Worth a test:

# A timeout keeps the test from hanging if the endpoint is unreachable.
response = requests.post(prod_url, json={"query": "{ __schema { types { name } } }"}, timeout=5)
assert response.json().get("errors"), "Introspection should be disabled in production"

In staging or development, the opposite assertion may apply — confirm introspection works so the team can debug schema issues.

N+1 query risk

GraphQL's flexibility lets a client ask for users { posts { comments } } in one request. A naive backend implementation issues:

1 query to fetch the users.
1 query per user to fetch their posts.
1 query per post to fetch its comments.

For 100 users with 10 posts each, that's 1 + 100 + 1,000 = 1,101 database queries to satisfy a single GraphQL request. Backend developers typically defend against this with a batching layer (DataLoader). Tests can detect when the defence is missing:

Run the query against a test database with logging enabled.
Count the SQL queries triggered.
Assert "fewer than N" — typically 5-10 — for a query that should fan out widely.

If you don't have DB-level instrumentation, response time is a usable proxy: an N+1 explosion shows up as a 5-30× latency increase on nested queries.
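The query-counting approach can be sketched with Python's built-in sqlite3 module, whose set_trace_callback hook reports each SQL statement as it runs. Everything below is a stand-in: the in-memory database replaces the test database, and naive_fetch and batched_fetch imitate what resolvers without and with a batching layer would do. Against a real server you would wrap the GraphQL request instead.

```python
import sqlite3


def count_queries(conn: sqlite3.Connection, work) -> int:
    """Run work(conn) and return how many SQL statements it executed."""
    statements = []
    conn.set_trace_callback(statements.append)
    try:
        work(conn)
    finally:
        conn.set_trace_callback(None)  # stop tracing
    return len(statements)


def naive_fetch(conn):
    # One query for the users, then one more per user: the N+1 pattern.
    for (user_id,) in conn.execute("SELECT id FROM users").fetchall():
        conn.execute("SELECT id FROM posts WHERE user_id = ?", (user_id,)).fetchall()


def batched_fetch(conn):
    # One query for the users, one for all of their posts.
    ids = [row[0] for row in conn.execute("SELECT id FROM users").fetchall()]
    marks = ",".join("?" * len(ids))
    conn.execute(f"SELECT id FROM posts WHERE user_id IN ({marks})", ids).fetchall()


# Hypothetical fixture: 20 users with one post each.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER)")
conn.executemany("INSERT INTO users (id) VALUES (?)", [(i,) for i in range(1, 21)])
conn.executemany("INSERT INTO posts (user_id) VALUES (?)", [(i,) for i in range(1, 21)])
conn.commit()

naive = count_queries(conn, naive_fetch)
batched = count_queries(conn, batched_fetch)
assert batched < 5, f"expected a handful of queries, got {batched}"
assert naive > batched
```

The useful assertion is the "fewer than N" one: it stays green as the dataset grows only if the query count does not scale with the number of rows.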

Query depth and complexity limits

A malicious or buggy client can send a deeply nested query:

{
  user {
    friends {
      friends {
        friends {
          friends { id name }
        }
      }
    }
  }
}

Without limits, the server traverses an exponentially growing set. A defence layer (graphql-depth-limit, query complexity calculators) should reject deep or expensive queries before they run. As QA, the test is:

Send a deeply nested query past the documented limit.
Expect an error (for example QUERY_TOO_COMPLEX or a similar code, depending on the server) and a fast response (the server doesn't actually execute the query).

If the server runs the deep query to completion, you've found a denial-of-service vector worth flagging.
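Writing deep queries by hand gets tedious, so a generator helps. The sketch below builds the friends-of-friends query at any depth and includes a rough depth counter for sanity checks. Both function names are illustrative, and the user/friends fields are taken from the example above rather than from any particular schema.

```python
def nested_friends_query(depth: int) -> str:
    """Build { user { friends { ... { id name } } } } with `depth` friends levels."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    inner = "id name"
    for _ in range(depth):
        inner = f"friends {{ {inner} }}"
    return f"{{ user {{ {inner} }} }}"


def brace_depth(query: str) -> int:
    """Maximum brace nesting: a rough proxy for selection depth.

    Ignores braces inside string literals, so use it only on generated queries.
    """
    depth = deepest = 0
    for ch in query:
        if ch == "{":
            depth += 1
            deepest = max(deepest, depth)
        elif ch == "}":
            depth -= 1
    return deepest
```

A depth-limit test would send nested_friends_query(limit + 1), where limit is whatever the API documents, then assert that errors come back and that the response arrives quickly. Sending nested_friends_query(limit) as well confirms the boundary is not off by one.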

A worked test plan

For a User type with a createUser mutation and a user(id) query, the standing test set looks like:

Query — user(id):
  ✓ Valid id → data.user with all fields
  ✓ Subset selection → only requested fields
  ✓ Nested orders → deep shape
  ✓ Non-existent id → data.user is null, no errors
  ✓ Missing id arg → validation error
  ✓ Wrong type id → validation error
  ✓ Anonymous → UNAUTHENTICATED error

Mutation — createUser:
  ✓ Valid input → returns id
  ✓ Created user retrievable via user(id) query
  ✓ Missing email → validation error
  ✓ Duplicate email → CONFLICT error
  ✓ Anonymous → UNAUTHENTICATED
  ✓ Insufficient role → FORBIDDEN

Schema/security:
  ✓ Introspection disabled in production
  ✓ Excessive depth rejected with depth-limit error
  ✓ Response time on nested user.orders.items query under threshold

About fifteen tests per type. Parameterise where possible to keep maintenance low.

⚠️ Common mistakes

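Asserting only on the data field. A response with data: null and an errors array does fail a naive assert data["user"]["email"] == ... test, but with a TypeError (None is not subscriptable) that says nothing about what went wrong. Worse, a partially failed response can pass outright if the fields you assert on happened to resolve. Always check errors first.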
Skipping introspection tests in production. A leaked schema makes attacks easier. Verify it's disabled where it should be.
Accepting any extensions.code as fine. The server may return a generic INTERNAL_SERVER_ERROR for what should be a specific NOT_FOUND or VALIDATION_ERROR. Assert on the correct code.

🎯 Practice task

Build a small GraphQL test suite. 30-40 minutes.

Pick a public GraphQL API such as Countries or SpaceX. Use one that doesn't require auth so you can iterate fast; GitHub GraphQL is another option, but it needs a personal access token.
Write a gql() helper in your favourite language that posts a query, checks HTTP 200, raises on errors, and returns data.
Write three positive tests: a simple query, a query with variables, and a query with nested data.
Write three negative tests: unknown field, wrong argument type, missing required argument. Assert on the errors array's extensions.code where available.
Try an introspection query ({ __schema { types { name } } }). Note whether it works on this API.
Stretch: time a single-level query and a deeply-nested query. The nested one should be slower, sometimes dramatically. A slowdown out of proportion to the extra data returned is a possible N+1 signal.

You can now write meaningful tests against any GraphQL API. The final lesson of this chapter catalogues the GraphQL-specific bugs and pitfalls that surprise even experienced testers.