Working with JSON in Python

8 min read

JSON is the data format most web APIs speak and most test fixtures use. Python's standard library ships with a json module, so there is nothing to install: import json and you're working. JSON's six types map almost one-to-one onto Python's built-in types, which is why moving data between an API and a test script feels frictionless. This lesson covers parsing JSON strings, serialising Python data, reading and writing JSON files, navigating nested structures, and the small set of errors that come up in practice.

The four functions you'll actually use

import json

Four functions cover everything:

Function        Direction               Mnemonic
json.loads()    JSON string → Python    "load string"
json.dumps()    Python → JSON string    "dump string"
json.load()     JSON file → Python      (no s) works on a file object
json.dump()     Python → JSON file      (no s) writes to a file object

The s stands for "string": loads/dumps work with strings; load/dump work with file objects. Mixing them up is one of the most common mistakes when handling JSON in Python.
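To see the string/file split in one place, the sketch below runs all four functions on the same data. It uses io.StringIO as an in-memory stand-in for a real file, an assumption made only to keep the example self-contained; the record itself is sample data.

```python
import io
import json

record = {"id": 7, "name": "Alice"}

# String pair: dumps produces a str, loads consumes one.
text = json.dumps(record)
from_text = json.loads(text)

# File pair: dump writes to a file-like object, load reads from one.
# io.StringIO stands in for a file opened with open().
buffer = io.StringIO()
json.dump(record, buffer)
buffer.seek(0)  # rewind so load starts reading at the beginning
from_file = json.load(buffer)

print(type(text).__name__)                       # str
print(from_text == record, from_file == record)  # True True
```

Both routes end with the same Python dict; only the source and destination differ.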

Parsing a JSON string — json.loads

import json
 
raw = '{"id": 7, "name": "Alice", "roles": ["admin", "tester"]}'
data = json.loads(raw)
 
print(type(data))       # <class 'dict'>
print(data["name"])     # Alice
print(data["roles"][0]) # admin

json.loads parses a JSON string and returns Python objects — a dict here, with a nested list. Once parsed, you index it like any dict or list.

Serialising — json.dumps

The other direction:

import json
 
result = {
    "name": "login_test",
    "status": "PASS",
    "duration_ms": 1240,
    "tags": ["smoke", "auth"]
}
 
print(json.dumps(result))

Output (one line):

{"name": "login_test", "status": "PASS", "duration_ms": 1240, "tags": ["smoke", "auth"]}

For a human-readable form, pass indent=2 (or 4):

print(json.dumps(result, indent=2))

Output:

{
  "name": "login_test",
  "status": "PASS",
  "duration_ms": 1240,
  "tags": [
    "smoke",
    "auth"
  ]
}

That's the format you'd save to disk for a test fixture or a CI artefact.
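If the saved file will be compared between runs, key order can matter. As a small sketch, json.dumps also accepts sort_keys=True, which is not covered above; the dict below is sample data with its keys deliberately out of alphabetical order.

```python
import json

result = {"status": "PASS", "name": "login_test", "duration_ms": 1240}

# sort_keys=True orders keys alphabetically, so two runs that build the
# dict in a different order still produce identical text.
stable = json.dumps(result, indent=2, sort_keys=True)
print(stable)

# The pretty-printed text still parses back to an equal dict.
print(json.loads(stable) == result)  # True
```

Whether sorted keys are worth it depends on the project; for fixtures that live in version control they tend to keep diffs small.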

Reading a JSON file — json.load

The same with open() pattern you'd use for any file:

import json
 
with open("fixtures/users.json", "r") as f:
    users = json.load(f)
 
print(f"Loaded {len(users)} users")

json.load(f) reads the entire file and parses it in one step. Notice it's load, not loads — the function operates on the file object, not on a string.

You could read the file into a string yourself and call loads, but load saves a step:

# Equivalent — but unnecessary
with open("fixtures/users.json", "r") as f:
    raw = f.read()
    users = json.loads(raw)

The with statement works with context managers, and the file object returned by open() is one. It guarantees the file is closed when the block ends, even if the JSON parser raises. Always use with open(...) for files; we'll cover context managers properly in chapter 4.
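The closing guarantee can be checked directly. The sketch below writes deliberately broken JSON to a temporary file (the content and the temporary path are made up for the demonstration), lets json.load fail inside a with block, and then inspects f.closed.

```python
import json
import os
import tempfile

# Create a throwaway file containing invalid JSON.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as tmp:
    tmp.write("{not valid json}")
    path = tmp.name

try:
    with open(path, "r") as f:
        json.load(f)  # raises json.JSONDecodeError
except json.JSONDecodeError:
    parse_failed = True
else:
    parse_failed = False

# The parser raised, yet the with block still closed the file.
print(parse_failed, f.closed)  # True True

os.remove(path)  # clean up the temporary file
```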

Writing a JSON file — json.dump

import json
 
results = [
    {"name": "login",    "status": "PASS"},
    {"name": "checkout", "status": "FAIL"}
]
 
with open("output/results.json", "w") as f:
    json.dump(results, f, indent=2)

Same with shape, mode "w" for write. json.dump(obj, f) writes the JSON straight into the file. indent=2 prettifies it; without it everything goes on one line.
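One practical detail: open(..., "w") creates the file but not any missing folders, so writing to output/results.json fails with FileNotFoundError if output/ does not exist yet. A minimal sketch of creating the folder first and then checking the round trip, using a temporary directory as a stand-in for the project root:

```python
import json
import tempfile
from pathlib import Path

results = [
    {"name": "login", "status": "PASS"},
    {"name": "checkout", "status": "FAIL"},
]

# A temporary directory stands in for the project root in this sketch.
with tempfile.TemporaryDirectory() as root:
    out_path = Path(root) / "output" / "results.json"
    # Create the output folder (and any parents) if it is missing.
    out_path.parent.mkdir(parents=True, exist_ok=True)

    with open(out_path, "w") as f:
        json.dump(results, f, indent=2)

    # Read the file back to confirm it holds the same data.
    with open(out_path, "r") as f:
        reloaded = json.load(f)

print(reloaded == results)  # True
```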

JSON ↔ Python type mapping

Every JSON type has a clear Python counterpart. This is the table you'll memorise within a week:

JSON                      Python
object: { "k": "v" }      dict: { "k": "v" }
array: [ 1, 2, 3 ]        list: [ 1, 2, 3 ]
string: "hello"           str: "hello"
number: 42 or 1.5         int (42) or float (1.5)
true / false              True / False (note the capitals)
null                      None

The two columns line up directly. Once you load JSON, you're working with normal Python collections — no special "JsonNode" class to navigate (looking at you, Java's Jackson).
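The mapping is easy to verify by parsing one document that contains every JSON type and printing what Python hands back. The key names in this sketch are arbitrary.

```python
import json

raw = (
    '{"obj": {"k": "v"}, "arr": [1, 2, 3], "text": "hello", '
    '"whole": 42, "frac": 1.5, "flag": true, "nothing": null}'
)
parsed = json.loads(raw)

# Print the Python type each JSON value was converted to.
for key, value in parsed.items():
    print(f"{key:8} -> {type(value).__name__}")

# Going the other way, True/False/None become true/false/null.
print(json.dumps({"flag": True, "nothing": None}))
```

The last line prints {"flag": true, "nothing": null}, which is the capitalisation difference from the table seen in the opposite direction.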

Navigating nested structures

JSON from an API is usually nested several levels deep. Each ["…"] step drills one more level; [i] indexes into a list:

import json
 
raw = '''
{
  "user": {
    "id": 7,
    "name": "Alice",
    "roles": ["admin", "tester"],
    "preferences": {"theme": "dark"}
  }
}
'''
 
data = json.loads(raw)
 
name        = data["user"]["name"]                # "Alice"
first_role  = data["user"]["roles"][0]            # "admin"
theme       = data["user"]["preferences"]["theme"] # "dark"
 
print(name, first_role, theme)

Output:

Alice admin dark

That's the same shape you'd get from any REST API: a dict at the top, with keys that may point at sub-dicts, lists, or scalars. The whole tree is just nested dicts and lists.
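Because the tree is only dicts and lists, ordinary loops and comprehensions work on it with no extra tooling. A short sketch, assuming a hypothetical response with a "users" list (the second user is invented sample data):

```python
import json

raw = '''
{
  "users": [
    {"id": 7, "name": "Alice", "roles": ["admin", "tester"]},
    {"id": 8, "name": "Bob", "roles": ["viewer"]}
  ]
}
'''
data = json.loads(raw)

# Each element of data["users"] is a plain dict, so a normal for loop works.
for user in data["users"]:
    print(user["name"], "->", ", ".join(user["roles"]))

# A comprehension can filter on a value nested inside each element.
admin_names = [u["name"] for u in data["users"] if "admin" in u["roles"]]
print(admin_names)  # ['Alice']
```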

Safe access — .get() chaining

Bracket access raises KeyError the moment a key is missing — sometimes you want that, sometimes you don't. For optional paths, chain .get() with sensible default fallbacks:

data = json.loads('{"user": {}}')
 
name = data.get("user", {}).get("name", "Unknown")
print(name)        # Unknown

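The trick is the {} mid-chain. Without it, data.get("user") returns None when the key is missing, and the next .get() fails with AttributeError: 'NoneType' object has no attribute 'get'. Default each level to an empty dict so the next call still works. One caveat: the default applies only when the key is absent. If the API sends "user": null, data.get("user", {}) still returns None; (data.get("user") or {}) covers that case as well.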

For deeper paths, this pattern grows ugly. Tools like glom (third-party) or dataclasses (chapter 5) tame deep access; for now, two or three levels of .get(..., {}) are fine.
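For illustration only, here is what a small hand-rolled helper for deep access could look like. The name dig and its signature are hypothetical, not part of the json module or of glom; it walks nested dicts and falls back to a default when any step is missing or is not a dict.

```python
import json
from typing import Any


def dig(data: Any, *keys: str, default: Any = None) -> Any:
    """Follow keys through nested dicts; return default if a step is missing."""
    current = data
    for key in keys:
        # Stop as soon as the current level is not a dict or lacks the key.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


data = json.loads('{"user": {"preferences": {"theme": "dark"}, "profile": null}}')

print(dig(data, "user", "preferences", "theme", default="light"))  # dark
print(dig(data, "user", "profile", "avatar", default="none"))      # none
print(dig(data, "account", "plan", default="free"))                # free
```

The second call shows the helper coping with a null in the middle of the path, the case that a plain .get(..., {}) chain does not handle.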

The errors you'll meet

Three errors come up in practice:

data = json.loads('{"name": "Alice"}')
 
# 1. KeyError — wrong / missing key
data["email"]                # KeyError: 'email'
 
# 2. TypeError — indexing the wrong type
data["name"]["first"]        # TypeError: string indices must be integers
 
# 3. JSONDecodeError — malformed JSON
json.loads("{invalid}")      # json.decoder.JSONDecodeError: …

Two practical habits help:

  • Wrap risky parsing in try / except (chapter 6 covers exception handling fully).
  • Use .get() for optional keys; reserve [] for keys you're confident must exist — its loud failure is then an alarm bell, not noise.
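Both habits fit in a few lines. In this sketch, parse_or_none is a hypothetical helper name; json.JSONDecodeError and its msg, lineno and colno attributes come from the standard library.

```python
import json


def parse_or_none(raw: str):
    """Return the parsed value, or None if raw is not valid JSON."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError as e:
        # The exception records what went wrong and where.
        print(f"Bad JSON at line {e.lineno}, column {e.colno}: {e.msg}")
        return None


good = parse_or_none('{"name": "Alice"}')
bad = parse_or_none("{invalid}")

# Optional key: .get() with a fallback instead of [] and a KeyError.
email = good.get("email", "no email on record")
print(email)  # no email on record
```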

A QA example — filter a fixture file

A complete, runnable script that reads users from a JSON file, picks out the admins, and writes them to a new file:

import json
 
# fixtures/users.json contains an array of users
with open("fixtures/users.json", "r") as f:
    users = json.load(f)
 
admins = [u for u in users if u.get("role") == "admin"]
 
with open("output/admins.json", "w") as f:
    json.dump(admins, f, indent=2)
 
print(f"Wrote {len(admins)} admin users to output/admins.json")

Three steps — read, filter, write — and you have a tiny ETL pipeline. The list comprehension from chapter 2 does the filtering; everything else is json.load and json.dump. No JSON library to install, no schema to declare upfront, no pom.xml to update.

Tip: if you need to inspect a JSON file without writing a script, the JSON Formatter utility on qa.codes takes raw JSON and returns a formatted, copyable view. Useful when an API response in your terminal is a single 2 KB line.

Comparing to Java and JavaScript

Task               Python                JavaScript                        Java (Jackson)
Parse a string     json.loads(s)         JSON.parse(s)                     mapper.readValue(s, Map.class)
Stringify a value  json.dumps(obj)       JSON.stringify(obj)               mapper.writeValueAsString(obj)
Read a file        json.load(open(...))  JSON.parse(fs.readFileSync(...))  mapper.readValue(file, Map.class)
Pretty-print       dumps(obj, indent=2)  JSON.stringify(obj, null, 2)      writerWithDefaultPrettyPrinter()

Python's API is the most uniform of the three — same names for serialise/deserialise, just s for "string" and no s for "file." JavaScript is also concise but works on strings only, so file reading is two steps. Java is the most ceremonial.

⚠️ Common mistakes

  • Mixing loads and load. json.loads takes a string; json.load takes a file object. Pass a string to json.load and you get AttributeError: 'str' object has no attribute 'read'; pass a file object to json.loads and you get TypeError: the JSON object must be str, bytes or bytearray. Remember: the trailing s is for strings.
  • Forgetting indent when writing for humans. json.dump(data, f) writes the entire object on a single line — fine for machine-to-machine, miserable to read in a fixture file. Pass indent=2 for any file a human will inspect.
  • Trying to dump types JSON doesn't know. json.dumps({"set": {1, 2, 3}}) raises TypeError: Object of type set is not JSON serializable. JSON has no set type. Convert sets to lists (list(my_set)), datetimes to ISO strings, etc., before serialising.
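For the last mistake, one option besides converting values by hand is the default parameter of json.dumps, which is called for any object the encoder cannot handle. A minimal sketch, where to_jsonable is a hypothetical helper name and the run data is made up:

```python
import json
from datetime import datetime


def to_jsonable(value):
    """Convert types JSON lacks; called by json.dumps for unknown objects."""
    if isinstance(value, set):
        return sorted(value)      # sets become sorted lists
    if isinstance(value, datetime):
        return value.isoformat()  # datetimes become ISO 8601 strings
    # Anything else is still an error, matching the encoder's own behaviour.
    raise TypeError(f"Object of type {type(value).__name__} is not JSON serializable")


run = {"tags": {"smoke", "auth"}, "started": datetime(2024, 1, 15, 9, 30)}
encoded = json.dumps(run, default=to_jsonable)
print(encoded)  # {"tags": ["auth", "smoke"], "started": "2024-01-15T09:30:00"}
```

Sorting the set gives a predictable order in the output; it assumes the set's members can be compared with each other, which holds for a set of strings like this one.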

🎯 Practice task

Read, transform, and write JSON. 25-30 minutes.

  1. Create fixtures/users.json containing an array of at least six user objects, each with name, email, role (mix of "admin", "tester", "viewer"), and active (true or false).
  2. Create filter_users.py. Use json.load to read the file into users.
  3. Print len(users) and the names of the first two using indexing.
  4. Use a list comprehension (chapter 2) to build admins = [u for u in users if u["role"] == "admin"].
  5. Use json.dump(admins, f, indent=2) to write them to output/admins.json. Confirm the file is readable and pretty-printed.
  6. Print the parsed dict in two ways: json.dumps(users[0]) (one line) and json.dumps(users[0], indent=2) (multi-line). Notice the difference.
  7. Demonstrate safe access: data.get("missing", "fallback") for a missing key, and the chained pattern users[0].get("metadata", {}).get("source", "unknown").
  8. Demonstrate one error: try json.loads("{nope}") inside a try / except json.JSONDecodeError as e: and print e.msg.
  9. Stretch: write a function def summarise(users: list) -> dict: that returns a dict like {"total": …, "admins": …, "active": …, "inactive": …}. Use json.dumps(summary, indent=2) to print the result. Bonus: persist it to output/summary.json.

You now have the full toolkit for working with the JSON every API and every fixture file uses. The next chapter zooms out from in-memory data to files and APIs — opening text and CSV files, calling HTTP endpoints with requests, and parsing the responses you get back.
