Using Dataclasses for Test Data Models

8 min read

Most classes in test code exist to hold data: a TestUser, a Product, an ApiResponse, a TestResult. You write __init__ to assign every field, __repr__ so prints are readable, __eq__ so comparisons work — and the file fills up with boilerplate that adds no information. Python 3.7's @dataclass decorator generates all of that from a tiny field declaration. This lesson covers the basics, defaults (including the mutable-default trap), frozen=True, asdict, methods on dataclasses, @property, and when not to use them.

The boilerplate @dataclass saves

The hand-written class:

class TestUser:
    def __init__(self, name, email, role="tester", is_active=True):
        self.name = name
        self.email = email
        self.role = role
        self.is_active = is_active
 
    def __repr__(self):
        return (f"TestUser(name={self.name!r}, email={self.email!r}, "
                f"role={self.role!r}, is_active={self.is_active!r})")
 
    def __eq__(self, other):
        if not isinstance(other, TestUser):
            return NotImplemented
        return (self.name == other.name and
                self.email == other.email and
                self.role == other.role and
                self.is_active == other.is_active)

Fifteen lines of mostly mechanical code. Now the dataclass version:

from dataclasses import dataclass
 
@dataclass
class TestUser:
    name: str
    email: str
    role: str = "tester"
    is_active: bool = True
 
alice = TestUser("Alice", "alice@test.com", "admin")
print(alice)
# TestUser(name='Alice', email='alice@test.com', role='admin', is_active=True)
 
bob = TestUser("Alice", "alice@test.com", "admin")
print(alice == bob)        # True — equality compares every field

Five lines. Same __init__, same readable __repr__, same field-by-field __eq__, same constructor signature. The @dataclass decorator generated all the dunder methods from the type-annotated fields above.

Reading the syntax

Inside a @dataclass, every line of the form name: Type (optionally followed by = default) is a field. The type annotation is mandatory: that's how the decorator tells a field apart from a plain class attribute. A name assigned without an annotation stays an ordinary class-level attribute and is left out of the generated methods.
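One way to check what the decorator treats as a field is dataclasses.fields(). The sketch below is illustrative only; the retries attribute is a made-up name that is not part of the lesson's models.

```python
from dataclasses import dataclass, fields


@dataclass
class TestUser:
    name: str                  # annotated: a field
    role: str = "tester"       # annotated with a default: a field
    retries = 3                # hypothetical, no annotation: plain class attribute


field_names = [f.name for f in fields(TestUser)]
print(field_names)             # ['name', 'role']

user = TestUser("Alice")
print(user)                    # TestUser(name='Alice', role='tester')
print(user.retries)            # 3 (read from the class, not set by __init__)
```

Because retries is not a field, it does not appear in __init__, __repr__, or __eq__.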

Three legal shapes:

from dataclasses import dataclass, field
 
@dataclass
class TestResult:
    name: str                       # required field
    duration_ms: int = 0            # field with default
    tags: list = field(default_factory=list)  # mutable default — see below

Fields with defaults must come after fields without defaults — same rule as function parameters. Try the wrong order and Python raises TypeError: non-default argument 'name' follows default argument.
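The ordering rule can be seen with a deliberately broken class. This is a minimal sketch; BadResult is a hypothetical name, and the exact wording of the message may differ slightly between Python versions.

```python
from dataclasses import dataclass

error_message = None

try:
    @dataclass
    class BadResult:
        duration_ms: int = 0   # a default first...
        name: str              # ...then a required field: rejected by @dataclass
except TypeError as exc:
    # The error is raised while the class is being defined, before any instance exists.
    error_message = str(exc)
    print(error_message)
```

Moving name above duration_ms is enough to make the definition valid.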

The mutable-default trap — field(default_factory=list)

You can't write tags: list = [] in a dataclass. Python catches that at class-definition time and raises ValueError, because a single [] would be shared by every instance, so appending through one instance would change the list every other instance sees. (This is the same trap you saw with mutable function defaults in chapter 2.)

The right form uses field(default_factory=...):

from dataclasses import dataclass, field
 
@dataclass
class TestSuite:
    name: str
    test_cases: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

default_factory takes a callable. Python calls it once per __init__, creating a fresh empty list (or dict) per instance. For non-mutable defaults (int, str, bool, None), regular = 0 / = "" / = False is fine.
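Both halves of this can be checked in a few lines. The following is a rough sketch: BrokenSuite is a hypothetical class that exists only to trigger the error, and the suite and test names are invented.

```python
from dataclasses import dataclass, field

definition_failed = False

# A literal [] default is rejected as soon as the class is defined.
try:
    @dataclass
    class BrokenSuite:
        name: str
        test_cases: list = []
except ValueError as exc:
    definition_failed = True
    print(f"ValueError: {exc}")


@dataclass
class TestSuite:
    name: str
    test_cases: list = field(default_factory=list)


smoke = TestSuite("smoke")
regression = TestSuite("regression")
smoke.test_cases.append("test_login")

print(smoke.test_cases)        # ['test_login']
print(regression.test_cases)   # []  (each instance got its own list)
```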

Frozen dataclasses — immutable records

@dataclass(frozen=True) makes the instance immutable. Setting a field after construction raises FrozenInstanceError:

from dataclasses import dataclass
 
@dataclass(frozen=True)
class TestEnvironment:
    name: str
    base_url: str
    region: str
 
env = TestEnvironment("staging", "https://staging.api.example.com", "eu-west-1")
env.region = "us-east-1"        # FrozenInstanceError: cannot assign to field 'region'

Frozen dataclasses are perfect for value objects: config snapshots, API contract records, anything you want to pass around as a fact rather than a mutable bag. They're also hashable by default (as long as every field value is itself hashable), which means you can use them as dict keys or set members:

seen = set()
seen.add(TestEnvironment("staging", "...", "eu-west-1"))

Mutable dataclasses (the default) are not hashable: because the decorator generates __eq__, it also sets __hash__ to None, so calling hash() on an instance raises TypeError. Hashing a value that can change would corrupt sets and dicts.
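The difference shows up as soon as instances go into a set or dict. This is a small sketch; MutableEnvironment and the timeouts mapping are hypothetical and exist only for the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TestEnvironment:
    name: str
    base_url: str
    region: str


@dataclass
class MutableEnvironment:      # hypothetical, not frozen
    name: str


staging = TestEnvironment("staging", "https://staging.api.example.com", "eu-west-1")
same_staging = TestEnvironment("staging", "https://staging.api.example.com", "eu-west-1")

# Equal frozen instances hash the same, so the set keeps only one of them.
seen = {staging, same_staging}
print(len(seen))               # 1

# An equal instance also finds the same dict entry.
timeouts = {staging: 30}
print(timeouts[same_staging])  # 30

unhashable = False
try:
    hash(MutableEnvironment("staging"))
except TypeError as exc:
    unhashable = True
    print(exc)
```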

asdict and astuple — converting to plain data

Sometimes you need a dataclass as a dict (to send as JSON, for example). The standard library has helpers:

from dataclasses import dataclass, asdict, astuple
 
@dataclass
class TestUser:
    name: str
    email: str
    role: str = "tester"
 
user = TestUser("Alice", "alice@test.com", "admin")
 
asdict(user)
# {'name': 'Alice', 'email': 'alice@test.com', 'role': 'admin'}
 
astuple(user)
# ('Alice', 'alice@test.com', 'admin')

asdict recurses into nested dataclasses and converts them too. The result is a plain dict you can hand to json.dumps or requests.post(json=...).
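The recursion also reaches dataclasses held inside lists. The sketch below assumes a cut-down TestResult and a hypothetical TestRun with a run_id field; neither matches the fuller models used elsewhere in this lesson.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TestResult:
    name: str
    status: str


@dataclass
class TestRun:                 # hypothetical container for the example
    run_id: str
    results: list = field(default_factory=list)


run = TestRun("run-42", [TestResult("login", "PASS"), TestResult("checkout", "FAIL")])

payload = asdict(run)
print(payload)
# {'run_id': 'run-42', 'results': [{'name': 'login', 'status': 'PASS'},
#                                  {'name': 'checkout', 'status': 'FAIL'}]}

# Every value is now a plain str, list, or dict, so json.dumps accepts it.
body = json.dumps(payload)
print(body)
```

Fields that hold values json.dumps cannot serialise on its own (a datetime, for instance) would still need converting first.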

Methods and @property on dataclasses

A dataclass is still a class — it can have methods. They go after the field declarations:

from dataclasses import dataclass
 
@dataclass
class TestResult:
    name: str
    status: str
    duration_ms: int
 
    @property
    def duration_seconds(self) -> float:
        return self.duration_ms / 1000
 
    def is_slow(self, threshold_ms: int = 2000) -> bool:
        return self.duration_ms > threshold_ms
 
    def label(self) -> str:
        return f"{self.name}: {self.status} ({self.duration_seconds:.2f}s)"
 
 
r = TestResult("login", "PASS", 1240)
print(r.duration_seconds)        # 1.24  (property — no parens)
print(r.is_slow())               # False
print(r.label())                 # login: PASS (1.24s)

This is exactly where dataclasses shine — your data model gets the auto-generated __init__ and __repr__, plus you bolt on the small computed properties and predicates the rest of your code needs. It's the same shape as a regular class, just with the boilerplate gone.

When NOT to use a dataclass

Three cases where a dict or a regular class fits better:

  • The shape is dynamic. If you're parsing JSON whose schema varies (response.json() returning different keys per endpoint), a dict is more honest.
  • You need substantial logic in __init__. A dataclass's __init__ is auto-generated and just assigns each field. If construction needs side effects (open a session, read a file), use a regular class — or use __post_init__, a hook the dataclass calls after the auto-generated __init__ finishes.
  • You're modelling something with extensive behaviour, not data. A TestRunner that mostly does things (run tests, retry, report) is better as a regular class. A TestResult that mostly holds things is the dataclass case.
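For light setup or validation, __post_init__ is often enough to keep the decorator. The following is a sketch under assumptions: ALLOWED_STATUSES is a made-up set, and the rule that statuses are upper-cased is chosen purely for illustration.

```python
from dataclasses import dataclass

ALLOWED_STATUSES = {"PASS", "FAIL", "SKIP"}   # hypothetical set of valid statuses


@dataclass
class TestResult:
    name: str
    status: str
    duration_ms: int = 0

    def __post_init__(self):
        # Runs after the generated __init__ has assigned every field.
        self.status = self.status.upper()
        if self.status not in ALLOWED_STATUSES:
            raise ValueError(f"unknown status: {self.status!r}")


result = TestResult("login", "pass", 1240)
print(result.status)           # PASS

rejected = False
try:
    TestResult("login", "maybe")
except ValueError as exc:
    rejected = True
    print(exc)                 # unknown status: 'MAYBE'
```

Once construction needs to open connections or read files, a regular class with an explicit __init__ is usually easier to follow.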

Regular class vs @dataclass — same data, two shapes

Modelling TestUser — regular class vs @dataclass

Regular class — 15+ lines

  • Write __init__ that assigns every field
  • Write __repr__ for readable prints
  • Write __eq__ for value comparison
  • Add __hash__ if you want it in sets/dicts
  • Every new field touches three or four methods
  • Wins when construction needs real logic, not just assignment

@dataclass — 5 lines

  • @dataclass + name: type lines, that's it
  • __init__, __repr__, __eq__ generated automatically
  • frozen=True gives immutability and free __hash__
  • asdict() / astuple() converts to plain data
  • Adding a field is one line, and the dunders update themselves
  • Wins for stable-shape test models, fixtures, configs

Both produce equivalent runtime objects. The dataclass is what you should reach for by default; drop down to a regular class when construction needs more than mechanical assignment.

A worked example — three QA models

from dataclasses import dataclass, field, asdict
 
@dataclass
class TestUser:
    name: str
    email: str
    role: str = "tester"
    is_active: bool = True
 
    def email_domain(self) -> str:
        return self.email.split("@")[-1]
 
 
@dataclass(frozen=True)
class Product:
    sku: str
    name: str
    price: float
    in_stock: bool = True
 
 
@dataclass
class TestResult:
    name: str
    status: str
    duration_ms: int
    tags: list = field(default_factory=list)
 
    @property
    def duration_seconds(self) -> float:
        return self.duration_ms / 1000
 
    def is_slow(self, threshold_ms: int = 1000) -> bool:
        return self.duration_ms > threshold_ms
 
 
user = TestUser("Alice", "alice@test.com", "admin")
print(user)
print(user.email_domain())
 
product = Product("SKU-001", "Widget", 19.99)
print(product)
# product.price = 9.99   # FrozenInstanceError
 
result = TestResult("login", "PASS", 1240, tags=["smoke", "auth"])
print(result.is_slow())
print(asdict(result))    # → ready to send as JSON

Three short class declarations and you've got a typed, comparable, printable model for the test data the rest of your suite cares about. The asdict(result) line is what you'd hand to json.dumps or requests.post(json=...) — dataclasses bridge cleanly to the JSON world.

⚠️ Common mistakes

  • tags: list = [] instead of field(default_factory=list). Python raises a ValueError at class-definition time precisely to stop you from sharing a mutable default across instances. Use field(default_factory=list) for any list/dict/set field with a default.
  • Forgetting the type annotation. name = "" (no : str) is treated as a class-level constant, not a field — the dataclass decorator ignores it. Every field needs name: Type with the annotation.
  • Using @dataclass for a class whose __init__ does real work. If construction opens a file, builds a requests.Session, or validates input, the auto-generated __init__ won't suit. Either drop the decorator or use __post_init__(self) for the extra logic — it runs after the auto-generated __init__.

🎯 Practice task

Convert hand-written classes to dataclasses, then add behaviour. 25-30 minutes.

  1. Create data_models.py. Start with from dataclasses import dataclass, field, asdict.
  2. Define @dataclass class TestUser: with name: str, email: str, role: str = "tester", is_active: bool = True. Add a method email_domain(self) -> str: that returns the part after @.
  3. Define @dataclass(frozen=True) class TestEnvironment: with name: str, base_url: str. Show that setting .name after creation raises FrozenInstanceError (catch it with try/except).
  4. Define @dataclass class TestSuite: with name: str, test_cases: list = field(default_factory=list). Add def add(self, name: str) -> None: that appends a name. Build a suite, add three test cases, print the suite.
  5. Define @dataclass class TestResult: with name: str, status: str, duration_ms: int. Add @property duration_seconds(self) returning seconds and def is_slow(self, threshold_ms: int = 1000) -> bool:.
  6. Build at least three TestResults with different durations. Use a list comprehension to find the slow ones. Print the count and their names.
  7. Show off asdict: print(asdict(user)). Pretend you're sending it as JSON — import json; print(json.dumps(asdict(user), indent=2)).
  8. Define @dataclass class TestRun: with id: str, results: list = field(default_factory=list). Add a method pass_rate(self) -> float: returning passed / total, defaulting to 0.0 when empty.
  9. Stretch: add __post_init__(self) to TestUser that converts self.email to lowercase. Confirm TestUser("Alice", "ALICE@TEST.COM").email == "alice@test.com". __post_init__ is the dataclass equivalent of "do extra setup after the auto-generated __init__."

You can now model any QA fixture, response, or test result with one short class declaration. The next chapter shifts from data and inheritance to error handling and code organisation: try/except, custom exceptions, modules, and virtual environments, the structural pieces a real Python project needs.
