Most classes in test code exist to hold data: a TestUser, a Product, an ApiResponse, a TestResult. You write __init__ to assign every field, __repr__ so prints are readable, __eq__ so comparisons work — and the file fills up with boilerplate that adds no information. Python 3.7's @dataclass decorator generates all of that from a tiny field declaration. This lesson covers the basics, defaults (including the mutable-default trap), frozen=True, asdict, methods on dataclasses, @property, and when not to use them.
The boilerplate @dataclass saves
The hand-written class:
class TestUser:
    def __init__(self, name, email, role="tester", is_active=True):
        self.name = name
        self.email = email
        self.role = role
        self.is_active = is_active

    def __repr__(self):
        return (f"TestUser(name={self.name!r}, email={self.email!r}, "
                f"role={self.role!r}, is_active={self.is_active!r})")

    def __eq__(self, other):
        if not isinstance(other, TestUser):
            return NotImplemented
        return (self.name == other.name and
                self.email == other.email and
                self.role == other.role and
                self.is_active == other.is_active)

More than fifteen lines of mostly mechanical code. Now the dataclass version:
from dataclasses import dataclass

@dataclass
class TestUser:
    name: str
    email: str
    role: str = "tester"
    is_active: bool = True

alice = TestUser("Alice", "alice@test.com", "admin")
print(alice)
# TestUser(name='Alice', email='alice@test.com', role='admin', is_active=True)

bob = TestUser("Alice", "alice@test.com", "admin")
print(alice == bob)  # True: equality compares every field

Five lines. Same __init__, same readable __repr__, same field-by-field __eq__, same constructor signature. The @dataclass decorator generated all the dunder methods from the type-annotated fields above.
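To check what the decorator actually generated, the standard library can inspect the class. The sketch below reuses the TestUser fields from above and assumes only the stdlib inspect module and dataclasses.fields; the carol instance is made up for illustration.

```python
import inspect
from dataclasses import dataclass, fields


@dataclass
class TestUser:
    name: str
    email: str
    role: str = "tester"
    is_active: bool = True


# The generated __init__ takes the fields in declaration order.
init_params = list(inspect.signature(TestUser).parameters)
print(init_params)  # ['name', 'email', 'role', 'is_active']

# fields() lists what the decorator registered as fields.
field_names = [f.name for f in fields(TestUser)]
print(field_names)  # ['name', 'email', 'role', 'is_active']

# Keyword arguments work just as they would with a hand-written __init__.
carol = TestUser(name="Carol", email="carol@test.com", is_active=False)
print(carol.role)  # tester
```

The constructor parameters and the registered fields match one to one, which is why adding a field never requires touching __init__ by hand.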
Reading the syntax
Inside a @dataclass, every line of the form name: Type or name: Type = default is a field. The type annotation is mandatory: that's how the decorator tells a field apart from an ordinary class-level attribute.
Three legal shapes:
from dataclasses import dataclass, field

@dataclass
class TestResult:
    name: str                                 # required field
    duration_ms: int = 0                      # field with default
    tags: list = field(default_factory=list)  # mutable default, see below

Fields with defaults must come after fields without defaults, the same rule as function parameters. Try the wrong order and Python raises TypeError: non-default argument 'name' follows default argument.
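The ordering rule is easy to confirm by provoking the error on purpose. This is a small sketch; BadOrder is a hypothetical class that exists only to trigger the TypeError, and the exact wording of the message can vary slightly between Python versions.

```python
from dataclasses import dataclass

error_message = None

try:
    @dataclass
    class BadOrder:
        duration_ms: int = 0   # a field with a default comes first...
        name: str              # ...followed by a required field: rejected
except TypeError as exc:
    # The decorator raises while the class is being defined,
    # before any instance is ever created.
    error_message = str(exc)
    print(f"TypeError: {error_message}")
```

Moving name above duration_ms makes the definition valid again.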
The mutable-default trap — field(default_factory=list)
You can't write tags: list = [] in a dataclass. Python catches that at class-definition time and raises ValueError, because a single [] would be shared by every instance, so appending through one instance would change them all. (This is the same trap you saw with mutable function defaults in chapter 2.)
The right form uses field(default_factory=...):
from dataclasses import dataclass, field

@dataclass
class TestSuite:
    name: str
    test_cases: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

default_factory takes a callable. Python calls it once per __init__, creating a fresh empty list (or dict) per instance. For non-mutable defaults (int, str, bool, None), regular = 0 / = "" / = False is fine.
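Both halves of this section can be checked in a few lines. The sketch below first tries the forbidden form (BrokenSuite is a hypothetical name used only to trigger the error), then shows that default_factory gives each instance its own list.

```python
from dataclasses import dataclass, field

# 1. A bare [] default is rejected while the class is being defined.
rejected = False
try:
    @dataclass
    class BrokenSuite:
        name: str
        test_cases: list = []
except ValueError as exc:
    rejected = True
    print(f"ValueError: {exc}")


# 2. default_factory builds a new list for every instance.
@dataclass
class TestSuite:
    name: str
    test_cases: list = field(default_factory=list)


smoke = TestSuite("smoke")
regression = TestSuite("regression")
smoke.test_cases.append("test_login")

print(smoke.test_cases)       # ['test_login']
print(regression.test_cases)  # []
```

Appending to one suite leaves the other untouched, which is the behaviour a shared [] default would have broken.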
Frozen dataclasses — immutable records
@dataclass(frozen=True) makes the instance immutable. Setting a field after construction raises FrozenInstanceError:
from dataclasses import dataclass

@dataclass(frozen=True)
class TestEnvironment:
    name: str
    base_url: str
    region: str

env = TestEnvironment("staging", "https://staging.api.example.com", "eu-west-1")
env.region = "us-east-1"  # FrozenInstanceError: cannot assign to field 'region'

Frozen dataclasses are perfect for value objects: config snapshots, API contract records, anything you want to pass around as a fact rather than a mutable bag. They're also hashable by default, which means you can use them as dict keys or set members:
seen = set()
seen.add(TestEnvironment("staging", "...", "eu-west-1"))

Mutable dataclasses (the default) are not hashable. Python disables hashing for them because a value whose hash can change after it has been stored would corrupt sets and dicts.
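The hashing behaviour can be sketched as follows. MutableEnvironment is a hypothetical non-frozen counterpart added for contrast, and dataclasses.replace is a standard-library helper, not covered above, that is one common way to get a changed copy of a frozen instance.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TestEnvironment:
    name: str
    base_url: str
    region: str


@dataclass
class MutableEnvironment:  # hypothetical, for contrast only
    name: str


a = TestEnvironment("staging", "https://staging.api.example.com", "eu-west-1")
b = TestEnvironment("staging", "https://staging.api.example.com", "eu-west-1")

# Equal frozen instances hash the same, so the set keeps only one of them.
seen = {a, b}
print(len(seen))  # 1

# A mutable dataclass cannot be placed in a set at all.
unhashable_error = None
try:
    {MutableEnvironment("staging")}
except TypeError as exc:
    unhashable_error = str(exc)
    print(f"TypeError: {unhashable_error}")

# replace() builds a modified copy; the original stays untouched.
us = replace(a, region="us-east-1")
print(a.region, us.region)  # eu-west-1 us-east-1
```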
asdict and astuple — converting to plain data
Sometimes you need a dataclass as a dict (to send as JSON, for example). The standard library has helpers:
from dataclasses import dataclass, asdict, astuple

@dataclass
class TestUser:
    name: str
    email: str
    role: str = "tester"

user = TestUser("Alice", "alice@test.com", "admin")

asdict(user)
# {'name': 'Alice', 'email': 'alice@test.com', 'role': 'admin'}

astuple(user)
# ('Alice', 'alice@test.com', 'admin')

asdict recurses into nested dataclasses and converts them too. The result is a plain dict you can hand to json.dumps or requests.post(json=...).
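The recursion is easiest to see with one dataclass nested inside another. In this sketch LoginAttempt is a hypothetical wrapper invented for illustration; only asdict and json.dumps from the standard library are used.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TestUser:
    name: str
    email: str
    role: str = "tester"


@dataclass
class LoginAttempt:  # hypothetical wrapper around a TestUser
    user: TestUser
    status: str
    tags: list = field(default_factory=list)


attempt = LoginAttempt(
    TestUser("Alice", "alice@test.com", "admin"), "PASS", ["smoke"]
)

# The nested TestUser becomes a nested dict, not a TestUser object.
payload = asdict(attempt)
print(payload["user"])
# {'name': 'Alice', 'email': 'alice@test.com', 'role': 'admin'}

# Plain dicts, lists and strings serialise without any custom encoder.
body = json.dumps(payload)
print(body)
```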
Methods and @property on dataclasses
A dataclass is still a class — it can have methods. They go after the field declarations:
from dataclasses import dataclass

@dataclass
class TestResult:
    name: str
    status: str
    duration_ms: int

    @property
    def duration_seconds(self) -> float:
        return self.duration_ms / 1000

    def is_slow(self, threshold_ms: int = 2000) -> bool:
        return self.duration_ms > threshold_ms

    def label(self) -> str:
        return f"{self.name}: {self.status} ({self.duration_seconds:.2f}s)"

r = TestResult("login", "PASS", 1240)
print(r.duration_seconds)  # 1.24 (property, no parens)
print(r.is_slow())         # False
print(r.label())           # login: PASS (1.24s)

This is exactly where dataclasses shine: your data model gets the auto-generated __init__ and __repr__, plus you bolt on the small computed properties and predicates the rest of your code needs. It's the same shape as a regular class, just with the boilerplate gone.
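One detail worth knowing: a @property is not a field, so it stays out of the generated __repr__ and out of asdict. A minimal sketch, reusing the TestResult shape from above:

```python
from dataclasses import dataclass, asdict


@dataclass
class TestResult:
    name: str
    status: str
    duration_ms: int

    @property
    def duration_seconds(self) -> float:
        return self.duration_ms / 1000


r = TestResult("login", "PASS", 1240)

# Only the three declared fields appear; the property does not.
print(r)          # TestResult(name='login', status='PASS', duration_ms=1240)
print(asdict(r))  # {'name': 'login', 'status': 'PASS', 'duration_ms': 1240}

# The property is recomputed on every access, so it follows the field.
r.duration_ms = 500
print(r.duration_seconds)  # 0.5
```

If a computed value has to travel with the JSON payload, it needs to be added to the dict explicitly after calling asdict.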
When NOT to use a dataclass
Three cases where a dict or a regular class fits better:
- The shape is dynamic. If you're parsing JSON whose schema varies (response.json() returning different keys per endpoint), a dict is more honest.
- You need substantial logic in __init__. A dataclass's __init__ is auto-generated and just assigns each field. If construction needs side effects (open a session, read a file), use a regular class, or use __post_init__, a hook the dataclass calls after the auto-generated __init__ finishes.
- You're modelling something with extensive behaviour, not data. A TestRunner that mostly does things (run tests, retry, report) is better as a regular class. A TestResult that mostly holds things is the dataclass case.
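For the middle case, light setup work often fits in __post_init__ without giving up the decorator. The sketch below is one possible use, assuming you want to normalise the status and reject negative durations; both rules are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class TestResult:
    name: str
    status: str
    duration_ms: int

    def __post_init__(self):
        # Runs right after the generated __init__ has assigned the fields.
        self.status = self.status.upper()
        if self.duration_ms < 0:
            raise ValueError(
                f"duration_ms must be >= 0, got {self.duration_ms}"
            )


ok = TestResult("login", "pass", 1240)
print(ok.status)  # PASS

validation_error = None
try:
    TestResult("logout", "fail", -5)
except ValueError as exc:
    validation_error = str(exc)
    print(f"ValueError: {validation_error}")
```

Once construction needs more than a few lines like these, or has to open files or sessions, a regular class is usually the clearer choice.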
Regular class vs @dataclass — same data, two shapes
Modelling TestUser — regular class vs @dataclass
Regular class (15+ lines):
- Write __init__ that assigns every field
- Write __repr__ for readable prints
- Write __eq__ for value comparison
- Add __hash__ if you want it in sets/dicts
- Every new field touches three or four methods
- Wins when construction needs real logic, not just assignment
@dataclass (5 lines):
- @dataclass + name: type lines, that's it
- __init__, __repr__, __eq__ generated automatically
- frozen=True gives immutability and free __hash__
- asdict() / astuple() converts to plain data
- Adding a field is one line; the dunders update themselves
- Wins for stable-shape test models, fixtures, configs
Both produce equivalent runtime objects. The dataclass is what you should reach for by default; drop down to a regular class when construction needs more than mechanical assignment.
A worked example — three QA models
from dataclasses import dataclass, field, asdict

@dataclass
class TestUser:
    name: str
    email: str
    role: str = "tester"
    is_active: bool = True

    def email_domain(self) -> str:
        return self.email.split("@")[-1]

@dataclass(frozen=True)
class Product:
    sku: str
    name: str
    price: float
    in_stock: bool = True

@dataclass
class TestResult:
    name: str
    status: str
    duration_ms: int
    tags: list = field(default_factory=list)

    @property
    def duration_seconds(self) -> float:
        return self.duration_ms / 1000

    def is_slow(self, threshold_ms: int = 1000) -> bool:
        return self.duration_ms > threshold_ms

user = TestUser("Alice", "alice@test.com", "admin")
print(user)
print(user.email_domain())

product = Product("SKU-001", "Widget", 19.99)
print(product)
# product.price = 9.99  # FrozenInstanceError

result = TestResult("login", "PASS", 1240, tags=["smoke", "auth"])
print(result.is_slow())
print(asdict(result))  # → ready to send as JSON

Three short class declarations and you've got a typed, comparable, printable model for the test data the rest of your suite cares about. The asdict(result) line is what you'd hand to json.dumps or requests.post(json=...); dataclasses bridge cleanly to the JSON world.
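The bridge works in the other direction too, at least for flat models. This is a rough sketch that assumes the JSON keys match the field names exactly; nested dataclasses would come back as plain dicts and need their own conversion step.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TestResult:
    name: str
    status: str
    duration_ms: int
    tags: list = field(default_factory=list)


original = TestResult("login", "PASS", 1240, tags=["smoke", "auth"])

# Serialise to a JSON string, as it would travel over the network.
wire = json.dumps(asdict(original))

# Unpack the decoded dict straight into the constructor.
restored = TestResult(**json.loads(wire))

print(restored == original)  # True
```

An unexpected key in the JSON raises TypeError from the constructor, which is a useful early warning when an API response drifts from the model.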
⚠️ Common mistakes
- tags: list = [] instead of field(default_factory=list). Python raises a ValueError at class-definition time precisely to stop you from sharing a mutable default across instances. Use field(default_factory=list) for any list/dict/set field with a default.
- Forgetting the type annotation. name = "" (no : str) is treated as a class-level constant, not a field; the dataclass decorator ignores it. Every field needs name: Type with the annotation.
- Using @dataclass for a class whose __init__ does real work. If construction opens a file, builds a requests.Session, or validates input, the auto-generated __init__ won't suit. Either drop the decorator or use __post_init__(self) for the extra logic; it runs after the auto-generated __init__.
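The missing-annotation mistake is silent, so it helps to see what it looks like. In this sketch role is deliberately left without an annotation; dataclasses.fields shows that the decorator never registered it.

```python
from dataclasses import dataclass, fields


@dataclass
class TestUser:
    name: str
    role = "tester"   # no annotation: a plain class attribute, not a field


field_names = [f.name for f in fields(TestUser)]
print(field_names)  # ['name']

# The generated __init__ therefore accepts only name.
init_error = None
try:
    TestUser("Alice", "admin")
except TypeError as exc:
    init_error = str(exc)
    print(f"TypeError: {init_error}")

# role is still readable, but it is missing from the repr and from __eq__.
alice = TestUser("Alice")
print(alice.role)  # tester
print(alice)       # TestUser(name='Alice')
```

Changing the line to role: str = "tester" turns it into a real field.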
🎯 Practice task
Convert hand-written classes to dataclasses, then add behaviour. 25-30 minutes.
- Create data_models.py. Start with from dataclasses import dataclass, field, asdict.
- Define @dataclass class TestUser: with name: str, email: str, role: str = "tester", is_active: bool = True. Add a method email_domain(self) -> str: that returns the part after @.
- Define @dataclass(frozen=True) class TestEnvironment: with name: str, base_url: str. Show that setting .name after creation raises FrozenInstanceError (import it from dataclasses and catch it with try/except).
- Define @dataclass class TestSuite: with name: str, test_cases: list = field(default_factory=list). Add def add(self, name: str) -> None: that appends a name. Build a suite, add three test cases, print the suite.
- Define @dataclass class TestResult: with name: str, status: str, duration_ms: int. Add @property duration_seconds(self) returning seconds and def is_slow(self, threshold_ms: int = 1000) -> bool:.
- Build at least three TestResults with different durations. Use a list comprehension to find the slow ones. Print the count and their names.
- Show off asdict: print(asdict(user)). Pretend you're sending it as JSON: import json; print(json.dumps(asdict(user), indent=2)).
- Define @dataclass class TestRun: with id: str, results: list = field(default_factory=list). Add a method pass_rate(self) -> float: returning passed / total, defaulting to 0.0 when empty.
- Stretch: add __post_init__(self) to TestUser that converts self.email to lowercase. Confirm TestUser("Alice", "ALICE@TEST.COM").email == "alice@test.com". __post_init__ is the dataclass equivalent of "do extra setup after the auto-generated __init__."
You can now model any QA fixture, response, or test result with one short class declaration. The next chapter shifts from data and inheritance to error handling and code organisation — try/except, custom exceptions, modules, and virtual environments — the structural pieces a real Python project needs.