What are Dataclasses in Python?

Definition

Dataclasses, introduced in Python 3.7, provide a convenient way to create classes that mainly store data values. A data class is a regular Python class, but the @dataclass decorator also generates special methods for it, so we don't have to write repetitive code for magic methods like __init__, __repr__ and __eq__.

Python dataclass example

To define a data class in Python, we use the @dataclass decorator and annotate the class attributes with their types (type hints). Here is a basic example of a Python data class:

from dataclasses import dataclass
@dataclass
class Person:
    name: str
    age: int
    city: str

When to use dataclasses in Python?

If you want to create multiple classes without writing magic methods like __init__, __repr__ and __eq__ again and again for each class, you can use dataclasses. Just put the @dataclass decorator above your class definition. For example, here is a regular class:

class Person:
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city
    def __repr__(self):
        return "Person(name={}, age={}, city={})".format(self.name, self.age, self.city)
    def __eq__(self, other):
        return (self.name, self.age, self.city) == (other.name, other.age, other.city)

The same can be done in just a few lines with the help of a data class:

from dataclasses import dataclass
@dataclass
class Person:
    name: str
    age: int
    city: str

In the above class, we haven’t written any code for the __init__, __repr__ and __eq__ methods. Instead, we used the @dataclass decorator. Now let’s test the hidden magic methods in the above code.

p1 = Person('John', 26, 'Delhi')
p2 = Person('John', 26, 'Delhi')
print(p1.name)
# output will be: John
print(p1)
# output will be: Person(name='John', age=26, city='Delhi')
print(p1 == p2)
# output will be: True

If you don’t know the uses of __init__, __repr__ and __eq__ magic functions, then let me explain them one by one.

__init__

It is one of the most important magic methods. Every class has one, inherited from object if we don't define it ourselves. With the help of __init__ we can assign values when we create instances of that class. It plays the role of a constructor in other languages. We just pass the values as arguments when calling the class. As in the above code, we pass name, age, and city to Person and store the results in the variables p1 and p2. So if we print the value of name from the variable p1, as p1.name, we get John.

class Person:
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city
p1 = Person('John', 26, 'Delhi')
print(p1.name)
# output will be: John

__repr__

The string representation of an object comes from the __repr__ magic method of its class. Let’s see what happens if we haven’t overridden the default __repr__ and we print the instance from the above example.

print(p1)
# output will be something like: <__main__.Person object at 0x8d7f58f785f48>

But we want to get something like Person(name='John', age=26, city='Delhi'). For this, we have to override the default __repr__ method and return a string as per our requirement. To achieve this, we write:

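class Person:
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city
    def __repr__(self):
        # !r inserts repr() of each value, so strings keep their quotes
        return "Person(name={!r}, age={!r}, city={!r})".format(self.name, self.age, self.city)
p = Person('John', 26, 'Delhi')
print(p)
# output will be: Person(name='John', age=26, city='Delhi')

After defining the above, printing the instance directly shows its full details.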

__eq__

To check the equality of two objects of the same class, Python uses the __eq__ magic method of that class. It returns True or False. If we haven’t overridden it, p1 == p2 returns False in the above example, because the default == checks whether both names refer to the same object in memory, and p1 and p2 are two separate objects. So if we want to compare the values of objects, we have to override this magic method in our class. For example:

class Person:
    def __init__(self, name, age, city):
        self.name = name
        self.age = age
        self.city = city
    def __eq__(self, other):
        return (self.name, self.age, self.city) == (other.name, other.age, other.city)
p1 = Person('John', 26, 'Delhi')
p2 = Person('John', 26, 'Delhi')
print(p1 == p2)
# The output will be: True

Parameters of Dataclass

dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False, match_args=True, kw_only=False, slots=False, weakref_slot=False)

We have already discussed init, repr and eq. Each of them can be switched off by passing False to the @dataclass decorator. Let’s see what happens if we pass repr=False:

from dataclasses import dataclass
@dataclass(repr=False)
class Person:
    name: str
    age: int
    city: str
p = Person('John', 28, 'Delhi')
print(p)
# Output will be something like: <__main__.Person object at 0x8d7f58f785f48>

Similarly, if we want == and != to fall back to the default identity comparison, we can pass eq=False.
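As a minimal sketch of what eq=False does, the snippet below uses a hypothetical Token class (not part of the examples above). Two instances with identical values no longer compare equal, because no __eq__ is generated and Python falls back to comparing object identity.

```python
from dataclasses import dataclass

# Token is a hypothetical class used only for illustration
@dataclass(eq=False)
class Token:
    value: str

a = Token("abc")
b = Token("abc")

print(a == b)  # False: same values, but two different objects
print(a == a)  # True: an object is always identical to itself
```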

Order

By default, order is False, which means the ordering operators (<, <=, >, >=) are not generated for the class. Equality with == still works. If we try an ordering comparison, Python raises a TypeError. For example:

p1 = Person('John', 28, 'Delhi')
p2 = Person('John', 26, 'Delhi')
p1 > p2
# A TypeError exception will be raised here.

But if we want to support such comparisons, we simply pass order=True to the @dataclass decorator:

from dataclasses import dataclass
@dataclass(order=True)
class Person:
    name: str
    age: int
    city: str
p1 = Person('John', 28, 'Delhi')
p2 = Person('John', 26, 'Delhi')
print(p1 > p2)
# Output will be True

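In the above case, the output is True. The generated methods compare instances as if they were tuples of their fields, in the order the fields are defined: first name, then age, then city. The names are equal, so the comparison moves on to age, and since the age of p1 is greater than that of p2, the result is True. The generated methods are: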

  • __lt__ (<)
  • __le__ (<=)
  • __gt__ (>)
  • __ge__ (>=)
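Because these methods exist, instances can also be sorted directly. The following is a small sketch under that assumption; the extra people in the list are made up for illustration.

```python
from dataclasses import dataclass

@dataclass(order=True)
class Person:
    name: str
    age: int
    city: str

# Illustrative data, not taken from the article
people = [
    Person('John', 28, 'Delhi'),
    Person('Asha', 31, 'Pune'),
    Person('John', 26, 'Delhi'),
]

# sorted() relies on __lt__, which compares (name, age, city) tuples
ordered = sorted(people)
print([(p.name, p.age) for p in ordered])
# [('Asha', 31), ('John', 26), ('John', 28)]
```

If a different sort key is needed, passing key= to sorted() is usually simpler than reordering the fields.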

Frozen

The default value is False for frozen. To make all fields read-only, so that their values can’t be changed once assigned, we pass frozen=True. Let’s have an example:

from dataclasses import dataclass
@dataclass(frozen=True)
class Person:
    name: str
    age: int
    city: str
p = Person('John', 28, 'Delhi')
p.name = 'Something else'
# FrozenInstanceError: cannot assign to field 'name'
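When a frozen instance needs a changed value, one common approach is to build a modified copy with dataclasses.replace() instead of assigning to the field. A minimal sketch:

```python
from dataclasses import dataclass, replace, FrozenInstanceError

@dataclass(frozen=True)
class Person:
    name: str
    age: int
    city: str

p = Person('John', 28, 'Delhi')

try:
    p.age = 29  # direct assignment is blocked on a frozen instance
except FrozenInstanceError as exc:
    print(type(exc).__name__)  # FrozenInstanceError

# replace() creates a new instance with the given fields changed
older = replace(p, age=29)
print(older)  # Person(name='John', age=29, city='Delhi')
print(p)      # Person(name='John', age=28, city='Delhi')
```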

unsafe_hash

Python only lets us hash an object when its class defines __hash__, and a hash is only reliable if the values it is computed from never change. With the defaults (eq=True and frozen=False), @dataclass sets __hash__ to None, so instances are unhashable.

Passing frozen=True makes the instance immutable, and together with eq=True the decorator then generates a __hash__ based on the fields. For example:

from dataclasses import dataclass
@dataclass(frozen=True)
class Person:
    name: str
    age: int
    city: str
p = Person('John', 28, 'Delhi')
print(hash(p))
# Output will be an integer such as 785715478993145 (the value changes between runs)

But if we want to keep the instance mutable and still generate a field-based hash, we have to pass unsafe_hash=True to the @dataclass decorator. The name is a warning: if a field changes after the object has been stored in a set or used as a dictionary key, the object may no longer be found there.
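The difference can be sketched as follows. Plain and Hashable are hypothetical class names chosen for this illustration.

```python
from dataclasses import dataclass

@dataclass
class Plain:  # eq=True, frozen=False: __hash__ is set to None
    name: str

@dataclass(unsafe_hash=True)
class Hashable:  # mutable, but a field-based __hash__ is generated
    name: str

try:
    hash(Plain('John'))
except TypeError:
    print("Plain instances are unhashable")

h = Hashable('John')
before = hash(h)
h.name = 'Jane'  # allowed, because the class is not frozen
after = hash(h)
# The hash is computed from the fields, so it will normally differ now
print(before, after)
```

This is why mutating such an object while it sits in a set or dict is risky: the container still files it under the old hash.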

match_args

The default is True. The __match_args__ tuple is created from the list of parameters to the generated __init__ method (even if __init__ is not generated). This tuple is what lets instances be matched positionally in a match statement. If we set this to False, or if __match_args__ is already defined in the class, then __match_args__ will not be generated. This parameter was added in Python 3.10.

kw_only

The default value is False. If we set this to True, all fields become keyword-only, so every argument must be passed by name when __init__ is called. This parameter was added in Python 3.10.

slots

The default value is False. If we set this to True, a __slots__ attribute is generated and a new class is returned instead of the original one. If __slots__ is already defined in the class, a TypeError is raised. This parameter was added in Python 3.10.

What are Fields in Dataclasses?

Each attribute of a data class is represented by a field. The field() function lets us customize how an individual attribute behaves within the data class, making it more flexible and adaptable to our specific use cases. Here is the signature of the field() function from the dataclasses module:

field(*, default=MISSING, default_factory=MISSING, init=True, repr=True, hash=None, compare=True, metadata=None, kw_only=MISSING)

Note: Every data class has a class attribute called ‘__dataclass_fields__’, a dictionary that maps each field name to a Field object describing it. It is an internal detail; the documented way to read the same information is the fields() function of the dataclasses module.

from dataclasses import dataclass
@dataclass(frozen=True)
class Person:
    name: str
    age: int
    city: str
p = Person('John', 28, 'Delhi')
print(p.__dataclass_fields__)
# output: 
{'name': Field(
  name='name',
  type=<class 'str'>,
  default=<dataclasses._MISSING_TYPE object at 0x000001D0EE7EAF20>,
  default_factory=<dataclasses._MISSING_TYPE object at 0x000001D0EE7EAF20>,
  init=True,
  repr=True,
  hash=None,
  compare=True,
  metadata=mappingproxy({}),
  kw_only=False,
  _field_type=_FIELD), 
 
 'age': Field(
   name='age',
   type=<class 'int'>,
   default=<dataclasses._MISSING_TYPE object at 0x000001D0EE7EAF20>,
   default_factory=<dataclasses._MISSING_TYPE object at 0x000001D0EE7EAF20>,
   init=True,
   repr=True,
   hash=None,
   compare=True,
   metadata=mappingproxy({}),
   kw_only=False,
   _field_type=_FIELD), 
 
 'city': Field(
   name='city',
   type=<class 'str'>,
   default=<dataclasses._MISSING_TYPE object at 0x000001D0EE7EAF20>,
   default_factory=<dataclasses._MISSING_TYPE object at 0x000001D0EE7EAF20>,
   init=True,
   repr=True,
   hash=None,
   compare=True,
   metadata=mappingproxy({}),
   kw_only=False,
   _field_type=_FIELD)
}

To get details of the field for property age, we just have to write p.__dataclass_fields__['age']

Field(
  name='age',
  type=<class 'int'>,
  default=<dataclasses._MISSING_TYPE object at 0x0000015ED71EAF50>,
  default_factory=<dataclasses._MISSING_TYPE object at 0x0000015ED71EAF50>,
  init=True,
  repr=True,
  hash=None,
  compare=True,
  metadata=mappingproxy({}),
  kw_only=False,
  _field_type=_FIELD
)
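The same details are available through the public fields() function of the dataclasses module, which returns a tuple of Field objects. A small sketch, using a shortened Person class for illustration:

```python
from dataclasses import dataclass, fields, MISSING

@dataclass
class Person:
    name: str
    age: int = 26

# fields() accepts a data class or an instance of one
for f in fields(Person):
    has_default = f.default is not MISSING
    print(f.name, f.type.__name__, has_default)
# name str False
# age int True
```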

Let’s look at each parameter of field() in detail.

default

The default parameter of the field() function provides a default value for an attribute of the data class. For example, suppose we wish to provide a default value for age in our code above. If we then don’t pass an age when creating an object, the default value is assigned to that attribute.

from dataclasses import dataclass, field
@dataclass(frozen=True)
class Person:
    name: str
    city: str
    age: int = field(default=26)
p = Person('John', 'Delhi', 28)
print(p)
# output: Person(name='John', city='Delhi', age=28)
p1 = Person('John', 'Delhi')
print(p1)
# output: Person(name='John', city='Delhi', age=26)

In p1 we haven’t defined the value of age, however, we are getting 26 as output. This is because we defined a default value (26) for the age field of our data class. We can also do the same by simply writing age: int = 26 within the data class. Please note that fields without a default value cannot come after fields that have one, so fields with defaults go at the end of the data class.

default_factory

Like default, it assigns a default value to a data class attribute, but it calls a function each time an instance is created and uses the return value instead of a fixed value. For example:

from dataclasses import dataclass, field
def default_price():
    return {"USD": 0.0}
@dataclass
class Product:
    name: str
    price: dict = field(default_factory=default_price)
    quantity: int = field(default=1)
product1 = Product("Widget")
product2 = Product("Gadget", quantity=3)
print(product1)  # Output: Product(name='Widget', price={'USD': 0.0}, quantity=1)
print(product2)  # Output: Product(name='Gadget', price={'USD': 0.0}, quantity=3)

In this example, the price attribute is assigned a dictionary generated by the default_price function. Each instance of the Product class gets its own independent dictionary for the price attribute.

Using default_factory helps ensure that each instance of the data class has its own distinct default value for mutable attributes, avoiding potential sharing of references and unwanted side effects. Note: The function passed as default_factory must be callable without any arguments.
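To make the point about mutable defaults concrete, here is a small sketch. Broken and Article are hypothetical classes; the first shows that @dataclass refuses a list as a plain default, the second shows the default_factory alternative.

```python
from dataclasses import dataclass, field

try:
    @dataclass
    class Broken:
        tags: list = []  # mutable default: rejected when the class is created
except ValueError as exc:
    print("rejected:", exc)

@dataclass
class Article:
    title: str
    tags: list = field(default_factory=list)  # a new list per instance

a = Article("First")
b = Article("Second")
a.tags.append("python")
print(a.tags)  # ['python']
print(b.tags)  # []
```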

init

The init parameter of the field() function controls whether an attribute is included as a parameter of the automatically generated __init__ method of the data class. It accepts a Boolean value and the default is True. For example:

from dataclasses import dataclass, field
@dataclass
class Person:
    name: str
    city: str = field(init=False, default='Delhi')
    age: int
p = Person('John', 25)
print(p)
# output: Person(name='John', city='Delhi', age=25)

In this example, the city attribute is set with init=False, so city is not a parameter of __init__ and cannot be passed while creating an object of the data class Person. We created an object p here and provided only name and age. Because we also gave city a default value of ‘Delhi’, printing the object still shows all the details.

repr

repr is used to control whether an attribute should be included when generating the string representation (usually using the __repr__ method) of an instance. The __repr__ method is responsible for providing a human-readable string that represents the state of an object, which is often used for debugging and display purposes. The default value of repr is True. Example:

from dataclasses import dataclass, field
@dataclass
class Person:
    name: str
    city: str = field(repr=False)
    age: int
p = Person('John', 'Delhi', 25)
print(p)
# output: Person(name='John', age=25)

In this example, the city attribute is set with repr=False, so it’s not included in the automatically generated __repr__ method. As a result, when you print instances of the Person class, the city attribute is omitted from the output.

hash

If we don’t want a field of the data class included when the hash value of an object is calculated, we can set hash=False on that field. The default, hash=None, means the field follows its compare setting, and the official documentation discourages setting it to anything else. Using the hash parameter allows us to control which attributes contribute to the hash value of instances, which can affect their behavior when used in hash-based data structures. However, be cautious when excluding attributes from hashing, as it might lead to unexpected behavior if instances are used as keys in dictionaries or elements in sets. Example:

from dataclasses import dataclass, field
@dataclass(unsafe_hash=True)
class Person:
    name: str
    city: str = field(init=False, default='Delhi', repr=False)
    age: int
p = Person('John', 28)
print(hash(p))
# Output: -3395593575601423064 (the exact value changes between runs)
# If we add hash=False in city field like: field(init=False, default='Delhi', repr=False, hash=False)
# The output will be a different value, for example: 8381521673055019895

compare

The compare parameter allows us to control which data class attributes participate in the comparison methods of instances, which can be useful for customizing how instances are considered equal or ordered based on our specific use case. The default value is True. Example:

from dataclasses import dataclass, field
@dataclass
class Person:
    name: str
    city: str
    age: int = field(compare=False)
p1 = Person('John', 'Delhi', 25)
p2 = Person('John', 'Delhi', 30)
print(p1==p2)
# output will be True

The instances p1 and p2 have different ages, but comparing them with == returns True. This is because age is excluded from the comparison.

metadata

Metadata is extra information that we can associate with an attribute of a data class, which can be useful for various purposes such as documentation, annotations, or custom logic.

The metadata parameter accepts a dictionary containing key-value pairs, where the keys are strings (often representing a specific purpose) and the values can be of any type. The metadata is not used directly by the data class itself but can be accessed programmatically to provide additional context or behavior.

Here’s an example of how to use the metadata parameter in a data class:

from dataclasses import dataclass, field
@dataclass
class Employee:
    name: str
    age: int = field(metadata={"unit": "years"})
    department: str = field(metadata={"description": "Department where the employee works"})
employee1 = Employee("Alice", age=30, department="Engineering")
print(employee1.age)  
# Output: 30
print(employee1.__dataclass_fields__["age"].metadata["unit"])  
# Output: years
print(employee1.__dataclass_fields__["department"].metadata["description"])  
# Output: Department where the employee works

kw_only

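If True, this field will be marked as keyword-only. This is used when the parameters of the generated __init__ method are computed. If it is not set, the field follows the kw_only setting of the @dataclass decorator, which is False by default. This parameter was added in Python 3.10.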

What is Post-init processing?

Post-init processing in Python data classes refers to the ability to perform additional operations or computations on the attributes of an instance after it has been initialized using the automatically generated __init__ method. This feature allows us to define custom behavior that should occur immediately after an instance is created.

To implement post-init processing in a data class, we can define a special method (magic method) named __post_init__. This method will be called automatically by the __init__ method after all attributes have been initialized. For example:

from dataclasses import dataclass, field
@dataclass
class Person:
    name: str
    city: str
    age: int
    is_adult: bool = field(init=False)
    def __post_init__(self):
        if self.age >= 18:
            self.is_adult = True
        else:
            self.is_adult = False
p1 = Person('John', 'Delhi', 25)
p2 = Person('Sem', 'Delhi', 16)
print(p1.is_adult)
print(p2.is_adult)
# output:
# True
# False

In the above example, we have a data class attribute called is_adult, and the value of this attribute is set just after the initialization of objects. In __post_init__ we assign the value True or False based on the age value. One more thing we did in the above code is to exclude is_adult from __init__ with init=False, so that we don’t have to provide its value while creating instances.
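__post_init__ is also a common place for validation. The sketch below is one possible variation, not the article's original code: it rejects negative ages and, because the class is frozen here, sets the computed field through object.__setattr__, since normal assignment is blocked on frozen instances.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Person:
    name: str
    age: int
    is_adult: bool = field(init=False)

    def __post_init__(self):
        # Validate input right after the generated __init__ has run
        if self.age < 0:
            raise ValueError("age must not be negative")
        # frozen=True blocks self.is_adult = ..., so bypass it explicitly
        object.__setattr__(self, 'is_adult', self.age >= 18)

p = Person('John', 25)
print(p.is_adult)  # True

try:
    Person('Sem', -1)
except ValueError as exc:
    print(exc)  # age must not be negative
```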

Inheritance

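A data class can inherit from another data class. This allows us to reuse and extend the attributes and methods defined in the parent data class, while also adding new attributes or overriding existing ones as needed. When the @dataclass decorator builds a class, it collects the fields of all of the class’s base classes in reverse MRO (Method Resolution Order), starting at object, and then adds the class’s own fields. For example: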

from dataclasses import dataclass
@dataclass
class Person:
    name: str
    age: int
@dataclass
class Employee(Person):
    employee_id: int
    department: str
# Creating instances
person = Person("Alice", 30)
employee = Employee("Bob", 25, employee_id=12345, department="Engineering")
print(person)  
# Output: Person(name='Alice', age=30)
print(employee)  
# Output: Employee(name='Bob', age=25, employee_id=12345, department='Engineering')

In the above example, the Employee data class inherits the fields of the Person data class. So an instance of Employee has four attributes: the inherited ones (name, age) and its own (employee_id, department). The Employee class will have the following attributes in this order:

Employee(name, age, employee_id, department)
# first the attributes of base data class will come and then attributes of own data class. 

We can override a field of the base class by declaring a field with the same name in the child data class. For example:

from dataclasses import dataclass
from typing import Any
@dataclass
class Base:
    x: Any = 10
    y: int = 20
@dataclass
class C(Base):
    z: int = 30
    x: int = 40
# The generated __init__() method for C will look like:
# def __init__(self, x: int = 40, y: int = 20, z: int = 30):

In the above example, the default value of x will be 40 and its type int, because the child data class overrides the field x of the parent data class. Note that x keeps its original position at the front of the field order.
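One consequence of this field ordering is worth sketching: if the base class gives a field a default, a child class cannot add a positional field without a default, because it would follow the inherited default in __init__. Base, Child and FixedChild are illustrative names.

```python
from dataclasses import dataclass

@dataclass
class Base:
    x: int = 10

try:
    @dataclass
    class Child(Base):
        y: int  # no default, but it comes after the inherited default field x
except TypeError:
    print("a non-default field cannot follow a default field")

# One way around it: give the new field a default as well
@dataclass
class FixedChild(Base):
    y: int = 0

print(FixedChild(1, 2))  # FixedChild(x=1, y=2)
print(FixedChild())      # FixedChild(x=10, y=0)
```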

Convert data class properties into a dictionary or tuple

Suppose we want the values of all the fields of a data class as a dictionary or tuple, for example to convert them to JSON and use them in API services. The asdict() and astuple() functions do this, and they recurse into nested data classes. Let’s take a data class with a nested structure:

from dataclasses import dataclass, asdict, astuple
@dataclass
class Address:
    Flat_no: str
    colony: str
    city: str
    dist: str
    pin: int
@dataclass
class Person:
    name: str
    age: int
    address: Address
add = Address('O2', 'Netaji Nagar', 'Delhi', 'Delhi', 110011)
p = Person('John', 28, add)
print(asdict(p))
print(astuple(p))
# Output:
# {
#     'name': 'John',
#     'age': 28,
#     'address': {
#         'Flat_no': 'O2',
#         'colony': 'Netaji Nagar',
#         'city': 'Delhi',
#         'dist': 'Delhi',
#         'pin': 110011
#     }
# }
# ('John', 28, ('O2', 'Netaji Nagar', 'Delhi', 'Delhi', 110011))
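For the JSON use case mentioned above, the dictionary from asdict() can be passed straight to json.dumps(). The sketch below uses a shortened Address class for brevity; rebuilding the objects from JSON is done by hand here, since the dataclasses module has no built-in function for it.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Address:  # shortened version, for illustration only
    city: str
    pin: int

@dataclass
class Person:
    name: str
    age: int
    address: Address

p = Person('John', 28, Address('Delhi', 110011))

payload = json.dumps(asdict(p))
print(payload)
# {"name": "John", "age": 28, "address": {"city": "Delhi", "pin": 110011}}

# Going back: nested data classes have to be rebuilt explicitly
data = json.loads(payload)
restored = Person(data['name'], data['age'], Address(**data['address']))
print(restored == p)  # True
```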

FAQ

Why use DataClass in Python?

Use DataClasses in Python to easily create classes that hold data without writing a lot of extra code. They make your code cleaner, and more readable and help prevent mistakes. DataClasses are especially useful when you want to represent simple structures like a point in space.

In Python, you use data classes to create classes that primarily store data. They offer conciseness, readability, optional immutability (with frozen=True), default values, and automatic comparison methods, making your code cleaner and more maintainable. You can define data classes using the dataclasses module introduced in Python 3.7.

What are data classes?

Data classes in Python are a way to create simple classes that mainly hold data. They make it easy to define and work with objects that store information without writing a lot of extra code. Data classes are designed to be concise and help improve the readability of your code.
For example:

from dataclasses import dataclass
@dataclass
class Point:
    x: int
    y: int
# Usage
p1 = Point(1, 2)
print(p1.x) # Output: 1
print(p1.y) # Output: 2

In this example, we create a data class Point using the @dataclass decorator. It automatically generates the __init__ method for us, so we can create Point objects with x and y attributes easily.

What is dataclasses library?

The dataclasses library is not a standalone library but rather a module that’s part of the Python standard library. It provides a convenient way to create classes that are primarily used for storing and managing data, hence the name “data classes.”
The module includes a decorator (@dataclass) that simplifies the creation of such classes by automatically generating common special methods like __init__, __repr__ and __eq__, and optionally the ordering methods and __hash__.
You can use the dataclasses module to define data classes, making your code more concise and readable when dealing with objects that primarily represent data structures.

What is the difference between class and Dataclass in Python?

The main difference between a regular class and a data class in Python is the amount of boilerplate code you need to write and the intended use. Data classes are designed for simplicity and ease of use when dealing with data storage, whereas regular classes offer more flexibility but require more manual implementation for common methods.

In Python, both regular classes and data classes can be used to define custom data structures, but there are differences in how you define and work with them:

Boilerplate Code:
– Regular Class: When you create a regular class in Python, you need to write explicit code for methods like __init__, __repr__, __eq__, and __hash__ if you want to customize them. This can result in a lot of boilerplate code.
– Data Class: Data classes, defined using the @dataclass decorator from the dataclasses module, automatically generate these special methods for you, reducing the amount of boilerplate code you need to write.

Intent:
– Regular Class: Regular classes can be used for various purposes, including storing data, encapsulating behavior (methods), and more. They are more flexible and can be used for a wide range of scenarios.
– Data Class: Data classes are specifically designed for storing data. They are intended to be simple containers for data attributes and are optimized for that purpose.

Mutability:
– Regular Class: Regular classes can be designed as mutable (attributes can change) or immutable (attributes cannot change), depending on how they are implemented.
– Data Class: Data classes are mutable by default, just like regular classes. Passing frozen=True to @dataclass makes instances immutable, so their attributes cannot be changed after creation. This immutability can help prevent unintentional data modification.

Readability and Conciseness:
– Regular Class: Regular classes can be more verbose, especially when you have to write all the special methods yourself.
– Data Class: Data classes are concise and explicitly indicate that their primary purpose is data storage, making the code more readable.

Usage:
– Regular Class: Use regular classes when you need more control, custom behavior, or complex logic within your class.
– Data Class: Use data classes when your main goal is to store and manipulate data straightforwardly with minimal code.

Here is an example to show the difference:

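# Regular Class
class Point:
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def __eq__(self, other):
        # Only compare with other Point instances
        if not isinstance(other, Point):
            return NotImplemented
        return self.x == other.x and self.y == other.y

# Data Class
from dataclasses import dataclass
@dataclass
class DataPoint:
    x: int
    y: int
# Usage
p1 = Point(1, 2)
p2 = DataPoint(1, 2)
print(p1 == p2) # False (different classes: neither __eq__ accepts the other type)
print(p2 == DataPoint(1, 2)) # True (__eq__ was generated by @dataclass)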


Conclusion

So in this article, we learned how to use Python data classes in 2023. These features collectively make Python data classes a powerful tool for creating structured and manageable data containers with minimal effort. Data classes save us from writing repetitive code and provide better readability. For more information, you can visit the Python documentation.

Feel free to leave a comment if any point is unclear. We will be happy to answer your queries.

Dharmendra is a blogger, author, Expert in IT Services and admin of DJTechnews. Good experience in software development. Love to write articles to share knowledge and experience with others. He has deep knowledge of multiple technologies and is always ready to explore new research and developments.
