Variations of Data Classes

data classes in Python

Hi guys,

Python 3.7 alpha was recently released with the new Data Classes functionality. It reminded me that I still owe you the third article in our Python data types series (check out our previous articles on Python dicts and Python arrays), so I've decided to lay it all out in this publication. I'm going to show you what kinds of data classes we had in Python 2 and how things have changed over time.

The idea behind data classes is to make your code more readable and self-documenting. Because fields are accessed by name, it becomes easy to make sure you're reading the correct member of a tuple.

collections.namedtuple

The first data class is the namedtuple, a factory function that was added in Python 2.6. Namedtuples resemble dictionaries in some ways, but unlike dictionaries they are lightweight and immutable: once a namedtuple instance is created, you can no longer change its fields or add new ones.

>>> from collections import namedtuple
>>> Plane = namedtuple('Plane', 'length height color')
>>> plane1 = Plane(75.30, 39_000, 'white')
>>> plane1
Plane(length=75.3, height=39000, color='white')
>>> plane1.height
39000

Like regular tuples, namedtuples are immutable, which means you cannot change an object after it has been created.

>>> plane1.height = 20.0
AttributeError: can't set attribute
>>> plane1.chassis = False
AttributeError: 'Plane' object has no attribute 'chassis'

You can also access the fields by index, just like with a regular tuple.

>>> plane1[2]
'white'
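Since a namedtuple can't be changed in place, "changing" one means building a modified copy. Here is a minimal sketch using the documented namedtuple helpers _replace(), _fields and _asdict(); the new height value is just an illustration. Note that _asdict() returned an OrderedDict before Python 3.8, so the sketch wraps it in dict() to get the same output everywhere.

```python
from collections import namedtuple

Plane = namedtuple('Plane', 'length height color')
plane1 = Plane(75.30, 39_000, 'white')

# _replace() returns a NEW namedtuple; plane1 itself is left untouched
plane2 = plane1._replace(height=20_000)

print(plane1.height)           # 39000
print(plane2.height)           # 20000
print(Plane._fields)           # ('length', 'height', 'color')
print(dict(plane2._asdict()))  # {'length': 75.3, 'height': 20000, 'color': 'white'}
```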

You can also find examples of namedtuple usage on CheckiO. Here are two of them: the first solution is from veky for the Building Base mission, and the second one is from PositronicLlama for the Open Labyrinth mission.

For more information you can also check out the following articles:

PYTHON TIPS - "Why should you use namedtuple instead of a tuple?"

PyMOTW - "namedtuple"

typing.NamedTuple

typing.NamedTuple is a typed alternative to namedtuple. It is used in much the same way, and the class-based syntax shown below is available starting with Python 3.6. The main difference is this new syntax, which lets you declare a type hint for every field.

>>> from typing import NamedTuple
>>> class Plane(NamedTuple):
...     length: float
...     height: int
...     color: str
...
>>> plane1 = Plane(75.30, 39_000, 'white')
>>> plane1
Plane(length=75.3, height=39000, color='white')

>>> plane1 == (75.30, 39_000, 'white')
True

>>> plane1.height
39000
>>> plane1.height = 32.0
AttributeError: can't set attribute
>>> plane1.chassis = 'disabled'
AttributeError: 'Plane' object has no attribute 'chassis'
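Since Python 3.6.1 the class syntax also accepts default values and ordinary methods, which brings typing.NamedTuple closer to a real class. A minimal sketch follows; the describe() method is a hypothetical example, not part of any API.

```python
from typing import NamedTuple

class Plane(NamedTuple):
    length: float
    height: int
    color: str = 'white'   # fields with defaults must come after those without

    def describe(self) -> str:
        # Hypothetical helper method: regular methods can live in the class body
        return f'{self.color} plane: length {self.length}, height {self.height}'

plane1 = Plane(75.30, 39_000)
print(plane1)             # Plane(length=75.3, height=39000, color='white')
print(plane1.describe())  # white plane: length 75.3, height 39000
```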

Type annotations are not enforced at runtime; you need a separate type-checking tool such as mypy to catch mismatches:

>>> Plane('NOT FLOAT', 39_000, 'white')
Plane(length='NOT FLOAT', height=39000, color='white')
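If you do want a check at runtime, you can write one yourself on top of typing.get_type_hints(). The check_types() function below is a hypothetical helper, a minimal sketch that assumes every annotation is a plain class such as int, float or str (generic hints like List[int] are not handled).

```python
from typing import NamedTuple, get_type_hints

class Plane(NamedTuple):
    length: float
    height: int
    color: str

def check_types(obj) -> None:
    """Hypothetical helper: raise TypeError if a field doesn't match its annotation.

    Only plain classes (int, float, str, ...) are supported as annotations.
    """
    for name, expected in get_type_hints(type(obj)).items():
        value = getattr(obj, name)
        if not isinstance(value, expected):
            raise TypeError(f'{name}={value!r} is not {expected.__name__}')

check_types(Plane(75.30, 39_000, 'white'))  # passes silently

try:
    check_types(Plane(75.30, 'NOT INT', 'white'))
except TypeError as err:
    print(err)  # height='NOT INT' is not int
```

Keep in mind this isinstance() check is stricter than mypy, which accepts an int where a float is annotated.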

This object type is less popular among CheckiO players, but it still has its place. For example, take a look at zoido's solutions for the Pawn Brotherhood and Roman Numerals missions.

To get more detailed information, you can browse through these resources:

"Modern Python Cookbook" by Steven F. Lott

"New interesting data structures in Python 3" by Topper_123

"Python Type Checking Guide Documentation" by Chad Dombrova

types.SimpleNamespace

SimpleNamespace is a class that offers another simple way to implement data objects, with a readable representation. It gives attribute access to its namespace and lets you change, add or delete attributes as much as you'd like.

SimpleNamespace has been available in the types module since Python 3.3. It is very flexible and lets you use dotted attribute notation instead of index or key lookups.

>>> from types import SimpleNamespace
>>> plane1 = SimpleNamespace(length=75.30,
...                          height=39_000,
...                          color='white')
>>> plane1
namespace(length=75.3, height=39000, color='white')
>>> plane1.length = 69
>>> plane1.chassis = 'disabled'
>>> del plane1.height
>>> plane1
namespace(length=69, color='white', chassis='disabled')
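A common use for SimpleNamespace is turning a plain dict (for example, parsed JSON) into an object with dotted access. A minimal sketch, assuming the data arrives as a flat dict; the keyword arguments are copied into a new namespace, so the original dict is not modified.

```python
from types import SimpleNamespace

# Build a namespace from an existing dict, e.g. parsed JSON
data = {'length': 75.30, 'height': 39_000, 'color': 'white'}
plane1 = SimpleNamespace(**data)

plane1.height = 20_000  # attributes stay mutable
print(plane1.height)    # 20000

# vars() returns the namespace's underlying attribute dict
print(vars(plane1))     # {'length': 75.3, 'height': 20000, 'color': 'white'}

# Two namespaces compare equal when their attributes are equal
print(plane1 == SimpleNamespace(length=75.3, height=20_000, color='white'))  # True
```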

SimpleNamespace is not widely used on CheckiO, but here is a solution from ilpalazzo_sama for the Open Labyrinth mission.

To read more about SimpleNamespace you can follow these links:

"New interesting data structures in Python 3" by Topper_123

Python Documentation - "Dynamic type creation and names for built-in types"

@dataclass

Data Classes were added in Python 3.7, which CheckiO already supports. The idea behind them is simple: they are designed to work well with static type checkers and to generate special methods for the class automatically.

It works like this: the @dataclass decorator examines the class definition to find its fields (class variables with type annotations). Then it generates the needed methods, adds them to the class, and returns the class.

from dataclasses import dataclass
@dataclass
class Plane:
    length: float
    height: float
    color: str = 'white'
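You can see which fields the decorator found with dataclasses.fields(). In this minimal sketch, max_speed is a hypothetical attribute added only to show that a class variable without a type annotation is not treated as a field.

```python
from dataclasses import dataclass, fields

@dataclass
class Plane:
    length: float
    height: float
    color: str = 'white'
    max_speed = 900  # hypothetical: no annotation, so NOT a field, just a class attribute

# fields() lists the annotated class variables the decorator found, in order
print([f.name for f in fields(Plane)])  # ['length', 'height', 'color']
print(fields(Plane)[2].default)         # white
```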

Roughly speaking, these are the methods the decorator generates and attaches to the class:

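class Plane:
    # Hand-written equivalent of what @dataclass (default options) generates
    def __init__(self, length: float, height: float, color: str = 'white') -> None:
        self.length = length
        self.height = height
        self.color = color

    def __repr__(self):
        return f'Plane(length={self.length!r}, height={self.height!r}, color={self.color!r})'

    def __eq__(self, other):
        # Compare field by field, but only with instances of the same class
        if other.__class__ is self.__class__:
            return (self.length, self.height, self.color) == (other.length, other.height, other.color)
        return NotImplemented

    # No __ne__ is generated: Python derives != from __eq__ automatically.
    # With @dataclass(order=True), __lt__, __le__, __gt__ and __ge__ are also
    # generated; each compares the same field tuples, e.g. for __lt__:
    #     (self.length, self.height, self.color) < (other.length, other.height, other.color)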
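To see the generated methods in action, here is a short sketch with the same Plane data class (default decorator options, so no ordering methods):

```python
from dataclasses import dataclass

@dataclass
class Plane:
    length: float
    height: float
    color: str = 'white'

plane1 = Plane(75.30, 39_000)         # generated __init__ fills in the default color
print(plane1)                         # Plane(length=75.3, height=39000, color='white')
print(plane1 == Plane(75.3, 39_000))  # True: generated __eq__ compares fields
print(plane1 != Plane(75.3, 20_000))  # True: != is the inverse of __eq__

plane1.height = 20_000                # regular data classes are mutable

try:
    plane1 < Plane(80.0, 39_000)      # no ordering methods without order=True
except TypeError as err:
    print('TypeError:', err)
```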

The @dataclass decorator has no required parameters (although it accepts optional ones), and it can be used without parentheses. This is what its signature looks like:

def dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False)  # Python 3.7; later versions add more keyword-only parameters

The parameters control which methods are generated and how:

- If init is true, an __init__ method is added.

- repr controls __repr__.

- eq controls __eq__ (Python derives != from it).

- order controls __lt__, __le__, __gt__, and __ge__ (it is off by default and requires eq=True).

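You can also make instances immutable with frozen=True; combined with the default eq=True, this also generates a __hash__ method, so instances become hashable. The unsafe_hash parameter forces a __hash__ method to be generated even for mutable classes.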
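Here is a minimal sketch combining order=True and frozen=True on the Plane class. The list of planes is made-up sample data.

```python
from dataclasses import dataclass, FrozenInstanceError

@dataclass(order=True, frozen=True)
class Plane:
    length: float
    height: float
    color: str = 'white'

planes = [Plane(80.0, 39_000), Plane(75.3, 41_000), Plane(75.3, 39_000)]
# order=True compares fields as tuples: length first, then height, then color
print(sorted(planes)[0])  # Plane(length=75.3, height=39000, color='white')

# frozen=True plus the default eq=True generates __hash__,
# so equal instances collapse into one set element
print(len({Plane(75.3, 39_000), Plane(75.3, 39_000)}))  # 1

try:
    planes[0].height = 20_000
except FrozenInstanceError:
    print('frozen: fields cannot be reassigned')
```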

In most cases you won't need any additional functionality. Sometimes, however, a field needs extra per-field information. To handle that, you can replace the field's default value with a call to the field() function.

from dataclasses import dataclass, field

@dataclass
class C:
    x: int
    y: int = field(repr=False)
    z: int = field(repr=False, default=10)
    t: int = 20

Here the class attributes C.x and C.y won't be set, C.z will equal 10, and C.t will equal 20. Out of x, y, z and t, only x and t will appear in the generated __repr__, because repr=False excludes y and z.
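A quick sketch that checks these claims against the class C from above:

```python
from dataclasses import dataclass, field

@dataclass
class C:
    x: int
    y: int = field(repr=False)
    z: int = field(repr=False, default=10)
    t: int = 20

c = C(1, 2)
print(c)                # C(x=1, t=20): y and z are left out of __repr__
print(c.y, c.z)         # 2 10: they are still ordinary fields
print(C.z, C.t)         # 10 20: class attributes hold the defaults
print(hasattr(C, 'x'), hasattr(C, 'y'))  # False False
```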

The field() function can customize a lot of things. You can choose whether a field is included in __init__, __repr__, __hash__, and in comparisons such as __eq__ and __gt__. You should also use field(default_factory=...) for fields whose default value is a mutable object, such as a list.
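Here is why mutable defaults need default_factory. In this minimal sketch, Hangar and BrokenHangar are hypothetical classes: the factory gives every instance its own list, while a plain [] default is rejected with a ValueError when the class is created.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Hangar:
    name: str
    # default_factory is called for each new instance, so every
    # Hangar gets its own empty list instead of sharing one
    planes: List[str] = field(default_factory=list)

h1 = Hangar('north')
h2 = Hangar('south')
h1.planes.append('A320')
print(h1.planes, h2.planes)  # ['A320'] []

# A plain mutable default is rejected when the class is defined
try:
    @dataclass
    class BrokenHangar:
        planes: list = []
except ValueError as err:
    print('ValueError:', err)
```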

The purpose of this article is to give you an understanding of what data classes do in Python and how you can use them in Python 3.7 and later. There is a lot we didn't cover, such as the __post_init__ method or helper functions like make_dataclass, is_dataclass, asdict, astuple and replace, but you can read about all of them in PEP 557.
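For a quick taste of those extras, here is a minimal sketch. The label field is a hypothetical derived value computed in __post_init__, and Point is a made-up class built with make_dataclass.

```python
from dataclasses import (dataclass, field, asdict, astuple,
                         replace, is_dataclass, make_dataclass)

@dataclass
class Plane:
    length: float
    height: float
    color: str = 'white'
    # Hypothetical derived field: not an __init__ parameter, set in __post_init__
    label: str = field(init=False)

    def __post_init__(self):
        # Runs right after the generated __init__ has set the other fields
        self.label = f'{self.color} plane at {self.height}'

plane1 = Plane(75.3, 39_000)
plane2 = replace(plane1, color='red')  # new instance; __post_init__ runs again
print(plane2.label)     # red plane at 39000
print(asdict(plane1))   # {'length': 75.3, 'height': 39000, 'color': 'white', 'label': 'white plane at 39000'}
print(astuple(plane2))  # (75.3, 39000, 'red', 'red plane at 39000')

# make_dataclass builds a data class dynamically
Point = make_dataclass('Point', [('x', int), ('y', int)])
print(is_dataclass(Point), Point(1, 2))  # True Point(x=1, y=2)
```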

Conclusion

This is the third and last article in a series where I wanted to show you all the colors available in the Python data types palette. So the next time you need to choose a data type for your data, you'll have a better understanding of all your options.

Welcome to CheckiO: games for coders where you can improve your coding skills.

The main idea behind these games is to give you the opportunity to learn by exchanging experience with the rest of the community. Every day we look for interesting solutions to help you become a better coder.
