Post image
Hashable data types and why do we need those

Last Thursday we have done a short (2 hours long) video code review with veky. We haven't done any new one for while, so we had some issues with the sound. The last will be fixed for the next video and hopefully we will start doing those code reviews more regularly.

Veky starts with the review of his mission - Completely Empty. During this he explains a lot of things about hashes in Python for me, so I've decided to leave some notes in a separate blogpost.

First things fist. What is a hash? Hash is like a global index for objects in Python or integer interpretation of any Python object.

>> hash(40)
<<< 40

>>> hash('cio')
<<< -4632795850239615199

>>> hash((1,2,3))
<<< 2528502973977326415

Well, not any object, only immutable objects can have a hash (an exception will be shown in the and of the post). Don't worry if you don't know what is the difference between mutable and immutable objects and why mutable objects can't have hashes. I'll explain it later.

>> hash([1,2])
TypeError: unhashable type: 'list'

>>> hash({‘a’: 1})
TypeError: unhashable type: 'dict'

Why do we need those indexes (hashes) in Python, especially with such restrictions? Python uses hashes in sets and in dict keys in order to quickly find values there and keep values unique. That's why a list or dict can't be an element of set or key of dict (an exception will be shown in the and of the post).

>> {[1,2]}
TypeError: unhashable type: 'list'

>>> {{}:1}
TypeError: unhashable type: 'dict'

This is why values inside the set can't be mutable (an exception will be shown in the and of the post). Mutable means capable of change or of being changed. Here is a classical example that illustrates that the list is a mutable type:

>> a = [1,2]

>>> b = a

>>> b.append(3)

>>> a
<<< [1, 2, 3]

Which is impossible for tuple

>> a = (1,2)

>>> a.append(3)
AttributeError: 'tuple' object has no attribute 'append'

… because tuple is an immutable type, even if we try to trick Python

>> hash(a)
<<< 3713081631934410656

>>> a = (1,2,3, [])

>>> hash(a)
TypeError: unhashable type: 'list'

>>> {a}
TypeError: unhashable type: 'list'

But Python still allows you to check if two lists are equal, right?

>> [1,2,3] == [1,2,3]
<<< True

>>> [1,2,3] == [1,2,2]
<<< False

So why doesn't Python simply check whether two objects are equal or not before adding a new one into a set without caring about the object's mutability. First of all, it is not as fast as using hashes, and the second reason is that the mutable object can be changed after being added into a set.

Let's imagine we can add a list into set

>> a = [1,2]

>>> b = [1]

>>> wow = {a, b}

>>> len(wow)
<<< 2

>>> b.append(2)

>>> a == b
<<< True

>>> len(wow)
???

If a and b are equal then what should be the length of the set wow? If 2, then how is it possible that the set contains not unique elements? If 1, then this is an unexpected behaviour.

That's why Python will raise an exception on line 3: wow = {a, b}

I hope we've given you a very simple explanation of hashable objects and why do we need them.

PS (StefanPochmann): Python is a very flexible language and you can define the method __hash__ for an object so it becomes hashable. In such a way we can get an answer to our previous question what will be the length of wow

>> class List(list):
       def __hash__(self):
           return hash(tuple(self))

>>> a = List([1,2])

>>> b = List([1])

>>> wow = {a,b}

>>> b.append(2)

>>> a == b
<<< True

>>> len(wow)
<<< 2

>>> wow
<<< {[1,2], [1,2]}

>>> len({a, b})
<<< 1

Created: Sept. 19, 2017, 11:19 a.m.
5
40
User avatar
oduvan