Skip to content Skip to sidebar Skip to footer

Python Portable Hash

I am reading the source code of pyspark here. I got really confused by the portable function here. def portable_hash(x): ''' This function returns consistant hash code for

Solution 1:

"is this hash approach an one to one mapping?"

NO hashing approach is 1-to-1: they all map M possible inputs into N possible integer results where M is much larger than N.

"this function only takes None and tuple and hashable values into consideration, I know that list object is not hashable, did they do that intentionally or not."

Yes, this function delegates to built-in hash everything else except tuples and None. And it's definitely a deliberate design decision in Python (also respected by this function) to make mutable built-ins such as list and dictnot hashable.

"I don't have much experience with hash, is this a very classic hashing approach, if so, is there any resource for me to get a better understanding for it?"

Yes, exclusive-or'ing hash of items, with modification of the running total as one goes through, is indeed a very classic approach to hashing containers of hashable items.

For more study about hashing, I'd start at http://en.wikipedia.org/wiki/Hash_function .

Post a Comment for "Python Portable Hash"