Python Key In Dict.keys() Performance For Large Dictionaries

June 25, 2024 Post a Comment

I was wondering if you guys might be able to give me some advice in regards to making the performance of my code much better. I have a set of for loops which look to see if a key i

Solution 1:

Don't do this:

value.key indict.keys()

That--in Python 2, at least--creates a list containing every key. That gets more and more expensive as the dictionary gets larger, and performs an O(n) search on the list to find the key, which defeats the purpose of using a dict.

Instead, just do:

value.key indict

which doesn't create a temporary list, and does a hash table lookup for the key rather than a linear search.

setdefault, as mentioned elsewhere, is the cleaner way to do this, but it's very important to understand the above.

Solution 2:

Using collections.defaultdict, this can be simplified to

d = collections.defaultdict(list)
for value in value_list:
    d[value.key].append(value.val)

Solution 3:

Step 1: we transform the code using the temp_list into a single expression (I assume temp_list isn't needed outside this code), by using addition instead of the append method. Also, we don't need to use dict.keys() explicitly, as others mentioned (and in fact it wastes a huge amount of time).

for value in value_list:
   if value.key indict:
      dict[value.key] = dict[value.key] + [value.val]
   else:
      dict[value.key] = [value.val]

Step 2: Transform the assignments-to-the-same-location by using the conditional expression syntax.

for value in value_list:
   dict[value.key] = dict[value.key] + [value.val] if value.key indictelse [value.val]

Step 3: Appending or prepending an empty list has no effect on the value of a list, so we can insert that, and then factor out the common 'addition' of the value.

for value in value_list:
   dict[value.key] = (dict[value.key] if value.key indictelse []) + [value.val]

Step 4: Recognize that the dict has built-in functionality for providing a 'default' value when the key is absent:

for value in value_list:
   dict[value.key] = dict.get(value.key, []) + [value.val]

Step 5: Instead of getting a value, modifying it and setting it back, we can use .setdefault to give us the current contents (or set them up if not already there), and then switch back to using .append to modify the list:

for value in value_list:
   dict.setdefault(value.key, []).append(value.val)

(I mean... I could have just looked at it and thought for a bit and arrived at this, but seeing each step makes it clearer where we're going...)

Solution 4:

your_dict.setdefault(value.key, []).append(value.val)

Solution 5:

if value.key indict.keys():

Is very expensive because you're converting to a list of keys and then searching the list. Just replacing that with:

if value.key indict:

Should shorten the search to ~log N (EDIT: I stand corrected by Glenn, probably even faster because the Python dictionaries use a hash table). Then simply:

dict[key].append(value.val)

Should speed things up a bit. Using a temporary is not required and just eats some CPU cycles.

If you can give more details about what you're trying to do someone may be able to suggest a better algorithm.

Python Channel