Efficiently Find Repeated Characters In A String
Solution 1:
As this is a performance question, let's do some timings:
deftest_set(xs):
seen = set() # O(1) lookupsfor x in xs:
if x notin seen:
seen.add(x)
else:
return x
import collections
deftest_counter(xs):
freq = collections.Counter(xs)
for k in freq:
if freq[k] > 1:
return k
deftest_dict(xs):
d = {}
for x in xs:
if x in d:
return x
d[x] = 1deftest_sort(xs):
ys = sorted(xs)
for n inrange(1, len(xs)):
if ys[n] == ys[n-1]:
return ys[n]
##import sys, timeit
print (sys.version + "\n")
xs = list(range(10000)) + [999]
fns = [p for name, p inglobals().items() if name.startswith('test')]
for fn in fns:
assert fn(xs) == 999print ('%50s %.5f' % (fn, timeit.timeit(lambda: fn(xs), number=100)))
I'm testing on an list of integers rather than a string (because with a string you can't get more than 256 loops). The results on my machine look like this:
3.2.3 (v3.2.3:3d0686d90f55, Apr 10 2012, 11:25:50)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)]
<function test_set at 0x1020f7380> 0.19265
<function test_dict at 0x1020f7490> 0.12725
<function test_sort at 0x1020f7518> 0.04683
<function test_counter at 0x1020f7408> 0.92485
So the sort method appears to be the winner. I guess this is because it doesn't waste time creating hashes and allocating dict/set structures. Also, if you don't care about the source list being changed, you can do xs.sort()
instead of ys = sorted(xs)
, which gives you zero memory footprint.
On the other side, if repeated items are more probable to occur towards the beginning of the input (as in xs = 'abcdef' * 10000
), the set
method will perform the best, as it, unlike sort
or Counter
, returns immediately once a repeat is found and doesn't need to preprocess the whole list. You should also use set
if you need the first repeating element, not just one of them.
Counter
is a nice tool, but it's not designed for performance, so if you really have to deal with "gigantic inputs", go with sets (if they fit in memory) or mergesort if they don't.
Solution 2:
You can use collections
to find repeating characters:
import collections
freq = collections.Counter("abcda")
for k in freq:
if freq[k] > 1:
print k # prints "a"break
If you want only to find if there are repetitions (without finding the repeated characters):
letters = list("abcda")
no_rep = set(letters)
printlen(letters) > len(no_rep) # prints 'True' when there are repeating characters
Solution 3:
def find_repeater(mystr):
seen = set() # O(1) lookupsforcharin mystr:
ifcharnotin seen:
seen.add(char)
else:
print('repetition found')
returnchar
Solution 4:
Here you go, using a dictionary:
def find_repeater(string):
my_list = {}
my_list[string[0]] = 1for i in range (1, len(string)):
ifstring[i] in my_list.keys():
print'repetition found'return (string[i])
else:
my_list[string[i]] = 1print find_repeater('abca')
Personally, though, I would use a set here:
def find_repeater(string):
my_list = set()
my_list.add(string[0])
for i in range (1, len(string)):
ifstring[i] in my_list:
print'repetition found'return (string[i])
else:
my_list.add(string[i])
print find_repeater('abca')
Solution 5:
This should work.
def return_dupe(string):
d = {}
[d.update({k:2}) if k in d else d.update({k:1}) for k in string]
return [key for key in d if d[key] > 1]
Post a Comment for "Efficiently Find Repeated Characters In A String"