Skip to content Skip to sidebar Skip to footer

Generate Lexicographic Series Efficiently In Python

I want to generate a lexicographic series of numbers such that for each number the sum of digits is a given constant. It is somewhat similar to 'subset sum problem'. For example if

Solution 1:

Very effective algorithm adapted from Jorg Arndt book "Matters Computational"
(Chapter 7.2 Co-lexicographic order for compositions into exactly k parts)

n = 4
k = 3

x = [0] * n
x[0] = k

while True:
    print(x)
    v = x[-1]
    if (k==v ):
        break
    x[-1] = 0
    j = -2
    while (0==x[j]):
        j -= 1
    x[j] -= 1
    x[j+1] = 1 + v

[3, 0, 0, 0]
[2, 1, 0, 0]
[2, 0, 1, 0]
[2, 0, 0, 1]
[1, 2, 0, 0]
[1, 1, 1, 0]
[1, 1, 0, 1]
[1, 0, 2, 0]
[1, 0, 1, 1]
[1, 0, 0, 2]
[0, 3, 0, 0]
[0, 2, 1, 0]
[0, 2, 0, 1]
[0, 1, 2, 0]
[0, 1, 1, 1]
[0, 1, 0, 2]
[0, 0, 3, 0]
[0, 0, 2, 1]
[0, 0, 1, 2]
[0, 0, 0, 3]

Number of compositions and time on seconds for plain Python (perhaps numpy arrays are faster) for n=100, and k = 2,3,4,5 (2.8 ghz Cel-1840)

2  5050 0.040000200271606445
3  171700 0.9900014400482178
4  4421275 20.02204465866089
5  91962520 372.03577995300293
I expect time  2 hours for 100/6 generation

Same with numpy arrays (x = np.zeros((n,), dtype=int)) gives worse results - but perhaps because I don't know how to use them properly

2  5050 0.07999992370605469
3  171700 2.390003204345703
4  4421275 54.74532389640808

Native code (this is Delphi, C/C++ compilers might optimize better) generates 100/6 in 21 seconds

3  171700  0.012
4  4421275  0.125
5  91962520  1.544
6  1609344100 20.748

Cannot go sleep until all measurements aren't done :)

MSVS VC++: 18 seconds! (O2 optimization)

5  91962520 1.466
6  1609344100 18.283

So 100 millions variants per second. A lot of time is wasted for checking of empty cells (because fill ratio is small). Speed described by Arndt is reached on higher k/n ratios and is about 300-500 millions variants per second:

n=25, k=15 25140840660 60.981  400 millions per second

Solution 2:

My recommendations:

  1. Rewrite it as a generator utilizing yield, rather than a loop that concatenates a global variable on each iteration.
  2. Keep a running sum instead of calculating the sum of some subset of the array representation of the number.
  3. Operate on a single instance of your working number representation instead of splicing a copy of it to a temporary variable on each iteration.

Note no particular order is implied.


Solution 3:

I have a better solution using itertools as follows,

from itertools import product
n = 4 #number of elements
s = 3 #sum of elements
r = []
for x in range(n):
    r.append(x)
result = [p for p in product(r, repeat=n) if sum(p) == s]
print(len(result))
print(result)

I am saying this is better because it took 0.1 secs on my system, while your code with numpy took 0.2 secs.

enter image description here

enter link description here

But as far as n=100 and s=6, this code takes time to go through all the combinations, I think it will take days to compute the results.


Solution 4:

I found a solution using itertools as well (Source: https://bugs.python.org/msg144273). Code follows:

import itertools
import operator

def combinations_with_replacement(iterable, r):
    # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC
    pool = tuple(iterable)
    n = len(pool)
    if not n and r:
        return
    indices = [0] * r
    yield tuple(pool[i] for i in indices)
    while True:
        for i in reversed(range(r)):
            if indices[i] != n - 1:
                break
        else:
            return
        indices[i:] = [indices[i] + 1] * (r - i)
        yield tuple(pool[i] for i in indices)

int_part = lambda n, k: (tuple(map(c.count, range(k))) for c in combinations_with_replacement(range(k), n))
for item in int_part(3,4): print(item)

Post a Comment for "Generate Lexicographic Series Efficiently In Python"