Python: How To Split A List Based On A Specific Element

October 31, 2022

If we have the following list in Python:

sentence = ['I', 'am', 'good', '.', 'I', 'like', 'you', '.', 'we', 'are', 'not', 'friends', '.']

How do I split this to get a list which co

Solution 1:

You can use itertools.groupby:

from itertools import groupby
# zip(i, i) pulls two consecutive groups from the same generator: a run of words, then its '.'
i = (list(g) for _, g in groupby(sentence, key='.'.__ne__))
print([a + b for a, b in zip(i, i)])

This outputs:

[['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['we', 'are', 'not', 'friends', '.']]

If your list doesn't always end with '.' then you can use itertools.zip_longest instead:

from itertools import groupby, zip_longest
sentence = ["I", "am", "good", ".", "I", "like", "you", ".", "we", "are", "not", "friends"]
i = (list(g) for _, g in groupby(sentence, key='.'.__ne__))
print([a + b for a, b in zip_longest(i, i, fillvalue=[])])

This outputs:

[['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['we', 'are', 'not', 'friends']]
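To see why pairing consecutive groups works, it may help to look at what groupby produces before the pairing step. The following is only an illustrative sketch; the shortened list and the variable name groups are assumptions made for the example.

```python
from itertools import groupby

sentence = ['I', 'am', 'good', '.', 'I', 'like', 'you', '.']

# '.'.__ne__(item) is True for every item that is not '.', so consecutive
# words share the key True and each '.' gets the key False.
groups = [(is_word, list(g)) for is_word, g in groupby(sentence, key='.'.__ne__)]
print(groups)
# [(True, ['I', 'am', 'good']), (False, ['.']), (True, ['I', 'like', 'you']), (False, ['.'])]
```

Because the groups alternate between words and dots, joining them two at a time rebuilds each sentence. The pairing relies on that alternation: a list that starts with '.' or contains two dots in a row would shift the pairs, so this approach fits input shaped like the example.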

Solution 2:

We can do this in two stages: first calculating the indices where the dots are located, and then making slices, like:

idxs = [i for i, v in enumerate(sentence, 1) if v == '.']   # calculating indices

result = [sentence[i:j] for i, j in zip([0]+idxs, idxs)]    # splitting accordingly

This then yields:

>>> [sentence[i:j] for i, j in zip([0]+idxs, idxs)]
[['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['we', 'are', 'not', 'friends', '.']]

You can then, for example, print the individual sublists with:

for sub in [sentence[i:j] for i, j in zip([0]+idxs, idxs)]:
    print(sub)

This prints:

>>> idxs = [i for i, v in enumerate(sentence, 1) if v == '.']
>>> for sub in [sentence[i:j] for i, j in zip([0]+idxs, idxs)]:
...     print(sub)
...
['I', 'am', 'good', '.']
['I', 'like', 'you', '.']
['we', 'are', 'not', 'friends', '.']
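Note that the slicing approach, as written, only cuts at the dots, so any words after the last '.' are left out of the result. One possible way to keep them is to add the end of the list as a final cut point. This is a sketch under that assumption; the function name split_after and its sep parameter are hypothetical and not part of the original answer.

```python
def split_after(items, sep='.'):
    """Split items into sublists, cutting right after each sep (hypothetical helper)."""
    # Positions just past each separator, computed as in the answer above.
    idxs = [i for i, v in enumerate(items, 1) if v == sep]
    # If items remain after the last separator, cut at the end of the list too.
    if items and (not idxs or idxs[-1] != len(items)):
        idxs.append(len(items))
    return [items[i:j] for i, j in zip([0] + idxs, idxs)]

print(split_after(['I', 'am', 'good', '.', 'we', 'are', 'not', 'friends']))
# [['I', 'am', 'good', '.'], ['we', 'are', 'not', 'friends']]
```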

Solution 3:

sentence = ["I", "am", "good", ".", "I", "like", "you", ".", "we", "are", "not", "friends", "."]

output = []
temp = []
for item in sentence:
    temp.append(item)
    if item == '.':
        output.append(temp)
        temp = []
if temp:
    output.append(temp)

print(output)
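The loop above collects items in temp and moves them to output whenever a '.' is seen; the final check keeps any trailing words that are not followed by a '.'. The same idea could be wrapped in a generator for reuse. This is a sketch only; the name split_sentences and the sep parameter are hypothetical.

```python
def split_sentences(items, sep='.'):
    """Yield sublists of items, each closed by sep (hypothetical helper)."""
    temp = []
    for item in items:
        temp.append(item)
        if item == sep:
            yield temp      # a sentence is complete
            temp = []
    if temp:                # trailing items without a closing separator
        yield temp

sentence = ["I", "am", "good", ".", "I", "like", "you", ".", "we", "are", "not", "friends", "."]
print(list(split_sentences(sentence)))
# [['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['we', 'are', 'not', 'friends', '.']]
```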

Solution 4:

Using a simple iteration.

Demo:

sentence = ["I", "am", "good", ".", "I", "like", "you", ".", "we", "are", "not", "friends", "."]
last = len(sentence) - 1
result = [[]]
for i, v in enumerate(sentence):
    if v == ".":
        result[-1].append(".")
        if i != last:
            result.append([])
    else:
        result[-1].append(v)
print(result)

Output:

[['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['we', 'are', 'not', 'friends', '.']]

Solution 5:

This answer aims to be the simplest one...

The data

sentences = ["I", "am", "good", ".",
            "I", "like", "you", ".",
            "We", "are", "not", "friends", "."]

We initialize the output list and set a flag indicating that we are at the start of a new sentence

l, start = [], 1

We loop over the data list, using w to refer to the current word:

if we are at the start of a new sentence, we clear the flag and append an empty list to the tail of the output list;
we append the current word to the last sublist (note that ① we are guaranteed that there is at least a last sublist (do you like alliterations?) and ② every word gets appended);
if we are at the end of a sentence, that is, we have met a ".", we raise the flag again.

Note the single comment…

for w in sentences:
    if start: start = l.append([]) # l.append() returns None, that is falsey...
    l[-1].append(w)
    if w == ".": start = 1
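For readers who find the None trick too terse, the same loop can be spelled out with an explicit boolean flag. This is a sketch of an equivalent variant, not the answer's own code; the name result is an assumption used in place of l.

```python
sentences = ["I", "am", "good", ".",
             "I", "like", "you", ".",
             "We", "are", "not", "friends", "."]

result = []
start = True                 # True when the next word opens a new sentence
for w in sentences:
    if start:
        result.append([])    # open a new sublist
        start = False
    result[-1].append(w)     # every word goes into the last sublist
    if w == ".":
        start = True         # the next word starts a new sentence

print(result)
# [['I', 'am', 'good', '.'], ['I', 'like', 'you', '.'], ['We', 'are', 'not', 'friends', '.']]
```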
