Hashtag Counter Python

February 02, 2024

I am a Python beginner. For an exercise, I have to write a Python function that scans a list of strings, counts how many times each hashtag appears, and puts the counts into a dictionary.

Solution 1:

Fundamentally, your function doesn’t work because this line

hash_index = post_string.find(char)

will always find the index of the first hashtag in the string. This could be fixed by passing a start index to str.find or, better, by not calling str.find at all and instead keeping track of the index while iterating over the string (you can use enumerate for that). Better yet, don't use an index at all: you don't need one if you restructure your parser as a state machine.
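A minimal sketch of the first fix, since the asker's original function isn't shown here: `str.find` takes an optional start argument, so each search can resume right after the previous hashtag instead of returning the first `#` again. The helper name `count_hashtags` is hypothetical, and the rule that a tag ends at the first character that isn't a letter or digit is an assumption about the exercise.

```python
def count_hashtags(posts):
    """Count hashtags using str.find with a start index (hypothetical helper)."""
    counts = {}
    for post_string in posts:
        # str.find(sub, start) returns the first match at or after `start`, or -1
        hash_index = post_string.find("#")
        while hash_index != -1:
            # Assumed rule: the tag is the run of letters/digits after '#'
            end = hash_index + 1
            while end < len(post_string) and post_string[end].isalnum():
                end += 1
            tag = post_string[hash_index + 1:end]
            if tag:
                counts[tag] = counts.get(tag, 0) + 1
            # Resume the search after this hashtag instead of from the start
            hash_index = post_string.find("#", end)
    return counts


posts = ["hi #weekend", "good morning #zurich #limmat",
         "spend my #weekend in #zurich", "#zurich <3"]
print(count_hashtags(posts))  # {'weekend': 2, 'zurich': 3, 'limmat': 1}
```

The enumerate variant works the same way: `[i for i, c in enumerate(post_string) if c == '#']` lists every `#` position in one pass without calling str.find.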

That said, a Pythonic implementation would replace the whole function with a regular expression, which would make it drastically shorter, correct, more readable, and likely more efficient.
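A regex-based sketch, assuming a hashtag is `#` followed by an ASCII letter and then any ASCII letters or digits. Rejecting tags that start with a digit mirrors the official solution quoted further down; restricting to ASCII is an extra assumption (`str.isalnum` also accepts non-ASCII letters).

```python
import re

# Assumed rule: '#', then an ASCII letter, then any ASCII letters/digits.
# A tag that starts with a digit (e.g. "#4ever") does not match at all.
HASHTAG = re.compile(r"#([A-Za-z][A-Za-z0-9]*)")


def analyze(posts):
    counts = {}
    for post in posts:
        # findall returns only the captured group, i.e. the tag without '#'
        for tag in HASHTAG.findall(post):
            counts[tag] = counts.get(tag, 0) + 1
    return counts


print(analyze(["spend my #weekend in #zurich", "#4ever #zurich <3"]))
# {'weekend': 1, 'zurich': 2}
```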

Solution 2:

This should work:

import string
alpha = string.ascii_letters + string.digits

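def analyze(posts):
    """Count how often each hashtag appears across all posts."""
    hashtag_dict = {}

    for post in posts:
        # Split each post into whitespace-separated words
        for i in post.split():
            if i[0] == '#':
                # Keep only the letters/digits right after the '#'
                current_hashtag = sanitize(i[1:])

                if len(current_hashtag) > 0:
                    if current_hashtag in hashtag_dict:
                        hashtag_dict[current_hashtag] += 1
                    else:
                        hashtag_dict[current_hashtag] = 1
    return hashtag_dict


def sanitize(s):
    """Return the leading run of ASCII letters/digits in s."""
    s2 = ''
    for i in s:
        if i in alpha:
            s2 += i
        else:
            # Stop at the first character that can't be part of a hashtag
            break
    return s2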


posts = [
        "hi #weekend",
        "good morning #zurich #limmat",
        "spend my #weekend in #zurich",
        "#zurich <3",
        "#lindehof4Ever(lol)"
        ]

print(analyze(posts))

Solution 3:

With your help, I managed to get 2.75 points out of 4. Thanks a lot! I didn't paste any of your solutions into the correction tool; I used my own version, which I tried to improve with your suggestions. (I am sure I would have gotten 4/4 if I had submitted one of your solutions.)

According to them, the official solution would have been:

def analyze(posts):
    tags = {}

    for post in posts:
        curHashtag = None  # None means "not currently inside a hashtag"
        for c in post:
            is_allowed_char = c.isalnum()

            # A non-alphanumeric character ends the current hashtag
            if curHashtag != None and not is_allowed_char:
                if len(curHashtag) > 0 and not curHashtag[0].isdigit():
                    if curHashtag in tags.keys():
                        tags[curHashtag] += 1
                    else:
                        tags[curHashtag] = 1
                curHashtag = None
            if c == "#":
                curHashtag = ""
                continue
            if c.isalnum() and curHashtag != None:
                curHashtag += c

        # A hashtag at the very end of the post is still open here
        if curHashtag != None:
            if len(curHashtag) > 0 and not curHashtag[0].isdigit():
                if curHashtag in tags.keys():
                    tags[curHashtag] += 1
                else:
                    tags[curHashtag] = 1
    return tags

This is of course not an elegant solution, but it uses only what we have learned so far. Maybe it helps another beginner who wants to solve this exercise with the tools they already have.

Solution 4:

Well, this task can be done with regexes, so don't be afraid to use them ;) Here is a quick solution:

#!/usr/bin/python3.4
import re

PATTERN = re.compile(r'#(\w+)')
posts = [
    "hi #weekend",
    "good morning #zurich #limmat",
    "spend my #weekend in #zurich",
    "#zurich <3"]

container = {}
for post in posts:
    # findall returns the captured tag (without '#') for every match in the post
    for element in PATTERN.findall(post):
        container[element] = container.get(element, 0) + 1
print(container)

Result (dictionary key order may differ depending on the Python version):

{'zurich': 3, 'limmat': 1, 'weekend': 2}

EDIT

I would also like to use Counter from collections here.

#!/usr/bin/python3.4
import re
from collections import Counter

PATTERN = re.compile(r'#(\w+)')
posts = [
    "hi #weekend",
    "good morning #zurich #limmat",
    "spend my #weekend in #zurich",
    "#zurich <3"]

words = [word for post in posts for word in PATTERN.findall(post)]

counted = Counter(words)
print(counted)

# Result: Counter({'zurich': 3, 'weekend': 2, 'limmat': 1})
