Clean Url With Beautifulsoup

My script:

import sys
import BeautifulSoup as bs
from BeautifulSoup import BeautifulSoup

url_list = sys.argv[1]
urls = [tag['href'] for tag in BeautifulSoup(open(url_list)).findAll('a')]

Solution 1:

This is basic Python: you have a list, and you want to output it one URL per line.

for url in urls:
    print url
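
If a single string is more convenient than a loop (for example, to write everything to a file in one call), the same output can be built by joining the list with newlines. A minimal sketch, assuming urls is a list of strings; the sample URLs are made up for illustration:

```python
# Hypothetical sample data standing in for the list built from the page.
urls = [u"http://example.com/a", u"http://example.com/b"]

# Join the URLs with newlines so each one ends up on its own line.
output = u"\n".join(urls)
print(output)
```

The print(...) form with a single argument behaves the same on Python 2 and Python 3.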

Solution 2:

It pretty much is returning that. What you see is simply a list of URL strings, stored as unicode strings; that's why there's a u in front of each one when the whole list is printed.
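
A small sketch of the difference, with made-up URLs: printing a list shows the repr of every element, while printing one element shows only its text. On Python 3 the repr carries no u prefix, since every str is already unicode.

```python
# Hypothetical sample data; on Python 2 these are unicode objects.
urls = [u"http://example.com/a", u"http://example.com/b"]

# Printing the list shows each element's repr: quoted, with a u prefix on Python 2.
print(urls)

# Printing a single element writes the string itself: just the URL text.
first = urls[0]
print(first)
```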

If you simply want to print these urls nicely, Python has a module for pretty printing that can be used as follows:

from pprint import pprint

pprint(my_list_of_urls)

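However, this only splits the list across lines when it is too long to fit on one line, and it keeps the brackets and quotes. To print the bare URLs one per line, you'll need to use: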

for url in my_list_of_urls:
    print url

Edit:

I just tried the pretty print module on a list of unicode strings, and I don't think it actually does anything different with the 'u' in front of the strings. I'm leaving it in because it can do wonders with representing long data structures.
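
A rough sketch of how pprint lays out a list, using pprint.pformat so the result can be inspected as a string. The URLs are made up, and the width of 20 is artificially small to force wrapping (the default is 80):

```python
from pprint import pformat

# Hypothetical sample data.
urls = [u"http://example.com/a", u"http://example.com/b"]

# Short list at the default width of 80: everything fits on one line.
one_line = pformat(urls)

# Narrow width: pprint puts one element per line, still shown as a quoted repr.
wrapped = pformat(urls, width=20)

print(one_line)
print(wrapped)
```

Either way the entries keep their quotes (and the u prefix on Python 2), so the plain loop is still the way to get bare URLs.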
