Clean URL with BeautifulSoup
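My script:
import sys
from BeautifulSoup import BeautifulSoup

# Path to the HTML file, passed as the first command-line argument
url_list = sys.argv[1]
# Collect the href attribute of every <a> tag in the file
urls = [tag['href'] for tag in BeautifulSoup(open(url_list)).findAll('a')]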
Solution 1:
But this is completely basic Python. You're getting a list, and you want to output it one URL per line.
for url in urls:
    print(url)
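The same output can also be built as a single string. This is only a sketch: the sample values in `urls` are hypothetical stand-ins for the hrefs the script extracts.

```python
# Hypothetical sample data standing in for the hrefs extracted by the script.
urls = ["http://example.com/a", "http://example.com/b"]

# Join the list into one string with a newline between the URLs.
output = "\n".join(urls)
print(output)
```

Building the string first is handy if the result should go to a file rather than the console.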
Solution 2:
It pretty much is returning that. What you see is simply a list of URL strings. Under Python 2 they are unicode strings, which is why each one is displayed with a u prefix when the list itself is printed.
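A small sketch of the difference between printing the list and printing one item. The URLs are hypothetical, and the u prefix in the list output only shows up under Python 2; Python 3 displays the same list without it.

```python
# Hypothetical sample data; the u"" literals are accepted by Python 2 and 3.3+.
urls = [u"http://example.com/a", u"http://example.com/b"]

# Printing a list shows the repr of every element: brackets, quotes,
# and (on Python 2 only) the u prefix.
list_repr = repr(urls)
print(list_repr)

# Printing a single element shows just the text of the string.
first_item = str(urls[0])
print(first_item)
```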
If you simply want to print these urls nicely, Python has a module for pretty printing that can be used as follows:
from pprint import pprint
pprint(my_list_of_urls)
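However, pprint shows the repr of each string, so the quotes (and the u prefix) remain, and a short list still fits on a single line. To print one bare URL per line, loop over the list:
for url in my_list_of_urls:
    print(url)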
Edit:
I just tried pprint on a list of unicode strings, and it does not do anything different with the 'u' in front of the strings. I'm leaving the suggestion in because pprint can do wonders for displaying long data structures.
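To illustrate how pprint lays out a list, here is a minimal sketch using `pprint.pformat`, which returns the text that `pprint` would print. The URLs and the `width=40` setting are assumptions chosen to force wrapping.

```python
from pprint import pformat

# Hypothetical sample data.
urls = [
    "http://example.com/page/one",
    "http://example.com/page/two",
    "http://example.com/page/three",
]

# When the list does not fit within the width, pprint puts one element
# per line, but each element is still shown as a quoted repr.
wrapped = pformat(urls, width=40)
print(wrapped)

# A list that fits within the width stays on a single line.
short = pformat(urls[:1])
print(short)
```

The wrapped form is easier to scan, but it is still the list's repr, so a plain loop remains the way to get bare URLs.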