
BeautifulSoup Removing Tags

I'm trying to remove the style tags and their contents from the source, but it's not working: there are no errors, the tags simply don't get decomposed. This is what I have: source = BeautifulSoup(o

Solution 1:

The following code does what you want. Do not wrap it in a blanket try/except, since that only masks bugs instead of reporting them:

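from bs4 import BeautifulSoup

# Name the parser explicitly so the result does not depend on what is installed
source = BeautifulSoup(open("page.html"), "html.parser")
for hidden in source.body.find_all(style='display:none'):
    hidden.decompose()  # removes the tag and everything inside it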

Or, better still, use a regular expression to cast the net a little wider, since a plain string only matches a style attribute that is exactly 'display:none':

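import re
from bs4 import BeautifulSoup

source = BeautifulSoup(open("page.html"), "html.parser")
# Allow optional whitespace after the colon, e.g. 'display: none'
for hidden in source.body.find_all(style=re.compile(r'display:\s*none')):
    hidden.decompose()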
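
To see what the wider net actually catches, the pattern can be tried on its own against a few style values. This is only a sketch: the sample strings are made up for illustration, and it assumes BeautifulSoup applies a compiled pattern with search(), so a match anywhere in the attribute value counts.

```python
import re

# Same pattern as in the snippet above.
hidden_style = re.compile(r'display:\s*none')

# Hypothetical style attribute values, for illustration only.
samples = [
    "display:none",
    "display: none",
    "color:red; display:   none;",
    "display:block",
]

# Keep the values the pattern finds a match in (search, not full match).
matches = [s for s in samples if hidden_style.search(s)]
print(matches)
```

The exact-string filter style='display:none' would only accept the first of these values, which is why the regular expression is the safer choice for real pages.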

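Note that Tag.children only lists the direct children of a tag (here, the body tag), not all nested descendants. find_all() searches the whole subtree by default, which is why it is used above.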
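
The difference is easy to check on a small document. A minimal sketch, assuming the bs4 package is installed; the markup and variable names are invented for this example and are not taken from the original question.

```python
from bs4 import BeautifulSoup, Tag

# Hypothetical markup: one hidden tag directly under <body>, one nested deeper.
html = """
<html><body>
<p style="display:none">top-level hidden</p>
<div><span style="display:none">nested hidden</span><b>visible</b></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# .children yields only direct children (tags and text nodes alike),
# so the nested <span> never shows up here.
direct_tags = [child.name for child in soup.body.children if isinstance(child, Tag)]

# find_all() walks every descendant by default.
hidden = soup.body.find_all(style="display:none")
hidden_names = [tag.name for tag in hidden]

# Remove each hidden tag together with its contents.
for tag in hidden:
    tag.decompose()

remaining = soup.body.get_text(strip=True)
print(direct_tags, hidden_names, remaining)
```

A loop over body.children would have found the hidden p but missed the span inside the div, which is one way a removal loop can appear to do nothing without raising any error.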
