
BS4 Breaks HTML Trying To Repair It

BS4 corrects faulty HTML. Usually this is not a problem. I tried parsing, altering, and saving the HTML of this page: ulisses-regelwiki.de/index.php/sonderfertigkeiten.html In this

Solution 1:

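Try the simplified_scrapy library. Its SimplifiedDoc class parses the sample below, which contains unclosed <center> tags: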

from simplified_scrapy import SimplifiedDoc

# Faulty input: neither <center> tag is closed
html = '''
<!DOCTYPE html>
<center>
Some Test content
<!-- A comment -->
<center>
'''
doc = SimplifiedDoc(html)
# Print the document's HTML as held by SimplifiedDoc
print(doc.html)

Here are more examples: https://github.com/yiyedata/simplified-scrapy-demo/tree/master/doc_examples
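
If adding a dependency is not an option, a similar "don't repair anything" round trip can be sketched with the standard library's html.parser. This is not the approach from the answer above, only a minimal illustration: the names PassThrough, roundtrip, and edit_text are hypothetical, and the sketch makes simplifying assumptions (end tags are re-emitted in lowercase, entities are re-emitted with a trailing semicolon, and processing instructions are not handled).

```python
from html.parser import HTMLParser


class PassThrough(HTMLParser):
    """Hypothetical helper: re-emit each token without adding or closing tags."""

    def __init__(self, edit_text=None):
        # convert_charrefs=False keeps entities such as &amp; unexpanded
        super().__init__(convert_charrefs=False)
        self.out = []
        # Optional callable applied to text chunks (assumption for illustration)
        self.edit_text = edit_text or (lambda s: s)

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag as written in the source
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The parser lowercases tag names, so </CENTER> would become </center>
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(self.edit_text(data))

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")


def roundtrip(html, edit_text=None):
    """Parse html and serialize it again without repairing the markup."""
    parser = PassThrough(edit_text)
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.out)


sample = '''
<!DOCTYPE html>
<center>
Some Test content
<!-- A comment -->
<center>
'''

# The unclosed <center> tags are left as they are
print(roundtrip(sample))
# Text can be altered on the way through without touching the tags
print(roundtrip(sample, lambda s: s.replace("Test", "Demo")))
```

Because the tags are copied from the source text rather than rebuilt from a tree, no closing tags are invented. The trade-off is that there is no tree to navigate, so edits have to be expressed per token.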

