html5lib - Standards-compliant library for parsing and serializing HTML documents and fragments in Python
html5lib is a pure-Python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as implemented by all major web browsers. By default, a parsed document is returned as an xml.etree element instance. Whenever possible, html5lib chooses the accelerated ElementTree implementation (i.e. xml.etree.cElementTree on Python 2.x). Two other tree types are supported: xml.dom.minidom and lxml.etree.
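
As a rough illustration of the tree types described above, the sketch below parses the same fragment into the default xml.etree tree and into xml.dom.minidom. The sample markup and variable names are made up for this example, and it assumes html5lib is installed (for instance via pip install html5lib).

```python
import html5lib

# Hypothetical sample input: the tags are deliberately left unclosed to show
# that the parser repairs markup the way a browser would.
html = "<p>Hello <b>world"

# Default tree type: the result is an xml.etree element for the <html> root.
# Element names are placed in the XHTML namespace by default.
doc = html5lib.parse(html)
print(doc.tag)  # {http://www.w3.org/1999/xhtml}html

# Passing namespaceHTMLElements=False gives plain tag names, which makes
# ElementTree path lookups shorter.
plain = html5lib.parse(html, namespaceHTMLElements=False)
bold = plain.find(".//b")
print(bold.text)

# Alternative tree type: xml.dom.minidom, selected with treebuilder="dom".
# (treebuilder="lxml" selects lxml.etree, but requires lxml to be installed.)
dom_doc = html5lib.parse(html, treebuilder="dom")
print(dom_doc.getElementsByTagName("b")[0].firstChild.data)
```

The treebuilder argument is the only thing that changes between tree types here; the parsing rules applied to the input are the same in each case.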
https://github.com/html5lib/html5lib-python
License:
Tech:
Tags: