htmlcleaner
Description:
HTML parser written in Java
Type: Formula  |  Tracked Since: Dec 28, 2025
Links: Homepage  |  formulae.brew.sh
Category: Developer tools
Tags: html xml parser java web-scraping
Install: brew install htmlcleaner
About:
HtmlCleaner is a Java library that parses malformed HTML and converts it into well-formed XML. Its API exposes the parsed document as a tree that can be traversed and modified, and its configuration options control tag balancing and attribute handling. It is particularly useful for web scraping and for preparing HTML for XSLT processing.
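To illustrate why the conversion to well-formed XML matters, here is a minimal sketch that uses only the JDK's built-in XML parser, not HtmlCleaner itself. The class name WellFormedCheck and the sample markup are hypothetical. It shows that typical browser-tolerated HTML is rejected by a strict XML parser, while the balanced equivalent is accepted.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not part of HtmlCleaner.
class WellFormedCheck {

    // Returns true if the markup parses as well-formed XML.
    static boolean isWellFormed(String markup) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without logging to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(markup)));
            return true;
        } catch (SAXException e) {
            // Unclosed or mismatched tags end up here.
            return false;
        } catch (IOException | ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // <br> and <b> are never closed: fine for browsers, invalid as XML.
        System.out.println(isWellFormed("<p>Hello<br><b>world</p>"));      // prints false
        // The balanced form a cleaner is expected to produce.
        System.out.println(isWellFormed("<p>Hello<br/><b>world</b></p>")); // prints true
    }
}
```

The second string is only an assumed example of balanced output; the exact markup HtmlCleaner emits depends on its configuration.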
Key Features:
  • Parses malformed HTML into valid XML
  • Configurable tag balancing and cleaning rules
  • Java API for DOM traversal and manipulation
  • Outputs clean, indented HTML or XML
Use Cases:
  • Cleaning up messy HTML before XML/XSLT processing
  • Web scraping where source HTML is not well-formed
  • Sanitizing user-submitted HTML content
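As a rough sketch of the XSLT use case, the example below applies a stylesheet to markup that is already well-formed, using only the JDK's javax.xml.transform package. The input string stands in for cleaned output (an assumption, since HtmlCleaner is not invoked here), and the class name, stylesheet, and sample links are hypothetical.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper, not part of HtmlCleaner.
class XsltOnCleanedMarkup {

    // Hypothetical stylesheet: writes the href of every <a> element, one per line.
    static final String STYLESHEET =
        "<xsl:stylesheet version='1.0'"
      + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'>"
      + "<xsl:output method='text'/>"
      + "<xsl:template match='/'>"
      + "<xsl:for-each select='//a'>"
      + "<xsl:value-of select='@href'/><xsl:text>&#10;</xsl:text>"
      + "</xsl:for-each>"
      + "</xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the stylesheet over well-formed markup and returns the text result.
    static String extractLinks(String wellFormedXml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(
                new StreamSource(new StringReader(wellFormedXml)),
                new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Malformed input or a broken stylesheet ends up here.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Stand-in for cleaned output; no namespace is declared on purpose.
        String cleaned =
            "<html><body><a href=\"/a\">A</a><p><a href=\"/b\">B</a></p></body></html>";
        System.out.print(extractLinks(cleaned)); // prints "/a" and "/b" on separate lines
    }
}
```

If the cleaned document declares the XHTML namespace, the unprefixed `//a` selector would no longer match, so the stylesheet would need a namespace prefix bound to that URI.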
Alternatives:
  • jsoup – A more modern parser that implements the HTML5 parsing rules, whereas HtmlCleaner is more often used for strict XML conversion.
  • HTML Tidy – A C library with similar cleaning capabilities, whereas HtmlCleaner is native to the Java ecosystem.
Version History
Detected  |  Version  |  Rev  |  Change  |  Commit
Sep 27, 2024 7:05pm  |    |  1  |  VERSION_BUMP  |  d5e3720f