trafilatura
Description: Discovery, extraction and processing for Web text
Type: Formula
Latest Version: 2.0.0@4
Tracked Since: Oct 11, 2025
Links: Homepage | formulae.brew.sh
Category: Developer tools
Tags: web-scraping, data-extraction, nlp, python, text-processing
Install: brew install trafilatura
About: Trafilatura is a Python package and command-line tool for extracting the main content and metadata from web pages. It focuses on reliability and efficiency, using heuristics and structural analysis to find the text while filtering out boilerplate, ads, and navigation elements. It provides structured output formats such as JSON, XML, and CSV, which makes it well suited to large-scale web data collection and NLP pipelines.
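As a rough illustration of the extraction step described above, the sketch below passes an HTML string to the package's `extract` function and asks for JSON output. It assumes the `trafilatura` package is importable from your Python environment (for example after a pip install; the Homebrew formula alone may only expose the command-line tool). The sample page and the helper name `extract_record` are hypothetical, and very short pages like this one may yield no result at all, so the helper also handles `None`.

```python
import json
from typing import Optional

try:
    import trafilatura  # third-party package, assumed to be installed (e.g. via pip)
except ImportError:
    trafilatura = None  # lets this sketch load even where the package is absent

# Hypothetical sample page: one article block surrounded by navigation and footer boilerplate.
SAMPLE_HTML = """
<html>
  <head><title>Example article</title></head>
  <body>
    <nav><a href="/">Home</a> <a href="/about">About</a></nav>
    <article>
      <h1>Example article</h1>
      <p>Main-content extraction keeps the body text of a page and drops
      menus, advertisements, and other repeated boilerplate. This paragraph
      stands in for the body text of a real article.</p>
      <p>A second paragraph gives the extractor a little more text to work
      with, since very short documents may be discarded.</p>
    </article>
    <footer>Copyright notice and site links</footer>
  </body>
</html>
"""


def extract_record(html: str) -> Optional[dict]:
    """Return the extracted content and metadata as a dict.

    Returns None when nothing could be extracted or when trafilatura
    is not installed.
    """
    if trafilatura is None:
        return None
    # output_format="json" returns a JSON string; with_metadata adds fields such as the title.
    raw = trafilatura.extract(html, output_format="json", with_metadata=True)
    return json.loads(raw) if raw else None


if __name__ == "__main__":
    record = extract_record(SAMPLE_HTML)
    if record is None:
        print("nothing extracted (or trafilatura is not installed)")
    else:
        print(record.get("title"))
        print(record.get("text"))
```

For real pages, the HTML would normally come from a download step rather than an inline string; the exact fields present in the JSON record depend on what the extractor finds in the page.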
Key Features:
Use Cases:
Alternatives:
Version History: