tika

Description:
Content analysis toolkit

Type: Formula | Tracked Since: Dec 28, 2025

Links: Homepage | @ApacheTika | formulae.brew.sh

Category: Developer tools

Tags: parsing metadata-extraction document-processing search-indexing content-analysis

Install: brew install tika

About:
Apache Tika is a content analysis toolkit that detects and extracts metadata and structured text content from over a thousand different file types. It exposes a single parsing interface for all of them, so developers can index content for search engines and analyze documents without integrating a separate library for each format. That makes it a common building block for document ingestion and data extraction pipelines.

Key Features:

Unified parsing interface for over 1,000 file formats
Automatic content type detection and language identification
Java-based server (Tika Server) with a RESTful API
Integration with Apache Lucene and Solr for search indexing
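
The REST API listed above can be exercised with nothing but the Python standard library. The following is a minimal sketch, not a reference client: it assumes a Tika Server instance is already running on localhost at port 9998 (the server's default), and the helper names (build_request, extract_text, extract_metadata, detect_type) are made up for illustration. It uses the /tika, /meta and /detect/stream endpoints, each of which accepts the raw document bytes in a PUT request.

```python
import json
import urllib.request

# Assumption: Tika Server is listening locally on its default port.
TIKA_URL = "http://localhost:9998"


def build_request(endpoint: str, payload: bytes, accept: str) -> urllib.request.Request:
    """Build a PUT request that uploads raw document bytes to a Tika Server endpoint."""
    return urllib.request.Request(
        url=f"{TIKA_URL}{endpoint}",
        data=payload,
        method="PUT",
        headers={"Accept": accept},
    )


def extract_text(payload: bytes) -> str:
    """Return the plain-text content Tika extracts from the document."""
    req = build_request("/tika", payload, "text/plain")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")


def extract_metadata(payload: bytes) -> dict:
    """Return the document metadata as a dictionary parsed from JSON."""
    req = build_request("/meta", payload, "application/json")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def detect_type(payload: bytes) -> str:
    """Return the MIME type Tika detects for the document bytes."""
    req = build_request("/detect/stream", payload, "text/plain")
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8").strip()
```

With a server running, reading a file as bytes (for example with open(path, "rb").read()) and passing the result to extract_text or detect_type should return the extracted text or the detected MIME type. The request is built separately from the network call so that the method, URL and headers can be inspected without a live server.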

Use Cases:

Building search indexes for enterprise document management systems
Extracting text and metadata for data analysis and compliance auditing
Automating document processing workflows in content management systems

Alternatives:

Apache NiFi – A dataflow automation tool that can use Tika as a processing step; Tika itself covers only the parsing stage.
Pandoc – Converts documents between markup formats, whereas Tika extracts raw text content and metadata rather than producing a converted document.

Version History

Detected	Rev	Change	Commit
Sep 16, 2025 4:47am	0	VERSION_BUMP	18cddaae
Sep 13, 2024 7:29pm	1	VERSION_BUMP	96c62085
Oct 21, 2023 1:40pm	0	VERSION_BUMP	6876b200
Feb 4, 2023 10:06am	0	VERSION_BUMP	e1040bd4
Feb 4, 2023 10:06am	0	VERSION_BUMP	b8320983