kytea

Description:
Toolkit for analyzing text, especially Japanese and Chinese

Type: Formula | Tracked Since: Dec 28, 2025

Links: Homepage | formulae.brew.sh

Category: AI/ML

Tags: nlp japanese chinese text-processing machine-learning tokenization

Install: brew install kytea

About:
KyTea (Kyoto Text Analysis Toolkit) is a statistical machine learning toolkit for text segmentation and analysis, aimed particularly at languages written without spaces between words, such as Japanese and Chinese. It provides word segmentation, part-of-speech tagging, and pronunciation (reading) estimation, and its models can be retrained on the user's own annotated data. Its robustness and configurability have made it a common component of natural language processing pipelines.
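The core idea of segmenting unspaced text can be sketched as a per-gap classifier. KyTea's models are based on pointwise prediction, where each position between two characters is classified on its own from nearby character features. The Python below is a toy illustration of that idea, not KyTea's code or API: the feature set, the `char_type` ranges, and the `TOY_WEIGHTS` values are hypothetical and hand-picked, whereas a trained model would learn its weights from annotated text.

```python
from typing import Dict, List


def char_type(ch: str) -> str:
    """Coarse character class (hypothetical feature set for this sketch)."""
    if "\u3040" <= ch <= "\u309f":
        return "H"  # hiragana
    if "\u30a0" <= ch <= "\u30ff":
        return "K"  # katakana
    if "\u4e00" <= ch <= "\u9fff":
        return "C"  # CJK unified ideograph (kanji / hanzi)
    return "O"  # anything else


def gap_features(text: str, i: int) -> List[str]:
    """Features describing the gap between text[i - 1] and text[i]."""
    left, right = text[i - 1], text[i]
    return [
        f"L={left}",                               # character to the left
        f"R={right}",                              # character to the right
        f"LR={left}{right}",                       # character bigram across the gap
        f"T={char_type(left)}{char_type(right)}",  # character-type transition
    ]


def segment(text: str, weights: Dict[str, float]) -> List[str]:
    """Insert a word boundary wherever a gap's linear score is positive."""
    if not text:
        return []
    words, start = [], 0
    for i in range(1, len(text)):
        # Each gap is scored independently of every other gap.
        score = sum(weights.get(f, 0.0) for f in gap_features(text, i))
        if score > 0:
            words.append(text[start:i])
            start = i
    words.append(text[start:])
    return words


# Hypothetical hand-set weights, for illustration only (not a KyTea model).
TOY_WEIGHTS = {"T=CC": -1.0, "T=CH": 0.5, "T=HC": 1.0, "R=く": -1.0}

print(segment("東京に行く", TOY_WEIGHTS))  # ['東京', 'に', '行く']
```

Because every gap is scored independently, a model of this shape can in principle be trained from sentences where only some boundaries are annotated.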

Key Features:

High-accuracy word segmentation for Japanese and Chinese
Part-of-speech tagging and pronunciation (reading) estimation
Training of custom models, optionally with user-supplied dictionaries
Efficient C++ implementation, usable as a library, with third-party Python bindings
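For the tagging features above, one low-dependency way to use KyTea from Python is to call the command-line tool and parse what it prints. This sketch assumes that the `kytea` binary installed by `brew install kytea` is on `PATH` with a model available to it, that it reads text on standard input, and that its default output writes each word as `surface/tag1/tag2` separated by spaces. The sample line is illustrative, and escaped `/` or space characters inside words are not handled.

```python
import subprocess
from typing import List, Tuple


def run_kytea(text: str) -> str:
    """Pipe text through the kytea CLI and return its raw output.
    Assumes `kytea` is on PATH and can find a model."""
    result = subprocess.run(
        ["kytea"], input=text, capture_output=True, text=True, check=True
    )
    return result.stdout


def parse_kytea_line(line: str) -> List[Tuple[str, List[str]]]:
    """Parse one output line assumed to look like 'word/tag1/tag2 word/tag1/tag2'.
    Assumes surfaces contain no '/' or whitespace (escaping is not handled)."""
    tokens = []
    for item in line.split():
        surface, *tags = item.split("/")
        tokens.append((surface, tags))
    return tokens


# Illustrative line in the assumed format: surface / part of speech / reading.
sample = "これ/代名詞/これ は/助詞/は テスト/名詞/てすと"
for surface, tags in parse_kytea_line(sample):
    print(surface, tags)
```

Keeping the subprocess call and the parser separate lets the parser be tested without KyTea installed.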

Use Cases:

Preprocessing Japanese or Chinese text for NLP applications
Building custom tokenizers for machine learning models
Extracting vocabulary and morphological patterns from corpora
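For the tokenizer and vocabulary use cases above, segmented output can be turned into a word-to-id vocabulary. This is a minimal sketch that assumes sentences have already been split into word lists (by KyTea or any other segmenter). The reserved `<pad>`/`<unk>` ids, the `min_count` cutoff, and the tiny corpus are illustrative assumptions, not something KyTea provides.

```python
from collections import Counter
from typing import Dict, Iterable, List

PAD = "<pad>"  # hypothetical padding token
UNK = "<unk>"  # hypothetical unknown-word token


def build_vocab(sentences: Iterable[List[str]], min_count: int = 1) -> Dict[str, int]:
    """Map each word seen at least min_count times to an integer id.
    Ids 0 and 1 are reserved for padding and unknown words."""
    counts = Counter(word for sent in sentences for word in sent)
    vocab = {PAD: 0, UNK: 1}
    # Sort by descending frequency, then by string, so ids are deterministic.
    for word, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        if n >= min_count:
            vocab[word] = len(vocab)
    return vocab


def encode(words: List[str], vocab: Dict[str, int]) -> List[int]:
    """Convert a segmented sentence to ids, mapping unseen words to <unk>."""
    return [vocab.get(w, vocab[UNK]) for w in words]


corpus = [["東京", "に", "行く"], ["大阪", "に", "行く"]]
vocab = build_vocab(corpus, min_count=2)
print(vocab)                              # {'<pad>': 0, '<unk>': 1, 'に': 2, '行く': 3}
print(encode(["京都", "に", "行く"], vocab))  # [1, 2, 3]
```

Sorting by frequency and then by string keeps ids stable across runs, which matters when a model's embedding table is indexed by them.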

Alternatives:

MeCab – MeCab is more established and uses CRF-based models with separately installed dictionaries (such as IPADIC), while KyTea uses pointwise classifiers that can be trained on partially annotated data
Jieba – Jieba is primarily a Python library and is simpler to use for Chinese, whereas KyTea is language-independent and supports training models on custom annotated data

Version History

Detected	Version	Rev	Change	Commit
Sep 13, 2025 7:51am		0	VERSION_BUMP	604b65df
Sep 14, 2024 12:58pm		0	VERSION_BUMP	c2e6ba84