kytea
Description:
Toolkit for analyzing text, especially Japanese and Chinese
Type: Formula  |  Tracked Since: Dec 28, 2025
Links: Homepage  |  formulae.brew.sh
Category: AI/ML
Tags: nlp japanese chinese text-processing machine-learning tokenization
Install: brew install kytea
About:
KyTea (the Kyoto Text Analysis Toolkit) is a statistical machine learning toolkit for text segmentation and analysis, aimed at languages written without whitespace between words, such as Japanese and Chinese. It provides word segmentation, part-of-speech tagging, and dictionary extraction. Its models can be retrained on custom data, which makes it a configurable component in natural language processing pipelines.
Key Features:
  • High-accuracy word segmentation for CJK languages
  • Part-of-speech tagging and morphological analysis
  • Custom dictionary training and extraction
  • Efficient C++ implementation; Python bindings are available separately
Use Cases:
  • Preprocessing Japanese or Chinese text for NLP applications
  • Building custom tokenizers for machine learning models
  • Extracting vocabulary and morphological patterns from corpora
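As an illustration of the preprocessing use case, the sketch below parses analyzer output into tokens for downstream processing. It assumes KyTea's default output layout, in which words are separated by spaces and each word carries its tags after `/` separators (for example `word/POS/pronunciation`). The `Token` type, the `parse_kytea_line` helper, and the sample line are hypothetical names and data made up for this example, not part of KyTea itself, and escaped separators inside words are not handled.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    surface: str      # the segmented word
    tags: List[str]   # tags attached to the word, e.g. [POS, pronunciation]


def parse_kytea_line(line: str, word_sep: str = " ", tag_sep: str = "/") -> List[Token]:
    """Split one line of 'word/tag/tag word/tag/tag' output into tokens.

    Assumption: words are separated by word_sep and tags by tag_sep.
    Escaped separators inside a word are not handled in this sketch.
    """
    tokens = []
    for chunk in line.strip().split(word_sep):
        if not chunk:
            continue  # skip empty pieces from blank lines or repeated separators
        surface, *tags = chunk.split(tag_sep)
        tokens.append(Token(surface, tags))
    return tokens


if __name__ == "__main__":
    # Illustrative sample line, not real analyzer output.
    sample = "これ/代名詞/これ は/助詞/は テスト/名詞/てすと"
    for tok in parse_kytea_line(sample):
        print(tok.surface, tok.tags)
```

If the analyzer is run with different word or tag separators, the `word_sep` and `tag_sep` arguments would need to be adjusted to match.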
Alternatives:
  • MeCab – KyTea uses pointwise statistical classifiers, while MeCab is more established and relies on a separately installed dictionary; both perform part-of-speech tagging
  • Jieba – Jieba is a Python library focused on Chinese and is easier to get started with, whereas KyTea is language-agnostic and supports training custom models
Version History
Detected Version Rev Change Commit
Sep 13, 2025 7:51am 0 VERSION_BUMP 604b65df
Sep 14, 2024 12:58pm 0 VERSION_BUMP c2e6ba84