cmuclmtk
Description:
Language model tools (from CMU Sphinx)
Type: Formula  |  Latest Version: 0.7@0  |  Tracked Since: Dec 17, 2025
Links: Homepage  |  formulae.brew.sh
Category: AI/ML
Tags: nlp speech-recognition language-models cmu-sphinx text-processing
Install: brew install cmuclmtk
About:
CMUCLMTK (the CMU-Cambridge Statistical Language Modeling Toolkit) is a suite of command-line utilities for building and evaluating statistical n-gram language models from text corpora. Tools such as text2wfreq, wfreq2vocab, text2idngram, and idngram2lm count word frequencies, derive a vocabulary, and compile ARPA-format models, while evallm reports perplexity on held-out text. It is commonly used to create custom language models for CMU Sphinx speech recognition and other NLP applications.
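To make the first two stages of that pipeline concrete, here is a minimal Python sketch of what a word-frequency pass followed by vocabulary selection does conceptually. It is not the toolkit's implementation: the function names (word_frequencies, build_vocab), the min_count and top_n parameters, and the toy corpus are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def word_frequencies(lines):
    """Count whitespace-separated tokens across all lines of a corpus."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def build_vocab(counts, top_n=None, min_count=1):
    """Select a vocabulary from word counts.

    Keeps words seen at least `min_count` times, optionally limited to
    the `top_n` most frequent, and returns them sorted alphabetically.
    (Both parameters are illustrative, not the toolkit's options.)
    """
    kept = [(word, c) for word, c in counts.items() if c >= min_count]
    # Most frequent first; ties broken alphabetically for determinism.
    kept.sort(key=lambda wc: (-wc[1], wc[0]))
    if top_n is not None:
        kept = kept[:top_n]
    return sorted(word for word, _ in kept)


# Toy corpus with explicit sentence markers (an assumption for this sketch).
corpus = [
    "<s> open the door </s>",
    "<s> close the door </s>",
    "<s> open the window </s>",
]

counts = word_frequencies(corpus)
vocab = build_vocab(counts, min_count=2)
print(counts.most_common(3))
print(vocab)
```

Pruning by a minimum count drops the rare words (here "close" and "window"), which is the usual reason to derive the vocabulary from frequencies before counting n-grams.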
Key Features:
  • Text corpus processing and frequency analysis
  • Vocabulary generation and pruning
  • ARPA format language model compilation
  • N-gram model evaluation and perplexity calculation
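The perplexity figure mentioned above can be illustrated with a small, self-contained Python sketch. It trains a bigram model with add-one smoothing and computes perplexity as the exponential of the average negative log-probability per predicted token. Add-one smoothing is swapped in here purely for simplicity and is an assumption of this sketch, not a description of the toolkit's own discounting; the function names and toy sentences are likewise hypothetical.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect history counts, bigram counts, and the predicted-word vocabulary."""
    history, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        history.update(tokens[:-1])              # every token that precedes another
        bigrams.update(zip(tokens, tokens[1:]))  # (previous, next) pairs
    return history, bigrams, vocab


def perplexity(sentences, history, bigrams, vocab):
    """Perplexity under an add-one smoothed bigram model (illustrative only)."""
    v = len(vocab)
    log_prob, n = 0.0, 0
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            p = (bigrams[(prev, nxt)] + 1) / (history[prev] + v)
            log_prob += math.log(p)
            n += 1
    return math.exp(-log_prob / n)


train = ["open the door", "close the door", "open the window"]
history, bigrams, vocab = train_bigram(train)

pp_seen = perplexity(["open the door"], history, bigrams, vocab)
pp_unseen = perplexity(["door the open"], history, bigrams, vocab)
print(f"seen word order:   {pp_seen:.3f}")
print(f"unseen word order: {pp_unseen:.3f}")
```

A sentence whose word order matches the training data scores a lower perplexity than the same words in an unseen order, which is the property that makes perplexity useful for comparing models on held-out text.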
Use Cases:
  • Building custom language models for CMU Sphinx speech recognition systems
  • Creating domain-specific language models for NLP applications
  • Analyzing text corpus statistics and vocabulary distributions
Alternatives:
  • KenLM – faster and more memory-efficient for large-scale models, whereas CMUCLMTK offers a simpler toolset integrated with the Sphinx ecosystem
  • SRILM – offers more advanced smoothing techniques, but its license restricts commercial use, unlike the BSD-licensed CMUCLMTK
License: BSD-2-Clause
Bottles available for: arm64_tahoe, arm64_sequoia, arm64_sonoma, arm64_ventura, arm64_monterey, arm64_big_sur, sonoma, ventura, monterey, big_sur, catalina, arm64_linux, x86_64_linux
Version History
Detected Version Rev Change Commit
Oct 15, 2025 5:02pm 0 VERSION_BUMP 20700161
Nov 17, 2024 8:37pm 0 VERSION_BUMP d39265d8