cmuclmtk

Description:
Language model tools (from CMU Sphinx)

Type: Formula | Latest Version: 0.7@0 | Tracked Since: Dec 17, 2025

Links: Homepage | formulae.brew.sh

Category: AI/ML

Tags: nlp speech-recognition language-models cmu-sphinx text-processing

Install: brew install cmuclmtk

About:
CMUCLMTK is a suite of command-line utilities for building and evaluating statistical n-gram language models from text corpora. It provides tools such as text2wfreq, wfreq2vocab, text2idngram, and idngram2lm to count word frequencies, derive a vocabulary, and compile ARPA-format models. The toolkit is useful for creating custom language models for speech recognition and NLP applications.
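To make the first two steps concrete, here is a minimal Python sketch of what a word-frequency pass and a top-N vocabulary cut do conceptually. It is an illustration only, not the toolkit's implementation: the function names `word_frequencies` and `top_vocabulary`, the whitespace tokenization, and the tie-breaking rule are assumptions made for this example.

```python
from collections import Counter


def word_frequencies(lines):
    """Count whitespace-separated tokens across all lines (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def top_vocabulary(counts, top_n):
    """Keep the top_n most frequent words and return them alphabetically sorted.

    Ties in frequency are broken alphabetically; that rule is an assumption
    chosen here only to make the result deterministic.
    """
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return sorted(word for word, _ in ranked[:top_n])


# Tiny in-memory corpus standing in for a real text file.
corpus = ["the cat sat on the mat", "the dog sat on the log"]
freqs = word_frequencies(corpus)
vocab = top_vocabulary(freqs, top_n=3)

print(freqs.most_common(1))
print(vocab)
```

In the real toolkit these steps are separate programs connected through files or pipes, which keeps memory use low on large corpora; the sketch keeps everything in memory for brevity.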

Key Features:

Text corpus processing and frequency analysis
Vocabulary generation and pruning
ARPA format language model compilation
N-gram model evaluation and perplexity calculation
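
The perplexity calculation mentioned above can be sketched in Python as follows. This is a simplified illustration under stated assumptions: it uses a bigram model with add-one smoothing, which is chosen here for brevity and does not reproduce the toolkit's own smoothing or back-off behavior. The function names `train_bigram` and `perplexity` are hypothetical, and every word in the evaluation text is assumed to be in the training vocabulary.

```python
import math
from collections import Counter


def train_bigram(sentences):
    """Collect context and bigram counts with sentence boundary markers."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>"}
    for sentence in sentences:
        words = sentence.split()
        vocab.update(words)
        tokens = ["<s>"] + words + ["</s>"]
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent token pairs
    return contexts, bigrams, vocab


def perplexity(sentences, contexts, bigrams, vocab):
    """Perplexity = 2 ** (-average log2 probability per predicted token)."""
    log_prob, n_tokens = 0.0, 0
    v = len(vocab)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        for prev, word in zip(tokens, tokens[1:]):
            # Add-one smoothing (an assumption of this sketch).
            p = (bigrams[(prev, word)] + 1) / (contexts[prev] + v)
            log_prob += math.log2(p)
            n_tokens += 1
    return 2 ** (-log_prob / n_tokens)


contexts, bigrams, vocab = train_bigram(["a b", "a c"])
pp_seen = perplexity(["a b"], contexts, bigrams, vocab)    # word order seen in training
pp_unseen = perplexity(["c b"], contexts, bigrams, vocab)  # word order never seen

print(pp_seen < pp_unseen)
```

Lower perplexity means the model finds the text less surprising, so a sentence that matches the training data scores lower than one with unseen word sequences.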

Use Cases:

Building custom language models for CMU Sphinx speech recognition systems
Creating domain-specific language models for NLP applications
Analyzing text corpus statistics and vocabulary distributions

Alternatives:

KenLM – faster and more memory-efficient for large-scale models; CMUCLMTK provides a simpler toolset integrated with the Sphinx ecosystem
SRILM – offers more advanced smoothing techniques, but its license is not free for commercial use, unlike the BSD-licensed CMUCLMTK

License: BSD-2-Clause

Bottles available for: arm64_tahoe, arm64_sequoia, arm64_sonoma, arm64_ventura, arm64_monterey, arm64_big_sur, sonoma, ventura, monterey, big_sur, catalina, arm64_linux, x86_64_linux

Version History

Detected	Change	Commit
Oct 15, 2025 5:02pm	VERSION_BUMP	20700161
Nov 17, 2024 8:37pm	VERSION_BUMP	d39265d8
Sep 11, 2024 1:32pm	VERSION_BUMP	7db97629
Mar 18, 2023 3:36am	VERSION_BUMP	d34c41c2