scws
Description:
Simple Chinese Word Segmentation
Type: Formula  |  Latest Version: 1.2.3@0  |  Tracked Since: Dec 24, 2025
Links: Homepage  |  GitHub  |  Docs  |  formulae.brew.sh
Stars: 1,680  |  Forks: 353  |  Language: PHP  |  Category: AI/ML
Tags: chinese nlp text-processing segmentation c-library
Install: brew install scws
About:
SCWS (Simple Chinese Word Segmentation) is a C library and command-line tool for segmenting Chinese text into words. Written Chinese does not separate words with spaces, so SCWS combines a dictionary with a set of rules to split continuous runs of characters into meaningful words, a fundamental preprocessing step in Chinese natural language processing. It is designed to be lightweight, efficient, and portable.
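To make the dictionary-driven idea concrete, here is a minimal Python sketch of greedy forward maximum matching, a common textbook technique for dictionary-based segmentation. It is not SCWS's actual algorithm or API; the `segment` function, the `max_len` parameter, and the tiny word set are hypothetical and exist only for illustration.

```python
def segment(text, dictionary, max_len=4):
    """Split text into words by greedy forward maximum matching.

    Illustrative only: this is NOT how SCWS is implemented.
    `dictionary` is a set of known words; `max_len` is the longest
    candidate (in characters) tried at each position.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a word matches.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted as a fallback.
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary.
base_dictionary = {"中文", "分词", "搜索", "引擎"}

print(segment("中文分词", base_dictionary))
print(segment("搜索引擎", base_dictionary))
# Adding a longer entry changes how the same text is split.
print(segment("搜索引擎", base_dictionary | {"搜索引擎"}))
```

The last call shows why the contents of the dictionary matter: once the longer word is present, the greedy matcher prefers it over the two shorter ones.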
Key Features:
  • Rule-based and dictionary-driven segmentation algorithm
  • Supports custom user dictionaries and rule sets
  • Provides a C library, command-line tool, and language bindings (e.g., PHP)
  • Lightweight, fast, and portable design
  • Handles Chinese character set encodings like GBK and UTF-8
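The encoding point matters because the same Chinese text has different byte representations in GBK and UTF-8, so a segmenter and its dictionary need to agree on the charset. The short Python sketch below uses only the standard library to show the difference; the `transcode` helper is a hypothetical name and is not part of SCWS.

```python
def transcode(data, source_encoding, target_encoding):
    """Re-encode bytes from one charset to another (hypothetical helper)."""
    return data.decode(source_encoding).encode(target_encoding)


text = "中文分词"
gbk_bytes = text.encode("gbk")      # 2 bytes per character for these characters
utf8_bytes = text.encode("utf-8")   # 3 bytes per character for these characters

print(len(text), len(gbk_bytes), len(utf8_bytes))

# Converting GBK input to UTF-8 yields the same bytes as encoding directly.
converted = transcode(gbk_bytes, "gbk", "utf-8")
print(converted == utf8_bytes)
```

Decoding GBK bytes as UTF-8 (or the reverse) would fail or produce garbage, which is why the input charset has to be known before segmentation.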
Use Cases:
  • Preprocessing Chinese text for search engine indexing and information retrieval
  • Tokenizing text as a foundational step for NLP tasks like text analysis and machine learning
  • Integrating Chinese word segmentation into web applications (e.g., via its PHP extension)
Alternatives:
  • jieba – A more modern, actively maintained Python library for Chinese text segmentation, offering additional features like part-of-speech tagging and keyword extraction.
  • Stanford Word Segmenter – A Java-based, statistical NLP tool offering high accuracy but with greater complexity and resource requirements compared to the lightweight SCWS.
Version History
Detected  |  Version  |  Rev  |  Change  |  Commit
Dec 24, 2025 9:39am  |  1.2.3  |  0  |  VERSION_BUMP  |  3c2764bd
Sep 15, 2025 10:06pm  |  -  |  0  |  VERSION_BUMP  |  fc5422a5
Sep 14, 2024 5:22pm  |  -  |  0  |  VERSION_BUMP  |  69f37640