pdfsandwich
Description:
Generate sandwich OCR PDFs from scanned files
Type: Formula  |  Tracked Since: Dec 28, 2025
Links: Homepage  |  formulae.brew.sh
Category: Productivity
Tags: ocr pdf scanning document-processing tesseract
Install: brew install pdfsandwich
About:
pdfsandwich is a command-line tool that adds an OCR text layer to scanned PDF documents. It is a wrapper script around the Tesseract OCR engine and helper tools such as unpaper, ImageMagick, and Ghostscript. The recognized text is placed invisibly behind each page image (the "sandwich"), so the file becomes searchable and its text selectable. This makes it useful for digitizing archives and improving the accessibility of document collections.
Key Features:
  • Uses Tesseract as its OCR engine, with unpaper, ImageMagick, and Ghostscript for pre- and post-processing
  • Multi-language OCR capability
  • Automatic blank page detection and removal
  • Output compression to reduce file size
Use Cases:
  • Creating searchable text archives from scanned documents and books
  • Adding OCR layers to image-only PDFs for accessibility and text extraction
  • Processing large batches of scanned documents for digital libraries
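Batch processing can be scripted around the command-line tool. The following is a minimal sketch, not an official interface: the helper names build_command and ocr_directory are hypothetical, and it assumes pdfsandwich is installed on PATH and accepts the -lang and -o options described in its man page.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(src: Path, dest: Path, lang: str = "eng") -> list[str]:
    """Return the argument list for one pdfsandwich run (nothing is executed)."""
    # Assumption: -lang selects the Tesseract language, -o sets the output file.
    return ["pdfsandwich", "-lang", lang, "-o", str(dest), str(src)]


def ocr_directory(in_dir: Path, out_dir: Path, lang: str = "eng") -> list[Path]:
    """Run pdfsandwich on every *.pdf in in_dir and return the output paths."""
    if shutil.which("pdfsandwich") is None:
        raise RuntimeError("pdfsandwich not found on PATH")
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for src in sorted(in_dir.glob("*.pdf")):
        dest = out_dir / f"{src.stem}_ocr.pdf"
        # check=True raises CalledProcessError if pdfsandwich exits non-zero.
        subprocess.run(build_command(src, dest, lang), check=True)
        written.append(dest)
    return written
```

Calling ocr_directory(Path("scans"), Path("searchable"), lang="deu") would process each PDF in a hypothetical scans folder one after another. Keeping the command construction in its own function makes it easy to inspect or log the exact invocation before running it.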
Alternatives:
  • OCRmyPDF – An actively maintained Python-based tool that also adds a Tesseract OCR text layer to scanned PDFs.
  • Adobe Acrobat Pro – A commercial GUI application with built-in OCR that requires a paid license.
Version History
Detected Version Rev Change Commit
Sep 16, 2025 12:33pm 4 VERSION_BUMP 1acd7a8f
Sep 14, 2024 6:47am 4 VERSION_BUMP ac0df270