text-embeddings-inference
Description:
Blazing fast inference solution for text embeddings models
Type: Formula  |  Latest Version: 1.8.3@0  |  Tracked Since: Oct 30, 2025
Links: Homepage  |  @huggingface  |  formulae.brew.sh
Category: AI/ML
Tags: ai machine-learning nlp inference embeddings huggingface
Install: brew install text-embeddings-inference
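Once installed, the formula provides a server binary that loads a model from the Hugging Face Hub and exposes an HTTP API. A minimal quick-start sketch follows; the binary name, model ID, and port are taken from TEI's upstream documentation, but verify them against your installed version.

```shell
# Start a TEI server (model ID and port are example values).
# --model-id accepts a Hugging Face Hub repo such as BAAI/bge-base-en-v1.5.
text-embeddings-router --model-id BAAI/bge-base-en-v1.5 --port 8080

# In another terminal, request an embedding (assumes the server above is running):
curl 127.0.0.1:8080/embed \
    -X POST \
    -d '{"inputs": "What is deep learning?"}' \
    -H 'Content-Type: application/json'
```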
About:
Text Embeddings Inference (TEI) is a toolkit for deploying and serving text embedding (dense feature extraction) models. It leverages optimized Rust and CUDA kernels to maximize throughput and minimize latency for popular transformer architectures such as BERT, and includes features like token-based dynamic batching.
Key Features:
  • Optimized Rust/CUDA kernels for high performance
  • Support for popular open-source models (BERT, BGE, etc.)
  • Token-based dynamic batching
  • Easy-to-use API for integration
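The API mentioned above is a plain HTTP interface: the server accepts a JSON body with an `inputs` field and returns one embedding vector per input. A minimal client sketch, using only the Python standard library; the endpoint URL and port are assumptions for a default local deployment.

```python
import json
from urllib import request

# Assumed default: TEI's router listening locally on port 8080.
TEI_URL = "http://localhost:8080/embed"

def build_embed_request(texts):
    """Build the JSON payload for the /embed endpoint: {"inputs": [...]}."""
    return {"inputs": texts}

def embed(texts, url=TEI_URL):
    """POST a batch of texts and return their embedding vectors."""
    payload = json.dumps(build_embed_request(texts)).encode("utf-8")
    req = request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        # The response is a JSON array of float vectors, one per input text.
        return json.loads(resp.read())
```

Because the server batches requests dynamically, sending texts in one call (rather than one request per text) lets TEI pack them into efficient token-based batches.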
Use Cases:
  • Production deployment of embedding models for semantic search
  • Building RAG (Retrieval-Augmented Generation) pipelines
  • Real-time feature extraction for NLP applications
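For the semantic-search and RAG use cases above, the embeddings returned by the server are typically compared by cosine similarity to rank documents against a query. A self-contained sketch of that ranking step, using illustrative vectors in place of real TEI output:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length float vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    """Return document indices sorted by similarity to the query, best first."""
    scores = [(cosine(query_vec, v), i) for i, v in enumerate(doc_vecs)]
    return [i for _, i in sorted(scores, reverse=True)]

# Toy 2-D vectors standing in for embeddings fetched from TEI.
docs = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
order = rank([1.0, 0.0], docs)  # documents 0 and 2 point the same way as the query
```

In a real pipeline the vectors would come from the `/embed` endpoint, and at scale the ranking would be delegated to a vector index rather than a linear scan.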
Alternatives:
  • FastAPI + Transformers – TEI offers significantly higher throughput and lower latency out of the box compared to a standard Python implementation.
  • vLLM – vLLM is optimized for LLM text generation, whereas TEI is specifically optimized for feature extraction and embedding tasks.
Version History
Detected              Version Rev  Change        Commit
Oct 30, 2025 11:29am  0            VERSION_BUMP  e57b3df9
Sep 13, 2025 5:30am   0            VERSION_BUMP  e3368c12