AI Wisdom

Embedding Models

Text and multimodal embedding models for semantic search and retrieval.

4 Graduated · 4 Incubating · 2 Sandbox · 10 total

text-embedding-3-large

Graduated
5/5

OpenAI's most capable embedding model with 3072 dimensions

Industry standard for production embeddings. Matryoshka support lets you trade dimensions for cost. Best MTEB scores among API models. Drop-in for any RAG pipeline.

Proprietary
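The Matryoshka trade-off mentioned above is exposed in the API as a `dimensions` request parameter; under the hood it amounts to keeping a prefix of the vector and re-normalizing. A minimal numpy sketch of that operation, using a random stand-in for a real 3072-dim embedding:

```python
import numpy as np

def truncate_matryoshka(embedding: np.ndarray, dims: int) -> np.ndarray:
    """Keep the first `dims` components and re-normalize to unit length.

    Matryoshka-trained models pack the most information into the leading
    dimensions, so a truncated prefix remains a usable embedding.
    """
    truncated = embedding[:dims]
    return truncated / np.linalg.norm(truncated)

# Stand-in for a 3072-dim text-embedding-3-large vector.
full = np.random.default_rng(0).normal(size=3072)
full /= np.linalg.norm(full)

small = truncate_matryoshka(full, 256)  # 12x cheaper to store and compare
print(small.shape)  # -> (256,)
```

Truncating to 256 dimensions cuts storage and similarity-search cost roughly 12x while retaining most retrieval quality, which is the cost lever the blurb refers to.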

Cohere Embed v3

Graduated
4/5

Multilingual embedding model optimised for search and RAG

Best multilingual embeddings for search. 100+ languages with compression options. Search and classification input types improve relevance. Strong for global enterprise RAG.

Proprietary

Voyage AI 3

Graduated
4/5

Specialised embeddings for code, legal, finance, and multilingual

Domain-specific variants (code, law, finance) consistently outperform general models. Best code embedding for codebase search. Acquired by Anthropic; expect deep Claude integration.

Proprietary

BGE-M3

Incubating
4/5

BAAI multi-granularity multilingual embedding with dense + sparse

Unique hybrid model supporting dense, sparse, and ColBERT retrieval in one model. 100+ languages, 8K context. Best open-source choice for multilingual RAG systems.

Open Source
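BGE-M3's selling point is that one model emits both a dense vector and per-token sparse weights, which a retriever then fuses into a single relevance score. A toy sketch of one common fusion scheme (a weighted sum; the product-of-weights lexical score here is an illustrative choice, not BGE-M3's exact formula):

```python
import numpy as np

def hybrid_score(dense_q, dense_d, sparse_q, sparse_d, alpha=0.7):
    """Fuse a dense cosine score with a sparse lexical-overlap score,
    as a hybrid retriever built on dense + sparse outputs might.
    `alpha` weights the dense signal; vectors are assumed unit-normalized.
    """
    dense = float(np.dot(dense_q, dense_d))  # cosine for unit vectors
    lexical = sum(w * sparse_d[t] for t, w in sparse_q.items() if t in sparse_d)
    return alpha * dense + (1 - alpha) * lexical

# Toy unit vectors and token-weight dicts standing in for model output.
q, d = np.array([1.0, 0.0]), np.array([0.6, 0.8])
score = hybrid_score(q, d, {"rag": 0.9, "multilingual": 0.4}, {"rag": 0.8})
print(round(score, 3))  # -> 0.636
```

Tuning `alpha` per corpus is typical: lexical overlap helps with rare terms and exact identifiers, dense similarity with paraphrase.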

E5-Mistral-7B

Incubating
4/5

Large LLM-based embedding model for maximum retrieval quality

LLM-scale embedding model: 7B parameters deliver top MTEB scores. Task-specific prompting improves quality. Needs a GPU but excels where embedding quality is critical.

Open Source
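The "task-specific prompting" above refers to the instruction-prefixed query format from the e5-mistral-7b-instruct model card: queries carry an `Instruct: ... \nQuery: ...` prefix while documents are embedded bare. A small helper sketching that convention (the task string is an example, not a fixed value):

```python
def format_e5_query(task: str, query: str) -> str:
    """Build the instruction-prefixed query string used by instruction-tuned
    E5 models; documents are embedded without any prefix."""
    return f"Instruct: {task}\nQuery: {query}"

prompt = format_e5_query(
    "Given a web search query, retrieve relevant passages that answer the query",
    "how do matryoshka embeddings work",
)
print(prompt)
```

Mismatched prompting (e.g. embedding queries without the prefix) measurably degrades retrieval quality with these models, so the formatting step belongs in the pipeline, not ad hoc in calling code.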

Jina Embeddings v3

Incubating
4/5

8K context multilingual embedding with task-specific LoRAs

Long context embeddings up to 8192 tokens โ€” ideal for document-level retrieval. Task-specific LoRA adapters for retrieval, classification, and similarity. Good API and open weights.

Open Source

Nomic Embed v1.5

Incubating
3/5

Fully open-source embedding model with Matryoshka support

Truly open: fully auditable training data and code. Competitive MTEB scores at 137M params. Runs on CPU. Best for teams requiring full transparency and reproducibility.

Open Source
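Nomic Embed expects a task prefix on every input (per its model card): `search_document` for corpus text, `search_query` for queries, plus `classification` and `clustering`. A small guard that applies the convention:

```python
def with_nomic_prefix(text: str, task: str = "search_document") -> str:
    """Prepend the task prefix Nomic Embed expects; embedding unprefixed
    text degrades quality because the model was trained with prefixes."""
    allowed = {"search_document", "search_query", "classification", "clustering"}
    if task not in allowed:
        raise ValueError(f"unknown task: {task}")
    return f"{task}: {text}"

doc = with_nomic_prefix("Matryoshka embeddings nest coarse vectors inside fine ones.")
query = with_nomic_prefix("what are matryoshka embeddings", task="search_query")
print(query)  # -> search_query: what are matryoshka embeddings
```

Asymmetric prefixes (document vs. query) serve the same purpose as Cohere's input types: the model embeds the two sides of a search task differently.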

all-MiniLM-L6-v2

Graduated
3/5

Classic lightweight embedding model โ€” fast and CPU-friendly

The embedding model that started the vector search revolution. 22M params, runs anywhere. Quality surpassed by newer models but still the default for quick prototypes and edge deployment.

Open Source
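The retrieval core that models like all-MiniLM-L6-v2 plug into is just cosine similarity plus a top-k sort. A numpy sketch using random 384-dim unit vectors as stand-ins for MiniLM output (384 is that model's embedding size; real usage would encode text via the sentence-transformers library):

```python
import numpy as np

def top_k(query_vec: np.ndarray, doc_matrix: np.ndarray, k: int = 3) -> np.ndarray:
    """Rank documents by cosine similarity (query and rows assumed
    unit-normalized) and return indices of the k best matches."""
    scores = doc_matrix @ query_vec  # cosine == dot product for unit vectors
    return np.argsort(scores)[::-1][:k]

# Stand-ins for all-MiniLM-L6-v2 output: 100 docs of 384-dim unit vectors.
rng = np.random.default_rng(42)
docs = rng.normal(size=(100, 384))
docs /= np.linalg.norm(docs, axis=1, keepdims=True)

query = docs[7] + 0.05 * rng.normal(size=384)  # near-duplicate of doc 7
query /= np.linalg.norm(query)

print(top_k(query, docs)[0])  # -> 7
```

At 22M parameters this whole loop (encode + dot products) runs comfortably on CPU, which is why MiniLM remains the prototyping default.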

GTE-Qwen2

Sandbox
3/5

Alibaba's embedding model with strong CJK language support

Excellent for Chinese, Japanese, Korean retrieval scenarios. Multiple size variants from 1.5B to 7B. Good balance of multilingual quality and inference cost.

Open Source

Mixedbread Embed

Sandbox
3/5

Emerging high-quality embedding model from Berlin-based lab

Strong newcomer with competitive MTEB scores. Binary quantization support for efficient storage. Good API with self-host options. Watch this space; rapidly improving.
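Binary quantization, the storage trick mentioned above, keeps only the sign of each embedding dimension and compares vectors by Hamming distance. A generic sketch of the technique (not Mixedbread's API):

```python
import numpy as np

def binarize(emb: np.ndarray) -> np.ndarray:
    """Binary-quantize an embedding: keep only the sign of each dimension,
    packed 8 per byte (32x smaller than float32)."""
    return np.packbits(emb > 0)

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Hamming distance between packed binary embeddings -- the cheap
    dissimilarity proxy that binary quantization enables."""
    return int(np.unpackbits(a ^ b).sum())

rng = np.random.default_rng(1)
e1 = rng.normal(size=1024)
e2 = e1 + 0.1 * rng.normal(size=1024)  # slight perturbation of e1
e3 = rng.normal(size=1024)             # unrelated vector

b1, b2, b3 = binarize(e1), binarize(e2), binarize(e3)
print(len(b1))  # 1024 dims pack into 128 bytes
print(hamming(b1, b2) < hamming(b1, b3))  # near-duplicate stays closer
```

A common production pattern is to retrieve a candidate set with fast Hamming search over binary codes, then rescore the candidates with the original float vectors.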