[Chroma DB] Change embeddings(using huggingface model)
by Edward Park
Intro
- Chroma DB’s default embedding model is
all-MiniLM-L6-v2
. But in languages other than English, better models exist. - Chroma DB supports huggingface models and usage is very simple.
Code
My Development Env
- Python: 3.9.13
- Chroma: 0.4.18
Use Default Embedding Model
- It may take a long time to download the model when running it for the first time.
import chromadb
from chromadb.db.base import UniqueConstraintError
client = chromadb.PersistentClient(path="db/") # data stored in 'db' folder
try:
collection = client.create_collection(name='article')
except UniqueConstraintError: # already exist collection
collection = client.get_collection(name='article')
Use Custom Embedding Model
- In this example, used Huffon/sentence-klue-roberta-base model in huggingface.
import chromadb
from chromadb.db.base import UniqueConstraintError
from chromadb.utils import embedding_functions
client = chromadb.PersistentClient(path="db/") # data stored in 'db' folder
em = embedding_functions.SentenceTransformerEmbeddingFunction(model_name="Huffon/sentence-klue-roberta-base")
try:
collection = client.create_collection(name='article', embedding_function=em)
except UniqueConstraintError: # already exist collection
collection = client.get_collection(name='article', embedding_function=em)
Test: Save/Query
# save
collection.add(
documents = ['NAVER Corporation Earnings Surprise', 'Samgsung Corporation Earnings Surprise'],
ids = ['naver1', 'samsung1']
)
# query
results = collection.query(
query_texts='SamSam',
n_results=1
)
print(results)