alvarobartt (HF Staff) committed
Commit acb50ea · verified · 1 Parent(s): 4149d6b

Add `text-embeddings-inference` snippet in `README.md`


This PR adds a sample snippet to the `README.md` showing how to deploy Text Embeddings Inference (TEI) via Docker and run inference against it.

Thanks in advance 🤗

Files changed (1):
  1. README.md (+47, -0)
README.md CHANGED
@@ -93,6 +93,53 @@ embed_document2 = outputs[1].outputs.data
 
 
 </details>
 
+<details>
+<summary>via <a href="https://github.com/huggingface/text-embeddings-inference">Text Embeddings Inference</a></summary>
+
+- Via Docker on CPU:
+```bash
+docker run -p 8080:80 \
+    ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 \
+    --model-id jinaai/jina-embeddings-v5-text-small-classification \
+    --dtype float32 --pooling last-token
+```
+- Via Docker on an NVIDIA GPU (Turing, Ampere, Ada Lovelace, Hopper, or Blackwell):
+```bash
+docker run --gpus all --shm-size 1g -p 8080:80 \
+    ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 \
+    --model-id jinaai/jina-embeddings-v5-text-small-classification \
+    --dtype float16 --pooling last-token
+```
+
+> Alternatively, you can run Text Embeddings Inference with `cargo`; more information can be found in the [Text Embeddings Inference documentation](https://hf.co/docs/text-embeddings-inference).
+
+Send a request to `/v1/embeddings` to generate embeddings via the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create):
+
+```bash
+curl -X POST http://127.0.0.1:8080/v1/embeddings \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "jinaai/jina-embeddings-v5-text-small-classification",
+        "input": [
+            "Query: Overview of climate change impacts on coastal cities",
+            "Document: The impacts of climate change on coastal cities are significant..."
+        ]
+    }'
+```
+
+Alternatively, send a request to `/embed` following the [Text Embeddings Inference API specification](https://huggingface.github.io/text-embeddings-inference/), which applies the prompt formatting for you:
+
+```bash
+curl -X POST http://127.0.0.1:8080/embed \
+    -H "Content-Type: application/json" \
+    -d '{
+        "inputs": "Overview of climate change impacts on coastal cities",
+        "prompt_name": "query"
+    }'
+```
+
+</details>
+
 <details>
 <summary> via <a href="https://github.com/ggml-org/llama.cpp">llama.cpp (GGUF)</a></summary>
 After installing <a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a> one can run llama-server to host the embedding model as OpenAI API compatible HTTP server with the respective model version:
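For context on the change above, the same `/v1/embeddings` request the snippet issues with `curl` can also be made from Python. The sketch below uses only the standard library and assumes a TEI server is already listening on `127.0.0.1:8080` (as started by the Docker commands in the snippet); the helper names (`build_payload`, `parse_embeddings`, `embed`) are illustrative, not part of TEI.

```python
import json
import urllib.request

# Endpoint and model from the Docker commands in the snippet above.
TEI_URL = "http://127.0.0.1:8080/v1/embeddings"
MODEL_ID = "jinaai/jina-embeddings-v5-text-small-classification"

def build_payload(texts, model=MODEL_ID):
    """Build an OpenAI Embeddings API request body for a batch of texts."""
    return {"model": model, "input": list(texts)}

def parse_embeddings(body):
    """Extract one embedding vector per input from an OpenAI-style response,
    which has the shape {"data": [{"embedding": [...]}, ...]}."""
    return [item["embedding"] for item in body["data"]]

def embed(texts, url=TEI_URL):
    """POST the texts to the TEI server and return their embeddings."""
    request = urllib.request.Request(
        url,
        data=json.dumps(build_payload(texts)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return parse_embeddings(json.load(response))
```

Because the endpoint follows the OpenAI schema, the official `openai` Python client pointed at `http://127.0.0.1:8080/v1` would work as well.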