Add `text-embeddings-inference` snippet in `README.md`
This PR adds a sample snippet on how to deploy and run inference with Text Embeddings Inference (TEI) via Docker in the `README.md`.
Thanks in advance 🤗
README.md
CHANGED

@@ -93,6 +93,53 @@ embed_document2 = outputs[1].outputs.data

</details>

+<details>
+<summary>via <a href="https://github.com/huggingface/text-embeddings-inference">Text Embeddings Inference</a></summary>
+
+- Via Docker on CPU:
+```bash
+docker run -p 8080:80 \
+    ghcr.io/huggingface/text-embeddings-inference:cpu-1.9 \
+    --model-id jinaai/jina-embeddings-v5-text-small-classification \
+    --dtype float32 --pooling last-token
+```
+- Via Docker on NVIDIA GPU (Turing, Ampere, Ada Lovelace, Hopper or Blackwell):
+```bash
+docker run --gpus all --shm-size 1g -p 8080:80 \
+    ghcr.io/huggingface/text-embeddings-inference:cuda-1.9 \
+    --model-id jinaai/jina-embeddings-v5-text-small-classification \
+    --dtype float16 --pooling last-token
+```
+
+> Alternatively, you can also run TEI with `cargo`; more information can be found in the [Text Embeddings Inference documentation](https://hf.co/docs/text-embeddings-inference).
+
+Send a request to `/v1/embeddings` to generate embeddings via the [OpenAI Embeddings API](https://platform.openai.com/docs/api-reference/embeddings/create):
+
+```bash
+curl -X POST http://127.0.0.1:8080/v1/embeddings \
+    -H "Content-Type: application/json" \
+    -d '{
+        "model": "jinaai/jina-embeddings-v5-text-small-classification",
+        "input": [
+            "Query: Overview of climate change impacts on coastal cities",
+            "Document: The impacts of climate change on coastal cities are significant..."
+        ]
+    }'
+```
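As a sanity check of the request and response shapes, the same call can be sketched from Python's standard library. Only the endpoint, model id, and payload shape come from the curl example above; the helper names (`build_payload`, `extract_embeddings`) are illustrative, not part of TEI:

```python
import json
import urllib.request

# Local TEI server started by the Docker snippet above.
TEI_URL = "http://127.0.0.1:8080/v1/embeddings"

def build_payload(query, documents):
    # Same JSON body as the curl example: the query and each document are
    # prefixed with the instruction strings the model expects.
    return {
        "model": "jinaai/jina-embeddings-v5-text-small-classification",
        "input": [f"Query: {query}"] + [f"Document: {d}" for d in documents],
    }

def extract_embeddings(response):
    # OpenAI-style embeddings responses carry one vector per input under
    # "data"; sort by "index" to restore input order.
    return [item["embedding"]
            for item in sorted(response["data"], key=lambda it: it["index"])]

payload = build_payload(
    "Overview of climate change impacts on coastal cities",
    ["The impacts of climate change on coastal cities are significant..."],
)

# Sending the request requires the container above to be running:
# req = urllib.request.Request(TEI_URL, data=json.dumps(payload).encode(),
#                              headers={"Content-Type": "application/json"})
# embeddings = extract_embeddings(json.load(urllib.request.urlopen(req)))
```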
+
+Or via the [Text Embeddings Inference API](https://huggingface.github.io/text-embeddings-inference/) instead, to avoid manually formatting the inputs:
+
+```bash
+curl -X POST http://127.0.0.1:8080/embed \
+    -H "Content-Type: application/json" \
+    -d '{
+        "inputs": "Overview of climate change impacts on coastal cities",
+        "prompt_name": "query"
+    }'
+```
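Whichever endpoint is used, the returned vectors can be compared directly. A minimal sketch of ranking documents against a query by cosine similarity (the helper names are illustrative, not part of TEI):

```python
import math

def cosine_similarity(a, b):
    # Standard cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def rank(query_vec, doc_vecs):
    # Return document indices sorted from most to least similar to the query.
    scores = [cosine_similarity(query_vec, d) for d in doc_vecs]
    return sorted(range(len(doc_vecs)), key=lambda i: scores[i], reverse=True)
```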
+
+</details>
+
<details>
<summary>via <a href="https://github.com/ggml-org/llama.cpp">llama.cpp (GGUF)</a></summary>
After installing <a href="https://github.com/ggml-org/llama.cpp">llama.cpp</a>, one can run llama-server to host the embedding model as an OpenAI API-compatible HTTP server with the respective model version: