oneozerova reach-vb committed on
Commit
1acfdc3
·
0 Parent(s):

Duplicate from facebook/seamless-streaming

Co-authored-by: Vaibhav Srivastav <reach-vb@users.noreply.huggingface.co>

.gitattributes ADDED
@@ -0,0 +1,35 @@
+ *.7z filter=lfs diff=lfs merge=lfs -text
+ *.arrow filter=lfs diff=lfs merge=lfs -text
+ *.bin filter=lfs diff=lfs merge=lfs -text
+ *.bz2 filter=lfs diff=lfs merge=lfs -text
+ *.ckpt filter=lfs diff=lfs merge=lfs -text
+ *.ftz filter=lfs diff=lfs merge=lfs -text
+ *.gz filter=lfs diff=lfs merge=lfs -text
+ *.h5 filter=lfs diff=lfs merge=lfs -text
+ *.joblib filter=lfs diff=lfs merge=lfs -text
+ *.lfs.* filter=lfs diff=lfs merge=lfs -text
+ *.mlmodel filter=lfs diff=lfs merge=lfs -text
+ *.model filter=lfs diff=lfs merge=lfs -text
+ *.msgpack filter=lfs diff=lfs merge=lfs -text
+ *.npy filter=lfs diff=lfs merge=lfs -text
+ *.npz filter=lfs diff=lfs merge=lfs -text
+ *.onnx filter=lfs diff=lfs merge=lfs -text
+ *.ot filter=lfs diff=lfs merge=lfs -text
+ *.parquet filter=lfs diff=lfs merge=lfs -text
+ *.pb filter=lfs diff=lfs merge=lfs -text
+ *.pickle filter=lfs diff=lfs merge=lfs -text
+ *.pkl filter=lfs diff=lfs merge=lfs -text
+ *.pt filter=lfs diff=lfs merge=lfs -text
+ *.pth filter=lfs diff=lfs merge=lfs -text
+ *.rar filter=lfs diff=lfs merge=lfs -text
+ *.safetensors filter=lfs diff=lfs merge=lfs -text
+ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
+ *.tar.* filter=lfs diff=lfs merge=lfs -text
+ *.tar filter=lfs diff=lfs merge=lfs -text
+ *.tflite filter=lfs diff=lfs merge=lfs -text
+ *.tgz filter=lfs diff=lfs merge=lfs -text
+ *.wasm filter=lfs diff=lfs merge=lfs -text
+ *.xz filter=lfs diff=lfs merge=lfs -text
+ *.zip filter=lfs diff=lfs merge=lfs -text
+ *.zst filter=lfs diff=lfs merge=lfs -text
+ *tfevents* filter=lfs diff=lfs merge=lfs -text
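
As a rough illustration of what these attribute rules do, the sketch below checks whether a path would be routed through Git LFS. It is only an approximation: it uses Python's `fnmatch` rather than Git's own pattern matcher, it covers a subset of the patterns, and the helper name `is_lfs_tracked` is hypothetical. `git check-attr filter -- <path>` remains the authoritative check.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# A subset of the patterns from the .gitattributes file in this commit.
LFS_PATTERNS = [
    "*.bin",
    "*.model",
    "*.pt",
    "*.safetensors",
    "*.tar.*",
    "*tfevents*",
    "saved_model/**/*",
]


def is_lfs_tracked(path: str, patterns=LFS_PATTERNS) -> bool:
    """Approximate whether `path` matches one of the LFS patterns.

    Assumption: patterns without a slash are compared against the file
    name only, patterns with a slash against the whole relative path.
    fnmatch is not identical to Git's matcher, so treat this as a sketch.
    """
    name = PurePosixPath(path).name
    for pattern in patterns:
        target = path if "/" in pattern else name
        if fnmatchcase(target, pattern):
            return True
    return False


if __name__ == "__main__":
    for candidate in ("seamless_streaming_unity.pt", "tokenizer.model", "README.md"):
        print(candidate, is_lfs_tracked(candidate))
```

Under these assumptions the `.pt` and `.model` files added in this commit match a pattern, which is consistent with them appearing as three-line LFS pointers in the diff, while `README.md` does not.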
README.md ADDED
@@ -0,0 +1,211 @@
+ ---
+ license: cc-by-nc-4.0
+ inference: False
+ tags:
+ - audio-to-audio
+ - text-to-speech
+ library_name: seamless_communication
+ language:
+ - af
+ - am
+ - ar
+ - as
+ - az
+ - be
+ - bn
+ - bs
+ - bg
+ - ca
+ - cs
+ - zh
+ - cy
+ - da
+ - de
+ - el
+ - en
+ - et
+ - fi
+ - fr
+ - or
+ - om
+ - ga
+ - gl
+ - gu
+ - ha
+ - he
+ - hi
+ - hr
+ - hu
+ - hy
+ - ig
+ - id
+ - is
+ - it
+ - jv
+ - ja
+ - kn
+ - ka
+ - kk
+ - mn
+ - km
+ - ky
+ - ko
+ - lo
+ - ln
+ - lt
+ - lb
+ - lg
+ - lv
+ - ml
+ - mr
+ - mk
+ - mt
+ - mi
+ - my
+ - nl
+ - nb
+ - ne
+ - ny
+ - oc
+ - pa
+ - ps
+ - fa
+ - pl
+ - pt
+ - ro
+ - ru
+ - sk
+ - sl
+ - sn
+ - sd
+ - so
+ - es
+ - sr
+ - sv
+ - sw
+ - ta
+ - te
+ - tg
+ - tl
+ - th
+ - tr
+ - uk
+ - ur
+ - uz
+ - vi
+ - wo
+ - xh
+ - yo
+ - ms
+ - zu
+ - ary
+ - arz
+ - yue
+ - kea
+ ---
+
+ # SeamlessStreaming
+ SeamlessStreaming is a multilingual streaming translation model. It supports:
+
+ - Streaming Automatic Speech Recognition for 96 languages.
+ - Simultaneous translation from 101 source languages for speech input.
+ - Simultaneous translation into 96 target languages for text output.
+ - Simultaneous translation into 36 target languages for speech output.
+
+ ![SeamlessStreaming architecture](streaming_arch.png)
+
+ ## SeamlessStreaming models
+ | Model Name | #params | checkpoint | metrics |
+ | ------------------ | ------- | ---------- | ------- |
+ | SeamlessStreaming | 2.5B | [🤗 Model card](https://huggingface.co/facebook/seamless-streaming) - [monotonic decoder checkpoint](https://huggingface.co/facebook/seamless-streaming/resolve/main/seamless_streaming_monotonic_decoder.pt) - [streaming UnitY2 checkpoint](https://huggingface.co/facebook/seamless-streaming/resolve/main/seamless_streaming_unity.pt) | [metrics](https://dl.fbaipublicfiles.com/seamless/metrics/streaming/seamless_streaming.zip) |
+
+ The evaluation data IDs for FLEURS, CoVoST2 and CVSS-C can be found [here](https://dl.fbaipublicfiles.com/seamless/metrics/evaluation_data_ids.zip).
+
+
+ ## Evaluating SeamlessStreaming models
+ To reproduce our results, or to evaluate using the same metrics on your own test sets, please check out the [Evaluation README here](../../src/seamless_communication/cli/streaming/README.md). Streaming evaluation depends on the SimulEval library.
+
+ ## Seamless Streaming demo
+
+ ### Running on HF Spaces
+ You can simply duplicate the Space to run it. [🤗 HF Space](https://huggingface.co/spaces/facebook/seamless-streaming)
+
+ ## Running locally
+
+ ### Install backend seamless_server dependencies
+
+ > [!NOTE]
+ > We *do not* recommend running the model on CPU. CPU inference will be slow and will introduce noticeable delays in the simultaneous translation.
+
+ > [!NOTE]
+ > The example below is for PyTorch stable (2.1.1) and variant cu118.
+ > Check [here](https://pytorch.org/get-started/locally/) to find the torch/torchaudio command for your variant.
+ > Check [here](https://github.com/facebookresearch/fairseq2#variants) to find the fairseq2 command for your variant.
+
+ If running for the first time, create a conda environment and install the desired torch version. Then install the rest of the requirements:
+ ```
+ cd seamless_server
+ conda create --yes --name smlss_server python=3.8 libsndfile==1.0.31
+ conda activate smlss_server
+ conda install --yes pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
+ pip install fairseq2 --pre --extra-index-url https://fair.pkg.atmeta.com/fairseq2/whl/nightly/pt2.1.1/cu118
+ pip install -r requirements.txt
+ ```
+
+ ### Install frontend streaming-react-app dependencies
+ ```
+ conda install -c conda-forge nodejs
+ cd streaming-react-app
+ npm install --global yarn
+ yarn
+ yarn build # this will create the dist/ folder
+ ```
+
+
+ ### Running the server
+
+ The server can be run locally with uvicorn, as shown below.
+ Run the server in dev mode:
+
+ ```
+ cd seamless_server
+ uvicorn app_pubsub:app --reload --host localhost
+ ```
+
+ Run the server in prod mode:
+
+ ```
+ cd seamless_server
+ uvicorn app_pubsub:app --host 0.0.0.0
+ ```
+
+ To enable additional logging from uvicorn, pass `--log-level debug` or `--log-level trace`.
+
+
+ ### Debugging
+
+ Enabling the "Server Debug Flag" when starting streaming from the client turns on extensive debug logging and saves audio files in the /debug folder.
+
+ ## Citation
+
+ For EMMA, please cite:
+ ```bibtex
+ @article{ma_efficient_2023,
+ author={Ma, Xutai and Sun, Anna and Ouyang, Siqi and Inaguma, Hirofumi and Tomasello, Paden},
+ title={Efficient Monotonic Multihead Attention},
+ year={2023},
+ url={https://ai.meta.com/research/publications/efficient-monotonic-multihead-attention/},
+ }
+ ```
+
+ For SeamlessStreaming, please cite:
+ ```bibtex
+ @inproceedings{seamless2023,
+ title="Seamless: Multilingual Expressive and Streaming Speech Translation",
+ author="{Seamless Communication}, Lo{\"i}c Barrault, Yu-An Chung, Mariano Coria Meglioli, David Dale, Ning Dong, Mark Duppenthaler, Paul-Ambroise Duquenne, Brian Ellis, Hady Elsahar, Justin Haaheim, John Hoffman, Min-Jae Hwang, Hirofumi Inaguma, Christopher Klaiber, Ilia Kulikov, Pengwei Li, Daniel Licht, Jean Maillard, Ruslan Mavlyutov, Alice Rakotoarison, Kaushik Ram Sadagopan, Abinesh Ramakrishnan, Tuan Tran, Guillaume Wenzek, Yilin Yang, Ethan Ye, Ivan Evtimov, Pierre Fernandez, Cynthia Gao, Prangthip Hansanti, Elahe Kalbassi, Amanda Kallet, Artyom Kozhevnikov, Gabriel Mejia, Robin San Roman, Christophe Touret, Corinne Wong, Carleigh Wood, Bokai Yu, Pierre Andrews, Can Balioglu, Peng-Jen Chen, Marta R. Costa-juss{\`a}, Maha Elbayad, Hongyu Gong, Francisco Guzm{\'a}n, Kevin Heffernan, Somya Jain, Justine Kao, Ann Lee, Xutai Ma, Alex Mourachko, Benjamin Peloquin, Juan Pino, Sravya Popuri, Christophe Ropers, Safiyyah Saleem, Holger Schwenk, Anna Sun, Paden Tomasello, Changhan Wang, Jeff Wang, Skyler Wang, Mary Williamson",
+ journal={ArXiv},
+ year={2023}
+ }
+ ```
+
+ [//]: # "https://arxiv.org/abs/2312.05187"
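
The two checkpoint links in the README's model table share the form `https://huggingface.co/<repo>/resolve/<revision>/<filename>`. A minimal sketch of building such a link is shown below; the helper name `resolve_url` and its parameters are hypothetical, and the URL shape is inferred only from the links in the README.

```python
from urllib.parse import quote


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL in the shape used by the README links.

    Assumption: the pattern is taken from the two checkpoint links in the
    model table; it is not derived from any official specification.
    """
    return (
        f"https://huggingface.co/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


if __name__ == "__main__":
    print(resolve_url("facebook/seamless-streaming", "seamless_streaming_unity.pt"))
```

In practice the `hf_hub_download` function from the `huggingface_hub` package is usually a more convenient way to fetch and cache such files.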
seamless_streaming_monotonic_decoder.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:7797deda337f51f124450857f9aa851e4a869efeb9078aa7c57966f52af6335c
+ size 4273639264
seamless_streaming_unity.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ce7dfdb0af9d81c2a5d2accf508cc1e5fd2d120cdaf15d9c8606655e1dea89eb
+ size 3587576654
spm_char_lang38_tc.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9e7f2075dbc38dbe11d2414bfa4fb8e900022e87bbff4f74c97817e32a7ab493
+ size 368901
streaming_arch.png ADDED
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:026a76827537db9f1348e4d5aaa127bb10a2f2ff633243f3a52d16be82d73f9d
+ size 5165809
vocoder_v2.pt ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:20c50c3edd3fb08704c10542bf3ec72e8a96aaba4ec09fb6ac1fa64172c8ca13
+ size 167785015
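
Each of the three-line entries above is a Git LFS pointer: a spec version, a SHA-256 object ID, and the size in bytes of the real file. The sketch below shows one way a downloaded file could be checked against such a pointer. The function names `parse_lfs_pointer` and `matches_pointer` are hypothetical, and the parser assumes only the three `key value` lines seen in this diff.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the `key value` lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and hash."""
    path = Path(path)
    # Cheap check first: a size mismatch avoids hashing a multi-gigabyte file.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in chunks so large checkpoints are never held fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


if __name__ == "__main__":
    pointer = parse_lfs_pointer(
        "version https://git-lfs.github.com/spec/v1\n"
        "oid sha256:20c50c3edd3fb08704c10542bf3ec72e8a96aaba4ec09fb6ac1fa64172c8ca13\n"
        "size 167785015\n"
    )
    print(pointer["algo"], pointer["size"])
```

Calling `matches_pointer("vocoder_v2.pt", pointer)` on a locally downloaded copy would then report whether the download is complete and uncorrupted, assuming the file sits in the current directory.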