---
language:
- en
tags:
- sentence-transformers
- bible
- cross-translation
- semantic-similarity
- embeddings
license: mit
datasets:
- LoveJesus/biblical-embedding-dataset-chirho
pipeline_tag: sentence-similarity
model-index:
- name: biblical-cross-translation-chirho
  results:
  - task:
      type: sentence-similarity
      name: Cross-Translation Semantic Similarity
    dataset:
      type: LoveJesus/biblical-embedding-dataset-chirho
      name: Biblical Embedding Dataset (Chirho)
    metrics:
    - type: accuracy
      value: 0.9988
      name: Accuracy@0.5
    - type: roc_auc
      value: 1.0000
      name: ROC AUC
    - type: spearmanr
      value: 0.4915
      name: Spearman Correlation
---

# Cross-Translation Bible Embeddings

A sentence transformer fine-tuned to create a shared embedding space where semantically equivalent Bible verses across different translations map to nearby vectors.

## Usage

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("LoveJesus/biblical-cross-translation-chirho")

verses = [
    "[KJV] In the beginning God created the heaven and the earth.",
    "[BBE] At the first God made the heaven and the earth.",
    "[KJV] And the earth was without form, and void;",
]

embeddings = model.encode(verses)
similarities = cos_sim(embeddings, embeddings)
print(similarities)
# Gen 1:1 KJV vs Gen 1:1 BBE: ~0.95 (same verse, different translation)
# Gen 1:1 KJV vs Gen 1:2 KJV: ~0.30 (different verses)
```

## Training

- **Base model**: paraphrase-multilingual-MiniLM-L12-v2 (118M params, 384-dim)
- **Training**: Contrastive learning (CosineSimilarityLoss) on ~300K verse pairs
- **Translations**: KJV, ASV, YLT, BBE, WEB (all public domain)
- **Positive pairs**: Same verse in different translations
- **Negative pairs**: Different verses from the same translation

## Part of bible.systems

This is model 5 of 5 in the [bible.systems](https://bible.systems) ML pipeline.

## Evaluation Results

Evaluated on a held-out test set of cross-translation verse pairs.

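As a rough illustration of how the threshold accuracy, ROC AUC, and similarity gap reported in this section can be derived from pairwise cosine similarities, the sketch below uses plain Python on made-up scores. The function names, the toy data, and the `>=` comparison at the threshold are assumptions for illustration; this is not the evaluation script used for this model.

```python
from statistics import mean


def accuracy_at_threshold(scores, labels, threshold=0.5):
    """Fraction of pairs where (score >= threshold) agrees with the 0/1 label."""
    correct = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return correct / len(labels)


def roc_auc(scores, labels):
    """Probability that a random positive pair outscores a random negative pair.

    Ties count as half a win (the Mann-Whitney formulation of ROC AUC).
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def similarity_gap(scores, labels):
    """Mean positive similarity minus mean negative similarity."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    return mean(pos) - mean(neg)


# Hypothetical cosine similarities: 1 = same verse, 0 = different verses.
toy_scores = [0.98, 0.97, 0.99, 0.05, 0.02, 0.45]
toy_labels = [1, 1, 1, 0, 0, 0]

print("accuracy@0.5:", accuracy_at_threshold(toy_scores, toy_labels))
print("roc_auc:", roc_auc(toy_scores, toy_labels))
print("gap:", round(similarity_gap(toy_scores, toy_labels), 4))
```

In this toy data every positive pair scores above every negative pair, so the ranking is perfect even though the scores within each class barely differ. That is the same situation in which a rank correlation against binary labels stays moderate while ROC AUC reaches 1.0.
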
| Metric | Score |
|--------|-------|
| **Accuracy@0.5** (cosine sim threshold) | **0.9988** |
| **ROC AUC** | **1.0000** |
| **Spearman Correlation** | **0.4915** |
| **Avg Positive Similarity** | 0.9841 |
| **Avg Negative Similarity** | 0.0359 |
| **Similarity Gap** (pos - neg) | **0.9482** |

> The model achieves near-perfect discrimination between same-verse pairs across translations (high positive similarity) and different-verse pairs (low negative similarity), with a gap of 0.95. The Spearman correlation is moderate because within-class similarity variance is low (most positive pairs cluster near 0.98).

---

*For God so loved the world...* — John 3:16