---
language:
  - en
tags:
  - sentence-transformers
  - bible
  - cross-translation
  - semantic-similarity
  - embeddings
license: mit
datasets:
  - LoveJesus/biblical-embedding-dataset-chirho
pipeline_tag: sentence-similarity
model-index:
  - name: biblical-cross-translation-chirho
    results:
      - task:
          type: sentence-similarity
          name: Cross-Translation Semantic Similarity
        dataset:
          type: LoveJesus/biblical-embedding-dataset-chirho
          name: Biblical Embedding Dataset (Chirho)
        metrics:
          - type: accuracy
            value: 0.9988
            name: Accuracy@0.5
          - type: roc_auc
            value: 1.0000
            name: ROC AUC
          - type: spearmanr
            value: 0.4915
            name: Spearman Correlation
---

<!-- For God so loved the world that he gave his only begotten Son, -->
<!-- that whoever believes in him should not perish but have eternal life. - John 3:16 -->

# Cross-Translation Bible Embeddings

A sentence transformer fine-tuned to create a shared embedding space where semantically
equivalent Bible verses across different translations map to nearby vectors.

## Usage

```python
from sentence_transformers import SentenceTransformer
from sentence_transformers.util import cos_sim

model = SentenceTransformer("LoveJesus/biblical-cross-translation-chirho")

verses = [
    "[KJV] In the beginning God created the heaven and the earth.",
    "[BBE] At the first God made the heaven and the earth.",
    "[KJV] And the earth was without form, and void;",
]

embeddings = model.encode(verses)
similarities = cos_sim(embeddings, embeddings)
print(similarities)
# Gen 1:1 KJV vs Gen 1:1 BBE: ~0.95 (same verse, different translation)
# Gen 1:1 KJV vs Gen 1:2 KJV: ~0.30 (different verses)
```
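
The same embeddings can also drive a cross-translation lookup: embed a query verse, then pick the closest candidate from another translation. The sketch below is a minimal, dependency-free illustration. `cosine` and `best_match` are hypothetical helpers (not part of this model or of sentence-transformers), and the toy 3-dimensional vectors are made up to stand in for the 384-dimensional output of `model.encode`.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def best_match(query_vec, candidate_vecs):
    """Return (index, similarity) of the candidate closest to the query."""
    scores = [cosine(query_vec, c) for c in candidate_vecs]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]


# Toy vectors standing in for model.encode(...) output (illustrative only).
query = [0.9, 0.1, 0.0]        # e.g. a verse in one translation
candidates = [
    [0.0, 1.0, 0.0],           # an unrelated verse
    [0.8, 0.2, 0.1],           # the same verse in another translation
]

idx, score = best_match(query, candidates)
print(idx, round(score, 3))
```

With real data, `query` and `candidates` would be rows of the array returned by `model.encode`, and `idx` would point to the candidate verse most likely to be the same passage.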

## Training

- **Base model**: paraphrase-multilingual-MiniLM-L12-v2 (118M params, 384-dim)
- **Training**: Pair-based similarity learning with `CosineSimilarityLoss` on ~300K labeled verse pairs
- **Translations**: KJV, ASV, YLT, BBE, WEB (all public domain)
- **Positive pairs**: Same verse in different translations
- **Negative pairs**: Different verses from the same translation

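The positive and negative pairing rules above can be sketched as follows. This is an illustration, not the actual data pipeline: `build_pairs`, its parameters, the `{translation: {verse_id: text}}` input layout, the 1.0/0.0 labels, and the `[KJV]`-style prefix on training inputs (borrowed from the usage example) are all assumptions.

```python
import random


def build_pairs(translations, negatives_per_verse=1, seed=0):
    """Build (text_a, text_b, label) triples from {translation: {verse_id: text}}.

    label 1.0: same verse in two different translations (positive).
    label 0.0: two different verses from the same translation (negative).
    """
    rng = random.Random(seed)
    pairs = []
    names = sorted(translations)
    for i, name_a in enumerate(names):
        verses_a = translations[name_a]
        # Positives: match each verse with its counterpart in later translations.
        for name_b in names[i + 1:]:
            verses_b = translations[name_b]
            for verse_id in sorted(verses_a.keys() & verses_b.keys()):
                pairs.append((f"[{name_a}] {verses_a[verse_id]}",
                              f"[{name_b}] {verses_b[verse_id]}", 1.0))
        # Negatives: match each verse with a random different verse,
        # drawn from the same translation.
        ids = sorted(verses_a)
        if len(ids) < 2:
            continue
        for verse_id in ids:
            for _ in range(negatives_per_verse):
                other = rng.choice([v for v in ids if v != verse_id])
                pairs.append((f"[{name_a}] {verses_a[verse_id]}",
                              f"[{name_a}] {verses_a[other]}", 0.0))
    return pairs


# Tiny example using the verses from the Usage section.
translations = {
    "KJV": {
        "Gen 1:1": "In the beginning God created the heaven and the earth.",
        "Gen 1:2": "And the earth was without form, and void;",
    },
    "BBE": {
        "Gen 1:1": "At the first God made the heaven and the earth.",
    },
}
pairs = build_pairs(translations)
for text_a, text_b, label in pairs:
    print(label, "|", text_a, "|", text_b)
```

Triples in this shape could then be wrapped as `InputExample(texts=[text_a, text_b], label=label)` for sentence-transformers' `CosineSimilarityLoss`, which fits the cosine similarity of each pair to its label.
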
## Part of bible.systems

This is model 5 of 5 in the [bible.systems](https://bible.systems) ML pipeline.

## Evaluation Results

Evaluated on a held-out test set of cross-translation verse pairs.

| Metric | Score |
|--------|-------|
| **Accuracy@0.5** (cosine sim threshold) | **0.9988** |
| **ROC AUC** | **1.0000** |
| **Spearman Correlation** | **0.4915** |
| **Avg Positive Similarity** | 0.9841 |
| **Avg Negative Similarity** | 0.0359 |
| **Similarity Gap** (pos - neg) | **0.9482** |

> The model achieves near-perfect discrimination between same-verse pairs across translations (high positive similarity) and different-verse pairs (low negative similarity), with a gap of 0.95. The Spearman correlation is moderate because within-class similarity variance is low (most positive pairs cluster near 0.98).

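For readers who want to reproduce this kind of report on their own pairs, the sketch below shows one way the table's quantities could be computed from cosine scores and binary labels. `similarity_report` is a hypothetical helper, the `>=` comparison at the threshold is an assumption, and the scores are made up for illustration; they are not the model's outputs.

```python
def similarity_report(scores, labels, threshold=0.5):
    """Summarize cosine scores against binary labels (1 = same verse)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    # A pair is predicted "same verse" when its score reaches the threshold.
    correct = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    # ROC AUC: probability that a random positive outscores a random negative,
    # counting ties as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    avg_pos = sum(pos) / len(pos)
    avg_neg = sum(neg) / len(neg)
    return {
        "accuracy": correct / len(scores),
        "roc_auc": wins / (len(pos) * len(neg)),
        "avg_pos": avg_pos,
        "avg_neg": avg_neg,
        "gap": avg_pos - avg_neg,
    }


# Made-up scores for illustration only.
scores = [0.98, 0.97, 0.99, 0.05, 0.02, 0.60]
labels = [1, 1, 1, 0, 0, 0]
report = similarity_report(scores, labels)
for name, value in report.items():
    print(f"{name}: {value:.4f}")
```

In this toy data every positive outscores every negative, so ROC AUC is 1.0, yet one negative sits above the 0.5 threshold and accuracy falls below 1.0. That is the same pattern as in the table: ROC AUC depends only on ranking, while Accuracy@0.5 depends on a fixed cutoff.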

---
*For God so loved the world...* — John 3:16