San Diego Land Use Legal-BERT
This model is a domain-adapted version of nlpaueb/legal-bert-base-uncased, specifically fine-tuned for semantic retrieval of the San Diego Municipal Code (Land Development Code).
It was developed as part of a RAG (Retrieval-Augmented Generation) pipeline to make complex zoning and land use regulations more accessible.
Performance
On domain-specific legal queries, the model improves Hit Rate @ 5 by 140% relative to the generic MiniLM baseline (72% vs. 30%) and raises Mean Reciprocal Rank (MRR) from 0.191 to 0.501.
| Metric | BM25 (Lexical) | MiniLM (Generic Neural) | Legal-BERT (This Model) |
|---|---|---|---|
| Mean Reciprocal Rank (MRR) | 0.043 | 0.191 | 0.501 |
| Hit Rate @ 5 | 4% | 30% | 72% |
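As a worked example of how the two metrics in the table are computed, the sketch below assumes each query has exactly one relevant Municipal Code passage (as with the query-anchor pairs used for fine-tuning) and scores a missed passage as a reciprocal rank of 0. The document IDs are hypothetical placeholders. The last two lines reproduce the relative-improvement arithmetic from the table's numbers.

```python
from typing import Sequence


def reciprocal_rank(ranked_ids: Sequence[str], relevant_id: str) -> float:
    """1 / rank of the relevant document, or 0.0 if it was not retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Average reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, g) for r, g in zip(runs, gold)) / len(gold)


def hit_rate_at_k(runs: Sequence[Sequence[str]], gold: Sequence[str], k: int = 5) -> float:
    """Fraction of queries whose relevant document appears in the top k results."""
    return sum(1 for r, g in zip(runs, gold) if g in r[:k]) / len(gold)


def relative_improvement(new: float, old: float) -> float:
    """(new - old) / old; e.g. 1.4 means a 140% improvement."""
    return (new - old) / old


# Hypothetical ranked results for three queries (one relevant ID per query).
runs = [
    ["doc1", "doc2", "doc3"],  # relevant doc at rank 1 -> RR = 1
    ["doc3", "doc2", "doc1"],  # relevant doc at rank 3 -> RR = 1/3
    ["doc2", "doc3", "doc5"],  # relevant doc missing   -> RR = 0
]
gold = ["doc1", "doc1", "doc4"]

print(f"MRR = {mean_reciprocal_rank(runs, gold):.3f}")     # MRR = 0.444
print(f"Hit@5 = {hit_rate_at_k(runs, gold, k=5):.3f}")     # Hit@5 = 0.667
# Relative improvement of this model over MiniLM, using the table's numbers
print(f"Hit@5: {relative_improvement(0.72, 0.30):.0%}")    # Hit@5: 140%
print(f"MRR:   {relative_improvement(0.501, 0.191):.0%}")  # MRR:   162%
```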
Training Methodology
- Domain Adaptation (MLM): The base Legal-BERT model was adapted using Masked Language Modeling on the full text of San Diego Municipal Code Chapters 11-15.
- Contrastive Fine-Tuning: The model was further refined using `MultipleNegativesRankingLoss` on a synthetic ground-truth dataset of 50 complex, user-centric query-anchor pairs to optimize the dense vector space for retrieval.
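To illustrate what the contrastive step optimizes, here is a small NumPy re-implementation of the in-batch-negatives objective behind `MultipleNegativesRankingLoss`; it is an illustrative sketch, not the sentence-transformers source. Within a batch of (query, anchor) pairs, each query's own anchor is the positive and every other anchor in the batch acts as a negative, and the loss is cross-entropy over scaled cosine similarities. The `scale=20.0` value mirrors the library default; the scale and batch composition actually used for this model are assumptions.

```python
import numpy as np


def multiple_negatives_ranking_loss(
    queries: np.ndarray, anchors: np.ndarray, scale: float = 20.0
) -> float:
    """In-batch-negatives loss: anchors[i] is the positive for queries[i];
    all other anchors in the batch act as negatives."""
    # L2-normalise so that dot products are cosine similarities.
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    a = anchors / np.linalg.norm(anchors, axis=1, keepdims=True)
    logits = scale * (q @ a.T)  # (batch, batch); the diagonal holds the true pairs
    # Row-wise log-softmax, shifted by the row max for numerical stability.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy where the correct "class" for row i is column i.
    return float(-np.mean(np.diag(log_probs)))


eye = np.eye(3)
print(multiple_negatives_ranking_loss(eye, eye))             # near 0: each query matches its own anchor
print(multiple_negatives_ranking_loss(eye, eye[[1, 2, 0]]))  # about 20: each query matches a wrong anchor
```

Minimizing this loss pulls each query toward its own anchor while pushing it away from the other anchors in the batch, which is why larger batches supply more negatives per step.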
Usage (Sentence-Transformers)
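```python
from sentence_transformers import SentenceTransformer
# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("AreTaj/sd-land-use-legal-bert")
queries = ["What are the setback requirements for RM-1-1 zones?", "Permitted uses in open space."]
# One dense vector per query (NumPy array of shape (len(queries), embedding_dim))
embeddings = model.encode(queries)
```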
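In a RAG pipeline, query embeddings like these are typically compared against pre-computed passage embeddings and the closest passages are passed to the generator. The sketch below assumes a simple cosine-similarity ranking over an in-memory matrix (the retrieval backend used by the project is not described); small hand-made vectors stand in for `model.encode(...)` outputs so the ranking logic runs on its own, and `top_k_passages`, `passages`, and `query` are hypothetical names.

```python
import numpy as np


def top_k_passages(query_emb: np.ndarray, passage_embs: np.ndarray, k: int = 5) -> list[tuple[int, float]]:
    """Return (passage_index, cosine_similarity) for the k passages closest to the query."""
    q = query_emb / np.linalg.norm(query_emb)
    p = passage_embs / np.linalg.norm(passage_embs, axis=1, keepdims=True)
    scores = p @ q                   # cosine similarity of every passage to the query
    order = np.argsort(-scores)[:k]  # indices sorted from most to least similar
    return [(int(i), float(scores[i])) for i in order]


# Toy 2-D vectors standing in for real embeddings, e.g. (hypothetical names):
#   passage_embs = model.encode(passages); query_emb = model.encode(query)
passage_embs = np.array([[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]])
query_emb = np.array([1.0, 0.1])
print(top_k_passages(query_emb, passage_embs, k=2))  # passage 0 first, then passage 2
```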
License
This model is released under the Apache License 2.0.
Citation
If you use this model in your research, please attribute it to the San Diego Land Use RAG Project.