San Diego Land Use Legal-BERT

This model is a domain-adapted version of nlpaueb/legal-bert-base-uncased, fine-tuned for semantic retrieval over the San Diego Municipal Code (Land Development Code).

It was developed as part of a RAG (Retrieval-Augmented Generation) pipeline to make complex zoning and land use regulations more accessible.

Performance

On domain-specific legal questions, the model improves Hit Rate @ 5 by 140% relative to the generic MiniLM baseline (30% to 72%) and raises Mean Reciprocal Rank from 0.191 to 0.501.

| Metric | BM25 (Lexical) | MiniLM (Generic Neural) | Legal-BERT (This Model) |
| --- | --- | --- | --- |
| Mean Reciprocal Rank (MRR) | 0.043 | 0.191 | 0.501 |
| Hit Rate @ 5 | 4% | 30% | 72% |
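
For readers unfamiliar with the two metrics, the following is a minimal sketch of how MRR and Hit Rate @ k are commonly computed. It assumes exactly one relevant passage per query, which the evaluation setup here does not state explicitly, and the section IDs and function names are hypothetical placeholders rather than the project's evaluation code.

```python
def reciprocal_rank(ranked_ids, relevant_id):
    """Return 1/rank of the relevant item (1-based rank), or 0.0 if it is absent."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(results, gold):
    """Average the reciprocal rank over all queries."""
    return sum(reciprocal_rank(r, g) for r, g in zip(results, gold)) / len(gold)


def hit_rate_at_k(results, gold, k=5):
    """Fraction of queries whose relevant item appears in the top k results."""
    hits = sum(1 for r, g in zip(results, gold) if g in r[:k])
    return hits / len(gold)


# Hypothetical section IDs, for illustration only
results = [
    ["s1", "s2", "s3"],  # relevant item at rank 1
    ["s4", "s5", "s6"],  # relevant item at rank 2
    ["s7", "s8", "s9"],  # relevant item not retrieved
]
gold = ["s1", "s5", "s0"]

mrr = mean_reciprocal_rank(results, gold)
hit = hit_rate_at_k(results, gold, k=5)
print(f"MRR={mrr:.3f}  HitRate@5={hit:.3f}")
```

An MRR of 0.501 therefore means that, averaged over queries, the relevant section sits at roughly the second position of the ranked list.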

Training Methodology

  1. Domain Adaptation (MLM): The base Legal-BERT model was adapted using Masked Language Modeling on the full text of San Diego Municipal Code Chapters 11-15.
  2. Contrastive Fine-Tuning: The model was further refined using MultipleNegativesRankingLoss on a synthetic ground-truth dataset of 50 complex, user-centric query-anchor pairs to optimize the dense vector space for retrieval.
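
To illustrate the objective in step 2, here is a pure-Python sketch of a multiple-negatives ranking loss: within a batch, each query's paired passage is the positive and every other passage in the batch acts as a negative. Cosine similarity and a scale factor of 20 are assumed here; the settings actually used for this model are not documented, and the toy vectors are made up.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mnr_loss(query_vecs, passage_vecs, scale=20.0):
    """Cross-entropy over in-batch similarities.

    For query i, passage i is the positive; all other passages are negatives.
    `scale` is an assumed value, not one taken from this model's training run.
    """
    total = 0.0
    for i, q in enumerate(query_vecs):
        logits = [scale * cosine(q, p) for p in passage_vecs]
        # Numerically stable log-sum-exp
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_sum_exp - logits[i]
    return total / len(query_vecs)


# Toy 2-D embeddings, for illustration only
queries = [[1.0, 0.0], [0.0, 1.0]]
aligned = [[0.9, 0.1], [0.1, 0.9]]  # each query is close to its own passage
swapped = [[0.1, 0.9], [0.9, 0.1]]  # each query is close to the wrong passage

loss_aligned = mnr_loss(queries, aligned)
loss_swapped = mnr_loss(queries, swapped)
print(loss_aligned, loss_swapped)
```

The loss is small when each query is most similar to its own passage and large when another passage in the batch scores higher, which is what pushes the embedding space toward retrieval-friendly geometry.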

Usage (Sentence-Transformers)

from sentence_transformers import SentenceTransformer

# Load the fine-tuned model from the Hugging Face Hub
model = SentenceTransformer("AreTaj/sd-land-use-legal-bert")
queries = ["What are the setback requirements for RM-1-1 zones?", "Permitted uses in open space."]
# Encode each query into a dense vector (one row per query)
embeddings = model.encode(queries)
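
Once queries and code sections are embedded, retrieval reduces to ranking sections by similarity to the query vector. The sketch below shows that ranking step with toy 3-D vectors standing in for the output of model.encode; the helper name top_k and the vectors are hypothetical, and cosine similarity is an assumed choice of scoring function.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_k(query_vec, passage_vecs, k=5):
    """Return (index, score) pairs for the k passages most similar to the query."""
    scored = [(i, cosine(query_vec, p)) for i, p in enumerate(passage_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for model.encode(...) output
query_vec = [0.2, 0.9, 0.1]
passage_vecs = [
    [0.9, 0.1, 0.0],
    [0.1, 0.8, 0.2],
    [0.0, 0.2, 0.9],
]

ranking = top_k(query_vec, passage_vecs, k=2)
for index, score in ranking:
    print(index, round(score, 3))
```

In a real pipeline the passage vectors would be computed once for all code sections and stored, so that only the query needs encoding at request time.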

License

This model is released under the Apache License 2.0.

Citation

If you use this model in your research, please attribute it to the San Diego Land Use RAG Project.
