---
extra_gated_prompt: Our models are intended for academic projects and academic research only. If you are not affiliated with an academic institution, please reach out to us at huggingface [at] poltextlab [dot] com for further inquiry. If we cannot clearly determine your academic affiliation and use case based on your form data, your request may be rejected. Please allow us a few business days to manually review subscriptions.
extra_gated_fields:
  Country: country
  Institution: text
  Institution Email: text
  Full Name: text
  Please specify your academic project/use case you want to use the models for: text
---

# Model Card

## Model Description

This is an **xlm-roberta-large** model fine-tuned on English training data labelled with the 10 major-level codes from the LiSST thesaurus:

1. **Demographics**: DEMOGRAPHY, LIFE EVENTS, IDENTITY
2. **Environment**: ENVIRONMENT (including HOME)
3. **Health**: HEALTH & CARE
4. **Work**: WORK & EMPLOYMENT (including TRAINING)
5. **Education**: EDUCATION & QUALIFICATION
6. **Network**: FAMILY & SOCIAL NETWORK
7. **Values**: ATTITUDES, VALUES
8. **Policy**: PUBLIC POLICY
9. **Time use**: TIME USE, LEISURE
10. **Income**: INCOME AND CONSUMPTION

---

## How to Use the Model

```python
from transformers import AutoTokenizer, pipeline

# The classifier uses the tokenizer of the base xlm-roberta-large model.
tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-large")

pipe = pipeline(
    model="poltextlab/ontolisst_major_v1_xlm-roberta-large_8_5e-06_128",
    task="text-classification",
    tokenizer=tokenizer,
    use_fast=False,
    token="",  # insert your Hugging Face access token here (required for gated access)
)

text = "Apart from yourself (and your husband/wife/partner), does anyone else living in your household make a contribution towards the cost of the accommodation?"
pipe(text)
```

## Gated Access

This model requires gated access. You must pass the `token` parameter when loading the model. In earlier versions of the Transformers package, you may need to use the `use_auth_token` parameter instead.

## Model Performance

The model was evaluated on a test set of **1,627 English examples**.
- **Accuracy**: 0.82
- **Precision**: 0.82
- **Recall**: 0.82
- **Weighted Average F1-score**: 0.82

### Classification Report

| Label | Precision | Recall | F1-score | Support |
|-------|-----------|--------|----------|---------|
| 1 (Demographics) | 0.70 | 0.52 | 0.60 | 117 |
| 2 (Environment) | 0.79 | 0.72 | 0.75 | 109 |
| 3 (Health) | 0.89 | 0.94 | 0.92 | 679 |
| 4 (Work) | 0.83 | 0.84 | 0.84 | 127 |
| 5 (Education) | 0.81 | 0.83 | 0.82 | 111 |
| 6 (Network) | 0.77 | 0.87 | 0.82 | 241 |
| 7 (Values) | 0.72 | 0.70 | 0.71 | 138 |
| 8 (Policy) | 0.89 | 0.31 | 0.46 | 26 |
| 9 (Time use) | 0.75 | 0.26 | 0.39 | 23 |
| 10 (Income) | 0.68 | 0.71 | 0.70 | 56 |
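
The weighted average F1-score reported above can be related to the per-class rows of the classification report. The following is a minimal sketch that recomputes it from the table, together with an unweighted (macro) average for comparison. The macro average is not reported in this card; it is derived here from the rounded table values only, so both results are approximate. The variable names are illustrative.

```python
# Per-class rows copied from the classification report: (label, F1-score, support).
# F1 values are rounded to two decimals, so recomputed averages are approximate.
rows = [
    ("Demographics", 0.60, 117),
    ("Environment", 0.75, 109),
    ("Health", 0.92, 679),
    ("Work", 0.84, 127),
    ("Education", 0.82, 111),
    ("Network", 0.82, 241),
    ("Values", 0.71, 138),
    ("Policy", 0.46, 26),
    ("Time use", 0.39, 23),
    ("Income", 0.70, 56),
]

total_support = sum(support for _, _, support in rows)

# Weighted average: each class contributes in proportion to its support.
weighted_f1 = sum(f1 * support for _, f1, support in rows) / total_support

# Macro average: every class counts equally, regardless of its size.
macro_f1 = sum(f1 for _, f1, _ in rows) / len(rows)

print(f"total support: {total_support}")   # 1627
print(f"weighted F1:   {weighted_f1:.2f}")  # 0.82
print(f"macro F1:      {macro_f1:.2f}")     # 0.70
```

The supports sum to the 1,627 test examples, and the recomputed weighted F1 matches the reported 0.82. The macro figure is lower because Health alone accounts for 679 examples and scores well, while the two smallest classes, Policy (26) and Time use (23), have recall of 0.31 and 0.26. Predictions for those two categories deserve extra scrutiny.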