Update README.md
Browse files
README.md
CHANGED
|
@@ -91,25 +91,26 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
|
|
| 91 |
### Tool Use
|
| 92 |
|
| 93 |
```python
|
| 94 |
-
def get_current_weather(location: str) -> float:
|
| 95 |
"""
|
| 96 |
Obtener los datos del tiempo de una localización.
|
| 97 |
|
| 98 |
Args:
|
| 99 |
location: La localización, con el siguiente formato: "Ciudad, País."
|
|
|
|
| 100 |
Returns:
|
| 101 |
El tiempo en dicha localización.
|
| 102 |
"""
|
| 103 |
return {"temperatura": 22, "cielo": "nublado", "probabilidad de lluvias": "60%"}
|
| 104 |
|
| 105 |
messages = [
|
| 106 |
-
{"role": "user", "content": "Este fin de semana quiero visitar Madrid, y no sé qué ropa llevarme. ¿Podrías decirme qué tal va a hacer?"}
|
| 107 |
]
|
| 108 |
|
| 109 |
text = tokenizer.apply_chat_template(
|
| 110 |
messages,
|
| 111 |
tokenize=False,
|
| 112 |
-
tools=[
|
| 113 |
add_generation_prompt=True
|
| 114 |
)
|
| 115 |
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
|
|
@@ -129,9 +130,7 @@ response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
|
|
| 129 |
|
| 130 |
### Training Data
|
| 131 |
|
| 132 |
-
|
| 133 |
-
|
| 134 |
-
[More Information Needed]
|
| 135 |
|
| 136 |
### Training Procedure
|
| 137 |
|
|
|
|
| 91 |
### Tool Use
|
| 92 |
|
| 93 |
```python
|
| 94 |
+
def get_current_weather(location: str, date: str) -> float:
|
| 95 |
"""
|
| 96 |
Obtener los datos del tiempo de una localización.
|
| 97 |
|
| 98 |
Args:
|
| 99 |
location: La localización, con el siguiente formato: "Ciudad, País."
|
| 100 |
+
date: La fecha, en el formato AAAA-MM-DD.
|
| 101 |
Returns:
|
| 102 |
El tiempo en dicha localización.
|
| 103 |
"""
|
| 104 |
return {"temperatura": 22, "cielo": "nublado", "probabilidad de lluvias": "60%"}
|
| 105 |
|
| 106 |
messages = [
|
| 107 |
+
{"role": "user", "content": "Este fin de semana quiero visitar Madrid, y no sé qué ropa llevarme. ¿Podrías decirme qué tal va a hacer? Es el puente del 6 de diciembre de 2024."}
|
| 108 |
]
|
| 109 |
|
| 110 |
text = tokenizer.apply_chat_template(
|
| 111 |
messages,
|
| 112 |
tokenize=False,
|
| 113 |
+
tools=[get_current_weather],
|
| 114 |
add_generation_prompt=True
|
| 115 |
)
|
| 116 |
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)
|
|
|
|
| 130 |
|
| 131 |
### Training Data
|
| 132 |
|
| 133 |
+
A combination of both public and private datasets, the latter designed at the IIC. The dataset consists of 21,975 conversations in Spanish, in the `chatml` format. Each conversation has two variants, `chosen` and `rejected`, which differ only in the assistant's final answer: the final answer in the `chosen` variant is considered a better response than the one in the `rejected` variant. Several techniques were used to generate the dataset, which we explain in depth in the paper (**coming soon**).
|
|
|
|
|
|
|
| 134 |
|
| 135 |
### Training Procedure
|
| 136 |
|