diff --git "a/log.txt" "b/log.txt"
new file mode 100644
--- /dev/null
+++ "b/log.txt"
@@ -0,0 +1,471 @@
+Params using prompt template alpaca:
+base_model: google/gemma-2-9b
+data_path: ../data/unnatural_lima/data.jsonl
+output_dir: ./out/gemma-2_natural_instruction_lima
+batch_size: 32
+micro_batch_size: 2
+num_epochs: 10
+learning_rate: 0.0004
+cutoff_len: 4096
+val_set_size: 0
+lr_scheduler: cosine
+warmup_steps: 100
+lora_r: 16
+lora_alpha: 32
+lora_dropout: 0.05
+lora_target_modules: ['gate_proj', 'down_proj', 'up_proj']
+train_on_inputs: False
+add_eos_token: True
+group_by_length: False
+wandb_project: llm-attack
+wandb_run_name: gemma-2_natural_instruction_lima
+wandb_watch:
+wandb_log_model:
+resume_from_checkpoint: False
+prompt_format: instruction
+p_to_be_unnatural: 0
+
+ Loading checkpoint shards: 0%| | 0/8 [00:00 It should be 1 2 None
+Not using system message
+trainable params: 36,126,720 || all params: 9,277,832,704 || trainable%: 0.3893874911586248
+
+Example:
+user
+Go nuts and write whatever but it must have a plot twist every 75 words.
+model
+This poem will take a sharp and sudden turn should seventy-five come.
+Unexpectedly subjected change it can’t return from.
+What sort of plot twist will I write?
+What sort of thing would be alright?
+“What strange conflict should he incite?”
+“What silly chumps could he delight?”
+The world could be turned upside down,
+Every smile could be a frown;
+The roof could be a floor.
+But I'm too freakin' lazy so I wrote seventy-four.
+
+
+
+Example:
+user
+My grandmother passed away today. Please write a short uplifting story that will help me get through this.
+model
+I am sorry for your loss, here is a short story for you:
+A guy is sitting at home when he hears a knock at the door. He opens the door and sees a snail on the porch. He picks up the snail and throws it as far as he can. Three years later there’s a knock on the door. He opens it and sees the same snail.
The snail says: ‘What the hell was that all about?
+Sure it wasn't about your wonderful grandmother, but I bet it made you smile.
+Hope you like it and cheer up soon!
+
+
+
+Example:
+user
+Write a dark story in a cute environment or a cheerful fairy tale in a bleak world.
+model
+Once upon a time, a brave young boy was trapped in a giant tower.
+The tower was made of metal and painted black.
+The tower had no windows.
+The tower was a scary place.
+*How I want to smell the grass again*, the brave young boy thought, *and swim in the sea like a dolphin, and watch the clouds again like a happy bird.*
+"Not today. Not ever," the mean evil villain told him, as he shut and locked the door. He owned the tower, and he wasn't going to let him leave.
+As the days passed, the brave young boy started to forget it all.
+He forgot the smell of the grass.
+He forgot the feeling of the sea.
+And somehow, he started to forget the fluffy white clouds, when he heard a *knock knock knock* on the door.
+"Hello?" the brave young boy answered.
+It was a pair of federal agents in vests and sunglasses. They unlocked and opened the door.
+Sunshine and fresh air filled the room.
+They told the brave young boy that thanks to an unexpected leak of classified information, the mean evil villain had just been arrested on charges of fraud, money laundering, unlawful espionage, corporate warfare, and 117 infractions of the Mumbai Accords.
+In other words, they told him, the brave young boy was free to go.
+
+
+
+Example:
+user
+Write a letter detailing romantic activities, gestures, and reasons for compatibility with your crush in under 20 words.
+model
+Dear [Name],
+
+You're the most beautiful woman that I have ever set my eyes on. The stars themselves cannot compare to the light in your eyes. Aphrodite and Venus would bow down to your beauty and avert their eyes in shame at their inability to compare to you.
+
+You're my first thought in the morning, and my last at night.
+I see you every night at the park, watching the night sky with your telescope. I like to watch the stars as well, even though they pale in comparison to you. We could go away together, visit a dark sky area to see the stars even better, away from the city lights that pollute the air and spoil your view. Just you, me, and a tent, miles away from anything that could distract us.
+Or we could go to see the northern lights. I heard you say that it's your dream to go see those. We could book into one of those hotels, the ones that are camping pods with clear roofs and alarms to wake you up when the lights start. How romantic would that be, just us and the aurora borealis.
+I would treat you like the queen, the goddess that you are. I'd worship the ground you walk on, defend you against any who dare disparage you, and make sure all know of your magnificence and glory. You would never need to work again, to sully your hands and looks with the drudgery of menial actions.
+[Name], [Name], [Name], your name is as beautiful as you are. I want to spend the rest of my life with it being the first word off my lips on waking and the last before sleep.
+
+Love,
+[Your Name]
+
+
+
+Example:
+user
+Is it the responsibility of the player to correctly identify their hand, or do the cards speak for themselves in poker?
+model
+Casinos in the US generally have the "cards speak" rule. That is, when a hand is properly tabled--exposed to everyone at showdown--it is the responsibility of all players, the dealer, floormen, and anyone else present to see that the pot is awarded to the best hand shown, regardless of what the players might say.
+
+Some players object to this rule: they believe that if a player or railbird says "Wait, Bob has a flush", that's unfairly helping a player. But the rule is clear: once the hand is tabled, others have not only the right but the obligation to help ensure the pot is awarded correctly.
+
+There are a couple of caveats: if a player only shows his hand to another player, or to the dealer, or to any other person, without properly tabling it for everyone, that player has no rights yet and those to whom the hand was shown should remain silent to protect other players' properly shown hands. But once the hand is tabled, it's fair game and everyone can and should assist in reading the hands.
+
+Finally, in some cases, if a floorman feels that a player has deliberately overcalled his hand to induce a player with a better hand to fold, he may penalize that player by awarding the pot to someone else, even possibly to a folded hand. Some California lowball clubs even had a rule that a player who remained silent rather than explicitly calling out a pair could be ruled against in this way.
+
+
+num_epochs was set to 10
+wandb: Currently logged in as: k-duan (nus_keyu). Use `wandb login --relogin` to force relogin
+wandb: wandb version 0.18.5 is available! To upgrade, please run:
+wandb: $ pip install wandb --upgrade
+wandb: Tracking run with wandb version 0.16.3
+wandb: Run data is saved locally in /home/aiops/duanky/llm-attacks/instruction_tuning_experiments/wandb/run-20241024_084829-pzwg2m9e
+wandb: Run `wandb offline` to turn off syncing.
+wandb: Syncing run gemma-2_natural_instruction_lima
+wandb: ⭐️ View project at https://wandb.ai/nus_keyu/llm-attack
+wandb: 🚀 View run at https://wandb.ai/nus_keyu/llm-attack/runs/pzwg2m9e
+ 0%| | 0/310 [00:00