<meta charset="utf-8" /><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;TRL - Transformer Reinforcement Learning&quot;,&quot;local&quot;:&quot;trl---transformer-reinforcement-learning&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;API documentation&quot;,&quot;local&quot;:&quot;api-documentation&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Examples&quot;,&quot;local&quot;:&quot;examples&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Blog posts&quot;,&quot;local&quot;:&quot;blog-posts&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}">
<link href="/docs/trl/v0.7.10/en/_app/immutable/assets/0.e3b0c442.css" rel="modulepreload">
<link rel="modulepreload" href="/docs/trl/v0.7.10/en/_app/immutable/entry/start.d9a24ea1.js">
<link rel="modulepreload" href="/docs/trl/v0.7.10/en/_app/immutable/chunks/scheduler.9039eef2.js">
<link rel="modulepreload" href="/docs/trl/v0.7.10/en/_app/immutable/chunks/singletons.9eef12cc.js">
<link rel="modulepreload" href="/docs/trl/v0.7.10/en/_app/immutable/chunks/paths.1355483e.js">
<link rel="modulepreload" href="/docs/trl/v0.7.10/en/_app/immutable/entry/app.5bef33b8.js">
<link rel="modulepreload" href="/docs/trl/v0.7.10/en/_app/immutable/chunks/index.ded8f90d.js">
<link rel="modulepreload" href="/docs/trl/v0.7.10/en/_app/immutable/nodes/0.abccdcd8.js">
<link rel="modulepreload" href="/docs/trl/v0.7.10/en/_app/immutable/chunks/each.e59479a4.js">
<link rel="modulepreload" href="/docs/trl/v0.7.10/en/_app/immutable/nodes/9.dce0c585.js">
<link rel="modulepreload" href="/docs/trl/v0.7.10/en/_app/immutable/chunks/Heading.f027f30d.js"><!-- HEAD_svelte-u9bgzb_START --><meta name="hf:doc:metadata" content="{&quot;title&quot;:&quot;TRL - Transformer Reinforcement Learning&quot;,&quot;local&quot;:&quot;trl---transformer-reinforcement-learning&quot;,&quot;sections&quot;:[{&quot;title&quot;:&quot;API documentation&quot;,&quot;local&quot;:&quot;api-documentation&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Examples&quot;,&quot;local&quot;:&quot;examples&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2},{&quot;title&quot;:&quot;Blog posts&quot;,&quot;local&quot;:&quot;blog-posts&quot;,&quot;sections&quot;:[],&quot;depth&quot;:2}],&quot;depth&quot;:1}"><!-- HEAD_svelte-u9bgzb_END --> <p></p> <div style="text-align: center" data-svelte-h="svelte-160pg0b"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/trl_banner_dark.png"></div> <h1 class="relative group"><a id="trl---transformer-reinforcement-learning" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#trl---transformer-reinforcement-learning"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" 
fill="currentColor"></path></svg></span></a> <span>TRL - Transformer Reinforcement Learning</span></h1> <p data-svelte-h="svelte-k9dkk3">TRL is a full-stack library providing a set of tools to train transformer language models with reinforcement learning, from the supervised fine-tuning (SFT) and reward modeling (RM) steps through to Proximal Policy Optimization (PPO).
The library is integrated with 🤗 <a href="https://github.com/huggingface/transformers" rel="nofollow">transformers</a>.</p> <div style="text-align: center" data-svelte-h="svelte-dcmq8o"><img src="https://huggingface.co/datasets/trl-internal-testing/example-images/resolve/main/images/TRL-readme.png"></div> <p data-svelte-h="svelte-113r82x">Check the appropriate sections of the documentation depending on your needs:</p> <h2 class="relative group"><a id="api-documentation" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#api-documentation"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>API documentation</span></h2> <ul data-svelte-h="svelte-wrckpz"><li><a href="models">Model Classes</a>: <em>A brief overview of what each public model class does.</em></li> <li><a href="sft_trainer"><code>SFTTrainer</code></a>: <em>Easily fine-tune your model with supervised learning using <code>SFTTrainer</code>.</em></li> <li><a href="reward_trainer"><code>RewardTrainer</code></a>: <em>Easily train your reward model using <code>RewardTrainer</code>.</em></li> <li><a href="ppo_trainer"><code>PPOTrainer</code></a>: <em>Further fine-tune the supervised fine-tuned
model using the PPO algorithm.</em></li> <li><a href="best-of-n">Best-of-N Sampling</a>: <em>Use best-of-n sampling as an alternative way to sample predictions from your active model.</em></li> <li><a href="dpo_trainer"><code>DPOTrainer</code></a>: <em>Direct Preference Optimization training using <code>DPOTrainer</code>.</em></li> <li><a href="text_environment"><code>TextEnvironment</code></a>: <em>Text environment to train your model using tools with RL.</em></li></ul> <h2 class="relative group"><a id="examples" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#examples"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Examples</span></h2> <ul data-svelte-h="svelte-a8fpa6"><li><a href="sentiment_tuning">Sentiment Tuning</a>: <em>Fine-tune your model to generate positive movie content.</em></li> <li><a href="lora_tuning_peft">Training with PEFT</a>: <em>Memory-efficient RLHF training using adapters with PEFT.</em></li> <li><a href="detoxifying_a_lm">Detoxifying LLMs</a>: <em>Detoxify your language model through RLHF.</em></li> <li><a href="using_llama_models">StackLlama</a>: <em>End-to-end RLHF training of a Llama model on the Stack
Exchange dataset.</em></li> <li><a href="learning_tools">Learning with Tools</a>: <em>A walkthrough of using <code>TextEnvironments</code>.</em></li> <li><a href="multi_adapter_rl">Multi-Adapter Training</a>: <em>Use a single base model and multiple adapters for memory-efficient end-to-end training.</em></li></ul> <h2 class="relative group"><a id="blog-posts" class="header-link block pr-1.5 text-lg no-hover:hidden with-hover:absolute with-hover:p-1.5 with-hover:opacity-0 with-hover:group-hover:opacity-100 with-hover:right-full" href="#blog-posts"><span><svg class="" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" aria-hidden="true" role="img" width="1em" height="1em" preserveAspectRatio="xMidYMid meet" viewBox="0 0 256 256"><path d="M167.594 88.393a8.001 8.001 0 0 1 0 11.314l-67.882 67.882a8 8 0 1 1-11.314-11.315l67.882-67.881a8.003 8.003 0 0 1 11.314 0zm-28.287 84.86l-28.284 28.284a40 40 0 0 1-56.567-56.567l28.284-28.284a8 8 0 0 0-11.315-11.315l-28.284 28.284a56 56 0 0 0 79.196 79.197l28.285-28.285a8 8 0 1 0-11.315-11.314zM212.852 43.14a56.002 56.002 0 0 0-79.196 0l-28.284 28.284a8 8 0 1 0 11.314 11.314l28.284-28.284a40 40 0 0 1 56.568 56.567l-28.285 28.285a8 8 0 0 0 11.315 11.314l28.284-28.284a56.065 56.065 0 0 0 0-79.196z" fill="currentColor"></path></svg></span></a> <span>Blog posts</span></h2> <div class="mt-10" data-svelte-h="svelte-8ufs3j"><div class="w-full flex flex-col space-y-4 md:space-y-0 md:grid md:grid-cols-2 md:gap-y-4 md:gap-x-5"><a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/rlhf"><img src="https://raw.githubusercontent.com/huggingface/blog/main/assets/120_rlhf/thumbnail.png" alt="thumbnail"> <p class="text-gray-700">Illustrating Reinforcement Learning from Human Feedback</p></a> <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/trl-peft"><img
src="https://github.com/huggingface/blog/blob/main/assets/133_trl_peft/thumbnail.png?raw=true" alt="thumbnail"> <p class="text-gray-700">Fine-tuning 20B LLMs with RLHF on a 24GB consumer GPU</p></a> <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/stackllama"><img src="https://github.com/huggingface/blog/blob/main/assets/138_stackllama/thumbnail.png?raw=true" alt="thumbnail"> <p class="text-gray-700">StackLLaMA: A hands-on guide to train LLaMA with RLHF</p></a> <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/dpo-trl"><img src="https://github.com/huggingface/blog/blob/main/assets/157_dpo_trl/dpo_thumbnail.png?raw=true" alt="thumbnail"> <p class="text-gray-700">Fine-tune Llama 2 with DPO</p></a> <a class="!no-underline border dark:border-gray-700 p-5 rounded-lg shadow hover:shadow-lg" href="https://huggingface.co/blog/trl-ddpo"><img src="https://github.com/huggingface/blog/blob/main/assets/166_trl_ddpo/thumbnail.png?raw=true" alt="thumbnail"> <p class="text-gray-700">Finetune Stable Diffusion Models with DDPO via TRL</p></a></div></div> <p></p>
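Of the techniques listed above, best-of-n sampling is simple enough to sketch without the library: draw several candidate completions from the active model, score each with the reward model, and keep the highest-scoring one. The sketch below illustrates that idea only, not TRL's actual `best-of-n` API; `fake_generate` and `fake_reward` are hypothetical stand-ins for a language model and a reward model.

```python
from itertools import cycle

def best_of_n(prompt, generate, reward, n=4):
    """Draw n candidate completions and keep the one the reward
    function scores highest (ties: first candidate seen wins)."""
    best, best_score = None, float("-inf")
    for _ in range(n):
        candidate = generate(prompt)
        score = reward(prompt, candidate)
        if score > best_score:
            best, best_score = candidate, score
    return best

# Toy stand-ins: a "model" that cycles through canned replies and a
# length-based "reward"; real usage would query an LM and a reward model.
replies = cycle(["ok", "great movie", "truly wonderful film"])
fake_generate = lambda prompt: next(replies)
fake_reward = lambda prompt, completion: len(completion)

print(best_of_n("Review:", fake_generate, fake_reward, n=3))
# prints "truly wonderful film"
```

The same selection loop applies unchanged when `generate` samples stochastically; raising `n` trades compute for a better expected reward without any gradient updates.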
<script>
	{
		__sveltekit_78hn1s = {
			assets: "/docs/trl/v0.7.10/en",
			base: "/docs/trl/v0.7.10/en",
			env: {}
		};

		const element = document.currentScript.parentElement;

		const data = [null,null];

		Promise.all([
			import("/docs/trl/v0.7.10/en/_app/immutable/entry/start.d9a24ea1.js"),
			import("/docs/trl/v0.7.10/en/_app/immutable/entry/app.5bef33b8.js")
		]).then(([kit, app]) => {
			kit.start(app, element, {
				node_ids: [0, 9],
				data,
				form: null,
				error: null
			});
		});
	}
</script>
