# ML for 3D Course
## Docs
- [What is it?](https://huggingface.co/learn/ml-for-3d-course/unit2/what-is-it.md)
- [Setup](https://huggingface.co/learn/ml-for-3d-course/unit2/setup.md)
- [Bonus](https://huggingface.co/learn/ml-for-3d-course/unit2/bonus.md)
- [Introduction](https://huggingface.co/learn/ml-for-3d-course/unit2/introduction.md)
- [Hands-on (Part 1)](https://huggingface.co/learn/ml-for-3d-course/unit2/hands-on-1.md)
- [Pipeline](https://huggingface.co/learn/ml-for-3d-course/unit2/pipeline.md)
- [Hands-on (Part 2)](https://huggingface.co/learn/ml-for-3d-course/unit2/hands-on-2.md)
- [Hands-on](https://huggingface.co/learn/ml-for-3d-course/unit3/hands-on.md)
- [What is it?](https://huggingface.co/learn/ml-for-3d-course/unit3/what-is-it.md)
- [Bonus](https://huggingface.co/learn/ml-for-3d-course/unit3/bonus.md)
- [Introduction](https://huggingface.co/learn/ml-for-3d-course/unit3/introduction.md)
- [Hands-on](https://huggingface.co/learn/ml-for-3d-course/unit4/hands-on.md)
- [Marching Cubes](https://huggingface.co/learn/ml-for-3d-course/unit4/marching-cubes.md)
- [Introduction](https://huggingface.co/learn/ml-for-3d-course/unit4/introduction.md)
- [Mesh Generation](https://huggingface.co/learn/ml-for-3d-course/unit4/mesh-generation.md)
- [Run locally](https://huggingface.co/learn/ml-for-3d-course/unit5/run-locally.md)
- [Walkthrough](https://huggingface.co/learn/ml-for-3d-course/unit5/walkthrough.md)
- [Run in notebook](https://huggingface.co/learn/ml-for-3d-course/unit5/run-in-notebook.md)
- [Introduction](https://huggingface.co/learn/ml-for-3d-course/unit5/introduction.md)
- [Run via API](https://huggingface.co/learn/ml-for-3d-course/unit5/run-via-api.md)
- [Conclusion](https://huggingface.co/learn/ml-for-3d-course/conclusion/conclusion.md)
- [Non-meshes](https://huggingface.co/learn/ml-for-3d-course/unit1/non-meshes.md)
- [Generative 3D pipelines](https://huggingface.co/learn/ml-for-3d-course/unit1/pipelines.md)
- [Introduction](https://huggingface.co/learn/ml-for-3d-course/unit1/introduction.md)
- [Meshes](https://huggingface.co/learn/ml-for-3d-course/unit1/meshes.md)
- [What's going on?](https://huggingface.co/learn/ml-for-3d-course/unit0/whats-going-on.md)
- [Welcome to the 🤗 Machine Learning for 3D Course [[introduction]]](https://huggingface.co/learn/ml-for-3d-course/unit0/introduction.md)
- [How to do it yourself](https://huggingface.co/learn/ml-for-3d-course/unit0/how-to-do-it-yourself.md)
- [Why does it matter?](https://huggingface.co/learn/ml-for-3d-course/unit0/why-does-it-matter.md)
### What is it?
https://huggingface.co/learn/ml-for-3d-course/unit2/what-is-it.md
# What is it?
Multi-view diffusion is a type of diffusion model (e.g. [Stable Diffusion](https://huggingface.co/blog/stable_diffusion)). However, instead of being trained on regular images, it's trained on multiple views of an object from different perspectives.
![Multi-view diffusion](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/multiview.webp)
## Problems
Multi-view diffusion is useful for 3D generation: consistent views of an object from several angles can then be lifted into a full 3D representation.
However, it doesn't work out-of-the-box. It tends to suffer from something called the Janus problem, where objects have multiple faces (or, more generally, a lack of consistency across views).
![Janus problem](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/janus.png)
## Solutions
State-of-the-art multi-view-diffusion models like [MVDream](https://huggingface.co/MVDream/MVDream) address this problem using specialized techniques.
I won't be going into technical detail on multi-view diffusion in this course, since it's more related to diffusion than to 3D, but if you'd like to learn more, check out the [Diffusion course](https://huggingface.co/learn/diffusion-course/unit0/1).
In the next sections, we'll be using a pre-trained multi-view diffusion model to generate multi-view images.
### Setup
https://huggingface.co/learn/ml-for-3d-course/unit2/setup.md
# Setup
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/dylanebert/ml-for-3d-course-notebooks/blob/main/multi_view_diffusion.ipynb)
Click the link above to open a Colab notebook with the code for this unit.
## Runtime
Multi-view diffusion, like the other units in this course, requires a GPU.
In the notebook, click `Runtime` -> `Change runtime type` and select `GPU` as the hardware accelerator.
![Change runtime type](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/change-runtime-type.png)
Alternatively, you can run the notebook on your local machine if you have a GPU.
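Either way, you can quickly confirm that PyTorch can see a GPU before moving on. A small optional sanity check (`torch` comes preinstalled in Colab):
```python
import torch

# Should print True (and the GPU name) on a GPU runtime
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```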
## Prerequisites
To run any code block in the notebook, click the `Play` button on the left side of the block.
We'll start by installing all necessary dependencies:
`pip install -r https://huggingface.co/spaces/dylanebert/multi-view-diffusion/raw/main/requirements.txt`
If the notebook asks you to restart the session, do so, then rerun the code block. You should see a green checkmark next to the code block if everything is installed correctly.
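For an extra check beyond the green checkmark, you can confirm the core libraries import cleanly after the restart (optional; the exact versions depend on the pinned requirements):
```python
import diffusers
import torch

# If these imports succeed, the environment is ready for the rest of the unit
print(diffusers.__version__)
print(torch.__version__)
```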
### Bonus
https://huggingface.co/learn/ml-for-3d-course/unit2/bonus.md
# Bonus
This unit used multi-view diffusion as an example to get familiar with the model ecosystem at a high level.
However, multi-view diffusion is just one of many available models used in generative 3D tasks.
Due to the rapid pace of progress in generative 3D, I want to emphasize the importance of getting comfortable with the model ecosystem, allowing you to keep up with the latest research and tools.
To do so, here are some exercises to help you get started:
1. **Explore the Model Hub**: Check out the [Model Hub](https://huggingface.co/models) to see what models are available. You can filter by task, framework, and more.
2. **Customize your Demo**: In the hands-on, we created a Gradio demo for multi-view diffusion. Try customizing it by, for example, adding a [Slider](https://www.gradio.app/docs/gradio/slider) to control the `elevation` parameter.
3. **Create your own Model**: If you aren't familiar with machine learning concepts, follow the [NLP Course](https://huggingface.co/learn/nlp-course/chapter1/1). Even if you aren't interested in NLP, this course provides an in-depth introduction to machine learning concepts.
In the next unit, we'll be diving into the specifics of Gaussian Splatting, an ML-friendly 3D representation and recent hot topic in 3D research.
### Introduction
https://huggingface.co/learn/ml-for-3d-course/unit2/introduction.md
# Introduction
## Multi-view diffusion
The first part of the pipeline.
Not all generative 3D pipelines use this, and it's more related to diffusion than to 3D. And [there's already a course on that](https://huggingface.co/learn/diffusion-course/unit0/1).
In this section, I'll keep it high-level, focusing on tools and ecosystem, so you can set up your own multi-view diffusion demo.
### Hands-on (Part 1)
https://huggingface.co/learn/ml-for-3d-course/unit2/hands-on-1.md
# Hands-on (Part 1)
Time to host your own demo! In this portion, you will:
1. Create a model on Hugging Face.
2. Upload the necessary files to the model repository.
## Create a Model
Start by going to [huggingface.co](https://huggingface.co) and logging in or creating an account.
Then, click on `New` -> `Model` in the top left corner. Enter a model name, then click `Create model`.
Your model will be created at the URL `https://huggingface.co/{username}/{model_name}`.
## Access Tokens
For security reasons, you'll need to create an access token to upload files to your model.
Go to [huggingface.co/settings/tokens](https://huggingface.co/settings/tokens) and create a new access token with `write` access.
When git asks for your username and password, use your username and the access token as the password.
## Upload Files
Download [git](https://git-scm.com/downloads) if you don't have it.
Open a terminal. Then, clone my existing [multi-view-diffusion](https://huggingface.co/dylanebert/multi-view-diffusion) model repository:
```bash
git clone https://huggingface.co/dylanebert/multi-view-diffusion
cd multi-view-diffusion
```
Then, upload these files to your model repository:
```bash
git remote set-url origin https://huggingface.co/{username}/{model_name}
git push
```
Replacing `{username}` and `{model_name}` with your Hugging Face username and your model name.
Congratulations! If everything worked correctly, you should see your files at `https://huggingface.co/{username}/{model_name}`.
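If you prefer a Python API over git, the `huggingface_hub` library can push the same files. A minimal sketch, assuming you've already logged in with `huggingface-cli login` and replace the placeholder repo ID with your own:
```python
from huggingface_hub import HfApi

api = HfApi()

# Upload everything in the local clone to your own model repository
api.upload_folder(
    folder_path="multi-view-diffusion",   # the folder cloned above
    repo_id="{username}/{model_name}",    # replace with your repo
    repo_type="model",
)
```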
### Pipeline
https://huggingface.co/learn/ml-for-3d-course/unit2/pipeline.md
# Pipeline
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/dylanebert/ml-for-3d-course-notebooks/blob/main/multi_view_diffusion.ipynb)
In our case, we'll be using a pretrained pipeline:
```python
import torch
from diffusers import DiffusionPipeline
multi_view_diffusion_pipeline = DiffusionPipeline.from_pretrained(
    "dylanebert/multi-view-diffusion",
    custom_pipeline="dylanebert/multi-view-diffusion",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")
```
The name of the model is [dylanebert/multi-view-diffusion](https://huggingface.co/dylanebert/multi-view-diffusion), a mirror of [ashawkey/mvdream-sd2.1-diffusers](https://huggingface.co/ashawkey/mvdream-sd2.1-diffusers). For any pretrained model, you can find the model card on the Hugging Face Hub at `https://huggingface.co/{model_name}` (e.g. `https://huggingface.co/dylanebert/multi-view-diffusion`), which contains information about the model.
In our case, we also need to load the custom pipeline (also at `dylanebert/multi-view-diffusion`) to use the model. This is because diffusers doesn't officially support 3D. So, for the purposes of this course, I've wrapped the model in a custom pipeline that allows you to use it for 3D tasks.
## Load an Image
```python
import requests
from PIL import Image
from io import BytesIO
image_url = "https://huggingface.co/datasets/dylanebert/3d-arena/resolve/main/inputs/images/a_cat_statue.jpg"
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))
image
```
![Cat Statue](https://huggingface.co/datasets/dylanebert/3d-arena/resolve/main/inputs/images/a_cat_statue.jpg)
With this code, we load and display the famous [Cat Statue](https://huggingface.co/datasets/dylanebert/3d-arena/resolve/main/inputs/images/a_cat_statue.jpg), used for image-to-3D demos.
## Run the Pipeline
```python
import numpy as np
def create_image_grid(images):
    images = [Image.fromarray((img * 255).astype("uint8")) for img in images]
    width, height = images[0].size
    grid_img = Image.new("RGB", (2 * width, 2 * height))
    grid_img.paste(images[0], (0, 0))
    grid_img.paste(images[1], (width, 0))
    grid_img.paste(images[2], (0, height))
    grid_img.paste(images[3], (width, height))
    return grid_img
image = np.array(image, dtype=np.float32) / 255.0
images = multi_view_diffusion_pipeline("", image, guidance_scale=5, num_inference_steps=30, elevation=0)
create_image_grid(images)
```
Finally, we run the pipeline on the image.
The `create_image_grid` function isn't part of the pipeline. It's just a helper function to display the results in a grid.
To run the pipeline, we simply prepare the image by converting it to a normalized numpy array:
`image = np.array(image, dtype=np.float32) / 255.0`
Then, we pass it to the pipeline:
`images = multi_view_diffusion_pipeline("", image, guidance_scale=5, num_inference_steps=30, elevation=0)`
Where parameters `guidance_scale`, `num_inference_steps`, and `elevation` are specific to the multi-view diffusion model.
![Multi-view Cats](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/multi-view-cats.png)
## Conclusion
Congratulations! You've run a multi-view diffusion pipeline.
Now what about hosting your own demo?
### Hands-on (Part 2)
https://huggingface.co/learn/ml-for-3d-course/unit2/hands-on-2.md
# Hands-on (Part 2)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/dylanebert/ml-for-3d-course-notebooks/blob/main/multi_view_diffusion.ipynb)
Time to host your own demo! In this portion, you will:
1. Re-run the notebook with your own model.
2. Create a demo using Gradio.
3. (Optional) Deploy your demo.
## Re-Run the Notebook
In the notebook, replace the model name with your own model name:
```python
import torch
from diffusers import DiffusionPipeline
multi_view_diffusion_pipeline = DiffusionPipeline.from_pretrained(
    "{username}/{model_name}",
    custom_pipeline="dylanebert/multi-view-diffusion",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")
```
Then, re-run the notebook. You should see the same results as before.
![Multi-view Cats](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/multi-view-cats.png)
## Gradio Demo
Now, let's create a Gradio demo:
```python
import gradio as gr
def run(image):
    image = np.array(image, dtype=np.float32) / 255.0
    images = multi_view_diffusion_pipeline("", image, guidance_scale=5, num_inference_steps=30, elevation=0)
    images = [Image.fromarray((img * 255).astype("uint8")) for img in images]
    width, height = images[0].size
    grid_img = Image.new("RGB", (2 * width, 2 * height))
    grid_img.paste(images[0], (0, 0))
    grid_img.paste(images[1], (width, 0))
    grid_img.paste(images[2], (0, height))
    grid_img.paste(images[3], (width, height))
    return grid_img
demo = gr.Interface(fn=run, inputs="image", outputs="image")
demo.launch()
```
The `run` method combines all the code from earlier in a single function. The `gr.Interface` method then uses this function to create a demo with `image` inputs and `image` outputs.
Congratulations! You've created a Gradio demo for your model.
## (Optional) Deploy Your Demo
You probably want to run your demo outside of Colab.
There are many ways to do this:
### Option 1: Create a Space
Go to [Hugging Face Spaces](https://huggingface.co/spaces) and create a new Space. Choose the `Gradio Space SDK`. Create a new file in the Space called `app.py` and paste the code from the Gradio demo. Copy the demo [requirements.txt](https://huggingface.co/spaces/dylanebert/multi-view-diffusion/raw/main/requirements.txt) into the Space.
For a complete example, check out this [Space](https://huggingface.co/spaces/dylanebert/multi-view-diffusion), then click `Files` in the top right to view the source code.
> Note: This approach requires a GPU to host publicly, which costs money. However, you can run the demo locally for free, following the instructions in [Option 3](#option-3-run-locally).
### Option 2: Gradio Deploy
Gradio makes it easy to deploy your demo to a server using the `gradio deploy` command.
For more details, check out the [Gradio documentation](https://www.gradio.app/guides/sharing-your-app).
### Option 3: Run locally
To run locally, simply copy the code into a Python file and run it on your machine.
The full source file should look like this:
```python
import gradio as gr
import numpy as np
import torch
from diffusers import DiffusionPipeline
from PIL import Image

multi_view_diffusion_pipeline = DiffusionPipeline.from_pretrained(
    "dylanebert/multi-view-diffusion",
    custom_pipeline="dylanebert/multi-view-diffusion",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")

def run(image):
    image = np.array(image, dtype=np.float32) / 255.0
    images = multi_view_diffusion_pipeline(
        "", image, guidance_scale=5, num_inference_steps=30, elevation=0
    )
    images = [Image.fromarray((img * 255).astype("uint8")) for img in images]
    width, height = images[0].size
    grid_img = Image.new("RGB", (2 * width, 2 * height))
    grid_img.paste(images[0], (0, 0))
    grid_img.paste(images[1], (width, 0))
    grid_img.paste(images[2], (0, height))
    grid_img.paste(images[3], (width, height))
    return grid_img
demo = gr.Interface(fn=run, inputs="image", outputs="image")
demo.launch()
```
To set up and run this demo in a virtual Python environment, run the following:
```bash
# Setup
python -m venv venv
source venv/bin/activate
pip install -r https://huggingface.co/spaces/dylanebert/multi-view-diffusion/raw/main/requirements.txt
# Run
python app.py
```
> Note: This was tested using Python 3.10.12 and CUDA 12.1 on an NVIDIA RTX 4090.
### Hands-on
https://huggingface.co/learn/ml-for-3d-course/unit3/hands-on.md
# Hands-on
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/dylanebert/ml-for-3d-course-notebooks/blob/main/gaussian_splatting.ipynb)
The goal of this hands-on is to build a text-to-splat pipeline, using [LGM](https://huggingface.co/spaces/dylanebert/LGM-mini) (Large Gaussian Model) as an example.
This consists of two parts of the generative 3D pipeline:
1. Multi-view Diffusion
2. ML-friendly 3D (Gaussian Splatting)
## Setup
Open the Colab notebook linked above. Click `Runtime` -> `Change runtime type` and select `GPU` as the hardware accelerator.
Then, start by installing the necessary dependencies:
```python
!pip install -r https://huggingface.co/spaces/dylanebert/LGM-mini/raw/main/requirements.txt
!pip install https://huggingface.co/spaces/dylanebert/LGM-mini/resolve/main/wheel/diff_gaussian_rasterization-0.0.0-cp310-cp310-linux_x86_64.whl
```
As before, if the notebook asks you to restart the session, do so, then rerun the code block.
## Load the Models
Just like in the multi-view diffusion notebook, load the pretrained multi-view diffusion model:
```python
import torch
from diffusers import DiffusionPipeline
image_pipeline = DiffusionPipeline.from_pretrained(
    "dylanebert/multi-view-diffusion",
    custom_pipeline="dylanebert/multi-view-diffusion",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")
```
This is because multi-view diffusion is the first step in the LGM pipeline.
Then, load the generative Gaussian Splatting model, the main contribution of LGM:
```python
splat_pipeline = DiffusionPipeline.from_pretrained(
    "dylanebert/LGM",
    custom_pipeline="dylanebert/LGM",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")
```
## Load an Image
As before, load the famous Cat Statue image:
```python
import requests
from PIL import Image
from io import BytesIO
image_url = "https://huggingface.co/datasets/dylanebert/3d-arena/resolve/main/inputs/images/a_cat_statue.jpg"
response = requests.get(image_url)
image = Image.open(BytesIO(response.content))
image
```
## Run the Pipeline
Finally, pass the image through both pipelines. The output will be a matrix of splat data, which can be saved with `splat_pipeline.save_ply()`.
```python
import numpy as np
from google.colab import files
input_image = np.array(image, dtype=np.float32) / 255.
multi_view_images = image_pipeline("", input_image, guidance_scale=5, num_inference_steps=30, elevation=0)
```
![Multi-view Cats](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/multi-view-cats.png)
```python
splat = splat_pipeline(multi_view_images)
output_path = "/tmp/output.ply"
splat_pipeline.save_ply(splat, output_path)
files.download(output_path)
```
This includes `files.download()` to download the file to your local machine when running the notebook in Colab. If you're running the notebook locally, you can remove this line.
Congratulations! You've run the LGM pipeline.
## Gradio Demo
Now, let's create a Gradio demo to run the model end-to-end with an easy-to-use interface:
```python
import gradio as gr
def run(image):
    input_image = image.astype("float32") / 255.0
    images = image_pipeline("", input_image, guidance_scale=5, num_inference_steps=30, elevation=0)
    splat = splat_pipeline(images)
    output_path = "/tmp/output.ply"
    splat_pipeline.save_ply(splat, output_path)
    return output_path
demo = gr.Interface(fn=run, inputs="image", outputs=gr.Model3D())
demo.launch()
```
This will create a Gradio demo that takes an image as input and outputs a 3D splat.
### What is it?
https://huggingface.co/learn/ml-for-3d-course/unit3/what-is-it.md
# What is it?
Gaussian Splatting is a **differentiable rasterization technique**.
## Differentiable Rasterization
In simple terms:
- Differentiable can be thought of as a fancy way to say "AI-compatible"
- Rasterization means taking data and drawing it on the screen
Rasterization is already really common. It usually takes the form of [triangle rasterization](https://en.wikipedia.org/wiki/Rasterisation), where 3D data is converted to 2D pixel data and drawn on the screen. That's how meshes are usually rendered.
![Mesh](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/mesh.png)
However, triangle rasterization isn't very AI-compatible. This is because it includes discrete decisions like:
- Is this pixel inside the triangle?
Neural networks don't like discrete decisions. They want everything to be fuzzy and continuous - or in other words, *differentiable*.
## Gaussian Splatting
Gaussian Splatting is a differentiable rasterization technique. But how does it actually work?
Splats are composed of millions of points, where each point is composed of four parameters:
- **Position**: where it's located (XYZ)
- **Covariance**: how it's stretched (3x3 matrix)
- **Color**: what color it is (RGB)
- **Alpha**: how transparent it is (α)
To rasterize a splat, these points are first projected into 2D and sorted. Then, for every pixel, the contribution of every point is accumulated. Or, in pseudocode:
```python
splat2d = splat.project_and_sort()
for point in splat2d:
    for pixel in image:
        pixel += compute_contribution(point, pixel)
```
The contribution of a point diminishes the further it is from the pixel. The points also need to be sorted, since they are blended back-to-front.
In theory, every point contributes to every pixel, which is very inefficient. However, that's okay, because it's *differentiable*.
In practice, this is optimized with a tile-based rasterization method, as detailed in the [original paper](https://huggingface.co/papers/2308.04079).
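To make the pseudocode above a bit more concrete, here's a hedged sketch of what the per-pixel blending roughly looks like: a 2D Gaussian falloff, weighted by opacity and composited back-to-front. The `mean_2d`, `cov_2d`, `color`, and `alpha` attributes are stand-ins for each projected point's parameters, not an actual API:
```python
import numpy as np

def gaussian_weight(point, pixel_xy):
    # 2D Gaussian falloff: the contribution shrinks with distance from the
    # splat center, shaped by its projected 2x2 covariance
    d = pixel_xy - point.mean_2d
    return np.exp(-0.5 * d @ np.linalg.inv(point.cov_2d) @ d)

def blend_pixel(pixel_color, sorted_points, pixel_xy):
    # Back-to-front "over" compositing of every projected point
    for point in sorted_points:  # farthest first
        w = point.alpha * gaussian_weight(point, pixel_xy)
        pixel_color = (1.0 - w) * pixel_color + w * point.color
    return pixel_color
```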
## Inference
If you're not training a model, then it doesn't matter if it's differentiable. You can just treat each point as an instanced quad, as in open-source web viewers like [gsplat.js](https://github.com/huggingface/gsplat.js).
This can be seen in action [here](https://huggingface.co/spaces/dylanebert/igf).
## Training
The [original paper](https://huggingface.co/papers/2308.04079) initializes the points using [Structure-from-Motion](https://en.wikipedia.org/wiki/Structure_from_motion), a traditional algorithm for 3D reconstruction.
![Structure from Motion](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/124_ml-for-games/gaussian/points.png)
These points are then rasterized using the tile-based method, and the loss is computed by comparing the rasterized image to the ground truth. Gradient descent is applied to adjust the point parameters (position, covariance, color, alpha).
![Trained](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/124_ml-for-games/gaussian/ellipsoids.png)
The original paper also uses automated densification and pruning to automatically add and remove points as needed. More details can be found [here](https://huggingface.co/blog/gaussian-splatting).
![Final](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/blog/124_ml-for-games/gaussian/bicycle.png)
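Putting the training procedure together, here's a minimal, hedged sketch of the optimization loop. The `rasterize` and `sample_training_view` helpers are hypothetical stand-ins, and the real method adds a tile-based CUDA rasterizer, a D-SSIM term in the loss, and the densification/pruning described above:
```python
import torch

num_points = 100_000   # example size; the real method grows/prunes this over time
num_steps = 30_000     # example iteration count

# Point parameters, all optimized directly by gradient descent
positions = torch.randn(num_points, 3, requires_grad=True)     # XYZ
covariances = torch.randn(num_points, 6, requires_grad=True)   # compact covariance
colors = torch.rand(num_points, 3, requires_grad=True)         # RGB
alphas = torch.rand(num_points, 1, requires_grad=True)         # opacity

optimizer = torch.optim.Adam([positions, covariances, colors, alphas], lr=1e-3)

for step in range(num_steps):
    camera, ground_truth = sample_training_view()   # hypothetical: a posed training photo
    rendered = rasterize(positions, covariances, colors, alphas, camera)  # hypothetical differentiable rasterizer
    loss = torch.nn.functional.l1_loss(rendered, ground_truth)
    loss.backward()   # gradients flow through the differentiable rasterizer
    optimizer.step()
    optimizer.zero_grad()
```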
## Generative 3D
The original approach is suitable for learning individual scenes from photos. However, the concept of differentiable rasterization generalizes to more complex models like neural networks.
This is the case with generative 3D models like [LGM](https://huggingface.co/spaces/dylanebert/LGM-mini), which we'll be using in the next section to build our own generative 3D demo.
### Bonus
https://huggingface.co/learn/ml-for-3d-course/unit3/bonus.md
# Bonus
This unit took a closer look at Gaussian Splatting, a differentiable rasterization technique that also serves as an ML-friendly 3D representation. While this is a special case due to its real-time rendering capabilities, it's one of many representations used in generative 3D research.
Most recently, triplanes have emerged as the latest state-of-the-art in final mesh quality, used in pipelines like [InstantMesh](https://huggingface.co/spaces/TencentARC/InstantMesh). You can learn more about them in this [Community Notebook](https://colab.research.google.com/github/FeMa42/OpenLRM/blob/main/Introduction_to_triplanes_colab.ipynb) provided by Damian.
If you're interested in learning more about Gaussian Splatting, here are some pointers:
- [Nerfstudio gsplat](https://github.com/nerfstudio-project/gsplat): An open-source implementation of Gaussian Splatting
- [gsplat.js](https://github.com/huggingface/gsplat.js): An open-source JavaScript library for splat rendering
- [UnityGaussianSplatting](https://github.com/aras-p/UnityGaussianSplatting): A Unity implementation of Gaussian Splatting (don't be fooled by it being labeled a "toy" - it's a powerful tool!)
- In the capstone of this course, you'll have the option of targeting meshes or Gaussian Splatting as the final output of your generative model
In the next unit, we'll be diving into meshes, the representation used in 3D applications everywhere.
### Introduction
https://huggingface.co/learn/ml-for-3d-course/unit3/introduction.md
# Introduction
## ML-friendly 3D
Let's take a step back and consider the generative 3D pipeline as a whole.
![3D Pipeline](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/3d-pipeline.png)
After multi-view diffusion comes ML-friendly 3D. This is some non-mesh representation of 3D that's easy for AI to handle.
In the current 3D research ecosystem, this can be a lot of things:
- **Gaussian Splatting**: Detailed in this unit
- **Triplanes**: Latest state-of-the-art. Learn more in this [Community Notebook](https://colab.research.google.com/github/FeMa42/OpenLRM/blob/main/Introduction_to_triplanes_colab.ipynb)
- **NeRFs**: Synthesize novel views with a neural network
- and more
This is all changing very rapidly. So, like in the case of multi-view diffusion, ML-friendly 3D can be treated like a black box, using [pre-trained models](https://huggingface.co/models?pipeline_tag=image-to-3d&sort=trending).
## Gaussian Splatting
In this unit, I'll be diving deeper into one of these: Gaussian Splatting.
The reason I'm diving deeper into this is that, unlike the other representations, splats can be [rendered in real-time](https://huggingface.co/spaces/dylanebert/4DGS-demo), making them suitable for end-to-end 3D applications where everything is AI-compatible.
Let's get started!
### Hands-on
https://huggingface.co/learn/ml-for-3d-course/unit4/hands-on.md
# Hands-on
Initially, we planned to walk through the Marching Cubes algorithm and apply it to the [LGM Demo](https://huggingface.co/spaces/dylanebert/LGM-mini). However, recent advancements in mesh generation have made this approach less relevant.
While a deep dive into the methods behind [MeshAnything](https://huggingface.co/spaces/Yiwen-ntu/MeshAnything) would be much more pertinent, its newness and [non-commercial license](https://github.com/buaacyw/MeshAnything/blob/main/LICENSE.txt) make it suboptimal for the time being.
Instead, here are some resources based on your goals:
- [Splat to Mesh](https://huggingface.co/spaces/dylanebert/splat-to-mesh): If you followed along with the LGM-based activities and want to produce the final mesh, this open-source demo is based on the original [LGM](https://github.com/3DTopia/LGM) codebase. Note that this method is slow and resource-intensive.
- [InstantMesh](https://huggingface.co/spaces/TencentARC/InstantMesh): This fast, state-of-the-art approach uses FlexiCubes to produce the final mesh. It currently ranks toward the top of the [3D Arena](https://huggingface.co/spaces/dylanebert/3d-arena) leaderboard.
- [meshgpt-pytorch](https://github.com/lucidrains/meshgpt-pytorch): This open-source reimplementation of [MeshGPT](https://huggingface.co/papers/2311.15475) provides a good starting point for open-source differentiable mesh generation. [MeshAnything](https://huggingface.co/papers/2406.10163) builds upon MeshGPT. Note: This implementation only provides the architecture, not weights.
These resources should help you continue exploring mesh generation and its most recent advancements.
### Marching Cubes
https://huggingface.co/learn/ml-for-3d-course/unit4/marching-cubes.md
# Marching Cubes
Marching Cubes is an algorithm that converts a volumetric representation to a dense mesh.
![Torus Point Cloud](https://huggingface.co/datasets/dylanebert/ml-for-3d-course/resolve/main/torus.gif)
1. **Divide the Space into Voxels:** Split the 3D space into a grid of voxels (cubic cells). The size of each voxel determines the mesh resolution.
2. **Sample the Eight Vertex Positions:** For each voxel, sample the density at the eight vertices (corners). Determine if each vertex is inside or outside the surface based on its density.
3. **Determine the Triangle Configuration:** Each voxel has eight vertices, each with two possible states (inside or outside), yielding 256 possible configurations. Each configuration corresponds to a specific triangulation pattern.
4. **Generate the Mesh:** March through each voxel, emitting the triangles for its configuration, to produce the final surface.
Let's walk through these steps in more detail.
### 1. Divide the Space into Voxels
The first step is to divide the 3D space into a grid of voxels. The size of each voxel will determine the resolution of the mesh.
### 2. Sample the Eight Vertex Positions
For each voxel, the algorithm samples the density at the eight vertices. Depending on the density, each vertex is classified as either `inside` or `outside` the surface.
### 3. Determine the Triangle Configuration
Each voxel's eight vertices can be in two possible states, resulting in $2^8 = 256$ possible configurations. Each configuration corresponds to a specific triangulation pattern.
![Marching Cubes Lookup](https://huggingface.co/datasets/dylanebert/ml-for-3d-course/resolve/main/MarchingCubesCases.png)
### 4. Generate the Mesh
To generate the final mesh, the algorithm "marches" through each voxel and applies the corresponding triangle configuration, hence the name "Marching Cubes".
![Marching Cubes Head](https://huggingface.co/datasets/dylanebert/ml-for-3d-course/resolve/main/Marchingcubes-head.png)
This process produces a dense, rough mesh that approximates the surface of the volume.
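If you want to try the algorithm yourself, libraries such as scikit-image ship a ready-made implementation. Here's a small sketch (assuming `scikit-image` is installed) that extracts a sphere's surface from a grid of density values:
```python
import numpy as np
from skimage import measure

# Build a 64x64x64 voxel grid containing a signed distance field for a sphere
x, y, z = np.mgrid[-1:1:64j, -1:1:64j, -1:1:64j]
density = np.sqrt(x**2 + y**2 + z**2) - 0.5   # negative inside, positive outside

# March through the voxels, extracting triangles where the density crosses zero
vertices, faces, normals, values = measure.marching_cubes(density, level=0.0)

print(vertices.shape, faces.shape)   # a dense triangle mesh
```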
## Limitations
While useful for visualization, Marching Cubes meshes are mostly unsuitable for production applications like games due to several limiting factors:
1. **Polygon Count:** High polygon meshes require more computation for rendering, which can significantly impact performance in real-time applications.
2. **Edge Flow:** Poor edge flow affects how the mesh deforms when animated, resulting in undesirable artifacts like creasing and pinching.
3. **Texturing:** The dense and irregular topology complicates UV mapping and texturing, resulting in texture artifacts.
![Mesh Topology](https://huggingface.co/datasets/dylanebert/ml-for-3d-course/resolve/main/topology.jpg)
### Improvements
Techniques like [FlexiCubes](https://research.nvidia.com/labs/toronto-ai/flexicubes/) address some limitations by allowing the mesh vertices to move, creating smoother surfaces. This approach is used by [InstantMesh](https://huggingface.co/spaces/TencentARC/InstantMesh), the current [leading](https://huggingface.co/spaces/dylanebert/3d-arena) open-source 3D pipeline. However, the resulting meshes remain overly dense and impractical for production.
## In Practice
Cleaning up the topology of Marching Cubes output often requires more time and effort than creating a mesh from scratch. This creates a major bottleneck for ML for 3D applications. Gaussian Splatting, as discussed earlier, offers a potential solution to this bottleneck.
However, recent work has emerged that directly addresses this bottleneck, using differentiable techniques to produce low-poly meshes with higher-quality topology. This will be covered in the next section.
### Introduction
https://huggingface.co/learn/ml-for-3d-course/unit4/introduction.md
# Introduction
## Meshes
Let's revisit the generative 3D pipeline.
![3D Pipeline](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/3d-pipeline.png)
Earlier in the course, we discussed the limitations of the mesh side of the pipeline, particularly with Marching Cubes, which produces dense and rough meshes unsuitable for production.
![Marching Cubes Head](https://huggingface.co/datasets/dylanebert/ml-for-3d-course/resolve/main/Marchingcubes-head.png)
This is no longer the full story.
Recent advancements have emerged, allowing the conversion of dense meshes to low-poly meshes differentiably.
![MeshAnything Demo](https://huggingface.co/datasets/dylanebert/ml-for-3d-course/resolve/main/meshanything-demo.gif)
In this unit of the course, we'll cover:
- Marching Cubes
- Mesh Generation
Let's get started!
### Mesh Generation
https://huggingface.co/learn/ml-for-3d-course/unit4/mesh-generation.md
# Mesh Generation
New solutions have emerged that address the limitations of Marching Cubes, allowing the conversion of dense meshes to low-poly meshes.
Earlier in the course, we highlighted the significance of **differentiability** and how mesh rendering is non-differentiable, involving discrete decisions like:
- Is this pixel inside the triangle?
While this remains true, new research introduces a differentiable approach to mesh generation by treating mesh triangles as discrete symbols, similar to words in a language model.
## MeshAnything
[MeshAnything](https://huggingface.co/spaces/Yiwen-ntu/MeshAnything) is a recent research project that converts dense meshes to low-poly meshes using techniques introduced in [MeshGPT](https://huggingface.co/papers/2311.15475).
The main components of MeshAnything are:
1. **VQ-VAE Encoder:** Encodes dense 3D data to a discrete latent representation using a Vector Quantization (VQ) Variational Autoencoder (VAE).
2. **Autoregressive Transformer Decoder:** Generates the triangles of the mesh using an autoregressive transformer decoder.
More details can be found in the paper [MeshAnything: Artist-Created Mesh Generation with Autoregressive Transformers](https://huggingface.co/papers/2406.10163).
![MeshAnything Demo](https://huggingface.co/datasets/dylanebert/ml-for-3d-course/resolve/main/meshanything-demo.gif)
## Implications
MeshAnything represents a major step in 3D mesh generation, addressing a major bottleneck toward practical generative 3D tools.
However, the current results are comparable to or worse than traditional topology reduction methods like [Decimate](https://docs.blender.org/manual/en/latest/modeling/modifiers/generate/decimate.html), still requiring significant manual refinement.
Differentiable mesh generation opens the door to context-aware topology reduction, i.e. accounting for shape and deformation. While still a work in progress, solving this will lead to highly practical 3D tools.
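For comparison, the traditional Decimate baseline mentioned above takes only a few lines in Blender's Python API. This is a rough sketch, not part of any generative pipeline; `bpy` is only available inside Blender, and the ratio is an arbitrary example value:
```python
import bpy

obj = bpy.context.active_object   # the dense mesh to reduce

# Add a Decimate modifier and collapse to ~10% of the original triangle count
mod = obj.modifiers.new(name="Decimate", type='DECIMATE')
mod.ratio = 0.1   # arbitrary example target

bpy.ops.object.modifier_apply(modifier="Decimate")
```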
### Run locally
https://huggingface.co/learn/ml-for-3d-course/unit5/run-locally.md
# Run locally
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/dylanebert/ml-for-3d-course-notebooks/blob/main/capstone.ipynb)
The instructions below were tested on an RTX 4090 on WSL2 Ubuntu 22.04. Depending on your setup, the steps may differ or may not work.
1. Install `git`, `python 3.10`, and `cuda` if not already installed.
2. Open your terminal.
3. Clone your space repository, replacing the URL below with your space URL.
```bash
git clone https://huggingface.co/spaces/dylanebert/LGM-tiny
```
4. Navigate into the space folder.
```bash
cd LGM-tiny
```
5. Create a virtual environment and install necessary dependencies.
```bash
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```
6. Run the demo.
```bash
python app.py
```
### Walkthrough
https://huggingface.co/learn/ml-for-3d-course/unit5/walkthrough.md
# Walkthrough
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/dylanebert/ml-for-3d-course-notebooks/blob/main/capstone.ipynb)
This section will walk you step-by-step through the easiest way to set up a demo based on [LGM](https://huggingface.co/spaces/ashawkey/LGM), as covered in previous units.
## 1. Duplicate the model
Start by duplicating the [LGM-full](https://huggingface.co/dylanebert/LGM-full) model.
1. Go to the [Repo duplicator](https://huggingface.co/spaces/huggingface-projects/repo_duplicator).
2. Generate a `write token` in your [account settings](https://hf.co/settings/tokens). Copy it into the `token` field.
3. Enter `dylanebert/LGM-full` in the `source_repo` field.
4. Enter `{username}/{model_name}` in the `dst_repo` field, replacing `{username}` with your Hugging Face account username, and `{model_name}` with any name you like, such as `LGM`.
5. Click Submit.
You will see a link labeled "find your repo **here**", which leads to `https://huggingface.co/{username}/{model_name}`.
![Repo duplicator](https://huggingface.co/datasets/dylanebert/ml-for-3d-course/resolve/main/duplicate.png)
Congratulations! You can now use this model with Diffusers, and it will appear in [hf.co/models](https://huggingface.co/models?pipeline_tag=image-to-3d&sort=trending).
## 2. Duplicate the space
Go to the [LGM Tiny](https://huggingface.co/spaces/dylanebert/LGM-tiny) space, which provides a simplified image-to-3D demo.
1. Click the `Duplicate Space` button.
2. Choose free hardware, or ZeroGPU Nvidia A100 if available.
3. Click `Files` in the top right.
4. Click `app.py` to view the demo source code.
5. Click `edit` to change the code.
6. Replace the two instances of `dylanebert/LGM-full` with your model path, e.g. `{username}/{model_name}`.
7. Click `Commit changes`.
Congratulations! You've created a demo and met the minimum requirements for this capstone project.
## So how do I run it?
The demo requires a GPU, so it won't work on free hardware. However, there are many free options:
1. **Run in this notebook**: Validate the code quickly.
2. **Run locally**: Clone your space and run it locally.
3. **Community grant**: Building something cool? Apply for a community GPU grant in your space settings.
4. **Run via API**: Less flexible, but runs on free hardware.
The following sections will walk you through each of these options.
### Run in notebook
https://huggingface.co/learn/ml-for-3d-course/unit5/run-in-notebook.md
# Run in notebook
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/dylanebert/ml-for-3d-course-notebooks/blob/main/capstone.ipynb)
The first option is to run in a Colab notebook. This is the easiest way to validate the code quickly.
## Setup
1. Click the "Open in Colab" button above.
2. Change the runtime type to GPU.
3. Scroll down to the `Run in this notebook` section.
## Run the demo
Start by installing dependencies.
```bash
!pip install -r https://huggingface.co/spaces/dylanebert/LGM-tiny/raw/main/requirements.txt
```
Then, run the demo code. This is exactly the same as in the space `app.py`. To ensure your model is working as expected, replace both instances of `dylanebert/LGM-full` with your `{username}/{model_name}`. Then, run the code.
```python
import shlex
import subprocess
import gradio as gr
import numpy as np
import torch
from diffusers import DiffusionPipeline

subprocess.run(
    shlex.split(
        "pip install https://huggingface.co/spaces/dylanebert/LGM-mini/resolve/main/wheel/diff_gaussian_rasterization-0.0.0-cp310-cp310-linux_x86_64.whl"
    )
)

pipeline = DiffusionPipeline.from_pretrained(
    "dylanebert/LGM-full",
    custom_pipeline="dylanebert/LGM-full",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")

def run(image):
    input_image = np.array(image, dtype=np.float32) / 255.0
    splat = pipeline(
        "", input_image, guidance_scale=5, num_inference_steps=30, elevation=0
    )
    splat_file = "/tmp/output.ply"
    pipeline.save_ply(splat, splat_file)
    return splat_file

demo = gr.Interface(
    fn=run,
    title="LGM Tiny",
    description="An extremely simplified version of [LGM](https://huggingface.co/ashawkey/LGM). Intended as resource for the [ML for 3D Course](https://huggingface.co/learn/ml-for-3d-course/unit0/introduction).",
    inputs="image",
    outputs=gr.Model3D(),
    examples=[
        "https://huggingface.co/datasets/dylanebert/iso3d/resolve/main/jpg@512/a_cat_statue.jpg"
    ],
    cache_examples=True,
    allow_duplication=True,
)
demo.queue().launch()
```
## Demo breakdown
Let's break down the demo code.
### Import dependencies
Import the required libraries.
```python
import shlex
import subprocess
import gradio as gr
import numpy as np
import spaces
import torch
from diffusers import DiffusionPipeline
```
### Install diff-gaussian-rasterization
For the gaussian splatting step of LGM, we need to install a custom wheel. This is a workaround for the space to run on [ZeroGPU](https://huggingface.co/zero-gpu-explorers).
```python
subprocess.run(
    shlex.split(
        "pip install https://huggingface.co/spaces/dylanebert/LGM-mini/resolve/main/wheel/diff_gaussian_rasterization-0.0.0-cp310-cp310-linux_x86_64.whl"
    )
)
```
### Construct the pipeline
Construct the [LGM](https://huggingface.co/dylanebert/LGM-full) pipeline. Replace `dylanebert/LGM-full` with your `{username}/{model_name}`.
```python
pipeline = DiffusionPipeline.from_pretrained(
    "dylanebert/LGM-full",
    custom_pipeline="dylanebert/LGM-full",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).to("cuda")
```
### Define the run function
Define the run function that takes an image and returns a ply file.
1. Convert the image to a numpy array and normalize it to [0, 1].
2. Run the pipeline with the default parameters.
3. Save the ply file to `/tmp/output.ply`.
4. Return the ply file.
```python
@spaces.GPU
def run(image):
    input_image = np.array(image, dtype=np.float32) / 255.0
    splat = pipeline(
        "", input_image, guidance_scale=5, num_inference_steps=30, elevation=0
    )
    splat_file = "/tmp/output.ply"
    pipeline.save_ply(splat, splat_file)
    return splat_file
```
### Create the demo
Create the demo using [Gradio](https://www.gradio.app/guides/quickstart), which handles the UI for us.
```python
demo = gr.Interface(
    fn=run,
    title="LGM Tiny",
    description="An extremely simplified version of [LGM](https://huggingface.co/ashawkey/LGM). Intended as resource for the [ML for 3D Course](https://huggingface.co/learn/ml-for-3d-course/unit0/introduction).",
    inputs="image",
    outputs=gr.Model3D(),
    examples=[
        "https://huggingface.co/datasets/dylanebert/iso3d/resolve/main/jpg@512/a_cat_statue.jpg"
    ],
    cache_examples=True,
    allow_duplication=True,
)
demo.queue().launch()
```
![LGM Tiny Demo](https://huggingface.co/datasets/dylanebert/ml-for-3d-course/resolve/main/lgm-tiny.png)
### Introduction
https://huggingface.co/learn/ml-for-3d-course/unit5/introduction.md
# Introduction
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/dylanebert/ml-for-3d-course-notebooks/blob/main/capstone.ipynb)
Welcome to the final capstone project of the course.
To complete the course and receive a certificate, you will create and host your own Generative 3D model and demo.
## Requirements
- **Model**: A model hosted under your username, e.g. `hf.co/{username}/{model_name}`.
- Ensure there is a [model card](https://huggingface.co/docs/hub/en/model-cards).
  - Tag the model with the `image-to-3d` pipeline tag (one programmatic way to do this is sketched after this list).
- Apply the correct license if applicable.
- **Space**: A space hosted under your username, e.g. `hf.co/spaces/{username}/{space_name}`.
- Ensure the space converts images (`.png`, `.jpg`) to 3D (`.glb`, `.obj`, `.ply`, `.splat`).
- Reference your model URL in the space README.
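For the pipeline tag and license requirements above, one programmatic option is the `huggingface_hub` metadata helper, sketched below. It assumes you're logged in with a write token; the repo ID and license are placeholders to replace with your own:
```python
from huggingface_hub import metadata_update

# Adds/updates the YAML metadata in your model card
metadata_update(
    repo_id="{username}/{model_name}",   # replace with your model repo
    metadata={"pipeline_tag": "image-to-3d", "license": "mit"},  # example license
    overwrite=True,
)
```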
## What's expected
This capstone project is very open-ended. You can:
- Build something from scratch.
- Train or fine-tune an existing open-source model.
- Clone an existing open-source model and change the demo experience.
- Directly clone an open-source model and demo.
Check out [3D Arena](https://huggingface.co/spaces/dylanebert/3d-arena) for the latest image-to-3D demos to use as starting points.
Here is an example [model](https://huggingface.co/dylanebert/LGM-full) and [space](https://huggingface.co/spaces/dylanebert/LGM-tiny).
## Step-by-step
If you aren't sure how to get started, the following sections will guide you through the process.
Otherwise, you can start right away.
## Join the community
Want to ask questions? Share your work? Chat with the community?
[Join the Discord](https://hf.co/join/discord)!
### Run via API
https://huggingface.co/learn/ml-for-3d-course/unit5/run-via-api.md
# Run via API
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://githubtocolab.com/dylanebert/ml-for-3d-course-notebooks/blob/main/capstone.ipynb)
To run via API, instead of duplicating the [LGM-tiny](https://huggingface.co/spaces/dylanebert/LGM-tiny) space, duplicate the [LGM-tiny-api](https://huggingface.co/spaces/dylanebert/LGM-tiny-api) space. This contains the following `app.py`.
```python
import gradio as gr
from gradio_client import Client, file
def run(image_url):
    client = Client("dylanebert/LGM-tiny")
    image = file(image_url)
    result = client.predict(image, api_name="/predict")
    return result

demo = gr.Interface(
    fn=run,
    title="LGM Tiny API",
    description="An API wrapper for [LGM Tiny](https://huggingface.co/spaces/dylanebert/LGM-tiny). Intended as a resource for the [ML for 3D Course](https://huggingface.co/learn/ml-for-3d-course).",
    inputs=gr.Textbox(label="Image URL", placeholder="Enter image URL, e.g. https://huggingface.co/datasets/dylanebert/iso3d/resolve/main/jpg@512/a_cat_statue.jpg"),
    outputs=gr.Model3D(),
    examples=[
        "https://huggingface.co/datasets/dylanebert/iso3d/resolve/main/jpg@512/a_cat_statue.jpg"
    ],
    allow_duplication=True,
)
demo.queue().launch()
```
This will work on CPU, but relies on the original LGM-tiny instead of your custom model. However, if your focus is on UI/UX or downstream tasks, this may be acceptable.
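You can also call the same API from a plain Python script, without wrapping it in Gradio at all. A small sketch using `gradio_client` (point it at your own duplicated space if you have one):
```python
from gradio_client import Client, file

# The space to call: your own duplicate, or the original LGM-tiny
client = Client("dylanebert/LGM-tiny")

# Send an example image and get back the local path of the generated .ply file
result = client.predict(
    file("https://huggingface.co/datasets/dylanebert/iso3d/resolve/main/jpg@512/a_cat_statue.jpg"),
    api_name="/predict",
)
print(result)
```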
### Conclusion
https://huggingface.co/learn/ml-for-3d-course/conclusion/conclusion.md
# Conclusion
Thank you for participating in the Machine Learning for 3D Course!
This has been a high-level overview of what's going on at the intersection of Machine Learning and 3D. For further exploration:
- Check out the [models](https://huggingface.co/models?pipeline_tag=image-to-3d&sort=trending) page for the latest image-to-3D models.
- Keep up with [3D Arena](https://huggingface.co/spaces/dylanebert/3d-arena) for the latest state-of-the-art.
- Explore open-source projects like [InstantMesh](https://github.com/TencentARC/InstantMesh).
![3D Arena](https://huggingface.co/datasets/dylanebert/ml-for-3d-course/resolve/main/3d-arena.png)
As the field evolves rapidly, it's easy to get overwhelmed. Stay tuned for more accessible tools and resources as these projects mature.
To receive updates about future course releases, sign up for the course mailing list [here](https://mailchi.mp/911880bcff7d/ml-for-3d-course).
To get involved in the community, [join the Discord](https://hf.co/join/discord).
### Non-meshes
https://huggingface.co/learn/ml-for-3d-course/unit1/non-meshes.md
# Non-meshes
While nearly all 3D is represented as [meshes](meshes) in real-world applications today, 3D machine learning research often uses non-mesh representations, which are later converted to meshes.
These non-mesh representations may be things like:
1. Triplanes, such as in [InstantMesh](https://huggingface.co/TencentARC/InstantMesh).
2. [NeRFs](https://en.wikipedia.org/wiki/Neural_radiance_field), such as in [NeRFiller](https://huggingface.co/papers/2312.04560)
3. [Splats](https://en.wikipedia.org/wiki/Gaussian_splatting), such as in [LGM](https://huggingface.co/ashawkey/LGM).
These approaches are constantly evolving and may even have changed by the time you're reading this.
Fortunately, in most cases, this can be treated as a black box. You don't need to understand the details of these non-mesh representations to use them in your work.
There is, however, one representation that stands out.
## Gaussian Splatting
A special case of non-mesh representation is splats, or [Gaussian Splatting](https://en.wikipedia.org/wiki/Gaussian_splatting).
This is because splats can be rendered in real-time, unlike the other non-mesh representations. They are also capable of features like [animation](https://huggingface.co/spaces/dylanebert/4DGS-demo), [physics (hybrid)](https://x.com/HugoDuprez/status/1766019907769000229), and [lighting](https://x.com/Ruben_Fro/status/1719996105675698452).
This means that theoretically, splats could replace meshes in real-world applications. However, the entire real-world 3D ecosystem is built around meshes, so it's unlikely that splats will replace them. They are more likely to have a role in the 3D ecosystem alongside meshes, especially for anticipated applications like real-time generative 3D.
## In this course
We'll be covering both meshes and Gaussian splatting.
While current [state-of-the-art](https://huggingface.co/TencentARC/InstantMesh) uses triplanes, we won't dive deep into these specifics in this course since they are constantly evolving.
Instead, we will focus on the building blocks of 3D machine learning research. Then, we'll dive deeper into Gaussian Splatting and meshes, since they can be used in real-world applications today.
### Generative 3D pipelines
https://huggingface.co/learn/ml-for-3d-course/unit1/pipelines.md
# Generative 3D pipelines
Let's take a step back and look at the generative 3D pipelines as a whole.
![3D Pipeline](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/3d-pipeline.png)
In Step 2, there is some [non-mesh](non-meshes) representation, labeled "ML-friendly 3D", which is converted to a mesh (Step 3) with Marching Cubes.
Before ML-friendly 3D, there is often a step called "multi-view diffusion". This is where a [diffusion](https://huggingface.co/docs/diffusers/en/index) model, like [Stable Diffusion](https://huggingface.co/spaces/stabilityai/stable-diffusion), is used to generate novel views of an object - either from source images or from text.
This part of the pipeline is very technical and evolving rapidly, being more related to diffusion than 3D. Therefore, in this course, we'll treat it as a building block, focusing on how you can harness this building block using the Hugging Face ecosystem.
If you want to learn more about the specifics of diffusion models, check out the [Diffusion Course](https://huggingface.co/learn/diffusion-course/en/unit0/1).
## In this course
In this course, each core unit will go over these three building blocks:
1. Multi-view diffusion, with a focus on tools and ecosystem
2. ML-friendly 3D, with a deep dive into Gaussian Splatting
3. Meshes, with a focus on practical applications
Each of these units will also include a hands-on exercise, where you'll get to apply what you've learned in a real-world scenario.
### Introduction
https://huggingface.co/learn/ml-for-3d-course/unit1/introduction.md
# Introduction
## What is 3D?
So what does 3D actually mean?
Turns out, a lot of things. There are many different approaches to representing 3D, as well as directions it could go.
In this unit, I'll talk about:
1. These different representations of 3D
2. How they fit together
3. Where things seem to be going
Starting with, [meshes](meshes)!
### Meshes
https://huggingface.co/learn/ml-for-3d-course/unit1/meshes.md
# Meshes
## What's a mesh?
A [mesh](https://en.wikipedia.org/wiki/Polygon_mesh) is a collection of vertices, edges, and faces that define a 3D object.
This is how nearly all 3D is represented in real-world applications today.
![Mesh](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/ml-for-3d-course/mesh.png)
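To make the definition concrete, here's what a mesh looks like as raw data: a minimal example of a single square built from two triangles (plain numpy arrays, the same structure that libraries like trimesh operate on):
```python
import numpy as np

# Four corner vertices of a unit square in the XY plane (XYZ positions)
vertices = np.array([
    [0.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])

# Two triangular faces, each listing three vertex indices
faces = np.array([
    [0, 1, 2],
    [0, 2, 3],
])
```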
## The problem
Meshes are difficult for machine learning models.
Most 3D research today looks something like this:
1. A machine learning pipeline produces a non-mesh 3D representation
2. This non-mesh representation is converted to a mesh using [Marching Cubes](https://en.wikipedia.org/wiki/Marching_cubes)
While progress has been very rapid in Step 1, Step 2 is relatively unchanged since the 1980s. This creates a gap between 3D research, which produces advancements in Step 1, and 3D applications, which rely on the final output of Step 2.
This is a simplification, but it highlights the challenges in the 3D machine learning landscape broadly.
## The solution
It remains to be seen whether Step 2 will be improved or if the 3D ecosystem will change entirely (or somewhere in between).
In the meantime, it's useful to understand the building blocks of current 3D machine learning research, starting with [non-meshes](non-meshes).
### What's going on?
https://huggingface.co/learn/ml-for-3d-course/unit0/whats-going-on.md
# What's going on?
## Tipping point
In recent years, there has been a tipping point in the domains of language and vision. New models are releasing constantly, such as the recent [Llama 3](https://huggingface.co/blog/llama3).
These have become useful tools for a wide range of tasks.
## What about 3D?
When it comes to 3D, we haven't quite reached this tipping point.
However, we seem to be very close, with new research coming out constantly.
While this is good, it can be very overwhelming to keep up with, especially since there's very little consistency in what 3D even means.
## What is 3D?
New research usually presents 3D results as pre-rendered videos, even for high-quality models like [TripoSR](https://stability.ai/news/triposr-3d-generation).
But how did they get this video?
Is it a mesh? A splat? A NeRF? Multi-view diffusion? What do these things even mean?
In this course, we'll be answering this question, painting a broader picture of the ecosystem as a whole.
### Welcome to the 🤗 Machine Learning for 3D Course [[introduction]]
https://huggingface.co/learn/ml-for-3d-course/unit0/introduction.md
# Welcome to the 🤗 Machine Learning for 3D Course [[introduction]]
## Sign up
To receive updates as the course releases, sign up for the course mailing list [here](https://mailchi.mp/911880bcff7d/ml-for-3d-course).
## Overview
In this course, you'll learn:
1. **What's going on** - the current big picture of machine learning for 3D
2. **Why it matters** - the importance of recent developments
3. **How to do it yourself** - build your own generative 3D demo
## Who am I?
I'm [Dylan Ebert](https://twitter.com/dylan_ebert_), also known as [IndividualKex](https://www.youtube.com/@IndividualKex). I'm a developer advocate at Hugging Face focusing on 3D. I also create educational content related to various topics.
This course is available here, on my [channel](https://www.youtube.com/@IndividualKex), and open source on [GitHub](https://github.com/huggingface/ml-for-3d-course). The content is presented redundantly as video, text, and code - whichever you prefer.
## Discord
Join the [Hugging Face Discord](https://hf.co/join/discord) to ask questions, share your work, and connect with others (in the `#3d` channel).
### How to do it yourself
https://huggingface.co/learn/ml-for-3d-course/unit0/how-to-do-it-yourself.md
# How to do it yourself
Before continuing with this course, there are some things that will be useful to be at least somewhat familiar with.
## Prerequisites
1. [Git](https://git-scm.com/) - the version control system used throughout this course
2. [Python](https://www.python.org/) - the programming language used throughout this course
3. [Blender](https://www.blender.org/) - used very briefly in the last unit. Knowledge of concepts like vertices, faces, and UVs will be useful, but not critical
4. [Hugging Face Hub](https://huggingface.co/) - used to download and share models and datasets
If you aren't familiar with all of these, don't worry - you can still follow along.
## Wrapping up
By the end of this course, you'll have:
1. A better understanding of ML for 3D
2. Built your own generative 3D demo
So let's get started!
Don't forget to [join the Discord](https://hf.co/join/discord).
### Why does it matter?
https://huggingface.co/learn/ml-for-3d-course/unit0/why-does-it-matter.md
# Why does it matter?
## As a tool
3D is everywhere - from games, to movies, to retail.
As we've seen with 2D, machine learning can be an extremely useful tool.
This is likely already pretty obvious.
## For general intelligence
Something that's less obvious is the importance of 3D for general intelligence.
> In order to achieve general intelligence, AI needs to be grounded to the 3D world.
It's a popular claim that we need 3D for the next big leap in AI. However, there is no consensus on _how_ 3D should be represented or what it looks like to _understand_ the 3D world.
This remains to be seen and highlights the importance of research in this area.
## For you
Whether you are:
1. An ML practitioner who wants to learn more about 3D
2. A 3D practitioner who wants to learn more about ML
3. Someone with no idea why you're here
This is the course for you.
