LehongWu commited on
Commit
6cc3d86
·
verified ·
1 Parent(s): f8d9f81

Upload folder using huggingface_hub

Browse files
This view is limited to 50 files because it contains too many changes.   See raw diff
Files changed (50) hide show
  1. .DS_Store +0 -0
  2. .dockerignore +14 -0
  3. .gitattributes +15 -0
  4. .gitignore +3 -0
  5. Dockerfile +33 -0
  6. README.md +30 -6
  7. __pycache__/gen_image_from_prompt.cpython-312.pyc +0 -0
  8. __pycache__/gen_image_prompt_only.cpython-312.pyc +0 -0
  9. __pycache__/gen_image_same_start_end.cpython-312.pyc +0 -0
  10. __pycache__/gen_prompt_only.cpython-312.pyc +0 -0
  11. __pycache__/gen_video_image_start_end.cpython-312.pyc +0 -0
  12. __pycache__/gen_video_prompt_only.cpython-312.pyc +0 -0
  13. __pycache__/generate_video.cpython-312.pyc +0 -0
  14. assets/example_1_prompt_to_image/output_a.png +3 -0
  15. assets/example_1_prompt_to_image/output_b.png +3 -0
  16. assets/example_2_image_to_image/input.png +3 -0
  17. assets/example_2_image_to_image/output.png +3 -0
  18. assets/example_3a_loop_video/first_last_frame.png +3 -0
  19. assets/example_3a_loop_video/output.mp4 +3 -0
  20. assets/example_3b_loop_video/first_last_frame.png +3 -0
  21. assets/example_3b_loop_video/output.mp4 +3 -0
  22. assets/example_4_super_res/input.png +0 -0
  23. assets/example_4_super_res/output_4k.png +3 -0
  24. assets/example_5_video_extension/output_a.mp4 +3 -0
  25. assets/example_5_video_extension/output_b.mp4 +3 -0
  26. docs/README.md +8 -0
  27. docs/SPEC_WEB_UI.md +113 -0
  28. docs/WEB_DEV_GUIDE.md +160 -0
  29. gen_image_image_cond.py +201 -0
  30. gen_image_prompt_only.py +165 -0
  31. gen_lyrics_batch.py +156 -0
  32. gen_video_image_start_end.py +194 -0
  33. gen_video_prompt_only.py +155 -0
  34. gen_video_prompt_only_extend.py +281 -0
  35. generate_lyrics.sh +46 -0
  36. generate_lyrics_batch.sh +48 -0
  37. image_super_resolution.sh +45 -0
  38. run_gen_image_image_cond.sh +42 -0
  39. run_gen_image_prompt_only.sh +36 -0
  40. run_gen_video_image_start_end.sh +30 -0
  41. run_gen_video_image_start_end_diff.sh +38 -0
  42. run_gen_video_prompt_only.sh +35 -0
  43. run_gen_video_prompt_only_extend.sh +49 -0
  44. run_gen_video_prompt_only_extend_2.sh +48 -0
  45. web/__init__.py +1 -0
  46. web/__pycache__/__init__.cpython-312.pyc +0 -0
  47. web/backend/__init__.py +1 -0
  48. web/backend/__pycache__/__init__.cpython-312.pyc +0 -0
  49. web/backend/__pycache__/config.cpython-312.pyc +0 -0
  50. web/backend/__pycache__/deps.cpython-312.pyc +0 -0
.DS_Store ADDED
Binary file (8.2 kB). View file
 
.dockerignore ADDED
@@ -0,0 +1,14 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ .git
2
+ .gitignore
3
+ **/__pycache__
4
+ **/*.py[cod]
5
+ .venv
6
+ venv
7
+ .env
8
+ .env.*
9
+
10
+ # Rebuilt inside the image; omit host bundle from context
11
+ web/backend/static
12
+
13
+ web/frontend/node_modules
14
+ **/node_modules
.gitattributes CHANGED
@@ -33,3 +33,18 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
33
  *.zip filter=lfs diff=lfs merge=lfs -text
34
  *.zst filter=lfs diff=lfs merge=lfs -text
35
  *tfevents* filter=lfs diff=lfs merge=lfs -text
36
+ assets/example_1_prompt_to_image/output_a.png filter=lfs diff=lfs merge=lfs -text
37
+ assets/example_1_prompt_to_image/output_b.png filter=lfs diff=lfs merge=lfs -text
38
+ assets/example_2_image_to_image/input.png filter=lfs diff=lfs merge=lfs -text
39
+ assets/example_2_image_to_image/output.png filter=lfs diff=lfs merge=lfs -text
40
+ assets/example_3a_loop_video/first_last_frame.png filter=lfs diff=lfs merge=lfs -text
41
+ assets/example_3a_loop_video/output.mp4 filter=lfs diff=lfs merge=lfs -text
42
+ assets/example_3b_loop_video/first_last_frame.png filter=lfs diff=lfs merge=lfs -text
43
+ assets/example_3b_loop_video/output.mp4 filter=lfs diff=lfs merge=lfs -text
44
+ assets/example_4_super_res/output_4k.png filter=lfs diff=lfs merge=lfs -text
45
+ assets/example_5_video_extension/output_a.mp4 filter=lfs diff=lfs merge=lfs -text
46
+ assets/example_5_video_extension/output_b.mp4 filter=lfs diff=lfs merge=lfs -text
47
+ web/frontend/node_modules/@esbuild/darwin-arm64/bin/esbuild filter=lfs diff=lfs merge=lfs -text
48
+ web/frontend/node_modules/@rollup/rollup-darwin-arm64/rollup.darwin-arm64.node filter=lfs diff=lfs merge=lfs -text
49
+ web/frontend/node_modules/esbuild/bin/esbuild filter=lfs diff=lfs merge=lfs -text
50
+ web/frontend/node_modules/fsevents/fsevents.node filter=lfs diff=lfs merge=lfs -text
.gitignore ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ examples
2
+ output
3
+ .venv
Dockerfile ADDED
@@ -0,0 +1,33 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # syntax=docker/dockerfile:1
2
+ # Hugging Face Spaces: sdk: docker, default port 7860 (override with PORT).
3
+ # Build from repo root (directory that contains web/ and assets/).
4
+
5
+ FROM node:20-bookworm-slim AS frontend-build
6
+ WORKDIR /app/web/frontend
7
+ COPY web/frontend/package.json web/frontend/package-lock.json ./
8
+ RUN npm ci
9
+ COPY web/frontend/ ./
10
+ RUN npm run build
11
+
12
+ FROM python:3.12-slim-bookworm
13
+ WORKDIR /app
14
+
15
+ ENV PYTHONDONTWRITEBYTECODE=1 \
16
+ PYTHONUNBUFFERED=1 \
17
+ PYTHONPATH=/app
18
+
19
+ RUN apt-get update \
20
+ && apt-get install -y --no-install-recommends ffmpeg \
21
+ && rm -rf /var/lib/apt/lists/*
22
+
23
+ COPY web/requirements.txt /app/web/requirements.txt
24
+ RUN pip install --no-cache-dir -r /app/web/requirements.txt
25
+
26
+ COPY web /app/web
27
+ COPY assets /app/assets
28
+
29
+ COPY --from=frontend-build /app/web/backend/static /app/web/backend/static
30
+
31
+ EXPOSE 7860
32
+
33
+ CMD ["sh", "-c", "exec uvicorn web.backend.main:app --host 0.0.0.0 --port ${PORT:-7860}"]
README.md CHANGED
@@ -1,11 +1,35 @@
1
  ---
2
- title: VideoGeneration Release
3
- emoji: 🏃
4
- colorFrom: purple
5
- colorTo: yellow
6
  sdk: docker
7
  pinned: false
8
- license: apache-2.0
9
  ---
10
 
11
- Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
  ---
2
+ title: Gemini Studio Web
3
+ emoji: 🎨
4
+ colorFrom: gray
5
+ colorTo: indigo
6
  sdk: docker
7
  pinned: false
 
8
  ---
9
 
10
+ # Gemini Studio Web
11
+
12
+ Gemini 图片 / 视频创作台(FastAPI + React)。本仓库根目录的 **`README.md`** 是**总索引**(并供 Hugging Face Spaces 读取 YAML 元数据);具体说明拆到 `docs/` 下,避免与「给实现者的需求文档」「给开发者的运行手册」混在一起。
13
+
14
+ ## Documentation
15
+
16
+ | Document | Audience | Contents |
17
+ |----------|----------|----------|
18
+ | [**docs/SPEC_WEB_UI.md**](docs/SPEC_WEB_UI.md) | 产品 / 实现者 / AI | 功能范围、界面与示例页要求、非技术约束(原 `PLAN.md`) |
19
+ | [**docs/WEB_DEV_GUIDE.md**](docs/WEB_DEV_GUIDE.md) | 本机与部署的开发者 | 环境、`PYTHONPATH`、环境变量、`generation_options.json`、本地运行、稳定 URL、Docker、Hugging Face(原 `web/README.md`) |
20
+
21
+ 在 GitHub 里浏览 `docs/` 文件夹时,可先打开 **[docs/README.md](docs/README.md)**(仅索引,内容与上表一致)。
22
+
23
+ ---
24
+
25
+ ## Hugging Face Space
26
+
27
+ 部署到 Space 后,在 **Settings → Variables and secrets** 中配置(名称区分大小写)。保存后 Space 会重启;首次冷启动可能需一两分钟。
28
+
29
+ | Name | 说明 |
30
+ |------|------|
31
+ | `GEMINI_API_KEY` | Google AI Studio / Gemini API 密钥(仅服务端使用) |
32
+ | `WEB_UI_PASSWORD` | 登录本站时输入的密码 |
33
+ | `SESSION_SECRET` | 会话签名用随机串,例如本地执行 `openssl rand -hex 32` 生成 |
34
+
35
+ 更完整的步骤与 `docker run` 自测见 **[docs/WEB_DEV_GUIDE.md §10](docs/WEB_DEV_GUIDE.md#10-hugging-face-spaces-docker)**。可选变量 `GENERATION_OPTIONS_PATH` 等见该文档 **§3–4**。
__pycache__/gen_image_from_prompt.cpython-312.pyc ADDED
Binary file (6.86 kB). View file
 
__pycache__/gen_image_prompt_only.cpython-312.pyc ADDED
Binary file (5.56 kB). View file
 
__pycache__/gen_image_same_start_end.cpython-312.pyc ADDED
Binary file (7.17 kB). View file
 
__pycache__/gen_prompt_only.cpython-312.pyc ADDED
Binary file (4.9 kB). View file
 
__pycache__/gen_video_image_start_end.cpython-312.pyc ADDED
Binary file (7.26 kB). View file
 
__pycache__/gen_video_prompt_only.cpython-312.pyc ADDED
Binary file (6.01 kB). View file
 
__pycache__/generate_video.cpython-312.pyc ADDED
Binary file (5.42 kB). View file
 
assets/example_1_prompt_to_image/output_a.png ADDED

Git LFS Details

  • SHA256: 54e5fdc9947655adb42c0ac6d08ce4849086e3a717826cd20b651525e5c8dba1
  • Pointer size: 131 Bytes
  • Size of remote file: 561 kB
assets/example_1_prompt_to_image/output_b.png ADDED

Git LFS Details

  • SHA256: dcd66058f8b1834eb25351f15fac15397610b44a34840c8a4a79de3de5024cb0
  • Pointer size: 131 Bytes
  • Size of remote file: 562 kB
assets/example_2_image_to_image/input.png ADDED

Git LFS Details

  • SHA256: 411d4d81339d11cb9916c78926423203f2a50157c8bd779189fa5b5569e8689f
  • Pointer size: 131 Bytes
  • Size of remote file: 450 kB
assets/example_2_image_to_image/output.png ADDED

Git LFS Details

  • SHA256: 8e93a25ea80882d1a00388a7b907bf55e9a5d442dbfdcc9d694cf84e0d7d31bf
  • Pointer size: 133 Bytes
  • Size of remote file: 40.3 MB
assets/example_3a_loop_video/first_last_frame.png ADDED

Git LFS Details

  • SHA256: cc0dc8afdcdb81ac92ed72cb48531cd5a1f56652ed9eb3c126e6d135e4584a83
  • Pointer size: 131 Bytes
  • Size of remote file: 584 kB
assets/example_3a_loop_video/output.mp4 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:c324a665d3dc84fd4c1df27d0c82283a51b7c96e300a9e34059eb114bc24a753
3
+ size 30184274
assets/example_3b_loop_video/first_last_frame.png ADDED

Git LFS Details

  • SHA256: f1bd09d1fbc0f118d934c724283bf58185c01f7f694bb844f318df5fdf4f33c9
  • Pointer size: 131 Bytes
  • Size of remote file: 383 kB
assets/example_3b_loop_video/output.mp4 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:08cfdafd9665b2a8f3d6d9b953b9cb0c79961a73e1fa9818c6f0af7e827f8e51
3
+ size 9509577
assets/example_4_super_res/input.png ADDED
assets/example_4_super_res/output_4k.png ADDED

Git LFS Details

  • SHA256: ff3c8e6a39a1de32d0c0909c2c1776deec98ba1ec2406f0246d3f8c441a894af
  • Pointer size: 132 Bytes
  • Size of remote file: 3.93 MB
assets/example_5_video_extension/output_a.mp4 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:19cdca400d87057f9097ff2637fcd73b4bdbb4f4f9737cd473c6c5bf6990bca4
3
+ size 9074282
assets/example_5_video_extension/output_b.mp4 ADDED
@@ -0,0 +1,3 @@
 
 
 
 
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:17091cc192dbc35f4db48f2452d0c6a52bdca3d83ca225c947ad04ad38516232
3
+ size 9567422
docs/README.md ADDED
@@ -0,0 +1,8 @@
 
 
 
 
 
 
 
 
 
1
+ # Documentation
2
+
3
+ | File | Role |
4
+ |------|------|
5
+ | [SPEC_WEB_UI.md](./SPEC_WEB_UI.md) | Product / implementation specification |
6
+ | [WEB_DEV_GUIDE.md](./WEB_DEV_GUIDE.md) | Developer runbook (local run, env, Docker, Hugging Face) |
7
+
8
+ Hub (overview + HF secrets summary): **[README.md](../README.md)**.
docs/SPEC_WEB_UI.md ADDED
@@ -0,0 +1,113 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Web UI — product specification
2
+
3
+ This file is the **authoring / product spec** for turning the repo into a web UI: what to build, feature scope, and UX expectations. It is aimed at **implementers and AI assistants**. For how to run the stack, env vars, and deployment, see **[`WEB_DEV_GUIDE.md`](./WEB_DEV_GUIDE.md)**.
4
+
5
+ ---
6
+
7
+ This is the initial plan for how to change this pure code-based repo to a web ui.
8
+
9
+ # Overview
10
+ The final version of this repo should be launched as a web ui, which supports image and video generation.
11
+ The user might upload prompts and images (optional) as condition.
12
+ There will be several main features:
13
+
14
+ ## 第一板块:AI创作台
15
+ - A. 图片生成或编辑:提供0-3张参考图片和提示词,生成一张图
16
+ - 思考强度:界面**并列三项**(模型 + 强度合一)——Flash(快速,默认 minimal)、Flash(快速)(长思考,high)、Pro(标准)(长思考,high);对应 `gemini-3.1-flash-image-preview` / `gemini-3-pro-image-preview` + `thinking_level`
17
+ - 宽高比:"1:1","2:3","3:2","3:4","4:3","4:5","5:4","9:16","16:9","21:9"
18
+ - 分辨率:"1K", "2K", "4K"
19
+ - B. 视频生成:提供0-3张参考图片和提示词,生成一个短视频
20
+ - 模型(可配置,见 `generation_options.json`):`veo-3.1-generate-preview`、`veo-3.1-lite-generate-preview`、`veo-3.1-fast-generate-preview`;界面标注为 **(标准)/(轻量)/(快速)**;其中 **Lite 不支持参考图**(由 `supports_reference_images` 标注)
21
+ - 宽高比:16:9 或 9:16
22
+ - 分辨率:720p、1080p 或 4k
23
+ - 时长:纯文案时可 4/6/8 秒(与分辨率组合以 API 为准);**有参考图时固定 8 秒**(Veo 接口要求)
24
+ - C. 视频生成(首尾过渡)(起始/可选结尾帧):至少 1 张起始帧 + 提示词,生成一个短视频;结尾帧可选,或勾选「结尾与起始相同」
25
+ - 模型:与 B 相同,同上三项 Veo 预览模型,可配置
26
+ - 宽高比:16:9 或 9:16
27
+ - 分辨率:720p、1080p 或 4k
28
+ - 时长:固定 **8 秒**(首/尾帧条件时 Veo 不接受 4/6 秒,与纯文案视频不同)
29
+
30
+
31
+ ## 第二板块:辅助工具
32
+
33
+ - 超分辨率:
34
+ - 内在调用与「图片编辑或生成」相同,使用 1 张参考图生成更高清图片
35
+ - 提示词**默认**为「保持内容完全不变,提高图片的分辨率」,**可修改**
36
+ - **先上传原图(必填)**;根据原图**宽高比**在配置列表中**自动择近匹配**;**宽高比选项放在表单最下**,可手动改
37
+ - **分辨率**默认 4K,**不**根据原图自动推断,用户自选 1K/2K/4K
38
+ - 若原图宽高比与列表中**任一项都不接近**,界面**警告**:生成图可能与原图不完全一致,仍可点击生成
39
+ - 模型**默认「快速」**,可改为「标准」
40
+
41
+ - 提取视频的特定帧:
42
+ - 用户上传一个视频
43
+ - 视频将出现一个胶片一样可以拉动的进度条,用户随便停在一个位置,展示具体的时间和对应图像的preview
44
+ - 一旦点击“下载”,在下方讲出现这一帧作为一张单独的图片,可供用户下载。
45
+
46
+ - 图像裁剪(**前端 Canvas**,`/tools/crop`,不上传服务器)
47
+ - 上传图片,**交互式裁剪框**(框内平移、四角缩放;「自由」下四边为**可见白条**可拖,命中区与光标反馈按画布像素计算)
48
+ - 可选固定比例:与 `generation_options.json` 中**图片宽高比**列表一致(另加「自由」);切换比例时重置为居中最大适配框
49
+ - 裁剪交互区即原图;裁剪结果随框**实时**更新,下载 PNG
50
+
51
+ - 替换纯色背景(**前端 Canvas**,`/tools/replace-bg`,不上传服务器)
52
+ - 用户上传一个图片(提示:仅适合**纯色或大块相近色**背景)
53
+ - **原始色 / 目标色**的设定方式一致:**系统调色板**(原生取色器,通常含放大镜/吸管)、**手动** R/G/B 或 Hex(可「应用 Hex」);可选在预览图上点击取像素色
54
+ - **不透明度** 0–100%(默认 100%,仅作用于替换结果)
55
+ - 与原始色在 RGB 距离 ≤ **容差** 的像素改为目标 RGBA(容差可调)
56
+ - 下方预览,可下载 PNG
57
+
58
+ ## 第三板块:示例
59
+ 这一类是上述创作台的**简化和示例**版本,例如,已经为你写好提示词,选定各种参数,输入参考图片(这里来自asset/目录,有待添加),展示输出。
60
+ 理论上,你用上面的创作台能达到完全一样的效果,只是这里给了你例子。因此这个页面是完全静止的,不涉及model query。
61
+
62
+ 简介:Wake-UP人声乐团是北京大学2025-2026年度十佳歌手冠军,他们比赛现场的彩幕制作过程使用了如下功能。
63
+
64
+ 1. 手写字体生成
65
+ - 展示两张output
66
+
67
+ 2. 在图片上加文字
68
+ - 展示提示词、一张input和output
69
+
70
+ 3. 循环视频制作
71
+ - 展示提示词、一张input和output video
72
+
73
+ 4. 图像超分辨率
74
+ - 展示一张input和output
75
+
76
+ 5. 视频延伸(长视频生成)
77
+ 只展示output,说此功能敬请期待(保持神秘)
78
+
79
+
80
+ # 示例代码
81
+ 所有上述功能都有一个或多个初步的纯代码脚本
82
+ - A. 参考 /Users/lehongwu/Projects/others/lyrics/VideoGeneration-release/run_gen_image_image_cond.sh 和 /Users/lehongwu/Projects/others/lyrics/VideoGeneration-release/run_gen_image_prompt_only.sh
83
+
84
+ - B. 参考 /Users/lehongwu/Projects/others/lyrics/VideoGeneration-release/run_gen_video_prompt_only.sh
85
+
86
+ - C. 参考 /Users/lehongwu/Projects/others/lyrics/VideoGeneration-release/run_gen_video_image_start_end_diff.sh
87
+
88
+ 注意,上述代码不一定包含完整功能,例如,只输入一张图片,而不是0-3张。对于完整document和例子,都参考如下网站:
89
+ https://ai.google.dev/gemini-api/docs/image-generation?hl=zh-cn
90
+ https://ai.google.dev/gemini-api/docs/video?hl=zh-cn
91
+
92
+ # 界面要求
93
+ - 初始有一个输入密码界面,这个密码是launch网站之前由用户设置的环境变量
94
+ - 密码正确后,侧栏有上述多重feature选项,点开其中任意一个,包含:
95
+ 1. 功能简介和指示
96
+ 2. 输入提示词的窗口(不可以为空)
97
+ 3. 留给用户上传图片的空位(feature A/B 有三个参考图空位,均可空;feature C 有起始帧必填、结尾帧可选,并可勾选与起始相同)
98
+ 4. 选择模型、宽高比、分辨率(视频类还有时长,受 API/参考图约束)
99
+ 5. 输出图片/视频的空位
100
+ - 在生成过程中,可以有某种计时器显示运行时间,不要让用户感觉卡住了
101
+
102
+ # 其他要求
103
+ - Gemini api key是launch网站之前由用户设置的环境变量,千万不能泄露或者hard-code在代码里
104
+ - 所有代码是英文,但是网站上的文字(例如介绍)和提示词可以用中文
105
+ - 网站页面风格:请用比较美观的模板和风格,不用太花哨,但是要有艺术气息
106
+
107
+
108
+ # 进一步要求/修改建议
109
+ - 无论我在哪个机器launch这个server,我希望url不要变(debug阶段可以用localhost或者x.x.x.x,但最终开放的版本肯定不行),可以是我自己设计的一个url,但是我希望如果我更换serve的机器这个url不会改变,这样用户能一直用相同url访问。我不确定这个能不能做到,请你给出方案。
110
+
111
+ - 对于上述模型名称、分辨率、宽高比的选项,我希望不是hard-code的list,而是能够有一个独立让我修改的地方,因为这些api支持的选项可能随时变化。
112
+
113
+ **实现说明(Web UI)**:选项列表集中在 `web/config/generation_options.json`(可用环境变量 `GENERATION_OPTIONS_PATH` 指向其他文件)。图片模型、视频/视频生成(首尾过渡)的 Veo 模型名、宽高比、分辨率、时长等均从此处加载;修改后一般无需重编前端,但若改 React/样式需 `cd web/frontend && npm run build` 更新 `web/backend/static/`。
docs/WEB_DEV_GUIDE.md ADDED
@@ -0,0 +1,160 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ # Web UI — developer guide
2
+
3
+ How to **set up, run, configure, and deploy** the Gemini Studio Web stack (FastAPI + React). End users only need the public URL and password.
4
+
5
+ For **what to build** (features, UX intent, examples page scope), see **[`SPEC_WEB_UI.md`](./SPEC_WEB_UI.md)**.
6
+
7
+ **Layout:** Application code lives under **`VideoGeneration-release/web/`**. All `from web.backend...` imports assume **`PYTHONPATH`** includes **`VideoGeneration-release`** (the parent of the `web/` directory). Do not set `PYTHONPATH` to `web/` itself—that will break imports.
8
+
9
+ ## 1. Prerequisites
10
+
11
+ - **Python** 3.10+ recommended.
12
+ - **Node.js** 18+ and **npm** (for building the frontend).
13
+ - **ffmpeg** on your `PATH` (same as the CLI video scripts; used to strip audio from MP4s).
14
+
15
+ ## 2. Python virtual environment
16
+
17
+ From **`VideoGeneration-release`** (the directory that contains `web/`):
18
+
19
+ ```bash
20
+ cd /path/to/VideoGeneration-release
21
+ python3 -m venv .venv
22
+ source .venv/bin/activate # Windows: .venv\Scripts\activate
23
+ pip install -r web/requirements.txt
24
+ ```
25
+
26
+ ## 3. Environment variables
27
+
28
+ Set these before starting Uvicorn (or put them in a `.env` file and load with your process manager—**do not** commit real secrets):
29
+
30
+ | Variable | Purpose |
31
+ |----------|---------|
32
+ | `GEMINI_API_KEY` | Gemini API key (server only; never exposed to the browser). |
33
+ | `WEB_UI_PASSWORD` | Login password for the web UI. |
34
+ | `SESSION_SECRET` | Random string for signing session cookies, e.g. `openssl rand -hex 32`. |
35
+ | `GENERATION_OPTIONS_PATH` | Optional. Absolute path to a JSON file that overrides the default option lists. If unset, the server uses `web/config/generation_options.json`. |
36
+
37
+ Example for local debugging:
38
+
39
+ ```bash
40
+ export GEMINI_API_KEY="your_key"
41
+ export WEB_UI_PASSWORD="password"
42
+ export SESSION_SECRET="$(openssl rand -hex 32)"
43
+ ```
44
+
45
+ ## 4. Configurable model / resolution / aspect lists
46
+
47
+ Edit **`web/config/generation_options.json`** (or the file pointed to by `GENERATION_OPTIONS_PATH`). The UI loads these values from **`GET /api/config/generation-options`** after login—no frontend rebuild is required when you change only this JSON. Rebuild the frontend only when you change React/TS/CSS.
48
+
49
+ Schema (informal):
50
+
51
+ - **`image`**: `models` (`value` + `label`; e.g. **(快速)** for Flash vs **(标准)** for Pro), `aspect_ratios`, `resolutions`, `thinking_levels` (`value` + `label`).
52
+ - **`video`** / **`video_frames`**: `models` (`value` + `label`, Veo IDs), `aspect_ratios`, `resolutions`, `durations_seconds`. On **`video`**, each model may set **`supports_reference_images`** (boolean); e.g. Veo 3.1 Lite is **`false`**. With reference images, the backend also forces **8s** duration per API rules. The **首尾帧** API route always uses **8s** (frame-conditioned video does not accept 4/6s like prompt-only 720p).
53
+
54
+ ## 5. Build the frontend once
55
+
56
+ ```bash
57
+ cd web/frontend
58
+ npm install
59
+ npm run build
60
+ ```
61
+
62
+ Output goes to **`web/backend/static/`**. If this directory is missing, the API still runs, but visiting the root URL returns 503 until you build.
63
+
64
+ ## 6. Launch the server
65
+
66
+ From **`VideoGeneration-release`** (parent of `web/`):
67
+
68
+ ```bash
69
+ cd /path/to/VideoGeneration-release
70
+ PYTHONPATH=. uvicorn web.backend.main:app --host 127.0.0.1 --port 8000
71
+ ```
72
+
73
+ For LAN testing from other devices on the same network:
74
+
75
+ ```bash
76
+ cd /path/to/VideoGeneration-release
77
+ PYTHONPATH=. uvicorn web.backend.main:app --host 0.0.0.0 --port 8000
78
+ ```
79
+
80
+ Use `--reload` during development.
81
+
82
+ ## 7. Development mode (hot reload)
83
+
84
+ **Terminal A** — API (cwd = **`VideoGeneration-release`**):
85
+
86
+ ```bash
87
+ cd /path/to/VideoGeneration-release
88
+ PYTHONPATH=. uvicorn web.backend.main:app --reload --host 127.0.0.1 --port 8000
89
+ ```
90
+
91
+ **Terminal B** — Vite (proxies `/api` to port 8000):
92
+
93
+ ```bash
94
+ cd /path/to/VideoGeneration-release/web/frontend && npm run dev
95
+ ```
96
+
97
+ Open the URL Vite prints (e.g. `http://127.0.0.1:5173`). The API key stays on the server; the browser only talks to Vite, which forwards `/api` to Uvicorn.
98
+
99
+ ## 8. Stable URL when you change machines
100
+
101
+ The application does **not** assign a public hostname by itself. A stable URL for users is an **infrastructure** concern:
102
+
103
+ 1. **Own domain + DNS**
104
+ Register a domain (e.g. `studio.example.com`). Create an **A** (or **AAAA**) record pointing to the **current** server’s public IP. When you move to a new machine, update the DNS record to the new IP. Users keep the same hostname.
105
+
106
+ 2. **Static IP or elastic IP**
107
+ If your cloud provider offers a static/elastic IP, attach it to whichever instance runs the app; point your DNS name to that IP.
108
+
109
+ 3. **Reverse proxy**
110
+ Run **nginx** or **Caddy** on the server (or a small VPS in front): TLS termination, `proxy_pass` to `127.0.0.1:8000`. Users hit `https://studio.example.com` only.
111
+
112
+ 4. **Tunnel / no public IP**
113
+ **Cloudflare Tunnel**, **Tailscale Funnel**, or similar gives you a stable hostname without opening ports on your home router; the tunnel endpoint can be repointed when the backend machine changes (depending on the product).
114
+
115
+ 5. **What not to expect**
116
+ Hard-coding `localhost` or a raw IP in the app will not give a stable branded URL. The fix is always: **one DNS name you control** → **current server location**.
117
+
118
+ ## 9. Quick checklist
119
+
120
+ - [ ] `ffmpeg` works: `ffmpeg -version`
121
+ - [ ] `pip install -r web/requirements.txt` in a venv
122
+ - [ ] `npm run build` in `VideoGeneration-release/web/frontend` at least once
123
+ - [ ] Three env vars set: `GEMINI_API_KEY`, `WEB_UI_PASSWORD`, `SESSION_SECRET`
124
+ - [ ] Start Uvicorn with `PYTHONPATH=.` from **`VideoGeneration-release`** (folder that contains `web/`)
125
+
126
+ ## 10. Hugging Face Spaces (Docker)
127
+
128
+ This UI is **not** Streamlit/Gradio; deploy with **`sdk: docker`** and the **`Dockerfile`** at the repo root (same directory as `web/` and `assets/`).
129
+
130
+ 1. Create a **Docker** Space and point it at this repository (or push this folder to a GitHub repo and connect the Space).
131
+ 2. In the Space **Settings → Variables and secrets**, add **Repository secrets** (or Variables) with exactly these names:
132
+
133
+ | Name | Purpose |
134
+ |------|---------|
135
+ | `GEMINI_API_KEY` | Same as local; never commit it. |
136
+ | `WEB_UI_PASSWORD` | Password users type on the login page. |
137
+ | `SESSION_SECRET` | Same as local, e.g. `openssl rand -hex 32`. |
138
+
139
+ Hugging Face injects them as environment variables; the app reads them the same way as on your laptop.
140
+
141
+ 3. The container listens on **`PORT`** if set (Spaces often set it); otherwise **`7860`**. Do not hard-code a port in the app; the provided `Dockerfile` uses `uvicorn ... --port ${PORT:-7860}`.
142
+
143
+ 4. **ffmpeg** is installed in the image (required for stripping audio from generated MP4s).
144
+
145
+ 5. Optional: set **`GENERATION_OPTIONS_PATH`** in the same secrets UI if you mount a custom JSON elsewhere; otherwise the bundled `web/config/generation_options.json` is used.
146
+
147
+ 6. Build can take several minutes on HF; first request after idle may hit cold start.
148
+
149
+ Local test of the image (from **`VideoGeneration-release`**):
150
+
151
+ ```bash
152
+ docker build -t gemini-studio-web .
153
+ docker run --rm -p 7860:7860 \
154
+ -e GEMINI_API_KEY="your_key" \
155
+ -e WEB_UI_PASSWORD="your_password" \
156
+ -e SESSION_SECRET="$(openssl rand -hex 32)" \
157
+ gemini-studio-web
158
+ ```
159
+
160
+ Then open `http://127.0.0.1:7860`.
gen_image_image_cond.py ADDED
@@ -0,0 +1,201 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ import argparse
3
+ import json
4
+ import os
5
+ import sys
6
+ import threading
7
+ import time
8
+ from pathlib import Path
9
+
10
+ from google import genai
11
+ from google.genai import types
12
+ from PIL import Image
13
+
14
+
15
+ def parse_args() -> argparse.Namespace:
16
+ parser = argparse.ArgumentParser(
17
+ description="Generate an image conditioned on one or more input images using Gemini (Nano Banana)."
18
+ )
19
+ parser.add_argument("--prompt", required=True, help="Prompt describing the desired output image.")
20
+ parser.add_argument(
21
+ "--input-image-path",
22
+ "--input_image_path",
23
+ dest="input_image_path",
24
+ required=True,
25
+ help="Path to the primary conditioning image.",
26
+ )
27
+ parser.add_argument(
28
+ "--extra-image-paths",
29
+ "--extra_image_paths",
30
+ dest="extra_image_paths",
31
+ nargs="*",
32
+ default=[],
33
+ help="Optional additional conditioning image paths (up to 13 total images).",
34
+ )
35
+ parser.add_argument(
36
+ "--model",
37
+ default="gemini-3.1-flash-image-preview",
38
+ help="Image generation model name (e.g. gemini-3.1-flash-image-preview, gemini-3-pro-image-preview, gemini-2.5-flash-image).",
39
+ )
40
+ parser.add_argument("--name", default="img_cond", help="Base output filename (without extension).")
41
+ parser.add_argument(
42
+ "--output-dir",
43
+ "--output_dir",
44
+ dest="output_dir",
45
+ default="output_dir",
46
+ help="Directory to save outputs (default: output_dir).",
47
+ )
48
+ parser.add_argument(
49
+ "--aspect-ratio",
50
+ default="1:1",
51
+ help="Aspect ratio (e.g. 1:1, 16:9, 9:16, 4:3, 3:4, 21:9).",
52
+ )
53
+ parser.add_argument(
54
+ "--resolution",
55
+ default="1K",
56
+ help="Output resolution: 512px, 1K, 2K, or 4K (Gemini 3 models only).",
57
+ )
58
+ parser.add_argument(
59
+ "--number-of-images",
60
+ type=int,
61
+ default=1,
62
+ help="How many images to generate (runs the request N times).",
63
+ )
64
+ parser.add_argument(
65
+ "--thinking-level",
66
+ default=None,
67
+ choices=["minimal", "high"],
68
+ help="Thinking level for Gemini 3.1 Flash Image: 'minimal' or 'high'.",
69
+ )
70
+ return parser.parse_args()
71
+
72
+
73
+ def load_pil_image(image_path: Path) -> Image.Image:
74
+ if not image_path.exists():
75
+ raise FileNotFoundError(f"Input image not found: {image_path}")
76
+ return Image.open(str(image_path))
77
+
78
+
79
+ def build_image_config(args: argparse.Namespace) -> types.ImageConfig:
80
+ kwargs: dict = {"aspect_ratio": args.aspect_ratio}
81
+ gemini3_models = {"gemini-3.1-flash-image-preview", "gemini-3-pro-image-preview"}
82
+ if args.model in gemini3_models:
83
+ kwargs["image_size"] = args.resolution
84
+ return types.ImageConfig(**kwargs)
85
+
86
+
87
+ def generate_one(
88
+ client: genai.Client,
89
+ args: argparse.Namespace,
90
+ image_config: types.ImageConfig,
91
+ pil_images: list[Image.Image],
92
+ ) -> bytes | None:
93
+ config_kwargs: dict = {
94
+ "response_modalities": ["IMAGE"],
95
+ "image_config": image_config,
96
+ }
97
+ if args.thinking_level and args.model == "gemini-3.1-flash-image-preview":
98
+ config_kwargs["thinking_config"] = types.ThinkingConfig(
99
+ thinking_level=args.thinking_level.capitalize(),
100
+ )
101
+
102
+ contents: list = [args.prompt] + pil_images
103
+
104
+ response = client.models.generate_content(
105
+ model=args.model,
106
+ contents=contents,
107
+ config=types.GenerateContentConfig(**config_kwargs),
108
+ )
109
+
110
+ for part in response.parts:
111
+ if part.thought:
112
+ continue
113
+ if part.inline_data is not None:
114
+ return part.inline_data.data
115
+
116
+ return None
117
+
118
+
119
+ def main() -> int:
120
+ args = parse_args()
121
+
122
+ if not os.getenv("GEMINI_API_KEY"):
123
+ print("Missing GEMINI_API_KEY environment variable.", file=sys.stderr)
124
+ return 1
125
+
126
+ primary_path = Path(args.input_image_path).expanduser().resolve()
127
+ all_image_paths = [primary_path] + [
128
+ Path(p).expanduser().resolve() for p in args.extra_image_paths
129
+ ]
130
+
131
+ pil_images: list[Image.Image] = []
132
+ for p in all_image_paths:
133
+ print(f"Loading input image: {p}")
134
+ pil_images.append(load_pil_image(p))
135
+
136
+ client = genai.Client()
137
+ image_config = build_image_config(args)
138
+
139
+ out_dir = Path(args.output_dir)
140
+ out_dir.mkdir(parents=True, exist_ok=True)
141
+
142
+ saved_files: list[str] = []
143
+
144
+ for idx in range(1, args.number_of_images + 1):
145
+ label = f" ({idx}/{args.number_of_images})" if args.number_of_images > 1 else ""
146
+ print(f"Generating image{label}...")
147
+
148
+ result: dict = {}
149
+ thread = threading.Thread(
150
+ target=lambda: result.update({"bytes": generate_one(client, args, image_config, pil_images)}),
151
+ daemon=True,
152
+ )
153
+ started_at = time.time()
154
+ thread.start()
155
+ while thread.is_alive():
156
+ thread.join(timeout=10)
157
+ if thread.is_alive():
158
+ elapsed = int(time.time() - started_at)
159
+ print(f"Waiting for image generation... elapsed: {elapsed}s")
160
+ elapsed = int(time.time() - started_at)
161
+ print(f"Image generation finished in {elapsed}s")
162
+
163
+ image_bytes = result.get("bytes")
164
+ if image_bytes is None:
165
+ print(f"No image returned for generation {idx}.", file=sys.stderr)
166
+ continue
167
+
168
+ if args.number_of_images == 1:
169
+ out_path = out_dir / f"{args.name}.png"
170
+ else:
171
+ out_path = out_dir / f"{args.name}_{idx}.png"
172
+
173
+ out_path.write_bytes(image_bytes)
174
+ saved_files.append(str(out_path.resolve()))
175
+ print(f"Saved image: {out_path.resolve()}")
176
+
177
+ if not saved_files:
178
+ print("No images were saved.", file=sys.stderr)
179
+ return 2
180
+
181
+ metadata: dict = {
182
+ "prompt": args.prompt,
183
+ "model": args.model,
184
+ "input_images": [str(p) for p in all_image_paths],
185
+ "config": {
186
+ "aspect_ratio": args.aspect_ratio,
187
+ "resolution": args.resolution,
188
+ "number_of_images": args.number_of_images,
189
+ "thinking_level": args.thinking_level,
190
+ },
191
+ "saved_images": saved_files,
192
+ }
193
+ metadata_path = out_dir / f"{args.name}.json"
194
+ metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
195
+ print(f"Saved metadata: {metadata_path.resolve()}")
196
+
197
+ return 0
198
+
199
+
200
+ if __name__ == "__main__":
201
+ raise SystemExit(main())
gen_image_prompt_only.py ADDED
@@ -0,0 +1,165 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ import argparse
3
+ import json
4
+ import os
5
+ import sys
6
+ import threading
7
+ import time
8
+ from pathlib import Path
9
+
10
+ from google import genai
11
+ from google.genai import types
12
+
13
+
14
def parse_args() -> argparse.Namespace:
    """Parse CLI options for prompt-only image generation."""
    parser = argparse.ArgumentParser(
        description="Generate an image from a text prompt using Gemini (Nano Banana)."
    )
    add = parser.add_argument
    add("--prompt", required=True, help="Prompt describing the image.")
    add(
        "--model",
        default="gemini-3.1-flash-image-preview",
        help="Image generation model name (e.g. gemini-3.1-flash-image-preview, gemini-3-pro-image-preview, gemini-2.5-flash-image).",
    )
    add("--name", default="generated_image", help="Base output filename (without extension).")
    # Both the dashed and underscored spellings are accepted for this flag.
    add(
        "--output-dir",
        "--output_dir",
        dest="output_dir",
        default="output_dir",
        help="Directory to save outputs (default: output_dir).",
    )
    add(
        "--aspect-ratio",
        default="1:1",
        help="Aspect ratio (e.g. 1:1, 16:9, 9:16, 4:3, 3:4, 21:9).",
    )
    add(
        "--resolution",
        default="1K",
        help="Output resolution: 512px, 1K, 2K, or 4K (Gemini 3 models only).",
    )
    add(
        "--number-of-images",
        type=int,
        default=1,
        help="How many images to generate (runs the request N times).",
    )
    add(
        "--thinking-level",
        default=None,
        choices=["minimal", "high"],
        help="Thinking level for Gemini 3.1 Flash Image: 'minimal' or 'high'.",
    )
    return parser.parse_args()
55
+
56
+
57
def build_image_config(args: argparse.Namespace) -> types.ImageConfig:
    """Translate CLI arguments into a ``types.ImageConfig``.

    ``image_size`` is only attached for Gemini-3 image models; other models
    get an aspect-ratio-only config.
    """
    gemini3_models = {"gemini-3.1-flash-image-preview", "gemini-3-pro-image-preview"}
    config_args: dict = {"aspect_ratio": args.aspect_ratio}
    if args.model in gemini3_models:
        config_args["image_size"] = args.resolution
    return types.ImageConfig(**config_args)
63
+
64
+
65
def generate_one(
    client: genai.Client,
    args: argparse.Namespace,
    image_config: types.ImageConfig,
) -> bytes | None:
    """Run one image-generation request and return the raw image bytes.

    Returns None when the response carries no image part (e.g. the request
    was blocked, or the model emitted only thoughts).
    """
    config_kwargs: dict = {
        "response_modalities": ["IMAGE"],
        "image_config": image_config,
    }
    if args.thinking_level and args.model == "gemini-3.1-flash-image-preview":
        # NOTE(review): SDK examples typically use lowercase thinking levels;
        # confirm the capitalized form ("Minimal"/"High") is what the API expects.
        config_kwargs["thinking_config"] = types.ThinkingConfig(
            thinking_level=args.thinking_level.capitalize(),
        )

    response = client.models.generate_content(
        model=args.model,
        contents=[args.prompt],
        config=types.GenerateContentConfig(**config_kwargs),
    )

    # response.parts can be None on blocked/empty responses; guard so we
    # return None instead of raising TypeError while iterating.
    for part in response.parts or []:
        if part.thought:
            # Skip "thinking" parts produced by thinking-enabled models.
            continue
        if part.inline_data is not None:
            return part.inline_data.data

    return None
92
+
93
+
94
def main() -> int:
    """CLI entry point: generate N images and write them plus a JSON metadata file.

    Returns 0 on success, 1 on configuration errors, 2 when nothing was saved.
    """
    args = parse_args()

    if not os.getenv("GEMINI_API_KEY"):
        print("Missing GEMINI_API_KEY environment variable.", file=sys.stderr)
        return 1

    client = genai.Client()
    image_config = build_image_config(args)

    out_dir = Path(args.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved_files: list[str] = []

    for idx in range(1, args.number_of_images + 1):
        label = f" ({idx}/{args.number_of_images})" if args.number_of_images > 1 else ""
        print(f"Generating image{label}...")

        # Run the request in a worker thread so the main thread can print
        # periodic progress. Capture exceptions explicitly: a bare lambda
        # swallows them and makes failures look like "no image returned".
        result: dict = {}

        def _worker() -> None:
            try:
                result["bytes"] = generate_one(client, args, image_config)
            except Exception as exc:  # reported to the user below
                result["error"] = exc

        thread = threading.Thread(target=_worker, daemon=True)
        started_at = time.time()
        thread.start()
        while thread.is_alive():
            thread.join(timeout=10)
            if thread.is_alive():
                elapsed = int(time.time() - started_at)
                print(f"Waiting for image generation... elapsed: {elapsed}s")
        elapsed = int(time.time() - started_at)
        print(f"Image generation finished in {elapsed}s")

        if "error" in result:
            print(f"Generation {idx} failed: {result['error']}", file=sys.stderr)
            continue

        image_bytes = result.get("bytes")
        if image_bytes is None:
            print(f"No image returned for generation {idx}.", file=sys.stderr)
            continue

        # Single image keeps the bare base name; multiple get a _N suffix.
        if args.number_of_images == 1:
            out_path = out_dir / f"{args.name}.png"
        else:
            out_path = out_dir / f"{args.name}_{idx}.png"

        out_path.write_bytes(image_bytes)
        saved_files.append(str(out_path.resolve()))
        print(f"Saved image: {out_path.resolve()}")

    if not saved_files:
        print("No images were saved.", file=sys.stderr)
        return 2

    metadata: dict = {
        "prompt": args.prompt,
        "model": args.model,
        "config": {
            "aspect_ratio": args.aspect_ratio,
            "resolution": args.resolution,
            "number_of_images": args.number_of_images,
            "thinking_level": args.thinking_level,
        },
        "saved_images": saved_files,
    }
    metadata_path = out_dir / f"{args.name}.json"
    metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    print(f"Saved metadata: {metadata_path.resolve()}")

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
gen_lyrics_batch.py ADDED
@@ -0,0 +1,156 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Generate one image per row of lyrics from a text file.
4
+ Each line is used as the Chinese characters in the image generation prompt.
5
+ """
6
+ import argparse
7
+ import subprocess
8
+ import sys
9
+ from pathlib import Path
10
+
11
+
12
def parse_args() -> argparse.Namespace:
    """Parse CLI options for the batch lyrics-to-image generator."""
    parser = argparse.ArgumentParser(
        description="Generate images for each line of lyrics from a text file."
    )
    # Dashed and underscored spellings are both accepted for multi-word flags.
    parser.add_argument(
        "--lyrics-file",
        "--lyrics_file",
        dest="lyrics_file",
        required=True,
        help="Path to the lyrics text file (one line per image).",
    )
    parser.add_argument(
        "--input-image-path",
        "--input_image_path",
        dest="input_image_path",
        required=True,
        help="Path to the primary conditioning image.",
    )
    parser.add_argument(
        "--output-dir",
        "--output_dir",
        dest="output_dir",
        default="output_dir",
        help="Directory to save outputs (default: output_dir).",
    )
    parser.add_argument(
        "--model",
        default="gemini-3.1-flash-image-preview",
        help="Image generation model name.",
    )
    parser.add_argument(
        "--aspect-ratio",
        default="16:9",
        help="Aspect ratio (e.g. 1:1, 16:9, 9:16).",
    )
    parser.add_argument(
        "--resolution",
        default="2K",
        help="Output resolution: 512px, 1K, 2K, or 4K.",
    )
    parser.add_argument(
        "--extra-image-paths",
        dest="extra_image_paths",
        nargs="*",
        default=[],
        help="Optional additional conditioning image paths.",
    )
    parser.add_argument(
        "--thinking-level",
        default=None,
        choices=["minimal", "high"],
        help="Thinking level for Gemini 3.1 Flash Image.",
    )
    # None means "all rows"; an explicit empty list would select nothing.
    parser.add_argument(
        "--row-ids",
        "--row_ids",
        dest="row_ids",
        type=int,
        nargs="*",
        default=None,
        help="Specific row IDs to generate (1-based). If not set, generate all.",
    )
    return parser.parse_args()
75
+
76
+
77
def build_prompt(chars: str) -> str:
    """Return the image-generation prompt for the given Chinese characters."""
    template = f"""
Replace the chinese characters with '{chars}'.
Black text on pure white background. The thickness of the strokes should be consistent with the original image. One character.
Strictly follow the font of the original image.
"""
    return template.strip()
84
+
85
+
86
def main() -> int:
    """Generate one image per non-empty lyrics line by shelling out to
    gen_image_image_cond.py once per row.

    Returns 0 on success, 1 on input errors, or the child's exit code on the
    first failed generation (processing stops at the first failure).
    """
    args = parse_args()

    lyrics_path = Path(args.lyrics_file).expanduser().resolve()
    if not lyrics_path.exists():
        print(f"Error: Lyrics file not found: {lyrics_path}", file=sys.stderr)
        return 1

    lines = lyrics_path.read_text(encoding="utf-8").strip().splitlines()
    # row_id = 1-based line number in file (correlates to txt row, enables selective generation later)
    rows_to_generate = [(i, line.strip()) for i, line in enumerate(lines, start=1) if line.strip()]

    if args.row_ids is not None:
        # Keep only the requested rows; IDs refer to original file line numbers.
        row_ids_set = set(args.row_ids)
        rows_to_generate = [(row_id, chars) for row_id, chars in rows_to_generate if row_id in row_ids_set]
        if not rows_to_generate:
            print("Error: No matching rows found for the given row IDs.", file=sys.stderr)
            return 1

    if not rows_to_generate:
        print("Error: No non-empty lines in lyrics file.", file=sys.stderr)
        return 1

    output_dir = Path(args.output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)

    # The per-image generator lives next to this script.
    script_dir = Path(__file__).resolve().parent
    gen_script = script_dir / "gen_image_image_cond.py"

    for idx, (row_id, chars) in enumerate(rows_to_generate, start=1):
        # Output name is keyed to the file row, not the loop index, so a
        # partial/selective rerun produces the same filenames.
        name = f"row_{row_id}"
        prompt = build_prompt(chars)

        # Use the same interpreter that runs this script for the child process.
        cmd = [
            sys.executable,
            str(gen_script),
            "--prompt",
            prompt,
            "--input-image-path",
            args.input_image_path,
            "--output-dir",
            str(output_dir),
            "--name",
            name,
            "--model",
            args.model,
            "--aspect-ratio",
            args.aspect_ratio,
            "--resolution",
            args.resolution,
            "--number-of-images",
            "1",
        ]

        if args.extra_image_paths:
            cmd.extend(["--extra-image-paths"] + args.extra_image_paths)
        if args.thinking_level:
            cmd.extend(["--thinking-level", args.thinking_level])

        print(f"[{idx}/{len(rows_to_generate)}] Row {row_id}: '{chars}' -> {output_dir / f'{name}.png'}")
        result = subprocess.run(cmd)
        if result.returncode != 0:
            # Fail fast: propagate the child's exit code to the caller.
            print(f"Error: Failed to generate image for row {row_id} ('{chars}')", file=sys.stderr)
            return result.returncode

    print(f"\nDone. Generated {len(rows_to_generate)} images in {output_dir}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
gen_video_image_start_end.py ADDED
@@ -0,0 +1,194 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ import argparse
3
+ import json
4
+ import os
5
+ import subprocess
6
+ import sys
7
+ import tempfile
8
+ import time
9
+ from pathlib import Path
10
+
11
+ from google import genai
12
+ from google.genai import types
13
+
14
+
15
def parse_args() -> argparse.Namespace:
    """Parse CLI options for start/end-frame-conditioned video generation."""
    parser = argparse.ArgumentParser(
        description=(
            "Generate a video conditioned on a start frame and an optional end frame. "
            "If --end-image-path is omitted, the start image is reused as the end frame."
        )
    )
    add = parser.add_argument
    add("--prompt", required=True, help="Prompt describing the video.")
    # Dashed and underscored spellings are both accepted for the image paths.
    add(
        "--start-image-path",
        "--start_image_path",
        dest="start_image_path",
        required=True,
        help="Path to the image used as the start (first) frame.",
    )
    add(
        "--end-image-path",
        "--end_image_path",
        dest="end_image_path",
        default=None,
        help="Path to the image used as the end (last) frame. Defaults to the start image.",
    )
    add(
        "--model",
        default="veo-3.1-generate-preview",
        help="Video generation model name.",
    )
    add("--name", default="generated_video", help="Base output filename.")
    add(
        "--output-dir",
        "--output_dir",
        dest="output_dir",
        default="output_dir",
        help="Directory to save outputs (default: output_dir).",
    )
    add("--resolution", default="720p", help="e.g. 720p, 1080p, 4k")
    add("--duration", type=int, default=8, help="Video length in seconds.")
    add(
        "--aspect-ratio",
        default="16:9",
        help="Aspect ratio (e.g. 16:9, 9:16).",
    )
    add(
        "--negative-prompt",
        default="blurry, low quality, artifacts, text overlay, watermark",
        help="What to avoid.",
    )
    add(
        "--number-of-videos",
        type=int,
        default=1,
        help="How many videos to generate.",
    )
    add(
        "--poll-seconds",
        type=int,
        default=10,
        help="Polling interval while generation is running.",
    )
    return parser.parse_args()
75
+
76
+
77
def strip_audio(video_path: Path) -> None:
    """Remove audio track from video using ffmpeg (video stream copied, no re-encode)."""
    # Create a scratch file; delete=False so ffmpeg can write to it after close.
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
        temp_path = Path(f.name)
    try:
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(video_path), "-an", "-c:v", "copy", str(temp_path)],
            check=True,  # raise CalledProcessError if ffmpeg fails
            capture_output=True,  # keep ffmpeg's console chatter suppressed
        )
        # Atomically swap the stripped file in; this also removes temp_path,
        # so the cleanup below is a no-op on success.
        temp_path.replace(video_path)
    finally:
        # Remove the scratch file if ffmpeg failed before the replace.
        if temp_path.exists():
            temp_path.unlink()
91
+
92
+
93
def load_image(image_path: Path):
    """Load ``image_path`` into a ``types.Image``, tolerating SDK signature variants."""
    if not image_path.exists():
        raise FileNotFoundError(f"Input image not found: {image_path}")
    path_str = str(image_path)
    try:
        return types.Image.from_file(location=path_str)
    except TypeError:
        # Some SDK builds accept the path only positionally.
        return types.Image.from_file(path_str)
101
+
102
+
103
def main() -> int:
    """CLI entry point: generate video(s) conditioned on start/end frames.

    Returns 0 on success, 1 on configuration errors, 2 when the API returns
    no usable videos.
    """
    args = parse_args()

    if not os.getenv("GEMINI_API_KEY"):
        print("Missing GEMINI_API_KEY environment variable.", file=sys.stderr)
        return 1

    start_path = Path(args.start_image_path).expanduser().resolve()
    # Fall back to the start image when no explicit end frame is given.
    end_path = Path(args.end_image_path).expanduser().resolve() if args.end_image_path else start_path

    print(f"Start frame: {start_path}")
    print(f"End frame: {end_path}")

    first_image = load_image(start_path)
    last_image = load_image(end_path)

    client = genai.Client()

    config = types.GenerateVideosConfig(
        resolution=args.resolution,
        duration_seconds=args.duration,
        aspect_ratio=args.aspect_ratio,
        negative_prompt=args.negative_prompt,
        number_of_videos=args.number_of_videos,
        last_frame=last_image,
    )

    operation = client.models.generate_videos(
        model=args.model,
        prompt=args.prompt,
        image=first_image,
        config=config,
    )

    started_at = time.time()
    while not operation.done:
        elapsed_seconds = int(time.time() - started_at)
        print(f"Waiting for video generation... elapsed: {elapsed_seconds}s")
        time.sleep(args.poll_seconds)
        operation = client.operations.get(operation)

    # A finished operation can carry an error instead of a response (same
    # guard as gen_video_prompt_only_extend.py); without it the attribute
    # access below raises AttributeError on None.
    if operation.response is None:
        err = getattr(operation, "error", None)
        print(f"API returned no response. Error: {err}", file=sys.stderr)
        return 2

    generated = operation.response.generated_videos
    if not generated:
        print("No videos returned by API.", file=sys.stderr)
        return 2

    out_dir = Path(args.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    base_name = args.name
    saved_files = []

    # A single result keeps the bare base name; multiple results get _1, _2, ...
    for idx, item in enumerate(generated, start=1):
        suffix = "" if len(generated) == 1 else f"_{idx}"
        out_path = out_dir / f"{base_name}{suffix}.mp4"
        video_obj = item.video
        client.files.download(file=video_obj)
        video_obj.save(str(out_path))
        strip_audio(out_path)
        saved_files.append(str(out_path.resolve()))
        print(f"Saved video: {out_path.resolve()}")

    metadata_path = out_dir / f"{base_name}.json"
    metadata = {
        "prompt": args.prompt,
        "model": args.model,
        "start_image_path": str(start_path),
        "end_image_path": str(end_path),
        "config": {
            "resolution": args.resolution,
            "duration_seconds": args.duration,
            "aspect_ratio": args.aspect_ratio,
            "negative_prompt": args.negative_prompt,
            "number_of_videos": args.number_of_videos,
            "poll_seconds": args.poll_seconds,
        },
        "saved_videos": saved_files,
    }
    metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    print(f"Saved metadata: {metadata_path.resolve()}")
    return 0


if __name__ == "__main__":
    raise SystemExit(main())
gen_video_prompt_only.py ADDED
@@ -0,0 +1,155 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ import argparse
3
+ import json
4
+ import os
5
+ import subprocess
6
+ import sys
7
+ import tempfile
8
+ import time
9
+ from pathlib import Path
10
+
11
+ from google import genai
12
+ from google.genai import types
13
+
14
+
15
def strip_audio(video_path: Path) -> None:
    """Remove audio track from video using ffmpeg (video stream copied, no re-encode)."""
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as handle:
        scratch = Path(handle.name)
    try:
        cmd = ["ffmpeg", "-y", "-i", str(video_path), "-an", "-c:v", "copy", str(scratch)]
        subprocess.run(cmd, check=True, capture_output=True)
        # On success replace() consumes the scratch file, so cleanup is a no-op.
        scratch.replace(video_path)
    finally:
        if scratch.exists():
            scratch.unlink()
29
+
30
+
31
def parse_args() -> argparse.Namespace:
    """Parse CLI options for prompt-only video generation (Veo)."""
    parser = argparse.ArgumentParser(
        description="Generate a video from a text prompt using Gemini (Veo)."
    )
    parser.add_argument("--prompt", required=True, help="Prompt describing the video.")
    parser.add_argument(
        "--model",
        default="veo-3.1-generate-preview",
        help="Video generation model name.",
    )
    parser.add_argument("--name", default="generated_video", help="Base output filename.")
    # Both dashed and underscored spellings are accepted for this flag.
    parser.add_argument(
        "--output-dir",
        "--output_dir",
        dest="output_dir",
        default="output_dir",
        help="Directory to save outputs (default: output_dir).",
    )
    parser.add_argument("--resolution", default="1080p", help="e.g. 720p, 1080p, 4k")
    parser.add_argument("--duration", type=int, default=8, help="Video length in seconds.")
    parser.add_argument(
        "--aspect-ratio",
        default="16:9",
        help="Aspect ratio (e.g. 16:9, 9:16, 1:1).",
    )
    parser.add_argument(
        "--negative-prompt",
        default="blurry, low quality, artifacts, text overlay, watermark",
        help="What to avoid.",
    )
    parser.add_argument(
        "--number-of-videos",
        type=int,
        default=1,
        help="How many videos to generate.",
    )
    parser.add_argument(
        "--poll-seconds",
        type=int,
        default=10,
        help="Polling interval while generation is running.",
    )
    return parser.parse_args()
74
+
75
+
76
def main() -> int:
    """CLI entry point: generate video(s) from a text prompt.

    Returns 0 on success, 1 on configuration errors, 2 when the API returns
    no usable videos.
    """
    args = parse_args()

    if not os.getenv("GEMINI_API_KEY"):
        print("Missing GEMINI_API_KEY environment variable.", file=sys.stderr)
        return 1

    client = genai.Client()

    config = types.GenerateVideosConfig(
        resolution=args.resolution,
        duration_seconds=args.duration,
        aspect_ratio=args.aspect_ratio,
        negative_prompt=args.negative_prompt,
        number_of_videos=args.number_of_videos,
    )

    operation = client.models.generate_videos(
        model=args.model,
        prompt=args.prompt,
        config=config,
    )

    started_at = time.time()
    while not operation.done:
        elapsed_seconds = int(time.time() - started_at)
        print(f"Waiting for video generation... elapsed: {elapsed_seconds}s")
        time.sleep(args.poll_seconds)
        operation = client.operations.get(operation)

    # A finished operation can carry an error instead of a response (same
    # guard as gen_video_prompt_only_extend.py); without it the attribute
    # access below raises AttributeError on None.
    if operation.response is None:
        err = getattr(operation, "error", None)
        print(f"API returned no response. Error: {err}", file=sys.stderr)
        return 2

    generated = operation.response.generated_videos
    if not generated:
        print("No videos returned by API.", file=sys.stderr)
        return 2

    out_dir = Path(args.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    base_name = args.name
    saved_files = []

    # A single result keeps the bare base name; multiple results get _1, _2, ...
    for idx, item in enumerate(generated, start=1):
        suffix = "" if len(generated) == 1 else f"_{idx}"
        out_path = out_dir / f"{base_name}{suffix}.mp4"
        video_obj = item.video
        client.files.download(file=video_obj)
        video_obj.save(str(out_path))
        strip_audio(out_path)
        saved_files.append(str(out_path.resolve()))
        print(f"Saved video: {out_path.resolve()}")

    metadata_path = out_dir / f"{base_name}.json"
    metadata = {
        "prompt": args.prompt,
        "model": args.model,
        "config": {
            "resolution": args.resolution,
            "duration_seconds": args.duration,
            "aspect_ratio": args.aspect_ratio,
            "negative_prompt": args.negative_prompt,
            "number_of_videos": args.number_of_videos,
            "poll_seconds": args.poll_seconds,
        },
        "saved_videos": saved_files,
    }
    metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    print(f"Saved metadata: {metadata_path.resolve()}")

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
gen_video_prompt_only_extend.py ADDED
@@ -0,0 +1,281 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
+ #!/usr/bin/env python3
2
+ """
3
+ Generate a video from a text prompt and optionally extend it multiple times.
4
+ Final length = duration * (num_extend + 1).
5
+ Extension only works with VEO-generated videos (API rejects non-VEO sources).
6
+ """
7
+ import argparse
8
+ import json
9
+ import os
10
+ import subprocess
11
+ import sys
12
+ import tempfile
13
+ import time
14
+ from pathlib import Path
15
+
16
+ from google import genai
17
+ from google.genai import types
18
+
19
+
20
def strip_audio(video_path: Path) -> None:
    """Remove audio track from video using ffmpeg (video stream copied, no re-encode)."""
    # Create a scratch file; delete=False so ffmpeg can write to it after close.
    with tempfile.NamedTemporaryFile(suffix=".mp4", delete=False) as f:
        temp_path = Path(f.name)
    try:
        subprocess.run(
            ["ffmpeg", "-y", "-i", str(video_path), "-an", "-c:v", "copy", str(temp_path)],
            check=True,  # raise CalledProcessError if ffmpeg fails
            capture_output=True,  # suppress ffmpeg's console output
        )
        # replace() consumes temp_path on success, so cleanup below is a no-op.
        temp_path.replace(video_path)
    finally:
        # Remove the scratch file if ffmpeg failed before the replace.
        if temp_path.exists():
            temp_path.unlink()
34
+
35
+
36
def load_image(image_path: Path):
    """Load an image file into a types.Image for video conditioning.

    Raises FileNotFoundError when the path does not exist.
    """
    if not image_path.exists():
        raise FileNotFoundError(f"Input image not found: {image_path}")
    try:
        return types.Image.from_file(location=str(image_path))
    except TypeError:
        # Compatibility fallback for SDK variants using positional arg.
        return types.Image.from_file(str(image_path))
44
+
45
+
46
def parse_args() -> argparse.Namespace:
    """Parse CLI options for prompt-only video generation with extension."""
    parser = argparse.ArgumentParser(
        description="Generate a video from a text prompt and optionally extend it (VEO only)."
    )
    # action="append": --prompt may be repeated; either once (reused for every
    # segment) or exactly num_extend+1 times (one per segment).
    parser.add_argument(
        "--prompt",
        action="append",
        required=True,
        help="Prompt(s) for video. Pass once for all segments, or num_extend+1 times for initial + each extension.",
    )
    parser.add_argument(
        "--model",
        default="veo-3.1-generate-preview",
        help="Video generation model name.",
    )
    parser.add_argument("--name", default="generated_video", help="Base output filename.")
    parser.add_argument(
        "--output-dir",
        "--output_dir",
        dest="output_dir",
        default="output_dir",
        help="Directory to save outputs (default: output_dir).",
    )
    parser.add_argument("--resolution", default="1080p", help="e.g. 720p, 1080p, 4k")
    parser.add_argument("--duration", type=int, default=8, help="Video length in seconds.")
    parser.add_argument(
        "--aspect-ratio",
        default="16:9",
        help="Aspect ratio (e.g. 16:9, 9:16, 1:1).",
    )
    parser.add_argument(
        "--negative-prompt",
        default="blurry, low quality, artifacts, text overlay, watermark",
        help="What to avoid.",
    )
    parser.add_argument(
        "--number-of-videos",
        type=int,
        default=1,
        help="How many videos to generate. When num-extend > 0, only the first is extended.",
    )
    parser.add_argument(
        "--num-extend",
        type=int,
        default=0,
        help="How many times to extend the video. Final length = duration * (num_extend + 1).",
    )
    # Image conditioning applies to the initial generation only.
    parser.add_argument(
        "--start-image",
        "--start_image",
        dest="start_image",
        default=None,
        help="Path to image used as the first frame (initial generation only).",
    )
    parser.add_argument(
        "--end-image",
        "--end_image",
        dest="end_image",
        default=None,
        help="Path to image used as the last frame (initial generation only; extensions do not support image conditioning).",
    )
    parser.add_argument(
        "--poll-seconds",
        type=int,
        default=10,
        help="Polling interval while generation is running.",
    )
    return parser.parse_args()
114
+
115
+
116
def main() -> int:
    """Generate a video, then extend it num_extend times (VEO only).

    Each stage is saved separately as <name>_1.mp4, <name>_2.mp4, ...;
    a JSON metadata file is written alongside. Returns 0 on success,
    1 on configuration errors, 2 on API failures.
    """
    args = parse_args()

    if not os.getenv("GEMINI_API_KEY"):
        print("Missing GEMINI_API_KEY environment variable.", file=sys.stderr)
        return 1

    if args.num_extend < 0:
        print("--num-extend must be >= 0.", file=sys.stderr)
        return 1

    # One prompt is broadcast to all segments; otherwise require exactly
    # num_extend+1 prompts (initial generation + one per extension).
    prompts: list[str] = args.prompt
    if len(prompts) > 1:
        expected = args.num_extend + 1
        if len(prompts) != expected:
            print(
                f"With {len(prompts)} prompts, expected num_extend+1 = {expected}. "
                f"Got num_extend={args.num_extend}.",
                file=sys.stderr,
            )
            return 1
    else:
        prompts = [prompts[0]] * (args.num_extend + 1)

    client = genai.Client()

    first_image = None
    if args.start_image:
        start_path = Path(args.start_image).expanduser().resolve()
        first_image = load_image(start_path)
        print(f"Using start image: {start_path}")

    last_image = None
    if args.end_image:
        end_path = Path(args.end_image).expanduser().resolve()
        last_image = load_image(end_path)
        print(f"Using end image: {end_path}")

    config_kwargs = {
        "resolution": args.resolution,
        "duration_seconds": args.duration,
        "aspect_ratio": args.aspect_ratio,
        "negative_prompt": args.negative_prompt,
        "number_of_videos": args.number_of_videos,
    }
    # last_frame is only passed when an end image was supplied.
    if last_image is not None:
        config_kwargs["last_frame"] = last_image
    config = types.GenerateVideosConfig(**config_kwargs)

    # Initial generation
    print("Generating initial video...")
    gen_kwargs = {"model": args.model, "prompt": prompts[0], "config": config}
    if first_image is not None:
        gen_kwargs["image"] = first_image
    operation = client.models.generate_videos(**gen_kwargs)

    # Poll the long-running operation until it completes.
    started_at = time.time()
    while not operation.done:
        elapsed_seconds = int(time.time() - started_at)
        print(f"Waiting for video generation... elapsed: {elapsed_seconds}s")
        time.sleep(args.poll_seconds)
        operation = client.operations.get(operation)

    # A finished operation can carry an error instead of a response.
    if operation.response is None:
        err = getattr(operation, "error", None)
        print(f"API returned no response. Error: {err}", file=sys.stderr)
        return 2
    generated = operation.response.generated_videos
    if not generated:
        print("No videos returned by API.", file=sys.stderr)
        return 2

    out_dir = Path(args.output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    base_name = args.name
    saved_files = []

    # Save initial video as _1 (when extending, only first is used; when not, save all)
    if args.num_extend > 0:
        video_obj = generated[0].video
        client.files.download(file=video_obj)
        out_path = out_dir / f"{base_name}_1.mp4"
        video_obj.save(str(out_path))
        strip_audio(out_path)
        saved_files.append(str(out_path.resolve()))
        print(f"Saved video: {out_path.resolve()}")
    else:
        for idx, item in enumerate(generated, start=1):
            video_obj = item.video
            client.files.download(file=video_obj)
            out_path = out_dir / f"{base_name}_{idx}.mp4"
            video_obj.save(str(out_path))
            strip_audio(out_path)
            saved_files.append(str(out_path.resolve()))
            print(f"Saved video: {out_path.resolve()}")

    # Extend num_extend times (only extends the first video; each stage saved as _2, _3, ...)
    for ext_idx in range(args.num_extend):
        print(f"Extending video ({ext_idx + 1}/{args.num_extend})...")
        # `generated` is rebound each iteration, so this always extends the
        # most recent stage's output.
        video_to_extend = generated[0].video
        client.files.download(file=video_to_extend)
        extend_config = types.GenerateVideosConfig(
            number_of_videos=1,
            resolution=args.resolution,
        )
        operation = client.models.generate_videos(
            model=args.model,
            video=video_to_extend,
            prompt=prompts[ext_idx + 1],
            config=extend_config,
        )

        started_at = time.time()
        while not operation.done:
            elapsed_seconds = int(time.time() - started_at)
            print(f"Waiting for extension... elapsed: {elapsed_seconds}s")
            time.sleep(args.poll_seconds)
            operation = client.operations.get(operation)

        if operation.response is None:
            err = getattr(operation, "error", None)
            print(f"Extension API returned no response. Error: {err}", file=sys.stderr)
            return 2
        generated = operation.response.generated_videos
        if not generated:
            print("No videos returned by extension API.", file=sys.stderr)
            return 2

        # Save this extended video as _2, _3, _4, etc.
        video_idx = ext_idx + 2
        video_obj = generated[0].video
        client.files.download(file=video_obj)
        out_path = out_dir / f"{base_name}_{video_idx}.mp4"
        video_obj.save(str(out_path))
        strip_audio(out_path)
        saved_files.append(str(out_path.resolve()))
        print(f"Saved video: {out_path.resolve()}")

    # NOTE(review): approximate — extension segment length may differ from
    # the initial duration depending on the API; verify against actual output.
    final_duration_approx = args.duration * (args.num_extend + 1)
    metadata_path = out_dir / f"{base_name}.json"
    metadata = {
        "prompts": prompts,
        "model": args.model,
        "config": {
            "resolution": args.resolution,
            "duration_seconds": args.duration,
            "num_extend": args.num_extend,
            "final_duration_approx_seconds": final_duration_approx,
            "aspect_ratio": args.aspect_ratio,
            "negative_prompt": args.negative_prompt,
            "number_of_videos": args.number_of_videos,
            "poll_seconds": args.poll_seconds,
            "start_image": str(Path(args.start_image).expanduser().resolve()) if args.start_image else None,
            "end_image": str(Path(args.end_image).expanduser().resolve()) if args.end_image else None,
        },
        "saved_videos": saved_files,
    }
    metadata_path.write_text(json.dumps(metadata, indent=2), encoding="utf-8")
    print(f"Saved metadata: {metadata_path.resolve()}")
    print(f"Final length (approx): {final_duration_approx}s")

    return 0


if __name__ == "__main__":
    raise SystemExit(main())
generate_lyrics.sh ADDED
@@ -0,0 +1,46 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash
# Generate a lyrics title image from a text prompt plus a reference image,
# by delegating to gen_image_image_cond.py.
# Requires: GEMINI_API_KEY in the environment.
set -euo pipefail

# Reference image used as the style/content condition.
input_image_path=/Users/lehongwu/Projects/others/lyrics/VideoGeneration/contents_zhuyu/qilin_example.png

# Route API traffic through a local proxy unless one is already configured
# (same default-preserving pattern as generate_lyrics_batch.sh).
export http_proxy="${http_proxy:-http://127.0.0.1:7890}"
export https_proxy="${https_proxy:-http://127.0.0.1:7890}"

if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  # Diagnostics belong on stderr, not stdout.
  echo "Error: GEMINI_API_KEY is not set." >&2
  echo 'Run: export GEMINI_API_KEY="your_api_key"' >&2
  exit 1
fi

# Timestamped output directory so repeated runs never overwrite each other.
datetime=$(date +%m%d%H%M%S)
name=gen_lyrics
output_dir="output_image/${name}_${datetime}"

prompt="
Put chinese characters '雪白的天色' in the center of the image.
The sizes of each character should be consistent, and similar with the original image.
Follow the font style of the original image. Black calligraphy on white background.
"

python gen_image_image_cond.py \
  --prompt "$prompt" \
  --input-image-path "$input_image_path" \
  --model gemini-3.1-flash-image-preview \
  --aspect-ratio 16:9 \
  --resolution 2K \
  --number-of-images 1 \
  --name "$name" \
  --output-dir "$output_dir"
generate_lyrics_batch.sh ADDED
@@ -0,0 +1,48 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash
# Batch-generate lyric images: one image per selected row of a lyrics file,
# conditioned on a reference image. Delegates to gen_lyrics_batch.py.
set -euo pipefail

# --- Configuration (edit as needed) ---
script_dir="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
lyrics_file="/Users/lehongwu/Projects/others/lyrics/VideoGeneration/contents_zhuyu/zhuyu_lyrics_v1.txt"
input_image_path="/Users/lehongwu/Projects/others/lyrics/VideoGeneration/output_image/midian_example_0312135300/midian_example_2.png"
model="gemini-3.1-flash-image-preview"
aspect_ratio="16:9"
resolution="1080p"

# Output dir (default: output_image/gen_lyrics_batch_<timestamp>)
datetime=$(date +%m%d%H%M%S)
output_dir="${script_dir}/output_image/gen_lyrics_batch_${datetime}"

# Specific row IDs to generate (empty = all). e.g. row_ids="1 5 10"
row_ids="48 49 50 51 52 53 54 55 56 57"

# Proxy (optional): only applied when not already set in the environment.
export http_proxy="${http_proxy:-http://127.0.0.1:7890}"
export https_proxy="${https_proxy:-http://127.0.0.1:7890}"

if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  echo "Error: GEMINI_API_KEY is not set."
  echo 'Run: export GEMINI_API_KEY="your_api_key"'
  exit 1
fi

if [[ ! -f "$lyrics_file" ]]; then
  echo "Error: Lyrics file not found: $lyrics_file"
  exit 1
fi

echo "Lyrics file: $lyrics_file"
echo "Input image: $input_image_path"
echo "Output dir: $output_dir"
if [[ -n "$row_ids" ]]; then
  echo "Row IDs: $row_ids"
fi
echo ""

# Build the command as an array so quoted paths survive intact.
cmd=(python "$script_dir/gen_lyrics_batch.py"
  --lyrics-file "$lyrics_file"
  --input-image-path "$input_image_path"
  --output-dir "$output_dir"
  --model "$model"
  --aspect-ratio "$aspect_ratio"
  --resolution "$resolution")
if [[ -n "$row_ids" ]]; then
  # Intentional word-splitting: row_ids is a space-separated list of IDs.
  # shellcheck disable=SC2086
  cmd+=(--row-ids $row_ids)
fi
"${cmd[@]}"
image_super_resolution.sh ADDED
@@ -0,0 +1,45 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash
# Upscale / clean up a lyrics image via gen_image_image_cond.py.
# Requires: GEMINI_API_KEY in the environment.
set -euo pipefail

input_image_path=/Users/lehongwu/Projects/others/lyrics/VideoGeneration/contents_zhuyu/midian_example.png

export http_proxy="http://127.0.0.1:7890"
export https_proxy="http://127.0.0.1:7890"

if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  # Diagnostics belong on stderr, not stdout.
  echo "Error: GEMINI_API_KEY is not set." >&2
  echo 'Run: export GEMINI_API_KEY="your_api_key"' >&2
  exit 1
fi

# Timestamped output directory so repeated runs never overwrite each other.
datetime=$(date +%m%d%H%M%S)
name=super_resolution
output_dir="output_image/${name}_${datetime}"

# BUG FIX: the inner double quotes around 潮湿的路上 previously terminated and
# reopened the string, so the quote characters were silently dropped from the
# prompt. Escaping them keeps the literal quotes in the text sent to the model.
prompt="
Make it higher resolution. Extract black characters on white background, but style unchanged.
Only keep the second row of text \"潮湿的路上\" and place it in the center of the image.
"

python gen_image_image_cond.py \
  --prompt "$prompt" \
  --input-image-path "$input_image_path" \
  --model gemini-3.1-flash-image-preview \
  --aspect-ratio 16:9 \
  --resolution 1080p \
  --number-of-images 1 \
  --name "$name" \
  --output-dir "$output_dir"
run_gen_image_image_cond.sh ADDED
@@ -0,0 +1,42 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash
# Edit an existing image with a text prompt via gen_image_image_cond.py.
set -euo pipefail

# Frame extracted from an earlier video run; used as the edit source.
input_image_path=/Users/lehongwu/Projects/others/lyrics/VideoGeneration/output_video/leaves_video_0309184011/debug_frame6.png

export http_proxy="http://127.0.0.1:7890"
export https_proxy="http://127.0.0.1:7890"

if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  echo "Error: GEMINI_API_KEY is not set."
  echo 'Run: export GEMINI_API_KEY="your_api_key"'
  exit 1
fi

# Per-run timestamp keeps output directories unique.
run_stamp=$(date +%m%d%H%M%S)
name=flowers_to_leaves
output_dir="output_image/${name}_${run_stamp}"

prompt="Change the background to gold and with soft sun glows from top."

python gen_image_image_cond.py \
  --prompt "$prompt" \
  --input-image-path "$input_image_path" \
  --model gemini-3.1-flash-image-preview \
  --aspect-ratio 16:9 \
  --resolution 2K \
  --number-of-images 1 \
  --name "$name" \
  --output-dir "$output_dir"
run_gen_image_prompt_only.sh ADDED
@@ -0,0 +1,36 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash
# Generate an image from a text prompt only, via gen_image_prompt_only.py.
# Requires: GEMINI_API_KEY in the environment.
set -euo pipefail

export http_proxy="http://127.0.0.1:7890"
export https_proxy="http://127.0.0.1:7890"

if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  echo "Error: GEMINI_API_KEY is not set." >&2
  echo 'Run: export GEMINI_API_KEY="your_api_key"' >&2
  exit 1
fi

# Timestamped output directory so repeated runs never overwrite each other.
datetime=$(date +%m%d%H%M%S)
name=leaves
output_dir="output_image/${name}_${datetime}"

prompt="
Autumn leaves drifting on the black background, but the locations can follow the original image.
The autumn leaves should look diverse in shapes, sizes, and colors should be red and gold.
Not all leaves are facing the camera, instead, they are in random directions as if drifting in the wind.
But overall, the leaves should not be too dense or too large.
"

# FIX: $name and $output_dir are now quoted (they were bare expansions, which
# would word-split/glob if either ever contained spaces or metacharacters),
# matching how every sibling script passes these arguments.
python gen_image_prompt_only.py \
  --prompt "$prompt" \
  --model gemini-3.1-flash-image-preview \
  --aspect-ratio 16:9 \
  --resolution 2K \
  --number-of-images 1 \
  --name "$name" \
  --output-dir "$output_dir"
run_gen_video_image_start_end.sh ADDED
@@ -0,0 +1,30 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash
# Animate a single still frame into a looping video via gen_video_image_start_end.py.
set -euo pipefail

start_image_path=/Users/lehongwu/Projects/others/lyrics/VideoGeneration/contents_zhuyu/bg3_frame0.png

export http_proxy="http://127.0.0.1:7890"
export https_proxy="http://127.0.0.1:7890"

if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  echo "Error: GEMINI_API_KEY is not set."
  echo 'Run: export GEMINI_API_KEY="your_api_key"'
  exit 1
fi

# Per-run timestamp keeps output directories unique.
run_stamp=$(date +%m%d%H%M%S)
name=img_cond_start_end
output_dir="output/${name}_${run_stamp}"

prompt="
Transform this image into: The water flows slowly under the moonlight. Loop video.
"

# NOTE: gen_video_image_start_end.py expects the underscore flag --output_dir.
python gen_video_image_start_end.py \
  --prompt "$prompt" \
  --start-image-path "$start_image_path" \
  --resolution 4k \
  --duration 8 \
  --aspect-ratio 16:9 \
  --name "$name" \
  --output_dir "$output_dir"
run_gen_video_image_start_end_diff.sh ADDED
@@ -0,0 +1,38 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash
# Generate a morphing video between two different keyframes
# via gen_video_image_start_end.py.
set -euo pipefail

start_image_path=/Users/lehongwu/Projects/others/VideoGeneration/input/example.png
end_image_path=/Users/lehongwu/Projects/others/VideoGeneration/input/example_end.png

export http_proxy="http://127.0.0.1:7890"
export https_proxy="http://127.0.0.1:7890"

if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  echo "Error: GEMINI_API_KEY is not set."
  echo 'Run: export GEMINI_API_KEY="your_api_key"'
  exit 1
fi

# Per-run timestamp keeps output directories unique.
run_stamp=$(date +%m%d%H%M%S)
name=img_cond_diff
output_dir="output/${name}_${run_stamp}"

prompt="
A cinematic transition where the scene smoothly morphs from the first image to the last image,
with fluid motion, consistent lighting, and a natural, seamless progression.
"

# NOTE: gen_video_image_start_end.py expects the underscore flag --output_dir.
python gen_video_image_start_end.py \
  --prompt "$prompt" \
  --start-image-path "$start_image_path" \
  --end-image-path "$end_image_path" \
  --resolution 4k \
  --duration 8 \
  --aspect-ratio 16:9 \
  --name "$name" \
  --output_dir "$output_dir"
run_gen_video_prompt_only.sh ADDED
@@ -0,0 +1,35 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash
# Generate a video from a text prompt only, via gen_video_prompt_only.py.
# Requires: GEMINI_API_KEY in the environment.
set -euo pipefail

export http_proxy="http://127.0.0.1:7890"
export https_proxy="http://127.0.0.1:7890"

if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  echo "Error: GEMINI_API_KEY is not set." >&2
  echo 'Run: export GEMINI_API_KEY="your_api_key"' >&2
  exit 1
fi

# Timestamped output directory so repeated runs never overwrite each other.
datetime=$(date +%m%d%H%M%S)
name=leaves_video
output_dir="output_video/${name}_${datetime}"

prompt="
Autumn leaves drifting slowly on the black background.
The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse.
Overall the leaves should be sparse and extremely small, because this serves as background of some video.
"

# FIX: $name and $output_dir are now quoted (they were bare expansions, which
# would word-split/glob if either ever contained spaces), matching the sibling
# scripts. NOTE: gen_video_prompt_only.py expects the underscore flag --output_dir.
python gen_video_prompt_only.py \
  --prompt "$prompt" \
  --resolution 720p \
  --duration 8 \
  --aspect-ratio 16:9 \
  --name "$name" \
  --output_dir "$output_dir"
run_gen_video_prompt_only_extend.sh ADDED
@@ -0,0 +1,49 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash
# Generate a long video by chaining extensions via gen_video_prompt_only_extend.py.
# Final length = duration * (num_extend + 1). Requires GEMINI_API_KEY.
set -euo pipefail

export http_proxy="http://127.0.0.1:7890"
export https_proxy="http://127.0.0.1:7890"

if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  echo "Error: GEMINI_API_KEY is not set." >&2
  echo 'Run: export GEMINI_API_KEY="your_api_key"' >&2
  exit 1
fi

datetime=$(date +%m%d%H%M%S)
name=leaves_video_emit
output_dir="output_video/${name}_${datetime}"

# How many times to extend. Final length = duration * (num_extend + 1)
num_extend=4

# Must have num_extend+1 prompts (initial + one per extension)
prompts=(
  "Autumn leaves drifting slowly on the black background. The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse. Overall the leaves should be sparse and extremely small. Start from a black image."
  "Autumn leaves drifting slowly on the black background. The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse. Overall the leaves should be sparse and extremely small. The density and speed of the leaves should be consistent."
  "Autumn leaves drifting slowly on the black background. The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse. Overall the leaves should be sparse and extremely small. The density and speed of the leaves should be consistent. More leaves are coming in the back."
  "Autumn leaves drifting slowly on the black background. The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse. More leaves. Even more leaves are coming in the back."
  "Autumn leaves drifting slowly on the black background. The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse. The density of leaves are consistent. Finally disappear and ends with a black image."
)

# FIX: the comment above stated the prompt-count requirement but nothing
# enforced it; fail fast instead of discovering a mismatch mid-generation.
if (( ${#prompts[@]} != num_extend + 1 )); then
  echo "Error: need $((num_extend + 1)) prompts (num_extend+1), got ${#prompts[@]}." >&2
  exit 1
fi

prompt_args=()
for p in "${prompts[@]}"; do
  prompt_args+=(--prompt "$p")
done

# video extension only supports 720p
# FIX: $num_extend, $name and $output_dir are now quoted (previously bare).
python gen_video_prompt_only_extend.py \
  "${prompt_args[@]}" \
  --resolution 720p \
  --duration 8 \
  --aspect-ratio 16:9 \
  --num-extend "$num_extend" \
  --start-image /Users/lehongwu/Projects/others/lyrics/VideoGeneration/output_video/leaves_video_emit_0310100712/leaves_video_debug_frame0.2.png \
  --end-image /Users/lehongwu/Projects/others/lyrics/VideoGeneration/output_video/leaves_video_emit_0310100712/leaves_video_debug_frame11.png \
  --name "$name" \
  --output_dir "$output_dir"
run_gen_video_prompt_only_extend_2.sh ADDED
@@ -0,0 +1,48 @@
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
1
#!/usr/bin/env bash
# Generate a long "falling leaves" video by chaining extensions via
# gen_video_prompt_only_extend.py. Final length = duration * (num_extend + 1).
# Requires GEMINI_API_KEY.
set -euo pipefail

export http_proxy="http://127.0.0.1:7890"
export https_proxy="http://127.0.0.1:7890"

if [[ -z "${GEMINI_API_KEY:-}" ]]; then
  echo "Error: GEMINI_API_KEY is not set." >&2
  echo 'Run: export GEMINI_API_KEY="your_api_key"' >&2
  exit 1
fi

datetime=$(date +%m%d%H%M%S)
name=leaves_video_drop
output_dir="output_video/${name}_${datetime}"

# How many times to extend. Final length = duration * (num_extend + 1)
num_extend=4

# Must have num_extend+1 prompts (initial + one per extension)
prompts=(
  "Autumn leaves falling down at a fixed speed. The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse. Change the background to gold and with soft sun glows from top. The dropping speed and leaves density should be consistent."
  "Autumn leaves falling down at a fixed speed. The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse. Change the background to gold and with soft sun glows from top. The dropping speed and leaves density should be consistent."
  "Autumn leaves falling down at a fixed speed. The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse. Change the background to gold and with soft sun glows from top. The dropping speed and leaves density should be consistent."
  "Autumn leaves falling down at a fixed speed. The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse. Change the background to gold and with soft sun glows from top. The dropping speed and leaves density should be consistent."
  "Autumn leaves falling down at a fixed speed. The leaves are of different shapes, colors ranging from red to gold, and distances to camera are diverse. Change the background to gold and with soft sun glows from top. The dropping speed and leaves density should be consistent."
)

# FIX: enforce the documented prompt-count requirement instead of failing
# (or silently misbehaving) mid-generation.
if (( ${#prompts[@]} != num_extend + 1 )); then
  echo "Error: need $((num_extend + 1)) prompts (num_extend+1), got ${#prompts[@]}." >&2
  exit 1
fi

prompt_args=()
for p in "${prompts[@]}"; do
  prompt_args+=(--prompt "$p")
done

# video extension only supports 720p
# FIX: $num_extend, $name and $output_dir are now quoted (previously bare).
python gen_video_prompt_only_extend.py \
  "${prompt_args[@]}" \
  --resolution 720p \
  --duration 8 \
  --aspect-ratio 16:9 \
  --num-extend "$num_extend" \
  --start-image /Users/lehongwu/Projects/others/lyrics/VideoGeneration/output_video/leaves_video_0309184011/debug_frame0.png \
  --end-image /Users/lehongwu/Projects/others/lyrics/VideoGeneration/output_video/leaves_video_0309184011/debug_frame6_gold.png \
  --name "$name" \
  --output_dir "$output_dir"
web/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+ # Web application package
web/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (151 Bytes). View file
 
web/backend/__init__.py ADDED
@@ -0,0 +1 @@
 
 
1
+ # Backend package
web/backend/__pycache__/__init__.cpython-312.pyc ADDED
Binary file (159 Bytes). View file
 
web/backend/__pycache__/config.cpython-312.pyc ADDED
Binary file (1.44 kB). View file
 
web/backend/__pycache__/deps.cpython-312.pyc ADDED
Binary file (776 Bytes). View file