ComfyUI workflow for "Cafe" music video
created a month ago
character
upscale
video
lora
43 nodes


Credits
latentdream (Civitai) provided the underlying workflow
Description

The result of this workflow can be seen in the "Cafe" music video here: https://www.youtube.com/watch?v=x70IGT41QpY&list=PLVCJTJhkunkQSY_QZBMFclmB9-LXOi8WY&index=5

Follow my YouTube channel for future progress and workflows.

This was the first workflow I made, so my later ones are probably better, as they improved over time.

Time Taken: It took 4 days, including learning new tools. I avoided the 3-month rabbit hole I fell into making “Fallen Angel” with Unreal Engine and MetaHumans. This time, I stuck to a strict timeline of 5 days. Rendering 2 seconds of 512x416 video at 24 fps (the most my PC could manage) took 5–8 minutes per render. Many prompts produced results that no amount of tweaking would fix, so I kept to strict time limits to get it done within the 5 days. Day 1 was for main content, Day 2 for fixing ideas, Day 3 for tidying in DaVinci Resolve, and Day 4 for final edits and color grading (likely overdone; sorry, colorists).
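To put the render figures above in perspective, here is a rough back-of-envelope sketch in Python. It only uses the numbers quoted above (2 seconds, 24 fps, 5–8 minutes per render); the assumption that a 2-second clip is exactly 48 frames is mine, and the function names are made up for illustration.

```python
# Rough throughput estimate from the figures quoted in the description.
# Assumption: a 2-second clip at 24 fps is treated as exactly 48 frames.
FPS = 24
CLIP_SECONDS = 2
RENDER_MINUTES = (5, 8)  # best case, worst case per render

FRAMES_PER_CLIP = FPS * CLIP_SECONDS


def seconds_per_frame(render_minutes: float) -> float:
    """Wall-clock seconds spent rendering each frame."""
    return render_minutes * 60 / FRAMES_PER_CLIP


def footage_per_hour(render_minutes: float) -> float:
    """Seconds of raw footage produced per hour of rendering."""
    return (60 / render_minutes) * CLIP_SECONDS


for minutes in RENDER_MINUTES:
    print(
        f"{minutes} min/render: "
        f"{seconds_per_frame(minutes):.2f} s per frame, "
        f"{footage_per_hour(minutes):.0f} s of footage per hour"
    )
```

In other words, an hour of rendering gives somewhere between 15 and 24 seconds of raw footage before any failed prompts are thrown away, which is why the strict time limits mattered.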

Equipment & Tools: Software: ComfyUI (portable, free) with Hunyuan text-to-video models (GGUF for better results), DaVinci Resolve (free version), and FFmpeg for slowing clips and smoothing interpolation. Hardware: A Windows 10 PC with an RTX 3060 (12GB VRAM). A resolution of 512x416 balanced quality against my PC's capabilities: bigger sizes caused issues, and smaller ones lost clarity. Prompts worked best when kept simple, e.g., “hot female model in a red pencil dress walking away at an old English train station, realistic and cinematic, daytime.”

Current Challenges: The AI could generate at most 2 seconds per prompt on my PC before it fell over, and prompts hit a limit at around 350 characters. While the results were clear, stretching 2 seconds to 8 via FFmpeg worked to buy time but added blur and distortion.
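For anyone wanting to try the 2-seconds-to-8 stretch, the following is a minimal sketch of one way to do it, not necessarily the exact command used for the video. It assumes FFmpeg's `setpts` filter for the slowdown and its `minterpolate` filter for motion-interpolated in-between frames; the file names and the helper function are hypothetical. The Python code only builds and prints the command, it does not run FFmpeg.

```python
import shlex


def build_slowdown_cmd(src, dst, src_seconds=2.0, target_seconds=8.0, fps=24):
    """Build an FFmpeg command that stretches a clip and interpolates frames.

    Assumption: setpts + minterpolate is one common approach; the original
    description does not say which FFmpeg filters were actually used.
    """
    factor = target_seconds / src_seconds  # 8 / 2 = 4x slower
    video_filter = (
        f"setpts={factor}*PTS,"                 # stretch the timestamps
        f"minterpolate=fps={fps}:mi_mode=mci"   # synthesize frames in between
    )
    # -an drops any audio track, since the timing no longer matches.
    return ["ffmpeg", "-i", src, "-vf", video_filter, "-an", dst]


# Hypothetical file names, for illustration only.
cmd = build_slowdown_cmd("clip.mp4", "clip_slow.mp4")
print(shlex.join(cmd))
```

A 4x stretch means three out of every four output frames are synthesized rather than rendered, which fits the blur and distortion mentioned above; a smaller factor should show fewer artifacts.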

Built-in nodes
SamplerCustomAdvanced
BasicGuider
ModelSamplingSD3
FluxGuidance
BetaSamplingScheduler
EmptyHunyuanLatentVideo
RandomNoise
Custom nodes
ImageUpscaleWithModel
ImageScale
FILM VFI
UpscaleModelLoader
MathExpression|pysssss
GetImageSize+
VAEDecodeTiled
LatentUpscaleBy
BasicScheduler
VHS_VideoCombine
VHS_LoadVideo
UnetLoaderGGUF
UNETLoader
KSamplerSelect
VAELoader
LoraLoaderModelOnly
DualCLIPLoader
CLIPTextEncode