The result of this workflow is the "Cafe" music video, here: https://www.youtube.com/watch?v=x70IGT41QpY&list=PLVCJTJhkunkQSY_QZBMFclmB9-LXOi8WY&index=5
Follow my YT channel for future progress and workflows.
This was the first workflow I made, so the later ones are probably better; they improved over time.
Time Taken: 4 days in total, including learning new tools. I avoided the 3-month rabbit hole I fell into making "Fallen Angel" with Unreal Engine and Metahumans; this time I set a strict 5-day deadline. Rendering 2 seconds (the most my PC could handle) of 512x416 video at 24 fps took 5-8 minutes per render. Some prompts no amount of tweaking would fix, so I enforced strict time limits per shot to stay on schedule. Day 1 was for the main content, Day 2 for fixing ideas, Day 3 for tidying in DaVinci Resolve, and Day 4 for final edits and color grading (likely overdone; sorry, colorists).
Equipment & Tools:
Software: ComfyUI (portable, free) with Hunyuan text-to-video models (GGUF versions gave better results), DaVinci Resolve (free version), and FFmpeg for slowing clips and smoothing them with frame interpolation.
Hardware: A Windows 10 PC with an RTX 3060 (12GB VRAM). 512x416 resolution balanced quality against my PC's capabilities; bigger sizes caused issues, and smaller ones lost clarity.
Prompts worked best when kept simple, e.g., "hot female model in a red pencil dress walking away at an old English train station, realistic and cinematic, daytime."
Current Challenges: The AI could generate at most 2 seconds per prompt on my PC before it fell over, and prompts hit a character limit at around 350. The results were clear, but stretching 2 seconds to 8 via FFmpeg bought extra screen time at the cost of added blur and distortion.
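For anyone wanting to try the stretch step: the slow-down plus smoothing can be sketched with FFmpeg's standard setpts and minterpolate filters. This is a minimal illustration, not my exact command; the filenames and parameter values here are placeholders, and I generate a synthetic test clip so it runs without a real render.

```shell
# Stand-in for a 2-second 512x416 24 fps ComfyUI render (placeholder input)
ffmpeg -y -f lavfi -i "testsrc=size=512x416:rate=24:duration=2" input.mp4

# Stretch 2 s to 8 s: setpts=4.0*PTS slows playback 4x, then minterpolate
# synthesizes in-between frames at 24 fps using motion-compensated interpolation.
# This is where the blur/distortion mentioned above creeps in.
ffmpeg -y -i input.mp4 -vf "setpts=4.0*PTS,minterpolate=fps=24:mi_mode=mci" output.mp4
```

Less aggressive factors (e.g. setpts=2.0*PTS for a 4-second result) trade less added time for fewer interpolation artifacts.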