🎥 "Sirena" - Behind the Scenes
(see full music video here https://www.youtube.com/watch?v=r8V7WD2POIM )
Music: "Under The Water" by Mark DK Berry. Listen on Bandcamp - https://markdkberry.bandcamp.com/track/under-the-water-2
---
🧜‍♀️ About the Project
"Sirena" is my seventh AI music video, and for this one, I deliberately stepped out of my comfort zone to tackle something different: an underwater romance.
My main goal was to improve image and animation quality across the board. Unfortunately, despite putting in extra time and effort, I didn't quite reach the level I'd hoped for. While hardware played a part, character consistency and my limited knowledge were the real bottlenecks.
---
⚠️ Key Challenges
1. Character Consistency
A nightmare. I trained Flux Loras, which were decent, but unless it's a full-face image, they don't stay consistent.
- Face swapping? A black art. Gave up after days of fiddling.
- Wan 2.1 Lora training: Took two days to get working and produced okay results, but it slowed my machine so much it became unusable.
I'm holding out hope for keyframing and video inpainting (like VACE) as future solutions.
2. Legs, Hair, Body Shape
Characters often morphed or warped, even from behind. Three shots had bad leg warping that I didn't have time to fix. The whole thing took 18 days, longer than I planned.
3. Clothing Consistency
I put real effort into this, but many outfits still didn't stick. The trick seems to be choosing the right starting image, something I'll refine next time.
4. Flux Tiling Artifacts
Only spotted these late in production. The culprit? Flux during upscaling. I patched some of it using SDXL upscale in Krita, but didn't have time to redo every clip.
5. Psychological Toll
Once you go past the 8–10 day mark, you're chasing diminishing returns. At 18 days, I was questioning whether I was going insane tweaking pixels no one would notice or care about.
---
🧠 Workflows & Tools
#### ComfyUI Workflows:
- Wan 2.1 workflow originated from - https://civitai.com/user/oscarchuncha654 . Still the best on my system after tweaking.
- For improving Wan 2.1 prompting check out - https://github.com/Wan-Video/Wan2.1/blob/main/wan/utils/prompt_extend.py
- Flux inpainting with Multi-Lora stacker.
- Krita + Acly plugin for inpainting and upscaling. I have shared the workflow as an exported API version, which I used for overnight Python batch runs.
- Character creation (360°): - https://www.youtube.com/watch?v=8DRQenukHhk (I used the MickMumpitz Flux method too)
- Face swapping (Ace++): Worked OK but caused major issues: it destroyed my ComfyUI install and conflicted with Kijai nodes and Florence 2 updates.
- Tried 20+ other workflows. Most got dropped after testing.
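
For anyone curious what an overnight batch run against an exported API workflow can look like, here is a minimal sketch. It is not the script used for this video. It assumes ComfyUI is running locally on its default address, that the workflow was exported in API format, and that the node whose ID you pass in has a `text` input (as a CLIPTextEncode node does). The function names, the node ID, and the file name are all hypothetical.

```python
import copy
import json
import urllib.request

# Assumption: ComfyUI is running locally on its default port.
COMFY_URL = "http://127.0.0.1:8188/prompt"


def make_jobs(workflow, prompts, text_node_id):
    """Return one copy of the API-format workflow per prompt text."""
    jobs = []
    for text in prompts:
        job = copy.deepcopy(workflow)  # never mutate the loaded template
        # Assumption: this node has a "text" input (e.g. CLIPTextEncode).
        job[text_node_id]["inputs"]["text"] = text
        jobs.append(job)
    return jobs


def queue_job(job, url=COMFY_URL):
    """POST one workflow to ComfyUI's queue and return the parsed JSON reply."""
    body = json.dumps({"prompt": job}).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())


def run_batch(workflow_path, prompts, text_node_id):
    """Load an exported workflow and queue one render per prompt."""
    with open(workflow_path, encoding="utf-8") as f:
        workflow = json.load(f)
    return [queue_job(job) for job in make_jobs(workflow, prompts, text_node_id)]
```

Something like `run_batch("workflow_api.json", ["shot 01 prompt", "shot 02 prompt"], "6")` would then queue two renders and return straight away, since ComfyUI works through its queue on its own. Check the node ID against your own exported JSON, because it differs from workflow to workflow.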
---
⏱️ Time Investment
- 13 days to reach the first rough cut (with lots of testing + research)
- 1 day colour grading
- 1 day fixing what I could
Total: 18 working days (and some nights running batch renders, and a number of additional days installing, re-installing, fixing broken installs, workflows, nodes, etc.)
---
💻 Hardware
- GPU: RTX 3060 (12GB VRAM)
- RAM: 32GB
- OS: Windows 10
All of it was done on a regular home PC.
---
🧰 Software Stack
- ComfyUI (Flux, Wan 2.1, inpainting models)
- Krita + Acly plugin: Fast inpainting and upscaling
- Topaz: Used only for 16fps to 120fps interpolation (not enhancement)
- Reaper DAW: Storyboarding with shot names and timecode burned into MP4
- DaVinci Resolve 19: Final cut and colour grade
- LibreOffice: Tracking shot names, prompts, colour themes, fixes, etc.
---
🎨 Loras Used & Trained
- Trained Flux Loras for both the fisherman and mermaid (10 images each; ~3 hours per Lora)
- Trained Wan 2.1 Lora on WSL2 with help from [The Art Official](https://www.youtube.com/@TheArt-OfficialTrainer); ultimately not usable due to i2v issues
- Flux Loras from Civitai used in the video (see example videos for their "looks"):
  - Tango Dancer - https://civitai.com/models/96785/andrew-atroshenko-style
  - Oil painting animation - https://civitai.com/models/757042/ob-oil-painting-with-bold-brushstrokes
  - Impressionism paint effect - https://civitai.com/models/545264/impressionism-sdxl-pony-flux
---
📺 Resolution & Rendering Details
- Flux output: 1344x768 (upscaled x2, then downscaled in Wan 2.1)
- Tested Wan 2.1 resolutions extensively:
  - Started with 848x480 (too soft)
  - Final choices: 1024x592 and 1344x768 (40–120 mins per clip, mostly run overnight in batches)
- Interpolated all clips to 120fps via Topaz (DaVinci Resolve Free allows up to 60fps)
- Tried to bring back an "analog" vibe in final colour grading
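
To put the numbers above in perspective, here is a small arithmetic sketch. It compares the pixel counts of the three Wan 2.1 resolutions and works out the 16fps to 120fps interpolation factor. The helper names and the 80-frame clip length are made up for illustration. Pixel count is only a rough proxy for render cost, not a measurement from this project.

```python
from fractions import Fraction

# Wan 2.1 resolutions mentioned in the list above.
RESOLUTIONS = [(848, 480), (1024, 592), (1344, 768)]


def pixel_count(width, height):
    """Pixels per frame at the given resolution."""
    return width * height


def interpolation_factor(source_fps=16, target_fps=120):
    """Exact ratio of output frames to source frames."""
    return Fraction(target_fps, source_fps)


def interpolated_frames(source_frames, source_fps=16, target_fps=120):
    """Frames needed to cover the same duration at the target frame rate."""
    duration = Fraction(source_frames, source_fps)  # clip length in seconds
    return round(duration * target_fps)


base = pixel_count(*RESOLUTIONS[0])
for width, height in RESOLUTIONS:
    pixels = pixel_count(width, height)
    print(f"{width}x{height}: {pixels:,} px per frame, "
          f"{pixels / base:.2f}x the 848x480 count")

# Hypothetical 80-frame clip: 5 seconds at 16fps.
print(float(interpolation_factor()), interpolated_frames(80))
```

1024x592 carries roughly 1.5x the pixels of 848x480 and 1344x768 roughly 2.5x, which goes some way to explaining the long overnight renders. The interpolation factor is 7.5, so most of the 120fps frames are synthesised in between the original frames.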
---
😵‍💫 Final Thoughts
The biggest hurdle was still character consistency. I trained Loras, tested face-swapping, tried everything, but nothing quite nailed it. Underwater scenes and low-res footage made things harder.
Prompting and camera direction were another headache. Wan 2.1 is better than Hunyuan, but not exactly "obedient." I tried short prompts, long prompts, and "3-sentence" tricks, with mixed results. Minor setting changes often broke everything.
By the end, I was feeling frustrated. I had hoped for more photorealism and tighter characters. Instead, the video still felt cartoonish (though that was partly intentional). Maybe I haven't fully mastered Flux yet, or maybe the tech just isn't quite there.
---
🙏 Extra Thanks To:
- Cullen Kelly: Great colour grading tutorials on YouTube
- The Art Official: Helped me get Wan training working in WSL2
"Time is the Enemy, Quality is the Battleground. Sacrifices must be made."
These are separate files that the creator has uploaded for this workflow.
Flux-inpainting w Multi Stacked Lora.json
flux image creation w-loras.json
Krita_Acly-upscaling-workflow.json
Ace++ - Face Swap method.json