"Sirena" - Music video by Mark DK Berry - workflows and process
created 12 days ago
character
inpainting
video
26 nodes


🎥 "Sirena" – Behind the Scenes

(see full music video here https://www.youtube.com/watch?v=r8V7WD2POIM )

Music: "Under The Water" by Mark DK Berry – Listen on Bandcamp - https://markdkberry.bandcamp.com/track/under-the-water-2

---

🧜‍♀️ About the Project

"Sirena" is my seventh AI music video, and for this one, I deliberately stepped out of my comfort zone to tackle something different: an underwater romance.

My main goal was to improve image and animation quality across the board. Unfortunately, despite giving myself extra time and effort, I didn't quite reach the level I'd hoped for. While hardware played a part, character consistency and my limited knowledge were the real bottlenecks.

---

⚠️ Key Challenges

1. Character Consistency

A nightmare. I trained Flux Loras, which were decent, but unless the shot is a full-face image, the characters don't stay consistent.

- Face swapping? A black art. Gave up after days of fiddling.

- Wan 2.1 Lora training: Took two days to get working and produced okay results, but it slowed my machine so much it became unusable.

I'm holding out hope for keyframing and video inpainting (like VACE) as future solutions.

2. Legs, Hair, Body Shape

Characters often morphed or warped, even from behind. Three shots had bad leg warping that I didn't have time to fix. The whole thing took 18 days, longer than I planned.

3. Clothing Consistency

I put real effort into this, but many outfits still didn't stick. The trick seems to be choosing the right starting image, something I'll refine next time.

4. Flux Tiling Artifacts

Only spotted these late in production. The culprit? Flux during upscaling. I patched some of it using SDXL upscale in Krita, but didn't have time to redo every clip.

5. Psychological Toll

Once you go past the 8–10 day mark, you're chasing diminishing returns. At 18 days, I was questioning whether I was going insane tweaking pixels no one would notice or care about.

---

🔧 Workflows & Tools

#### ComfyUI Workflows:

- Wan 2.1 workflow originated from - https://civitai.com/user/oscarchuncha654 . Still the best on my system after tweaking.

- For improving Wan 2.1 prompting check out - https://github.com/Wan-Video/Wan2.1/blob/main/wan/utils/prompt_extend.py

- Flux inpainting with Multi-Lora stacker.

- Krita + ACLY for inpainting and upscaling. I have shared the workflow as an exported API-format JSON, which I used for overnight batch runs from Python.

- Character creation (360°): - https://www.youtube.com/watch?v=8DRQenukHhk (I used the MickMumpitz Flux method too)

- Face swapping (Ace++): Worked OK but caused major issues: it destroyed my ComfyUI install and conflicted with Kijai nodes and Florence 2 updates.

- Tried 20+ other workflows. Most got dropped after testing.
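For anyone wanting to try the overnight batch approach mentioned above, here is a minimal sketch of how an API-format export could be queued from Python. It is not the script used for this video. It assumes ComfyUI is running locally on its default address and accepts jobs via its `/prompt` HTTP endpoint; the node ids (`"6"`, `"52"`), the prompt text and the image file name are hypothetical placeholders that would need to match your own exported JSON.

```python
import copy
import json
import urllib.request

# Assumption: ComfyUI running locally on its default port.
COMFY_URL = "http://127.0.0.1:8188/prompt"


def patch_workflow(workflow, overrides):
    """Return a copy of an API-format workflow with selected node inputs replaced.

    `overrides` maps a node id to a dict of {input name: new value}.
    The original workflow dict is left untouched so it can be reused per shot.
    """
    patched = copy.deepcopy(workflow)
    for node_id, inputs in overrides.items():
        patched[node_id]["inputs"].update(inputs)
    return patched


def build_request(workflow, url=COMFY_URL):
    """Wrap the workflow in the JSON body posted to ComfyUI's /prompt endpoint."""
    body = json.dumps({"prompt": workflow}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )


def queue_batch(workflow, shots, url=COMFY_URL):
    """Queue one job per shot; ComfyUI then works through its queue in order."""
    for overrides in shots:
        request = build_request(patch_workflow(workflow, overrides), url)
        with urllib.request.urlopen(request) as response:
            print(json.loads(response.read()))


# Tiny in-memory stand-in for an exported workflow (hypothetical node ids).
example_workflow = {
    "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "placeholder"}},
    "52": {"class_type": "LoadImage", "inputs": {"image": "placeholder.png"}},
}
example_shot = {
    "6": {"text": "a mermaid swims toward the camera"},
    "52": {"image": "shot_01.png"},
}
print(build_request(patch_workflow(example_workflow, example_shot)).get_method())
```

In practice you would load the exported JSON with `json.load`, build one override dict per shot (for example from the shot-tracking spreadsheet), and call `queue_batch` before going to bed.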

---

⏱️ Time Investment

- 13 days to reach the first rough cut (with lots of testing + research)

- 1 day colour grading

- 1 day fixing what I could

→ Total: 18 working days (and some nights running batch renders, and a number of additional days installing, re-installing, fixing broken installs, workflows, nodes, etc.)

---

💻 Hardware

- GPU: RTX 3060 (12GB VRAM)

- RAM: 32GB

- OS: Windows 10

All of it was done on a regular home PC.

---

🧰 Software Stack

- ComfyUI (Flux, Wan 2.1, inpainting models)

- Krita + ACLY plugin – Fast inpainting and upscaling

- Topaz – Used only for 16fps to 120fps interpolation (not enhancement)

- Reaper DAW – Storyboarding with shot names and timecode burned into MP4

- DaVinci Resolve 19 – Final cut and colour grade

- LibreOffice – Tracking shot names, prompts, colour themes, fixes, etc.

---

🎨 Loras Used & Trained

- Trained Flux Loras for both the fisherman and mermaid (10 images each; ~3 hours per Lora)

- Trained Wan 2.1 Lora on WSL2 with help from [The Art Official](https://www.youtube.com/@TheArt-OfficialTrainer); ultimately not usable due to i2v issues

- Flux Loras from Civitai used in the video (see example videos for their "looks"):

  - Tango Dancer - https://civitai.com/models/96785/andrew-atroshenko-style

  - Oil painting animation - https://civitai.com/models/757042/ob-oil-painting-with-bold-brushstrokes

  - Impressionism paint effect - https://civitai.com/models/545264/impressionism-sdxl-pony-flux

---

📺 Resolution & Rendering Details

- Flux output: 1344x768 (upscaled x2, then downscaled in Wan 2.1)

- Tested Wan 2.1 resolutions extensively:

  - Started with 848x480 (too soft)

  - Final choices: 1024x592 and 1344x768 (40–120 mins per clip, mostly run overnight in batches)

- Interpolated all clips to 120fps via Topaz (DaVinci Resolve Free allows up to 60fps)

- Tried to bring back an "analog" vibe in final colour grading
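The numbers above can be sanity-checked with a few lines of Python. This is only an illustrative sketch: the helper names are made up for this example, and the "divisible by 16" check is my assumption about what video models tend to prefer, not something tested as part of this project.

```python
from fractions import Fraction

# Resolutions named in the list above.
FLUX_SIZE = (1344, 768)
WAN_SIZES = [(848, 480), (1024, 592), (1344, 768)]


def describe(width, height, multiple=16):
    """Summarise a resolution: aspect ratio, pixel count, alignment to `multiple`."""
    return {
        "aspect": round(width / height, 3),
        "pixels": width * height,
        # Assumption: dimensions divisible by 16 are the safe choice.
        "aligned": width % multiple == 0 and height % multiple == 0,
    }


def upscaled(size, factor=2):
    """Size of the Flux image after the x2 upscale, before Wan 2.1 downscales it."""
    return (size[0] * factor, size[1] * factor)


def interpolation_factor(src_fps, dst_fps):
    """How many output frames the interpolator produces per source frame."""
    return Fraction(dst_fps, src_fps)


print("Flux upscaled:", upscaled(FLUX_SIZE))
for width, height in WAN_SIZES:
    print(f"{width}x{height}:", describe(width, height))
print("16fps -> 120fps factor:", float(interpolation_factor(16, 120)))
```

Comparing the `pixels` values gives a rough feel for why the larger sizes take so much longer per clip, and the non-integer interpolation factor shows that going from 16fps to 120fps is not a simple frame-doubling job.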

---

😵‍💫 Final Thoughts

The biggest hurdle was still character consistency. I trained Loras, tested face-swapping, tried everything, but nothing quite nailed it. Underwater scenes and low-res footage made things harder.

Prompting and camera direction were another headache. Wan 2.1 is better than Hunyuan, but not exactly "obedient." I tried short prompts, long prompts, and "3-sentence" tricks, with mixed results. Minor setting changes often broke everything.

By the end, I was feeling frustrated. I had hoped for more photorealism and tighter characters. Instead, the video still felt cartoonish (though that was partly intentional). Maybe I haven't fully mastered Flux yet, or maybe the tech just isn't quite there.

---

🙏 Extra Thanks To:

- Cullen Kelly – Great colour grading tutorials on YouTube

- The Art Official – Helped me get Wan training working in WSL2

"Time is the Enemy, Quality is the Battleground. Sacrifices must be made."

Built-in nodes
WanImageToVideo
ModelSamplingSD3
CFGZeroStar
Custom nodes
KSamplerAdvanced
ApplyTeaCachePatch
CLIPTextEncode
VAEDecodeTiled
VAELoader
CLIPLoader
CLIPVisionLoader
CLIPVisionEncode
LoadImage
easy cleanGpuUsed
LayerUtility: PurgeVRAM
ImageResizeKJ
PathchSageAttentionKJ
VHS_VideoCombine
ImageScale
RIFE VFI
VAEDecode
WanVideoTeaCacheKJ
UnetLoaderGGUF
SkipLayerGuidanceWanVideo
Custom files

These are separate files that the creator has uploaded for this workflow.

Flux-inpainting w Multi Stacked Lora.json

flux image creation w-loras.json

Krita_Acly-upscaling-workflow.json

Ace++ - Face Swap method.json
