AI Style Transfer Video With Voice Replacement
Midjourney + Ebsynth + Voice.ai = Your Avatar on Acid
One of the problems I’m currently throwing myself at in the Jungle is creating some new video workflows with the usual tools and new AI tools that seem to pop up every day.
There’s a lot to keep up with, there will be a growing need for video content into 2023, and I’m but one cartoon. Can’t do it all.
So today we’re exploring a bit of what I’ve been working on and giving you a feel for how you might be able to use some of these tools for your own efforts.
Let’s start with not doxing your voice. Lots of people don’t care about this. But say you have a large podcast or YouTube following and want to deploy those skills in the Jungle where you can be yourself without worrying about the cancel culturalists coming after you.
Traditional voice changing software is something simple, like a vocoder effect that makes you sound like a robot, or a pitch shift, which can be easily reversed. On the video side, you could map a blur over your face, wear a physical mask, or just talk over slides instead of showing your physical location.
With the progress of AI absolutely flying, we have new options developing constantly.
Remember the crypto bull market and how frantic everything felt? Like constantly trying to drink out of a firehose in February? AI media technology is like that, except I see no reason for it to slow down anytime soon.
So buckle up, today we’re going to be putting together some traditional pieces of software with some bleeding edge tech to generate an AI video like this:
The bad news is there is not an easy way to do this yet and probably won’t be for a while. Most tools I come across are really boring corporate training video AI generators.
The good news is it is possible right now if you’re determined, and you’ll get a glimpse of what the future will be like as technologies like real-time AI voice changers and Neural Radiance Fields (NeRF) move along. Eventually the process here will be an app you log into where you can look and sound like whatever you want, Ready Player One style.
But we’re not there yet. So for now we grind it out.
Some paid software like Photoshop will make certain steps easier, but I’ve done my best to stick to free stuff as much as possible.
Voice.ai
To generate your AI voice, you will be using Voice.ai.
(I get a few free credits to test with if you use that link, small way to support the growth of the Jungle if you plan on trying the software yourself)
Unfortunately, it is Windows only right now. The Mac version is due in February, but I ended up just buying another cheap computer because I need this for some projects.
There’s a live mode and a record mode, limited to 15 seconds of audio with a watermark at the end. You simply press the record button and talk into your mic, press the record button again and let it process.
It then allows you to download the converted voice file and you can easily cut out the watermark or edit together clips if you don’t want to buy credits.
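If you go the no-credits route, the cutting and stitching can be scripted. Here is a rough sketch using only Python’s built-in wave module. It assumes your downloaded clips are uncompressed WAV files (convert them first if they are not). The function names and the tail_seconds value are mine, not anything from Voice.ai, so measure how long the watermark runs on your own clips.

```python
import wave


def trim_tail(src_path, dst_path, tail_seconds):
    """Copy a WAV file, dropping the last `tail_seconds` of audio.

    `tail_seconds` is a hypothetical value: listen to your own clip
    to find out how long the watermark at the end actually is.
    """
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        drop = int(tail_seconds * src.getframerate())
        keep = max(0, src.getnframes() - drop)
        frames = src.readframes(keep)
    with wave.open(dst_path, "wb") as dst:
        dst.setparams(params)
        dst.writeframes(frames)  # header frame count is corrected on close
    return keep


def join_clips(paths, dst_path):
    """Concatenate WAV clips that share channels, sample width and rate."""
    if not paths:
        raise ValueError("need at least one clip")
    params = None
    chunks = []
    for path in paths:
        with wave.open(path, "rb") as clip:
            p = clip.getparams()
            if params is None:
                params = p
            elif (p.nchannels, p.sampwidth, p.framerate) != (
                params.nchannels, params.sampwidth, params.framerate
            ):
                raise ValueError(f"{path} has a different audio format")
            chunks.append(clip.readframes(clip.getnframes()))
    with wave.open(dst_path, "wb") as out:
        out.setparams(params)
        for chunk in chunks:
            out.writeframes(chunk)
```

Run trim_tail on each 15 second clip, then hand the trimmed files to join_clips in the order you recorded them.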
There are a ton of voices to choose from. Click the white and purple icon below the avatar picture to select one. Quality varies, but they have all kinds of stuff from celebrities to famous cartoon characters. You’ll eat up your initial credit on the first voice so don’t pick something without an audio demo.
Take a video of yourself talking
You can do this at the same time as your audio for the best results, or lip sync to your recordings after the fact. Either way the next step of this involves taking a normal video of yourself with a webcam or phone. Get that on your computer for the next step.
Export a PNG Sequence
You need to export every single frame of your video to a PNG file for Ebsynth to analyze and paint over.
Adobe After Effects allows you to do this in the render menu. Instead of exporting an MP4 or whatever, just select PNG sequence and dump everything into a folder which we will later point Ebsynth to.
In Resolve it works a little differently: you need to load your video clip into the Fusion tab and use the Saver node to generate the PNGs. Not hard, but not super intuitive.
Here is a basic tutorial if you’re unfamiliar:
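Separately, if you have neither editor installed, the free command-line tool ffmpeg can also dump a video to numbered PNGs. That is not what I used above, so treat this as an alternative sketch. The little Python helper below only builds the command for you to inspect. The file names, the frame prefix and the five-digit padding are placeholder assumptions, and you need ffmpeg installed to actually run what it prints.

```python
import shlex
from pathlib import Path


def png_export_command(video_path, out_dir, prefix="frame", digits=5):
    """Build an ffmpeg command that writes every frame as a numbered PNG.

    `prefix` and `digits` are illustrative defaults; %05d makes ffmpeg
    number the files frame_00001.png, frame_00002.png, and so on.
    """
    pattern = str(Path(out_dir) / f"{prefix}_%0{digits}d.png")
    return ["ffmpeg", "-i", str(video_path), pattern]


# Hypothetical file names: print the command so you can check it first.
cmd = png_export_command("talking_head.mp4", "frames")
print(shlex.join(cmd))
```

Create the output folder before running the command, since ffmpeg will not make it for you.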
Paint Your Keyframe
Now we have a few options. We need to pick a keyframe from the PNG sequence you exported, then paint a style over that keyframe, then Ebsynth will try to apply that to every frame.
This works best with a video that doesn’t have too much motion and where all the elements are visible in your keyframe.
So for example if you have a keyframe of you standing on a basketball court and put a style on that, but 2 seconds later another dude jumps into the shot, Ebsynth is not going to know what to do so things will get distorted and tear apart.
The fun part? You Can Literally Just Paint.
You can trace over your keyframe in Photoshop or Canva and the Ebsynth animation makes these Wojak-looking cartoons that are kind of funny.
I wanted to go a little further, so I generated an image in Midjourney, then used Photoshop to shrink and align that image to my face.
To be clear, I look nothing like this AI woman. I just lined up the eyes and mouth, then decided to let the chips fall where they may. Took a couple tries of generating images in Midjourney to get something with the right proportions.
I then masked out the rest of the room and used a combination of Content-Aware Fill and the clone stamp tool to fill in the background with some blur and smudge to cover up the hard edges that remained.
Save the project file in case you want to make edits or paint new keyframes later, then export your painted keyframe as a PNG named with the same number as the frame you painted. Your original frame will be something like your_frame_name_00101.png, so save your keyframe as 00101.png so Ebsynth knows which frame to start from.
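The naming rule is easy to get wrong by hand, so here is a tiny sketch that derives the keyframe name from the original frame’s file name. It assumes your frame names end in the frame number right before the extension, like the example above. The function name is my own.

```python
import re
from pathlib import Path


def keyframe_name(frame_filename):
    """Return the keyframe file name for a given exported frame.

    Assumes the frame number is the run of digits at the very end of
    the name, just before the extension.
    """
    stem = Path(frame_filename).stem
    match = re.search(r"(\d+)$", stem)
    if match is None:
        raise ValueError(f"no trailing frame number in {frame_filename!r}")
    return match.group(1) + ".png"


print(keyframe_name("your_frame_name_00101.png"))  # 00101.png
```

The zero padding is kept as-is, which matters because the number has to match the frame exactly.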
Ebsynth!
You should now have a folder with a PNG for every frame in your original video and a painted keyframe.
Download and open Ebsynth (it’s free for now). Drag your keyframe image onto the keyframe field and it should fill in the project directory automatically. Then drag the folder with your PNGs into the field labeled video.
Choose an output path where you want Ebsynth to dump the rendered frames.
Note: Your keyframe and video frames must all be the same size or you’ll get an error. If your video is 1920x1080 resolution, your keyframe can’t be anything else. Most common mistake here will be exporting to something like 1280x720. If this happens, resize your keyframe and try again.
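You can catch a size mismatch before Ebsynth does. The sketch below reads the width and height straight out of each PNG’s header with plain Python, no image library needed, and lists any frame that does not match the keyframe. The function names and folder layout are assumptions for illustration.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(path):
    """Read (width, height) from the IHDR chunk at the start of a PNG."""
    with open(path, "rb") as f:
        header = f.read(24)
    if (
        len(header) < 24
        or header[:8] != PNG_SIGNATURE
        or header[12:16] != b"IHDR"
    ):
        raise ValueError(f"{path} does not look like a PNG")
    # Width and height are two big-endian 32-bit integers after "IHDR".
    return struct.unpack(">II", header[16:24])


def find_size_mismatches(keyframe_path, frames_dir):
    """List (file name, size) for frames whose size differs from the keyframe."""
    expected = png_size(keyframe_path)
    mismatches = []
    for frame in sorted(Path(frames_dir).glob("*.png")):
        size = png_size(frame)
        if size != expected:
            mismatches.append((frame.name, size))
    return mismatches
```

An empty list from find_size_mismatches means the keyframe and every frame in the folder agree on resolution.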
Click the Synth button and wait. Since this will take some time, I recommend waiting 2-3 minutes, then checking your rendered frames to make sure nothing is way off. You can even drag the rendered frames into your editing software to watch it play back and get a feel for the motion so far, then decide if you want to proceed with the rest of the render.
Combining The Rendered PNG Frames to Video
Once it is done, simply highlight all of your rendered frames and drag them into a DaVinci Resolve timeline. Resolve will see that it’s a numbered PNG sequence and automatically combine them into a single clip like a normal video file.
You can then set your In and Out points and render out an mp4 or mov file like any other video.
In this case, there was a problem that may affect you too. My rendered video was longer than my original audio file. How is this possible? The frame rate was off: the same frames played back at 24 fps run longer than they do at 30 fps. I had to change my sequence from 24 fps to 30 fps. Common mistake.
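The math behind this is just frames divided by frames per second. Here is a quick sketch you can use to sanity check your own sequence. The 450 frame count is a made-up example, not my actual clip, and the list of candidate frame rates is just the common ones.

```python
def clip_seconds(frame_count, fps):
    """Playback length in seconds for a frame sequence at a given rate."""
    return frame_count / fps


def best_fps(frame_count, audio_seconds,
             candidates=(23.976, 24, 25, 29.97, 30, 50, 59.94, 60)):
    """Pick the common frame rate whose playback length is closest to the audio."""
    return min(candidates,
               key=lambda fps: abs(frame_count / fps - audio_seconds))


# Hypothetical numbers: 450 rendered frames against a 15 second audio file.
print(clip_seconds(450, 24))  # 18.75 seconds, too long
print(clip_seconds(450, 30))  # 15.0 seconds, matches the audio
print(best_fps(450, 15.0))    # 30
```

If the closest candidate is still noticeably off from your audio length, frames were probably dropped or duplicated somewhere in the export rather than the timeline rate being wrong.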
Ways To Improve This
As you can see from the animation, my Midjourney maiden completely loses her eyes by the end of the video.
To fix this, I’d need to go back to the frame where it starts to get screwed up, paint that as a keyframe, then re-render from that point and cut the two videos together as smoothly as possible. Super time consuming, but if quality is mission critical you do what you have to do.
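To keep track of which frames each keyframe is responsible for, something like this sketch can help with planning. It simply hands every keyframe the frames from its own number up to the frame before the next keyframe, with the first keyframe also covering anything before it. That split is my own simplification of the re-render-and-cut approach, and the frame numbers in the example are invented.

```python
def plan_segments(first_frame, last_frame, keyframes):
    """Return (keyframe, start, end) tuples covering the whole sequence.

    Each keyframe covers frames from itself up to the frame before the
    next keyframe; the first one also covers any earlier frames.
    """
    keys = sorted(set(keyframes))
    if not keys:
        raise ValueError("need at least one keyframe")
    if keys[0] < first_frame or keys[-1] > last_frame:
        raise ValueError("keyframe outside the frame range")
    segments = []
    for i, key in enumerate(keys):
        start = first_frame if i == 0 else key
        end = keys[i + 1] - 1 if i + 1 < len(keys) else last_frame
        segments.append((key, start, end))
    return segments


# Hypothetical sequence: frames 1-450, things fall apart around frame 380.
for key, start, end in plan_segments(1, 450, [101, 380]):
    print(f"keyframe {key:05d}: render frames {start}-{end}")
```

Each tuple is one render pass: point Ebsynth at that keyframe, keep the frames in its range, and cut the passes together in your editor.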
The other thing you can do is clean up the voice. Plugins like iZotope’s de-noiser are hugely helpful, along with basic EQ and de-essing, to get a more pleasing result with less noise.
You can also combine Voice.ai with [redacted_AI_audio_tool.exe] which cleans it up automatically but
I’ll do my best to assist in the comments if you get stuck or have questions, suggestions, etc.