For my independent learning experience, I learned how to use Rogo Digital's Lipsync, an external plugin for Unity (like Playmaker). The plugin lets you sync an audio file with facial animations, bringing your characters to life with dialogue. It has some issues and a bit of a learning curve, but it is very powerful and has excellent developer support.
Lipsync works by mapping "phonemes," or specific sounds, to different facial movements. Sometimes different sounds are produced with the same mouth shape (e.g., M, B, and P). It works extremely well with Mixamo Fuse characters because these models come with facial blend shapes, making it easy to customize the mouth shape for each phoneme. In total, the plugin accounts for 10 different phonemes (including a closed-mouth rest state), which is enough to cover most sounds in the English language. Lipsync is so well integrated with Mixamo Fuse characters that it offers preset phoneme animations for them. I had to edit these slightly to get them to look right on my character, but overall the presets were very accurate. When setting these mouth shapes, there is an overlay showing what the mouth should look like, which would help if you were using custom-made characters.
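To give a sense of the underlying mechanism, here is a minimal Unity sketch of driving one mouth shape through a blend shape weight, which is how Fuse-style characters expose their facial poses. The component and blend shape names are placeholders I made up for illustration, not LipSync's own code.

```csharp
using UnityEngine;

// Minimal sketch (not LipSync's actual API): posing one phoneme mouth shape
// by setting a blend shape weight on the character's face mesh.
public class PhonemePose : MonoBehaviour
{
    // Hypothetical names -- on a real character the blend shape might be
    // called something like "MBP"; imported blend shapes use a 0-100 range.
    public SkinnedMeshRenderer face;
    public string blendShapeName = "MBP";
    [Range(0f, 100f)] public float weight = 100f;

    void LateUpdate()
    {
        // Look up the blend shape by name; returns -1 if the mesh lacks it.
        int index = face.sharedMesh.GetBlendShapeIndex(blendShapeName);
        if (index >= 0)
        {
            face.SetBlendShapeWeight(index, weight);
        }
    }
}
```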
After creating the phoneme animations, you must go through the audio recording you wish to use and mark each sound. Thankfully, Lipsync has an automatic phoneme detection system. Unfortunately, this feature is only available on PCs (more on this later). The plugin also lets you mark each phoneme manually, but picking out every sound in a vocal track by hand is an extremely cumbersome process.
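Conceptually, a marked track boils down to a list of time-stamped phoneme entries attached to an audio clip. The sketch below is only my own illustration of that idea; the type names and phoneme set are placeholders, not LipSync's actual data format.

```csharp
using System.Collections.Generic;
using UnityEngine;

// Placeholder phoneme set for illustration (the document's count of ten,
// including a closed-mouth rest state), not necessarily the plugin's names.
public enum Phoneme { Rest, AI, E, U, O, MBP, FV, L, WQ, CDGKNRSThYZ }

// One marker = "at this point in the clip, show this mouth shape."
[System.Serializable]
public struct PhonemeMarker
{
    public float time;      // seconds into the audio clip
    public Phoneme phoneme; // which mouth shape is active at that moment
}

// A clip plus its markers, whether placed automatically or by hand.
public class MarkedTrack : MonoBehaviour
{
    public AudioClip clip;
    public List<PhonemeMarker> markers = new List<PhonemeMarker>();
}
```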
I recorded my audio with a high-quality microphone in a quiet room, and I think this helped make the automatic phoneme marking more accurate. While interviewing my subject, I made sure not to make any background noise or talk over him, as this would ruin the experience in VR.
After marking each phoneme on the track, you link the track to a character whose phoneme animations you have configured. The elegance of this system is that it lets you easily swap dialogue between characters: as long as an audio track has its phonemes marked and a character's phoneme facial animations are set up, any vocal track in your project can be used with any character model.
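As a rough illustration of that decoupling (reusing the hypothetical Phoneme, PhonemeMarker, and MarkedTrack types from the sketch above, and again not the plugin's real API), a player component could sample the marker list at the current audio time and apply the matching blend shape on whichever character it is pointed at:

```csharp
using System.Collections.Generic;
using UnityEngine;

// Hypothetical sketch: the track only knows phoneme timings, the character
// only knows which blend shape realises each phoneme, so any marked track
// can drive any configured character.
public class SimpleLipSyncPlayer : MonoBehaviour
{
    public AudioSource source;       // plays the dialogue clip
    public MarkedTrack track;        // per-clip data: marked phoneme times
    public SkinnedMeshRenderer face; // per-character data: the face mesh

    // Per-character mapping from phoneme to blend shape name on that mesh
    // (parallel lists so the mapping is editable in the Inspector).
    public List<Phoneme> phonemes = new List<Phoneme>();
    public List<string> blendShapeNames = new List<string>();

    void Update()
    {
        if (!source.isPlaying || track.markers.Count == 0) return;

        // Find the most recent marker at the current playback time
        // (markers are assumed to be sorted by time).
        Phoneme current = Phoneme.Rest;
        foreach (var marker in track.markers)
        {
            if (marker.time <= source.time) current = marker.phoneme;
            else break;
        }

        // Snap the matching blend shape fully on and the rest off;
        // a real implementation would blend smoothly between shapes.
        for (int i = 0; i < phonemes.Count; i++)
        {
            int index = face.sharedMesh.GetBlendShapeIndex(blendShapeNames[i]);
            if (index < 0) continue;
            face.SetBlendShapeWeight(index, phonemes[i] == current ? 100f : 0f);
        }
    }
}
```

Parallel lists are used here instead of a dictionary only because Unity serializes them in the Inspector; the separation of track data from character setup is the part that mirrors how swapping dialogue between characters works.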
The biggest problem I had with this software was working with it across Mac and Windows platforms. To test the autosync feature, I downloaded Unity onto a Windows partition on my Mac and installed the plugin. Setting up the phoneme mapping on the audio file worked perfectly, and I should have been able to save this mapping file and open it in my Mac version. I had issues with this, and eventually had to import the entire project I had made on my Mac into the PC in the innovation lab (my Windows partition didn't have enough horsepower to process the whole scene).
Lipsync is still in beta, so I am thankful that this was the biggest problem I had. There were also some issues with latency, but I think that is just due to the high number of very detailed objects in my scene. I used a PDF guide created by the developer to learn how to use the plugin, and also referenced YouTube videos and a dedicated thread on the Unity forums. I posted on the forum about the Mac/PC compatibility problems, and the developer replied in less than half an hour! He is also planning to add autosync to the Mac version in the future. There are a few other alternatives to this plugin (SALSA with RandomEyes being the most prominent), but with such good developer support, I would recommend Lipsync to anyone looking to add vocal animations to their Unity characters.