I’ve been a fan of Vocaloid (mainly Hatsune Miku) ever since elementary school. No, I don’t understand Japanese (and thus I have the luxury of just enjoying the music itself rather than its lyrics). Recently, I got a bit bored and decided to create a cover of a song with Vocaloid.
If you don’t know, Vocaloid is voice synthesizer software – sort of like MIDI software, but with voices instead of instruments. There is a vast selection of voicebanks on the market, each of which gives you a different vocal timbre. For example, Hatsune Miku is a voicebank, and she has her own anime figure and her own concerts; basically, she is a virtual singer. And yes, I did buy a pair of tickets to her concert (but it was cancelled due to the obvious reason). Other voicebanks, such as Meiko, Gumi, or IA, have their own quirks and features (and corresponding fan groups as well). Nonetheless, I think most people would say Miku is the most representative and best-known of all the Vocaloid voicebanks.
However, I did not use Miku for my first attempt. Miku mainly “speaks” Japanese. Theoretically, I could just plug each individual sound into the software, note by note, and create the melody (think of these as “units” of sound: the word “hello” has two syllables, each made up of a couple of phonemes). However, pronunciation is a tricky business, especially when you string multiple words together: a vowel at the start of the next word merges with the consonant at the end of the previous one. For example, the phrase “look at that” may be pronounced as “loo ka dat”-ish. If I type the three words into the software as written, it gives a very unnatural pronunciation, because Vocaloid doesn’t “know” how to merge the vowels and the consonants; the user has to split them manually. I don’t know Japanese, so I don’t know the pronunciation rules of that language either. So Japanese was out of the game. Miku does offer an English voicebank, but I found it too “screamy”, as I think it has too much treble. Don’t get me wrong, it is an excellent voicebank, but I wanted a deeper tone that is closer to a human voice. Thus, I began my cover project with a voicebank called Megurine Luka.
For my music selection, I decided that my all-time favorite, “Moonlight Shadow” by Mike Oldfield, should be my first test subject. After I drew the whole melody in the editor, I typed in the lyrics “as is” as well. However, that was far from the end of it: I needed to change the phonemes to make it sound “natural”, as mentioned before. As you can see below, each segment is followed by its phoneme expression. For example, in bar 28, the original lyric is “carried away”, but I changed it into “car rid de way” to daisy-chain the vowels and consonants. Another important part is the parameters. Each note comes with a series of them, such as pitch, volume, velocity, gender factor, and brightness. I found myself using the pitch parameter the most; adding a slight pitch-up at the beginning of a note usually helps a lot.
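To make the daisy-chaining idea a bit more concrete, here is a minimal Python sketch of it. This is not how Vocaloid works internally and not the exact splitting I did by hand; it is only a rough approximation that looks at spelling (letters) rather than real phonemes. The function name `link_words` and the vowel set are my own assumptions for illustration.

```python
# Rough, spelling-based approximation of consonant-to-vowel linking.
# Assumption: letters stand in for phonemes, which is only loosely true in English.
VOWELS = set("aeiou")


def link_words(words):
    """Move a word-final consonant onto the next word if that word starts with a vowel."""
    linked = [w.lower() for w in words]
    for i in range(len(linked) - 1):
        current, following = linked[i], linked[i + 1]
        if (
            len(current) > 1                 # keep at least one letter in the current word
            and following                    # skip empty strings
            and current[-1] not in VOWELS    # current word ends in a consonant
            and following[0] in VOWELS       # next word starts with a vowel
        ):
            linked[i] = current[:-1]
            linked[i + 1] = current[-1] + following
    return linked


print(link_words(["look", "at", "that"]))
print(link_words(["carried", "away"]))
```

The output is cruder than a hand-tuned split such as “car rid de way” because it only moves one letter across a word boundary, but it shows the basic move: the trailing consonant is sung as the start of the next note.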
After creating the voice, I needed to mix it with the backing track of the original song. I did this in Audacity, a free program that I’ve been using since high school; it is good enough for me since I’m just a novice in this field. The mixing is pretty easy (for me) as long as the tempo in Vocaloid is the same as that of the backing track. In that case, I only need to align the two tracks once (at least for songs with a constant tempo). After that comes some fine-tuning of the volume, and it’s done. Anyway, attached below is the video (audio, to be more precise … it’s just a still image with the music). Enjoy.
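For the curious, the “align once, then adjust the volume” step can be sketched in plain Python. This is only an illustration of the idea, not what Audacity does under the hood. The helper names (`beats_to_samples`, `mix`), the 44100 Hz default sample rate, and the representation of audio as lists of float samples are all assumptions I made for the example.

```python
def beats_to_samples(beats, bpm, sample_rate=44100):
    """Convert a length in beats to a number of samples at a constant tempo."""
    seconds = beats * 60.0 / bpm
    return round(seconds * sample_rate)


def mix(vocal, backing, offset, vocal_gain=1.0):
    """Add the vocal onto the backing track, starting at a non-negative sample offset."""
    length = max(len(backing), offset + len(vocal))
    out = [0.0] * length
    for i, sample in enumerate(backing):
        out[i] = sample
    for i, sample in enumerate(vocal):
        out[offset + i] += vocal_gain * sample  # one shared offset for the whole vocal
    return out


# Hypothetical example: the vocal enters 4 beats into a 120 BPM backing track.
offset = beats_to_samples(4, bpm=120)
print(offset)
```

Because both tracks share the same constant tempo, a single offset is enough; if the tempos differed, the vocal would drift against the backing track and no single offset would fix it.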