{"id":218,"date":"2022-04-25T19:37:54","date_gmt":"2022-04-25T19:37:54","guid":{"rendered":"https:\/\/www.yubinlin.com\/?p=218"},"modified":"2022-04-25T19:37:54","modified_gmt":"2022-04-25T19:37:54","slug":"synthesizer-v-the-vocaloid-killer","status":"publish","type":"post","link":"https:\/\/www.yubinlin.com\/index.php\/2022\/04\/25\/synthesizer-v-the-vocaloid-killer\/","title":{"rendered":"Synthesizer V &#8211; The Vocaloid Killer?"},"content":{"rendered":"\n<p>Recently I got my hands on a new voice generation engine &#8211; Synthesizer V (SynthV). It&#8217;s like Vocaloid (please see my previous post about that) but more automated. One of the most prominent features of this software is its use of a DNN (deep neural network) for voice generation. I&#8217;m no expert in machine learning or neural networks, but I assume the DNN is what makes the transitions between words sound more natural. The UI is similar to Vocaloid&#8217;s but looks \u201cfancier\u201d and offers more functionality.<\/p>\n\n\n\n<figure class=\"wp-block-image size-full\"><img loading=\"lazy\" decoding=\"async\" width=\"936\" height=\"528\" src=\"https:\/\/www.yubinlin.com\/wp-content\/uploads\/2022\/04\/image.png\" alt=\"\" class=\"wp-image-228\" srcset=\"https:\/\/www.yubinlin.com\/wp-content\/uploads\/2022\/04\/image.png 936w, https:\/\/www.yubinlin.com\/wp-content\/uploads\/2022\/04\/image-300x169.png 300w, https:\/\/www.yubinlin.com\/wp-content\/uploads\/2022\/04\/image-768x433.png 768w\" sizes=\"auto, (max-width: 936px) 100vw, 936px\" \/><figcaption>Synthesizer V&#8217;s User Interface<\/figcaption><\/figure>\n\n\n\n<p>The control panel on the right side is full of parameters for fine-tuning the pronunciation of words. On top of that, there\u2019s an \u201cauto-tune\u201d function that adds random fluctuations to the pitch to make the voice sound more \u201chuman\u201d. 
I\u2019ve had a lot of fun with this function, since the automation saves me a lot of time I would otherwise spend manipulating the pitch bar, as I had to in the Vocaloid software. For voicebanks, I bought Eleanor Forte AI, Qing Su, and Stardust Infinity. The latter two are Chinese voicebanks by default, but SynthV can \u201ctweak\u201d them into English or Japanese pronunciation (and the result is on par with the native English voicebank).<\/p>\n\n\n\n<p>Of course, with all this AI and neural-network technology, the sound is very natural, sometimes real enough to fool untrained ears. The \u201crealness\u201d at the word level is superb, but I think there\u2019s room for improvement at the \u201csegment\u201d level. A pop song can be broken down into intro, bridge, pre-chorus, chorus, outro, and so on. If I just let SynthV do its job, every section is sung the same way, because the software doesn\u2019t know which part of the song you are in. Therefore, as natural as it sounds, the generated voice is unavoidably \u201cflat\u201d without manual intervention: it lacks any changes in dynamics or tension. I\u2019m thinking of writing a small add-on script that lets me apply \u201cpresets\u201d to different portions of the song to improve the automation.<\/p>\n\n\n\n<p>When Vocaloid first came out in the mid-2000s, it was viewed as a novel synthesizer that generated a pseudo-human sound. The limited technology of the time was responsible for the \u201crobot\u201d voice that is so distinctive and so representative of Vocaloid. Seen this way, the two programs are not enemies but complement each other. SynthV tries to imitate the human voice at all costs, while Vocaloid offers more flexibility and can be tailored to a creator\u2019s needs, without necessarily striving to replace human singers. 
So no, I don&#8217;t think SynthV will kill Vocaloid; it is a nice addition to the community.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Recently I got my hands on a new voice generation engine &#8211; Synthesizer V (SynthV). It&#8217;s like Vocaloid (please see my previous post about that) but more automated. One of the most prominent features of this software is its use of a DNN (deep neural network) for voice generation. I&#8217;m no expert in machine learning or &hellip; <\/p>\n<p class=\"link-more\"><a href=\"https:\/\/www.yubinlin.com\/index.php\/2022\/04\/25\/synthesizer-v-the-vocaloid-killer\/\" class=\"more-link\">Continue reading<span class=\"screen-reader-text\"> &#8220;Synthesizer V &#8211; The Vocaloid Killer?&#8221;<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[1],"tags":[20,18],"class_list":["post-218","post","type-post","status-publish","format-standard","hentry","category-uncategorized","tag-music","tag-vocaloid","entry"],"_links":{"self":[{"href":"https:\/\/www.yubinlin.com\/index.php\/wp-json\/wp\/v2\/posts\/218","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.yubinlin.com\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.yubinlin.com\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.yubinlin.com\/index.php\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.yubinlin.com\/index.php\/wp-json\/wp\/v2\/comments?post=218"}],"version-history":[{"count":4,"href":"https:\/\/www.yubinlin.com\/index.php\/wp-json\/wp\/v2\/posts\/218\/revisions"}],"predecessor-version":[{"id":229,"href":"https:\/\/www.yubinlin.com\/index.php\/wp-json\/wp\/v2\/posts\/218\/revisions\/229"}],"wp:attachment":[{"href":"https:\/\/www.yubinlin.com\/index.php\/wp-json\/wp\/v2\/media?parent=218"}],"wp:term":
[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.yubinlin.com\/index.php\/wp-json\/wp\/v2\/categories?post=218"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.yubinlin.com\/index.php\/wp-json\/wp\/v2\/tags?post=218"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}