OpenUtau Tutorial-Notes
OPENUTAU TUTORIAL
This tutorial will be very in-depth, and include instructions on my entire workflow (including a bit
on mixing). I will kind of assume you know basically nothing about vocal synths/UTAU and the
terminology, and will explain WHY you are doing everything you are doing. If you find any errors
or new info that you think I should add to this document, let me know. Skip to the parts in bold to
find specific info. I hope this helps at least one person out there, and if anyone makes
something with this tutorial, send it to me :) I’d love to see what you make!
(Haven’t watched these all the way through yet but here are some simpler video tutorials)
● Download OU
○ (OpenUtau) from here: https://www.openutau.com/ (According to your platform. Sorry Apple
users you won’t be able to do all of these things and also it doesn’t run that well. Whoops)
● Download a Singer
○ I will use Kasane Teto so we can mess around with different voicebanks and
she’s popular, here: http://utau.wikidot.com/utau:teto-kasane
○ VCV/CV/VCCV/etc???
■ V is for vowel and C is for consonant. These guys sing from recorded
samples, in a couple different formats. Most of the time, singing
Japanese, all the syllables are technically “CV.” Adding another vowel in
the beginning lets the different syllables blend together, and VCCV/CVVC
even more so. VCV is the most common. Teto has a CV and VCV
voicebank, as well as an English CVVC one. This technically uses a phonetic
system called X-SAMPA but we can worry about that later
○ Encoded/aliased?????
■ What language you have to type in, either kana or romaji. Kana is
hiragana/katakana, Japanese characters, and romaji is roman letters.
LiKe ThEsE! Encoded is for the file names, and aliased is for the note
names. Notes go by “aliases.” Even if your UTAU is kana encoded, you
don’t need to have a Japanese keyboard and you don’t need to go copy
and paste from anywhere either
○ Voicebanks????????
■ Essentially different collections of sounds. Every UTAU has at least one
voicebank. Teto has 6 (see note 1). These correspond to different vocal “modes” you
might want. Normal, whisper, rikimi (strain), sakebi (shout), edge, smooth,
and English
Note 1: She also has an OpenUtau-specific one that merges the first 5, an “extra” bank that has some
specific notes but doesn’t work on its own like the others, and a few old ones.
○ How do I download Kasane Teto and what do I download?
■ Go to the official site linked on the wikidot (https://kasaneteto.jp/utau/)
■ Translate the page if you want, but you can recognize what to download
just by looking at the English text
■ Click the page at the top that says UTAU
■ Then the OpenUtau section, and download
■ To show you how to merge different voicebanks into one, you can also
download “English Voice” below. This combined will give you all of Teto’s
current stuff (feel free to download the “Extra Voice” as well, it contains
some breaths and a rolled r)
■ Both of these are .zip files. DO NOT UNZIP THEM as it will corrupt a lot
of the text data, which is pretty important. Don’t use Safari when you are
doing this because it will automatically unzip them. You could also just
turn that feature off. If you look in an unzipped UTAU file, all the Japanese
text will be replaced by random characters kind of like this–
^$#&(#%&&%%%...$^^%%
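If you are curious why the text turns into junk, here is a minimal Python sketch (nothing to do with OU's own code). It assumes the file names are stored as Shift-JIS and that the unzipping tool reads them with a Western code page (cp437 here, picked as an assumption). The file name is a made-up example, not a real file from Teto's voicebank.

```python
# Illustration only: the same bytes read with the right and the wrong encoding.
name = "かさね.wav"                  # made-up example file name

raw = name.encode("shift_jis")       # the bytes as they are stored in the .zip
wrong = raw.decode("cp437")          # a tool that assumes a Western code page
right = raw.decode("shift_jis")      # a tool that is told the text is Shift-JIS

print(wrong)   # a jumble of accented letters and symbols
print(right)   # かさね.wav
```

Nothing is actually lost in the bytes themselves. The damage happens when a tool guesses the wrong encoding and then writes the wrong characters back to disk, which is why letting OU read the .zip directly is safer.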
● Install a Singer
○ Open OU and just drag in the .zip files one at a time
○ If you see either files named something like “gyagyigyugye/kakekikakeko” or
Japanese characters, don’t change anything. This is OU trying to understand
how to read the files you are giving it. If it looks like a jumble of random
punctuation, change from Japanese Shift-JIS to other ones until it looks like
something more normal
○ Same for the next popup. Usually an UTAU will use the same encoding twice
○ Then, click UTAU. Enunu and Diffsinger are other types of voicebanks involving
AI (not even the awful kind, usually)
■ Enunu and SynthV work in similar ways: they are set up in CV and use models trained on
singing, which sounds much smoother than playing back individually recorded samples
■ This is how it sounds
■ Or a demo reel (I think there can be automatic pitchbends in here but I’m
not sure)
■ I have never used Enunu but I assume this video is good because
HARAAO is good at tuning??? I watched it and it looks simple enough
(you will need to know a bit about OU first though)
○ Then your UTAU will install itself and you can track the progress at the bottom.
It’s pretty fast. Don’t try and do anything else until the blue bar disappears
● The Track Editor Menu
○ The top bar is your standard menu. Under file we have;
■ New. Self explanatory. OpenUtau uses a special file format called .ustx.
All vocalsynth programs use different formats, and OpenUtau can import
.vsqx/.vsq (VOCALOID), or .ust (UTAU). .ust files cannot have multiple
tracks, but otherwise they are the same. Also midi files but those won’t
come with lyrics. Convert files on this website:
https://sdercolin.github.io/utaformatix3/
■ Open will open up the folder you most recently used to store a .ustx. I
recommend making a dedicated folder
■ Hover over Open Recent to see your latest opened files. You can change
which file formats OU will put here in the preferences
■ Save will autosave whatever you are working on if you have not yet set
the file name/destination
■ Import tracks lets you import any of the above filetypes. This is how you
can import multiple .ust files at once (or to an existing project). If you try to
drag in more than one, only the first one will actually import
■ You can also import audio/midi files using these buttons, or by dragging
them in
■ Audio only exports as .wav files (one file per track, the mixdown doesn’t
seem to work). Only unmuted tracks will export
■ Projects automatically save as .ustx, but you can export a project as a
midi file or as multiple .ust files. Most modifications you perform will not be
saved when exporting .usts, but they will save to your .ustx. This is handy
if someone wants to use your .ustx, but you don’t want them to just use all
the modifications (tuning) you worked hard on without changing anything
■ You will have to have saved a project at least once to open the Export
Location
○ Edit contains undo and redo
○ Project contains expressions (which I will go over in depth later) and adjust
tempo (preserve timing). This lets you change the BPM of a project without
moving around all the notes you already placed
○ Tools contains;
■ Layout. This will set up a handy layout when you have multiple OU
windows open. You can have two; the menu you are looking at and the
individual track editor. We will open this in a minute
■ Clear your cache if the program is running slower than usual, or generally
if the data reaches the megabytes
■ Debug window opens a log that you can use to troubleshoot. It tells you
(often in computer jargon) what the program is doing. I have uploaded this
to forums when I am trying to get other people to help me understand why
the program is not working. These logs also exist as files
■ Singers opens a complicated menu we will look at later
■ Install singer prompts you for an UTAU .zip file. You can just install
singers by dragging the .zip right into the program, like we did earlier
■ Dependencies are a specific type of plugin. None are exactly essential for
the average workflow, but you can research some that might be
interesting
■ Preferences is your settings. You can change your audio output device,
scrollbar settings, singer locations, some stuff about how your computer is
running, appearance, oto editing (The oto is a file that tells OU how much
the different notes overlap with each other, and what parts of the notes
are supposed to stretch out. Most of the time, people making UTAUs do
an alright job on this and you should really only have to make changes if
you are making your own UTAU), and some advanced tools. You can tell
OU what file types it should remember in “Open Recent” here. The default
lyrics helper is good for Japanese, and it makes it so you won’t have to
actually type IN JAPANESE for your input to be recognized. Some
changes will require a restart to be applied
○ Help can send you to the wiki, open the logs, or you can check for program
updates
○ The second section has project specific things
■ Change the time signature or tempo
■ Play/pause/skip ahead
■ Time you are at in the song
○ Then you have the track menu. Once a singer is selected, their portrait will
display here
■ “Track1” is the default name of the track. Change it with one click
■ Select which singer will perform this track. Select the UTAU you
downloaded here, under “Classic.” The most recently used voicebanks
will appear above the classic menu. If you downloaded Teto’s voicebanks,
select the one that says “OU.” You can change these names to be
whatever you want later
■ “DEFAULT” is the phonemizer set to this track. A phonemizer changes
how the lyrics you type are turned into phonemes (the individual sounds,
written as short sequences of letters). If you type “ka”
without selecting a phonemizer, OU will not change it. If your UTAU is not
romaji aliased, this won’t match up with any files and will not make sound.
This is why you need to know if your voicebank is CV/VCV/etc, because
there are different phonemizers for each. There are also different ones for
different languages, depending on the phonetics system used by the
UTAU’s creator. Some OU specific voicebanks will automatically select a
phonemizer for you, like the Kasane Teto OU bank. Her English
voicebank uses the [EN X-SAMPA] English phonemizer. This will make
OU automatically choose how to pronounce words, but you can make
changes if it messes up
■ There is a blank bar where you can select the renderer. Playing these
audio files at specific pitches requires your computer to do a lot of
complex math, and there are small differences that can change the types
of sound produced. Changing the resampler is like printing the same
picture on different printers. Some colors will appear brighter on different
printers, and you should change the printer depending on what colors are
the most important. OU comes with a default resampler, WORLDLINE-R,
which is pretty good. Classic UTAU uses a different resampler, and we
can go over how to download more later. The CLASSIC resampler,
without any modifications, is just worldline again. Pick whichever for now
■ The bottom left bar is the individual track volume
■ M mutes the track and S plays only that track (solo)
■ Once you download more resamplers, the settings icon is where you can
apply them. The “renderer” comes in two parts, the resampler and the
wavtool. Usually you have to install a resampler twice as both a resampler
and wavtool for it to work properly. Right clicking on this icon gives you
more settings
● Remove track
● Rearrange existing tracks
● Another spot to rename the track
● Change the track color (purely cosmetic)
● Duplicate the track or its settings
● Remap voice color (not cosmetic, we will go over this later)
■ The smaller blue bar is the panning of the audio. For some reason, this
isn’t always saved when exporting audio. It’s okay though, because a
DAW (digital audio workstation) can do it just as easily
■ The plus button beneath adds a new track with all the same settings
○ The larger section of the menu is where you can see the different singing parts.
Click somewhere to the right of your singer, and you will create a new part. Each
track can have multiple parts. These can overlap, but doing so makes it hard to
see all of what is playing at once. I recommend just creating new tracks instead.
Right click on a part to delete or rename it, and drag it to change its position. You
can copy and paste parts on any track. Above the part section, you can view the
project time signature, BPM, and bars. The small gray arrow in this section is the
playhead. Click in the numbered bar section to change where you will play from.
Pressing space will play the notes, but won’t play right now as there are no notes
○ Double click the part to open the actual editor! You can close the piano roll editor
and you will be sent back to the track editor, but if you close the track editor the
whole program will close
● The Piano Roll Menu
○ Many more buttons and a piano roll. By default, the tooltips menu will be active.
Press T to show and hide it. Here you can scroll up/down/left/right (shift+scroll to
scroll to the side with a mouse). Control+scroll to change the horizontal zoom,
and alt+scroll to change the vertical zoom
○ The edit menu contains undo, redo, copy, paste, etc. These all use the normal
keyboard shortcuts. You can also search for specific notes and lock certain edits.
○ View changes how some things look. You can change note names to solfege,
enable or disable the character portrait on the piano roll, and more. I recommend
setting “show other tracks’ notes” so you can see all the notes that are playing at
once without returning to the other screen.
○ Batch edits has a lot of options we will go over later
○ When notes are selected, you can go to Edit Lyrics. This will open a text box with
all existing lyrics. Separate notes with a space,or,like,this
○ Note defaults changes the default note preset. You can set a default lyric, usually
“a.” Portamento is the pitch glide between notes (its length and start point). Vibrato has a lot of settings, but
the default vibrato is pretty standard and can be changed usually on a
note-by-note basis
○ The question mark shows the tooltips section
○ The next few buttons are your tools. You can click them or use the numbers on
your keyboard in the corresponding order
■ The arrow button is a selector. Use this to move notes around and select
them. You can select multiple notes by clicking and dragging AROUND
the notes (if you drag while a note is selected/pink it will just move that
note) or clicking a note and shift-clicking another note.
■ The pencil draws notes. You can only select one note at a time here.
Draw a line on the piano roll to create a note. A tone will play to tell you
what the pitch of the note will be. You can also click on the letter notes on
the far left to do the same thing. Drag the center of the note to move it,
and the left and right edges to change the length. Double click the note to
change the lyric. Using a Japanese lyrics helper, the hiragana of what you
type will appear below the textbox. Using these is preferred, provided you
can read them. There might be a lot of different options below this, but
you don’t need to worry about these. When using an English phonemizer,
you can just type out the word normally
■ The pencil plus draws and deletes notes. Normally you can select a note
and click “delete” on your keyboard, but right clicking on a note with the
pencil plus tool will delete selected notes quickly
■ The eraser tool deletes notes with one click
■ The draw pitch tool will let you directly edit the pitch of the notes. This is
actually the more difficult way to edit pitch within individual notes, but it is
useful for making tiny changes
■ Overwrite pitch (with the same icon) does the same, but will overwrite
previous pitches created with the vibrato or modulation tools
■ Click with the knife tool to split notes
■ Next we have some more view options. First toggles the note tone, which
is the tone that plays when you place a note. I usually keep this on to
figure out if the pitch is what I want it to be
■ Next is the vibrato viewer. This adds a little symbol below the notes that
will toggle/edit the vibrato. Turning the viewer off will not delete the
vibrato, only temporarily lock edits
■ Next is “view pitch bend.” Pitch bending is the easiest way to edit multiple
pitches within singular notes. This is complicated (again, I’ll go over it in
detail)
■ Next is “view final pitch to render.” If you edit the pitch with the draw pitch
tool, you will not be able to see it unless this is turned on. This combines
all the different pitch editing tools into one visual
■ Next is “view waveform”. This displays gray lines when notes are put
down. It’s very useful to leave on. If it is on, you can see whether or not a
note you put down will play.
You can see here the first three notes have audio waveforms under them
and should make sound. The last one is not recognized by the
phonemizer and won’t do anything
■ Next is the phoneme viewer. OU is not actually playing the files “ra ra ra.”
It’s playing “-ら aら aら” (You may notice some other characters in these
notes, usually pitches like C5 or G4. This is fine). The phoneme viewer
tells you what the actual notes are. The hyphen here represents the note
that should begin the phrase, and the vowels match up with the previous
notes. So something like “-ka aki iku uke eko.” You can double click on
these lower notes to change them, useful if you want to overwrite what
the phonemizer thinks is correct. The English phonemizers are not always
perfect. You can also change the note overlaps here.
Here, you can see again “raa” doesn’t match up with any note and won’t
play. Drag the top dot or the pink line to change the positioning of the
note, and the bottom dot to change the overlap. The dots will fill in if they
have been edited from the default, like the pink dot above. Note overlap
does not usually need to be changed, but it’s useful in certain scenarios. If
the trapezoids are crossed, the notes are overlapping. If they aren’t, the
notes will probably sound choppy and disconnected. You can use this
stylistically if you want to
■ How to use VCCV English?
■ Next is the snap. You may have noticed that when creating/editing notes,
they will snap to certain lengths. Turn this on or off here, but be warned it
is difficult to make notes align with each other if you turn it off
■ Next is the note subdivisions. Change this to edit how precise the note
lengths can be. If set to automatic, it will change based on the horizontal
zoom
■ Next is a button that says “1=C.” I thought that this would change the
pitch somehow, but it doesn't. I am not sure what exactly it does and have
not found any mention of it online. Let me know if you figure it out
■ There is a parameter menu at the bottom of the screen we can go over
later
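The “-ka aki iku uke eko” pattern from the phoneme viewer can be sketched in a few lines of Python. This is only a toy version of what a VCV phonemizer does, not OU's actual code. The function name and the `sep` parameter are my own inventions, and it assumes romaji input where the last letter of each syllable is its vowel.

```python
def to_vcv(syllables, sep=""):
    """Turn plain CV syllables into VCV-style aliases (toy version)."""
    aliases = []
    previous = "-"                      # "-" marks the first note of a phrase
    for syllable in syllables:
        aliases.append(f"{previous}{sep}{syllable}")
        previous = syllable[-1]         # the vowel carried into the next note
    return aliases

print(to_vcv(["ka", "ki", "ku", "ke", "ko"]))
# ['-ka', 'aki', 'iku', 'uke', 'eko']
```

Some voicebanks put a space between the leading vowel and the syllable, which is what `sep=" "` is for. If the alias the phonemizer builds doesn't exist in the voicebank, you get the silent note with no waveform described above.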
● Audio Troubleshooting
○ There are several reasons the audio will not play. Turn on the waveform viewer. If
you don’t see a waveform, it could be one of these reasons:
■ You have not correctly selected a phonemizer. Go back to the track editor,
and try another one
■ You have not correctly selected a resampler/renderer. Try a different one
■ You have not correctly input notes/there are typos. Try copy-pasting in a
hiragana character to see if this works
■ The UTAU has internal issues. Try using another one and see if the issue
persists
○ If you do see a waveform, it could be one of these reasons:
■ You have not correctly selected the audio output device. Go to
Tools>Preferences to change this
■ The track is muted or too quiet. Click “M” in the track editor to unmute it, or slide the big blue
bar more to the right. Watch your ears! If the blue bar goes to the right of
the center, there may be clipping/distortion issues after export. If the
vocals are too quiet compared to the instrumental, turn the instrumental
down instead
■ Another track is in solo mode. See if any tracks have “S” selected
■ Your computer audio is just off/low. Happens to me more times than I’d
like to admit
■ The UTAU has internal issues. Try using another one and see if the issue
persists
○ If the audio is playing but it sounds like beeps, squeaks, or very crunchy, it could
be one of these reasons:
■ You are using a Mac computer. This has a lot of rendering issues when
certain parameters are changed. Mac development of OU is still in
progress and sometimes this cannot be fixed. Look into a Windows
emulator or Boot Camp
■ The notes you are playing are very short or high pitched. This can cause
beeps or squeaks instead of normal audio. Most UTAU are not meant to
sing very high/low notes, and most have a recommended range
somewhere on their download page
■ You have changed a bunch of parameters and they have combined in a
weird way. Select the notes, click “Batch Edits” in the piano roll editor, and
reset parameters or pitchbends
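On the clipping warning for the volume bar: here is a rough Python sketch of the math, assuming the bar works in decibels with the center at 0 dB (that is my assumption, check your own version). The function names are made up for this example. A peak of 1.0 stands for the loudest level a .wav can hold.

```python
def db_to_gain(db):
    """Convert a decibel change into a plain volume multiplier."""
    return 10 ** (db / 20)

def clips(peak, db):
    """True if a signal with this peak (1.0 = full scale) goes over after the boost."""
    return peak * db_to_gain(db) > 1.0

print(round(db_to_gain(6), 2))   # 2.0, so +6 dB roughly doubles the level
print(clips(0.8, 6))             # True: a fairly loud vocal boosted by 6 dB distorts
print(clips(0.8, -6))            # False: turning it down is always safe
```

This is why turning the instrumental down is the safer fix: negative values only shrink the signal, so they can never push it over full scale.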
● How to Start Making a Song Without a .ustx or .midi
○ To start making a cover of a song, you need a few things.
■ An audio file of the song
■ The BPM of the song (If you don’t know this, there are websites that can
find it from the audio file)
■ The lyrics
■ An instrumental (If you don’t have this, there are websites that can
separate vocals from instrumentals. I recommend installing the Ultimate
Vocal Remover)
○ Once you have all of these, drag the song audio right into the track editor
○ On the top left, change the BPM accordingly. Zoom in really close, and align the
peaks of the waveform with the vertical lines. This makes everything a bit easier
to work with later on
○ Create a new part within the track. Line this part up with one of the brighter
vertical lines. Drag the right of the part to make it as long as you want
○ Double click the part to enter the note editor. Now you have to actually draw out
all the notes. This can be a bit tedious, and you can use someone else’s .ust if
the song is popular enough
○ Note timings should generally line up and form some kind of pattern. If you lined
up the part and audio nicely, stuff like this usually won’t happen (sometimes,
though, songs have really precise/weird timings):
It will more likely look like this with less random lengths. See how each note is
made up of an even number of sections:
Notes not lining up properly with the song is the first sign that a cover was poorly
made. They do not have to be exact, but closer is better
○ Using the audio as a guide, match the lyrics that you have to the pitches you’ve
written
○ On its own, a project will not sound very fancy. Sometimes this is what you want,
but if you want more details on pitch bends/parameters/voice colors, look below
○ If a song has overlapping parts or harmonies, create different tracks for each
voice. Having a good ear is helpful here, but you can spend as long as you want
on this. You can listen to other covers for harmony inspiration, look at sheet
music if you can read it, listen to the original song, or make up your own
○ If you feel like your cover doesn’t need any modification, or you’re just messing
around, you can save and export here. Mix (bring the audio into a DAW) it if you
want
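If you want to sanity check your BPM and note lengths with numbers, here is a small Python sketch of the grid math. It assumes 480 ticks per quarter note and a 4/4 song. Both are assumptions on my part, and the function names are made up for this example.

```python
TICKS_PER_QUARTER = 480   # assumed resolution, common in MIDI/UST-style files

def beat_seconds(bpm):
    """Length of one beat (quarter note) in seconds."""
    return 60.0 / bpm

def snap(ticks, division=16):
    """Snap a position or length to the nearest 1/division note (4/4 assumed)."""
    step = TICKS_PER_QUARTER * 4 // division
    return round(ticks / step) * step

print(beat_seconds(120))   # 0.5 seconds per beat
print(snap(250))           # 240, the nearest 16th-note line
```

At 120 BPM the peaks in the waveform should land about every half second. If your notes keep ending up at lengths like 250 instead of 240, the part is probably not lined up with the audio.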
● Downloading a .ustx or .midi
○ Finding a .ust/.ustx file is like downloading a template that you can work from.
There are many available on the internet. Usually when a video/song links a
vocalsynth file, they will list it in the title. You can find .usts easily by just looking
up “senbonzakura ust” or “never gonna give you up +ust.” People work hard on
these, if you use one please link back to the original creator. Even if you modify it
heavily. Sometimes the author’s information will be linked in a readme file within
the folder that you download. Again, you can convert between file types with
UtaFormatix
○ You will also need:
■ The lyrics (sometimes a project file will have mistakes or stylistic
revisions)
■ An instrumental (If you don’t have this, there are websites that can
separate vocals from instrumentals. I recommend installing the Ultimate
Vocal Remover)
○ You can just drop in a .ustx file, or import several files from the track editor.
Generally, the BPM is already included. If it still says 120, I would double check
by looking it up
○ Unless you have installed the exact same voicebanks that the creator used, you
will have to reselect the singer for each track
○ Some .usts are tuned (modified) and some are untuned. If you import a midi file,
it will be untuned and will have no lyrics. If you find a tuned .ust and leave it
exactly the way it is, this is called a plug-n-play. You haven’t really done anything
here. If you post it, credit both the .ust and the tuning to the creator. It also will
sound different depending on the singer, so don’t be surprised if the original
cover sounds better than your unmodified one
○ You can reset the tuning by:
■ Select all the notes
■ Go to Batch Edits>Reset
■ Reset the parameters, pitch bends, and expressions, and clear the
vibratos
■ Go to Batch Edits>Notes
■ If they are present, remove ending “-” or “R”
■ Go to Batch Edits>Lyrics
■ If there are characters on the notes besides plain hiragana, remove tone
and letter suffixes, switch from VCV to CV (the phonemizer will change it
back for you if appropriate), and remove phonetic hints
■ If you can’t read hiragana, select “hiragana to romaji”
■ There still might be some random extra notes around. Generally you can
either replace them with the correct lyrics if they are notes, or just delete
them
○ Then you will have a plain, completely unmodified .ust. If you feel like your cover
doesn’t need any modification, or you’re just messing around, you can save and
export here. Mix (bring the audio into a DAW) it if you want
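To show what the lyric cleanup above is doing to each note, here is a rough Python imitation. It is not how Batch Edits is implemented. It only handles two cases: a leading hyphen or vowel in front of kana, and a trailing pitch suffix like C5 or _G4. The function name is made up.

```python
import re

def plain_lyric(alias):
    """Strip a leading VCV vowel/hyphen and a trailing pitch suffix from a kana lyric."""
    # "-ら" or "a ら" -> "ら" (only when kana follows, so a plain romaji "a" is left alone)
    alias = re.sub(r"^(?:-|[aiueon])\s*(?=[^\x00-\x7f])", "", alias)
    # "らC5" or "ら_G#4" -> "ら"
    alias = re.sub(r"_?[A-G]#?\d$", "", alias)
    return alias.strip()

print(plain_lyric("a ら_G4"))   # ら
print(plain_lyric("-らC5"))     # ら
```

Other decorations (letter suffixes, phonetic hints) are not covered here, which is part of why you may still find a few odd notes left over after resetting.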
● Tuning Overview
○ Tuning refers to the process of editing the pitch bends of the notes. Or, more
generally, editing any parameters and adding special notes. Tuning is the part of
the process that allows you to really present your own style. There are many,
many different ways to make the exact same song. There is no “right” way to
tune vocals. If you like the specific tuning style of a certain producer, you can try
and replicate that (if they release tuned .usts, try and study what they do). I will
show you the basics of how to make these edits, and some situations where you
might want to make certain changes, but all these edits are optional. Opinion
warning starts here
○ If you want my playlist with visible pitch bending, here it is;
https://youtube.com/playlist?list=PLhk97FNNi_8roy1CLn14sa36rh2mkX2Mu&si=zfjVNn5O4EQIHoJ_
And my playlist of tuning I really like;
https://www.youtube.com/playlist?list=PLhk97FNNi_8peDM8VYuFfCVh67yerFAS-
My favorite tuners;
■ Yasutange uses mostly Kasane Teto (UTAU or SynthV) and makes their
own art. Also might be a wizard
■ Creuzer is the smartest person alive
■ Sukoya Cathedral is a slightly lesser-known producer (made Judas,
though) using mostly KAITO, Gackpo, and Hiyama Kiyoteru. The vocaloid
wiki calls their tuning sexy and I agree wholeheartedly
■ Mothman is an UTAU user who is great at expressiveness and language
mixing
■ Me. Kidding. I think I’m pretty decent I guess
○ Here’s an interesting conversation on tuning on the UTAU forum;
https://utaforum.net/threads/essential-tuning-skills.15983/
It’s a little old and some parts are UTAU specific but there is some useful
information
● Pitch Bending
○ Some say the most essential part of a good cover is good pitch bends (I
disagree, but I digress). Pitch bending refers to changing the pink pitch bend
lines, or drawing out the pitches with the draw pitch tool. The draw pitch tool is
generally only good for making very small edits. Most of the time, I go through the
entire process without touching it. Note bending refers to cutting up the notes and
moving them around instead of editing the bends. This is less precise, and can
do funky things to the timings of the vowels/consonants. If you decide to cut the
notes, adding a plus sign on a connected second note will continue the first one
○ There are a couple of specific techniques that I use when working with pitch
bends;
■ By right clicking on the pitch bend, you can change the transition type
between points. Click normally to add points, and right click points to find
where to delete them
■ How gradually the pitch shifts changes how smooth the transition is. If the
transition is short (think 90 degree angles), you will get a more electronic
sound. Sometimes OU has issues rendering these. More realistic vocals
do not jump this fast and require a bit more transition time.
Super long transitions can sound more lilted. They are used in more
specific situations (you can hear when a singer does this), but kind of
sound weird when used in between every note
■ When the human voice makes a large jump up in pitch, sometimes there
is a short dip before the high note (as shown above). The most skilled
singers will do this less, but it sounds nice when done occasionally with
UTAU. You can do short or long dips, depending on the length of the jump
or the length of the notes
■ Generally, to make a note sound more powerful, start it off a little (or a lot)
higher, like this:
If the note stays high for too long, it may sound like just singing two
different notes
■ You can combine multiple techniques, of course. This dip is very
commonly seen;
This gives long notes more interest, but I wouldn’t do it too often. Stick to
notes that are within the key of the song, too
■ In the most realistic of vocals, notes will change in pitch a lot during the
consonant. In programs like Synthesizer V, you can see exactly where
this is. In OU, you don’t have that. You do have the audio waveform to
guide you, so keep that on. Notice here how the waveform gets smaller in
certain sections; this means it is quieter, and is usually where the
consonant is. That’s where large pitch transitions can happen quickly
without sounding odd (also yes, these are just random lyrics)
■ Sometimes, less is more! But as you go, you may find pitch bends can get
more complicated. Here’s an example of something I made for a scream
vocal a few days ago;
The pitches of spoken words generally never stay completely flat
○ Details of the vibrato can be edited by dragging around the bars that show up
when you have the vibrato viewer (top) on. Click the icon underneath the note to
turn vibrato on, individually note by note
○ Pitch bending and tuning in general is kind of “do whatever you want and change
it if it sounds bad”
○ You’ll need further knowledge like stuff from below to follow it but I like this tuning
tutorial
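For anyone who thinks better in numbers, the transition and dip ideas above can be sketched as a curve in Python. This is only an illustration of the shapes, not how OU draws or renders pitch bends. Pitch is measured in cents (100 cents = one semitone), and the function names and default values are my own picks.

```python
import math

def transition(start_cents, end_cents, length_ms, step_ms=5):
    """Ease from one pitch to another along an S-shaped (cosine) curve."""
    n = max(1, round(length_ms / step_ms))
    curve = []
    for i in range(n + 1):
        t = i / n                                  # 0.0 -> 1.0 across the transition
        ease = (1 - math.cos(math.pi * t)) / 2     # slow start, fast middle, slow end
        curve.append(start_cents + (end_cents - start_cents) * ease)
    return curve

def with_dip(start_cents, end_cents, dip_cents=30, dip_ms=40, rise_ms=80):
    """Dip slightly below the starting pitch before rising to the higher note."""
    down = transition(start_cents, start_cents - dip_cents, dip_ms)
    up = transition(start_cents - dip_cents, end_cents, rise_ms)
    return down + up[1:]                           # drop the duplicated join point

jump = with_dip(0, 400)   # a jump of four semitones with a small dip first
```

A tiny `length_ms` gives the sharp, electronic jump, and a large one gives the slow, lilted slide. Swapping the sign of `dip_cents` makes the note start above its target instead of below it.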
● Extra Notes
○ What about breaths? The ends of the notes sound too abrupt? Can I make vocal
fry? A glottal stop? All of these depend on the voicebank you are using
○ I will still be using Kasane Teto here, but I will show you how to find what extra
features your UTAU has
■ Make a new project file
■ Go to the singers menu in the tools section of the track editor
● The very top of this menu has the name of the voicebank you are
currently viewing. Set this to the voicebank you want to look
through
● Below that, you can see the file location of the voicebank, some
settings (if you accidentally clicked something incorrectly when
installing a singer, you can change it here) (or you can set a
default phonemizer so you don’t have to change it every time), or
play a set sample from the audio files. vLabeler is a separate
program for editing note prefixes/suffixes
● Below that, there is a list of every note within the voicebank(s).
This is what we will look through
● The bottom waveform represents the oto (which calculates
rendering and overlap), which generally you never need to mess
with
● At the top right is where you see different voice “colors.” These are
the different vocal tones, often downloaded separately. Teto has a
few that we can look at, and I’ll show you how to apply and edit
them later
■ Look in the big chart to find all the extra notes. I recommend clicking
“Phonetic” to sort the notes. Any note that isn’t formatted like “ら” or “りゃ”
is an extra note. I always keep a list that I can refer back to when working
● Normal notes;
■ Go to the piano roll editor. Add in new notes, one for each extra note you
found. Listen to them to try and find out what they are, and write that
down too. Now you can use those whenever you want! Combine these
with the pitch bends for nice results
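If your voicebank has hundreds of notes, you can also sort out the extras with a short Python sketch like this one. It assumes you have the alias list as plain text already. The sample aliases are made up for the example and are not Teto's real list, and the function name is my own.

```python
import re

# Hiragana, katakana, and the long-vowel mark; anything else counts as "extra"
KANA_ONLY = re.compile(r"[\u3041-\u3096\u30a1-\u30fa\u30fc]+")

def extra_notes(aliases):
    """Return the aliases that are not plain kana like ら or りゃ."""
    return [alias for alias in aliases if not KANA_ONLY.fullmatch(alias)]

sample = ["ら", "りゃ", "br1", "息"]   # hypothetical aliases
print(extra_notes(sample))             # ['br1', '息']
```

For a VCV bank, the regular notes have a vowel or hyphen in front, so this simple check would flag all of them. Treat it as a starting point for your own list, not a finished tool.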
● Multipitch Voicebanks
○ “Multipitch” refers to a voicebank that has several different sets that are better
suited for high/low/medium pitches. This sometimes changes the power level of
the voice (powerscale/reverse powerscale), and sometimes just extends the
natural range. Kasane Teto is monopitch, meaning she only has the one (and it is
unlabeled). When we saw the voice color section, each color had a range of
tones. C1-B7 is UTAU’s entire range, so this doesn’t completely accurately
represent Teto’s range. She gets a little squeaky higher up and scratchy lower
down
○ Each pitch is located in a different subfolder and contains its own oto file
○ OU will automatically sort out and play the correct pitch notes for you. UTAU
creators include a file called a “prefix.map” that tells it how to do this. If you want
to test this out, I recommend my favorite VCV multipitch queen[2], Yokune Ruko ♂
(https://long-sleeper.net/index.php?id=46)
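To picture what “sorting out the correct pitch” means, here is a minimal Python sketch. It does not read a real prefix.map: the three pitch ranges and the suffixes “_low”, “_mid”, and “_high” are made up for illustration, and a real voicebank defines its own.

```python
NOTE_OFFSETS = {"C": 0, "C#": 1, "D": 2, "D#": 3, "E": 4, "F": 5,
                "F#": 6, "G": 7, "G#": 8, "A": 9, "A#": 10, "B": 11}


def note_number(name):
    """Convert a note name such as 'A#3' to a MIDI-style number (C4 = 60)."""
    pitch, octave = name[:-1], int(name[-1])
    return (octave + 1) * 12 + NOTE_OFFSETS[pitch]


# Hypothetical three-pitch voicebank: (lowest note, highest note, suffix).
PITCH_SETS = [("C1", "B3", "_low"), ("C4", "B4", "_mid"), ("C5", "B7", "_high")]


def alias_for(lyric, note):
    """Append the suffix of whichever pitch set covers `note`."""
    n = note_number(note)
    for low, high, suffix in PITCH_SETS:
        if note_number(low) <= n <= note_number(high):
            return lyric + suffix
    return lyric  # outside every range: fall back to the plain alias


print(alias_for("ら", "A3"))
print(alias_for("ら", "G5"))
```

The same lyric ends up pointing at a different recording depending on how high the note is, which is the job the prefix.map does for you.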
● Parameters/Flags
○ This is a chunky part. The bottom menu of the piano roll editor (you can also
access these by clicking on the menu icon in the very top right) has lots of
different parameters to edit, which have parameters of their own. Which params
you can change are based on the resampler you are using. If the current
resampler does not support the current param, it will tell you
○ By clicking on the abbreviation (defaults are DYN, PITD, CLR, ENG, VOIC), you
select the parameter you are editing. The bright pink param is the selected one, but
changes made to the light pink param stay visible as well
[2] Don’t bully me, I’m right.
○ List of default params and what they do;
■ DYN - dynamics - Changes the volume. Drawn
● You can make vocal fry with this too
■ PITD - pitch deviation (curve) - View/draw pitches normally created with
the draw pitch tools. Drawn
■ CLR - voice color - Select from set up voice colors, adds suffixes/prefixes.
Teto already has five of these! Options
■ ENG - resampler engine - Changes the renderer, does not change
available parameters. Options
■ VEL - velocity - Changes the length of the consonant/length of the overlap.
Higher = shorter. Note per note
■ VOL - volume - Changes the volume. Note per note
■ ATK - attack - Changes the volume of the beginnings of notes. Doesn’t
work well for most VCV voicebanks. Note per note
■ DEC - decay - Changes the volume of the endings of notes. Same
restrictions apply. Note per note
■ GEN - gender - Higher = more masculine, lower = more feminine. Note
per note
■ GENC - gender (curve) - Higher = more masculine, lower = more
feminine. Drawn
■ BRE - breath - Changes the breathiness. Note per note
■ BREC - breathiness (curve) - Changes the breathiness. Drawn
■ LPF - lowpass filter - Cuts higher frequencies, so the lower ones stand out. Note per note
■ NORM - normalize - Changes the volume of notes to be more similar.
Note per note
■ MOD - modulation - Very slightly changes the pitch of notes throughout
the note. Useful for when you have 2 simultaneous parts that sound too
similar. Note per note
■ MOD+ - modulation plus - Very slightly changes the pitch of notes
throughout the note, through drawn pitch. Note per note
● You can make a doubling effect partly with this
■ ALT - alternate - Can’t find what this does anywhere. Note per note
■ DIR - direct - Plays the raw sample without any modification, pitch
change, overlap, or params. Good for breaths. On/off
■ SHFT - tone shift - Shifts the tone, depends on the voicebank. Note per
note
■ SHFC - tone shift (curve) - Shifts the tone, depends on the voicebank.
Drawn
■ TENC - tension (curve) - Makes the voice more/less tense. Drawn
■ VOIC - voicing (curve) - Changes how much the notes are voiced. Lowest
is a whisper. Drawn
○ You can also add more resampler specific parameters. Click the settings icon,
and click plus to add more. Parameters change values called “flags.” For
example, breath is modifying the “B” flag. Flags are usually one or two letters,
and capitalization is important here
○ Here is a list of all the flags in Moresampler (the most commonly used resampler):
■ A - Amplitude modulation. Accentuation will now change with pitch
■ b/B - Amplitude modification for unvoiced consonants (s/t/k/etc.)
■ c/C - Lowpass
■ D - Cuts midrange pitches
■ e - Stretches vowels instead of looping them (which is the default)
■ F - Strengthens the formant filter
■ G - Regenerates .frq files, sometimes fixing odd noises
■ H - Lowpass
■ g - Gender
■ M - Lessens metallic noise
■ N - Turns off the formant filter
■ P - Normalize
■ R - Regenerates .pmk files
■ t - Shifts the pitch by ten cents. One cent is 1/100 of one semitone
(normal note)
■ S - Adjusts formant strength
■ V - Regulates voice power
■ W - Deemphasizes consonants
■ w - Growl
■ Me - Forces looping
■ Mt - Tension
■ Mb - Breath
■ Mo - Openness of the mouth while speaking
■ Md - Dryness. Very subtle
■ Ms - Stabilization
■ Mm - Finds the average between the old/new speech synthesis programs.
Default is the newest
■ ME - Emphasizes vowels
■ MC - Coarseness
■ MG - Growl
■ MD - Distortion. Loud and a bit metallic sounding
■ Mr - Creates a “singer’s formant.” I can’t find/understand exactly what this
means, but the formant is usually the vowel. Seems to be opera related?
■ Mp - Modulation
■ ? - Ignores the prefix.map
■ < - Growl strength
■ > - Growl length
■ _ - Volume changes with the growl
■ % - Volume changes with the vibrato pitch
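Since cents come up with the pitch shift flag, here is a small Python sketch of how a shift in cents turns into an actual frequency change. It only uses the standard definition of a cent (1200 cents per octave, where an octave doubles the frequency). The function names are made up for this example.

```python
def cents_to_ratio(cents):
    """Frequency ratio produced by shifting a pitch by `cents`."""
    return 2 ** (cents / 1200)


def shift_hz(frequency_hz, cents):
    """Apply a shift in cents to a frequency given in Hz."""
    return frequency_hz * cents_to_ratio(cents)


# A4 (440 Hz) shifted up ten cents lands at roughly 442.5 Hz,
# and shifted up 100 cents (one semitone) at roughly 466.2 Hz.
print(round(shift_hz(440.0, 10), 1))
print(round(shift_hz(440.0, 100), 1))
```

Ten cents is a very small nudge, which is why small pitch shifts are usually used for subtle detuning rather than changing the note.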
○ Change the parameters in combination with everything else. You can do a lot by
combining them. Here’s when I made Teto scream!
The Tutorial
(The .wav file)
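To make the flag notation from this section concrete, here is a minimal Python sketch that joins flags into a single string. It assumes the classic UTAU convention of writing each flag as its letter(s) followed directly by a number, all run together. The function name is hypothetical and OU does not need you to do this by hand.

```python
def build_flags(flags):
    """Join {flag: value} pairs into one string, e.g. {"g": -5, "B": 60} -> "g-5B60".

    A value of None emits the bare flag letter (for on/off style flags).
    """
    parts = []
    for name, value in flags.items():
        parts.append(name if value is None else f"{name}{value}")
    return "".join(parts)


# "b" and "B" are different flags, so both can appear side by side.
print(build_flags({"b": 10, "B": 60}))
```

Because the keys are case-sensitive, mixing up “b” and “B” here would change a completely different setting, which is the same trap as typing the wrong capitalization in OU.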
● Voice Color/Combining Separate Voicebanks/Editing UTAU Information
○ If you want a video tutorial on all this, check here
○ Before, I had you install Teto’s OU voicebank along with her English one and the
extras. Let’s combine them into one voicebank
■ First, reopen the singers menu (Track Editor>Tools>Singers) and select
the voicebank you will use as the default one. For Teto, select the OU
voicebank
■ Go to the folder location
■ Navigate back to the singers folder, and find Teto’s English folder. Inside,
there will be more folders
■ The only things from this that you need are the oto.ini file, the .frq files,
and the .wav files. Delete the “character” files, the readme, and the
images
■ If the oto.ini is not already in the same folder as the .wav files and the .frq
files, put it inside. Drag the folder containing only these files back to the
singers tab, and into Teto’s OU folder. The oto.ini inside the English folder
will apply to those notes, and the oto.ini in the main folder will apply to the
others. OU can only read one folder deeper than the main singer’s folder,
so none of the folders in the main Teto folder should have more folders
inside them
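If you want to double check the “only one folder deep” rule after moving things around, a rough Python sketch like this can list any folders that are nested too far down. The function name is made up, and the one-level limit is simply the rule described above, not something checked against OU itself.

```python
from pathlib import Path


def too_deep_folders(singer_dir):
    """List folders nested more than one level below `singer_dir`."""
    root = Path(singer_dir)
    deep = []
    for path in root.rglob("*"):
        # parts has one entry per folder level below the singer folder
        if path.is_dir() and len(path.relative_to(root).parts) > 1:
            deep.append(path.relative_to(root).as_posix())
    return sorted(deep)
```

Calling `too_deep_folders` on the main Teto folder should return an empty list once everything is laid out correctly. Anything it does return needs to be moved up a level.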
○ Teto’s OU bank comes with 5 different voice colors already merged. Those
already had preset suffixes/prefixes, but you can actually go into the oto.ini and
change these. I am going to download two of Tokumei’s four voicebanks for
tutorial purposes, here (https://tll555.wixsite.com/index/tokumei). There’s a merge
voicebank but I’m ignoring it!
■ For this, I’ll just install two; Rock and Soft. Install them as normal
■ Go to the singers menu, and locate the folders. I’ll go to Soft, since Rock
is my base
■ There are a couple of folders, so this may take a bit of time. Each of these
folders has its own oto.ini file that we will need to edit. Go to the first folder
and find it
■ Within the oto.ini are a lot of numbers. It doesn’t matter if the text looks
corrupted, as long as you can see “wav=”
■ Go to Edit>Replace. Type “wav=” in the top textbox, and “wav=” followed by your
prefix in the second. I want to use the prefix SF, so I’ll type “wav=SF” (no period).
Replace all, and save the file before you close it
■ Repeat that exactly in each of the Soft subfolders, with the same prefix
each time, saving as you go
■ Drag each of the subfolders into the Rock folder. If a folder with the same
name exists, you can rename one of them without consequence
■ Return to OU, restart the program so it picks up the changes, and reopen the
singers menu
■ Click “Edit Subbanks”
■ Add color. I’ll call mine “Soft”
■ Click “Select All,” and type your prefix in the prefix section. Mine was “SF”
■ Click “Set”
■ Click “Import prefix.map” and find the prefix.map from the original Soft
folder. Open it when prompted
■ Click “Save”
■ Your voice colors should be set up. Return to OU and check the
parameter menu to make sure it worked. You are now free to delete the
old mostly empty Soft folder
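The find-and-replace step above can also be sketched in Python if there are a lot of subfolders. This is a minimal sketch, assuming each oto.ini line looks like “filename.wav=alias,numbers...”. The function name is made up, and reading/writing the file (including its text encoding, which is often not UTF-8) is left out on purpose.

```python
def add_alias_prefix(oto_text, prefix):
    """Insert `prefix` right after the first 'wav=' on every line."""
    out = []
    for line in oto_text.splitlines():
        head, sep, tail = line.partition("wav=")
        # Lines without 'wav=' (blank lines, etc.) are kept untouched.
        out.append(head + sep + prefix + tail if sep else line)
    return "\n".join(out)


sample = "a.wav=a,10,50,-100,30,10\nka.wav=ka,12,60,-120,40,15"
print(add_alias_prefix(sample, "SF"))
```

Just like the manual version, this only changes the alias, not the .wav filename or any of the numbers. Keep a backup of the original oto.ini before overwriting anything.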
○ The “character” files are what we can use to edit the portrait icon and UTAU
name
■ Navigate back to the folder of the singer you’ll edit
■ Find the character.txt. NOT the character.yaml
■ Find the line reading “name=”
■ Type in the name directly after the “=”, without adding a space (for example,
name=Kasane Teto)
■ There should also be a line with the name of an image (.bmp) file. If you
want to change the character icon, find another square image, convert it
to a .bmp, and add it to the folder. Change the “image=” line to the
corresponding filename
■ You can change the names of the folders any time you want!
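For reference, the character.txt edit can be sketched in Python like this. It assumes the file is a plain list of “key=value” lines, including the “name=” and “image=” lines mentioned above. The function name and the sample values are made up, and the sketch works on text rather than opening the file for you.

```python
def set_character_fields(text, name=None, image=None):
    """Return character.txt text with the name= and/or image= lines replaced."""
    updates = {"name": name, "image": image}
    lines = []
    for line in text.splitlines():
        key, sep, _ = line.partition("=")
        new_value = updates.get(key.strip().lower()) if sep else None
        # Only lines whose key has a new value are rewritten.
        lines.append(f"{key}={new_value}" if new_value is not None else line)
    return "\n".join(lines)


original = "name=Teto\nimage=teto.bmp\nauthor=someone"
print(set_character_fields(original, name="Kasane Teto", image="icon.bmp"))
```

Every other line is passed through unchanged, so nothing else in the file gets disturbed.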
● Resamplers
○ Resamplers are the renderers, and they are all good for different things. They
work with another program called a wavtool (these are all mostly the same,
honestly). OU comes with WORLDLINE by default, but here are all the most
common ones;
■ WORLDLINE - OU default
■ resampler - UTAU default
■ moresampler - Lots of flags. Both a resampler and a wavtool
■ fresamp - Lessens engine noise and strengthens strong voices. Nasally
and a bit slow
■ TIPS - Good for soft/low voices. Uses .pmk files instead of .frq files
■ bkh01 - Really depends on specific flag usage
■ world4utau (w4u) - Doesn’t work for all voices. Strengthens engine noise
■ phaavoco - Like a vocoder
■ utaugrowl - Detailed growl parameters
○ Each resampler can be downloaded from different places. Here is where I got
moresampler (https://utau.fandom.com/f/p/4400000000000056544). To install a
resampler, unzip the file and drag the .exe into OU
○ For moresampler, you will have to install it twice (as a resampler and as a
wavtool)
○ Click “Classic” in the track editor
○ Click the gear icon
○ Set both options to moresampler.exe
○ Set both to default, and you’re set!
● Mixing
○ Mixing is just making the vocals sound nice with the instrumental. To be
completely honest I am not very good at this, but I can tell you the basics and
show you good places to look for more in-depth info
○ Mixing can really make or break a cover. Even if you have the best most
expressive tuning on the planet, bad mixing will make it sound, well, bad
○ For this, you need an external audio software (I just use Audacity) (I USED to just
put stuff in CapCut and line it up. DON’T DO THIS. FOR THE LOVE OF GOD).
Drag all your vocals and the instrumental into there. Line them up as exactly as
you can
○ Set your vocals to a volume where they can be heard, but aren’t completely
overpowering the instrumental. Harmonies will obviously be quieter
○ Audacity comes with some presets that I generally use. Here’s another forum
post that might be of help outside that, with tips for several different programs
(https://utaforum.net/threads/mixing-tips-as-many-programs-you-can-think-of.585/).
Other than that, here are some general filters that are standard:
■ Reverb/echo
■ Equalizers
■ Panning
■ High-pass (“radio” effect) and low-pass
○ There are a lot of tutorials on how to mix vocals out there that will be useful. Keep
in mind that real singers recording their voices already have some reverb from
their surroundings, but vocal synths do not. You might need to compensate for
that
Here are a few examples of my tuned .ustxs and .svps that you can feel free to look at/edit (one
with lots of edits and one with minimal edits) (also the one I used while making this document).
Credit me if you use them publicly, and let me know if you do (not mandatory but I think it’s
awesome)!
jansuta SVP/USTX
Teto JP (Merged)
BANKS
- smooth
- weak
- whisper
- shout (sakebi)
- strain (rikimi)
- vocal fry (edge)
BEFORE NOTE
* a (glottal?)
AFTER NOTE
a R (short end)
a R’ (glottal end)
a (cons.)
a R息 (soft long exhale)
a R吸 (soft med. exhale/inhale)
a R吸2 (alt)
a R吸↑ (high/power)
a R吸↓ (low/weak)
NOTES
巻 (pre-rolled r)
* ra/ri (rolled r)
nga/ngi
x ka/ki/ta/te (hissy?)
息1/2/3 or b1/2/3 (breaths)
Teto EN
VOWELS
V - love
A - love
a - hot
{ - cat
@ - should
3 - word
E - help
e - help
i - see
I - see
O - for
o - row
u - too
U - love
aI - I
aU - out
eI - hay
OI - noise
oU - row
CONSONANTS
g - Good
v - Very
dZ - juDGe
Z - Judge
z - Zero
S - SHould
s - SHould (there is no real “s” unfortunately)
T - maTH
t - maT
D - THe
d - Dog
j - You
N - siNG
n - siN
tS - CHurCH
rest (lowercase) are as expected
Defoko (Kollection)
BANKS
- high
- low
AFTER NOTE
a R (short end)
a - (end exhale)
a ‘ (end glottal)
Ritsu Kire
NOTES
dya/dyi
ブレス1/2/3/4 (breath)
n2
Ruko♀
BANKS
- shout
- sweet
- dark
- cheerful
- sad
- mellow
- whisper
- natural
AFTER NOTE
a R (more glottal than an end breath)
NOTES
a2/i2
gi2/ngi2
nga/ngi/ngya/ngyi
Ruko♂ (Merged)
BANKS
- kire (default)
- whisper
AFTER NOTE
a R (softer glottal)
a息 R (end exhale)
NOTES
吸 (inhale)
a吐/2 (toned? inhale)
Meiji (Merged)
BANKS
- intense (high)
- low
- old
AFTER NOTE
a R (short end)
a R2 (alt)
a k/kk/ky/p/pp/py (end const.)
NOTES
a2/n2/ze2/ge2/to2
BEFORE NOTE
・ka/ke/sa/se
-a
AFTER NOTE
a R (soft end)
a R2
a R3 (VK)
a RG (VK/CF)
a・(glottal)
a -2
a 吸 (exhale/inhale)
a (cons.)
a - (soft end)
NOTES
a2/sa2/na2
a3 (CF)
aE (VK/MP)
ae (VK)
nga/ngi
br1/2/3/4 (short/long/power/weak)
@br1/2/3/4 (short/long/power/weak) (CF)
CF/VK/MP br1/2/3/4/5/6 (short/long/power/weak/weak)
Sukone Tei
NOTES
a2/n2/ka2/na2
samples (kirai/zamaa/shingou/suki/aishiteru) (spoken)
BEFORE NOTE
AFTER NOTE
a (cons.)
a (tsu)
aR
a R/
a R/2
a RG
a RR
aV
a VV
NOTES
consonants
a/a2/a3/ka2/ka3/etc
a’/a’2/3/4/5 + a’ (rmj)
al/al2
n閉/n閉2/nl閉
aG/aG_
aa/ii/uu
nga/ngi/ngu
ha母/hi母/fu母
ha笑
sa/su/se長
hoe/howa?
rar - rolled
a明/a明l
a/