The aim of this study is to get more people interested in singing and to diminish the anxiety sometimes associated with it. The proposed system provides pitch correction and dynamic self-accompaniment, enabling users to enjoy computer-assisted solo duet singing.
Much of the embarrassment and lack of confidence that people feel about singing in public can be traced to difficulty producing the correct pitch, often compounded by limited singing experience. Even in the case of a person who cannot sing the correct pitches (so-called tone deafness), training can sometimes remedy the problem. One important step is simply getting used to singing: practicing with the appropriate pitch can help resolve these anxieties.
Several products already exist for singing. For example,
karaoke is a system that plays only the accompaniment while the user sings along. Real-time auto-tuning systems are used in recording studios and live performances, but they are usually employed for pitch correction and rarely for improving singing skills. Finally, a device called a "harmonizer" doubles the user's voice at a different pitch, but the artificial voice always keeps the same interval from the original, so dynamic intervals are not possible. Based on these observations, I propose an auto-tuning function intended to improve the user's singing ability, together with a "harmonizer" that supports dynamic intervals.
The proposed system is available to everyone: it is built in Pure-data, a visual programming language widely used for real-time audio applications. To ease deployment on less powerful computers, the GUI program is separated from the DSP (Digital Signal Processing) program, as shown in Figure 1. The GUI runs in an application called TouchOSC on an iPad, iPhone, or Android device, so that MIDI notes are transmitted from a virtual piano to the computer performing the DSP. TouchOSC is a modular OSC and MIDI control surface for Android, iPhone, and iPad; it supports sending and receiving Open Sound Control and MIDI messages over Wi-Fi. OSC (Open Sound Control) is a communication protocol developed by CNMAT, a laboratory at the University of California, Berkeley.
Figure 1. Operating Pure-data through TouchOSC.
The interface shown in Figure 2 was built with the editor application provided by Hexler, the company that develops TouchOSC.
Figure 2. The developed GUI on TouchOSC.
- Pitch correction
To perform pitch correction, the singer's pitch is tracked by detecting its constituent sinusoids in the vocalic parts, as implemented in the [sigmund~] object in Pure-data. Pitch adjustments are performed with a Pitch-Synchronous Overlap and Add (PSOLA) algorithm. The PSOLA implementation used in this project was ported from a Max/MSP object called [shifter~]; I ported it to Windows and Raspberry Pi Debian. In this implementation, pitch periods are computed over the last 1000 samples, and adjustments are made only if the ratio r between the desired and current pitch satisfies 0.1 < r ≤ 4.0. I reduced this range to 0.5 ≤ r ≤ 2.0.
As in the pitch-detection stage, these changes are applied only to the vocalic parts of the voice.
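The ratio handling described above can be sketched as follows. This is a minimal Python illustration of the clamping logic, not the actual Pure-data/[shifter~] code; the function name and the `None` convention for "no adjustment" are my own.

```python
def shift_ratio(desired_hz, current_hz, r_min=0.5, r_max=2.0):
    """Return the PSOLA pitch-shift ratio r = desired/current,
    or None when no adjustment should be made.

    The default bounds reflect the reduced range 0.5 <= r <= 2.0
    described in the text (the original [shifter~] allowed 0.1 < r <= 4.0).
    """
    if current_hz <= 0:
        return None  # no pitch detected, e.g. an unvoiced frame
    r = desired_hz / current_hz
    if not (r_min <= r <= r_max):
        return None  # ratio outside the permitted range: leave voice untouched
    return r
```

Restricting r to one octave in either direction avoids the strongly artificial timbre that extreme PSOLA shifts produce.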
To understand the operation, refer to the numbers in Figure 2.
(1) Different pitch [%]: Displays the error e between the correct pitch p and the actual pitch p̂. The correct pitch (required pitch) is acquired by pressing a key. The closer the value is to 0, the more accurate the pitch.
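The displayed error can be computed as sketched below. Since the exact formula did not survive in this text, the sketch assumes a signed percentage deviation of the actual pitch from the required pitch; the function name is illustrative.

```python
def pitch_error_percent(required_hz, actual_hz):
    """Signed percentage deviation e of the actual pitch from the
    required pitch (assumed formula: e = 100 * (p_hat - p) / p).
    e = 0 means the singer is exactly on pitch; negative means flat."""
    return 100.0 * (actual_hz - required_hz) / required_hz
```

For example, singing 430 Hz against a 440 Hz target yields roughly −2.3 %, i.e. the singer is slightly flat.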
(2) Mode: The user can select between two modes, Self-duet mode and Auto-tuning mode (explained below).
(3) mic and hmy: Adjust the volumes of the real and artificial voices, respectively.
(4) DSP on/off: Turns audio processing on or off.
(5) reverb roomsize, wet and dry: Room size represents the size of the simulated room; a small value yields a short echo. Wet and dry control the amounts of the processed and unprocessed signals in the audio mix.
(6) Virtual piano: Sends the MIDI note number of the pressed key to Pure-data.
The middle of the virtual piano is C4 (MIDI note number 60). For reference, the average vocal range of women is A3–C5 (MIDI notes 57–72) and that of men is D3–G#4 (MIDI notes 50–68).
- Self-duet mode
When Self-duet mode is selected, the user can enjoy a duet with themselves. The procedure is as follows.
- The user sings a melody.
- While singing, the user plays the target interval from C4 on the virtual piano. For example, the user sings at X Hz and presses D4 on the keyboard. The program then outputs two sounds simultaneously: p0 and p1.
p0 is the unprocessed voice of the user, and p1 is a major 2nd above p0 (see Figure 2). p1 is obtained by shifting p0 by the frequency ratio f1 associated with the pressed MIDI note minus 60, i.e. the interval from C4 in semitones; these ratios are listed in Table 1.
Table 1. MIDI note numbers, musical scale, harmony tones, and frequency ratios.
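Under 12-tone equal temperament, the harmony pitch p1 can be derived from the sung frequency and the pressed MIDI note as sketched below. The equal-temperament ratio 2^(n/12) is an assumption on my part; Table 1 may instead list just-intonation ratios, and the function name is illustrative.

```python
def harmony_freq(f0_hz, midi_note, root=60):
    """Shift the sung frequency f0 by the interval between the pressed
    MIDI note and C4 (MIDI 60), assuming 12-tone equal temperament."""
    semitones = midi_note - root        # e.g. D4 (62) -> +2, a major 2nd
    ratio = 2.0 ** (semitones / 12.0)   # frequency ratio for that interval
    return f0_hz * ratio
```

Singing at 440 Hz and pressing D4 would thus produce a harmony voice near 493.9 Hz, a major 2nd above the sung pitch.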
- Auto-tuning mode
When Auto-tuning mode is selected, the user plays the melody on the virtual MIDI keyboard and the singer's pitch is adjusted to the frequency of the selected note.
- The user plays a melody on the virtual piano while singing.
- Based on the virtual piano input (MIDI messages), the program changes the user's pitch. For example, when the user sings A4 (440 Hz) at 430 Hz, the program corrects the user's pitch by +10 Hz.
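The correction applied in Auto-tuning mode can be sketched as follows, using the standard MIDI-to-frequency conversion (A4 = MIDI 69 = 440 Hz). The function names are illustrative and not taken from the actual Pure-data patch.

```python
def midi_to_hz(note):
    """Standard MIDI-to-frequency conversion (A4 = MIDI note 69 = 440 Hz)."""
    return 440.0 * 2.0 ** ((note - 69) / 12.0)

def correction_hz(midi_note, detected_hz):
    """Amount in Hz by which the singer's pitch is shifted toward
    the note played on the virtual keyboard."""
    return midi_to_hz(midi_note) - detected_hz
```

For the example in the text, pressing A4 (MIDI 69) while the singer is detected at 430 Hz yields a correction of +10 Hz.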
Please refer to Figure 3 for the score used in the demonstrations.
Figure 3. Demonstration’s score.
(1) A demonstration was performed using Self-duet mode.
sample 1: Only user’s voice.
sample 2: Using self-duet mode.
(2) A demonstration was performed using Auto-tuning mode.
sample 3: Only user's voice (a less skilled singer).
sample 4: Using auto-tuning mode.
sample 5: Unprocessed voice (sample 3) and processed voice (sample 4).
I used the song “Sounds of Silence” (1965) by Simon & Garfunkel (Columbia Legacy).
- Future work
A desktop computer based on the Raspberry Pi microprocessor would be a convenient platform for this system. However, the Raspberry Pi's CPU performance and memory are insufficient for the complete system, making its operation very slow. I therefore investigated the causes of this bottleneck and, to improve performance, split the solution into two separate programs (GUI and DSP). In the future, the two programs may coexist on the same platform, provided the performance issues are solved.
As shown in the demonstration, the developed function outputs different harmonies, and this feature may lead to improved singing skills when the auto-tuning mode of the same software is used. The demonstration results suggest that the proposed system may be useful not only for pitch-correction training but also for enjoying harmony without the pressure of being judged by others. I hope this program can help people who are not good at singing.
This research was partially funded by the Kawai Foundation for Sound Technology & Music.
福島英 (2005) 『ヴォイストレーニングがわかる Q&A100』 音楽之友社. (in Japanese: "Understanding Voice Training: 100 Q&As").
M. Puckette (2007). The Theory and Technique of Electronic Music. World Scientific Pub Co Inc, 323 p.
美山千香士 (2013) 『Pure Data チュートリアル&リファレンス』 ワークスコーポレーション, pp. 311–319. (Chikashi Miyama, "Pure Data: Tutorial & Reference").
M. Puckette, T. Apel, and D. Zicarelli, "Real-time audio analysis tools for Pd and MSP," in Proc. Int. Computer Music Conf., 1998.
T. Jehan, "Tristan Jehan's Home Page" [Software], 2008. Retrieved October 21, 2016, from web.media.mit.edu/~tristan.
Julián Villegas, "音程感覚の習得とよりよい歌唱体験の補助," SOUND, 32, 2017, pp. 10–13. (in Japanese: "Acquiring a sense of interval and assisting a better singing experience").