    Publications

    Journal articles

    [1]    Julián Villegas. Improving singing experience for people with tuning difficulties. Sound, 32:10–14, Jan 2017. In Japanese.

    Based on the association of psychoacoustic roughness and musical pitch, and inspired by the common tuning technique of eliminating aural beats between the strings of an instrument, we hypothesize that users of our system could adjust their intonation in order to minimize the interference between their current and desired pitch (a modulated version of their current voice). It is our hope that this process could lead to long-term singing improvements as well. This work-in-progress report discusses implementation issues, expression possibilities, and future evaluations of this tool as an alternative for improving singing skills in self-refrained singers.

    [2]    Julián Villegas, Jeremy Perkins, and Seunghun J. Lee. Automatic prediction of creaky voice with psychoacoustic roughness. J. Acoust. Soc. Am., 2017. (submitted).

    The use of psychoacoustic roughness as a predictor of creaky voice is reported in this article. We found that the presence of high levels of roughness and sudden changes in the roughness temporal profile are correlated with the presence of creaky episodes in speech. When using expert classifications as reference, a creakiness classifier based on an objective roughness model performed similarly to a state-of-the-art artificial neural network-based predictor. Roughness-based classification of creakiness was more similar to that given by Vietnamese listeners than to the classification given by Japanese listeners. In northern dialects of Vietnamese, creakiness is phonemically contrastive, but not so in Japanese. These findings suggest that roughness prediction models could be successfully used for classification of creaky intervals in speech and that listeners of languages in which creakiness plays a contrastive role could be using psychoacoustic roughness to help distinguish between tones.

    [3]    Jorge González-Alonso, Julián Villegas, and M.P. García-Mayo. English compound processing in bilingual and multilingual speakers: The role of dominance. Second Language Research, May 2016.

    This article reports a study which investigated the relative influence of the first and dominant language on L2 and L3 morpho-lexical processing. A lexical decision task compared the responses to English NV-er compounds (e.g. taxi driver) and non-compounds provided by a group of native speakers and three groups of learners at various levels of proficiency in English: L1 English-L2 Spanish sequential bilinguals and two groups of early Spanish-Basque bilinguals with English as their L3. Crucially, the two trilingual groups differed in their first and dominant language (i.e. L1 Spanish-L2 Basque vs. L1 Basque-L2 Spanish). Our materials exploit an (a)symmetry between these languages: while Basque and English pattern together in the basic structure of NV-er compounds, Spanish presents a very different construction. Results show differences in response times that may be ascribable to two factors beyond proficiency: the number of languages spoken by a given participant and the nature of their L1. An exploration of response bias reveals an influence of the participants’ L1 on the processing of NV-er compounds. Our data suggest that morphological information in the nonnative lexicon may extend beyond morphemic structure, that there are costs to additive multilingualism in lexical retrieval, and that most of these effects are attenuated by proficiency.

    [4]    Julián Villegas. Locating virtual sound sources at arbitrary distances in real-time binaural reproduction. Virtual Reality, 19(3):201–212, Oct 2015.

    A real-time system for sound spatialization via headphones is presented. Conventional headphone spatialization techniques effectively place sources on the surface of a virtual sphere around the listener. In the new system, sources can be spatialized at different distances from a listener by interpolating Head-Related Impulse Responses (HRIRs) measured between 20 and 160 cm. These HRIRs are stored in different databases depending on the audio sampling rate. To ease the realtime constraints, users can choose the number of HRIR taps used in the convolution, and an alternative interpolation technique (simplex interpolation) was implemented instead of trilinear interpolation. Subjective tests showed that such simplifications yield satisfactory spatialization for some angles and distances.
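
    The simplex interpolation mentioned in this abstract can be illustrated with a minimal sketch. It assumes that measurement positions are given as 3D points and that the four measured HRIRs surrounding the target position have already been selected; the function names, the coordinate convention, and the toy impulse responses are hypothetical and are not taken from the article. A tetrahedron needs four weighted HRIRs per output, whereas trilinear interpolation needs eight.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as three rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)


def barycentric_weights(p, v0, v1, v2, v3):
    """Weights of point p with respect to the tetrahedron (v0, v1, v2, v3).

    Solves p - v0 = w1*(v1 - v0) + w2*(v2 - v0) + w3*(v3 - v0) by Cramer's rule.
    """
    edges = [[v[k] - v0[k] for k in range(3)] for v in (v1, v2, v3)]
    rhs = [p[k] - v0[k] for k in range(3)]

    def from_columns(c0, c1, c2):
        # Build a row-major matrix whose columns are c0, c1, c2.
        return [[c0[r], c1[r], c2[r]] for r in range(3)]

    d = det3(from_columns(*edges))
    if abs(d) < 1e-12:
        raise ValueError("degenerate tetrahedron")
    w1 = det3(from_columns(rhs, edges[1], edges[2])) / d
    w2 = det3(from_columns(edges[0], rhs, edges[2])) / d
    w3 = det3(from_columns(edges[0], edges[1], rhs)) / d
    return [1.0 - w1 - w2 - w3, w1, w2, w3]


def interpolate_hrir(p, vertices, hrirs):
    """Weighted sum of the four HRIRs measured at the tetrahedron vertices."""
    weights = barycentric_weights(p, *vertices)
    n_taps = len(hrirs[0])
    return [sum(w * h[t] for w, h in zip(weights, hrirs)) for t in range(n_taps)]


# Toy data: a unit tetrahedron and two-tap "impulse responses" (illustrative only).
vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]
hrirs = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
centre = interpolate_hrir((0.25, 0.25, 0.25), vertices, hrirs)
```

    At the centroid all four weights equal 0.25, and at a vertex the interpolated response reduces to the HRIR measured there.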

    [5]    Michael Cohen, Julián Villegas, and Woodrow Barfield. Special issue on spatial sound in virtual, augmented, and mixed-reality environments. Virtual Reality, 19(3):147–148, 2015.

    [6]    Martin Cooke, Catherine Mayo, and Julián Villegas. The contribution of durational and spectral changes to the Lombard speech intelligibility benefit. J. Acoust. Soc. Am., 135(2):874–883, Feb 2014.

    Speech produced in the presence of noise (Lombard speech) is typically more intelligible than speech produced in quiet (plain speech) when presented at the same signal-to-noise ratio, but the factors responsible for the Lombard intelligibility benefit remain poorly understood. Previous studies have demonstrated a clear effect of spectral differences between the two speech styles and a lack of effect of fundamental frequency differences. The current study investigates a possible role for durational differences alongside spectral changes. Listeners identified keywords in sentences manipulated to possess either durational or spectral characteristics of plain or Lombard speech. Durational modifications were produced using linear or nonlinear time warping, while spectral changes were applied at the global utterance level or to individual time frames. Modifications were made to both plain and Lombard speech. No beneficial effects of durational increases were observed in any condition. Lombard sentences spoken at a speech rate substantially slower than their plain counterparts also failed to reveal a durational benefit. Spectral changes to plain speech resulted in large intelligibility gains, although not to the level of Lombard speech. These outcomes suggest that the durational increases seen in Lombard speech have little or no role in the Lombard intelligibility benefit.

    [7]    Michael Cohen, Rasika Ranaweera, Hayato Ito, Shun Endo, Sascha Holesch, and Julián Villegas. “twin spin”: Steering karaoke (or anything else) with smartphone wands deployable as spinnable affordances. SIG-MOBILE Mobile Computing and Communications Review, 16(4):4–5, Oct 2012.

    We have built haptic interfaces featuring smartphones and tablets that use magnetometer-derived orientation sensing to modulate virtual displays, especially spatial sound, allowing, for instance, each side of a karaoke recording to be separately steered around a periphonic display. Embedding such devices into a spinnable affordance allows a “spinning plate”-style interface, a novel interaction technique. Either static (pointing) or dynamic (spinning) modes can be used to control “whirled” multimodal display, including a rotary motion platform, panoramic movies, and the positions of avatars in virtual environments.

    [8]    Julián Villegas and Michael Cohen. Roughness Minimization Through Automatic Intonation Adjustments. J. of New Music Research, 39(1):75–92, 2010.

    We have created a reintonation system that minimizes measured roughness of parallel sonorities as they are produced. Intonation adjustments are performed by finding, within a user-defined vicinity, a combination of fundamental frequencies that yields minimal roughness. The vicinity imposition limits pitch drift and eases realtime computation. Prior knowledge of the temperament and notes being played is not necessary for the operation of the algorithm. We test a proof of concept prototype adjusting equal temperament intervals reproduced with a harmonic spectrum towards pure intervals in realtime. Pitch drift of the rendered music is not prevented but limited. This prototype exemplifies musical and perceptual characteristics of roughness minimization by adaptive techniques. We discuss the results obtained, limitations, possible improvements, and future work.
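
    The search described in this abstract (finding, within a user-defined vicinity, the fundamental frequency that yields minimal roughness) can be sketched as a simple grid search. The sketch below is an illustration under stated assumptions, not the article's algorithm: it uses a Sethares-style parametrization of the Plomp-Levelt curve as the roughness model, six harmonic partials with geometrically decaying amplitudes, and it keeps one tone fixed while retuning the other. All function names and parameter values are hypothetical.

```python
import itertools
import math


def pair_roughness(f1, a1, f2, a2):
    """Roughness of two sinusoids (Sethares-style Plomp-Levelt curve; assumed model)."""
    s = 0.24 / (0.0207 * min(f1, f2) + 18.96)  # scales the curve with register
    x = s * abs(f2 - f1)
    return min(a1, a2) * (math.exp(-3.51 * x) - math.exp(-5.75 * x))


def harmonic_partials(f0, n_partials=6, decay=0.88):
    """(frequency, amplitude) pairs of an assumed harmonic spectrum."""
    return [(f0 * k, decay ** (k - 1)) for k in range(1, n_partials + 1)]


def total_roughness(fundamentals):
    """Sum of pairwise roughness over all partials of all tones."""
    partials = [p for f0 in fundamentals for p in harmonic_partials(f0)]
    return sum(
        pair_roughness(f1, a1, f2, a2)
        for (f1, a1), (f2, a2) in itertools.combinations(partials, 2)
    )


def retune(fixed_f0, moving_f0, vicinity_cents=20.0, step_cents=0.1):
    """Grid-search the offset (in cents) of moving_f0 that minimizes roughness.

    The search is confined to +/- vicinity_cents, which bounds pitch drift.
    Returns (offset_in_cents, retuned_frequency, roughness).
    """
    steps = int(round(vicinity_cents / step_cents))
    best = None
    for i in range(-steps, steps + 1):
        cents = i * step_cents
        candidate = moving_f0 * 2.0 ** (cents / 1200.0)
        roughness = total_roughness([fixed_f0, candidate])
        if best is None or roughness < best[2]:
            best = (cents, candidate, roughness)
    return best


# An equal-tempered major third above 220 Hz, retuned within +/- 20 cents.
offset, frequency, roughness = retune(220.0, 220.0 * 2.0 ** (4.0 / 12.0))
```

    Under these assumptions the equal-tempered major third should be pulled down by roughly 14 cents, close to the pure 5:4 ratio (275 Hz), where the fifth partial of the lower tone and the fourth partial of the upper tone coincide.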

    [9]    Julián Villegas and Michael Cohen. Exploring tonal music through operational research methodology. Communications of the Operations Research Society of Japan, 54(9):554–562, October 2009. In Japanese.

    Two operational research applications in music are presented. Initially, the mapping of musical scales into multi-dimensional topologies is discussed, and the advantages of projecting these structures into simple spaces explained. We also present the Helical Keyboard, an interactive installation displaying three-dimensional musical scales aurally and visually. Subsequently, the problem of minimizing musical dissonance between audio streams in realtime is discussed, and a solution based on local minima search described.

    [10]    Mohammad Sabbir Alam, Michael Cohen, Julián Villegas, and Ashir Ahmed. Narrowcasting for Articulated Privacy and Attention in SIP Audio Conference. J. of Mobile Multimedia, 5(1):12–28, 2009.

    In traditional conferencing systems, participants have little or no privacy, as their voices are by default shared with all others in a session. Such systems cannot offer participants the options of muting and deafening other members. The concept of narrowcasting can be applied to make these kinds of filters available in multimedia conferencing systems. Our system treats media sinks (in the simplest case, listeners) as full citizens, peers of the media sources (conversants’ voices), and we therefore define duals of mute & select: deafen & attend, which respectively block a sink or focus on it to the exclusion of others. In this article, we describe our prototyped application, which uses existing standard Session Initiation Protocol (SIP) methods to control fine-grained narrowcasting sessions. The runtime system considers the policy configured by the participants and provides a policy evaluation algorithm for media mixing and delivery. We have integrated a “virtual reality”-style interface with this SIP backend to display and control articulated narrowcasting with figurative avatars.

    [11]    Julián Villegas, Yuuta Kawano, and Michael Cohen. Harmonic Stretching with the Helical Keyboard. 3D Forum: J. of Three-Dimensional Images, 20(1):29–34, 2006.

    An extended version of the paper published in Proc. HC-2005: Eighth International Conference on Humans and Computers, introducing other possibilities to achieve harmonic stretching using only the MIDI protocol.

    Refereed conference articles

    [1]    Julián Villegas and Takaya Ninagawa. Pure-data-based transaural filter with range control. In Proc. 5th Int. Pure Data Convention, Nov 2016.

    We present an extension to Pure-data by which users can truly spatialize sound via a pair of loudspeakers, i.e., spatialize monaural sound sources at an arbitrary azimuth, elevation, and distance. Although transaural techniques have been long explored, our system takes advantage of a recently collected Head-Related Impulse Response (HRIR) dataset measured in the near field (20–160 cm from the center of a mannequin’s head) to allow a more accurate distance control, a missing feature in other implementations.

    [2]    Julián Villegas, Tore Stegenborg-Andersen, Nick Zacharov, and Jesper Ramsgaard. A comparison of stimulus presentation methods for listening tests. In Proc. 141 Audio Eng. Soc. Int. Conv., Sep. 2016.

    This study investigates the impact of relaxing presentation methods on listening tests by comparing results from two identical listening experiments carried out in two countries and comprising two presentation methods: the ITU-T P.800 Absolute Category Rating (ACR) recommendation and a modified version of it where assessors had more control over the reproduction of the samples. Compared with the standard method, test duration was reduced on average by 37% […] used on the ratings of codecs were found, but a significant effect of site on ratings and duration was found. We hypothesize that in the latter case, cultural differences and instructions to the assessors could explain these effects.

    [3]    Jeremy Perkins, Seunghun Lee, and Julián Villegas. The roles of phonation and f0 in Wuming Zhuang tone. In Proc. 22nd Himalayan Languages Symp., Jun. 2016.

    This study reports phonetic measurements of the tonal system of Wuming Zhuang. While previous analyses have described Wuming Zhuang’s tone contrasts using F0 only, this study finds that creaky phonation can distinguish pairs of tones that have similar F0 contours, suggesting that creakiness, in addition to F0, may play a role in distinguishing tones. A composite acoustic algorithm is applied as a way to compute creaky phonation and is offered as an alternative method for linguists interested in measuring phonation from the acoustic signal.

    [4]    Donna Erickson, Julián Villegas, Ian Wilson, Yuki Iguro, Jeff Moore, and Daniel Erker. Some acoustic and articulatory correlates of phrasal stress in Spanish. In Proc. 8 Speech Prosody, Boston, MA, May 2016.

    All spoken languages show rhythmic patterns. Recent work with a number of different languages (English, Japanese, Mandarin Chinese, French) suggests that metrically assigned stress levels of the utterance show strong correlations with the amount of jaw displacement and corresponding F1 values. This paper examines some articulatory and acoustic correlates of Spanish rhythm; specifically, we ask whether the phrasal stress values metrically assigned to each syllable correlate with acoustic and articulatory values. We used video recordings of 3 Salvadoran Spanish speakers to measure, for each vowel of two Spanish sentences, maximum jaw displacement, mean F0, mean intensity, mean duration, and mid-vowel F1. The results show weak but significant correlations between jaw displacement and F1/intensity, but no correlation between jaw displacement and F0. We also found strong correlations between stress, duration, and F1, and weaker but significant correlations between stress and mean intensity/maximum jaw displacement.

    [5]    Jeremy Perkins, Seunghun Lee, and Julián Villegas. An interplay between F0 and phonation in Du’an Zhuang tone. In TAL: Proc. 5 Int. Symp. on Tonal Aspects of Languages, Buffalo, May 2016.

    This paper undertook an acoustic study of the tone system of Du’an Zhuang, finding that unlike the standard dialect, Wuming Zhuang, its tone system involved phonation differences in addition to F0 and duration differences. It was found that two of the six tones in unchecked syllables in Du’an Zhuang involved significant creakiness near the midpoint of the vowel. In checked syllables, a three-way tonal contrast was observed based on F0 contours, but not creakiness. These results suggest a phonological tone contrast that involves both F0 and creakiness. Among pairs of tones that differed in their phonation, significant differences in the timing of F0 fall were discovered. Additionally, the two creaky tones differed in the timing of the maximum creakiness. Future research on the perception side could establish whether and to what extent Du’an Zhuang speakers utilize creakiness and F0, and their relative timing, in discerning between tonal categories.

    [6]    Julián Villegas. An online benchmarking platform for visualizing ionizing radiation doses in different cities. In Proc. of EATIS: 8th Euro-American Conf. on Telematics and Information Systems, April 2016.

    A working prototype for alternative visualizations of environmental data (currently, ionizing radiation) measured with bGeigie nano Safecast sensors is presented. Contrary to previous interfaces, in this visualization users have finer control of the displayed data (i.e., can determine date ranges, compare locations, decide the averaging areas, etc.) and more detailed information of the resulting visualization (size of the samples per day and per region, etc.). With this new data visualization, it is easier to compare local environment figures with those of other regions of the planet.

    [7]    Michael Cohen, Rasika Ranaweera, Kensuke Nishimura, Yuya Sasamoto, Shun Endo, Tomohiro Oyama, Tetsunobu Ohashi, Yukihiro Nishikawa, Ryo Kanno, Anzu Nakada, Julián Villegas, Yong Ping Chen, Sascha Holesch, Jun Yamadera, Hayato Ito, Yasuhiko Saito, and Akira Sasaki. “Tworlds”: Twirled worlds for multimodal ‘padiddle’ spinning & tethered ‘poi’ whirling. In Proc. of SIGGRAPH, pages 67:1–67:1, Nov. 2013.

    Modern smartphones and tablets have magnetometers that can be used to detect yaw, which data can be distributed to adjust ambient media. Either static (pointing) or dynamic (twirling) modes can be used to modulate multimodal displays, including 360° imagery and virtual environments. Azimuthal tracking especially allows control of horizontal planar displays, including panoramic and turnoramic image-based rendering, spatial sound, and the position of avatars, virtual cameras, and other objects in virtual environments such as Alice, as well as rhythmic renderings such as musical sequencing.

    [8]    Michael Cohen, Rasika Ranaweera, Kensuke Nishimura, Yuya Sasamoto, Tomohiro Oyama, Tetsunobu Ohashi, Anzu Nakada, Julián Villegas, Yong Ping Chen, Sascha Holesch, Jun Yamadera, Hayato Ito, Yasuhiko Saito, and Akira Sasaki. Twirled affordances, self-conscious avatars, & inspection gestures. In Proc. SIGGRAPH Asia: Symposium on Mobile Graphics and Interactive Applications, pages 95:1–95:1, Nov. 2013.

    Contemporary smartphones and tablets have magnetometers that can be used to detect yaw, which data can be distributed to adjust ambient media. We have built haptic interfaces featuring smartphones and tablets that use compass-derived orientation sensing to modulate virtual displays. Embedding mobile devices into pointing, swinging, and flailing affordances allows “padiddle”-style interfaces, finger spinning, and “poi”-style interfaces, whirling tethered devices, for novel interaction techniques.

    [9]    Julián Villegas and Martin Cooke. Maximising objective speech intelligibility by local f0 modulation. In Proc. Interspeech, Sep. 2012.

    We investigated the effect on objective speech intelligibility of scaling the fundamental frequency (f0) of voiced regions in a set of utterances. The frequency scaling was driven by maximising the glimpse proportion in voiced epochs, inspired by musical consonance maximisation techniques. Results show that depending on the energetic masker and the signal to noise ratio, f0 modifications increased the mean glimpse proportion by up to 15%. On average, lower mean f0 changes resulted in greater glimpse proportions. It was also found that the glimpse proportion could be a good predictor of music consonance.

    [10]    Vincent Aubanel, Martin Cooke, Julián Villegas, and Maria Luisa Garcia Lecumberri. Conversing in the presence of a competing conversation: effects on speech production. In Proc. Interspeech, 2011.

    This study investigates how background conversations affect foreground conversations, and how speakers may adjust their speech to overcome the perturbations. Three pairs of speakers were recorded in different combinations of simultaneous dialogues, and speech production modifications were investigated at an acoustical and interactional level. In addition to displaying standard Lombard effects, speakers were found to produce fewer back-channels and more interruptions in the presence of a background conversation. A decrease in the precision of turn taking was also observed. These results provide a better understanding of the strategies speakers may be developing in dealing with a concurrent conversation, with a view to incorporating them into spoken dialogue systems.

    [11]    Julián Villegas, Martin Cooke, Vincent Aubanel, and Marco A. Piccolino-Boniforti. mtrans: A multi-channel, multi-tier speech annotation tool. In Proc. Interspeech, 2011.

    mtrans, a freely available tool for annotating multi-channel speech, is presented. This software tool is designed to provide the visual and aural display flexibility required for transcribing multi-party conversations; in particular, it eases the analysis of speech overlaps by overlaying waveforms and spectrograms (with controllable transparency), and the mapping from media channels to annotation tiers by allowing arbitrary associations between them. mtrans supports interoperability with other tools via the Open Sound Control protocol.

    [12]    Julián Villegas and Michael Cohen. “Gabriel”: Geo-Aware Broadcasting for In-vehicle Entertainment and Larger Safety. In Proc. 135 Audio Eng. Soc. Int. Conv., October 2010.

    We have retrofitted a vehicle with location-aware advisories/announcements, delivered via wireless headphones for passengers and bone-conduction headphones for the driver. Our prototype differs from other research in the spatialization of the aural information. Besides the commonly used landmarks to trigger audio streams delivery, our prototype uses geo-located virtual sources to synthesize the spatial soundscapes. Intended as a “proof of concept” and testbed for future research, our development features multilingual tourist information, navigation instructions, and traffic advisories rendered simultaneously.

    [13]    Julián Villegas, Michael Cohen, Ian Wilson, and William Martens. Influence of Roughness on Preference of Musical Intonation. In Proc. 128 Audio Eng. Soc. Conv., London, May 2010.

    We have found evidence suggesting that for musically naïve participants, when selecting among similar renditions of the same musical fragment, psychoacoustic roughness is an influencing factor on preference. We designed an experiment to compare the acceptability of three different music fragments, rendered with three different intonations, and contrasted the results with those of isolated chords of the same fragment–intonation combinations.

    [14]    Julián Villegas and Michael Cohen. “Roughometer”: Realtime Roughness Calculation and Profiling. In Proc. 125 Audio Eng. Soc. Conv., San Francisco, October 2008.

    A software tool capable of determining auditory roughness in real-time is presented. This application, based on Pure-Data (Pd), calculates the roughness of audio streams using a spectral method originally proposed by Vassilakis. The processing speed is adequate for many realtime applications, and results indicate limited but significant agreement with an internet application of the chosen model. Finally, the usage of this tool is illustrated by the computation of a roughness profile of a musical composition that can be compared to its perceived patterns of ‘tension’ and ‘relaxation.’

    [15]    Michael Cohen, Ishara Jaysingha, and Julián Villegas. Spin-Around: Phase-Locked Synchronized Rotation and Revolution in Multistandpoint Panoramic Browsers. In Proc. IEEE CIT2007: 7th Int. Conf. on Computer and Information Technology, pages 511–516, Aizu Wakamatsu, Japan, October 2007.

    Using multistandpoint panoramic browsers as displays, we have developed a control function that synchronizes revolution and rotation of a visual perspective around a designated point of regard in a virtual environment. The phase-locked orbit is uniquely determined by the focus and the start point, and the user can parameterize direction, step size, and cycle speed, and invoke an animated or single-stepped gesture. The images can be monoscopic or stereoscopic, and the rendering supports the usual scaling functions (zoom/unzoom). Additionally, via sibling clients that can directionalize realtime audio streams, spatialize HDD-resident audio files, or render rotation via a personal rotary motion platform, spatial sound and proprioceptive sensations can be synchronized with such gestures, providing complementary multimodal displays.

    [16]    Julián Villegas and Michael Cohen. Synœsthetic Music or the Ultimate Ocular Harpsichord. In Proc. IEEE CIT: 7th Int. Conf. on Computer and Information Technology, pages 523–527, Aizu Wakamatsu, Japan, October 2007.

    We address the problem of visualizing microtuned scales and chords such that each representation is unique and therefore distinguishable. Using colors to represent the different pitches, we aim to capture aspects of the musical scale that are impossible to represent with numerical ratios. Inspired by the neurological phenomenon known as synæsthesia, we built a system to reproduce microtuned MIDI sequences aurally and visually. This system can be related to Castel’s historic idea of the ‘Ocular Harpsichord.’

    [17]    Julián Villegas and Michael Cohen. Möbius Tones and Shepard Geometries: An Alternative Synæsthetic Analogy (poster). In Proc. SIGGRAPH NPAR: 5th Int. Symp. on Non-Photorealistic Animation and Rendering, San Diego, August 2007.

    “Why did the chicken cross the Möbius strip?” – “Because it wanted to get to the same side.” We created a 3D animation to illustrate a different visual analogy for Shepard tones based on the well-known “Möbius Strip II” by Escher. This animation presents a sphere that, like the ants in Escher’s woodcut, moves longitudinally over the surface. The path followed by the ball varies its transverse position randomly but smoothly. We sample this path to render a melody of Shepard tone dyads, each tone in the dyad having a frequency equivalent to the position of the ball relative to the edge of the surface. The idea of using a Möbius strip to create music is not new. Tremblay, for example, used it to show how to construct a music-box able to play sequences backward and forward. However, our work differs from other developments in the use of this non-orientable geometry to illustrate the mentioned aural paradox.

    Books and book chapters

    [1]    Michael Cohen and Julián Villegas. Applications of audio augmented reality. Wearware, everyware, anyware, and awareware. In Fundamentals of Wearable Computers and Augmented Reality, chapter 13, pages 309–329. CRC Press, 2nd edition, July 2015.

    [2]    Julián Villegas and Michael Cohen. Mapping Musical Scales Onto Virtual 3D Spaces. In Yôiti Suzuki, Douglas Brungart, Hiroaki Kato, Kazuhiro Iida, Densil Cabrera, and Yukio Iwaya, editors, Principles and Applications of Spatial Hearing. World Scientific, 2011.

    We introduce an enhancement in the Helical Keyboard, an interactive installation displaying three-dimensional musical scales aurally and visually. This improvement in the audio display is intended to facilitate didactic purposes by enhancing users’ immersion in a virtual environment. The new system allows spatialization of audio sources with elevation angles between −40° and +90° and azimuth angles between 0° and 355°. In this fashion, we could overcome previous limitations on the audio display of the Helical Keyboard, for which we heretofore usually displayed only azimuth.

    [3]    Sabbir Alam, Michael Cohen, Julián Villegas, and Ashir Ahmed. Narrowcasting in SIP: Articulated Privacy Control. In Syed Ahson and Mohammad Ilyas, editors, SIP Handbook: Services, Technologies, and Security of Session Initiation Protocol, chapter 14, pages 323–345. CRC Press, 2009.

    Non-refereed articles

    [1]    Yuki Iguro, Ian Wilson, and Julián Villegas. Articulatory settings of English-French bilinguals reanalyzed by SS-ANOVA. In J. Acoust. Soc. Am., volume 140, pages 3222–3222, Dec 2016.

    To improve second language (L2) speaking skills, it may help to be aware of the underlying tongue position for a language. We focused on such underlying position differences between English and French, particularly when pausing for a short time between utterances, known as inter-speech posture (ISP). In past research, Wilson and Gick investigated ISP between English and French spoken by bilinguals. In that research, bilinguals had distinct articulatory settings for each language, mostly in the lips. However, their tongue data comprised only 4 measurements of articulatory settings: distance from the ultrasound probe to the tongue root, tongue dorsum, tongue body, and tongue tip, but not overall shape. Furthermore, to measure tongue tip position, past research relied on the alveolar ridge, which is hard to see, possibly making the results inaccurate for the tongue tip. In this study, we analyzed the whole shape of the tongue and modeled it using SS-ANOVA in R so that we could compare our results with those of past research obtained with a different measurement method. Our results showed that bilinguals who are perceived as native in both languages have a different ISP in the posterior half of the tongue.

    [2]    Julián Villegas, Jeremy Perkins, and Seunghun J. Lee. Psychoacoustic roughness as creaky voice predictor. In J. Acoust. Soc. Am., volume 140, pages 3394–3394, Dec 2016.

    The use of psychoacoustic roughness as a predictor of creaky voice is reported. Roughness, a prothetic sensation elicited by rapid changes in the temporal envelope of a sound (15–300 Hz), shares qualitative similarities with a kind of phonation known as vocal fry or creakiness. When a creakiness classification made by trained linguists was used as a reference, a classifier based on an objective temporal roughness model yielded results similar to an artificial neural network-based predictor of creakiness, but the former classifier tended to produce more type I errors. We also compare the results of the roughness-based prediction with those predicted by samples of three populations who use creakiness contrastively in different degrees: Japanese (where creakiness is not systematically used for phonetic contrast), Mandarin (where creakiness is used as a secondary cue), and Vietnamese (where creakiness is used as a phonetic contrast between tones). The roughness-based classification seems to better agree with classifications made by the untrained listeners. Our findings suggest that extreme roughness values (>4 asper) in combination with local prominences on the roughness temporal profile of vocalic segments could be used for classification of creaky intervals in running speech.

    [3]    Ian Wilson, Yuki Iguro, and Julián Villegas. Smoothing-spline ANOVA comparison of Japanese and English tongue rest positions of bilinguals. In Proc. 1 Int. Symp. on Applied Phonetics, Mar. 2016.

    [4]    Seunghun J. Lee, Jeremy Perkins, and Julián Villegas. The roles of phonation and F0 in Zhuang. In Annual Summer Conf. of the Linguistic Society of Korea: Its 60th Anniversary meeting, 2016.

    [5]    Julián Villegas, Ian Wilson, Yuki Iguro, and Donna Erickson. Effect of a fixed ultrasound probe on jaw movement during speech. In Proc. Ultrafest VII, Dec 2015.

    The use of an ultrasound probe for observing tongue movements potentially modifies speech articulation in comparison with speech uttered without holding the probe under the jaw. To determine the extent of such modification, we analyzed jaw displacements of three Spanish speakers speaking with and without a mid-sagittal ultrasound probe. We found a small, non-significant effect of the presence of the probe on jaw displacement. Counterintuitively, when speakers held the probe against their jaw, larger displacements were found. This could be explained by a slight overcompensation in their speech production.

    [6]    Ian Wilson, Yuki Iguro, and Julián Villegas. Articulatory settings of Japanese-English bilinguals. In Proc. Ultrafest VII, Dec 2015.

    In a similar experiment to Wilson & Gick (2014; JSLHR), who investigated the articulatory settings of French-English bilinguals, the present study is focused on Japanese-English bilinguals of various proficiencies. We analyze interspeech posture (ISP), and look at the differences between individuals and whether these are correlated with the perceived nativeness of the speakers in each of their languages.

    [7]    Donna Erickson, Julián Villegas, Ian Wilson, and Yuki Iguro. Spanish articulatory rhythm. In Proc. Acoust. Soc. Japan, Autumn meeting, Aizu Wakamatsu, Japan, Sep. 2015.

    This paper addresses Spanish articulatory rhythm. Preliminary work with Spanish suggests that the initial syllable of a phrase has the strongest phrasal stress and the last one the weakest. We made audio and video recordings of utterances in Spanish, English, and Japanese from three paid Salvadorian female siblings with different language backgrounds. Traces of markers on their faces were used for analyzing their jaw movement patterns. The obtained results, especially those of the most English-proficient participant, suggest that immersion in a second language may produce changes to the rhythmic patterns observed in the native language of the speaker. These differences in second language immersion seem to be reflected in the jaw displacement patterns of the different languages studied. This is especially evident in the fact that no pattern was common to even two of the speakers in their native Salvadorian Spanish.

    [8]    Seunghun J. Lee, Jeremy Perkins, and Julián Villegas. Acoustic correlates of tone in Du’an Zhuang: An interplay between pitch and phonation. In Proc. Int. Conf. on Phonetics and Phonology, Tokyo, Sep. 2015.

    This paper undertook an acoustic study of the tone system of Du’an Zhuang, finding that unlike the standard dialect, Wuming Zhuang, it involved phonation differences in addition to F0 and duration differences. It was found that two of the six tones in unchecked syllables in Du’an Zhuang involved significant creakiness near the midpoint of the vowel. In checked syllables, a three-way tonal contrast was observed based on F0 contours, but not creakiness. These results suggest a phonological tone contrast that involves both F0 and creakiness. Among pairs of tones that differed in their phonation, significant differences in the timing of F0 fall were discovered. Additionally, the two creaky tones differed in the timing of the maximum creakiness. Regarding duration, the same two-way contrast between long and short vowels reported for Wuming Zhuang was found among checked syllables in Du’an Zhuang. Further, duration is necessary to distinguish two identical checked tones, providing evidence that vowel length is contrastive in Du’an Zhuang. The data here allowed only a duration investigation of checked syllables, but it is likely that a vowel length contrast exists also for unchecked syllables. Future research on the perception side could establish whether and to what extent Du’an Zhuang speakers utilize creakiness and F0, and their relative timing, in discerning between tonal categories.

    [9]    Julián Villegas, Ian Wilson, and Jeremy Perkins. Effect of task on the intensity of speech in noisy conditions. In Proc. Acoust. Soc. Japan, Autumn meeting, Aizu Wakamatsu, Japan, Sep. 2015.

    We investigated the differences in speech intensity of Japanese speakers subjected to alternating periods of silence and Gaussian noise while engaged in four different tasks: two requiring communication effort (free dialog, and playing a game with a partner) and two requiring none (free soliloquy, and text reading). Two of the tasks were goal oriented (game, and reading) while the others were not. Regardless of noise presence, higher levels of intensity were observed on communicative tasks. During quiet periods, significant level differences were observed for non-communicative tasks, with goal-oriented tasks yielding higher levels. In noise-to-silence transitions, speakers decreased their intensity to their average speech level faster than they increased it in the opposite transitions. In either case, such intervals were longer than typical reflex times. The effects of goal and communication effort in the transitions were complex: smaller in the noise-to-silence transitions, with text reading having the least variation, and dialog the greatest. Highest levels in quiet and noisy conditions were observed in tasks requiring communication efforts, regardless of goal orientation. In the transitions, speakers were faster to lower their speech level than to raise it when exiting and entering a noisy period, respectively.

    [10]    Ian Wilson, Jeremy Perkins, Julián Villegas, and Ayaka Orihara. Reaction time to unnatural and natural Japanese pronunciation by native and non-native speakers. In Proc. Acoust. Soc. Japan, Autumn meeting, Aizu Wakamatsu, Japan, Sep. 2015.

    Reaction times (RTs) have been shown to be faster when listening to stimuli that have a natural phonological process (e.g., Japanese high-vowel devoicing) at the expense of acoustic information, rather than stimuli with unnatural phonology (non-devoiced high vowels in a context where they should be devoiced) (Ogasawara and Warner, 2009, Language and Cognitive Processes). However, those results were for native listeners listening to native speakers. In the case of non-native speakers, it is unclear how pronunciation errors (both phonetic and phonological) produced in the same natural/unnatural contexts would influence RTs. We tested 30 listeners, using 3 speakers (1 native and 2 non-native of high and low proficiencies) with both natural and unnatural tokens. A linear mixed effects analysis showed that overall, RTs were faster for natural versus unnatural stimuli, but this was not true for the non-native speaker subset of data. In the native speaker data subset, RTs were faster for low-high pitch-accented words. In the non-native speaker data subset, there was a 2-way interaction between naturalness and pitch accent errors: tokens that were both unnatural (i.e., non-devoiced high vowels) and produced with incorrect pitch accent had significantly slower RTs.

    [11]    Jeremy Perkins, Seunghun J. Lee, and Julián Villegas. OCP effects in suffixes with Burmese creaky tone. In Proc. 25 Annual Meeting Southeast Asian Linguistic Soc., Chiang Mai, Thailand, May 2015.

    Background: This study explores OCP effects involving creaky tone in Burmese. While cases of tonal OCP among adjacent tones involving F0 changes are well-documented, there are fewer cases where it occurs with phonation. Phonologically, since tone can include both F0 and phonation, it is expected that OCP effects should exist for phonation as well. There are four Burmese tones, which are differentiated based on a combination of F0 and phonation (Bradley, 1982; Gruber, 2011), providing a test case for this question. Method: Eight native Burmese speakers were recorded. Stimuli contained low (L) and creaky (C) tone, and were placed in sentences with a root followed by a suffix. Four C tone roots and four L tone roots (two verbs and two nouns each) were combined with two C tone and two L tone suffixes, yielding 32 combinations. There were four stimuli with no prefix, resulting in 36 stimuli. Four repetitions were made, yielding a total of 144 sentences per speaker. C and L tone stimulus vowels were analyzed for F0 and creakiness. F0 analysis was done via Praat; creakiness analysis was done via a Matlab algorithm that uses a combination of acoustic features (Drugman et al., 2014). The output of the algorithm is a probability of creakiness. Results: Creaky tone suffixes are subject to OCP when following creaky tone roots (“Ccsuffix” condition in Figures 1 and 2 below). F0 is lowered in suffixes in this case, but with increased creakiness also observed. These facts are explained if the OCP applies to F0 in creaky suffixes. However, the contrast is maintained via the increased creakiness. Figures 1 and 2 below show 95respectively using a cubic smoothing-spline anova model (Gu, 2014). In the figure legends, the tone of the preceding root is the initial capital letter, with the suffix tone following it in lower case.

    [12]    Julián Villegas. Visualization of Fukushima Wheel radiation data using R. In Proc. of ISSM, the 14 Int. Symp. on Spatial Media, Aizu Wakamatsu, Japan, Feb. 2014.

    In this talk, I will discuss some issues of analyzing huge amounts of data captured with sensors such as those deployed in the Fukushima Wheel project (FWP). Besides visualization, which is a complex task on its own, finding trends in data coming from unreliable sensors is difficult to manage with traditional tools and methods. To illustrate these issues, a correlation analysis between data of two environment indicators (temperature and ionizing radiation) retrieved from the FWP database will be conducted in R. Results of such analysis suggest a negative correlation between temperature and radiation (R=-.265, p<.001), i.e., a significant increase in radiation levels is found for lower temperatures. However, it is well known that temperature has no effect on ionizing radiation.
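
    As an illustration of the kind of correlation analysis mentioned above, the following is a minimal sketch of a Pearson correlation coefficient. It is written in Python rather than R, it is not the author's analysis, and the function name `pearson_r` and the sample readings are made up for this example; they are not taken from the FWP database.

```python
import math


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical readings for illustration only (NOT data from the FWP database).
    temperature_c = [2.0, 5.5, 9.0, 14.0, 21.5, 27.0]
    radiation = [0.21, 0.19, 0.22, 0.18, 0.17, 0.18]
    print(f"r = {pearson_r(temperature_c, radiation):.3f}")
```

    A coefficient computed this way only measures linear association in the recorded values; as the abstract points out, a significant value obtained from unreliable sensors does not imply a physical effect.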

    [13]    Julián Villegas, William L. Martens, Michael Cohen, and Ian Wilson. Spatial separation decreases psychoacoustic roughness of high-frequency tones. In J. Acoust. Soc. Am., volume 134, page 4228, Dec. 2013.

    Perceived roughness reports were collected for pairings of sinusoidal tones presented either over loudspeakers or headphones such that the sounds were collocated or spatially separated 90 degrees in front of the listener (+/- 45 degrees). In the loudspeaker experiment, pairs of sinusoids were centered at 0.3, 1.0, and 3.3 kHz, and separated by half a critical band. In the headphone experiment, the pairs of sinusoids were centered at 0.5, 1.0, and 2.0 kHz, and separated by a semitone. Although not all listeners’ reports showed the influence of spatial separation as clearly as others, analysis indicates that listeners generally found spatially separated tone combinations less rough when the frequencies of those tones were centered at 2.0 kHz or higher. This trend was also observed in a follow-up study with 20-component complex tones at fundamental frequencies of C2, C3, A4, and C4 (131, 262, 440, and 523 Hz, respectively) presented via headphones. These results suggest that spatial separation decreases perceived roughness, especially for tones with frequencies higher than the threshold at which interaural time differences rival interaural level differences for sound localization (approximately 2.3 kHz) and that the current roughness models need to be reviewed to include binaural effects.
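
    To make the headphone stimuli concrete, here is a minimal sketch of how a pair of sinusoid frequencies one semitone apart could be derived from a center frequency. It assumes an equal-tempered semitone (ratio 2^(1/12)) and a geometric center; the abstract does not state which centering was used, and the name `semitone_pair` is hypothetical.

```python
import math

# Equal-tempered semitone ratio (assumption: the abstract does not name a tuning).
SEMITONE_RATIO = 2 ** (1 / 12)


def semitone_pair(center_hz):
    """Return (low, high) frequencies one semitone apart, geometrically centered."""
    if center_hz <= 0:
        raise ValueError("center frequency must be positive")
    half_step = math.sqrt(SEMITONE_RATIO)  # a quarter tone on either side of the center
    return center_hz / half_step, center_hz * half_step


if __name__ == "__main__":
    # Center frequencies of the headphone experiment, in Hz.
    for center in (500.0, 1000.0, 2000.0):
        low, high = semitone_pair(center)
        print(f"{center:7.1f} Hz -> {low:8.2f} Hz and {high:8.2f} Hz")
```

    With an arithmetic center instead, the two frequencies would be placed symmetrically in Hz rather than in ratio; the difference is small for an interval as narrow as a semitone.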

    [14]    Ian Wilson, Julián Villegas, and Terumasa Doi. Lateral tongue bracing in Japanese and English. In Proc. of Ultrafest VI: the sixth Ultrafest meeting of researchers working with Ultrasound imaging technology for linguistic analysis, Nov. 2013.

    Coronal ultrasound imaging was used to compare the degree of lateral tongue bracing that occurs in English with that occurring in Japanese. The speech of Japanese speakers of English as a second language was examined to test the hypothesis that those who brace more (as is thought to be normal for English native speakers) have pronunciation that is perceived to be closer to native-like.

    [15]    Jorge Gonzalez Alonso, Maria del Pilar Garcia Mayo, and Julián Villegas. L3 morpho-lexical processing: Effects of bilinguals’ language dominance. In Proc. GASLA 12: The 12th Int. Conf. on Generative Approaches to Second Language Acquisition, Florida, Apr. 2013.

    We studied the effects of bilinguals’ language dominance in word-formation processes. Specifically, we analyzed the case of noun-compounds in terms of accuracy and response time for two trilingual groups: Spanish-Euskera-English, and Euskera-Spanish-English (in order of dominance). Results have largely matched our predictions: no significant effect of the participants’ linguistic profile was found on their accuracy rates (F(2) = 0.098, p = .906), a factor which was however significantly influential on their response latencies to the critical conditions (F(2) = 31.334, p < .001).

    [16]    Julián Villegas and Martin Cooke. Speech modifications induced by alternating noise bands. In Proc. SPiN–2013: The 5 Int. Wkshp. on Speech in Noise: Intelligibility and Quality, Vitoria, Spain, Jan 2013.

    We analyze several acoustic features from recordings of conversations between pairs solving Sudoku puzzles in the presence of noise bands with different center frequencies. For most of the studied features, significant differences were found for different center frequencies, but rather than circumvent the maskers, these changes suggest that speakers were trying to protect spectral regions important for speech.

    [17]    Jorge Gonzalez Alonso, Maria del Pilar Garcia Mayo, and Julián Villegas. Processing of English compounds by Basque-Spanish bilinguals: The role of dominance. In Hispanic Linguistics Symp., Gainesville, Florida (USA), Oct. 2012.

    Word-formation processes vary greatly among languages, although those which are typologically close tend to cluster around particular configurations which may or may not differ from those of other linguistic families. The case of compound words in Romance and Germanic languages has received a considerable amount of attention from both theoretical linguists (Contreras, 1985; Yoon, 2009) and acquisitionists (Liceras & Díaz, 2000; Slabakova, 2002; García Mayo, 2006), with the second focusing more on the interplay between two or more systems in a multilingual setting. The case of deverbal N+N compounds (e.g. can opener) in English as compared to their V+N Spanish semantic equivalents (e.g. abrelatas ‘can opener’, lit. ‘opens-cans’) is particularly interesting. What seems apparent is that Spanish and English do not lexicalise verb-noun relationships in the same way. Basque, on the other hand, does seem to have direct parallels with English: Basque deverbal compounds are also right-headed N+N constructions, in which the deverbal head has been nominalised through affixation (e.g. kontu kontalaria, lit. ‘story teller’). In light of this, are there any facilitatory effects in processing for those bilinguals whose L1 is similar to the L3 in the formation of deverbal compounds? We carried out an experiment in which we controlled for both language profile and proficiency. Sixty-six participants belonging to one of three language groups (L1-Spanish monolinguals, L1Basque-L2Spanish bilinguals and L1Spanish-L2Basque bilinguals) were assigned to one of three levels of proficiency in English (high, medium or low) based on their scores on the standardised Oxford Placement Test, and further tested in a lexical decision task, where they were asked to respond whether the items appearing on screen were actual English words. For the critical condition, 42 high-frequency English compounds and 42 pseudo-compounds (non-words) were used. The design was completed with 168 fillers: 84 non-compound words and 84 non-words. We predicted practically equal accuracy rates for all groups at comparable levels of proficiency, since the effect is not expected to override lexical knowledge; a faster performance of the monolingual group, due to an attested higher processing cost in bilinguals (Costa, 2005); and shorter response latencies for the Basque-dominant bilinguals as opposed to their Spanish-dominant counterparts, since the critical structure is hypothesised to be more readily available for the former group. Results have largely matched our predictions: two-way ANOVAs performed on the data indicated no significant effect of the participants’ linguistic profile on their accuracy rates, a factor which was however significantly influential when it came to their response latencies to the critical conditions. That is, while all participants, irrespective of language group, performed equally well when compared to their proficiency-matched counterparts, Basque-dominant bilinguals were significantly faster at processing English deverbal compounds than their Spanish-dominant peers. These results will be considered in light of models of L3 transfer, for which they might have important implications.

    [18]    Julián Villegas, Martin Cooke, and Catherine Mayo. The influence of temporal and spectral modifications on the intelligibility of normal and Lombard speech. In Proc. SPiN–2012: The 4 Int. Wkshp. on Speech in Noise: Intelligibility and Quality, Cardiff, UK, Jan 2012.

    The current study independently manipulated spectral and durational parameters of ‘normal’ and Lombard utterances. For each parameter, normal speech was modified to take on the values observed in Lombard speech, while Lombard speech was modified to match those values found in normal speech. Modifications were applied globally or instantaneously. Durational modifications had no effect on intelligibility, while spectral changes led to large gains. Global modifications produced larger effects than instantaneous modifications. These findings suggest that most of the intelligibility benefit of Lombard speech is due to the release from energetic masking resulting from spectral changes. However, Lombard speech retains some residual intelligibility benefit. The current study demonstrates that this residual gain is unlikely to be due to the slower speaking rate observed in Lombard speech.

    [19]    Martin Cooke, Vincent Aubanel, Julián Villegas, and Maria Luisa Garcia Lecumberri. Lombard, interactional and overlap effects while conversing in the presence of competing speech. In Proc. of Int. Wkshp. “Computational Audition”, Delmenhorst, October 2011.

    One aspect of the cocktail party problem which has hitherto received little attention is the role played by the interlocutors themselves in facilitating comprehension during conversations which take place in noise. Studying speech produced in noise (Lombard speech) is not new, but less is known about a talker’s response to ‘noise’ consisting of competing speech, especially in real conversations, where modifications to both low-level acoustic parameters as well as higher-level interactional aspects can be expected. Understanding a talker’s response to adverse conditions during spoken communication might lead to improvements in speech output technologies (for instance, rendering equal intelligibility at lower presentation levels, or more appropriate timing of interventions in dialogue systems). Here, we present results from two studies involving speech produced in the presence of intelligible speech.

    [20]    Julián Villegas, Vincent Aubanel, and Martin Cooke. Temporal changes in conversational interactions induced by the presence of a simultaneous conversation. In Proc. ESCOP, the 17th Meeting of the European Soc. for Cognitive Psychology, Donostia – San Sebastián, Spain, Sep. 2011.

    This study aims to better understand the changes in foreground conversations induced by background conversations, particularly modifications in the temporal domain including overlaps between foreground and background speech. Understanding the strategies that humans adopt to orally communicate with a peer in the presence of competing dialogs could give some useful insights for developing improved human–computer interfaces, delivering aural information more effectively, etc. In comparison to the acoustic effects of a background dialog in a conversation, our knowledge of the interactional effects of background conversations is rather limited. In experiments involving simultaneous conversations, we have found intensity and fundamental frequency increments, speech rate decrements, and other changes associated with the Lombard effect in speech produced in the presence of competing talkers. Interactional effects such as a greater number of interruptions and dysfluencies, and less accurate turn taking were also seen. Unlike previous studies, we observed no reduction in overlap between foreground and background speech. We hypothesise that this unexpected result could be explained by visual cues used by the subjects during the conversation, methodological differences (i.e., as opposed to free conversations, previous reports focused on task-oriented experiments), or stimuli differences (a single competing talker instead of a spontaneous talking pair).

    [21]    Martin Cooke, Vincent Aubanel, and Julián Villegas. Speaking in the Presence of Background Speech. In Proc. Int. Wkshp. Speech in Noise: Intelligibility and Quality, Lyon, France, Jan 2011.

    [22]    Julián Villegas and Michael Cohen. Mapping Topological Representations of Musical Scales Onto Virtual 3D Spaces. In Proc. IWPASH: Int. Wkshp. on the Principles and Applications of Spatial Hearing, Zao, Japan, Nov 2009.

    We have developed a Collaborative Virtual Environment (CVE) client that allows directionalization of audio streams using a Head-Related Transfer Function (HRTF) filter. The CVE is a suite of multimedia and multimodal clients, authored mostly in Java by members of our laboratory. Its simple but robust synchronization mechanism is used for sharing information regarding location and position of virtual objects among multiple applications. The new client has been deployed in conjunction with the Helical Keyboard, an interactive installation displaying three-dimensional musical scales aurally and visually, to offer a more realistic user experience and musical immersion. It allows spatialization of audio sources with elevation angles between −40° and +90° and azimuth angles between 0° and 355°. In this fashion we could overcome previous limitations on the auditory display of our objects, for which we heretofore usually displayed only azimuth.

    [23]    Enrique López de Lara, Jerold A. DeHart, Julián Villegas, and Subhash Bhalla. RssKanji: Japanese Vocabulary Learning using RSS Feeds and Cellular Phones. In Proc. of JALT2006: 32nd Annual International Conference on Language Teaching and Learning & Educational Materials Exposition, Kitakyushu, Fukuoka, Japan, 2006.

    The acquisition of vocabulary plays a central role in the learning of a new language. For learners of Japanese, the task involves memorizing thousands of new words and three main aspects of each word: the readings, the pictograms used, and the meaning of the word in the learner’s native language. We developed a prototype of a system to improve retention of new words. The system incorporates vocabulary learning techniques such as visual mnemonics, repeated exposure to words, short quizzes followed by verification of accuracy, and grouping of words by user-generated tags. Our prototype differs from existing vocabulary learning systems in that it uses RSS feeds to provide ubiquitous access to the words being learned. The system provides a Web interface to a native XML database where users can store new words. Using this database, the system generates word lists in RSS format at regular intervals from randomly selected words. A user can subscribe to his or her own feed or to the feeds of other users, allowing for collaborative vocabulary learning. If the user has not yet memorized one of the words in the list, he or she can click on it to take a short quiz about the correct readings, meaning, or pictograms for the word. The RSS feeds can be accessed from a PC inside the classroom or from a mobile phone outside the classroom.

    Invited talks

    [1]    Julián Villegas. Measuring acoustic features from audio and EGG recordings. In Proc. Int. Electroglottography Workshop, International Christian University (Tokyo), Oct 2016.

    [2]    Julián Villegas. Parallels between musical consonance and speech intelligibility. In Summer school of linguistics, Dačice, Czech Republic, Aug. 2012.

    In this talk, the origins of speech intelligibility and musical consonance are discussed. Physical, perceptual, and cognitive causes of these complex phenomena have been identified and the understanding of their interactions is still a very active topic of research. We will focus on acoustic and psycho-acoustic features of speech and music and present the effect of them on intelligibility and consonance. Particularly, the role of fundamental frequency in both phenomena will be discussed: we will show the effect of scaling the fundamental frequency of voiced regions on objective speech intelligibility. The frequency scaling is driven by maximizing the glimpse proportion, inspired by musical consonance maximization techniques.

    [3]    Julián Villegas. The role of speech rate on speech intelligibility. In Summer school on linguistics, Dačice, Czech Republic, Aug. 2012.

    Speech rate has been identified as one of the main differences between speaking styles. Clear speech (or the speech produced by someone who has been asked to speak clearly) and Lombard speech (or speech produced in noisy environments) are more intelligible than other styles (like casual and “normal” speech), and also have a slower speech rate. In this talk, we will discuss the role of speech rate on intelligibility and show how to artificially modify speech rate either by stretching or compressing an utterance or by time-aligning one to another. Rather than discussing the inner mechanisms of the signal processing, we will focus on understanding the differences between the two modalities and how to use existing software applications to modify duration.

    [4]    Julián Villegas. Acoustic modifications of speech and intelligibility in noise. In Proc. ISSM’11–’12: Int. Symp. on Spatial Media, Aizu-Wakamatsu, Japan, Mar. 2012.

    [5]    Julián Villegas. Listen to what I say: environment-aware speech production. In Japan-Woche der FH Düsseldorf Wkshp. on Mixed Reality and Virtual Environments, Düsseldorf, May 2011.

    Major challenges to adapt all forms of speech output to a given auditory context (e.g., noisy or highly reverberant environments, second language or hearing-impaired listeners, etc.) based on human speaker strategies are discussed. Ongoing research aimed at increasing speech intelligibility in real-time without compromising speech quality (or fatiguing the listener) is described, and software applications used in this research are presented. This talk will also present auditory demonstrations of natural and artificial speech modifications.

    [6]    Julián Villegas. Unconventional 3D-sound Controllers. In Michael Cohen, editor, Proc. ISSM’10–’11: Int. Symp. on Spatial Media, Aizu Wakamatsu, Japan, March 2011.

    Wireless technologies allow the introduction of sensors in otherwise unanticipated devices, increasing the opportunities of interaction with real-world, daily-life things. In this paper, the idea of controlling three-dimensional audio by means of flying discs is presented. The prototype implementation (based on gyroscopes, XBee radios, Arduino micro-controllers, Pure Data and Quartz Composer patches) helps to understand the challenges, capabilities, and limitations of the underlying technologies that make such interactions possible.

    [7]    Michael Cohen and Julián Villegas. Spatial Sound and Entertainment Computing. In ICEC: Int. Conf. on Entertainment Computing, Seoul, September 2010.

    This tutorial introduces the theory and practice of spatial sound for entertainment computing, including the psychophysical (psychoacoustic) basis of spatial hearing; outlines the mechanisms for creating and displaying spatial sound, the hardware and software used to realize such systems, and display configurations; and reviews some applications of spatial sound to entertainment computing, especially multimodal interfaces featuring spatial sound. Many case studies reify the explanations; animations, videos, and live demonstrations are featured.

    Supervised graduate research

    [1]    Taku Nagasaka. Elevation of sound by spectral energy equalization and delay adjustments using single-layer loudspeaker arrays. Master’s thesis, University of Aizu, Mar 2015. Supervisor: Julián Villegas.

    [2]    Shunsuke Nogami. Lateralization of sound by spectral energy equalization and delay adjustments using single-layer loudspeaker arrays. Master’s thesis, University of Aizu, Mar 2015. Supervisor: Julián Villegas.

    [3]    Tomomi Sugasawa. Relative influence of spectral bands in horizontal-front localization of white noise. Master’s thesis, University of Aizu, Mar 2014. Supervisor: Julián Villegas.

    Supervised student papers

    [1]    Taku Nagasaka, Shunsuke Nogami, Julián Villegas, and Jie Huang. Influence of spectral energy distribution on elevation judgments. In Proc. 139 Audio Eng. Soc. Int. Conv., New York, Oct. 2015.

    The relative influence of spectral cues on elevation localization was investigated by comparing judgements of loudspeaker-reproduced stimuli spatialized with three methods: 3D vector-based amplitude panning (3D-VBAP), 2D-VBAP in conjunction with HRIR convolution, and equalizing the stimuli to simulate spectral peaks and notches naturally occurring at different angles (equalizing filters). For the last two methods a single horizontal loudspeaker array was used. As expected, the smallest absolute errors were observed in the VBAP judgements regardless of presentation azimuth; no significant difference in the mean absolute error was found between the other two methods. However, for most presentation azimuths, the method based on equalizing filters yielded less dispersed results. These results could be used for improving elevation localization in two-dimensional VBAP reproduction systems.

    [2]    Shunsuke Nogami, Taku Nagasaka, Julián Villegas, and Jie Huang. Influence of spectral energy distribution on subjective azimuth judgements. In Proc. 139 Audio Eng. Soc. Int. Conv., New York, Oct. 2015.

    In this research, we compare subjective judgements of azimuth obtained by three methods: Vector-Based Amplitude Panning (VBAP), VBAP mixed with binaural rendition over loudspeakers (VBAP+HRTF), and a newly proposed method based on equalizing spectral energy. In our results, significantly smaller errors were found for the stimuli treated with VBAP and HRTFs; differences between the other two treatments were not significant. Regarding spherical dispersion of the judgements, VBAP results had the greatest dispersion, whereas the dispersions of the results of the other two methods were significantly smaller and similar to each other. These results suggest that horizontal localization using VBAP methods can be improved by applying a frequency-dependent panning factor as opposed to the constant scalar commonly used.

    [3]    Bektur Ryskeldiev, Michael Cohen, and Julián Villegas. Rendering spatial audio through dynamically reconfigurable smartphone loudspeaker arrays. In Proc. Int. Conf. on Virtual Reality Continuum and Its Applications in Industry, Kobe, October 2015.

    Spatial audio for multiple listeners can be rendered through an array of loudspeakers. However, changing loudspeaker locations during a listening session is cumbersome, since circumstances depend on various physical conditions, such as the size of a listening room, number of participants, or locations of power sources. This study investigates alternatives for reconfigurable scenarios through wireless audio streaming via smartphones, as well as indoor positioning techniques for creation of robust auditory imagery.

    [4]    Ryo Igarashi and Julián Villegas. Steganography using audible signals for short distance communication. In Proc. Acoust. Soc. Japan, Autumn meeting, Aizu Wakamatsu, Japan, Sep. 2015. In Japanese.

    [5]    Yu Ito and Julián Villegas. Bass enhancement by actuator-enabled vest. In Proc. Acoust. Soc. Japan, Autumn meeting, Aizu Wakamatsu, Japan, Sep. 2015. In Japanese.

    [6]    Taku Nagasaka, Julián Villegas, and Jie Huang. Novel GUI system for recording subjective responses in hearing experiment. In Proc. Acoust. Soc. Japan, Autumn meeting, Aizu Wakamatsu, Japan, Sep. 2015. In Japanese.

    [7]    Shunsuke Nogami, Taku Nagasaka, Julián Villegas, and Jie Huang. Improvement of azimuth perception in single-layer speaker array systems. In Proc. Acoust. Soc. Japan, Autumn meeting, Aizu Wakamatsu, Japan, Sep. 2015. In Japanese.

    [8]    Tsubasa Takahashi and Julián Villegas. Study on bimodal navigation systems. In Proc. Acoust. Soc. Japan, Autumn meeting, Aizu Wakamatsu, Japan, Sep. 2015. In Japanese.

    [9]    Tomomi Sugasawa, Jie Huang, and Julián Villegas. Relative influence of spectral bands in horizontal-front localization of white noise. In Proc. 137 Audio Eng. Soc. Conv., Oct. 2014.

    The relationship between horizontal-front localization and energy in different spectral bands is investigated in this research. Specifically, we tried to identify which spectral regions produced changes in the judgments of the position of a white noise when each band was removed from a front loudspeaker and presented via side loudspeakers. These loudspeakers were set at left and right from the front-midsagittal plane of the listener. Participants were asked to assess whether the noise was coming from the front loudspeaker as bands were moved from front to side loudspeakers. Results from a pilot study suggested differences in the relative importance of spectral bands for horizontal-front localization.

    [10]    Wataru Sanuki, Julián Villegas, and Michael Cohen. Spatial sound for mobile navigation systems. In Proc. 136 Audio Eng. Soc. Conv., 2014.

    [11]    Tetunobu Ohashi, Julián Villegas, and Michael Cohen. Controlling tempo in real-time with mobile devices. In Proc. Tohoku Section Joint Conv. of Institutes of Electrical and Information Engineers, page 85, Aug. 2013.

    In this research, we explored the possibility of controlling the reproduction speed of a digitized melody by means of portable devices such as smartphones.

    [12]    Bektur Ryskeldiev, Julián Villegas, and Michael Cohen. Exploring virtual sound environments with mobile devices. In Proc. of Tohoku Section Joint Conv. of Institutes of Electrical and Information Engineers, page 18, Aug. 2013. Best Paper Prize, Student Section.

    The aim of this research is to explore virtual sound environments with mobile devices, using iOS as the main platform and Pure Data (Pd) as a backend for sound processing. The underlying calculations are based on human’s natural and linear interpolation between virtual sound sources. As a result, the developed application allows users to “walk around” virtual concerts, as well as to experiment with the positions of sound sources by moving them manually through the GUI.

    [13]    Shogo Saze, Julián Villegas, and Michael Cohen. Map- and photo-enabled navigation assistance in a driver simulator. In Proc. of Tohoku Section Joint Conv. of Institutes of Electrical and Information Engineers, page 279, Aug. 2013.

    In this research, we explored the feasibility of using driving simulators for more intuitive navigation through maps and Google Street View.

    Original music

    [1]    Julián Villegas. Original Music for “El Proyecto del Diablo”. tv Broadcasted by Rostros y Rastros (uvtv), 1999. Documentary directed by Óscar Campo (25 Min.). www.imdb.com/title/tt0483127.

    “All roads lead me to hell. But I am hell!” is the line that opens the monologue narrated throughout the documentary by La Larva, of whom we do not know whether he is alive or dead. In El Proyecto del Diablo we learn the story and the most revealing thoughts of this man: from the time he first tried marijuana at school, through his experiences in student riots at university, to his moments of greatest ecstasy on drugs and his twelve-year stay in prison. We thus discover the mindset of someone who let himself be trapped and carried away by evil.

    [2]    Julián Villegas. Original Music for “Aquí No Canta Nadie”. tv Broadcasted by Rostros y Rastros (uvtv), 1998. Short Film directed by Pilar Chávez (15 min.).

    This is one of the many stories that may take place in the Colombian countryside. One night, on a farm, three children listen to horror tales told by a woman, who warns them not to leave the house because the devil may lead them to a horrible death. At dawn, one of the children has disappeared; his two brothers search the whole house without finding him anywhere. The eldest brother dares to go out in search of him and manages to return home with him; on this journey, however, their memories were confronted with a harsh truth.

    [3]    Julián Villegas. Original Music for “El Terminal”. tv Broadcasted by Rostros y Rastros (uvtv), 1998. Documentary directed by Margarita Arbeláez, Luz Elena Luna, Ximena Bedoya, Andrea Rosales, Juan Camilo Duque, Claudia Villegas, and María Fernanda Gutiérrez (26 min.).

    This documentary portrays the routines, characters, and events of El Terminal de Transportes de Cali. Time in the documentary passes at the pace of the travelers' waiting and anxiety: people come and go; some say goodbye and leave; others arrive in an unfamiliar place. The day goes by and night falls, marking the different rhythms of life in El Terminal.

    [4]    Julián Villegas. Original Music for “El Ojo de Buziraco”. tv Broadcasted by Rostros y Rastros (uvtv), 1997. Four Short Films directed by David Bohórquez (26,25,26,24 min.).

    A four-part series, funded by a grant from the Ministry of Culture of Colombia, about the myths and legends of the Colombian Pacific coast, which draws a thin line between fact and fiction.

    El ojo de Buziraco I: Nadie vio nada. El Ojo de Buziraco is a four-chapter series that revisits some Colombian myths (kept alive through oral tradition) in urban settings. In the first chapter, titled Nadie vio nada, two middle-aged men reconstruct, through their testimony, the story they heard from their forebears about the myth of the Mohan, monster of the rivers; in parallel, a fictional story unfolds in which a criminal gang in the city is stalked and attacked by this character.

    El ojo de Buziraco II: Tente en el aire. In the second chapter of the series, the story revolves around a group of university students of audiovisual media who inquire into the role of “scary stories” in the upbringing of three elderly people, to whom those stories were told by their parents and grandparents. The girlfriend of one of the young men has recently died, and as the interview testimonies describe the myth of La Tunda, he begins to experience encounters with the deceased: he sees her in recurring nightmares, on the screens of the editing monitors, and through the camera; in the end, the ghostly presence appears and makes him one more victim of La Tunda.

    El ojo de Buziraco III: La guerra de Mandrágora. La guerra de Mandrágora is the third chapter of the series, in which the interviewees maintain that witches do exist. In the staged story, a woman turns to a witch for a solution to the nightmares tormenting her husband; later, lonely and suspecting that her husband is cheating on her, the woman returns to the witch, who has decided that she must be her successor. Once the initiation rites are over, the devil appears and punishes the witch, and the story has an unexpected ending.

    El ojo de Buziraco IV: El Vampiríparo. The last chapter of the series takes place in a classroom where the most “dim-witted” student, Medardo, begins to have a series of hallucinations. In moments of greatest pressure and frustration, he sees himself as a nosferatu, always in search of a mentor: a “real” vampire. This obsession leads Medardo into multiple situations in which his safety is put at risk. Meanwhile, the interviewees, who provide the documentary side of the film, maintain in their testimonies that vampires do not exist and recount stories that became urban myths and that possibly inspired the legend of vampirism in the city of Cali.

    [5]    Julián Villegas. Original Music for “Mario y las Voces”. tv Broadcasted by Rostros y Rastros (uvtv), 1997. Short Film directed by Carlos Espinosa (15 min.).

    A futuristic tale in which machines exert full control over humans. This repressive techno-society, however, hides a truth that is revealed to Mario from a kind of underworld inhabited by those who have chosen to resist: behind the power of the machines lies the power of a few men.

    [6]    Julián Villegas. Original Music for “No, no…baby”. tv Broadcasted by Rostros y Rastros (uvtv), 1997. Short Film directed by Diego Pérez (13 min.).

    No, no…baby is a story that combines elements of drama and suspense around the strange, violent, and perverse relationships woven within a group of friends passionately devoted to the consumption of flesh. Their excesses unleash their cannibalistic instincts as romantic conflicts arise among them. The story's outcome is taken to the extreme when we see the group end up devouring one another while watching Stanley Kubrick films.


    [1]    Julián Villegas. Creakiness by roughness, 2017. Retrieved March 31, 2017. Available from http://onkyo.u-aizu.ac.jp/index.php/software/creakbyr/.

    [2]    Julián Villegas. The 60th anniversary of Tohoku chapter of the Acoustical Society of Japan, chapter Computer Arts Lab at the University of Aizu. Acoustical Society of Japan, 2016.

    [3]    Julián Villegas. Beating and Roughness. The Wolfram Demonstrations Project, September 2010. [Online; accessed 23-Sep-2010]: http://demonstrations.wolfram.com/BeatingAndRoughness.

    A demonstration of beating sinusoids showing fluctuation strength, roughness, and tone separation.
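
    As a rough illustration of the phenomenon this demonstration covers (and not a reproduction of the Wolfram demonstration itself), the sketch below sums two sinusoids of nearby frequencies; the envelope of the sum fluctuates at the difference of the two frequencies. The function names and the default sample rate are assumptions chosen for the example.

```python
import math

def beat_frequency(f1, f2):
    """Rate (Hz) at which the envelope of two summed sinusoids fluctuates."""
    return abs(f1 - f2)

def beating_signal(f1, f2, duration=1.0, sample_rate=8000):
    """Samples of sin(2*pi*f1*t) + sin(2*pi*f2*t) over the given duration.
    The default sample rate is an arbitrary illustrative choice."""
    count = int(duration * sample_rate)
    return [
        math.sin(2 * math.pi * f1 * n / sample_rate)
        + math.sin(2 * math.pi * f2 * n / sample_rate)
        for n in range(count)
    ]

if __name__ == "__main__":
    signal = beating_signal(440.0, 444.0)
    print(beat_frequency(440.0, 444.0), len(signal))
```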

    [4]    Julián Villegas. GoldenM. 360 degrees of 60 × 60 event at icmc 2010: The Int. Computer Music Conf., June 2010.

    GoldenM is a computer-generated composition created in Pd. It is an arrangement for three voices and filtered white noise. In GoldenM, each voice has a spectrum based on the golden ratio (about 1.61803), and the pitch set was selected using the minima of the dissonance function as proposed by Vassilakis. Rhythm and spatialization are generated using Markov chains. The purpose of the composition is to use the golden ratio in unnatural ways while preserving, to some extent, its aesthetic nuance. The result, at moderate volume, resembles (at least to the author) the sound of chimes and bells used in Asian musical traditions.
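
    The two ingredients named above can be sketched as follows. This is not the Pd patch of the composition: the spectral construction (partial k placed at the fundamental times the golden ratio raised to k) and the transition table over note durations are hypothetical choices made only to illustrate the idea.

```python
import random

PHI = (1 + 5 ** 0.5) / 2  # golden ratio, about 1.61803

def golden_partials(f0, count):
    """Assumed spectrum: partial k sits at f0 * PHI**k.
    The source does not state how the spectrum is actually built."""
    return [f0 * PHI ** k for k in range(count)]

def markov_sequence(transitions, start, length, rng):
    """Walk a first-order Markov chain given {state: {next_state: probability}}."""
    state, out = start, [start]
    for _ in range(length - 1):
        choices = list(transitions[state])
        weights = [transitions[state][c] for c in choices]
        state = rng.choices(choices, weights=weights, k=1)[0]
        out.append(state)
    return out

# Hypothetical transition table over note durations (in beats).
DURATIONS = {
    0.5: {0.5: 0.6, 1.0: 0.4},
    1.0: {0.5: 0.3, 1.0: 0.5, 2.0: 0.2},
    2.0: {1.0: 1.0},
}

if __name__ == "__main__":
    print(golden_partials(220.0, 4))
    print(markov_sequence(DURATIONS, 1.0, 8, random.Random(0)))
```

    The same chain-walking function could drive spatial positions instead of durations by swapping in a transition table over loudspeaker or azimuth states.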

    [5]    Julián Villegas. Psychoacoustic Roughness Applications in Music: On Automatic Retuning and Binaural Perception. PhD thesis, University of Aizu, Aizu-Wakamatsu, Japan, March 2010.

    The goal of this study is to help understand the influence of psychoacoustic roughness in music. Roughness is an auditory attribute produced by rapid temporal envelope fluctuations (normally resulting from wave interferences), and it has been related to musical dissonance. After reviewing the main theories that explain the origin of roughness, a software program created for the purpose of this research is presented. This software application, based on a spectral model to predict roughness (a physical predictor of the auditory attribute), is able to control (usually, to reduce) the predicted roughness of a sound ensemble in realtime. Experimental results were analyzed with a standard measurement software tool to corroborate the predicted roughness reduction. The audio output of this software application was compared by human subjects with renditions of the same musical content using some well-known tuning systems (twelve-tone equal temperament and just tuning). The results of these subjective experiments are presented and analyzed. Preliminary results on binaural roughness perception are presented at the end of the dissertation as a new direction of research. Contributions of the present work include the creation of an adaptive tuning program capable of retuning audio streams in realtime to minimize the measured roughness due to the interaction between sounds (extrinsic roughness). To the best of our knowledge, this procedure had been applied only to midi sequences for which realtime constraints implied oversimplifications that are not assumed in our program. We were able to determine that roughness by itself can explain musical preference among musically naïve participants. In the analysis of that experiment, we found that, contrary to popular belief, predicted roughness of 12tet intervals is not always greater than that of pure intervals. This discovery correlates with preference choices reported by participants. We also show that current roughness models need to be revised to include the effect of binaural cues. Other minor contributions include several entries in Wikipedia (e.g., Vicentino’s keyboard layout) and to Mutopia (e.g., Bach chorale BWV 264).

    [6]    Julián Villegas. Encuentros entre Colombia y Japón: homenaje a 100 años de amistad, chapter De como el mundo es un pañuelo y de las misteriosas maneras (Of how the world is a handkerchief and the mysterious ways). Colombian Ministry of Foreign Affairs, Bogotá D.C., Colombia, 2010. (Fiction, in Spanish).

    [7]    Julián Villegas. Local Consonance Maximization in Realtime. Master’s thesis, University of Aizu, Aizu-Wakamatsu, Japan, September 2006.

    Although the problem of maximizing consonance in tonal music has been addressed before, every solution reflecting the technological advances of its epoch, and considering that current theories to explain this psychoacoustical phenomenon are generally satisfactory, there are still vast unexplored aspects of this area, since even most recent solutions lack adequate mechanisms to apply such techniques in realtime scenarios. In general, the most advanced achievements in this field are based on the midi protocol for controlling the pitch of simultaneous notes, inheriting the protocol limitations in terms of dependency on the quality of the synthesizer for satisfactory results, scalability, accuracy, veracity, etc. Besides that, timbres are generally known a priori for these techniques, so their application to unknown timbres requires digitization and analysis of sound samples, making such techniques unsuitable for realtime situations. This thesis summarizes the main theories about consonance and its relation to musical scales, reviews several previous solutions as well as the state of the art, proposes an alternative model to adaptively adjust consonance in a polyphonic scenario based on the tonotopic dissonance paradigm (presented by Plomp and Levelt, having been previously developed by Sethares), and presents a prototype of this model that aims to surmount the difficulties of prior solutions by performing realtime analysis and pitch adjustment programmed in Pure-data (Pd), a data flow dsp environment for realtime audio applications. The results are analyzed to determine the efficacy and efficiency of the proposed solution.

    [8]    Felipe Millán Constaín, Juan Camilo Paz, Alfredo Roa, Julián Villegas, Nicolás Carranza, Diego Briceño, and Alex Mera. Medición de la Productividad del Valor Agregado (Added Value Productivity Measurement). Servicio Nacional de Aprendizaje (sena), 2nd edition, 2003. In Spanish. [Online; accessed 31-Aug-2008]: http://cnp.org.co.

    [9]    Julián Villegas. Diseño e Implementación de un Algoritmo Genético para la Asignación de Aulas en la Universidad del Valle (Design and Implementation of a Genetic Algorithm for Timetabling at the University of Valle). Undergraduate honors thesis, University of Valle, Cali, 2001. In Spanish.

    The aim of this thesis is to show how the advantages of parallel processing and artificial intelligence techniques, specifically genetic algorithms, can be applied to the classroom assignment problem, initially at the University of Valle, and to compare the efficiency of the current system with the newly proposed scheme. The first chapter presents an introduction to the theory of parallel computing: it shows the different sources of parallelism in a program and the software and hardware architectures available for achieving parallelism, and compares these options against the needs of this thesis. The second chapter gives a general introduction to genetic algorithms: it explains how they work, presents the most common genetic operators and the most widely accepted selection strategies, and shows the relationship between parallel computing and genetic algorithms. Finally, it presents a popular implementation of a parallel genetic algorithm. The third chapter presents the general problem of classroom assignment and timetabling and its main characteristics. The fourth chapter presents the evolution of the classroom assignment and timetabling problem and the state of the art. The fifth chapter analyzes the current process (January to May 2000) of classroom assignment at the University of Valle: the inputs and outputs of the process are determined, the size of the problem is calculated, the requirements of the parties involved are gathered, and the constraints and priorities taken into account in the assignment process are identified. The sixth chapter proposes a solution to the classroom assignment problem based on parallel genetic algorithms: it summarizes the installation process of 6 PVM, the configuration used, and some details that are not so clear in the documentation shipped with the PVM sources; it also shows the installation of SSH and how to use it together with PVM on a network where security is important. The seventh chapter analyzes the results obtained and discusses possible further developments, improvements, and refinements of the proposed genetic algorithm. The eighth chapter appends the C source code of the programs that make up the solution (the code of the master and slave genetic algorithms), as well as the SQL code of the queries made to the database to extract the information needed for the assignment, together with a paper presented at GECCO – 2000 (Genetic and Evolutive Computation Conference – 2000). The ninth chapter presents the general conclusions of this degree project. The tenth chapter lists the bibliographic references used in its development.

    March 31, 2017