SAUND

Simulation of Auditory Near-field Distance

Updates

  • May 2019: Final website update with recent publications.
  • April 2018: Website updated with recent publications.
  • April 2017: Website updated with recent publications.
  • March 2016: Website available.

Introduction

This 3-year project aims to build the next generation of sound spatializers to be used in conjunction with Head-Mounted Displays (HMDs) in Virtual Reality (VR) environments. As a result of this project, a prototype of a real-time virtual sound spatializer that allows distance control for binaural reproduction will be implemented. This proof of concept will let us evaluate the accuracy of spatialization methods, especially in the near-field and close to virtual walls. By demonstrating its feasibility and benefits, we expect the general public to take greater interest in the introduction of spatial sound alongside visual 3D technologies.

Scientific background for the research

This project, SAUND, aims to develop a sound spatializer (a software program) that can be used with Head-Mounted Displays (HMDs). An HMD is a device featuring 3D displays that is usually worn as a helmet or a pair of glasses. The proposed spatializer would be able to simulate not only distance changes of remote sound sources (as current spatializers do) but also those produced in the near proximity of the user. Specifically, we plan to develop a prototype of such a spatializer as depicted in Figure 1.

Conservative estimates [1] put the number of HMD units sold by 2020 at around 25 million. This multi-billion-dollar market would increase the demand for immersive content that is not only visual, but also auditory, tactile, etc. Audition, i.e., the sense of hearing, is arguably the second most important modality of perception.

In contrast with the verisimilitude of the visual images obtained with HMDs, the projection of auditory images still faces many hurdles preventing such realism. For example, binaural recordings (recordings made with two microphones fitted into a surrogate of the listener: another person, a mannequin, etc.) allow impressive auditory illusions that also capture the characteristics of the room where the recording was made. However, situations arise where such recordings become impractical, as in the case of virtual worlds where the interactions of agents (inhabitants, assets, etc.) are difficult to predict.

Although the angle from which a sound is projected is relatively easy to simulate, its auditory distance constitutes an elusive problem. Several techniques have been proposed to solve it: simulating distance changes with monaural intensity, i.e., changing the sound level at both ears by the same amount [2]; manipulating the ratio between the acoustic energy traveling directly from the sound source to the listener’s ears and that arriving via other paths (bouncing off walls, etc.), the direct-to-reverberant energy ratio [3]; computing a Distance Variation Function (DVF) [5], i.e., a function describing changes in the filtering effect of the ears, head, and upper body (Head-Related Impulse Responses, HRIRs) when a sound source moves from far- to near-field; decomposing HRIRs into spherical harmonics, a mathematical way to describe a sound field [4]; interpolating previously captured near-field HRIRs [6]; etc. To complicate matters, when a listener is in the near-field of a source (i.e., at a relatively short distance) or in the proximity of walls, monaural intensity changes are no longer adequate. Furthermore, listeners tend to overestimate the distance of virtual stimuli in the near-field, but the causes of such errors are currently undetermined.
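As a rough illustration of the first and simplest of these cues, the sketch below (in Python; our own example, not part of the SAUND prototype) applies the free-field inverse-distance law, under which the level at both ears drops by about 6 dB per doubling of distance. It is precisely this uniform gain change that becomes inadequate in the near-field, where interaural differences and HRIR filtering also vary with distance.

    import numpy as np

    def inverse_distance_gain(distance_m, ref_distance_m=1.0):
        """Monaural intensity cue: the same gain is applied to both ears,
        following the free-field 1/r law (about -6 dB per doubling)."""
        d = max(distance_m, 0.05)  # clamp to avoid blowing up at the head surface
        return ref_distance_m / d

    # A source approaching the listener from 4 m to 0.25 m:
    for d in (4.0, 2.0, 1.0, 0.5, 0.25):
        g = inverse_distance_gain(d)
        print(f"{d:5.2f} m -> gain {g:6.2f} ({20 * np.log10(g):+6.1f} dB)")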

The purpose of this research is to bridge the gap between the realism of visual and auditory displays in HMDs, especially for near-field sources. Since headphones are the de facto reproduction apparatus for HMDs, this research focuses on the Simulation of AUditory Near-field Distance (SAUND) in headphone (binaural) reproduction.

Expected achievement and results

With this system in place, a typical user would be able to hear multiple sounds coming from arbitrary directions and distances, corresponding to the virtual assets in a scene: e.g., other users, sound effects, etc. A single spatializer block is used for each audio stream belonging to an asset; the location (x, y, z coordinates) of each asset is retrieved from the logic of the VR scene, as is the pose of the user (x, y, z and rotations around these axes). The resulting sound mix is then presented to the user, commonly via headphones.
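A minimal sketch of this per-asset structure follows (in Python; the class names and the placeholder distance-gain processing are our own illustration, since the actual spatializers are implemented in Pure Data, as described below).

    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class ListenerPose:
        position: np.ndarray   # x, y, z in scene coordinates
        rotation: np.ndarray   # rotations around the x, y, z axes (radians)

    @dataclass
    class SpatializerBlock:
        """One spatializer block per audio stream/asset (illustrative only)."""
        source_position: np.ndarray   # x, y, z of the asset

        def process(self, mono: np.ndarray, listener: ListenerPose) -> np.ndarray:
            # Placeholder processing: a real block would select or interpolate
            # near-field HRIRs from the source-listener geometry and convolve
            # them with the input signal.
            distance = np.linalg.norm(self.source_position - listener.position)
            gain = 1.0 / max(float(distance), 0.1)   # naive 1/r attenuation
            return np.stack([mono, mono]) * gain     # duplicate to two ears

    def mix(blocks, streams, listener):
        """Sum the binaural output of every per-asset spatializer block."""
        return sum(b.process(s, listener) for b, s in zip(blocks, streams))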

The creation, deletion, assignment, and management of the spatializer modules are performed by the spatializer manager module. For data and audio communication, routing protocols such as Open Sound Control (OSC) would be used.
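As a sketch of how such routing could look, the snippet below uses the python-osc package; the port and address patterns are our own assumptions, not a namespace defined by the project.

    # pip install python-osc
    from pythonosc.udp_client import SimpleUDPClient

    # Hypothetical endpoint of the spatializer manager:
    client = SimpleUDPClient("127.0.0.1", 9000)

    # The VR scene logic could publish poses and management commands like:
    client.send_message("/saund/listener/position", [0.0, 0.0, 0.0])   # x, y, z in m
    client.send_message("/saund/listener/rotation", [0.0, 1.57, 0.0])  # radians
    client.send_message("/saund/asset/3/position", [1.5, 0.0, -0.4])
    client.send_message("/saund/asset/3/create", "footsteps.wav")      # manager spawns a block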


Originality

Several aspects of SAUND are new, including:

  • The accurate spatialization of sound sources in the near-field (current systems used in HMDs do not deal with this feature correctly).
  • The interaction of SAUND with HMDs, a promising but still mainly visual technology.
  • The use of OSC or similar protocols to compartmentalize the processing load and software development.

Expected results

Besides the dissemination of our results, we will build a sound spatializer prototype to demonstrate the feasibility and benefits of near-field spatialization. This prototype would feature the aforementioned functionalities, but integration with a full-scale VR system is beyond the scope of this project.

Significance of the project

The way we interact with information will change drastically when HMDs, or similar technologies, are widely adopted. Hence, we must prepare audio technologies with capabilities similar to those of their visual counterparts.

SAUND aims to unveil the latent difficulties in achieving this goal and to stress the importance of spatial sound in virtual environments. These are some of SAUND’s significant aspects:

  • VR experience enhancement: An accurate reproduction of a sound field is necessary to enhance immersion, interaction, and presence, three of the most important VR characteristics.
  • Big data exploration: In many cases, sound has proven superior to vision for finding patterns among huge amounts of data. Naturally, spatialization adds another dimension to such exploration capabilities.
  • Baseline for new research: Through the research conducted in this project, we would like to stimulate the scientific discussion on the application of near-field spatial sound in VR.

References

  • [1] BI Intelligence. The virtual reality report: Forecasts, market size, and the trends driving adoption. Technical report, 2015.
  • [2] D. H. Mershon and L. E. King. Intensity and Reverberation as Factors in the Auditory Perception of Egocentric Distance. Perception & Psychophysics, 18(6):409–415, 1975.
  • [3] P. Zahorik, D. S. Brungart, and A. W. Bronkhorst. Auditory Distance Perception in Humans: A Summary of Past and Present Research. Acta Acustica united with Acustica, 91(3):409–420, June 2005.
  • [4] M. Pollow, K.-V. Nguyen, O. Warusfel, T. Carpentier, M. Müller-Trapet, M. Vorländer, and M. Noisternig. Calculation of head-related transfer functions for arbitrary field points using spherical harmonics decomposition. Acta Acustica united with Acustica, 98(1):72–82, 2012.
  • [5] A. Kan, C. Jin, and A. van Schaik. A psychophysical evaluation of near-field head-related transfer functions synthesized using a distance variation function. J. Acoust. Soc. Am., 125(4):2233–2242, 2009.
  • [6] J. Villegas. Locating virtual sound sources at arbitrary distances in real-time binaural reproduction. Virtual Reality, Oct. 2015.

Project management and workload balance

The SAUND project comprises six work packages (WPs), as illustrated in Figure 2. These WPs correspond to different aspects of the project; concretely,

WP1 comprises the subjective evaluation of different methods used for the near-field localization of virtual sources (see the scientific background section above for a short review).

The real-time implementation of the spatializers (in the Pure Data programming language) is grouped in WP2. Tasks related to the development of visual content for HMD visualization (developed in Unity) will be conducted in WP3, while WP4 will cover the integration of the visual and audio parts, as well as its evaluation.

Dissemination and demonstration tasks are covered in WP5, and the administrative tasks that guarantee the normal development of the project (monitoring, coordination, progress meetings, technical progress, objective achievement, financial issues, communication, quality assurance, and punctuality of reports and demonstrations) are contained in WP6.

Work packages 1, 2, 4, and 6 will be led by Assoc. Prof. Julian Villegas, whereas WP3 and WP5 will be led by Senior Assoc. Prof. Jie Huang.

Project team

Julian Villegas: http://onkyo.u-aizu.ac.jp/

Jie Huang: http://web-ext.u-aizu.ac.jp/~j-huang/

Publication list

2016
  • [1] J. Villegas, J. Perkins, and S. J. Lee, “Psychoacoustic roughness as creaky voice predictor,” J. Acoust. Soc. Am., vol. 140, p. 3394, Dec. 2016.
  • [2] J. Villegas and T. Ninagawa, “Pure-data-based transaural filter with range control,” in Proc. 5th Int. Pure Data Convention, Nov. 2016.
  • [3] J. Villegas, T. Stegenborg-Andersen, N. Zacharov, and J. Ramsgaard, “A comparison of stimulus presentation methods for listening tests,” in Proc. 141st Audio Eng. Soc. Int. Conv., Sep. 2016.
  • [4] Y. Nagayama, A. Saji, and J. Huang, “Distance factor for frontal sound localization with side loudspeakers,” in Proc. 141st AES Convention, paper no. 9678, Sep. 2016.
  • [5] R. Kaneta, A. Saji, and J. Huang, “3D sound localization system using two side loudspeaker matrices,” in Proc. 141st AES Convention, EB.280, Sep. 2016.
2017
  • [1] J. Villegas, N. Fukasawa, and Y. Suzuki, “Improving elevation perception in single-layer loudspeaker array display using equalizing filters and lateral grouping,” in Proc. 143rd Audio Eng. Soc. Int. Conv., Oct. 2017.
  • [2] Y. Sato, A. Saji, and J. Huang, “A 3D sound localization system using two side loudspeaker matrices,” in Proc. 143rd Audio Eng. Soc. Int. Conv., Oct. 2017.
  • [3] Y. Ono, A. Saji, and J. Huang, “Frontal sound localization with headphone systems using characteristics of each device,” in Proc. 143rd Audio Eng. Soc. Int. Conv., Oct. 2017.
2018–2019
  • [1] J. Villegas, “Improving perceived elevation accuracy in sound reproduced via a loudspeaker ring by means of equalizing filters and side loudspeaker grouping,” Acoustical Science and Technology, vol. 40, pp. 127–137, Mar. 2019. DOI: 10.1250/ast.40.127.
  • [2] J. Villegas, “Movement perception of Risset tones presented diotically,” Acoustical Science and Technology, 2019 (in press).
  • [3] J. Villegas and N. Fukasawa, “Doppler illusion prevails over Pratt effect in Risset tones,” Perception, vol. 47, no. 12, pp. 1179–1195, 2018. DOI: 10.1177/0301006618807338.
  • [4] J. Villegas, “Association of frequency changes with perceived horizontal and vertical movement,” in Proc. Int. Symp. on Universal Acoustical Communication, Oct. 2018.
  • [5] C. Arevalo, G. Sarria, and J. Villegas, “Accurate spatialization of VR sound sources in the near field,” in Proc. Audio Eng. Soc. Int. Conf. on Spatial Reproduction - Aesthetics and Science, 2018.

Sponsors

Project title: "SAUND: Simulation of auditory near-field distance".
This project is supported by JSPS KAKENHI Grant Number 16K00277.




University of Aizu