
Sound and Audio Processing

IT09 Sound and Audio Processing

    • Course period: First quarter
    • Course year: 4
    • Course type: Elective
    • Credits: 3
    • Coordinator: Julián Villegas
    • Instructors: Julián Villegas and Michael Cohen


The purpose of this course is twofold: to learn some techniques for extracting information from acoustic signals, and to use acoustic signals to display information. Hearing is arguably the second most important sensory modality after vision, and for displaying and acquiring information it is sometimes preferable to vision. For example, a car navigation system delivers guidance through speech, and a mobile phone can dial a number in response to a spoken request. In this course, we briefly review the main characteristics of sound and audio, and how they are processed for human-computer interaction.


    • Students who pass this course are expected to be able to extract information from acoustic signals for use as input to other techniques.
    • They are also expected to be able to use acoustic signals to explore big data.
    • Given application constraints (e.g., real-time operation, available computing power), students should be able, by the end of the term, to decide which of the presented techniques is best suited for extracting or displaying data using acoustic signals.







Course overview, make-up classes schedule, materials, evaluation, methodology, motivation


Physics of sound

Vibration and waves, Simple vibrating systems, Resonance, Complex mass-spring systems, Modal behavior, Sound and voice as signals, Sound pressure, Sound pressure level, Sound power, Sound intensity, Computation with amplitude and level quantities
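
As an illustration of computation with amplitude and level quantities, the following minimal Python sketch converts an RMS sound pressure to sound pressure level and combines the levels of uncorrelated sources by adding powers rather than decibel values. It assumes the standard 20 µPa reference pressure for air; the function names are illustrative and not taken from the course materials.

```python
import math

P_REF = 20e-6  # reference RMS sound pressure in air, in pascals (20 µPa)


def spl_db(p_rms: float) -> float:
    """Sound pressure level in dB re 20 µPa for an RMS pressure given in pascals."""
    return 20.0 * math.log10(p_rms / P_REF)


def add_incoherent_levels(*levels_db: float) -> float:
    """Total level of uncorrelated sources: convert each level to relative power, sum, convert back."""
    total_power = sum(10.0 ** (level / 10.0) for level in levels_db)
    return 10.0 * math.log10(total_power)


if __name__ == "__main__":
    print(f"{spl_db(1.0):.1f} dB")                        # 1 Pa RMS is about 94 dB SPL
    print(f"{add_incoherent_levels(60.0, 60.0):.1f} dB")  # two equal sources add about 3 dB
```

Doubling the sound pressure raises the level by about 6 dB, whereas doubling the power (two equal uncorrelated sources) raises it by only about 3 dB, which is why levels must not be added directly.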


Sound waves and rooms

Spherical waves, Plane waves and the wave field in a tube, Reflection, absorption, and refraction, Scattering and diffraction, Reverberation, Sound pressure level in a room, Modal behavior of sound in a room, Computational modeling of closed space acoustics
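
Reverberation in a room is often first estimated with Sabine's formula, T60 = 0.161 V / A, where V is the room volume in cubic metres and A is the total absorption area (each surface area times its absorption coefficient). The sketch below assumes a reasonably diffuse field with modest absorption, where Sabine's approximation is usually adequate; the room dimensions and absorption coefficients are hypothetical values chosen for illustration.

```python
def sabine_rt60(volume_m3: float, surfaces: list[tuple[float, float]]) -> float:
    """Reverberation time (s) from Sabine's formula T60 = 0.161 * V / A.

    surfaces: (area in m^2, absorption coefficient) pairs; A = sum(area * alpha) in m^2 sabins.
    """
    absorption_area = sum(area * alpha for area, alpha in surfaces)
    return 0.161 * volume_m3 / absorption_area


if __name__ == "__main__":
    # Hypothetical 10 m x 8 m x 3 m room: floor, absorptive ceiling, hard walls.
    room = [(80.0, 0.10), (80.0, 0.60), (108.0, 0.05)]
    print(f"T60 = {sabine_rt60(10 * 8 * 3, room):.2f} s")
```

For highly absorptive rooms, alternatives such as Eyring's formula give more realistic estimates; Sabine's version is shown only because it is the simplest.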


Sound perception

Structure of the ear, Auditory nerve, Sound events vs. auditory events, Psychophysical functions, Masking, Bark, ERB, and Greenwood scales, Pitch
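
The Bark, ERB, and Greenwood scales map physical frequency onto perceptual or cochlear coordinates. A minimal sketch follows, assuming the commonly cited analytic approximations: Zwicker and Terhardt (1980) for the Bark scale, Glasberg and Moore (1990) for the ERB, and Greenwood's (1990) function with human constants. The course may use other variants of these formulas; the function names are illustrative.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate (Bark), Zwicker & Terhardt (1980) approximation."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def erb_hz(f_hz: float) -> float:
    """Equivalent rectangular bandwidth (Hz), Glasberg & Moore (1990)."""
    return 24.7 * (4.37 * f_hz / 1000.0 + 1.0)


def hz_to_erb_number(f_hz: float) -> float:
    """ERB-number: number of ERBs below f_hz, Glasberg & Moore (1990)."""
    return 21.4 * math.log10(4.37 * f_hz / 1000.0 + 1.0)


def greenwood_hz(x: float) -> float:
    """Characteristic frequency (Hz) at relative cochlear position x (0 = apex, 1 = base), human constants."""
    return 165.4 * (10.0 ** (2.1 * x) - 0.88)


if __name__ == "__main__":
    for f in (100.0, 1000.0, 4000.0):
        print(f"{f:6.0f} Hz: {hz_to_bark(f):5.2f} Bark, ERB {erb_hz(f):6.1f} Hz, "
              f"ERB-number {hz_to_erb_number(f):5.2f}")
    print(f"Greenwood range: {greenwood_hz(0.0):.1f} Hz to {greenwood_hz(1.0):.0f} Hz")
```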


Sound perception (continuation)

Loudness, Timbre, Subjective duration, Perceptual organization of sound, Segregation of sound sources, Sound streaming and auditory scene analysis
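
Loudness level (phon) and loudness (sone) are related by a simple rule of thumb: 1 sone is defined as the loudness of a 40-phon tone, and loudness roughly doubles for every 10-phon increase. The sketch below assumes this rule, which holds approximately only above about 40 phon; the function names are illustrative.

```python
import math


def phon_to_sone(loudness_level_phon: float) -> float:
    """Loudness in sones using the 'doubling every 10 phon' rule (valid roughly above 40 phon)."""
    return 2.0 ** ((loudness_level_phon - 40.0) / 10.0)


def sone_to_phon(loudness_sone: float) -> float:
    """Inverse of phon_to_sone (same validity range)."""
    return 40.0 + 10.0 * math.log2(loudness_sone)


if __name__ == "__main__":
    for phon in (40, 50, 60, 70):
        print(phon, "phon ->", phon_to_sone(phon), "sone")
```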


Basic audio processing

Sounds as signals, Typical signals, Fundamental concepts of signal processing, Linear and time-invariant systems, Convolution, Signal transforms, Fourier analysis and synthesis, Spectrum analysis, Time-frequency representations
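
Convolution and the Fourier transform are linked by the convolution theorem: convolving two sequences corresponds to multiplying their spectra, provided the DFTs are long enough to hold the full linear convolution. The following minimal sketch uses direct O(N^2) implementations for clarity (real code would use an FFT library); the function names are illustrative.

```python
import cmath


def convolve(x: list[float], h: list[float]) -> list[float]:
    """Full linear convolution y[n] = sum_k x[k] h[n - k]; output length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for n, xn in enumerate(x):
        for k, hk in enumerate(h):
            y[n + k] += xn * hk
    return y


def dft(x: list[complex]) -> list[complex]:
    """Direct O(N^2) DFT: X[k] = sum_n x[n] exp(-j 2 pi k n / N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N)) for k in range(N)]


if __name__ == "__main__":
    x, h = [1.0, 2.0, 3.0], [1.0, 1.0]
    y = convolve(x, h)
    print(y)  # [1.0, 3.0, 5.0, 3.0]
    # Convolution theorem: after zero-padding to len(y), DFT(y) equals DFT(x) * DFT(h) bin by bin.
    N = len(y)
    X = dft(x + [0.0] * (N - len(x)))
    H = dft(h + [0.0] * (N - len(h)))
    Y = dft(y)
    print(max(abs(Y[k] - X[k] * H[k]) for k in range(N)))  # close to 0 (rounding error only)
```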


Basic audio processing (continuation)

Filter banks, Auto- and cross-correlation, Digital Signal Processing (DSP), Sampling and signal conversion, Z-transform
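
A common use of autocorrelation in audio is estimating the period of a periodic signal: the autocorrelation peaks at lags equal to multiples of the period. The sketch below is a minimal, unnormalized version that searches a lag range supplied by the caller; the test tone, the search range, and the function names are assumptions for illustration, and practical pitch trackers add normalization and octave-error handling.

```python
import math


def autocorrelation(x: list[float], max_lag: int) -> list[float]:
    """Unnormalized autocorrelation r[l] = sum_n x[n] x[n + l] for l = 0..max_lag."""
    return [sum(x[n] * x[n + lag] for n in range(len(x) - lag)) for lag in range(max_lag + 1)]


def estimate_period(x: list[float], min_lag: int, max_lag: int) -> int:
    """Lag (in samples) of the autocorrelation maximum within [min_lag, max_lag]."""
    r = autocorrelation(x, max_lag)
    return max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])


if __name__ == "__main__":
    fs, f0 = 8000, 200.0  # sampling rate and tone frequency (Hz)
    x = [math.sin(2 * math.pi * f0 * n / fs) for n in range(800)]
    period = estimate_period(x, min_lag=20, max_lag=100)
    print(period, "samples ->", fs / period, "Hz")  # 40 samples -> 200.0 Hz
```

Because fewer products are summed at larger lags, the unnormalized estimate slightly favors shorter lags, which here selects the true period (40 samples) over its multiple (80 samples).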



Filters as LTI systems, Digital filtering, Linear prediction, Adaptive filtering, FIR filters, IIR filters
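
Both FIR and IIR filters can be written as one linear constant-coefficient difference equation; an FIR filter simply has no feedback terms. The minimal sketch below implements it in direct form I with zero initial conditions, using the same b/a coefficient convention as scipy.signal.lfilter. The function name and the example coefficients are illustrative.

```python
def difference_equation(b: list[float], a: list[float], x: list[float]) -> list[float]:
    """Direct-form I filter: a[0] y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k] (zero initial state).

    With a == [1.0] this is an FIR filter; any further a coefficients add feedback (IIR).
    """
    y: list[float] = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y


if __name__ == "__main__":
    impulse = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
    # FIR: 4-point moving average; its impulse response has finite length.
    print(difference_equation([0.25] * 4, [1.0], impulse))
    # IIR: one-pole lowpass y[n] = 0.5 x[n] + 0.5 y[n-1]; its impulse response decays geometrically.
    print(difference_equation([0.5], [1.0, -0.5], impulse))
```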


Time-frequency processing

Basic techniques for time-frequency processing, Frame-based processing, Downsampled filter-bank processing, Modulation with tone sequences, Aliasing, Time-frequency transforms, Short-Time Fourier Transform (STFT), Alias-Free STFT, Modified Discrete Cosine Transform (MDCT)
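
Frame-based processing and the STFT can be sketched as follows: the signal is cut into overlapping frames, each frame is multiplied by a window, and a DFT of each frame gives one column of the time-frequency representation. This minimal version assumes a periodic Hann window and a direct DFT for readability; the parameter values and function names are illustrative.

```python
import cmath
import math


def hann(N: int) -> list[float]:
    """Periodic Hann window; with hop N/2 the shifted windows sum to 1 (constant overlap-add)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / N) for n in range(N)]


def stft(x: list[float], frame_len: int, hop: int) -> list[list[complex]]:
    """Frame-based STFT: window each frame, take the DFT, keep bins 0..frame_len//2 (real input)."""
    w = hann(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        seg = [x[start + n] * w[n] for n in range(frame_len)]
        frames.append([
            sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len) for n in range(frame_len))
            for k in range(frame_len // 2 + 1)
        ])
    return frames


if __name__ == "__main__":
    fs, N, hop = 8000, 64, 32
    x = [math.cos(2 * math.pi * 1000 * n / fs) for n in range(256)]
    S = stft(x, N, hop)
    peaks = [max(range(len(frame)), key=lambda k: abs(frame[k])) for frame in S]
    print(len(S), "frames; peak bins:", peaks, "->", peaks[0] * fs / N, "Hz")
```

The 50% hop was chosen so that the overlapping Hann windows sum to a constant, which allows overlap-add resynthesis after frame-wise processing.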


Advanced processing

Discrete WFT and IWFT using an FFT filter bank, Pitch-scaling using the FFT filter bank, Time-scaling using the FFT filter bank, FFT filter bank as a channel vocoder
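
Time- and pitch-scaling with an FFT filter bank depend on estimating each channel's actual frequency from the phase advance between successive frames (the phase-vocoder principle). The following minimal sketch shows only that step for a single channel. It assumes a Hann analysis window and a hop small enough (here N/4) that the phase deviation stays within ±π. A full time- or pitch-scaler would resynthesize all channels with a modified hop or frequency, which is omitted here. The function names are illustrative.

```python
import cmath
import math


def hann_bin(frame: list[float], k: int) -> complex:
    """Output of FFT-filter-bank channel k for one Hann-windowed frame (a single DFT bin)."""
    N = len(frame)
    return sum(
        (0.5 - 0.5 * math.cos(2 * math.pi * n / N)) * frame[n] * cmath.exp(-2j * math.pi * k * n / N)
        for n in range(N)
    )


def channel_frequency(x: list[float], start: int, N: int, hop: int, k: int, fs: float) -> float:
    """Phase-vocoder estimate of the frequency (Hz) in channel k from the phase advance over one hop."""
    X1 = hann_bin(x[start:start + N], k)
    X2 = hann_bin(x[start + hop:start + hop + N], k)
    omega_k = 2 * math.pi * k / N                             # channel centre frequency, rad/sample
    dphi = cmath.phase(X2) - cmath.phase(X1) - omega_k * hop  # deviation from the expected advance
    dphi = (dphi + math.pi) % (2 * math.pi) - math.pi         # wrap to [-pi, pi)
    return (omega_k + dphi / hop) * fs / (2 * math.pi)


if __name__ == "__main__":
    fs, N, hop = 8000, 64, 16
    x = [math.cos(2 * math.pi * 1100 * n / fs) for n in range(200)]
    # 1100 Hz lies between channel 8 (1000 Hz) and channel 9 (1125 Hz); both should report about 1100 Hz.
    print(f"{channel_frequency(x, 0, N, hop, 9, fs):.1f} Hz")
```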


Wavelet and Cepstrum

Wavelet transform, Continuous wavelet transform, Discrete wavelet transform, Discrete-time wavelet transform (DTWT), DTWT/IDTWT realization with polyphase decomposition, DTWT/IDTWT with FIR lattice filters, Cepstrum, MFCCs
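
The simplest discrete-time wavelet transform uses the Haar filters. Filtering followed by downsampling by 2 can be computed directly from the even- and odd-indexed samples (the polyphase components), so no discarded outputs are ever computed. A minimal one-level sketch follows, assuming the orthonormal Haar filters; the function names are illustrative, and longer wavelets would need general FIR polyphase filters.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_analysis(x: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar DWT, computed directly on the even/odd (polyphase) samples."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    even, odd = x[0::2], x[1::2]
    approx = [(e + o) / SQRT2 for e, o in zip(even, odd)]  # lowpass branch, already downsampled by 2
    detail = [(e - o) / SQRT2 for e, o in zip(even, odd)]  # highpass branch, already downsampled by 2
    return approx, detail


def haar_synthesis(approx: list[float], detail: list[float]) -> list[float]:
    """Inverse of haar_analysis: rebuild the even and odd samples and interleave them."""
    x: list[float] = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / SQRT2, (a - d) / SQRT2])
    return x


if __name__ == "__main__":
    x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
    a, d = haar_analysis(x)
    print("detail:", [round(v, 3) for v in d])
    print("max reconstruction error:", max(abs(p - q) for p, q in zip(x, haar_synthesis(a, d))))
```

Because the transform is orthonormal, the energy of the approximation and detail coefficients together equals the energy of the input, and synthesis reconstructs the input up to rounding error.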


Speech technologies

Speech coding, Text-to-Speech Synthesis, Early knowledge-based Text-to-Speech (TTS) synthesis, Unit-selection synthesis, Statistical parametric synthesis, Speech recognition, Hidden Markov models
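
In HMM-based speech recognition, each candidate model is scored by the probability that it generated the observed sequence, which the forward algorithm computes efficiently. The sketch below assumes a discrete-observation HMM with a small, hypothetical two-state model. Practical recognizers use scaled or log-domain probabilities to avoid underflow and continuous emission densities instead of lookup tables. The function name is illustrative.

```python
def forward_likelihood(pi: list[float], A: list[list[float]], B: list[list[float]], obs: list[int]) -> float:
    """P(obs | model) for a discrete-observation HMM, computed with the forward algorithm.

    pi[i]: initial probability of state i; A[i][j]: transition probability i -> j;
    B[i][o]: probability of emitting symbol o in state i.
    """
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]  # initialization
    for o in obs[1:]:                                  # induction over time steps
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o] for j in range(n)]
    return sum(alpha)                                  # termination


if __name__ == "__main__":
    # Hypothetical two-state model with two observation symbols (0 and 1).
    pi = [0.6, 0.4]
    A = [[0.7, 0.3], [0.4, 0.6]]
    B = [[0.9, 0.1], [0.2, 0.8]]
    print(forward_likelihood(pi, A, B, [0, 1, 1]))
```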



Auditory display and sonification

Sonification and auditory displays, Audification, Auditory icons, Parameter mapping sonification, Model-Based sonification, Applications, Statistical sonification for exploratory data analysis
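
Parameter mapping sonification assigns data values to sound parameters. The minimal sketch below maps each value linearly to pitch (a MIDI note range, converted to Hz with A4 = 440 Hz), plays one sine tone per value, and writes a WAV file with Python's standard wave module. The pitch range, tone duration, data series, and function names are assumptions for illustration; no fades are applied, so tone boundaries may click.

```python
import math
import struct
import wave


def map_to_frequencies(data: list[float], low_midi: float = 60.0, high_midi: float = 84.0) -> list[float]:
    """Parameter mapping: scale each value linearly onto a MIDI pitch range, then convert to Hz (A4 = 440 Hz)."""
    lo, hi = min(data), max(data)
    span = (hi - lo) or 1.0  # avoid division by zero for constant data
    freqs = []
    for value in data:
        midi = low_midi + (value - lo) / span * (high_midi - low_midi)
        freqs.append(440.0 * 2.0 ** ((midi - 69.0) / 12.0))
    return freqs


def render_tones(freqs: list[float], fs: int = 22050, tone_s: float = 0.2, amp: float = 0.3) -> list[float]:
    """One sine tone per data point, played in sequence."""
    n_tone = int(fs * tone_s)
    samples: list[float] = []
    for f in freqs:
        samples.extend(amp * math.sin(2 * math.pi * f * n / fs) for n in range(n_tone))
    return samples


def write_wav(path: str, samples: list[float], fs: int = 22050) -> None:
    """Write mono 16-bit PCM; samples are expected in [-1, 1]."""
    with wave.open(path, "wb") as out:
        out.setnchannels(1)
        out.setsampwidth(2)
        out.setframerate(fs)
        out.writeframes(b"".join(struct.pack("<h", int(s * 32767)) for s in samples))


if __name__ == "__main__":
    daily_values = [3.0, 5.5, 4.0, 9.0, 7.5]  # hypothetical data series
    write_wav("sonification.wav", render_tones(map_to_frequencies(daily_values)))
```

Mapping to a logarithmic pitch scale (semitones) rather than directly to Hz was chosen so that equal data steps are heard as roughly equal pitch steps.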


Concepts of intelligent and learning systems

Low-level audio features, Segmentation and region features, Audio fingerprints, Tonal descriptors, Rhythm, Bottom-up extraction of descriptors from audio, Extracting higher-level musical patterns, Learning algorithms commonly used in music classification
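
Typical low-level audio features are computed per frame. The minimal sketch below shows three common ones: RMS energy, zero-crossing rate, and spectral centroid. It assumes no analysis window and a direct DFT for readability; feature definitions vary slightly between toolkits (e.g., normalization of the zero-crossing rate), and the function names are illustrative.

```python
import cmath
import math


def rms(frame: list[float]) -> float:
    """Root-mean-square energy of a frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def zero_crossing_rate(frame: list[float]) -> float:
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0.0) != (b >= 0.0))
    return crossings / (len(frame) - 1)


def spectral_centroid(frame: list[float], fs: float) -> float:
    """Magnitude-weighted mean frequency (Hz) over DFT bins 0..N/2 (no window applied)."""
    N = len(frame)
    mags = [abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N)))
            for k in range(N // 2 + 1)]
    total = sum(mags)
    return sum(k * fs / N * m for k, m in enumerate(mags)) / total if total else 0.0


if __name__ == "__main__":
    fs = 8000
    frame = [math.cos(2 * math.pi * 1000 * n / fs + 0.1) for n in range(64)]
    print(f"RMS {rms(frame):.3f}, ZCR {zero_crossing_rate(frame):.3f}, "
          f"centroid {spectral_centroid(frame, fs):.1f} Hz")
```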


    • V. Pulkki and M. Karjalainen, Communication Acoustics: An Introduction to Speech, Audio and Psychoacoustics. John Wiley & Sons, 2015.
    • W. M. Hartmann, Signals, Sound, and Sensation (Modern Acoustics and Signal Processing). Woodbury, NY, USA: American Institute of Physics, 1997.
    • T. Hermann, A. Hunt, and J. G. Neuhoff, The Sonification Handbook. Berlin: Logos Verlag, 2011.
    • Various materials prepared by the instructors

Evaluation method

    • Exercises: 60%
    • Final exam: 40%

Referential sources