Prospective (MSc and PhD) students interested in any of the (current or past) topics described here or other related topics may email me.

Call for applications for CMU-Portugal dual-degree Ph.D. scholarships.

Master's and PhD students interested in any of the topics described here or in similar topics, or who would like information about SCHOLARSHIPS, may contact me by email.

A call is currently open for one dual-degree PhD scholarship from the CMU-Portugal program.

Current Projects

Serious games and applications for health and education

We are currently working on applications for health, such as serious games and toolsets that explore HCI techniques and bio-feedback for speech therapy. Our goal is to leverage speech and facial expression recognition, combined with gaming, to improve the effectiveness of the speech therapy processes developed by healthcare specialists. This work is being developed in collaboration with researchers from Carnegie Mellon University, INESC-ID, Escola Superior de Saúde de Alcoitão and VoiceInteraction. More details are available on the BioVisualSpeech project page.


We are also currently working on serious games and applications for education for blind students, such as educational computer games, orientation and mobility computer games that use spatialized sound, and a molecular editor with spatialized sound. Some of the results appear in Ferreira and Cavaco (FIE 2014) and Simões and Cavaco (ACE 2014) (which won the Bronze poster award).

Sound synthesis
We are developing statistical methods for modeling and synthesizing sounds with both sinusoidal and attack transient components. We use multivariate decomposition techniques to learn the intrinsic structures that characterize the sound samples. These structures are then used to synthesize new sounds that are drawn from the distribution of the original recorded samples. Some of the results appear in Cavaco (SMC 2012).

We have also been working on the synthesis of spatialized sound (2D or 3D) for applications for the blind.

Modeling harmonic and percussion instruments
Percussion instruments

Most musical instrument classifiers focus on distinguishing different harmonic instruments, such as the violin and the flute, whose sounds have very different characteristics. Much less attention has been given to percussion instruments, especially to the discrimination of instruments of the same type, like the cymbals in a drum kit. We have been developing classifiers that can distinguish instruments of this latter kind. In particular, we have been working with cymbal sounds, and we are interested in the modeling, classification, transcription and synthesis of these sounds. Some of the results appear in Cavaco and Almeida (IWSSIP 2012).

Harmonic instruments

Apart from the work with percussion instruments we have also been developing models that describe sounds from harmonic instruments, such as the flute, piano and guitar. Some of the results appear in Malheiro and Cavaco (INForum 2011).

Intrinsic Structures of Impact Sounds
Models of sounds have proven useful in many fields, such as sound synthesis, sound recognition and identification of events or properties (like material or length) of the objects involved. However, developing such models is hard due to all the complexities of real sounds.

Natural sounds of the same type have a rich variability in their acoustic structure. For example, different impacts on the same rod can generate very different acoustic waveforms. In natural environments there is variability due to reverberation and background noise, but even when the sounds are recorded in anechoic conditions there is variability due to factors such as slight variations in the impact force and location. (For instance, the figure on the left shows that, even though different impacts on the same rod have very similar spectra, the relative power and duration of the partials vary from one instance to another. These differences cannot be explained by a simple variation in amplitude.) In spite of these variations, when the sounds are heard they are often perceived as almost identical, which suggests that they share some common intrinsic structures.

We are developing data-driven methods for learning the intrinsic features that govern the acoustic structure of impact sounds. These methods require no a priori knowledge of the physics, dynamics and acoustics, and are used to create models of impact sounds that represent a rich variety of structure and variability in the sounds. For more details see Cavaco and Lewicki (JASA 2007).
Sound recognition
Environmental sound recognition systems are intended to distinguish different categories of sounds, where sounds from different categories usually have very different spectral and temporal characteristics. Typical examples of such categories are door bells, waves, dog barking, whistles, footsteps and keyboards. These sounds are produced not only by different types of objects but also by different types of events. We have been investigating the possibility of building sound recognizers (for environmental sounds and percussion instruments) that differ from the recognizers described above in that they are intended to distinguish sounds produced by very similar objects and by the same type of event, such as impacts on metal rods (the image on the left shows that sounds from metal rods are separable) or sounds from the cymbals of a drum kit. Some of the results appear in Cavaco and Rodeia (ICISP 2010) and Cavaco and Almeida (IWSSIP 2012).

In the past, we have also worked on sound recognition for robots. More specifically, we have worked on the recognition of sounds from toys for Kismet (a robot from the MIT AI Lab).

Past Projects

Unveiling the world of color for the blind
We have developed a tool that converts color information from still images or video frames into sound. The tool converts the hue, saturation and value parameters into sound parameters that influence the perception of pitch, timbre and loudness. Our goal is to help visually impaired individuals perceive characteristics of the environment that are not easily acquired without vision. The tool has been tested by visually impaired individuals, who confirmed that it can give them information about the range of colors present in the images, the presence or absence of light sources, as well as the location and shape of the objects. Some of the results appear in Cavaco et al. (SeGAH 2013) (which won the SeGAH 2013 best paper award) and Cavaco et al. (HCist 2013). This project is described in more detail here.
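As a rough illustration of this kind of color-to-sound conversion, the sketch below maps one RGB pixel to three sound parameters and renders a short tone. The specific pairing (hue to pitch, saturation to number of harmonics, value to amplitude), the frequency range and the function names are assumptions made for this example; the text above does not specify how the actual tool maps each parameter.

```python
import colorsys
import math


def color_to_sound(r, g, b, f_low=220.0, f_high=880.0, max_harmonics=8):
    """Map an RGB color (components in 0..1) to hypothetical sound parameters."""
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    # Assumed mapping: hue -> pitch, on a logarithmic (musical) scale.
    frequency = f_low * (f_high / f_low) ** h
    # Assumed mapping: saturation -> timbre, as the number of harmonics.
    harmonics = 1 + round(s * (max_harmonics - 1))
    # Assumed mapping: value (brightness) -> loudness, as a linear amplitude.
    amplitude = v
    return frequency, harmonics, amplitude


def synthesize(frequency, harmonics, amplitude, duration=0.1, rate=44100):
    """Render a short additive-synthesis tone as a list of samples."""
    n = int(duration * rate)
    # Harmonic k has weight 1/k; normalize so the peak never exceeds `amplitude`.
    norm = sum(1.0 / k for k in range(1, harmonics + 1))
    return [
        amplitude / norm * sum(
            math.sin(2 * math.pi * k * frequency * t / rate) / k
            for k in range(1, harmonics + 1)
        )
        for t in range(n)
    ]


# Usage: a fully saturated, bright green pixel.
freq, harm, amp = color_to_sound(0.0, 1.0, 0.0)
samples = synthesize(freq, harm, amp)
```

In this sketch a dark pixel produces a quiet tone and a gray (unsaturated) pixel produces a pure sine wave, which is one simple way to make the three color dimensions audibly distinct.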
Video annotation with audiovisual information
Because their large video archives lack annotation, multimedia content providers and television channels do not use the data in those archives to its full extent. To contribute a solution to this problem, we have developed a tool that combines audio and visual information to annotate video. This tool has been used by a video production company, which has given us positive feedback. Its main innovation is the use of environmental sound recognition to annotate video. Some of the results appear in Cavaco et al. (ICALIP 2012) and Mateus et al. (ICMCS 2012).
Music genre classification
Now that the growth of digital content has led to the massive use of digital music, indexing is essential to keep huge music databases correctly organized. While many supervised automatic music genre classifiers have been proposed, they always depend on prior manual labeling of the data. An unsupervised approach would not have this dependency and would be able to determine the genre of the music samples based only on their audio features. We have been developing unsupervised techniques for music genre classification. Some of the results appear in Barreira, Cavaco and Ferreira da Silva (EPIA 2011). (The figure on the left shows a similarity matrix from 165 music titles and 11 different genres.)
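To make the idea of a similarity matrix concrete, here is a minimal sketch that compares tracks pairwise from their audio feature vectors. The use of cosine similarity, the toy feature vectors and the function names are assumptions for illustration only; the text above does not say which features or similarity measure the published work uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similarity_matrix(features):
    """Return the square matrix of pairwise similarities between all tracks."""
    return [[cosine_similarity(a, b) for b in features] for a in features]


# Usage: three hypothetical tracks described by two made-up audio features.
tracks = [[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]]
matrix = similarity_matrix(tracks)
```

An unsupervised method could then group tracks whose rows in this matrix are alike, without needing any manually assigned genre labels.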
Sound localization
Past projects include
• the localization of sound sources using auditory cues, such as interaural time differences and
• the acoustic detection of direction of motion, that is, detecting moving sound sources and the direction of motion using only information from the sounds they produce.
Some results appear in Cavaco and Hallam (CASA 99) and Cavaco and Hallam (IJNS 99).
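As an illustration of the interaural time difference cue mentioned above, the sketch below estimates the delay between a left and a right channel by brute-force cross-correlation. This is a generic textbook approach with hypothetical function names, not necessarily the method used in the cited papers.

```python
import math


def estimate_itd(left, right, rate, max_lag):
    """Return the delay (in seconds) of `right` relative to `left`.

    A positive result means the sound reached the left channel first.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Correlate left[t] with right[t + lag] over the overlapping samples.
        score = sum(
            left[t] * right[t + lag]
            for t in range(max(0, -lag), min(len(left), len(right) - lag))
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag / rate


def delay(signal, d):
    """Delay a signal by d samples, keeping its length."""
    return [0.0] * d + signal[: len(signal) - d]


# Usage: a short Gaussian pulse that reaches the right channel 7 samples late.
rate = 44100
left = [math.exp(-(((t - 50) / 5.0) ** 2)) for t in range(200)]
right = delay(left, 7)
itd = estimate_itd(left, right, rate, max_lag=30)
```

The sign of the estimated delay indicates which side the source is on, and tracking how the delay changes over time is one way to infer the direction of motion of a moving source.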
Object detection with ultrasonic sensors
We developed a navigation controller for a mobile robot using an adaptive neural network and information from ultrasonic sensors.
Some past projects involved robots, like the navigation controller for a mobile robot mentioned above, localization of sound sources for a robotic cat head and the toys' sound identifier for Kismet.