Designing Music Therapy: Developing Algorithms to Extract Emotion from Music
Werner Van Belle1* - firstname.lastname@example.org, email@example.com
Abstract : The effect of music as a psychotherapeutic tool has been recognized for a long time. Alone, or in combination with classical treatment, music can alleviate depression, stress and anxiety, as well as acute and chronic pain. Such beneficial effects are likely to derive from its ability to induce mood changes. However, it remains unclear which aspects of music can cause emotional changes. This project aims to link advanced audio signal processing techniques to empirical psychoacoustic testing to develop algorithms that automatically retrieve emotions associated with a particular piece of music. Such algorithms could then be used to select and develop musical pieces for therapeutic purposes.
Keywords: audio content extraction, psychoacoustics, BPM, tempo, rhythm, composition, echo, spectrum, sound color, music therapy
Major depressive disorder and dysthymic disorder are two costly diseases for society. Epidemiological studies show that, in modern societies and in any given year, 5 to 10% of adults may suffer from a severe unipolar pattern of depression, while 2.5 to 5.4% suffer from the less disabling dysthymic disorder [Com99]. It is estimated that clinical depression is experienced by 50% of all people who have had a stroke, 17% of those who have had a heart attack, 30% of cancer patients and between 9 and 27% of people with diabetes [Sim96].
Another serious health problem is neurodegenerative disease among the elderly. Interventions that could delay disease onset even modestly can have a major public health impact [ASK+05]. Together, depression and dementia constitute two of the most significant mental health issues for nursing home residents [SSRB03].
Music is not, and could not be, a cure for the above disorders, but it does have the ability to influence the central nervous system and, specifically, it can be used as a mood induction procedure. Indeed, music is often used as an independent therapeutic nursing intervention, and it has been used in psychiatric settings, in the 'treatment' of chemical dependency and in hospital intensive care units. Clearly, it can have a soothing and relaxing effect and can enhance well-being by reducing anxiety, improving sleep, and distracting patients from states of agitation, aggression and depression. These positive aspects of music have led to the use of 'music therapy' as an aid in the everyday care of patients in, for example, nursing homes [Cur86,SZ89]. Compared to other forms of patient care, the costs are low, there are no clear negative side effects, and music can help improve patient satisfaction. Furthermore, music can easily be applied to the care of many people, since its use is not constrained to a specific environment or circumstance: it can be used in the hospital as well as at home. Music has also been shown to improve the symptoms of neurodegenerative diseases and may contribute to palliative care [Mys05,GTYG01]. Similarly, patients' reports of chronic pain can also be reduced through appropriate music.
Music appears to elicit emotional responses in its listeners. A survey [Tha95] of more than 300 participants ranked listening to music as the second best method to get out of a bad mood (i.e., feelings of tension or low energy). This power of music to both stir and quiet emotions is clearly also at the base of its use as a therapeutic tool, which, in combination with actual medical treatment, might synergistically enhance the effect of the whole treatment [Cur86,SZ89].
Some researchers argue that music works by conditioned response, but attributing its effects solely to a learning context may be incorrect, since music is a human universal: every known human society has made music from its very beginnings. Although different cultures may differ in how they make music and may have developed different instruments and vocal techniques, they all seem to perceive music in a very similar way. That is, there may be considerable agreement on which emotions people associate with a specific musical performance [CFB04,DeB01,Jus01,CBB00,Mak00,SW99,Kiv89,Boz85,Cro84,MB82,Ber74,LvdGP66,Coo59]. In general, there seems to be very little research on which aspects of music induce a specific emotion and which underlying musical dimensions give music its structure and meaning.
This research tries to understand which aspects of music induce specific emotions and which underlying musical dimensions provide structure and meaning to the listener. We are specifically concerned with quantifying, in a mathematical way, the basic emotional content of music. Empirical participant testing will associate sound fragments with emotions, while various combinations of signal processing techniques will provide automated sound analysis. Multivariate statistical methods will assess the applicability of the various signal processing techniques.
The emotions contained in music are often assumed to be simple basic emotions. Once these can be identified, one might expect combinations of primary emotions to lead to more complex emotional patterns [LeD96,OT90,Plu80].
Music therapy is believed to help alleviate many different symptoms, among them depression, anxiety, insomnia and chronic pain.
Depression - Depression and dementia remain two of the most significant mental health issues for nursing home residents [SSRB03]. There is now growing interest in the therapeutic use of music in nursing homes. A widely shared conclusion is that music can supplement medical treatment and has a clear potential for improving care in nursing homes. Music also seems to alleviate major depression [JF99].
Pregnancy - Experiments on the effects of music on intrusive and withdrawn mothers show a decrease in cortisol levels in the intrusive mothers. Concomitantly, their State Anxiety Inventory scores decreased, and their depressed mood scores on the Profile of Mood States (POMS) decreased significantly [TFHR+03].
Anxiety - It would seem that, in general, affective processes are critical to understanding and promoting lasting therapeutic change. Results reported in [KWM01] indicate that music-assisted reframing is more efficacious than typical reframing interventions in reducing anxiety. One study of patients with animal phobias showed that when patients were exposed (from a distance) to the animals, those who simultaneously listened to music they preferred overcame their phobias more readily than the group treated in silence [ea88].
Insomnia - Music improves sleep quality in older adults. Using the Pittsburgh Sleep Quality Index and the Epworth Sleepiness Scale, [LG05] showed that music significantly improves sleep quality.
Pain reduction - Music therapy seems an efficient treatment for different forms of chronic pain, including fibromyalgia, myofascial pain syndromes and polyarthritis [MBH97], chronic headaches [RSV01] and chronic low back pain [GCP+05]. Music seems to affect especially the communicative and emotional dimensions of chronic pain [MBH97]. Sound-induced trance also enables patients to distract themselves from their condition and may result in pain relief 6 to 12 months later [RSV01].
The general impact of music on the nervous system extends to the immune system. Research by [HO03] indicates that listening to music after a stressful task increases norepinephrine levels. This is in agreement with [BBF+01], who verified the immunological impact of drum circles. Drum circles have been part of healing rituals in many cultures throughout the world since antiquity. Composite drumming directs the immune system away from classical stress and results in increased dehydroepiandrosterone-to-cortisol ratios, natural killer cell activity and lymphokine-activated killer cell activity, without alteration in plasma interleukin 2 or interferon-gamma. One area of application for these effects could be cancer treatment. Autologous stem cell transplantation, a common treatment for hematologic malignancies, causes significant psychological distress due to its effect on the immune system. A study by [CVM03] reveals that music therapy reduces mood disturbance in such patients. The fact that music can be used as a mood induction procedure with the required physiological effects could make it relevant to pharmaceutical companies [RA04]. Positive benefits of music therapy have also been observed in patients with multiple sclerosis, an autoimmune disease [SA04]. Finally, many researchers have observed a cumulative effect of music therapy [HL04,LG05].
From an engineering point of view, psychoacoustics has been very important in the design of compression algorithms such as MPEG [BBQ+96,HBEG95]. A large body of work exists on psychoacoustic 'masking'. Removing information that humans do not hear improves the compression achieved by the codec without audible loss of content [HJ96,Hel72]. Frequency masking, temporal masking and volume masking are all important tools for present-day coders and decoders [Dim05]. The use of correct filterbanks, respecting the sensitivity of the human ear, is another crucial factor. Notwithstanding the variety and quality of existing work on this matter, little of it relates what can be thrown away (masking) to what, when kept, leads humans to associate specific feelings with music. In this proposal we aim to extract the emotions associated with music, without any need to compress the signal.
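Perceptually motivated filterbanks and masking models usually place their bands on a critical-band scale rather than in hertz. As a minimal illustration, the sketch below converts frequency to Bark using the Zwicker & Terhardt (1980) closed-form approximation; the choice of formula and the function names are assumptions, and this is not the mapping used by any particular codec.

```python
import math


def hz_to_bark(f_hz: float) -> float:
    """Critical-band rate in Bark (Zwicker & Terhardt 1980 approximation)."""
    return 13.0 * math.atan(0.00076 * f_hz) + 3.5 * math.atan((f_hz / 7500.0) ** 2)


def bark_band(f_hz: float) -> int:
    """0-based index of the one-Bark-wide critical band containing f_hz."""
    return int(hz_to_bark(f_hz))


if __name__ == "__main__":
    # Equal steps in Hz are very unequal steps in Bark: low frequencies get more bands.
    for f in (100, 500, 1000, 4000, 16000):
        print(f, "Hz ->", round(hz_to_bark(f), 2), "Bark, band", bark_band(f))
```

The whole audible range maps onto roughly 25 Bark, with far more resolution below 1 kHz than above it, which is why filterbanks that respect the ear do not use uniformly spaced bands.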
Musical emotions can be characterized in much the same way as basic human emotions. Happy and sad emotional tones are among the most commonly reported in music, and these basic emotions may be expressed by similar structural features across musical styles and traditions. Among these features, pitch and rhythm seem basic: research on infants has shown that we are able to perceive them early in life [HJ05]. Recently, it has been proposed that at least part of the emotional content of a musical piece is due to a close relationship between music and the vocal expression of emotions, either as used in speech (e.g., a sad tone) or in non-verbal expressions (e.g., crying). In other words, the accuracy with which specific emotions can be communicated depends on specific patterns of acoustic cues [JL03]. This account can explain why music is considered expressive of certain emotions, and it can also be related to evolutionary perspectives on the vocal expression of emotions. More specifically, some of the relevant acoustic cues shared by both domains (music and verbal communication, respectively) are speech rate/tempo, voice intensity/sound level, and high-frequency energy. Speech rate/tempo may be most related to basic emotions such as anger and happiness (when it increases) or sadness and tenderness (when it decreases). Similarly, high-frequency energy also plays a role in anger and happiness (when it increases) and in sadness and tenderness (when it decreases). Different combinations and/or levels of these basic acoustic cues could result in several specific emotions. For example, fear may be associated in speech or song with low voice intensity and little high-frequency energy, whereas panic is expressed by increasing both intensity and high-frequency energy.
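Two of the shared cues named above, sound level and high-frequency energy, can be computed directly from a frame of mono samples. The sketch below is a minimal illustration under stated assumptions: samples are floats in [-1, 1], 'high frequency' means at or above a hypothetical 1 kHz cutoff, and a naive DFT stands in for the FFT a real analyzer would use. None of these choices are taken from [JL03].

```python
import cmath
import math


def rms_db(samples):
    """Sound level: RMS in dB relative to a full-scale amplitude of 1.0."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))  # floor avoids log(0) on silence


def power_spectrum(samples):
    """Power in DFT bins 0..N/2, computed with a naive O(N^2) DFT (short frames only)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))) ** 2
        for k in range(n // 2 + 1)
    ]


def high_frequency_ratio(samples, sample_rate, cutoff_hz=1000.0):
    """Share of spectral energy at or above cutoff_hz (hypothetical cue definition)."""
    powers = power_spectrum(samples)
    n = len(samples)
    total = sum(powers) or 1e-12
    high = sum(p for k, p in enumerate(powers) if k * sample_rate / n >= cutoff_hz)
    return high / total
```

Applied frame by frame, these two numbers give intensity and high-frequency-energy contours of the kind the speech/music analogy relates to anger and happiness (rising) or sadness and tenderness (falling).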
Juslin & Laukka [JL03] have proposed the following dimensions of music (each with a correlate in speech) as those that define the emotional structure and expression of music performance: pitch or F0 (the fundamental frequency, i.e., the lowest-frequency periodic component of a waveform); F0 contour or intonation; vibrato; intensity or loudness; attack, the rapidity of tone onset; tempo, the velocity of music; articulation, the proportion of sound to silence; timing or rhythm variation; and timbre, i.e., high-frequency energy or an instrument's or singer's formant.
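Several of these dimensions have simple textbook estimators. The sketch below covers two of them under assumed parameters: F0 as the autocorrelation peak within a 60-1000 Hz search range, and articulation operationalized as the fraction of 256-sample frames louder than a hypothetical -40 dBFS silence threshold. Real recordings with several simultaneous instruments would need considerably more robust methods.

```python
import math


def estimate_f0(frame, sample_rate, fmin=60.0, fmax=1000.0):
    """F0 in Hz: the lag in [1/fmax, 1/fmin] seconds with the largest autocorrelation.

    The raw (unnormalised) sum shrinks as the lag grows, which mildly favours the
    true period over its multiples (octave errors)."""
    n = len(frame)
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_val = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        val = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return sample_rate / best_lag


def articulation(samples, frame_len=256, silence_db=-40.0):
    """Fraction of non-overlapping frames louder than silence_db: a sound-to-silence proportion."""
    levels = []
    for s in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[s:s + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / frame_len)
        levels.append(20.0 * math.log10(max(rms, 1e-12)))
    return sum(lvl > silence_db for lvl in levels) / len(levels) if levels else 0.0
```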
Certainly, the above is not an exhaustive list of the relevant dimensions of music or of emotional content in music. Additional dimensions might include echo, harmonics, melody and low-frequency oscillations. We briefly describe the first three of these below.
Sound engineers recognize that echo, delay and spatial positioning [WB89,J.83,K.87,AJ96] influence the feeling of a sound production. Short echoes of high frequencies and long delays make the sound 'cold' (e.g., a room with concrete walls), while a short echo without delay that preserves the middle frequencies makes the sound 'warm'. No echo at all makes the sound clinical and unnatural. Research into room acoustics [BK92,D.95,PM86,Y.85,LA62] shows that digital techniques have great difficulty simulating the correct 'feeling' of natural rooms, illustrating the importance of time correlation as a factor.
Harmonics refers to the interplay between a tone and the integer multiples of its frequency. A note's fundamental frequency together with all its harmonics forms the characteristic waveform of an instrument. Artists often select instruments based on the 'emotion' an instrument captures (e.g., the clarinet typically plays 'funny' fragments, while the cello plays 'sad' ones).
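The statement that harmonics shape an instrument's waveform can be illustrated with additive synthesis: the same fundamental combined with different harmonic amplitudes yields different waveforms. The amplitude lists below are illustrative assumptions; the odd-harmonic recipe only loosely resembles a clarinet, whose low register is known to be dominated by odd harmonics.

```python
import math


def additive_tone(f0, harmonic_amps, sample_rate=8000, n_samples=800):
    """Sum of sine partials at f0, 2*f0, 3*f0, ...; harmonic_amps[k] scales partial k+1."""
    return [
        sum(a * math.sin(2 * math.pi * f0 * (k + 1) * t / sample_rate)
            for k, a in enumerate(harmonic_amps))
        for t in range(n_samples)
    ]


# Same pitch, different harmonic recipes (illustrative amplitudes, not measured spectra):
# odd partials only, loosely clarinet-like, versus all partials with 1/k roll-off (sawtooth-like).
odd_only = additive_tone(100.0, [1.0, 0.0, 1 / 3, 0.0, 1 / 5, 0.0, 1 / 7])
all_partials = additive_tone(100.0, [1.0 / k for k in range(1, 8)])
```

Both tones have a 100 Hz pitch, yet the odd-only tone has half-wave symmetry (its second half-period mirrors the first with inverted sign) while the other does not: one concrete way in which harmonic content changes the waveform, and hence the timbre.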
Melody & pitch intervals. Key, scale and chords could also influence the feeling of a song. The connotations of 'major' and 'minor' chords already illustrate this, and classical Indian musicology suggests the same [OL03].
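In 12-tone equal temperament, the difference between a major and a minor triad is a single semitone in the chord's third. A small sketch, assuming A4 = 440 Hz and counting notes in semitones from A4:

```python
A4_HZ = 440.0  # assumed concert pitch


def note_freq(semitones_from_a4: int) -> float:
    """Frequency of the 12-tone equal-tempered note n semitones from A4."""
    return A4_HZ * 2 ** (semitones_from_a4 / 12)


def triad(root: int, minor: bool = False) -> list:
    """Root, third and fifth; a minor third is 3 semitones, a major third 4."""
    third = 3 if minor else 4
    return [note_freq(root + step) for step in (0, third, 7)]


c_major = triad(-9)              # C4, E4, G4
c_minor = triad(-9, minor=True)  # C4, E-flat4, G4
```

Only the middle note differs between the two chords, by a frequency ratio of 2^(1/12), about 1.059.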
The proposed project will gather psychoacoustic information from human participants in an experimental laboratory setting, where their emotional responses to various sounds will be measured. The project relies on three tasks: i) initial screening, ii) a designed experiment and iii) development of signal processing modules. In the first phase we screen different kinds of music and test their relevance to this study. This phase relies on open questionnaires and free-form answers. In the second phase we use an engineered collection of songs and acquire statistically relevant information by putting closed questions to the participant group. In parallel with data acquisition, we will further develop signal processing modules that measure specific song properties.
Finally, we will measure the participants' physiological reactions by means of pupillometry. Given that the pupil of the eye is also controlled by the autonomic nervous system [Loe93], monitoring changes in pupil diameter can provide a window onto the emotional state of an individual. Previous research has established that variations in pupil size occur in response to stimuli of interest to the individual [Dab97,HP60], to emotional states [KB66,LF96,PS03], and to increased cognitive load [KB66,KB67,Pra70,HP63] or 'cognitive dissonance' [PL05]. Mudd, Conway and Schindler [MCS90] have already used pupillometry to index listeners' aesthetic appreciation of music.
The stimuli presented to the listeners (N=200) will be short (a couple of measures) and capture the essence of each song. All sounds will be presented through stereo headphones. Participants will be asked to rate, on a 7-point scale, how well a particular emotion describes the music or sounds they heard. Responses and latencies will be recorded from participants' mouse clicks on a screen display implemented in an extension of BpmDj. All experiments will use within-participants designs (unless otherwise indicated). Analysis of variance will be used to test for interactions between factors in the experiments. Statistical tools such as principal component analysis or cluster analysis will be applied to the data set obtained from the participants, so as to reveal the underlying structure or features used by participants when attributing emotional content to music.
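For a within-participants design with one factor, the analysis of variance mentioned above takes the repeated-measures form, in which variance due to individual participants is removed before testing the condition effect. A minimal sketch of the F statistic, assuming a complete participants-by-conditions rating matrix with no missing cells; it does not check sphericity or compute a p-value (for the latter one could pass F and its degrees of freedom to `scipy.stats.f.sf`).

```python
def rm_anova_f(data):
    """One-way repeated-measures ANOVA.

    data[i][j] is participant i's rating in condition j (e.g. one emotion scale
    across stimuli, 1..7). Returns (F, df_conditions, df_error)."""
    n, k = len(data), len(data[0])
    grand = sum(map(sum, data)) / (n * k)
    cond_means = [sum(row[j] for row in data) / n for j in range(k)]
    subj_means = [sum(row) / k for row in data]
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_cond = n * sum((m - grand) ** 2 for m in cond_means)
    ss_subj = k * sum((m - grand) ** 2 for m in subj_means)
    ss_error = ss_total - ss_cond - ss_subj  # condition x participant residual
    df_cond, df_error = k - 1, (n - 1) * (k - 1)
    f = (ss_cond / df_cond) / (ss_error / df_error)
    return f, df_cond, df_error
```

For instance, three participants rating three stimuli as [[1, 2, 3], [2, 3, 4], [3, 4, 6]] give F = 37 with (2, 4) degrees of freedom.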
Pupillometry will be performed by means of the Remote Eye Tracking Device (R.E.D.), built by SMI SensoMotoric Instruments in Teltow (Germany). Recordings will be analyzed with the iView software, also developed by SMI. The R.E.D. II can operate at a distance of 0.5-1.5 m; its eye-tracking sample rate is 50 Hz, with a resolution better than 0.1 degree. The device works by determining the positions of two elements of the eye: the pupil and the corneal reflection. The sensor is an infrared-sensitive video camera, typically centered on the participant's left eye. The coordinates of all boundary points are fed to a computer that, in turn, determines the centroids of the two elements. The vector difference between the two centroids is the "raw" computed eye position. Pupil diameter is expressed as the number of video pixels along the horizontal and vertical diameters of the ellipse that the pupil projects onto the video image, sampled every 20 ms.
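Raw pupil samples typically need some preprocessing before they can be compared across stimuli. The sketch below shows three common steps, with assumptions labelled: collapsing the two reported diameters into one value (their mean), interpolating over samples where tracking was lost (assumed to be exported as 0, which may not match the actual iView output), and subtracting a pre-stimulus baseline of a hypothetical 500 ms. Only the 20 ms sample period comes from the device description above.

```python
SAMPLE_PERIOD_MS = 20  # 50 Hz recording rate, as stated for the R.E.D. II


def pupil_size(h_px: float, v_px: float) -> float:
    """Collapse the horizontal and vertical ellipse diameters into one value (assumed: their mean)."""
    return (h_px + v_px) / 2.0


def fill_gaps(trace):
    """Linearly interpolate samples reported as 0 (assumed to mean lost tracking, e.g. a blink)."""
    valid = [i for i, x in enumerate(trace) if x > 0]
    out = list(trace)
    if not valid:
        return out
    for i, x in enumerate(trace):
        if x > 0:
            continue
        prev = max((j for j in valid if j < i), default=None)
        nxt = min((j for j in valid if j > i), default=None)
        if prev is None:
            out[i] = trace[nxt]
        elif nxt is None:
            out[i] = trace[prev]
        else:
            w = (i - prev) / (nxt - prev)
            out[i] = trace[prev] + w * (trace[nxt] - trace[prev])
    return out


def baseline_corrected(trace, baseline_ms=500):
    """Dilation relative to the mean size during a pre-stimulus window (window length assumed)."""
    n_base = max(1, baseline_ms // SAMPLE_PERIOD_MS)
    baseline = sum(trace[:n_base]) / n_base
    return [x - baseline for x in trace]
```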
The open source software BpmDj [Bel05a] (Figure 2) analyzes and stores a large number of soundtracks. Werner Van Belle has developed the program as a hobby project since 2000. It contains advanced algorithms to measure spectrum, tempo, rhythm, composition and echo characteristics.
Tempo module - Five different tempo measurement techniques are available, of which autocorrelation [OS89] and ray-shooting [Bel04] are the most appropriate. Other existing techniques include [YKT+95,UH03,SD98,GM97]. All analyzers in BpmDj make use of the Bark psychoacoustic scale [GEH98,KSP96,EH99]. The spectrum, or sound color, is visualized as a three-channel (red/green/blue) color based on a Karhunen-Loève transform [C.M95] of the available songs.
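The generic autocorrelation approach to tempo can be sketched as follows: reduce the audio to an onset-strength envelope, then pick the beat period whose lag maximizes the envelope's autocorrelation. This is a textbook illustration, not BpmDj's implementation; the energy-difference onset function, the 60-180 BPM search range and the function names are assumptions, and neither the Bark-scale preprocessing nor ray-shooting [Bel04] is reproduced.

```python
def onset_envelope(samples, hop):
    """Half-wave rectified difference of frame energies: a simple onset-strength curve."""
    energies = [sum(x * x for x in samples[i:i + hop])
                for i in range(0, len(samples) - hop + 1, hop)]
    return [max(0.0, b - a) for a, b in zip(energies, energies[1:])]


def tempo_bpm(envelope, env_rate_hz, bpm_min=60.0, bpm_max=180.0):
    """Tempo from the autocorrelation peak of a mean-removed envelope.

    The raw sum is not divided by (n - lag), so longer lags are slightly penalised;
    this nudges the estimate towards the beat rather than half the beat rate."""
    mean = sum(envelope) / len(envelope)
    e = [x - mean for x in envelope]
    n = len(e)
    min_lag = max(1, int(round(env_rate_hz * 60.0 / bpm_max)))
    max_lag = min(int(round(env_rate_hz * 60.0 / bpm_min)), n - 1)
    best_lag, best = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        r = sum(e[i] * e[i + lag] for i in range(n - lag))
        if r > best:
            best_lag, best = lag, r
    return 60.0 * env_rate_hz / best_lag
```

For example, an envelope sampled at 100 Hz with a pulse every 50 samples corresponds to a 0.5 s beat period, i.e. 120 BPM.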
Echo/delay modules - The echo characteristics are measured by analyzing the distribution of the frequency content of the music and then enhancing this distribution with a differential autocorrelation [Bel05d]. Fig. 3 shows the echo characteristic of a song.
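The idea that echo shows up in an autocorrelation can be illustrated with a single-echo model: if y is a source x plus a copy delayed by D samples and scaled by g, the autocorrelation of y peaks at lag D. The sketch below assumes a roughly white source and a single echo with 0 < g <= 1; the frequency-distribution analysis and the differential autocorrelation of [Bel05d] are not reproduced here.

```python
import math
import random


def autocorr(x, lag):
    """Raw autocorrelation sum at a single lag."""
    return sum(x[i] * x[i + lag] for i in range(len(x) - lag))


def estimate_echo(y, min_lag, max_lag):
    """Delay (samples) and gain of the strongest single echo.

    Assumes y[t] = x[t] + g * x[t - D] with a roughly white source x and 0 < g <= 1;
    then R(D) / R(0) is approximately g / (1 + g^2), which is inverted below."""
    r0 = autocorr(y, 0)
    delay = max(range(min_lag, max_lag + 1), key=lambda lag: autocorr(y, lag))
    rho = autocorr(y, delay) / r0
    if rho <= 0:
        return delay, 0.0
    rho = min(rho, 0.5)  # g / (1 + g^2) never exceeds 0.5
    gain = (1 - math.sqrt(1 - 4 * rho * rho)) / (2 * rho)
    return delay, gain


if __name__ == "__main__":
    rng = random.Random(1)
    x = [rng.gauss(0.0, 1.0) for _ in range(8000)]
    y = [x[t] + (0.5 * x[t - 120] if t >= 120 else 0.0) for t in range(len(x))]
    print(estimate_echo(y, 10, 400))  # prints (delay, gain); injected echo: 120 samples, gain 0.5
```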
Rhythm/composition modules - Rhythm and composition properties rely on correct tempo information. Rhythm detection relies on cross-correlating and overlaying all available measures. Composition detection relies on measuring the probability of a compositional change after each measure. Fig. 3 presents a rhythm and composition pattern.
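Once the tempo and the position of the first downbeat are known, overlaying measures reduces to cutting an envelope into measure-length pieces. A minimal sketch under that assumption, which skips the cross-correlation alignment the text mentions; the distance between consecutive measures is used here as a hypothetical stand-in for the probability of a compositional change.

```python
import math


def split_measures(envelope, samples_per_measure):
    """Cut an envelope into whole measures; tempo and first downbeat are assumed known."""
    m = samples_per_measure
    return [envelope[i:i + m] for i in range(0, len(envelope) - m + 1, m)]


def rhythm_pattern(measures):
    """Overlay (average) all measures position by position."""
    return [sum(col) / len(measures) for col in zip(*measures)]


def change_scores(measures):
    """Distance between each pair of consecutive measures; peaks suggest a compositional change."""
    return [math.dist(a, b) for a, b in zip(measures, measures[1:])]
```

With four measures of an accent on beat one followed by four measures accented on beat three, the overlaid pattern shows both accents at half strength, and the only non-zero change score falls on the boundary between the fourth and fifth measures.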
From an end-user point of view, the program supports distributed analysis, automatic mixing of music, distance metrics for all analyzers, and clustering and automatic classification based on this information. Everything is tied together in a Qt-based [htt] user interface. BpmDj will be used as the base platform into which new modules will be plugged.
Not only can music elicit emotional responses; many observations indicate that one's mental state influences music preference, which thus symptomatically reveals mental aspects of the listener [LW96,BGL02,PBP04,San04]. This observation will be used in the initial screening phase, during which we will ask participants to rate the music they are presented with as well as to list the music they prefer. This, in combination with an analysis of the participants' mental states, will provide input into the experimental design (section 4.3).
In the initial screening phase we rely on an open questionnaire and a random selection of songs. We want to learn which songs are useful for further statistical analysis and to explore which parameters should be taken into account. For every song we are interested in 1) the reported emotion, 2) the strength of the reported emotion and 3) why the participant perceives that specific emotion.
The following information is currently believed to be important. a) Level of schooling - artists working with music might have a better feel for music than untrained people. b) Semantic information - what is the impact of semantic information, such as recognizing the artist or song? c) Emotional reporting - how well are human participants able to report emotions? What is the impact of their initial mood on the reports [FMN+98]? Does emotion strength limit the information we might obtain? What is the relevance of response time: is a short response time better suited for obtaining consistent information, or is a long response time, allowing time to verbalize, more informative? d) Memory - what is the impact of presentation sequence? Does information from one song carry over to the next? Is there a lag effect in reported emotions? Is there a correlation between the accuracy of answers and the time the participant has been listening to songs?
We will rely on normal participants (equally distributed male/female) drawn from the student population. Before and after the test, participants will be asked to list a number of their favorite songs and to assess their mental state using a combination of questionnaires, including, but not necessarily limited to, the Pittsburgh Sleep Quality Index, the Epworth Sleepiness Scale, the Beck Depression Inventory II, the Beck Anxiety Inventory, the Profile of Mood States [MLD71], the State Anxiety Inventory [KBR98] and Plutchik's Emotions Profile Index [Plu74].
The result of the initial screening will consist of a) a list of songs related to specific emotions and b) a list of suspected parameters for emotional content retrieval.
After initial screening we will design a set of experiments to gather reliable information. Design of Experiments allows one to vary all relevant factors systematically. When the results of the experiments are analyzed, they identify the factors that most influence the results, as well as details such as the existence of interactions and synergies between factors [Box54,Gir92].
The design variables encompass a set of measured song properties (see section 4.4). Tentatively, we believe that echo, tempo, harmonics and key measures will be among the design variables. The response variables encompass the emotional response, quantified using pupillometry. If appropriate, we might consider decomposing emotional responses into primary emotions [LeD96,OT90,Plu80]. The experiment itself requires the construction of a closed questionnaire (targeting the response variables) and a selection of songs (providing the design variables).
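A full factorial design crosses every level of every design variable. The sketch below enumerates such a design for the four tentative design variables named above, using hypothetical two-level settings, and derives a reproducible per-participant presentation order; randomizing order is one standard way to spread sequence effects across conditions.

```python
import random
from itertools import product

# Hypothetical two-level factors; real levels would come out of the screening phase.
FACTORS = {
    "echo": ["dry", "wet"],
    "tempo_bpm": [80, 130],
    "harmonics": ["sparse", "rich"],
    "key": ["major", "minor"],
}


def full_factorial(factors):
    """Every combination of factor levels (2**4 = 16 runs for the factors above)."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*(factors[n] for n in names))]


def presentation_order(design, participant_id):
    """Per-participant random order, seeded so that it can be reproduced later."""
    order = list(design)
    random.Random(participant_id).shuffle(order)
    return order
```

The number of runs grows as the product of the level counts, which is why fractional designs are often preferred once more than a handful of factors or levels are involved.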
After the experiment, multivariate analysis will reveal the relation between the design variables and the response variables. The experiment should be designed very carefully, taking into account all factors identified in the initial screening task. In particular, testing sequence and participant endurance must be accounted for.
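For a two-level factorial design with factors coded as -1 and +1, the classical effect estimates are differences of means, and interactions use the product of the coded columns as contrast. The sketch below shows that computation on mean responses per run; it is illustrative only, and a complete analysis would add an error estimate (e.g. an ANOVA or regression fit).

```python
def effect(runs, responses, *factors):
    """Classical 2-level factorial effect estimate.

    runs: list of dicts mapping factor name -> -1 or +1 (coded levels).
    With one factor this is a main effect; with two or more it is the interaction
    effect, using the product of the coded columns as the contrast."""
    def sign(run):
        s = 1
        for f in factors:
            s *= run[f]
        return s

    plus = [y for r, y in zip(runs, responses) if sign(r) > 0]
    minus = [y for r, y in zip(runs, responses) if sign(r) < 0]
    return sum(plus) / len(plus) - sum(minus) / len(minus)
```

In coded units each effect is twice the corresponding regression coefficient, so a response modelled as 10 + 3·tempo + 1·key + 0.5·tempo·key yields effects of 6, 2 and 1.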
Measuring subjective responses to music provides only part of the information necessary to understand the relation between audio content and emotional perception.
Signal processing techniques developed in this task will enable us to link subjective perception to quantitatively measured properties. BpmDj currently supports measurement of tempo, echo, rhythm and compositional information. We will extend BpmDj with new modules able to capture information relevant to the emotional aspects of music. Below we describe the modules we will develop.
Key/scale module - This module will measure the occurrence of chords by measuring individual notes. To provide information on the scale (equal-tempered, pure major, pure minor, Arabic, Pythagorean, Werckmeister, Kirnberger, slendro, pelog and others), it will also measure the tuning/detuning of the different notes.
Dynamic module - This module will measure energy changes at different frequencies. First-order energy changes provide the attack and decay parameters of the song. Second-order energy changes might provide information on song temperament.
Harmonic module - This module will inter-relate different frequencies in a song by investigating the probability that specific frequencies occur together. A Bayesian classification of the time-based frequency and phase content will determine different classes. Every class will describe which attributes (frequencies and phases) belong together, thereby providing a characteristic sound or waveform of the music. This classification will allow us to correlate harmonic relations with the perception of music. Autoclass [SC95,CS96] will perform the Bayesian classification.
Melody module - This module will rely on a similar technique, measuring relations between notes in time.
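For the key/scale module, tuning and detuning can be expressed as the deviation, in cents, of each detected note from the nearest 12-tone equal-tempered pitch. A sketch assuming note frequencies have already been detected and that A4 = 440 Hz (a real module would need to estimate the reference); the function names are hypothetical.

```python
import math

A4_HZ = 440.0  # reference pitch: assumed here, a real module would have to estimate it
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def nearest_note(freq_hz):
    """Nearest 12-tone equal-tempered pitch class and the deviation from it in cents."""
    semitones = 12 * math.log2(freq_hz / A4_HZ)
    nearest = round(semitones)
    return NOTE_NAMES[nearest % 12], 100 * (semitones - nearest)


def mean_abs_detuning(freqs):
    """Average absolute deviation (cents) of detected note frequencies from equal temperament."""
    return sum(abs(nearest_note(f)[1]) for f in freqs) / len(freqs)
```

A just major third above A4 (550 Hz, a 5:4 ratio) maps to C# about 13.7 cents flat; systematic offsets of this kind are what would let the module tell a pure-major tuning from an equal-tempered one.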
All of the above modules need to decompose a song into its frequency content. To this end, we will initially use a sliding window Fourier transform [OS89]. Later in the project, a multi-rate filterbank will be integrated to achieve a more accurate decomposition. Existing filterbanks include octave filterbanks [J. 93] and the psychoacoustic MP3 filterbank [BBQ+96]; the latter would even allow us to work directly on MP3 frames. We will also investigate wavelet-based filterbanks [Kai96], since these let us experiment with different wavelet bases that accurately capture the required harmonics. The preliminary analysis might indicate the need for different and/or additional signal processing modules. We believe that the two basic technologies supporting this work package (Bayesian classification and multi-rate signal decomposition) should make it easy to realize new modules quickly.
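A sliding window Fourier transform cuts the signal into overlapping windowed frames and takes a spectrum of each. The sketch assumes a periodic Hann window, 256-sample frames with a 128-sample hop and a precomputed naive DFT kernel for readability; a real implementation would use an FFT library.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitudes(samples, win_len=256, hop=128):
    """Sliding-window Fourier transform: one magnitude spectrum (bins 0..win_len/2) per frame."""
    window = hann(win_len)
    # Precompute the DFT kernel once; a real implementation would use an FFT instead.
    kernel = [[cmath.exp(-2j * math.pi * k * i / win_len) for i in range(win_len)]
              for k in range(win_len // 2 + 1)]
    frames = []
    for start in range(0, len(samples) - win_len + 1, hop):
        seg = [samples[start + i] * window[i] for i in range(win_len)]
        frames.append([abs(sum(s * w for s, w in zip(seg, row))) for row in kernel])
    return frames
```

At 8 kHz, a 256-sample window gives 31.25 Hz bins and the hop gives 16 ms time steps; every frequency band gets this same fixed time/frequency trade-off, which is the limitation a multi-rate filterbank addresses.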
The strategic relevance of this proposal to the HELSEOMSORG program lies in its ability to improve the design of music therapy. If strategic use of music can alleviate pain, depression, instability, anxiety and other symptoms, it will increase the quality of life of many people in nursing homes and psychiatric institutions, and of elderly people confined to their homes. Understanding which musical aspects lead to an emotional response might lead to the creation of efficient playlists and a more scientific way of assessing and selecting songs. Depending on the results of the presented work, we might be able to recommend to individual patients the kind of music that may suit them. Creating typical 'likes-a-lot' and 'should-listen-to' playlists per emotional state might enrich the psychotherapist's toolbox. The presented research might also inform the ambience required in hospital areas for giving birth and for palliative care.
Publication - The research will primarily be of interest to the international research communities in cognitive science, computer science and nursing. Articles based on the proposed basic research will be submitted to leading cognitive science journals, such as Music Perception. In addition, findings from this research will be presented at international conferences.
Artefacts - Artefacts produced in this project will be released as open source in order to attract international attention from computer scientists working in the field of content extraction. BpmDj and Cryosleep are both online and, with over 390 and 688 unique monthly downloads respectively (excluding search robots), allow us to reach many researchers at low cost.
All participants in the experiments will take part on a voluntary basis and after giving written informed consent. They will be informed that they can interrupt the procedure at any time, without having to give a reason and at no cost for withdrawing. In none of the experiments will sensitive personal information, names or other characteristics that might identify a participant be recorded. All participants will be thoroughly debriefed after the experiment.
The research program itself is multidisciplinary, encompassing psychology, statistics, computer science and signal processing. The computational processes underlying music and emotion are little investigated, and interdisciplinary collaborations on this topic are rare.
Prof. Bruno Laeng - holds a 100% academic position (50% research appointment) in the biological psychology (biologisk psykologi) division of the Department of Psychology. Recent quality evaluations by Norges Forskningsråd (the Research Council of Norway) show that the biological psychology division at Universitetet i Tromsø (UiTø) received one of the highest evaluations within UiTø from the examining committee (i.e., very good). Moreover, this applicant was awarded the Pris til yngre forsker (prize for younger researchers) by UiTø in 2000.
Dr. Werner Van Belle - currently works at Norut IT. In his spare time he is passionate about digital signal processing for audio applications. Of particular relevance for this proposal is his work on mood induction [Bel05b] and sound analysis [Bel05a,Bel05c,Bel05d,Bel04].
The research is supported by 'het Weyerke', a Belgian service center/nursing home for mentally handicapped and elderly people. They are mainly interested in music as a stimulating and soothing means to alleviate stress and depressive symptoms in elderly people with dementia. Their long-standing tradition in this matter will provide input into our study. Furthermore, the presented international cooperation might allow the exchange of key scientists and know-how.