Maximilian Renz

Master's Thesis

Advisors
Ann-Kristin Seifer, Robert Richer, Prof. Dr. B. Eskofier

Abstract

Depression is a prevalent clinical condition, affecting more than 280 million individuals globally, corresponding to about 5% of the adult population [1]. For those affected, the condition represents a considerable burden, entailing a loss of quality of life and a reduction in life expectancy [2]. Early detection is crucial, as timely intervention can significantly enhance outcomes [3]. The duration of untreated depression plays a critical role in recovery, with earlier treatment being associated with higher remission rates [4], ultimately improving mental and social functioning [5].

Depression is usually diagnosed by a trained professional (e.g., a physician, psychiatrist, or psychologist) using structured interviews and standardized questionnaires such as the Hamilton Rating Scale for Depression (HAM-D) [6], the Beck Depression Inventory-II (BDI-II) [7], and the Patient Health Questionnaire (PHQ) [8]. However, these assessments can only be performed when patients actively seek medical care from such professionals.

There is an emerging field of research investigating alternative approaches for depression detection by assessing the potential of objective digital biomarkers, such as voice, activity, gait, and sleep duration [9]. A frequently employed biomarker for the detection of depression is voice [10,11], given that individuals with depression exhibit notable alterations in their speech patterns. These include, among others, a reduction in intensity, a decrease in fundamental frequency deviation (FFD) or pitch range, and a slowing of speech rate [12]. The application of machine learning has enabled the development of numerous automated speech analysis techniques that use these paralinguistic features [13]. Gait and posture constitute another objective digital biomarker indicative of depression: affected individuals show a slumped posture, impaired dynamic balance, and reduced gait speed [14]. In addition, generic and expert motion features have already been shown to be promising for assessing the influence of stress on body posture and movement [15].
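To give an intuition for these paralinguistic markers, intensity and fundamental frequency can be estimated with very simple signal-processing primitives. The following is a minimal, illustrative sketch (NumPy only); the autocorrelation-based pitch estimator and the frame length are simplifying assumptions for illustration, not the feature-extraction method used in this thesis:

```python
import numpy as np

def rms_intensity(frame):
    """Root-mean-square energy of an audio frame (a proxy for vocal intensity)."""
    return np.sqrt(np.mean(frame ** 2))

def estimate_f0(frame, sr, fmin=60.0, fmax=400.0):
    """Crude fundamental-frequency estimate from the autocorrelation peak."""
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # lag search range for fmin..fmax
    lag = lo + np.argmax(corr[lo:hi])
    return sr / lag

sr = 16_000
t = np.arange(sr) / sr
signal = 0.3 * np.sin(2 * np.pi * 220.0 * t)   # synthetic 220 Hz "voice"
f0 = estimate_f0(signal[:1024], sr)            # close to 220 Hz
```

The reduced FFD reported for depressed speech would then correspond to a smaller standard deviation of such frame-wise F0 estimates across an utterance.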

Data from these digital biomarkers can be obtained in a number of ways. In addition to external or ambient sensors, which are fixed in the environment, wearable sensing devices are often used, for instance, inertial sensors, smartphones, or smartwatches [16]. While it has been shown that voice analysis, as well as motion and gait patterns, can be used to detect depression, it has not yet been analyzed whether combining these modalities enhances depression detection. Furthermore, the majority of existing approaches are not suitable for everyday use, as they require the user to wear one or more dedicated sensors. A hearing aid, a wearable device worn on the head, can therefore record voice, gait, and other movements directly, as it inherently integrates both a motion sensor and a microphone. While earables in general are gaining popularity [18, 21], combining sensors on the ear (or head) to detect depression represents a novel approach that has not yet been documented in the literature.

The objective of this thesis is to develop an algorithm that can predict depression based on the combined analysis of voice, gait, and motion from hearing aid sensors. This entails data acquisition, feature extraction, and the development and evaluation of various machine and deep learning approaches. The data will be acquired using the hearing aid-integrated sensors, specifically the inertial sensor and the microphone. A study will be planned and conducted to collect data from 20 participants (10 healthy and 10 depressed individuals). The PHQ-8 will be used to assess the presence and severity of depression. To obtain voice data, a semi-structured interview will be conducted and recorded by the hearing aid microphones worn by the participants. To gather data related to the participants' gait, a 6-minute walking test will be conducted. Additionally, head movements will be recorded in different situations to gain insight into the participants' activity and movement. All tests will be conducted in a laboratory setting.
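For reference, PHQ-8 scoring is a simple item sum: each of the eight items is rated 0 to 3, giving a total between 0 and 24, and a total of 10 or above is the commonly used cut-off for clinically relevant depressive symptoms. A minimal sketch (function names are illustrative, not part of the study protocol):

```python
def phq8_score(items):
    """Total PHQ-8 score: sum of eight items, each rated 0-3 (range 0-24)."""
    if len(items) != 8 or any(not 0 <= i <= 3 for i in items):
        raise ValueError("PHQ-8 expects eight item scores in the range 0-3")
    return sum(items)

def phq8_positive(items, cutoff=10):
    """True if the total meets the commonly used cut-off of >= 10."""
    return phq8_score(items) >= cutoff
```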

To predict the depression score, different algorithmic approaches will be implemented and evaluated, and different classifiers will be compared with respect to how well they distinguish healthy from depressed individuals. In the first approach, voice features will be extracted using established frameworks such as openSMILE [17], and gait and motion features will be extracted, e.g., using EarGait [18] and tsfresh [19]; a classifier is then trained on these features. The second approach differs primarily in the processing of the voice signal: wav2vec 2.0, a recent deep learning model previously applied to emotion recognition [20], is used for depression detection. The model is first pre-trained on speech data from other datasets and subsequently fine-tuned on speech recorded by the hearing aid microphones. Finally, different information fusion techniques will be applied to find the best-performing depression classification method.
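As an illustration of the first approach, the sketch below trains a classifier on feature-level-fused voice and motion features using leave-one-subject-out cross-validation, which suits the small cohort size. The feature matrix here is random placeholder data standing in for openSMILE and EarGait/tsfresh outputs, and the SVM is one plausible classifier choice, not a result of this thesis:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Placeholder features: one row per participant; columns would hold fused
# voice statistics (e.g., openSMILE) and gait/motion descriptors.
X = rng.normal(size=(20, 12))
y = np.array([0] * 10 + [1] * 10)   # 0 = healthy, 1 = depressed (PHQ-8 label)
groups = np.arange(20)              # one group per participant (LOSO CV)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
scores = cross_val_score(clf, X, y, groups=groups, cv=LeaveOneGroupOut())
mean_accuracy = scores.mean()       # near chance level on random features
```

The same evaluation scheme can be reused unchanged for the wav2vec 2.0 embeddings and for comparing early (feature-level) against late (decision-level) fusion.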