University of Hamburg, 2019. — 170 p.
In this thesis, single-channel speech enhancement algorithms are considered that process either the signal captured by a single microphone or the output of a spatial filtering algorithm. The aim of this thesis is to increase the robustness of machine-learning (ML)-based and non-ML-based single-channel speech enhancement algorithms by exploiting synergies between the two approaches. In conventional non-ML-based speech enhancement, such as Wiener-filtering-based approaches, spectral gain functions are applied to the complex short-time Fourier transform (STFT) coefficients to enhance the noisy input signal. These gain functions are derived in a statistical framework in which the clean-speech and noise Fourier coefficients are modeled using parametric probability density functions (PDFs), whose parameters are estimated blindly from the noisy observation. In contrast, ML-based algorithms use representative training examples to learn the statistics of speech and noise, which are then used for the enhancement. ML-based approaches are often motivated by the fact that conventional approaches cannot track highly non-stationary background noise. However, it is still unclear how well ML-based approaches generalize to unseen acoustic conditions.
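The conventional enhancement scheme described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the thesis's actual algorithms: the Wiener gain, the maximum-likelihood a priori SNR estimate, the STFT settings, and the assumption that the first frames are noise-only are all simplifying choices made here for the example.

```python
import numpy as np
from scipy.signal import stft, istft

def wiener_enhance(noisy, fs, noise_frames=10, nperseg=512):
    """Apply a Wiener-type spectral gain to the STFT of a noisy signal.

    Simplifying assumption for this sketch: the first `noise_frames`
    frames contain noise only, so the noise PSD is estimated from them.
    """
    _, _, Y = stft(noisy, fs=fs, nperseg=nperseg)  # complex STFT coefficients
    noise_psd = np.mean(np.abs(Y[:, :noise_frames]) ** 2, axis=1, keepdims=True)
    # Maximum-likelihood estimate of the a priori SNR, floored at zero
    snr_prior = np.maximum(np.abs(Y) ** 2 / noise_psd - 1.0, 0.0)
    gain = snr_prior / (1.0 + snr_prior)  # Wiener spectral gain function
    _, enhanced = istft(gain * Y, fs=fs, nperseg=nperseg)
    return enhanced

# Usage: a sinusoid preceded by 0.2 s of silence, in white Gaussian noise
fs = 16000
clean = np.concatenate([np.zeros(3200),
                        np.sin(2 * np.pi * 440 * np.arange(fs) / fs)])
rng = np.random.default_rng(0)
noisy = clean + 0.3 * rng.standard_normal(len(clean))
enhanced = wiener_enhance(noisy, fs)[:len(clean)]
```

The leading silence lets the sketch estimate the noise PSD from noise-only frames; blind noise-PSD tracking, as used in practice, is precisely the harder estimation problem the abstract refers to.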