In the application of acoustic emission methods to condition monitoring, extracting meaningful information from signals is an essential step in signal processing and a prerequisite for machine learning. For acoustic emission signals, however, the computational cost of feature extraction becomes increasingly important because of the large volume of data these signals generate. This study therefore investigates established methods from descriptive statistics, as typically used in acoustic emission analysis, alongside techniques commonly employed in speech recognition. The objective is to improve the feature extraction of acoustic emission signals by exploiting the synergies between both approaches. Extracting traditional acoustic emission features based on descriptive statistics alone usually yields robust features that capture the significant patterns in the time and frequency domains. However, such features may neglect more detailed information in the signals, because they average over the signal or evaluate only its extreme values. To address this limitation, the methodology is extended with techniques from speech recognition, such as applying window functions and digital filter banks to calculate the so-called cepstral coefficients. A comparison of machine learning models trained with and without these cepstral coefficients demonstrates that speech recognition techniques can yield more effective models. This work presents the basic concept of speech recognition-based feature extraction and its implementation, and it discusses the considerations and techniques that are crucial for efficient signal processing, particularly in real-time monitoring applications. The methodology is demonstrated using acoustic emission signals acquired from industrial-scale processes such as milling, drilling, and friction stir welding.
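The following sketch illustrates the two feature families described here. It computes a few traditional descriptive-statistics features (RMS, peak amplitude, crest factor, kurtosis) and frame-wise cepstral coefficients via a Hann window, a triangular filter bank, a logarithm, and a DCT-II. It is not the implementation presented in this study. The function names, frame length, hop size, filter count, number of coefficients, and the linear (rather than mel) spacing of the filters are all hypothetical choices made for illustration. A naive DFT keeps the example dependency-free.

```python
import cmath
import math


def descriptive_features(signal: list[float]) -> dict[str, float]:
    """Traditional AE features from descriptive statistics (time domain)."""
    n = len(signal)
    mean = sum(signal) / n
    var = sum((s - mean) ** 2 for s in signal) / n
    rms = math.sqrt(sum(s * s for s in signal) / n)
    peak = max(abs(s) for s in signal)
    return {
        "rms": rms,
        "peak": peak,
        "crest_factor": peak / rms if rms > 0 else 0.0,
        # Population kurtosis (not excess kurtosis)
        "kurtosis": sum((s - mean) ** 4 for s in signal) / n / var**2 if var > 0 else 0.0,
    }


def hann_window(n: int) -> list[float]:
    """Symmetric Hann window of length n."""
    if n == 1:
        return [1.0]
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def power_spectrum(frame: list[float]) -> list[float]:
    """One-sided power spectrum via a naive O(n^2) DFT (sketch only)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def triangular_filter_bank(n_filters: int, n_bins: int) -> list[list[float]]:
    """Overlapping triangular filters, linearly spaced over the spectral bins.

    Assumption: linear spacing; speech recognition typically uses the mel scale.
    """
    edges = [i * (n_bins - 1) / (n_filters + 1) for i in range(n_filters + 2)]
    bank = []
    for m in range(1, n_filters + 1):
        left, center, right = edges[m - 1], edges[m], edges[m + 1]
        weights = []
        for b in range(n_bins):
            if left <= b <= center:
                weights.append((b - left) / (center - left))  # rising slope
            elif center < b <= right:
                weights.append((right - b) / (right - center))  # falling slope
            else:
                weights.append(0.0)
        bank.append(weights)
    return bank


def dct2(values: list[float], n_coeffs: int) -> list[float]:
    """Unnormalized DCT-II, keeping the first n_coeffs coefficients."""
    n = len(values)
    return [
        sum(v * math.cos(math.pi * k * (i + 0.5) / n) for i, v in enumerate(values))
        for k in range(n_coeffs)
    ]


def cepstral_coefficients(signal: list[float], frame_len: int = 64, hop: int = 32,
                          n_filters: int = 12, n_coeffs: int = 8) -> list[list[float]]:
    """Frame-wise cepstral coefficients: window -> spectrum -> filter bank -> log -> DCT."""
    window = hann_window(frame_len)
    bank = triangular_filter_bank(n_filters, frame_len // 2 + 1)
    result = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [s * w for s, w in zip(signal[start:start + frame_len], window)]
        spec = power_spectrum(frame)
        energies = [sum(w * p for w, p in zip(f, spec)) for f in bank]
        log_energies = [math.log(e + 1e-12) for e in energies]  # floor avoids log(0)
        result.append(dct2(log_energies, n_coeffs))
    return result


if __name__ == "__main__":
    # Hypothetical test signal: a pure tone with 8 full periods in 256 samples
    sig = [math.sin(2 * math.pi * 8 * t / 256) for t in range(256)]
    print(descriptive_features(sig))
    ceps = cepstral_coefficients(sig)
    print(len(ceps), "frames x", len(ceps[0]), "coefficients")
```

The descriptive features summarize the whole signal with a single value each, whereas the cepstral coefficients retain one vector per frame, which is where the additional detail comes from. For real-time monitoring, the naive DFT would be replaced by an FFT (for example `numpy.fft.rfft`), since the cost of the spectral step dominates feature extraction for long acoustic emission records.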