MIR/Similarity estimation/SOMs

The system implements signal processing and low-level feature extraction, statistical modeling using Hidden Markov Models (HMMs), and clustering and visualization using a Self-Organizing Map (SOM). The feature extraction framework (FEF) combines these technologies into a ready-to-use package for the Python scripting language.



Audio data is segmented using block iterators that yield consecutive segments from a file. The length of the segments and their displacement relative to the beginning may be specified by the user. Furthermore, onset detection based on local amplitude detection, Spectral Flux, and Entropy is provided.
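The block-wise segmentation described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the FEF's own iterator: the function name `iter_blocks` and the parameters `block_size`, `hop_size`, and `offset` are hypothetical, and the sketch assumes that incomplete trailing blocks are dropped.

```python
from typing import Iterator, Sequence


def iter_blocks(samples: Sequence[float], block_size: int, hop_size: int,
                offset: int = 0) -> Iterator[Sequence[float]]:
    """Yield consecutive blocks of `block_size` samples, `hop_size` apart.

    `offset` is the index of the first sample of the first block.
    Hypothetical helper for illustration; not part of the FEF API.
    """
    if block_size <= 0 or hop_size <= 0:
        raise ValueError("block_size and hop_size must be positive")
    if offset < 0:
        raise ValueError("offset must be non-negative")
    start = offset
    # Stop before the first block that would run past the end of the signal.
    while start + block_size <= len(samples):
        yield samples[start:start + block_size]
        start += hop_size


# Toy signal: ten samples, blocks of four samples with 50 % overlap,
# starting one sample after the beginning.
signal = list(range(10))
blocks = list(iter_blocks(signal, block_size=4, hop_size=2, offset=1))
```

Writing the segmentation as a generator means only one block is materialized at a time, which matters when iterating over long recordings.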


The framework provides extractors for the following audio features:

Time domain features include auto-correlation, the Correlogram, n-dimensional delay embedding, and fractal dimension (Bader 2013).
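Two of the time domain features named above can be illustrated with the standard library alone. The following is a sketch under stated assumptions: the function names are hypothetical, the autocorrelation is the mean-removed estimate normalized by its value at lag zero, and the delay embedding uses a fixed integer delay in samples. The FEF may use different conventions.

```python
def autocorrelation(x, max_lag):
    """Normalized autocorrelation of `x` for lags 0..max_lag (illustrative)."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    energy = sum(v * v for v in centered)
    if energy == 0:
        raise ValueError("autocorrelation is undefined for a constant signal")
    return [
        sum(centered[i] * centered[i + lag] for i in range(n - lag)) / energy
        for lag in range(max_lag + 1)
    ]


def delay_embedding(x, dim, delay):
    """Map a scalar series to `dim`-dimensional delay vectors.

    Point i is (x[i], x[i + delay], ..., x[i + (dim - 1) * delay]).
    """
    n_points = len(x) - (dim - 1) * delay
    return [tuple(x[i + k * delay] for k in range(dim))
            for i in range(n_points)]


# A signal alternating between +1 and -1 has period two.
square = [1, -1, 1, -1, 1, -1, 1, -1]
acf = autocorrelation(square, max_lag=2)

# Three-dimensional embedding of a short ramp with a delay of two samples.
embedded = delay_embedding([0, 1, 2, 3, 4, 5], dim=3, delay=2)
```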

In the frequency domain, the FEF provides Spectral Centroid, Spread, Skewness, and Kurtosis per bin of the Short Time Fourier Transform. Additionally, different filters and filter banks are implemented, as well as Mel Frequency Cepstral Coefficients (MFCCs).
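The four spectral shape descriptors are commonly defined as the first four standardized moments of the magnitude spectrum, treated as a probability distribution over frequency. The sketch below follows that common definition for a single spectrum; the function name and the input layout (parallel lists of bin frequencies and magnitudes) are assumptions made for illustration, and the FEF's exact normalization may differ.

```python
import math


def spectral_moments(freqs, magnitudes):
    """Return (centroid, spread, skewness, kurtosis) of one magnitude spectrum.

    Hypothetical helper: the magnitudes are normalized to sum to one and
    used as weights for the moments about the centroid.
    """
    total = sum(magnitudes)
    if total <= 0:
        raise ValueError("spectrum must contain positive energy")
    weights = [m / total for m in magnitudes]
    centroid = sum(f * w for f, w in zip(freqs, weights))
    variance = sum((f - centroid) ** 2 * w for f, w in zip(freqs, weights))
    spread = math.sqrt(variance)
    if spread == 0:
        raise ValueError("skewness and kurtosis are undefined for zero spread")
    skewness = sum((f - centroid) ** 3 * w
                   for f, w in zip(freqs, weights)) / spread ** 3
    kurtosis = sum((f - centroid) ** 4 * w
                   for f, w in zip(freqs, weights)) / spread ** 4
    return centroid, spread, skewness, kurtosis


# A symmetric three-bin toy spectrum centered on 100 Hz.
centroid, spread, skewness, kurtosis = spectral_moments(
    [0.0, 100.0, 200.0], [1.0, 2.0, 1.0])
```

For the symmetric toy spectrum the centroid falls on the middle bin and the skewness vanishes, which is a quick sanity check for any implementation of these descriptors.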

Furthermore, the FEF provides extractors for perceptual features based on the critical band rate scale. These include Specific Loudness, Sharpness, and Roughness.
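Perceptual features of this kind are computed on the critical band rate (Bark) scale rather than on a linear frequency axis. As a small illustration, the widely used Zwicker and Terhardt approximation converts frequency in Hz to critical band rate in Bark. Whether the FEF uses this particular approximation is an assumption, and the function name is hypothetical.

```python
import math


def hz_to_bark(frequency_hz):
    """Approximate critical band rate in Bark (Zwicker & Terhardt formula)."""
    return (13.0 * math.atan(0.00076 * frequency_hz)
            + 3.5 * math.atan((frequency_hz / 7500.0) ** 2))


# Critical band rate of a few reference frequencies.
bark_values = [hz_to_bark(f) for f in (100.0, 1000.0, 10000.0)]
```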

The Hidden Markov Model is used to aggregate the low-level features mentioned above (Blaß & Bader 2019) into perceptual features of musical rhythm and timbre. The system currently implements the Forward-Backward, Baum-Welch, and Viterbi algorithms for Poisson-distributed HMMs. Additionally, it provides annotated heat maps and network graphs for HMM inspection and visualization. Scoring is implemented via minimum-log-likelihood, Akaike Information Criterion, and Bayesian Information Criterion.
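To make the scoring step concrete, the following sketch evaluates the log-likelihood of a count sequence under a Poisson HMM with the scaled forward recursion and derives AIC and BIC from it. It is a minimal illustration under stated assumptions, not the FEF's implementation: all function names are hypothetical, the rates must be strictly positive, and the parameter count assumes free rates, transition rows, and initial distribution.

```python
import math


def poisson_pmf(k, rate):
    """Poisson probability of count `k` for a strictly positive `rate`."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))


def forward_log_likelihood(obs, start, trans, rates):
    """Log-likelihood of `obs` via the scaled forward recursion."""
    if not obs:
        raise ValueError("need at least one observation")
    n_states = len(start)
    log_lik = 0.0
    alpha = []
    for t, count in enumerate(obs):
        if t == 0:
            # Initialization: start probability times emission probability.
            alpha = [start[j] * poisson_pmf(count, rates[j])
                     for j in range(n_states)]
        else:
            # Induction: propagate through the transition matrix, then emit.
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n_states))
                     * poisson_pmf(count, rates[j])
                     for j in range(n_states)]
        # Rescale at every step to avoid numerical underflow.
        scale = sum(alpha)
        log_lik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return log_lik


def aic(log_lik, n_params):
    """Akaike Information Criterion (lower is better)."""
    return 2 * n_params - 2 * log_lik


def bic(log_lik, n_params, n_obs):
    """Bayesian Information Criterion (lower is better)."""
    return n_params * math.log(n_obs) - 2 * log_lik


# Toy two-state model: a "quiet" state (rate 0.5) and a "busy" state (rate 6).
counts = [0, 1, 0, 5, 7, 6, 1, 0]
log_lik = forward_log_likelihood(
    counts, start=[0.5, 0.5],
    trans=[[0.8, 0.2], [0.3, 0.7]], rates=[0.5, 6.0])
# Free parameters for m = 2 states: m rates + m*(m-1) transitions + (m-1) starts.
n_params = 2 + 2 + 1
scores = (aic(log_lik, n_params), bic(log_lik, n_params, len(counts)))
```

Both criteria penalize the number of free parameters, so they can be used to compare models with different numbers of hidden states fitted to the same data.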

The Self-Organizing Map sub-module implements routines for incremental and batch training as well as quality checking via Quantization Error and Topographic Error. Map visualization is performed using two- and three-dimensional U-Matrix (Blaß & Bader 2019), Feature Map display, k-Nearest Neighbor clustering, and dendrograms.
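The two quality measures can be sketched as follows. Quantization Error is taken here as the mean distance between each sample and the weight vector of its best matching unit, and Topographic Error as the fraction of samples whose best and second-best matching units are not grid neighbors. The data layout (a dictionary from grid position to weight vector), the function names, and the four-neighborhood on a rectangular grid are assumptions for illustration, not the FEF's API.

```python
import math


def two_best_units(weights, sample):
    """Return grid positions of the best and second-best matching units."""
    ranked = sorted((math.dist(w, sample), pos) for pos, w in weights.items())
    return ranked[0][1], ranked[1][1]


def quantization_error(weights, data):
    """Mean Euclidean distance from each sample to its best matching unit."""
    return sum(min(math.dist(w, x) for w in weights.values())
               for x in data) / len(data)


def topographic_error(weights, data):
    """Fraction of samples whose two best units are not adjacent on the grid.

    Adjacency is assumed to mean a Manhattan distance of one between grid
    positions (rectangular grid, four-neighborhood).
    """
    errors = 0
    for x in data:
        (r1, c1), (r2, c2) = two_best_units(weights, x)
        if abs(r1 - r2) + abs(c1 - c2) != 1:
            errors += 1
    return errors / len(data)


# A 1 x 3 map whose units are ordered along the data axis ...
ordered_map = {(0, 0): [0.0], (0, 1): [1.0], (0, 2): [2.0]}
# ... and a "folded" map where neighboring units are far apart in data space.
folded_map = {(0, 0): [0.0], (0, 1): [2.0], (0, 2): [1.0]}
samples = [[0.1], [1.9]]

qe_ordered = quantization_error(ordered_map, samples)
te_ordered = topographic_error(ordered_map, samples)
te_folded = topographic_error(folded_map, [[0.1]])
```

The folded map quantizes the data just as well as the ordered one, yet it violates the neighborhood structure, which is why the two measures are usually reported together.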

Link to Framework documentation

Link to the UHH Ethnographic Sound Recordings Archive



Bader, R. (ed.): Computational Phonogram Archiving. Springer Series ‘Current Research in Systematic Musicology’, Vol. 5, 2019.

Bader, R.: Computational Music Archiving as Physical Culture Theory. In: R. Bader (ed.): Computational Phonogram Archiving, Springer Series ‘Current Research in Systematic Musicology’, Vol. 5, 3-36, 2019.

Blaß, M. & Bader, R.: Content Based Music Retrieval and Visualization System for Ethnomusicological Music Archives. In: R. Bader (ed.): Computational Phonogram Archiving, Springer Series ‘Current Research in Systematic Musicology’, Vol. 5, 145-174, 2019.

Bader, R.: Nonlinearities and Synchronization in Musical Acoustics and Music Psychology. Springer Series ‘Current Research in Systematic Musicology’, Vol. 2, Springer, Heidelberg, 2013.


1st May