RPextract Music Feature Extractor


Contact points:

Thomas Lidy and Andreas Rauber

Technical University of Vienna, Austria

Technical Contact Points:

Thomas Lidy

Type of software:


Descriptive Keywords:

Audio feature extraction, music descriptor, music genre classification, music similarity retrieval, Rhythm Patterns, Statistical Spectrum Descriptor, Rhythm Histograms

Potential Use and Applications:

Content-based access to audio files, particularly music, requires the development of feature extraction techniques that capture the acoustic characteristics of the signal. The RPextract music feature extractor extracts features of music files capturing aspects related to rhythm and timbre. This enables the computation of similarity between pieces of music, useful for a large range of potential Music Information Retrieval tasks, such as retrieval of similar music from an archive, identification of pieces of music, music genre recognition, artist detection, organization and classification of music libraries.
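As a rough illustration of how such feature vectors support similarity retrieval, the sketch below ranks archive entries by their distance to a query vector. The source does not say which distance measure is used, so Euclidean distance is an assumption here, and the function names `euclidean` and `most_similar` as well as the toy vectors are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def most_similar(query, archive, k=3):
    """Return the names of the k archive entries closest to the query.

    archive maps a track name to its feature vector.
    """
    ranked = sorted(archive.items(), key=lambda item: euclidean(query, item[1]))
    return [name for name, _ in ranked[:k]]

# Toy 4-dimensional vectors for illustration only (assumed data);
# a Rhythm Histogram, for example, would have 60 values.
archive = {
    "track_a": [0.9, 0.1, 0.4, 0.3],
    "track_b": [0.2, 0.8, 0.5, 0.1],
    "track_c": [0.8, 0.2, 0.5, 0.3],
}
query = [0.9, 0.1, 0.45, 0.3]
print(most_similar(query, archive, k=2))  # ['track_a', 'track_c']
```

The same ranking idea carries over to genre recognition or library organization, where the vectors would typically be passed to a classifier or clustering method instead of a nearest-neighbour lookup.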

General Description:

Content-based access to audio files, particularly music, requires the development of feature extraction techniques that capture the acoustic characteristics of the signal and that allow the computation of similarity between pieces of music. Three sets of descriptors were developed at TU Vienna - IFS:

- Statistical Spectrum Descriptors: describe fluctuations by statistical measures on critical frequency bands of a psycho-acoustically transformed Sonogram

- Rhythm Patterns: reflect the rhythmical structure in musical pieces by a matrix describing the amplitude of modulation on critical frequency bands for several modulation frequencies

- Rhythm Histograms: aggregate the energy of modulation for 60 different modulation frequencies and thus indicate the general rhythmic characteristics of a piece of music
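The following minimal sketch shows how two of these descriptors could be derived from small toy matrices. It is not the RPextract code: the function names are hypothetical, the population form of the moments is an assumption (the source does not state the normalization), and summing is assumed as the way bands are aggregated into a Rhythm Histogram.

```python
import statistics

def band_statistics(values):
    """Seven summary statistics for one critical band's loudness over time."""
    n = len(values)
    mean = statistics.fmean(values)
    # Central moments in population form (the normalization is an assumption).
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5 if m2 > 0 else 0.0
    kurtosis = m4 / m2 ** 2 if m2 > 0 else 0.0
    return [mean, statistics.median(values), m2, skewness, kurtosis,
            min(values), max(values)]

def statistical_spectrum_descriptor(sonogram):
    """sonogram: list of bands, each a list of loudness values over time."""
    return [stat for band in sonogram for stat in band_statistics(band)]

def rhythm_histogram(rhythm_pattern):
    """rhythm_pattern: list of bands, each a list of modulation magnitudes.

    Sums over the bands for every modulation frequency bin.
    """
    return [sum(column) for column in zip(*rhythm_pattern)]

# Toy input: 2 bands x 4 time frames (the real Sonogram has 24 bands).
sonogram = [[1.0, 2.0, 3.0, 4.0], [2.0, 2.0, 2.0, 2.0]]
ssd = statistical_spectrum_descriptor(sonogram)
print(len(ssd))  # 14, i.e. 7 statistics for each of the 2 bands

# Toy input: 2 bands x 3 modulation frequency bins (the real pattern has 60 bins).
pattern = [[1.0, 0.5, 0.0], [2.0, 0.5, 1.0]]
print(rhythm_histogram(pattern))  # [3.0, 1.0, 1.0]
```

With the dimensions given in this description (24 critical bands, 7 statistical measures, 60 modulation frequencies), the same functions would yield a 168-value Statistical Spectrum Descriptor and a 60-value Rhythm Histogram.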


The algorithm applies psycho-acoustic models in order to approximate the perception of the human auditory system. The feature extractor processes au, wav, mp3 and ogg files. Feature vectors are output in SOMLib format, an ASCII format containing descriptive headers.


Details about the algorithm are available from http://www.ifs.tuwien.ac.at/mir/audiofeatureextraction.html.

A usage guide is available at http://www.ifs.tuwien.ac.at/mir/howto_matlab_fe.html.

Details about the output format are at http://www.ifs.tuwien.ac.at/mir/howto_matlab_fe.html.

Technical description:

The feature extraction algorithm works as follows. In a pre-processing step, the audio signal is converted to mono and segmented into chunks of approximately 6 seconds. Typically, the first and last one or two segments are skipped, and every third one of the remaining segments is processed.

For each segment, the spectrogram of the audio is computed using the short-time Fourier Transform (STFT). The Bark scale, a perceptual scale that groups frequencies into critical bands according to perceptive pitch regions, is applied to the spectrogram, aggregating it to 24 frequency bands. The Bark-scale spectrogram is then transformed into the decibel scale. Further psycho-acoustic transformations follow: computation of the Phon scale incorporates equal-loudness curves, which account for the different perception of loudness at different frequencies. Subsequently, the values are transformed into the unit Sone. The Sone scale relates to the Phon scale such that a doubling on the Sone scale sounds to the human ear like a doubling of the loudness. The result is a Bark-scale Sonogram, a representation that reflects the specific loudness sensation of the human auditory system.

From this representation of perceived loudness, statistical measures (mean, median, variance, skewness, kurtosis, min and max) are computed per critical band in order to describe fluctuations within the bands extensively. The result is a Statistical Spectrum Descriptor.

In a further step, the varying energy on the critical bands of the Bark-scale Sonogram is regarded as a modulation of the amplitude over time. Using a Fourier Transform, the spectrum of this modulation signal is retrieved. The result is a time-invariant representation that contains magnitudes of modulation per modulation frequency per critical band. This matrix represents a Rhythm Pattern, indicating occurrence of rhythm as vertical bars, but also describing smaller fluctuations on all frequency bands of the human auditory range. Subsequent to the Fourier Transform, modulation amplitudes are weighted according to a function of human sensation of modulation frequency, accentuating values around 4 Hz. The application of a gradient filter and Gaussian smoothing potentially improves the similarity of Rhythm Patterns, which is useful in classification and retrieval tasks.

A Rhythm Histogram is constructed by aggregating the critical bands of the Rhythm Pattern (before weighting and smoothing), resulting in a histogram of rhythmic energy for 60 modulation frequencies. The feature vectors for a piece of audio are computed by taking the median of the descriptors of its segments.
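Three of the simpler steps described here (segment selection, the Phon-to-Sone mapping and the median aggregation over segments) can be sketched as follows. This is an illustrative sketch, not the RPextract implementation: the function names and default parameters are hypothetical, and the Phon-to-Sone formula is the commonly cited mapping, which is assumed rather than confirmed to be the exact one used by the software.

```python
import statistics

def select_segments(num_segments, skip_lead=1, skip_tail=1, step=3):
    """Indices of the segments to process.

    Drops skip_lead segments at the start and skip_tail at the end,
    then keeps every step-th of the remaining ones.
    """
    return list(range(skip_lead, num_segments - skip_tail, step))

def phon_to_sone(phon):
    """Commonly cited Phon-to-Sone mapping (assumed, see lead-in).

    Above 40 phon, every additional 10 phon doubles the loudness in sone.
    """
    if phon >= 40:
        return 2 ** ((phon - 40) / 10)
    return (phon / 40) ** 2.642

def aggregate_segments(segment_vectors):
    """Element-wise median over the descriptors of all processed segments."""
    return [statistics.median(column) for column in zip(*segment_vectors)]

# A 60-second track yields about ten 6-second segments.
print(select_segments(10))                 # [1, 4, 7]
print(phon_to_sone(40), phon_to_sone(50))  # 1.0 2.0

# Toy descriptors of three segments, two values each (assumed data).
segments = [[1.0, 5.0], [2.0, 3.0], [9.0, 4.0]]
print(aggregate_segments(segments))        # [2.0, 4.0]
```

Using the median rather than the mean keeps a single atypical segment, such as the 9.0 in the toy data, from dominating the descriptor of the whole piece.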

Required User Skills:

· Edit options file with a Text Editor

· Minimal knowledge of Matlab (only to start algorithm)

· Alternatively, knowledge of how to start algorithm from a shell script

· Knowledge of how to process a csv data file

Pre-Requisites for Installation:

Windows or Linux

Matlab 6.1 or above, with Signal Processing Toolbox installed

For extracting features from MP3, OGG or FLAC files, additional binaries are required for decoding.

Conditions of use:

Available for research purposes. Please contact the authors.
