Please use this identifier to cite or link to this item: http://dspace.cityu.edu.hk/handle/2031/7150
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ni, Chenying (倪晨穎) | en_US |
dc.date.accessioned | 2014-03-17T03:57:27Z | |
dc.date.accessioned | 2017-09-19T08:26:51Z | |
dc.date.accessioned | 2019-01-22T03:40:34Z | |
dc.date.available | 2014-03-17T03:57:27Z | |
dc.date.available | 2017-09-19T08:26:51Z | |
dc.date.available | 2019-01-22T03:40:34Z | |
dc.date.issued | 2013 | en_US |
dc.identifier.citation | Ni, C. (2013). Music annotation with augmented MFCC (Outstanding Academic Papers by Students (OAPS)). Retrieved from City University of Hong Kong, CityU Institutional Repository. | en_US |
dc.identifier.other | cs2013-4514-ncy904 | en_US |
dc.identifier.uri | http://144.214.8.231/handle/2031/7150 | |
dc.description.abstract | MFCC is a widely adopted feature in speech analysis, and research has demonstrated its applicability to music information retrieval systems. Recently, it has been proposed that an augmented Mel-Frequency Cepstral Coefficients (MFCC) feature that is invariant to musical key can improve results on the task of music genre classification. Building on this finding, the goal of this project is to determine whether the finding also applies to a larger music annotation task. To achieve this goal, we implement an audio annotation system that can tag audio tracks with meaningful words and experiment with key-augmented audio features; in particular, we are interested in finding a key-invariant feature to fit into the system. We adopt the Computer Audition Lab 500-Song (CAL500) dataset, which contains 502 popular Western songs, each with a list of human-annotated tags. We build two classifiers for annotation; in our case, the classifiers are called tag models, and they define how audio features gathered from training tracks are combined to form a model for each tag. Using Gaussian mixture models, we learn each track-level distribution by running the Expectation-Maximization (EM) algorithm, and our tag models are built on those track-level distributions. First, we adopt Model Averaging on the original CAL500 dataset; this tag model takes a weighted average over the related audio tracks. Although Model Averaging produces good results, it is computationally expensive, so we choose the Mixture Hierarchies model as an alternative for the augmentation experiments. Instead of naive averaging, it computes a word-level distribution from the related track-level distributions by running the hierarchical EM (HEM) algorithm, producing a fixed, small number of Gaussian mixture components. To find out how MFCC, a commonly used short-term, low-level feature, can be made invariant to musical key, and how this key-augmented feature performs on the annotation task, we use two different feature-augmentation techniques: one uses Audacity to generate key-shifted songs, and the other changes the Mel band filters during the computation of MFCC. In the traditional Audacity approach, the dataset must be rendered to audio (e.g., MP3/WAV files) for each key shift; with our novel approach of simulating key shifts by changing the MFCC Mel band filters, we obtain the key-shifted features without producing extra audio files (a minimal sketch of this filter-bank simulation appears after the metadata table below). We compare the augmentation results for each approach under four models, named for the phase in which the augmented dataset is used: Original, Augmented Training, Augmented Testing, and Augmented Both (Training and Testing). With the results from the feature augmentations, we analyze each model in terms of precision and recall for annotation and the area under the receiver operating characteristic curve for retrieval. From our experiments, we find that key-augmented MFCC performs better for certain categories; however, some categories, such as Instrument, are still better predicted using the original dataset. | |
dc.rights | This work is protected by copyright. Reproduction or distribution of the work in any format is prohibited without written permission of the copyright owner. | en_US |
dc.rights | Access is unrestricted. | en_US |
dc.subject | Musical notation -- Data processing. | |
dc.title | Music annotation with augmented MFCC | en_US |
dc.contributor.department | Department of Computer Science | en_US |
dc.description.course | CS4514 Project | en_US |
dc.description.instructor | Dr. Chan, Antoni Bert | en_US |
dc.description.programme | Bachelor of Science (Honours) in Computer Science | en_US |
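The abstract's novel contribution, simulating key shifts by moving the Mel band filters rather than re-rendering audio, can be made concrete with a short sketch. The Python below is not the project's code; it is a minimal reconstruction under stated assumptions (librosa and SciPy handle the signal processing; all function and parameter names are illustrative). The idea: a pitch shift of k semitones scales every frequency by 2^(k/12), so dividing the filter band edges by that factor recovers, from the unshifted spectrum, the Mel energies the shifted track would have produced.

```python
import numpy as np
import scipy.fftpack
import librosa


def key_shifted_mfcc(y, sr, semitones, n_mfcc=13, n_fft=2048,
                     hop_length=512, n_mels=40):
    """Approximate the MFCCs of a track pitch-shifted by `semitones`
    by moving the Mel filter band edges instead of shifting the audio."""
    # A shift of +k semitones scales frequencies by r = 2**(k/12);
    # evaluating the original filters at f*r (i.e., dividing the band
    # edges by r) simulates the shift on the unshifted spectrum.
    r = 2.0 ** (semitones / 12.0)
    edges = librosa.mel_frequencies(n_mels=n_mels + 2, fmin=0.0,
                                    fmax=sr / 2.0) / r
    fft_freqs = librosa.fft_frequencies(sr=sr, n_fft=n_fft)

    # Triangular filters over the shifted band edges.
    fb = np.zeros((n_mels, len(fft_freqs)))
    for m in range(n_mels):
        lo, ctr, hi = edges[m], edges[m + 1], edges[m + 2]
        rising = (fft_freqs - lo) / (ctr - lo)
        falling = (hi - fft_freqs) / (hi - ctr)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))

    # Log Mel energies followed by a DCT give the cepstral coefficients.
    power = np.abs(librosa.stft(y, n_fft=n_fft, hop_length=hop_length)) ** 2
    log_mel = librosa.power_to_db(fb @ power)
    return scipy.fftpack.dct(log_mel, axis=0, type=2, norm="ortho")[:n_mfcc]
```

Because only the filter bank changes, every simulated key shift reuses the same STFT of the original track, which is what removes the need to render a separate MP3/WAV file per shift.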
Appears in Collections: OAPS - Dept. of Computer Science
Files in This Item:
File | Size | Format |
---|---|---|
fulltext.html | 154 B | HTML |
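The tag models described in the abstract start from a per-song GMM fitted with EM; Model Averaging then scores a query by averaging over the positive training tracks' models, while Mixture Hierarchies collapses those models into one compact word-level mixture with hierarchical EM. A minimal sketch of the track-level fit and the averaging baseline, assuming scikit-learn (the project does not name its tooling; names here are illustrative):

```python
import numpy as np
from sklearn.mixture import GaussianMixture


def fit_track_model(mfcc_frames, n_components=8, seed=0):
    # mfcc_frames: (n_frames, n_dims) array of per-frame (augmented) MFCCs.
    # EM fitting of the track-level Gaussian mixture.
    return GaussianMixture(n_components=n_components,
                           covariance_type="diag",
                           random_state=seed).fit(mfcc_frames)


def model_averaging_score(positive_track_gmms, query_frames):
    # Model Averaging baseline: score a query track against a tag by
    # averaging its log-likelihood under every positive training track's
    # GMM. (Mixture Hierarchies would instead collapse these GMMs into a
    # fixed small number of components via hierarchical EM.)
    return float(np.mean([g.score(query_frames) for g in positive_track_gmms]))
```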