Please use this identifier to cite or link to this item:
http://dspace.cityu.edu.hk/handle/2031/7216
Full metadata record
DC Field | Value | Language |
---|---|---|
dc.contributor.author | Ni, Chenying | en_US |
dc.date.accessioned | 2014-04-28T09:56:49Z | |
dc.date.accessioned | 2017-09-19T08:50:49Z | |
dc.date.accessioned | 2019-02-12T06:53:01Z | - |
dc.date.available | 2014-04-28T09:56:49Z | |
dc.date.available | 2017-09-19T08:50:49Z | |
dc.date.available | 2019-02-12T06:53:01Z | - |
dc.date.issued | 2013 | en_US |
dc.identifier.other | 2013csnc904 | en_US |
dc.identifier.uri | http://144.214.8.231/handle/2031/7216 | - |
dc.description.abstract | Mel-Frequency Cepstral Coefficients (MFCCs) are a widely adopted feature in speech analysis, and research has shown that they also adapt well to music information retrieval systems. Recently, it was proposed that an augmented MFCC feature that is invariant to musical key can improve results for music genre classification. Building on this finding, the goal of this project is to determine whether it also holds for a large music annotation task. To achieve this goal, we implement an audio annotation system that tags audio tracks with meaningful words, and we experiment with key-augmented audio features; in particular, we are interested in finding a key-invariant feature that fits the system. We adopt the Computer Audition Lab 500-Song (CAL500) dataset, which contains 502 popular Western songs, each with a list of human-annotated tags. We build two classifiers for annotation. In our case, the classifiers are called tag models; they define how audio features from the training tracks are gathered to form each tag's model. Using Gaussian Mixture Models (GMMs), we learn a track-level distribution for each song by running the Expectation-Maximization (EM) algorithm, and our tag models are built on these track-level distributions. First, we apply Model Averaging to the original CAL500 dataset; this tag model takes a weighted average over the related audio tracks. Although Model Averaging produces good results, it is computationally expensive, so we choose the Mixture Hierarchical Model as an alternative for the augmentation experiments. Instead of naive averaging, it computes a word-level distribution from the related track-level distributions by running the hierarchical EM (HEM) algorithm, producing a small, fixed number of Gaussian mixture components.
To further investigate how MFCC, a commonly used short-term, low-level feature, can be made invariant to musical key, and how this key-augmented feature performs on the annotation task, we use two different feature augmentation techniques. One uses Audacity to generate key-shifted songs; the other changes the Mel band filters during the computation of MFCC. In the traditional Audacity key-shift approach, the dataset must be exported as audio (i.e., mp3/wav format) for each key shift. With our novel approach, which simulates a key change by modifying the Mel band filters in the MFCC computation, we obtain key-shifted features without producing extra audio files. We compare the augmentation results of each approach under four models, depending on whether the augmented dataset is used in the training phase, the testing phase, or both: Original, Augmented Training, Augmented Testing, and Augmented Both (Training and Testing). Using the results of these feature augmentations, we evaluate each model in terms of precision and recall for annotation, and area under the receiver operating characteristic (ROC) curve for retrieval. Our experiments show that key-augmented MFCC performs better for certain categories. However, some categories, such as Instrument, are still better predicted using the original dataset. | en_US |
dc.rights | This work is protected by copyright. Reproduction or distribution of the work in any format is prohibited without written permission of the copyright owner. | en_US |
dc.rights | Access is restricted to CityU users. | en_US |
dc.title | Music Annotation with Augmented MFCC | en_US |
dc.contributor.department | Department of Computer Science | en_US |
dc.description.supervisor | Supervisor: Dr. Chan, Antoni Bert; First Reader: Dr. Lee, Kenneth Ka Chun; Second Reader: Prof. Li, Qing | en_US |
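The abstract describes learning a GMM for each track with EM and then building a tag model by Model Averaging over the tracks associated with that tag. The Python sketch below illustrates that pipeline under stated assumptions: diagonal covariances, equal track weights by default, and synthetic Gaussian data standing in for MFCC frames. The function names (`fit_diag_gmm`, `model_average`, `avg_log_likelihood`) are hypothetical and not taken from the project, and the HEM step of the Mixture Hierarchical Model is not shown.

```python
import numpy as np

# Hypothetical helpers: a sketch of track-level GMMs + Model Averaging, not the project's code.


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return (m + np.log(np.sum(np.exp(a - m), axis=axis, keepdims=True))).squeeze(axis)


def component_log_pdf(X, means, variances):
    """Log density of each frame (rows of X) under each diagonal Gaussian: shape (n, k)."""
    diff = X[:, None, :] - means[None, :, :]
    mahal = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(2.0 * np.pi * variances), axis=1)
    return -0.5 * (mahal + log_det[None, :])


def fit_diag_gmm(X, k, n_iter=50, seed=0, reg=1e-6):
    """Track-level GMM: fit k diagonal Gaussians (an assumption) to one track's frames with EM."""
    rng = np.random.default_rng(seed)
    n, _ = X.shape
    means = X[rng.choice(n, size=k, replace=False)].copy()
    variances = np.tile(X.var(axis=0) + reg, (k, 1))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame
        log_p = component_log_pdf(X, means, variances) + np.log(weights)
        resp = np.exp(log_p - logsumexp(log_p, axis=1)[:, None])
        # M-step: re-estimate mixture weights, means and diagonal variances
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / n
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 0.0) + reg
    return weights, means, variances


def model_average(track_gmms, track_weights=None):
    """Tag model as a weighted average of track GMMs: every component pooled into one mixture."""
    if track_weights is None:
        track_weights = np.ones(len(track_gmms))  # equal weights: an assumption
    track_weights = np.asarray(track_weights, dtype=float)
    track_weights = track_weights / track_weights.sum()
    weights = np.concatenate([tw * w for tw, (w, _, _) in zip(track_weights, track_gmms)])
    means = np.concatenate([mu for _, mu, _ in track_gmms])
    variances = np.concatenate([var for _, _, var in track_gmms])
    return weights, means, variances


def avg_log_likelihood(X, gmm):
    """Score a query track: mean per-frame log-likelihood under a tag GMM."""
    weights, means, variances = gmm
    log_p = component_log_pdf(X, means, variances) + np.log(weights)
    return float(np.mean(logsumexp(log_p, axis=1)))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for 13-dim MFCC frames: 3 tracks per tag, 200 frames each
    tag_a = [fit_diag_gmm(rng.normal(0.0, 1.0, (200, 13)), k=4) for _ in range(3)]
    tag_b = [fit_diag_gmm(rng.normal(3.0, 1.0, (200, 13)), k=4) for _ in range(3)]
    model_a, model_b = model_average(tag_a), model_average(tag_b)
    query = rng.normal(0.0, 1.0, (100, 13))
    print(avg_log_likelihood(query, model_a) > avg_log_likelihood(query, model_b))  # True
```

Pooling every track's components illustrates why Model Averaging is expensive: the tag model grows with the number of related tracks, whereas HEM summarizes them into a small, fixed number of components.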
Appears in Collections: | Computer Science - Undergraduate Final Year Projects |
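One possible reading of the abstract's Mel-filter key simulation: transposing a track up by k semitones multiplies every frequency by 2^(k/12), so applying filters whose band edges are divided by that factor to the original spectrum approximates the features of the transposed track. The sketch below assumes this interpretation, the common 2595·log10(1 + f/700) Mel scale, triangular filters, a Hann window, and an orthonormal DCT-II; the function names are hypothetical and the project's actual filter design may differ. For downward shifts, filters pushed above the Nyquist frequency receive no energy, a limitation of this simple version.

```python
import numpy as np

# Hypothetical helpers: a sketch of key simulation via shifted Mel filters, not the project's code.


def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def band_edges_hz(n_filters, fmin, fmax, semitone_shift=0.0):
    """n_filters + 2 Mel-spaced band edges, divided by 2**(shift/12).

    Assumed key simulation: a track raised by `semitone_shift` semitones has at f
    the energy the original has at f / 2**(shift/12), so the shifted filters read
    the original spectrum at those scaled frequencies.
    """
    mels = np.linspace(hz_to_mel(fmin), hz_to_mel(fmax), n_filters + 2)
    return mel_to_hz(mels) * 2.0 ** (-semitone_shift / 12.0)


def mel_filterbank(n_filters, n_fft, sr, fmin=0.0, fmax=None, semitone_shift=0.0):
    """Triangular Mel filterbank of shape (n_filters, n_fft // 2 + 1)."""
    fmax = sr / 2.0 if fmax is None else fmax
    edges = band_edges_hz(n_filters, fmin, fmax, semitone_shift)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)  # centre frequency of each FFT bin
    fb = np.zeros((n_filters, freqs.size))
    for i in range(n_filters):
        lo, centre, hi = edges[i], edges[i + 1], edges[i + 2]
        rising = (freqs - lo) / (centre - lo)
        falling = (hi - freqs) / (hi - centre)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fb


def dct2_ortho(x):
    """Orthonormal DCT-II, the usual final step from log Mel energies to cepstra."""
    n = x.size
    k = np.arange(n)[:, None]
    basis = np.cos(np.pi * k * (2 * np.arange(n)[None, :] + 1) / (2 * n))
    scale = np.full(n, np.sqrt(2.0 / n))
    scale[0] = np.sqrt(1.0 / n)
    return scale * (basis @ x)


def mfcc_frame(frame, fb, n_coeffs=13):
    """MFCCs of one frame whose length equals the n_fft used to build `fb`."""
    power = np.abs(np.fft.rfft(frame * np.hanning(frame.size))) ** 2
    log_mel = np.log(fb @ power + 1e-10)  # small floor avoids log(0) for empty bands
    return dct2_ortho(log_mel)[:n_coeffs]


if __name__ == "__main__":
    sr, n_fft = 22050, 2048
    tone = np.sin(2 * np.pi * 440.0 * np.arange(n_fft) / sr)
    plain = mfcc_frame(tone, mel_filterbank(40, n_fft, sr))
    # Approximates the MFCCs of the same tone transposed up by 2 semitones,
    # without rendering a new audio file
    simulated_up2 = mfcc_frame(tone, mel_filterbank(40, n_fft, sr, semitone_shift=2))
    print(plain.round(2), simulated_up2.round(2))
```

Because only the filterbank changes, one decoded spectrum can yield features for every simulated key, which matches the abstract's point that no extra sound files are produced.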
Files in This Item:
File | Size | Format |
---|---|---|
fulltext.html | 145 B | HTML |
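The abstract evaluates annotation with precision and recall, and retrieval with area under the ROC curve. A minimal sketch of those metrics follows, assuming a song-by-tag score matrix produced by the tag models and a boolean ground-truth matrix. Tagging each song with its top `a` tags, the default `a = 10`, and averaging every metric over tags are illustrative assumptions, not details confirmed by the abstract; the function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers: a sketch of per-tag annotation and retrieval metrics.


def annotate_top_a(scores, a=10):
    """Boolean (songs x tags) annotation: each song receives its `a` best-scoring tags (assumed rule)."""
    n_songs = scores.shape[0]
    top = np.argsort(-scores, axis=1)[:, :a]
    pred = np.zeros(scores.shape, dtype=bool)
    pred[np.arange(n_songs)[:, None], top] = True
    return pred


def per_tag_precision_recall(pred, truth):
    """Mean per-tag precision and recall; tags never predicted are skipped for precision."""
    tp = np.sum(pred & truth, axis=0).astype(float)
    n_pred, n_true = pred.sum(axis=0), truth.sum(axis=0)
    precision = np.divide(tp, n_pred, out=np.full(tp.shape, np.nan), where=n_pred > 0)
    recall = np.divide(tp, n_true, out=np.full(tp.shape, np.nan), where=n_true > 0)
    return float(np.nanmean(precision)), float(np.nanmean(recall))


def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney rank statistic (ties share ranks)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    n_pos, n_neg = labels.sum(), (~labels).sum()
    if n_pos == 0 or n_neg == 0:
        return float("nan")  # undefined when a tag has only positives or only negatives
    ranks = np.empty(scores.size)
    ranks[np.argsort(scores)] = np.arange(1, scores.size + 1)
    for value in np.unique(scores):
        tied = scores == value
        ranks[tied] = ranks[tied].mean()
    return float((ranks[labels].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg))


def mean_tag_auc(scores, truth):
    """Retrieval: rank songs by each tag's score column and average the AUC over tags."""
    aucs = [roc_auc(scores[:, j], truth[:, j]) for j in range(scores.shape[1])]
    return float(np.nanmean(aucs))


if __name__ == "__main__":
    # Toy example: 3 songs x 3 tags
    scores = np.array([[0.9, 0.1, 0.5], [0.2, 0.8, 0.3], [0.6, 0.4, 0.7]])
    truth = np.array([[1, 0, 1], [0, 1, 0], [1, 0, 0]], dtype=bool)
    pred = annotate_top_a(scores, a=1)
    print(per_tag_precision_recall(pred, truth), mean_tag_auc(scores, truth))
```

Computing the metrics per tag and then averaging is what makes category-level comparisons, such as Instrument versus other tag groups, straightforward: the same per-tag arrays can be averaged over any subset of tags.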
Items in Digital CityU Collections are protected by copyright, with all rights reserved, unless otherwise indicated.