The KVT dataset provides semantic labels of singing voices from K-pop songs.
Notable Features
K-pop songs!
Tags are about singing voices.
Ratings are collected from a survey, not mined from social data.
Each rating is about a 10-second-long segment with vocal presence.
Note
Due to copyright issues, we don't provide audio data on this page (please contact ).
Instead, we are adding links to music content providers (including YouTube) for each song.
You can also download MFCC data for our audio, which can be used to check whether your version of the audio is consistent with ours.
Data files in the above table contain all the important data except audio files.
Audio files
We're updating links to music content providers for each song in the "Songs" section and the "songs.tsv" file.
You may buy or download audio data from the specified links.
For segmentation, the "segments.*" files contain offset information.
For validation, you can consult "mfcc.zip", which contains MFCC data of our version of segment audio files.
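The segmentation and validation steps above can be sketched as follows. This is only an illustration under stated assumptions: the offset is assumed to be in seconds, the helper names `cut_segment` and `mfcc_mismatch` are hypothetical, and the internal layout of "mfcc.zip" and the MFCC settings used for it are not described here, so the candidate MFCCs would have to be computed with matching settings before the comparison means anything.

```python
import numpy as np

SEGMENT_SECONDS = 10  # each rated segment is 10 seconds long


def cut_segment(samples, sample_rate, offset_seconds):
    """Return the 10-second slice that starts at offset_seconds.

    Assumption: the offset in the segments files is expressed in seconds.
    """
    start = int(round(offset_seconds * sample_rate))
    stop = start + SEGMENT_SECONDS * sample_rate
    if start < 0 or stop > len(samples):
        raise ValueError("segment falls outside the audio")
    return samples[start:stop]


def mfcc_mismatch(reference, candidate):
    """Mean absolute difference between two MFCC matrices of equal shape."""
    reference = np.asarray(reference, dtype=float)
    candidate = np.asarray(candidate, dtype=float)
    if reference.shape != candidate.shape:
        raise ValueError("MFCC matrices have different shapes")
    return float(np.mean(np.abs(reference - candidate)))
```

A mismatch close to zero suggests that your audio matches the reference version; what counts as "close" depends on the codec and the MFCC parameters, so any threshold is a judgment call.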
File formats
.pkl files are pickle dumps of Python dict objects.
We used Python 3 to create them.
If you're using Python 2, it will be much easier to read the .json files and parse them with the json module.
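As a minimal sketch of reading either format in Python 3, the helper below picks the parser from the file extension. The function name `load_data` is hypothetical, and the code assumes the .json files are UTF-8 encoded.

```python
import json
import pickle


def load_data(path):
    """Load a .pkl or .json data file and return the parsed object."""
    if path.endswith(".pkl"):
        # Only unpickle files you trust: pickle can execute arbitrary code.
        with open(path, "rb") as f:
            return pickle.load(f)
    if path.endswith(".json"):
        # Assumption: the JSON files are UTF-8 encoded.
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    raise ValueError("unsupported file type: " + path)
```

For example, `load_data("segments.json")` would return whatever object that file holds, assuming it has been downloaded to the working directory.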
.json files contain data in JSON format, and .js files contain "var VARNAME = " followed by the content of their .json counterparts.
You can import .js files in HTML. This page, for example, uses "segments.js" to build the list of segments in the "Segments" tab with Vue.js.
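If only a .js file is at hand, its payload can still be read outside a browser by stripping the "var VARNAME = " prefix and parsing the rest as JSON. The sketch below assumes the file holds exactly one such assignment, optionally ending with a semicolon; `parse_js_data` is a hypothetical helper name.

```python
import json
import re

# Matches the leading "var VARNAME = " part of a .js data file.
_PREFIX = re.compile(r"^\s*var\s+([A-Za-z_$][\w$]*)\s*=\s*")


def parse_js_data(text):
    """Return (variable_name, parsed_value) from 'var NAME = <json>' text."""
    match = _PREFIX.match(text)
    if match is None:
        raise ValueError("text does not start with 'var NAME = '")
    body = text[match.end():].strip()
    if body.endswith(";"):
        body = body[:-1]  # drop an optional trailing semicolon
    return match.group(1), json.loads(body)
```

Usage would look like `name, data = parse_js_data(open("segments.js", encoding="utf-8").read())`, again assuming UTF-8 encoding.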
The ratings.npy file contains a NumPy array. The order of the tags is given in the tag files.
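One way to pair the array with the tag order is sketched below. It assumes that the last axis of the array indexes tags and that the tag names have already been read from a tag file into a list. Neither the array shape nor the tag file layout is specified here, so treat both as assumptions; `ratings_by_tag` is a hypothetical helper.

```python
import numpy as np


def ratings_by_tag(ratings, tags):
    """Map each tag name to its slice of the ratings array.

    Assumption: the last axis of `ratings` follows the order of `tags`.
    """
    ratings = np.asarray(ratings)
    if ratings.shape[-1] != len(tags):
        raise ValueError("number of tags does not match the last axis")
    return {tag: ratings[..., i] for i, tag in enumerate(tags)}


# Typical use, once the files are downloaded:
# ratings = np.load("ratings.npy")
# per_tag = ratings_by_tag(ratings, tags)
```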
Unlike the other song files, songs.tsv contains only links to content providers and will be updated frequently.
Note
Currently, song titles and artist names are mostly in Korean (Unicode strings).
We're trying to match official English titles and names now.
English versions will be added after the process.
segID
title
artist
offset
Song Retrieval
Query song
Similar songs
Note
You may notice that songs by the same artist are listed among the similar songs.
The similarity between two songs can be calculated as the cosine similarity between their mean rating vectors, where each song's mean is taken over the ratings of its segments.
We used precalculated cosine similarities from "mean_song_ratings.js" here.
Check the source code of this page to see how similar songs are retrieved.
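For readers who prefer Python to the page's JavaScript, the similarity described in the note can be sketched as follows. This is not the page's own code. It assumes a 2-D array with one rating vector per segment (if the raw ratings have an extra axis, such as one per rater, average over it first) and a parallel list giving the song each segment belongs to; all function names are hypothetical.

```python
import numpy as np


def mean_song_ratings(segment_ratings, segment_song_ids):
    """Average the segment rating vectors of each song."""
    segment_ratings = np.asarray(segment_ratings, dtype=float)
    ids = np.asarray(segment_song_ids)
    songs = sorted(set(segment_song_ids))
    means = np.stack([segment_ratings[ids == s].mean(axis=0) for s in songs])
    return songs, means


def cosine_similarity_matrix(vectors):
    """Pairwise cosine similarity between the rows of `vectors`."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)  # avoid dividing by 0
    return unit @ unit.T


def most_similar(query, songs, sims, top_k=5):
    """Return up to top_k songs ranked by similarity to `query`."""
    q = songs.index(query)
    order = np.argsort(-sims[q])  # highest similarity first
    return [songs[i] for i in order if i != q][:top_k]
```

Given `songs, means = mean_song_ratings(...)` and `sims = cosine_similarity_matrix(means)`, a call such as `most_similar(some_song, songs, sims)` returns the ranked neighbours of that song.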