K-pop Singing Vocal Tag Dataset (KVT dataset)

TL;DR

The KVT dataset provides semantic labels of singing voices from K-pop songs.

Notable Features

  • K-pop songs!
  • Tags are about singing voices.
  • Ratings are collected from a survey, not mined from social data.
  • Each rating is about a 10-second-long segment with vocal presence.

Note

Due to copyright issues, we don't provide audio data on this page (please contact ). Instead, we are adding links to music content providers (including YouTube) for each song. You can also download MFCC data computed from our audio, which you can use to check whether your version of the audio is consistent with ours.

Name               .pkl                   .json                   .js                   others
artists            artists.pkl            artists.json            artists.js
tags               tags.pkl               tags.json               tags.js
songs              songs.pkl              songs.json              songs.js              songs.tsv (v. 0.1)
segments           segments.pkl           segments.json           segments.js
ratings            ratings.pkl            ratings.json            ratings.js            ratings.npy
mean song ratings  mean_song_ratings.pkl  mean_song_ratings.json  mean_song_ratings.js  mean_song_ratings.npy
mfcc                                                                                    mfcc.zip

Data files

Data files in the above table contain all the important data except audio files.

Audio files

We're updating links to music content providers for each song in the "Songs" section and the "songs.tsv" file. You may buy or download audio data from the specified links. For segmentation, the "segments.*" files contain offset information. For validation, you can consult "mfcc.zip", which contains MFCC data for our version of the segment audio files.
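
As a rough illustration of the segmentation step, the sketch below cuts one segment out of a song using only the Python standard library. It assumes the audio has already been converted to WAV and that the offset in "segments.*" is a start time in seconds; neither is confirmed here, so check the offset unit before relying on it. The function name cut_segment is hypothetical, and the 10-second default comes from the segment length stated above.

```python
import io
import wave

SEGMENT_SECONDS = 10  # segment length stated in the dataset description


def cut_segment(wav_bytes, offset_sec, duration_sec=SEGMENT_SECONDS):
    """Return a WAV-encoded excerpt of wav_bytes starting at offset_sec.

    Assumption: offset_sec is a start time in seconds.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        start = int(round(offset_sec * rate))
        count = int(round(duration_sec * rate))
        # Clamp the start so an offset past the end yields an empty clip
        # instead of raising an error.
        src.setpos(min(start, src.getnframes()))
        frames = src.readframes(count)

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # The frame count in the header is corrected when the writer closes.
        dst.setparams(params)
        dst.writeframes(frames)
    return out.getvalue()
```

The returned bytes can be written to a file and then compared against the corresponding entry in "mfcc.zip" with whatever MFCC extraction settings match ours.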

File formats

  • .pkl files are pickle dumps of Python dict objects. We used Python 3 to create them. If you're using Python 2, it will be much easier to read the .json files and parse them with the json module.
  • .json files contain data in JSON format, and .js files contain "var VARNAME = " followed by the contents of their .json counterparts. You can import .js files in HTML. This page, for example, uses "segments.js" to build the list of segments in the "Segments" tab with Vue.js.
  • The ratings.npy file contains a NumPy array. The order of the tags is given in the tag files.
  • Unlike the other songs files, songs.tsv contains only links to content providers and will be updated frequently.
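
If you only have a .js file at hand, a minimal sketch like the following can recover the JSON payload outside the browser by stripping the "var VARNAME = " prefix described above. The helper name parse_js_data is hypothetical, and the optional trailing semicolon is an assumption about how the files end.

```python
import json
import re

# Matches the leading "var VARNAME = " that the .js files add before the JSON.
_JS_PREFIX = re.compile(r"^\s*var\s+[A-Za-z_$][\w$]*\s*=\s*")


def parse_js_data(text):
    """Parse the JSON payload of a .js data file given its full text."""
    body = _JS_PREFIX.sub("", text, count=1).strip()
    # Assumption: the statement may or may not end with a semicolon.
    if body.endswith(";"):
        body = body[:-1]
    return json.loads(body)
```

For example, reading "tags.js" as UTF-8 text and passing it to parse_js_data should give the same object as loading "tags.json" with the json module.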

Note

  • Currently, song titles and artist names are mostly in Korean (Unicode strings). We're working on matching them to official English titles and names; English versions will be added once that is done.
Segment list columns: segID, title, artist, offset

Song Retrieval

Note
You may find that songs by the same artist are listed among the similar songs.
The similarity between two songs can be calculated as the cosine similarity between their mean ratings, where each song's mean rating is averaged over the ratings of its segments. Here we used the precalculated cosine similarities from "mean_song_ratings.js".
Check the source code of this page to see how similar songs are retrieved.
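
For readers who prefer to compute the similarity themselves, here is a minimal pure-Python sketch of the idea described in this note: average the segment ratings of each song, then rank other songs by cosine similarity. It is not the code this page runs. The function names and the dict-of-lists input layout are assumptions made for illustration, not the actual layout of the data files.

```python
import math


def mean_ratings(segment_ratings):
    """Average a song's per-segment tag ratings into one vector."""
    n = len(segment_ratings)
    return [sum(column) / n for column in zip(*segment_ratings)]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def most_similar(query_id, song_vectors, top_k=5):
    """Return the ids of the top_k songs most similar to query_id.

    song_vectors maps a song id to its mean rating vector (hypothetical layout).
    """
    query = song_vectors[query_id]
    scored = [
        (cosine_similarity(query, vector), song_id)
        for song_id, vector in song_vectors.items()
        if song_id != query_id
    ]
    scored.sort(reverse=True)
    return [song_id for _, song_id in scored[:top_k]]
```

Because the ranking depends only on the angle between rating vectors, songs whose tags are rated in similar proportions rank close together even if their absolute rating levels differ.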

Prediction Demo

NAUL - Memory of The Wind


Sohyang - Fate