Identification split
26/10/2017 Overlap with SITW: The authors of the Speakers in the Wild (SITW) dataset have kindly released the list of speakers that appear in both their dataset and VoxCeleb. The SITW codes of the speakers present in both datasets can be found here. Those wishing to use both datasets (SITW and VoxCeleb) together will therefore need to exclude these overlapping speakers, which reduces the overall size of the SITW dataset.
11/10/2017 Models: Pretrained models for Speaker Identification and Verification can be found here.
29/9/2017 VoxCeleb 1.1: After deduplicating the dataset, we found a small number of repeated videos (34 videos). The list of duplicates can be found here. Note that these videos appear only in the training set for identification; the test set remains unchanged.
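Both the SITW overlap list and the duplicate-video list are exclusion lists, so applying either one comes down to filtering a set of entries by identifier. The following is a minimal sketch only: it assumes an exclusion list with one identifier per line, and the entries, identifiers, and helper names (`load_ids`, `exclude_listed`) are hypothetical and do not reflect the actual file formats.

```python
def load_ids(text):
    """Parse one identifier per line, skipping blank lines and '#' comments.

    The one-identifier-per-line layout is an assumption, not the
    documented format of the released lists.
    """
    ids = set()
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            ids.add(line)
    return ids


def exclude_listed(items, key, excluded):
    """Return the items whose key is not in the excluded set, in order."""
    return [item for item in items if key(item) not in excluded]


# Hypothetical training entries as (speaker_id, video_id) pairs.
train_entries = [
    ("spk_a", "vid001"),
    ("spk_a", "vid002"),
    ("spk_b", "vid003"),
]

# Hypothetical contents of a duplicates list.
duplicates = load_ids("vid002\n\n# a comment line\n")

# Drop every training entry whose video appears in the duplicates list.
cleaned = exclude_listed(train_entries, key=lambda e: e[1], excluded=duplicates)
print(cleaned)
```

The same helper could be pointed at speaker identifiers instead of video identifiers by changing `key`, for example `key=lambda e: e[0]` with a set of overlapping speaker codes.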
The VoxCeleb dataset consists of YouTube URLs with timestamps for utterances. For privacy issues with the dataset, please refer to our Dataset Privacy Notice.
The provided VoxCeleb metadata is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
The URLs and timestamps for the VoxCeleb dataset are no longer available from this website.
The audio files for the VoxCeleb dataset are no longer available from this website.
The identifying metadata files for the VoxCeleb dataset are no longer available from this website.
The VoxCeleb1-E and VoxCeleb1-H lists are drawn from the VoxCeleb1 training set. Therefore, you cannot use any VoxCeleb1 files for training if you use these lists for testing.
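One way to guard against this kind of leakage is to check a training file list against a trial list before running an evaluation. The sketch below assumes each trial line has the form `label path1 path2`; that layout, the example paths, and the helper names (`trial_files`, `leaked_files`) are illustrative assumptions. Note that the rule above is stricter than this check: it excludes all VoxCeleb1 files from training, so an empty result here is necessary but not sufficient.

```python
def trial_files(trial_text):
    """Collect every file path referenced in a trial list.

    Assumes whitespace-separated lines of the form 'label path1 path2';
    lines with any other number of fields are ignored.
    """
    files = set()
    for line in trial_text.splitlines():
        parts = line.split()
        if len(parts) == 3:
            files.update(parts[1:])
    return files


def leaked_files(train_files, trial_text):
    """Return the sorted training files that also appear in the trial list."""
    return sorted(set(train_files) & trial_files(trial_text))


# Hypothetical training files and trial lines.
train = ["id10001/abc/00001.wav", "id20001/xyz/00001.wav"]
trials = (
    "1 id10001/abc/00001.wav id10001/def/00002.wav\n"
    "0 id10001/abc/00001.wav id10002/ghi/00001.wav\n"
)

leaked = leaked_files(train, trials)
if leaked:
    print("Training files also used in test trials:", leaked)
```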
Please cite the following if you make use of the dataset.