IITG Multivariability Speaker Recognition Database


The database was collected with the Indian scenario in mind, where languages, speaking styles, environmental conditions and sensors vary widely. The corpus aims to aid automatic speaker recognition research towards systems that perform equally well under changing recording conditions, and it allows sensor, language, style and environmental variations to be studied in the Indian context. During recording, the sensors were placed so that the speech data was collected under degraded conditions such as background noise and reverberation. To give the corpus a practical dimension, the sensors chosen are portable, low cost and widely used by the public. The speakers were carefully selected to represent distinct parts of India and speak different languages. They differ in educational background, ethnicity and age, which provides the needed inter-speaker variability. To capture intra-speaker variability, two sessions were recorded for each speaker, in which they read the same passages but may converse in a different way.

The database was created to support and evaluate automatic speaker recognition systems where channel, language, style and environment may vary. Accordingly, it is named the IIT Guwahati (IITG) multi-variability (MV) speaker recognition database. It contains four sets, namely IITG-MV Phase-I, Phase-II, Phase-III and Phase-IV. The IITG-MV Phase-I dataset was collected from 100 subjects over two sessions in an office environment, involving multiple sensors, multiple languages and different speaking styles (conversational and read speech). IITG-MV Phase-II also contains data from 100 speakers and differs from Phase-I mainly in that the speech data was collected in multiple environments, namely laboratories and hostel rooms, while the other variabilities were kept unchanged. In the third phase of recording, truly conversational telephone speech data was collected; this is termed the IITG-MV Phase-III dataset. Finally, in Phase-IV, with a view to supporting the development of a remote person authentication system, speech data was collected from speakers all over the country through an interactive voice response (IVR) system; this is termed the IITG-MV Phase-IV dataset. The Phase-IV speech data was collected in three parts. Part-I of Phase-IV contains speech data from 55 subjects all over India, to be used for developing the universal background model (UBM). Part-II of Phase-IV contains speech data from 89 subjects all over India, to be used for developing the speaker models. Part-III of Phase-IV contains speech from 197 genuine trials and 130 impostor trials.
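
As an illustration only, the three-part split of Phase-IV described above could be written down as a small data structure. This is a minimal sketch: the class name `PhaseIVPart`, its field names and the helper `total_trials` are hypothetical and are not part of the database distribution; only the counts (55, 89, 197, 130) come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PhaseIVPart:
    """One part of IITG-MV Phase-IV (hypothetical container, for illustration)."""
    name: str
    purpose: str
    subjects: Optional[int] = None         # number of subjects, where stated
    genuine_trials: Optional[int] = None   # number of genuine trials, where stated
    impostor_trials: Optional[int] = None  # number of impostor trials, where stated


# Counts taken from the description of Phase-IV; everything else is an assumption.
PHASE_IV = [
    PhaseIVPart("Part-I", "universal background model (UBM)", subjects=55),
    PhaseIVPart("Part-II", "speaker models", subjects=89),
    PhaseIVPart("Part-III", "evaluation trials",
                genuine_trials=197, impostor_trials=130),
]


def total_trials(part: PhaseIVPart) -> int:
    """Sum of genuine and impostor trials; parts without trials count as 0."""
    return (part.genuine_trials or 0) + (part.impostor_trials or 0)


if __name__ == "__main__":
    for part in PHASE_IV:
        print(part.name, part.purpose, total_trials(part))
```

With these counts, Part-III holds 197 + 130 = 327 trials in total, while Part-I and Part-II contribute training data only.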

The database is organized into four phases, as given below: