Building of Broadcast News Database for Evaluation of the Automated Subtitling Service

Matus Pleva; Jozef Juhar

Keywords: broadcast news, segmentation, speech recognition, transcriber

Abstract

This paper describes the recording, annotation, correction and evaluation of a new Broadcast News (BN) speech database named KEMT-BN2, an extension of our older KEMT-BN1 and COST-278 databases used for the development of automatic Slovak continuous speech recognition. The use of the database and its statistics are presented. The database was prepared for evaluating the automated BN transcription system developed in our laboratory, which is mainly used to generate subtitles for recorded BN shows. Such a speech database is a key part of training acoustic models for specific domains and of creating speaker- and anchor-adapted models.

Author Biographies

Matus Pleva

Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Kosice, Slovakia

Jozef Juhar

Department of Electronics and Multimedia Communications, Faculty of Electrical Engineering and Informatics, Technical University of Kosice, Slovakia
