Dzongkha Automatic Speech Recognition




This Automatic Speech Recognition (ASR) model has known limitations. First, it was trained on a relatively small dataset of only 5,000 samples, which may degrade its performance on accents, dialects, or speech patterns that are not well represented in the training data. Second, the model, especially when combined with a language model (LM), may struggle with word segmentation because it was not specifically trained for that task. As a result, transcriptions may contain incorrect grammar, spelling errors, and unrecognizable words or characters. Keep these limitations in mind when relying on the model's transcriptions. Larger datasets, specialized language models, and post-processing techniques are possible avenues for improvement.
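As one illustration of what a post-processing step might look like, the sketch below cleans a raw transcription by normalizing its Unicode form, dropping characters outside the Tibetan Unicode block (U+0F00 to U+0FFF, the block used for Dzongkha script), and collapsing stray whitespace. The function name `clean_transcription` and the choice of rules are assumptions made for illustration; this is not part of the model described here, and it does not address word segmentation or grammar.

```python
import re
import unicodedata

# Assumption: valid output consists only of characters from the Tibetan
# Unicode block (U+0F00 to U+0FFF) plus whitespace.
_NON_TIBETAN = re.compile(r"[^\u0F00-\u0FFF\s]")
_WHITESPACE = re.compile(r"\s+")


def clean_transcription(text: str) -> str:
    """Hypothetical cleanup of a raw ASR transcription string."""
    # Normalize to a consistent Unicode form so equivalent sequences compare equal.
    text = unicodedata.normalize("NFC", text)
    # Remove characters that fall outside the Tibetan block.
    text = _NON_TIBETAN.sub("", text)
    # Collapse runs of whitespace into a single space and trim the ends.
    text = _WHITESPACE.sub(" ", text)
    return text.strip()
```

A filter like this only removes obviously out-of-script characters; misrecognized syllables that are still valid Tibetan-block characters would pass through unchanged.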


Pema Galey


Mr. Pema Galey is an associate lecturer at the Information Department, College of Science and Technology. He holds a Master of Engineering in Software Systems Engineering with a specialization in NLP (speech recognition) and Software Engineering.

Ngawang Samten Pelzang


Ngawang Samten Pelzang is a student at the College of Science and Technology, pursuing an undergraduate degree in Information Technology (2019-2023).

Kinley Rabgay


Kinley Rabgay is a student at the College of Science and Technology, pursuing an undergraduate degree in Information Technology (2019-2023).

Khentse Dorji


Khentse Dorji is a student at the College of Science and Technology, pursuing an undergraduate degree in Information Technology (2019-2023).