Zalo wins first prize in AI competition

Released at: 14:27, 09/11/2019


Photo: Zalo

Popular Vietnamese chat app and social media platform wins Text to Speech category at Vietnamese Language and Speech Processing competition.

by Phi Linh

Zalo announced on November 7 in Ho Chi Minh City that it had won the Text to Speech category and finished first runner-up in the Automatic Speech Recognition category at the 6th Vietnamese Language and Speech Processing (VLSP) competition, co-organized by the Academy of Science and Technology and the University of Natural Sciences at Vietnam National University, Hanoi.

Vietnamese language and speech processing remains a complex puzzle for AI researchers in Vietnam. Over the many years the VLSP competition has run, the topic has brought together expert groups from enterprises, universities, and research institutes to exchange experience and search for the best solutions.

Launched in early September, VLSP 2019 included four categories in the field of language and speech processing: Hate Speech Detection on Social Networks, Vietnamese Dependency Parsing, Automatic Speech Recognition, and Text to Speech.

For Text to Speech, the training data set consisted of 15,000 southern-accent recordings (about 23 hours) and nearly 1,000 northern-accent recordings (about 45 minutes). Participating teams had to train their AI models to “speak” Vietnamese with both northern and southern accents, and each team's output was evaluated on a set of 60 sentences per accent. The results were sent at random to 24 listeners, who scored them on voice quality and on how natural the audio files sounded.
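The article gives the Text to Speech result as a score out of 5 (3.94/5 for Zalo) from 24 listeners, but does not say exactly how the ratings were combined. Listening tests like this are commonly summarized as a mean opinion score (MOS), the average of listeners' ratings on a 1-5 scale. The minimal sketch below assumes that convention. The function name, the 1-5 range check, and the normal-approximation confidence interval are illustrative choices, not VLSP's published scoring procedure. It also works out the average clip length implied by the training-set figures.

```python
from math import sqrt
from statistics import mean, stdev


def mean_opinion_score(ratings):
    """Return (mean rating, 95% confidence half-width) for 1-5 listener scores.

    Assumes each listener gives a rating from 1 to 5; the confidence
    interval uses a simple normal approximation (an illustrative choice).
    """
    if not ratings:
        raise ValueError("need at least one rating")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("ratings must be on a 1-5 scale")
    avg = mean(ratings)
    if len(ratings) < 2:
        return avg, 0.0  # spread is undefined for a single rating
    half_width = 1.96 * stdev(ratings) / sqrt(len(ratings))
    return avg, half_width


# Average clip length implied by the training-set figures in the article.
south_avg_s = 23 * 3600 / 15_000  # about 23 hours over 15,000 clips, ~5.5 s each
north_avg_s = 45 * 60 / 1_000     # about 45 minutes over ~1,000 clips, ~2.7 s each

if __name__ == "__main__":
    # Hypothetical ratings from 24 listeners for one system (not real VLSP data).
    ratings = [4] * 12 + [3] * 6 + [5] * 6
    avg, hw = mean_opinion_score(ratings)
    print(f"MOS = {avg:.2f} +/- {hw:.2f}")
    print(f"southern clips ~{south_avg_s:.1f} s, northern clips ~{north_avg_s:.1f} s")
```

The arithmetic also shows how unbalanced the data was: the southern set averages roughly twice the clip length of the northern set and about 30 times its total duration.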

According to the contestants, the two-week time limit was a challenge, and the noisy input data had to be filtered. Zalo addressed this with a fully automated solution for standardizing the input data. “This is an outstanding move compared to other teams,” said Mr. Nguyen Quoc Bao, Lead Scientist at Zalo.
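The article does not describe how Zalo's automated pipeline worked. As a purely hypothetical illustration of what automated cleaning of a TTS corpus can look like, the sketch below normalizes whitespace in each transcript and drops clips whose speaking rate, measured in transcript characters per second, is implausible for the audio length, one simple sign of a mismatched or truncated recording. The `Clip` type, the `filter_clips` function, and all thresholds are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    text: str          # transcript of the recording
    duration_s: float  # audio length in seconds


def filter_clips(clips, min_cps=5.0, max_cps=25.0, min_duration_s=0.5):
    """Return cleaned clips whose speaking rate looks plausible.

    Speaking rate is measured in transcript characters per second (cps).
    All thresholds are hypothetical defaults chosen for illustration.
    """
    kept = []
    for clip in clips:
        text = " ".join(clip.text.split())  # normalize runs of whitespace
        if not text or clip.duration_s < min_duration_s:
            continue  # empty transcript or too-short audio
        cps = len(text) / clip.duration_s
        if min_cps <= cps <= max_cps:
            kept.append(Clip(text, clip.duration_s))
    return kept


if __name__ == "__main__":
    corpus = [
        Clip("xin chào các bạn", 1.5),  # plausible rate: kept
        Clip("", 2.0),                  # empty transcript: dropped
        Clip("câu bị cắt", 0.3),        # audio too short: dropped
        Clip("ok", 10.0),               # far too slow, likely misaligned: dropped
    ]
    for clip in filter_clips(corpus):
        print(clip)
```

A rule-based filter like this needs no manual listening, which matters under a two-week deadline; the trade-off is that fixed thresholds can discard some valid but unusually fast or slow speech.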

Zalo's ongoing investment in Vietnamese natural language and speech processing research has also given it an edge in technology and human resources. Drawing on its latest technology and a team of experienced AI engineers, Zalo beat many formidable opponents to win first prize in the Text to Speech category with a score of 3.94/5.

Vietnamese speech processing has been researched for a long time, but all products are still at the alpha-testing stage. According to Mr. Bao, putting the models into real-world use requires a training data set of at least 2,000 hours that covers a diverse range of accents and speaking styles. As more and more organizations like Zalo invest heavily in AI, there is an opportunity to catch up with international technology corporations.

Zalo is currently applying its speech processing technology to develop and complete Ki-Ki, its virtual assistant, and plans to integrate the model into Zing News and Bao Moi in the near future to provide the best experience for users.
