Flitto Arabic speech data collection project. [Photo: Flitto]

AI data company Flitto said on Monday it has launched a new project to collect high-quality Arabic speech data to improve multilingual recognition rates in AI models.

The company said it launched the project in response to growing demand from global big tech companies for multilingual speech data.

Arabic has more than 30 dialects in addition to Modern Standard Arabic (MSA), and speakers frequently code-switch in everyday speech. This makes building AI training data for the language difficult. Flitto will use the "Arcade" feature in its app to encourage real users to participate and to collect the data.

Participants read aloud and record the sentences provided, and an AI system automatically identifies the speaker's dialect. If the result is uncertain, the system presents additional sentences and prompts the user to record again, which improves data accuracy.

The project aims to go beyond simple voice collection and build refined training data that reflects linguistic diversity, including speakers' speech patterns, intonation, and word choice. Flitto plans to reduce bias in AI training caused by disparities in language resources and to deliver high recognition rates in real-world use.

Flitto CEO Jeong-su Lee (이정수) said, "Arabic is spoken by more than 400 million people, yet it is a low-resource language that lacks AI training data." He added, "By building data that systematically reflects the characteristics unique to Arabic, we will help further improve recognition quality in global AI models."

Copyright © DigitalToday. All rights reserved. Unauthorized reproduction and redistribution are prohibited.