The AWAZ AI platform has officially launched in Pakistan as a citizen-powered initiative to build open-source language datasets for nine of the country’s native languages: Urdu, Pashto, Punjabi, Sindhi, Balochi, Saraiki, Hindko, Brahui, and Kashmiri. The platform allows any Pakistani, anywhere in the world, to donate voice recordings or text in as little as two minutes, with no account or sign-up required.
What Is the AWAZ AI Platform?
Founded by Junaid Ahmad, AWAZ (آواز) addresses a well-documented gap in natural language processing technology: the near-total absence of quality training data for Pakistan’s native languages. Global AI systems have advanced rapidly for English and a handful of widely spoken languages, but the languages used daily by hundreds of millions of Pakistanis have remained largely invisible to machine learning models.
The AWAZ AI platform collects both voice recordings and written text, accepting input in native scripts as well as Roman transliteration, whichever contributors find more comfortable. All data collected will be made openly available to researchers, developers, and institutions working to build AI tools for Pakistani languages.
Why Pakistani Languages Have Been Left Behind
The core problem is straightforward: AI systems cannot serve speakers of a language if they have never been trained on sufficient data from that language. Speech recognition, automated translation, text-to-speech, and conversational assistants all depend on large, high-quality datasets. For languages like Pashto, Sindhi, or Balochi, those datasets have barely existed.
According to UNESCO’s work on language inclusion in education and technology, low-resource languages face compounding disadvantages when digital tools fail to support them, limiting access to healthcare information, government services, and economic opportunity for native speakers.
Junaid Ahmad, the founder, put it directly: “For years, the world’s AI has been built on English and a handful of major languages, while the languages of Pakistan were left out of the conversation. AWAZ is our answer to that. Every voice recorded and every sentence donated becomes the data that teaches machines to listen, read, and speak the way we do. Our languages are rich, ancient, and alive. It is time they earned their place in the global AI ecosystem, not as an afterthought, but as part of it.”
Key Features of the AWAZ AI Platform
The platform has been designed to remove as many barriers to participation as possible. Key features include:
- Support for 9 Pakistani languages: Urdu, Pashto, Punjabi, Sindhi, Balochi, Saraiki, Hindko, Brahui, and Kashmiri
- Contributions take as little as two minutes
- Both voice recordings and text donations are accepted
- No sign-up or account required
- Input accepted in native script or Roman/transliterated script
- All resulting datasets will be released as open data for researchers and developers
The Broader Impact: From Data to Real-World AI Tools
The AWAZ AI platform is not building end-user applications directly. Instead, it is creating the raw foundation that makes such applications possible. Once sufficient voice and text data exists for these languages, developers can train speech recognition models, build voice assistants, create automated translation services, and develop language tools that could meaningfully change how Pakistanis interact with technology.
The potential applications span multiple sectors. In healthcare, a Pashto-speaking patient in a rural area could interact with a medical information system in their own language. In education, Sindhi- or Saraiki-language learning tools could become viable for the first time. In government services, automated systems could finally serve citizens who are not fluent in English or even standard Urdu.
This positions AWAZ as both a technology project and a digital inclusion effort, one that treats language preservation and AI development as complementary goals rather than separate concerns.
How to Contribute to AWAZ
Contributions to the AWAZ AI platform are open to anyone. Pakistanis in the country and abroad can visit awazdata.com to record their voice or donate text in any supported language. The process requires no registration, and participants can contribute in native script or Roman script depending on their preference.
The initiative is structured as a collective national effort in which the value of the dataset grows with every additional contributor. The more voices and text samples collected, the more useful the resulting data becomes for training robust AI models.
A Growing Focus on Low-Resource Language AI
AWAZ joins a small but growing number of global initiatives focused on low-resource language AI. Projects like Mozilla Common Voice have demonstrated that community-driven data collection can produce usable datasets for underrepresented languages. The AWAZ AI platform applies a similar model specifically to Pakistan’s linguistic landscape, which includes some of the most widely spoken yet digitally underserved languages in South Asia.
For Pakistan’s technology sector, the launch represents a practical step toward building an AI ecosystem that reflects the country’s actual population, rather than one that works only for English-speaking users.