African Languages for AI: Bridging the Digital Divide

Artificial Intelligence (AI) tools such as ChatGPT, Siri, and Google Assistant are developed primarily in the Global North and trained mostly on data in English, Chinese, and major European languages. As a result, African languages are vastly underrepresented in digital systems and AI technologies. This exclusion not only limits access but also widens the global technology divide.

The African Next Voices Project

To address this issue, a group of African computer scientists, linguists, and researchers launched the African Next Voices Project, a large-scale initiative aimed at building the most comprehensive dataset of African languages for AI so far.

The project, funded primarily by the Bill & Melinda Gates Foundation with additional support from Meta, involves collaboration across universities and research organizations in Kenya, Nigeria, and South Africa.

Its goal is to ensure that AI systems can understand, process, and respond in African languages — making technology more inclusive, fair, and useful for millions of speakers.

Why Language Matters for AI

Language is not just a communication tool; it is deeply tied to culture, values, and identity. AI systems that don’t understand local languages cannot accurately interpret users’ intent, leading to mistranslations and misunderstandings.

As AI becomes increasingly integrated into education, healthcare, and agriculture, the lack of linguistic diversity in AI models excludes millions of Africans from the benefits of digital innovation.

When a language is missing from AI data, its speakers are effectively invisible to these systems.

Challenges in African Language Representation

The scarcity of digital resources for African languages stems from colonial histories that marginalized indigenous tongues in favor of European ones.

Key challenges include:

  • Lack of digitized text and speech data

  • Absence of basic tools such as dictionaries, glossaries, and spell-checkers

  • Diverse dialects and orthographic variations

  • Technical barriers, like limited access to African language keyboards and tokenizers

These factors result in AI models that perform poorly when interpreting African speech or text, often producing unsafe or inaccurate outputs.
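
One concrete form of the orthographic variation listed above is Unicode encoding: the same dotted or accented letter can be stored as a single precomposed code point or as a base letter plus a combining mark. The sketch below uses only Python's standard `unicodedata` module to show that two visually identical strings compare as different until they are normalized. The helper names (`normalize_text`, `describe`) and the sample letter are illustrative assumptions, not part of the African Next Voices pipeline.

```python
import unicodedata


def normalize_text(text: str, form: str = "NFC") -> str:
    """Return text in a single Unicode normalization form (NFC by default)."""
    return unicodedata.normalize(form, text)


def describe(text: str) -> dict:
    """Count code points and UTF-8 bytes, the units a byte-level tokenizer sees."""
    return {"code_points": len(text), "utf8_bytes": len(text.encode("utf-8"))}


# Illustrative sample: "e with dot below", a letter used in Yoruba orthography.
composed = "\u1eb9"      # one precomposed code point
decomposed = "e\u0323"   # base letter "e" + combining dot below

if __name__ == "__main__":
    print(composed == decomposed)  # False: the raw strings differ
    print(normalize_text(composed) == normalize_text(decomposed))  # True
    print(describe(composed), describe(decomposed))
```

Normalizing text to one form before building dictionaries, spell-checkers, or tokenizer vocabularies keeps such variants from being counted as different words.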

What the Project Is Doing

The African Next Voices Project focuses on collecting speech data for Automatic Speech Recognition (ASR), a core technology that converts spoken words into text.

Data collection includes:

  • Spontaneous and read speech

  • Conversations in multiple contexts (healthcare, agriculture, finance, etc.)

  • Participants from varied backgrounds (age, gender, education)

Ethical standards are strictly followed: every participant gives informed consent and receives fair compensation, while all recordings are checked for quality and linguistic accuracy.
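
The collection criteria above could be tracked with a per-recording metadata record. The following is a minimal sketch, not the project's actual schema: the `Recording` class, its field names, and the `is_release_ready` and `count_by_language` helpers are all hypothetical and only mirror the categories named in this article (speech type, context, speaker background, consent, compensation, and quality review).

```python
from dataclasses import dataclass


@dataclass
class Recording:
    """Hypothetical metadata for one speech recording (illustrative only)."""
    language: str            # e.g. "Kikuyu"
    speech_type: str         # "spontaneous" or "read"
    context: str             # e.g. "healthcare", "agriculture", "finance"
    speaker_age_group: str   # coarse band rather than an exact age
    speaker_gender: str
    speaker_education: str
    consent_given: bool      # informed consent was recorded
    compensated: bool        # speaker was paid for the contribution
    quality_checked: bool    # recording passed quality and linguistic review


def is_release_ready(rec: Recording) -> bool:
    """A recording qualifies only if every ethical and quality check passed."""
    valid_type = rec.speech_type in {"spontaneous", "read"}
    return valid_type and rec.consent_given and rec.compensated and rec.quality_checked


def count_by_language(recordings: list[Recording]) -> dict[str, int]:
    """Count release-ready recordings per language to spot coverage gaps."""
    counts: dict[str, int] = {}
    for rec in recordings:
        if is_release_ready(rec):
            counts[rec.language] = counts.get(rec.language, 0) + 1
    return counts
```

Keeping consent and compensation as explicit fields means a recording that lacks either is excluded by the same check that enforces quality, rather than relying on a separate manual step.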

Regional Highlights:

  • Kenya: Voice data in five languages: Dholuo, Maasai, and Kalenjin (Nilotic); Somali (Cushitic); and Kikuyu (Bantu)

  • Nigeria: Data Science Nigeria is recording Bambara, Hausa, Igbo, Nigerian Pidgin, and Yoruba

  • South Africa: The Data Science for Social Impact Lab is documenting isiZulu, isiXhosa, Sesotho, Sepedi, Setswana, isiNdebele, and Tshivenda

This work builds on earlier community-driven efforts such as the Masakhane Research Foundation, Mozilla Common Voice, EqualyzAI, and Lelapa AI, which together form a strong ecosystem for African AI development.

Practical Applications

Once developed, these datasets will serve many real-world uses:

  • Voice assistants that speak local languages

  • Captioning systems for local media content

  • Call-center tools that understand regional speech

  • Chatbots and education platforms in African languages

  • Cultural preservation through digital archiving

By enabling AI to function in African languages, these tools will empower local communities, enhance accessibility, and preserve linguistic heritage.

The Road Ahead

While the African Next Voices Project marks a major milestone, it is only the beginning. The next steps include:

  • Expanding to more African languages

  • Developing machine translation, grammar checkers, and summarization tools

  • Creating smaller, energy-efficient language models

  • Ensuring sustainability through open data access, training, and community involvement

The ultimate vision is a future where Africans can use AI naturally in isiZulu, Hausa, Kikuyu, or any other language they speak, without being forced to depend solely on English or French interfaces.

Conclusion

Language inclusion in AI is not just a technical goal; it is a social and cultural necessity. The African Next Voices Project represents a vital step toward inclusive, ethical, and locally relevant AI that reflects the diversity of human expression.