OpenAI · Article · 3 November 2025

Introducing IndQA

A new benchmark for evaluating AI systems on Indian culture and languages.

Our mission is to make AGI benefit all of humanity. If AI is going to be useful for everyone, it needs to work well across languages and cultures. About 80 percent of people worldwide do not speak English as their primary language, yet most existing benchmarks that measure non-English language capabilities fall short.

Existing multilingual benchmarks like MMMLU are now saturated (top models cluster near high scores), which makes them less useful for measuring real progress. In addition, current benchmarks mostly focus on translation or multiple-choice tasks. They don’t adequately capture what really matters for evaluating an AI system’s language capabilities: understanding context, culture, history, and the things that matter to people where they live.

That’s why we built IndQA, a new benchmark designed to evaluate how well AI models understand and reason about questions that matter in Indian languages, across a wide range of cultural domains. While our aim is to create similar benchmarks for other languages and regions, India is an obvious starting point. India has about a billion people who don’t use English as their primary language, 22 official languages (including at least seven with over 50 million speakers), and is ChatGPT’s second largest market.

This work is part of our ongoing commitment to improve our products and tools for Indian users, and to make our technology more accessible throughout the country.

How it works

IndQA evaluates knowledge and reasoning about Indian culture and everyday life in Indian languages. It spans 2,278 questions across 12 languages and 10 cultural domains, created in partnership with 261 domain experts from across India. Unlike existing benchmarks like MMMLU and MGSM, it is designed to probe culturally nuanced, reasoning-heavy tasks that existing evaluations struggle to capture.

IndQA covers a broad range of culturally relevant topics, such as Architecture & Design, Arts & Culture, Everyday Life, Food & Cuisine, History, Law & Ethics, Literature & Linguistics, Media & Entertainment, Religion & Spirituality, and Sports & Recreation—with items written natively in Bengali, English, Hindi, Hinglish, Kannada, Marathi, Odia, Telugu, Gujarati, Malayalam, Punjabi, and Tamil. _Note: We specifically added Hinglish given the prevalence of code-switching in conversations._

Each datapoint includes a culturally grounded prompt in an Indian language, an English translation for auditability, rubric criteria for grading, and an ideal answer that reflects expert expectations.
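The four parts of a datapoint listed above could be represented roughly as follows. This is a minimal sketch: the class names, field names, and placeholder values are hypothetical and are not taken from any released IndQA schema.

```python
from dataclasses import dataclass, field


@dataclass
class RubricCriterion:
    """One expert-written grading criterion (field names are hypothetical)."""
    description: str  # what an ideal answer should include or avoid
    points: float     # weight reflecting the criterion's importance


@dataclass
class IndQAItem:
    """One benchmark datapoint; this layout is an assumption for illustration."""
    language: str             # e.g. "Bengali"
    domain: str               # e.g. "Food & Cuisine"
    prompt: str               # culturally grounded prompt in the Indian language
    english_translation: str  # included for auditability
    ideal_answer: str         # reflects expert expectations
    rubric: list[RubricCriterion] = field(default_factory=list)


# Placeholder values only; no real benchmark content is reproduced here.
item = IndQAItem(
    language="Bengali",
    domain="Food & Cuisine",
    prompt="(prompt text in Bengali)",
    english_translation="(English translation of the prompt)",
    ideal_answer="(expert-written ideal answer)",
    rubric=[RubricCriterion("Placeholder criterion", 5.0)],
)
```

Keeping the translation next to the native prompt lets a reviewer who does not read the language still audit the rubric.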

IndQA uses a rubric-based approach. Each response is graded against criteria written by domain experts for that specific question. The criteria spell out what an ideal answer should include or avoid, and each one is given a weighted point value based on its importance. A model-based grader checks whether each criterion is met. The final score is the sum of the points for criteria satisfied out of the total possible.
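The scoring rule described above can be sketched in a few lines. The function name and the `(points, met)` input format are assumptions made for illustration; the `met` booleans stand in for the verdicts of the model-based grader, and the sketch assumes every criterion carries a positive point value.

```python
def rubric_score(criteria):
    """Return the fraction of rubric points earned.

    criteria: list of (points, met) pairs, where `met` is the grader's
    verdict for that criterion (hypothetical input format).
    """
    total = sum(points for points, _ in criteria)
    if total <= 0:
        raise ValueError("rubric must have a positive total point value")
    earned = sum(points for points, met in criteria if met)
    return earned / total


# Three criteria worth 5, 3, and 2 points; the response meets the first and last.
example = [(5, True), (3, False), (2, True)]
score = rubric_score(example)  # 7 of 10 points earned
```

Because the criteria are weighted, missing one high-value criterion costs more than missing several minor ones.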

How we built IndQA

* Expert‑authored questions. We worked with partners to find experts in India across 10 different domains. They drafted difficult, reasoning‑focused prompts tied to their regions and specialties. These experts are native‑level speakers of the relevant language (and English) and bring deep subject expertise.

* Adversarial filtering. Each question was tested against OpenAI’s strongest models at the time the questions were created: GPT‑4o, OpenAI o3, GPT‑4.5, and (partially, post public launch) GPT‑5. We kept only those questions where a majority of these models failed to produce acceptable answers, preserving headroom for progress.

* Detailed criteria. Along with every question, domain experts provided criteria used to grade the model response, similar to an exam rubric for an essay question. These criteria are used to grade responses from candidate models.
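The adversarial filtering step in the list above amounts to a majority-failure test per question. The sketch below is illustrative only: the function names, question IDs, and model labels are hypothetical, and reading "a majority" as a strict majority (more than half) is an assumption.

```python
def majority_failed(verdicts):
    """verdicts: booleans, True where a model's answer was judged acceptable."""
    verdicts = list(verdicts)
    failures = sum(1 for acceptable in verdicts if not acceptable)
    # Strict majority: more than half of the tested models must fail.
    return failures > len(verdicts) / 2


def filter_questions(results):
    """Keep the IDs of questions that most tested models failed.

    results: {question_id: {model_name: acceptable}} (hypothetical layout).
    """
    return [
        qid for qid, per_model in results.items()
        if majority_failed(per_model.values())
    ]


results = {
    "q1": {"model_a": False, "model_b": False, "model_c": True},  # 2 of 3 fail
    "q2": {"model_a": True, "model_b": True, "model_c": False},   # 1 of 3 fails
    "q3": {"model_a": False, "model_b": True},                    # tie, not a majority
}
kept = filter_questions(results)  # only "q1" is kept
```

A filter of this kind selects questions that are hard for the specific models tested, so the resulting set is biased against those models.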

Example questions

Language: Bengali

Domain: Literature and linguistics

Prompt

‘দণ্ডক থেকে মরিচঝাঁপি’ উপন্যাসের লেখক নিম্নবর্ণের পুরুষ ও নারীদের দণ্ডকারন্যে পুনর্বাসন পরবর্তী জীবন কিভাবে দেখিয়েছেন? দণ্ডকারণ্যে পুনর্বাসন কি সরকারী উদাসীনতার ফল? পরিবর্তিত প্রাকৃতিক পরিবেশের সাথে উদ্বাস্তুরা কিভাবে মানিয়ে নিয়েছিল?

English Translation

How did the writer of Bengali novel ‘Dandak Theke Marichjhanpi’ depict the post-rehabilitation lives of lower caste men and women? Was the rehabilitation in Dandakaranya a result of governmental indifference? What was its relation with the new natural landscapes?

Domain: Food and cuisine

Prompt

কোন পরিপ্রেক্ষিতে উনিশ শতকের শেষ দিক থেকে রান্নার বইগুলো বেরচ্ছিল ? প্রথম বাংলা রান্নার বইটির সাথে বিপ্রদাস মুখোপাধ্যায় রচিত বইটির পার্থক্য কোথায় ? বিপ্রদাসের উদ্যোগে প্রকাশিত পত্রিকাটি চলেছিল কতদিন ? বিপ্রদাস ও প্রজ্ঞা সুন্দরীর লেখা অনুসরণ করে দিঘাপতিয়া থেকে কোন বইটি বেরিয়েছিল ?

English Translation

In what context were cookbooks published from the end of the 19th century? What is the difference between the first Bengali cookbook and the book written by Bipradas Mukherjee? How long did the magazine published by Bipradas run? Which book was published by Dighapatiya following the writings of Bipradas and Pragya Sundari?

Improvements over time

We use IndQA to evaluate how recent frontier models perform and to chart progress over the last couple of years. With IndQA we can see that OpenAI’s models have improved significantly over time on Indian languages (with caveats), but still have substantial room for improvement. We look forward to improving performance and sharing results for future models.

We also stratify performance on IndQA by Language and Domain below, comparing GPT‑5 Thinking High to other frontier models.

Caveats

Because questions are _not identical_ across languages, IndQA is not a language leaderboard; cross‑language scores shouldn’t be interpreted as direct comparisons of language ability. Instead, we plan to use IndQA to measure _improvement over time_ within a model family or configuration.

Additionally, because questions were filtered to those GPT‑4o, OpenAI o3, GPT‑4.5, and (post public launch) GPT‑5 could not answer sufficiently, question selection is adversarial against these models. This potentially confounds the relative performance of GPT‑5, and could disadvantage all OpenAI models compared to non-OpenAI models.

The experts behind IndQA

We’re grateful to the 261 Indian experts (journalists, linguists, scholars, artists, and industry practitioners) who authored and reviewed questions for IndQA. A few examples of the experts we worked with include:

* A Nandi Award-winning Telugu actor and screenwriter with over 750 films

* A Marathi journalist and editor at Tarun Bharat

* A scholar of Kannada linguistics and dictionary editor

* An International Chess Grandmaster who coaches top-100 chess players

* A Tamil writer, poet, and cultural activist advocating for social justice, caste equity, and literary freedom

* An award-winning Punjabi music composer

* A Gujarati heritage curator and conservation specialist

* A professor of architecture, focusing on Odishan temples

Next steps

We hope the release of IndQA will inform and inspire new benchmark creation from the research community. IndQA-style questions are especially valuable in languages or cultural domains that are poorly covered by existing AI benchmarks. Creating benchmarks similar to IndQA can help AI research labs learn more about the languages and domains models struggle with today, and provide a north star for future improvements.

Author

OpenAI
