OpenAI, the San Francisco-based AI research and deployment firm that created ChatGPT, introduced IndQA, a new benchmark for evaluating AI systems on Indian culture and languages, here on Tuesday.
The company said its mission was to make AGI (Artificial General Intelligence) benefit all of humanity, across languages and cultures. Some 80% of people worldwide do not speak English as their primary language, and yet most existing benchmarks that measure non-English language capabilities fall short, the firm noted.
Existing multilingual benchmarks such as MMMLU are now saturated, which makes them less useful for measuring real progress. In addition, current benchmarks mostly focus on translation or multiple-choice tasks. They do not adequately capture what really matters for evaluating an AI system’s language capabilities: understanding context, culture, history, and the things that matter to people where they live.
That is why the company built IndQA, a new benchmark designed to evaluate how well AI models understand and reason about questions that matter in Indian languages, across a wide range of cultural domains.
“Today we are rolling out IndQA. Built in collaboration with 261 experts across 12 languages, IndQA fills a key gap by enabling fair and rigorous evaluation that reflects India’s cultural and linguistic diversity,” said Srinivas Narayanan, CTO, B2B Applications, OpenAI.
According to Mr. Narayanan, the benchmark will help all AI models perform better in languages and contexts that are currently underrepresented in global datasets.
While OpenAI aims to create similar benchmarks for other languages and regions, India, where about a billion people do not speak English as their primary language and which has 22 official languages, was an obvious starting point for the company.
According to company officials, this work is part of OpenAI’s ongoing commitment to improve products and tools for Indian users, and to make its technology more accessible throughout the country for a wide range of users, from students and farmers to educators and others.
IndQA evaluates knowledge and reasoning about Indian culture and everyday life in Indian languages. It spans 2,278 questions across 12 languages and 10 cultural domains, created in partnership with 261 domain experts from across India, as per OpenAI.
“Unlike existing benchmarks like MMMLU and MGSM, it is designed to probe culturally nuanced, reasoning-heavy tasks that existing evaluations struggle to capture,” the firm said in a blog post.
IndQA covers a wide range of culturally relevant topics, such as Architecture & Design, Arts & Culture, Everyday Life, Food & Cuisine, History, Law & Ethics, Literature & Linguistics, Media & Entertainment, Religion & Spirituality, and Sports & Recreation, with items written natively in Bengali, English, Hindi, Hinglish (given the prevalence of code-switching in conversations), Kannada, Marathi, Odia, Telugu, Gujarati, Malayalam, Punjabi, and Tamil.
IndQA uses a rubric-based approach: each datapoint includes a culturally grounded prompt in an Indian language, an English translation for auditability, rubric criteria for grading, and an ideal answer that reflects expert expectations.
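The datapoint layout described above can be sketched as a small Python data structure. This is an illustration only: the class and field names (`RubricCriterion`, `IndQAItem`, `points`, `rubric_score`) are hypothetical, and the point-weighted scoring rule is an assumption, since the article does not say how rubric criteria are combined into a score or how each criterion is judged.

```python
from dataclasses import dataclass, field


@dataclass
class RubricCriterion:
    description: str     # what an ideal response should include or avoid
    points: float = 1.0  # hypothetical weight; not specified in the article


@dataclass
class IndQAItem:
    prompt: str               # culturally grounded prompt in an Indian language
    english_translation: str  # kept for auditability
    ideal_answer: str         # reflects expert expectations
    language: str
    domain: str
    rubric: list[RubricCriterion] = field(default_factory=list)


def rubric_score(item: IndQAItem, criteria_met: list[bool]) -> float:
    """Return the fraction of rubric points earned (assumed scoring rule)."""
    if len(criteria_met) != len(item.rubric):
        raise ValueError("need one met/unmet judgment per rubric criterion")
    total = sum(c.points for c in item.rubric)
    if total == 0:
        return 0.0
    earned = sum(c.points for c, met in zip(item.rubric, criteria_met) if met)
    return earned / total
```

Under this assumed rule, a response that satisfies criteria worth 3 of a possible 4 points scores 0.75. The met/unmet judgments themselves would come from a human or model grader, which is outside the scope of this sketch.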
Experts (native‑level speakers of the relevant language and English, with deep subject expertise) from 10 different domains in India drafted difficult, reasoning‑focused prompts tied to their regions and specialties. Each question was also tested against OpenAI’s strongest models at the time of its creation: GPT‑4o, OpenAI o3, GPT‑4.5, and (partially, after its public launch) GPT‑5.
As a caveat, the firm said that because the questions are not identical across languages, IndQA is not a language leaderboard; cross‑language scores should not be interpreted as direct comparisons of language ability. Instead, IndQA is meant to measure improvement over time within a model family or configuration.
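The caveat above suggests comparing scores only within a language, across versions of the same model family. A minimal sketch of that bookkeeping follows, assuming per-language scores are already available as dictionaries; the function name is hypothetical and the numbers in the usage note are placeholders, not reported results.

```python
def per_language_improvement(
    older: dict[str, float], newer: dict[str, float]
) -> dict[str, float]:
    """Score change per language between two versions of one model family.

    Only languages present in both runs are compared, and no cross-language
    ranking is produced, since the questions differ between languages.
    """
    shared = sorted(older.keys() & newer.keys())
    return {lang: newer[lang] - older[lang] for lang in shared}
```

For example, `per_language_improvement({"Hindi": 0.25, "Tamil": 0.5}, {"Hindi": 0.5, "Tamil": 0.75})` reports a change of 0.25 for each language, without implying that the model is better at one language than the other.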
Also speaking at the media conference, Mr. Narayanan said, “India can be a beacon of how AI can be used for social good, including in education, health and farming.”
He further said the company has 4-5 million developers globally. “We are really propping up the developer ecosystems so that they can do more with AI. We continue to improve our models, pushing the frontiers of technology to help enterprises have a better agentic future.”




