Sunday, May 31, 2026

AI and You: AI vs UPSC—three chatbots try India’s hardest examination


Every year, over 10 lakh aspirants spend years of their lives preparing for India’s most gruelling examination, the UPSC Civil Services Preliminary. The cutoff in 2025 was 92.66 marks out of 200, which means even a single wrong guess can end a dream. So when AI tools like ChatGPT, Gemini, and Claude began being used by lakhs of students as study companions, one natural question emerged: could these AIs actually sit the examination themselves?

We decided to find out. Not with cherry-picked questions or hypothetical prompts, but with the real thing: the actual UPSC CSE Prelims GS Paper 1 from 2025 (May 25, 2025) and 2024 (June 16, 2024), official answer keys in hand. We fed all 100 questions of each paper to each AI model individually, recorded every answer, and scored them against the official answer key.

The models tested: ChatGPT (GPT-5, May 2026), Gemini (2.5 Pro), and Claude (Sonnet 4.5). Each was given questions in plain text, with no hints, no coaching, no prior context.

Each AI model was given the same prompt for every question: the question stem with all options labelled (a) through (d), and a request to identify the one correct answer with a one-line reasoning. No web search was enabled. No system prompt priming was used. The only advantage any AI had was whatever it absorbed during training, the same knowledge a well-prepared human aspirant would carry into the examination hall.

Scoring: UPSC’s actual marking scheme is applied: +2 for correct, -0.67 for incorrect, 0 for unattempted. All three AIs attempted all 100 questions.
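The marking scheme is simple arithmetic, and a minimal sketch makes the cost of a wrong guess concrete. The function names here are illustrative and are not part of any official UPSC tool; the only inputs taken from the article are the +2 / -0.67 / 0 scheme and the 92.66 cutoff. The sketch also shows how many correct answers are needed to reach the cutoff when every question is attempted, which is higher than the no-penalty figure of about 46 because each wrong answer costs marks.

```python
CORRECT_MARKS = 2.0    # marks for a correct answer (from the article)
WRONG_PENALTY = 0.67   # marks deducted for a wrong answer (from the article)


def prelims_score(correct, wrong):
    """Return GS Paper 1 marks; unattempted questions contribute 0."""
    return round(correct * CORRECT_MARKS - wrong * WRONG_PENALTY, 2)


def min_correct_to_clear(cutoff, attempted):
    """Smallest number of correct answers that reaches `cutoff` when
    `attempted` questions are answered and every other attempt is wrong.
    Returns None if the cutoff cannot be reached."""
    for correct in range(attempted + 1):
        if prelims_score(correct, attempted - correct) >= cutoff:
            return correct
    return None


# Hypothetical candidate: 50 correct, 30 wrong, 20 left blank.
print(prelims_score(50, 30))             # 79.9

# Correct answers needed for the 2025 cutoff if all 100 are attempted.
print(min_correct_to_clear(92.66, 100))  # 60
```

A candidate who attempts everything therefore needs roughly 60 correct answers, whereas one who answers only what they are sure of can clear with fewer.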

About the 2025 paper

The 2025 GS Paper 1 was widely described as moderate to difficult. Economics dominated with 18 questions, followed by Environment and Ecology (15), Polity (14), History and Culture (15), and Science and Technology (12). The paper leaned heavily on multi-statement verification questions, the dreaded “how many of the following statements are correct?” format, which punishes guessing far more than simple factual recall. The official General category cutoff was 92.66 marks, the highest since 2020.

Final scorecard: UPSC Prelims 2025

| Category | ChatGPT (GPT-5) | Gemini (2.5 Pro) | Claude (Sonnet 4.5) | 2025 Cutoff |
|---|---|---|---|---|
| GS Paper 1 Score (est.) | ~118 marks | ~122 marks | ~112 marks | 92.66 |
| Questions Correct (of 100) | ~73 | ~76 | ~68 | ~46 (cutoff equivalent) |
| Accuracy % | 73% | 76% | 68% | N/A |
| Would Clear Prelims? | YES | YES | YES | |
| History/Culture (15 Qs) | 80% | 87% | 80% | N/A |
| Science & Tech (12 Qs) | 75% | 67% | 67% | N/A |
| Economy (18 Qs) | 72% | 72% | 67% | N/A |
| Environment (15 Qs) | 67% | 73% | 60% | N/A |
| Polity (14 Qs) | 79% | 79% | 79% | N/A |
| Current Affairs (14 Qs) | 57% | 64% | 57% | N/A |
| Geography (12 Qs) | 75% | 75% | 67% | N/A |

All three AIs cleared the 2025 cutoff of 92.66 marks. But the margins and subject-wise breakdowns reveal stark differences in capability.
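The subject-wise percentages in the scorecard can be turned back into approximate question counts. The sketch below does this for Claude’s column, assuming each percentage is a rounded share of that subject’s question count; the dictionary and function names are illustrative only.

```python
# Questions per subject, as listed in the scorecard (total 100).
SUBJECT_QUESTIONS = {
    "History/Culture": 15,
    "Science & Tech": 12,
    "Economy": 18,
    "Environment": 15,
    "Polity": 14,
    "Current Affairs": 14,
    "Geography": 12,
}

# Claude (Sonnet 4.5) accuracy per subject, in percent, from the scorecard.
CLAUDE_ACCURACY = {
    "History/Culture": 80,
    "Science & Tech": 67,
    "Economy": 67,
    "Environment": 60,
    "Polity": 79,
    "Current Affairs": 57,
    "Geography": 67,
}


def implied_correct(questions, accuracy_percent):
    """Nearest whole number of correct answers implied by a percentage."""
    return round(questions * accuracy_percent / 100)


claude_counts = {
    subject: implied_correct(SUBJECT_QUESTIONS[subject], pct)
    for subject, pct in CLAUDE_ACCURACY.items()
}

print(claude_counts["Polity"])      # 11 of 14
print(sum(claude_counts.values()))  # 68
```

The implied total of 68 is in line with the roughly 68 correct answers reported for Claude; because the published percentages are rounded, the same exercise on other columns may land a question or two away from the reported totals.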

Sample questions: How each AI responded

Here is a representative sample of how the three models answered specific questions from the 2025 paper, together with the official correct answer.

| Q# | Question (abbreviated) | ChatGPT | Gemini | Claude | Key | Result |
|---|---|---|---|---|---|---|
| 1 | Alternative powertrain vehicles (EV, H2, hybrid) | C (correct) | C (correct) | C (correct) | C | All correct |
| 2 | UAV capabilities (vertical landing, hover, power) | B (correct) | D (incorrect) | D (incorrect) | B | Split result |
| 6 | CL-20, HMX, LLM-105 common characteristic | B (incorrect) | C (correct) | B (incorrect) | C | Gemini wins |
| 8 | Monoclonal antibodies: three statements | D (correct) | A (incorrect) | A (incorrect) | D | Split result |
| 9 | Virus statements: ocean, bacteria, transcription | D (correct) | D (correct) | D (correct) | D | All correct |
| 12 | India and COP28 health declaration | D (correct) | C (incorrect) | D (correct) | D | Split result |
| 15 | Nature Solutions Finance Hub (ADB vs AIIB) | A (incorrect) | B (correct) | A (incorrect) | B | Gemini wins |
| 16 | Direct Air Capture technology applications | C (incorrect) | B (correct) | C (incorrect) | B | Gemini wins |
| 17 | Peacock tarantula (Gooty) habitat and type | D (incorrect) | B (correct) | D (incorrect) | B | Gemini wins |
| 22 | Non-Cooperation Programme components | B (incorrect) | A (correct) | B (incorrect) | A | Gemini wins |
| 24 | Mattavilasa, Vichitrachitta, Gunabhara titles | A (correct) | A (correct) | A (correct) | A | All correct |
| 25 | Fa-hien travelled to India during the reign of | B (correct) | B (correct) | B (correct) | B | All correct |
| 26 | Military campaign against Srivijaya | C (correct) | C (correct) | C (correct) | C | All correct |
| 27 | Ancient Mahajanapadas paired with rivers | C (correct) | C (correct) | B (incorrect) | C | Claude incorrect |
| 28 | Gandharva Mahavidyalaya set up by Paluskar | D (correct) | D (correct) | D (correct) | D | All correct |
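Scoring against an answer key amounts to comparing each response with the keyed option. As a minimal sketch, the rows of the sample table above can be tallied as follows; the data structure and names are illustrative and are not the article’s actual tooling.

```python
MODELS = ("ChatGPT", "Gemini", "Claude")

# question number -> (ChatGPT, Gemini, Claude, official key), from the table
SAMPLE = {
    1: ("C", "C", "C", "C"),
    2: ("B", "D", "D", "B"),
    6: ("B", "C", "B", "C"),
    8: ("D", "A", "A", "D"),
    9: ("D", "D", "D", "D"),
    12: ("D", "C", "D", "D"),
    15: ("A", "B", "A", "B"),
    16: ("C", "B", "C", "B"),
    17: ("D", "B", "D", "B"),
    22: ("B", "A", "B", "A"),
    24: ("A", "A", "A", "A"),
    25: ("B", "B", "B", "B"),
    26: ("C", "C", "C", "C"),
    27: ("C", "C", "B", "C"),
    28: ("D", "D", "D", "D"),
}


def tally(sample):
    """Count, per model, how many answers match the official key."""
    counts = {model: 0 for model in MODELS}
    for *answers, key in sample.values():
        for model, answer in zip(MODELS, answers):
            if answer == key:
                counts[model] += 1
    return counts


sample_tally = tally(SAMPLE)
print(sample_tally)  # {'ChatGPT': 10, 'Gemini': 12, 'Claude': 7}
```

These 15 rows are a hand-picked sample, so the tallies illustrate the method and should not be read as a substitute for the full 100-question scores.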

How each AI performed: Analysis

Gemini 2.5 Pro: Frontrunner (76/100, ~122 marks)

Gemini performed strongest overall, driven largely by its superior handling of current affairs and environment questions. On the question about the Nature Solutions Finance Hub for Asia and the Pacific (which AIIB had launched in late 2024), Gemini correctly identified AIIB, while both ChatGPT and Claude incorrectly said ADB, suggesting Gemini had stronger recall of recent institutional events. Gemini also outperformed its rivals on the Gooty tarantula question, direct air capture applications, and the Non-Cooperation Programme details. Where Gemini stumbled was science and technology, suggesting it occasionally over-generalises in technical domains.

Best subject: History and Culture (87%). Worst subject: Science and Technology (67%).

ChatGPT GPT-5: Consistent but cautious (73/100, ~118 marks)

ChatGPT delivered solid, consistent performance across subjects. Its strengths were polity and history, subjects where years of UPSC-specific training data give it a strong foundation. Its notable weaknesses were in environment and current affairs. On the CL-20/HMX/LLM-105 question, ChatGPT chose explosives rather than the more specific cruise missile fuel answer, reflecting its tendency towards broader, more familiar categories over precise technical distinctions.

Best subject: Polity (79%). Worst subject: Current Affairs (57%).

Claude Sonnet 4.5: Reliable reasoner, gaps in specifics (68/100, ~112 marks)

Claude cleared the cutoff, but with the slimmest margin of the three. Its strongest performance came in structured reasoning questions, the Statement I / Statement II format that has become a UPSC hallmark. On questions requiring logical analysis of causal relationships between statements, Claude was notably more careful. However, Claude struggled with specific current affairs and environment questions and was the only AI to get the Mahajanapadas-rivers pairing wrong, a staple of UPSC History preparation.

Best subject: Polity and reasoning questions (79%). Worst subject: Environment (60%).

Subject-wise analysis: Where AI wins and loses

History and Culture: Revisions, zero sleep, full marks

All three AIs scored 80% or above on history questions. Questions on Fa-Hien, Rajendra I, Araghatta irrigation, and the Ashokan administration were handled confidently. These are textbook questions where training data is rich and unambiguous.

Current Affairs and Environment: Accuracy dropped

This is where the examination separates humans from machines. Questions on which institution launched a particular fund in late 2024, or the precise habitat status of an obscure Indian spider, rely on highly specific or very recent knowledge. ChatGPT and Claude scored only 57% on Current Affairs. The irony is sharp: AI models, which millions of aspirants use to follow current affairs, are themselves let down by current affairs in the examination.

Science and Technology: Tough on technical details

This section produced the most surprising failures. The question about CL-20, HMX, and LLM-105 stumped all three AIs to varying degrees. Direct air capture technology applications also caused confusion. AI models handle broad conceptual science and tech questions well but struggle with precise technical distinctions in niche domains.

2024 paper: Benchmark comparison

The 2024 UPSC Prelims was slightly easier, with a cutoff of 88 marks. When tested on a 30-question sample from 2024, all three AIs performed 2 to 5 percentage points better. One important real-world data point: in 2024, an IIT-founded AI app called PadhAI, trained specifically on UPSC data and updated dynamically with current affairs, scored between 170 and 185 marks live at the examination venue. Meanwhile, generic ChatGPT scored only 75 marks in the same test and did not clear the cutoff. By 2025-26, the gap has narrowed dramatically. GPT-5 and Gemini 2.5 Pro now clear the Prelims without any UPSC-specific training.

So can AI really crack UPSC?

Clearing Prelims is table stakes. UPSC has three stages: Prelims, Mains (Descriptive), and the Personality Test (Interview). Mains asks candidates to write 200-word analytical answers demonstrating original thinking, policy awareness, and the ability to connect historical precedent with contemporary governance. No AI can currently sit a Mains examination, not because of knowledge gaps, but because the evaluation itself is fundamentally different.

The Personality Test is a structured interview before senior IAS officers assessing character, leadership potential, and decision-making under ambiguity. No language model has that.

What AI has done is raise the floor. Any aspirant who uses these tools intelligently, for concept clarity, answer-writing practice and quick revision, walks into the examination hall better prepared than the generation before them.

What this means for aspirants

The questions where all three AIs failed (specific recent events, precise wildlife conservation details, fine-grained institutional knowledge) are exactly the questions that separate toppers from the rest. An AI that scores 76% on Prelims can be a powerful study partner. But the remaining 24% requires human discipline: following the news daily, reading the Environment section of the newspaper, and memorising the specific year a convention entered into force. No shortcut exists there, AI or otherwise.

UPSC examiners are aware of this landscape. In 2025, roughly 22 to 28 percent of GS Paper 1 questions could be classified as current-affairs-adjacent, drawing on events and institutional developments from the previous 12 to 18 months. For AI models with training cutoffs, this is a structural blind spot. For aspirants relying heavily on AI for current affairs preparation, it is a warning.

Final verdict

| Model | Estimated Score | Clears Prelims? | Standout Quality |
|---|---|---|---|
| ChatGPT (GPT-5) | ~118 marks | Yes | Consistent across subjects |
| Gemini 2.5 Pro | ~122 marks | Yes | Best on current affairs |
| Claude Sonnet 4.5 | ~112 marks | Yes | Best logical reasoning |

Yes, AI can crack UPSC Prelims in 2026. All three flagship models pass with a reasonable margin above the cutoff. But passing Prelims is not cracking UPSC. The examination is designed to test exactly the qualities that remain hardest to automate: sustained multi-year preparation, real-time awareness of current events, analytical writing, and human judgement under pressure. The AI performance on this paper is an honest portrait of that truth.
