A new chatbot outperforms doctoral researchers at literature reviews, and could end up being the secret weapon science has been waiting for, writes Satyen K. Bordoloi
PhD students have it tough. No matter how bright they are, they must sink years not just into research but into a sea of thousands of scientific papers, oscillating between dreams and dread. Part of the job is the literature review: weeks spent meticulously combing through existing research for their dissertations.
But, often, they discover they missed something: a crucial paper, published in a journal they’ve never heard of, that challenges key claims of their own work. Without that paper, the research can never be complete. If only AI could help. It turns out AI can, and it can do so in minutes.
One of the biggest academic nightmares has thus got an AI upgrade, and it is slated to change how science is done forever.

The Rise of the Reading Machines
Researchers from the University of Washington and the Allen Institute for AI have released OpenScholar, a specialised AI system that not only reads scientific papers but also synthesises them with the kind of thoroughness that would make any dissertation committee whoop with delight. And here’s the beauty: in head-to-head evaluations, expert scientists said they preferred OpenScholar’s responses to those written by actual PhD students and postdocs 51% to 70% of the time.
This is hard to fathom: what would take years of research and weeks or months of writing is being done within minutes. AI is no longer confined to simple, pattern-based jobs; it is coming for the ivory tower itself, armed with 45 million open-access papers and a retrieval system straight from the future.

AI as the Ultimate Research Assistant
This would naturally prompt many to pen angry manifestos about unemployment caused by the march of technology. Yet this is not a threat but an opportunity. It is not a replacement for human brilliance, but an amplification of it. Think of it as giving every scientist a research assistant with perfect memory and the ability to cross-reference findings across multiple disparate disciplines, and through millions of papers, with just a click of a button.
Think of what this could accomplish. Connections once thought impossible would surface. New links would be forged between disparate ideas and disciplines. The shot in the arm this would give science is unparalleled.
This system operates by using what researchers call “retrieval-augmented language models”: essentially, it’s like having a librarian who has memorised every book in existence and can synthesise answers from multiple sources simultaneously. Unlike standard LLM-based chatbots that fabricate non-existent citations at an unacceptably high rate, OpenScholar achieves citation accuracy on par with human experts. This means that when OpenScholar tells you something, it can actually point to the source and show that the claim exists in the literature.
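The retrieval-augmented pattern itself is easy to sketch. Below is a deliberately toy Python illustration of the idea, using a bag-of-words stand-in for real embeddings; the function names and scoring scheme are assumptions for illustration and do not reflect OpenScholar's actual code:

```python
# Toy retrieval-augmented generation (RAG): find the passages most
# relevant to a query, then ground the model's answer in them.

def embed(text: str) -> set[str]:
    """Stand-in 'embedding': a bag of lowercase words.
    Real systems use dense vectors from a trained encoder."""
    return set(text.lower().split())

def retrieve(query: str, passages: list[str], k: int = 2) -> list[str]:
    """Rank passages by word overlap with the query (a crude proxy
    for cosine similarity over embeddings) and return the top k."""
    q = embed(query)
    ranked = sorted(passages, key=lambda p: len(q & embed(p)), reverse=True)
    return ranked[:k]

def build_prompt(query: str, passages: list[str]) -> str:
    """Compose a prompt that forces the model to answer from the
    retrieved sources. A real system would send this to an LLM."""
    context = "\n".join(
        f"[{i + 1}] {p}" for i, p in enumerate(retrieve(query, passages))
    )
    return f"Answer using only these sources:\n{context}\nQuestion: {query}"

corpus = [
    "Protein folding can be predicted with deep learning models.",
    "Quantum computing may accelerate molecular simulation.",
    "Medieval trade routes shaped early European cities.",
]
prompt = build_prompt("deep learning for protein folding", corpus)
```

The point of the pattern is that irrelevant passages (the medieval-history one here) never reach the model, so the answer stays anchored to sources that can be cited.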

How to Make an AI That Doesn’t BS You
What makes OpenScholar different from LLMs that spout confident nonsense are three key innovations, deceptively simple yet devastatingly effective.
First is access to a massive, specialised database called OpenScholar DataStore (OSDS), which has over 45 million papers and 236 million passage embeddings, and is often called the largest open scientific data store available. Think of it as giving AI keys to the world’s best research library, except it is instantly searchable and the doors never shut.
The second key differentiator is that OpenScholar doesn’t just retrieve information; it iteratively refines its answers through a “self-feedback loop.” It is an AI equivalent of a human scientist reading their draft, saying “this could be better,” and actually making it better, multiple times. The system generates an initial response, critiques itself, retrieves additional relevant papers if needed, and refines the output until it’s publication-ready.
Third, and perhaps most importantly, it verifies its citations. Every claim is backed by actual passages from real papers. This is absolutely essential in an age of AI hallucinations and AI slop.
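In code, that verification step boils down to a simple invariant: every quoted claim must actually appear in the paper it cites. A toy sketch follows, with an invented corpus and schema that are assumptions for illustration only:

```python
# Toy citation verification: flag any claim whose cited paper is
# missing from the corpus or whose quoted passage is not in that paper.

corpus = {
    "smith2021": "CRISPR enables precise gene editing in human cells.",
    "lee2023": "Transformer models scale predictably with compute.",
}

def verify_citations(claims: list[tuple[str, str]]) -> list[str]:
    """Return the citation IDs of claims that fail verification."""
    bad = []
    for cite_id, quoted in claims:
        source = corpus.get(cite_id, "")  # missing paper -> empty text
        if quoted not in source:
            bad.append(cite_id)
    return bad

claims = [
    ("smith2021", "precise gene editing"),  # real passage: passes
    ("doe2020", "cold fusion works"),       # fabricated paper: flagged
]
```

A hallucinated citation like "doe2020" fails immediately because there is no underlying passage to point to, which is exactly the guarantee the article describes.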

Connecting Dots Humans Never Saw
Pattern recognition is what AI does best; this job, in a way, is what AI was made for. OpenScholar doesn’t just replicate human literature review capabilities; in the right hands, it can reveal patterns that have always been there but that we’ve been missing all along. That is because human researchers have a few inherent limitations. First, we work in silos: even within our own field, no matter how broad our research circle, we may know only a fraction of what has been established about the subject we are researching.
Second comes the question of connecting across branches: a neuroscientist might never read the crucial physics paper that could revolutionise brain imaging technologies, or a materials scientist might miss the biological research that could hold the key to their next breakthrough. What makes us human is also what constrains us: limits on how much we can learn, our reading speeds, our institutional access, and, frankly, the number of waking hours in a day.
A tool like OpenScholar, by contrast, can simultaneously consider findings from computer science, physics, biomedicine, and neuroscience without cognitive fatigue or disciplinary bias. It can identify connections between a 2019 paper on quantum computing and a 2024 study on protein folding that no human researcher had previously linked. It can synthesise methodologies from disparate fields, suggesting hybrid approaches that might never occur to domain specialists working in isolation.
This isn’t science fiction, just industrial-scale pattern recognition. And when you’re dealing with 45 million papers growing by thousands every week, pattern recognition at scale becomes the difference between incremental progress and a radical shift in every field where the tool is put to effective use.

Science Is Moving Too Fast for Humans Alone
Consider the brutal mathematics of modern research: thousands of new scientific papers are published every single day across multiple disciplines. Now, even if a researcher’s every waking moment is dedicated to reading, no eating, sleeping, or even the existential dread of nothingness allowed, that researcher would fall further behind with each passing week.
This creates what researchers call “the velocity problem”: science is advancing so rapidly that keeping up with your own field, let alone other disciplines, has become functionally impossible. Breakthroughs are happening at a rapid clip, and connections are waiting to be made. But how can you make them if they are buried under an avalanche of new publications that no single human can possibly get around to reading?
Machine learning systems fed millions of these papers offer a solution to this velocity crisis. Automating literature synthesis frees scientists to focus on what humans still do better than machines: asking interesting questions, designing creative experiments rooted in human experience, and making intuitive leaps that connect seemingly unrelated concepts.
Imagine a cancer researcher who can query such a system about the latest findings in immunotherapy, materials science, and machine learning, all in the same afternoon, and receive comprehensive, citation-backed syntheses within hours: work that would have taken months to compile manually. That researcher does not stop being essential; they become exponentially more effective.

From Months to Minutes
Thus, the true potential of transformation is in democratisation and acceleration. Currently, comprehensive literature reviews are the domain of well-funded research institutions with extensive journal subscriptions and large graduate student workforces. Tools like OpenScholar, built entirely from open-access papers, could dramatically level this playing field.
A researcher at the remote Aizawl University, with limited institutional resources and physical reach, could access the same synthesised knowledge as someone at Harvard or MIT. A brilliant undergraduate with a novel hypothesis could explore whether existing literature supports their intuition without waiting years for doctoral training. Small research teams could compete with large labs in their ability to synthesise cross-disciplinary insights.
And most importantly, the timescale changes completely. What takes expert researchers approximately an hour to produce, such a system could generate in minutes, letting researchers iterate endlessly. More iterations mean more hypotheses tested; more hypotheses tested mean dead ends identified quickly and promising avenues explored thoroughly, some of which could lead to world-altering findings.
When you compress the research cycle from months to days, from weeks to hours, you don’t just speed up existing science: you enable entirely new modes of scientific exploration.

The Human-AI Collaboration is Key
Despite all the positivity, some will naturally be unsettled. But what researchers found is reassuring: the best results came not from AI alone, but from AI augmenting human expertise. When researchers combined OpenScholar with GPT-4o, correctness improved by 12% over GPT-4o alone. And their custom-trained 8B model outperformed PaperQA2 (built on proprietary systems) by 5.5%.
The lesson is clear: specialised AI tools, thoughtfully designed for specific scientific tasks, amplify human capability rather than replace it. AI tools like OpenScholar are a superpower for researchers: the ability to survey vast bodies of literature comprehensively, identify relevant connections instantly, and ground every claim in verifiable sources.
Expert scientists continue to outperform the AI in nuanced judgment calls, identifying the most representative papers, and recognising when citations, though technically accurate, missed more seminal works. Humans still bring irreplaceable domain expertise, creative hypothesis generation, and the ability to recognise which questions are worth asking in the first place.
But equipped with OpenScholar, the same humans become dramatically more efficient. Hence, it is not humans versus machines: it is human plus machine versus the overwhelming complexity of modern science.

The Citation Crisis
One of the most underappreciated achievements of OpenScholar is solving AI’s credibility problem. Large language models have a nasty habit of generating plausible-sounding nonsense with supreme confidence, aka hallucinations. Take GPT-4o, which, when asked to cite recent literature across fields, fabricated citations 78-90% of the time, making up paper titles, authors, and findings, and presenting them with the same authoritative tone as real information.
OpenScholar’s citation accuracy matches that of human experts, with every claim backed by verifiable passages from actual published papers. This might sound like an obvious requirement, but it is a major breakthrough in trustworthy AI systems. Science depends on provenance and accuracy: knowing where information comes from, who validated it, and under what conditions. OpenScholar preserves that while operating at machine speed and scale, which is what makes it so consequential for scientific practice.
What this does is accelerate science. It is not AI “doing” science, but AI enabling scientists to work at unprecedented scale and speed. This could speed up drug discovery, advance climate research through a rapid, cross-disciplinary approach, enable medical breakthroughs from unexpected connections, and, most of all, give emerging researchers and their institutions the chance to compete on merit rather than on the resources their institutions have access to.
A caveat is warranted, as the researchers acknowledge significant limitations. OpenScholar doesn’t consistently retrieve the most representative papers for every query. Like all LLMs, it can generate factually inaccurate content, particularly with the smaller 8B model. Most of all, as it stands right now, it works only with open-access papers, which means a vast swath of research is still behind paywalls.
And most important of all: however brilliant such a tool is at synthesising answers to the questions we pose, it cannot, no matter how many iterations it runs, create something genuinely original or meaningful on its own, because its context lacks the grounding of human experience, and likely always will.
Hence, no matter how good such systems become, we have to remember that they can never be a replacement, but an augmentation. Of course, bad researchers will fall by the wayside. But the good ones will grow multiple hands, not just to transform what they do, but to reshape the entire world.
Nothing is more exciting than giving a good scientist access to every piece of research ever conducted in any discipline, so they can pluck brilliant ideas languishing in obscure journals. Who knows where this will take them? Where it will take science? And where it will take us all, as humanity?