This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
A framework for evaluating the chemical knowledge and reasoning abilities of large language models against the expertise of chemists
Citations: 39
Authors: 35
Year: 2025
Abstract
Large language models (LLMs) have gained widespread interest owing to their ability to process human language and perform tasks on which they have not been explicitly trained. However, we possess only a limited systematic understanding of the chemical capabilities of LLMs, which would be required to improve models and mitigate potential harm. Here we introduce ChemBench, an automated framework for evaluating the chemical knowledge and reasoning abilities of state-of-the-art LLMs against the expertise of chemists. We curated more than 2,700 question-answer pairs, evaluated leading open- and closed-source LLMs and found that the best models, on average, outperformed the best human chemists in our study. However, the models struggle with some basic tasks and provide overconfident predictions. These findings reveal LLMs' impressive chemical capabilities while emphasizing the need for further research to improve their safety and usefulness. They also suggest adapting chemistry education and show the value of benchmarking frameworks for evaluating LLMs in specific domains.
Similar works
UCSF Chimera—A visualization system for exploratory research and analysis
2004 · 47,118 citations
AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading
2009 · 35,680 citations
Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen
1989 · 31,349 citations
The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals
2007 · 29,427 citations
VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data
2011 · 24,213 citations
Authors
- A.H. Mirza
- Nawaf Alampara
- Sreekanth Kunchapu
- Martiño Ríos-García
- Benedict Emoekabu
- Aswanth Krishnan
- T. Gupta
- Mara Schilling-Wilhelmi
- Macjonathan Okereke
- Anagha Aneesh
- Mehrdad Asgari
- J. Eberhardt
- Amir Mohammad Elahi
- Hani M. Elbeheiry
- M.V. Gil
- Christina Glaubitz
- Maximilian Greiner
- Caroline T. Holick
- Tim Hoffmann
- Ashraf S. Ibrahim
- Lea C. Klepsch
- Yannik Köster
- Fabian Alexander Kreth
- Jakob Meyer
- Santiago Miret
- Jan Matthias Peschel
- Michael Ringleb
- Nicole C. Roesner
- J. Schreiber
- Ulrich S. Schubert
- Leanne M. Stafast
- A. D. Dinga Wonanke
- Michael Pieler
- Philippe Schwaller
- Kevin Maik Jablonka
Institutions
- Friedrich Schiller University Jena (DE)
- Helmholtz Institute Jena (DE)
- Instituto Nacional del Carbón (ES)
- AM Technologies (Poland) (PL)
- École Polytechnique Fédérale de Lausanne (CH)
- University of Cambridge (GB)
- University of Bayreuth (DE)
- Intel (United States) (US)
- Technische Universität Dresden (DE)
- Open Geospatial Consortium (GB)
- Sustainability Institute (ZA)