CMPhysBench: A Benchmark for Evaluating Large Language Models in Condensed Matter Physics
Year: 2025 · Authors: 35 · Citations: 0
Abstract
We introduce CMPhysBench, a novel benchmark designed to assess the proficiency of Large Language Models (LLMs) in condensed matter physics. CMPhysBench comprises more than 520 meticulously curated graduate-level questions covering both representative subfields and foundational theoretical frameworks of condensed matter physics, such as magnetism, superconductivity, and strongly correlated systems. To ensure that models demonstrate a genuine understanding of the problem-solving process, we focus exclusively on calculation problems, which require LLMs to generate complete solutions independently. In addition, building on tree-based representations of expressions, we introduce the Scalable Expression Edit Distance (SEED) score, which provides fine-grained (non-binary) partial credit and yields a more accurate assessment of the similarity between a prediction and the ground truth. Our results show that even the best model, Grok-4, reaches an average SEED score of only 36 and an accuracy of only 28% on CMPhysBench, underscoring a significant capability gap in this practical, frontier domain compared with traditional physics. The code and dataset are publicly available at https://github.com/CMPhysBench/CMPhysBench.
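The abstract describes SEED only at a high level: answers are represented as expression trees, and an edit distance between trees yields partial credit rather than a binary right/wrong. The sketch below illustrates that general idea under stated assumptions; it is not the SEED formula from the paper. It assumes Python's `ast` module as the expression parser, a simple top-down (Selkow-style) tree edit distance with unit relabel cost and subtree-size insert/delete costs, and a normalization by the size of the reference tree clipped at zero. The names `to_tree`, `tree_distance`, and `similarity_score` are hypothetical.

```python
import ast
from functools import lru_cache

Tree = tuple  # hypothetical representation: (label, (child, child, ...))


def to_tree(node: ast.AST) -> Tree:
    """Convert a Python AST node into a hashable (label, children) tuple."""
    if isinstance(node, ast.Expression):
        return to_tree(node.body)
    if isinstance(node, ast.BinOp):
        return (type(node.op).__name__, (to_tree(node.left), to_tree(node.right)))
    if isinstance(node, ast.UnaryOp):
        return (type(node.op).__name__, (to_tree(node.operand),))
    if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
        return (node.func.id, tuple(to_tree(a) for a in node.args))
    if isinstance(node, ast.Name):
        return (node.id, ())
    if isinstance(node, ast.Constant):
        return (repr(node.value), ())
    # Fallback for other node types: label by class name, recurse into children.
    return (type(node).__name__, tuple(to_tree(c) for c in ast.iter_child_nodes(node)))


def size(t: Tree) -> int:
    """Number of nodes in a tree."""
    return 1 + sum(size(c) for c in t[1])


@lru_cache(maxsize=None)
def tree_distance(a: Tree, b: Tree) -> int:
    """Top-down tree edit distance: relabel the roots, then align child sequences."""
    relabel = 0 if a[0] == b[0] else 1
    ca, cb = a[1], b[1]
    # d[i][j] = cost of turning the first i children of a into the first j children of b
    d = [[0] * (len(cb) + 1) for _ in range(len(ca) + 1)]
    for i in range(1, len(ca) + 1):
        d[i][0] = d[i - 1][0] + size(ca[i - 1])
    for j in range(1, len(cb) + 1):
        d[0][j] = d[0][j - 1] + size(cb[j - 1])
    for i in range(1, len(ca) + 1):
        for j in range(1, len(cb) + 1):
            d[i][j] = min(
                d[i - 1][j] + size(ca[i - 1]),  # delete a whole subtree of a
                d[i][j - 1] + size(cb[j - 1]),  # insert a whole subtree of b
                d[i - 1][j - 1] + tree_distance(ca[i - 1], cb[j - 1]),  # match children
            )
    return relabel + d[len(ca)][len(cb)]


def similarity_score(pred: str, truth: str) -> float:
    """Hypothetical 0-100 partial-credit score (NOT the paper's SEED definition)."""
    a = to_tree(ast.parse(pred, mode="eval"))
    b = to_tree(ast.parse(truth, mode="eval"))
    return max(0.0, 1.0 - tree_distance(a, b) / size(b)) * 100.0


if __name__ == "__main__":
    print(similarity_score("m*v**2/2", "m*v**2/2"))  # 100.0
    print(similarity_score("m*v**2/3", "m*v**2/2"))  # ≈ 85.71 (one wrong constant)
    print(similarity_score("x", "m*v**2/2"))         # 0.0
```

In this toy version, an answer that differs from the reference only in one constant loses one node's worth of credit out of seven, whereas an exact-match metric would score it as simply wrong. This contrast is the motivation the abstract gives for a non-binary score.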
Similar works
UCSF Chimera—A visualization system for exploratory research and analysis
2004 · 47,120 citations
AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading
2009 · 35,688 citations
Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen
1989 · 31,349 citations
The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals
2007 · 29,429 citations
VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data
2011 · 24,218 citations
Authors
- W. Wang
- Dongchen Huang
- Jiatong Li
- Ting Yang
- Ziyang Zheng
- Di Zhang
- Dong Han
- Benteng Chen
- Bing Luo
- Zhiyu Liu
- Kunling Liu
- Zhiyuan Gao
- Sinong Geng
- Wei Ma
- Juan Su
- Xin Li
- Shouzhi Pu
- Y. Shui
- Qianjia Cheng
- Zhiyuan Dou
- Deliang Cui
- C.L. He
- Jin Zeng
- Zeke Xie
- Mao Su
- Dongzhan Zhou
- Yuqiang Li
- Wanli Ouyang
- Yunqi Cai
- Xi Dai
- Shufei Zhang
- Lei Bai
- Jinguang Cheng
- Fang Zhong
- Hongming Weng