This is an overview page with metadata for this scientific article. The full article is available from the publisher.
Automatic Generation of a Mechanical Properties Question-Answering Data Set for Language Model Benchmarking: A Comparative Study of BERT, XLNet, and LLaMA Models
Citations: 0 · Authors: 2 · Year: 2026
Abstract
Contextualized language models offer new opportunities for mining materials-science information from literature, but progress is limited by the absence of domain-specific question-answering (QA) data sets. This study addresses this by introducing MechQA, a data set of 202,068 pairs of questions and answers about mechanical properties that have been automatically distilled from 125,967 articles in the literature. Unlike small manually curated QA benchmarks or approaches that rely on domain-specific pretraining, MechQA provides a large-scale, automatically generated training resource derived directly from the primary literature. It covers five fundamental mechanical properties of materials: ultimate tensile strength, yield strength, fracture strength, Young's modulus, and ductility. Manual evaluation of this data set confirmed its high quality (precision 83.76%, recall 89.09%, F1 score 86.34%). We apply MechQA to fine-tune three representative transformer models: two extractive models, BERT-base and XLNet-base, each with 110 M parameters, and a generative LLaMA-3.1-Instruct model with 8B parameters fine-tuned using low-rank adaptation (LoRA). The MechQA data set was partitioned into 181,722 training and 20,346 validation QA pairs for this application. On the validation set, domain-specific extractive models achieve strong Exact Match (EM) and F1 score performance (BERT: 78.03% EM/84.50% F1; XLNet: 78.21% EM/84.70% F1) with improved expected calibration error (ECE) of 7.98% and 6.25%, respectively, while the LLaMA-domain model achieves 80.48% EM/86.25% F1 with an ECE of 8.08%. Notably, the two extractive models exhibit competitive performance despite their significantly smaller parameter size compared to the LLaMA model. These results demonstrate that automatic QA data set generation, coupled with targeted fine-tuning, provides an effective data-centric method for domain adaptation of language models for materials science.
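The abstract reports Exact Match (EM) and token-level F1, the standard SQuAD-style metrics for extractive QA. As an illustration of how these are typically computed, here is a minimal sketch (not the authors' evaluation code; normalization details such as article and punctuation stripping vary between implementations):

```python
from collections import Counter

def exact_match(pred: str, gold: str) -> bool:
    # EM: prediction matches the gold answer after simple normalization.
    return pred.strip().lower() == gold.strip().lower()

def token_f1(pred: str, gold: str) -> float:
    # Token-level F1: harmonic mean of precision and recall over
    # the multiset overlap of whitespace-separated tokens.
    pred_toks = pred.lower().split()
    gold_toks = gold.lower().split()
    common = Counter(pred_toks) & Counter(gold_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_toks)
    recall = overlap / len(gold_toks)
    return 2 * precision * recall / (precision + recall)
```

For example, a predicted answer "350 MPa" against a gold answer "approximately 350 MPa" fails EM but scores a token F1 of 0.8 (precision 1.0, recall 2/3). Corpus-level EM and F1 are then averaged over all QA pairs.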
Similar Works
UCSF Chimera—A visualization system for exploratory research and analysis
2004 · 47,316 citations
AutoDock Vina: Improving the speed and accuracy of docking with a new scoring function, efficient optimization, and multithreading
2009 · 35,991 citations
Gaussian basis sets for use in correlated molecular calculations. I. The atoms boron through neon and hydrogen
1989 · 31,473 citations
The M06 suite of density functionals for main group thermochemistry, thermochemical kinetics, noncovalent interactions, excited states, and transition elements: two new functionals and systematic testing of four M06-class functionals and 12 other functionals
2007 · 29,571 citations
VESTA 3 for three-dimensional visualization of crystal, volumetric and morphology data
2011 · 24,432 citations