OpenAlex · Updated hourly · Last updated: 2026-04-19, 17:12

This is an overview page with metadata for this scholarly work. The full article is available from the publisher.

Automatic Generation of a Mechanical Properties Question-Answering Data Set for Language Model Benchmarking: A Comparative Study of BERT, XLNet, and LLaMA Models

2026 · 0 citations · Journal of Chemical Information and Modeling · Open Access
Open full text at the publisher

Citations: 0
Authors: 2
Year: 2026

Abstract

Contextualized language models offer new opportunities for mining materials-science information from the literature, but progress is limited by the absence of domain-specific question-answering (QA) data sets. This study addresses this gap by introducing MechQA, a data set of 202,068 question-answer pairs about mechanical properties, automatically distilled from 125,967 articles in the literature. Unlike small manually curated QA benchmarks or approaches that rely on domain-specific pretraining, MechQA provides a large-scale, automatically generated training resource derived directly from the primary literature. It covers five fundamental mechanical properties of materials: ultimate tensile strength, yield strength, fracture strength, Young's modulus, and ductility. Manual evaluation confirmed the data set's high quality (precision 83.76%, recall 89.09%, F1 score 86.34%). We apply MechQA to fine-tune three representative transformer models: two extractive models, BERT-base and XLNet-base, each with 110M parameters, and a generative LLaMA-3.1-Instruct model with 8B parameters fine-tuned using low-rank adaptation (LoRA). For this purpose, the MechQA data set was partitioned into 181,722 training and 20,346 validation QA pairs. On the validation set, the domain-specific extractive models achieve strong Exact Match (EM) and F1 scores (BERT: 78.03% EM / 84.50% F1; XLNet: 78.21% EM / 84.70% F1) with improved expected calibration errors (ECE) of 7.98% and 6.25%, respectively, while the LLaMA domain model achieves 80.48% EM / 86.25% F1 with an ECE of 8.08%. Notably, the two extractive models remain competitive despite being significantly smaller than the LLaMA model. These results demonstrate that automatic QA data set generation, coupled with targeted fine-tuning, provides an effective data-centric method for domain adaptation of language models for materials science.
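The abstract reports Exact Match (EM) and token-level F1 scores for the fine-tuned QA models. These metrics are conventionally computed SQuAD-style: answers are normalized (lowercased, punctuation and articles stripped), EM checks for string equality, and F1 measures token overlap between prediction and gold answer. The paper's exact evaluation code is not shown here; the sketch below is a minimal, assumed implementation of that standard scheme, with the example strings invented for illustration.

```python
import re
import string
from collections import Counter

def normalize(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation,
    remove articles (a/an/the), collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in set(string.punctuation))
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())

def exact_match(pred: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(pred) == normalize(gold))

def f1_score(pred: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    p_toks = normalize(pred).split()
    g_toks = normalize(gold).split()
    common = Counter(p_toks) & Counter(g_toks)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(p_toks)
    recall = overlap / len(g_toks)
    return 2 * precision * recall / (precision + recall)

# Illustrative (hypothetical) prediction/gold pairs:
em = exact_match("The yield strength", "yield strength")   # 1.0 after normalization
f1 = f1_score("ultimate tensile strength of 500 MPa", "500 MPa")  # partial overlap
```

Reported corpus-level EM/F1 (e.g. BERT's 78.03% EM / 84.50% F1) would then be the average of these per-example scores over the 20,346 validation pairs, typically taking the maximum over multiple gold answers when they exist.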


Topics

Machine Learning in Materials Science · Topic Modeling · Artificial Intelligence in Healthcare and Education