This is an overview page with metadata for this scientific paper. The full article is available from the publisher.
Research on ChatGPT generated text detection model based on phonetic feature extraction and semantic features
Citations: 0
Authors: 7
Year: 2026
Abstract
With the widespread adoption of large language models such as ChatGPT, distinguishing AI-generated text from human-written content has become increasingly challenging. Existing detection methods often rely solely on semantic representations and exhibit limited robustness, particularly when texts are paraphrased or rewritten. This study proposes an integrated detection framework that combines contextual semantic embeddings with auxiliary surface-level features, including pronunciation-related textual cues and handcrafted statistical descriptors. Specifically, a RoBERTa encoder is employed to capture deep contextual semantics, while a convolutional neural network aggregates multi-scale representations. In parallel, a set of text-derived structural, lexical, and readability features, serving as proxies for phonetic and stylistic regularities, is incorporated to enrich the representation space. Rather than introducing a fundamentally new detection paradigm, the proposed approach emphasizes feature-level fusion and systematic empirical evaluation. Experiments on the HAGTC dataset and a ChatGPT-written abstract dataset show that the proposed RoBERTa-CNN framework consistently outperforms several strong baselines in terms of accuracy and F1 score. Notably, the model demonstrates improved robustness in detecting rewritten AI-generated texts. Ablation studies further confirm that integrating multiple feature types significantly enhances detection performance. These results indicate that combining contextual representations with auxiliary surface features offers a practical and effective direction for AI-generated text detection.
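
The feature-level fusion mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not list the exact handcrafted features, so everything below is an assumption chosen for illustration: the function names `estimate_syllables`, `surface_features`, and `fuse` are hypothetical, the vowel-group syllable count is only a crude stand-in for a pronunciation-related cue, and the `embedding` argument stands in for whatever vector a RoBERTa-CNN encoder would produce. Fusion is shown as plain concatenation, which is one common choice and not necessarily the authors' method.

```python
import re
from typing import List

# Assumption: a run of vowels approximates one syllable (crude phonetic proxy).
VOWEL_GROUP = re.compile(r"[aeiouy]+")


def estimate_syllables(word: str) -> int:
    """Rough syllable count: number of vowel groups, never less than 1."""
    return max(1, len(VOWEL_GROUP.findall(word.lower())))


def surface_features(text: str) -> List[float]:
    """Return [avg sentence length, type-token ratio,
    syllables per word, Flesch reading ease] for an English text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return [0.0, 0.0, 0.0, 0.0]

    avg_sentence_len = len(words) / len(sentences)             # structural
    type_token_ratio = len({w.lower() for w in words}) / len(words)  # lexical
    syllables_per_word = (
        sum(estimate_syllables(w) for w in words) / len(words)
    )                                                          # phonetic proxy
    # Standard Flesch reading ease formula (readability).
    flesch = 206.835 - 1.015 * avg_sentence_len - 84.6 * syllables_per_word
    return [avg_sentence_len, type_token_ratio, syllables_per_word, flesch]


def fuse(embedding: List[float], handcrafted: List[float]) -> List[float]:
    """Feature-level fusion by concatenation (an assumed fusion strategy)."""
    return list(embedding) + list(handcrafted)


if __name__ == "__main__":
    feats = surface_features("The cat sat. The dog ran fast.")
    # The two-element embedding is a placeholder for a real encoder output.
    print(fuse([0.1, 0.2], feats))
```

In a full pipeline the fused vector would be passed to a classifier head. The handcrafted features would normally be standardized first, since their scales differ widely from those of learned embeddings.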