OpenAlex · Updated hourly · Last updated: 05.05.2026, 04:34

This is an overview page with metadata for this research paper. The full article is available from the publisher.

Efficient Classification of Human-Generated vs. Machine-Generated Text Using Lightweight Machine Learning Models

2025 · 0 citations · International Journal of Artificial Intelligence Tools

Citations: 0 · Authors: 1 · Year: 2025

Abstract

This study presents an efficient and interpretable approach to distinguishing human-authored text from machine-generated content using traditional machine learning techniques, thereby avoiding the computational demands of transformer-based classifiers. Two datasets were employed to ensure generalizability: (1) ROCStories narratives paired with continuations generated by FALCON-7B under three creativity settings, and (2) short news articles from The Indian Times and The Guardian continued by LLaMA-7B under the same three settings. Preprocessing involved Minimal Text Cleaning (MTC) and Advanced Text Normalization (ATN), followed by feature extraction covering TF-IDF, Part-of-Speech (POS) distributions, Named Entity Recognition (NER), readability indices, lexical richness, n-gram frequencies, sentiment polarity, punctuation usage, and syntactic complexity. Random Forest (RF) consistently achieved the best performance (accuracy up to 0.98, ROC AUC up to 0.99), outperforming the Naïve Bayes baseline. To enhance transparency, SHAP-based explainability was applied, revealing that readability metrics, lexical richness, unigrams, and linguistic structures (POS and NER) were the strongest drivers of classification across both datasets. For comparison, GPT-4o and GPT-3.5-Turbo, tested in zero-shot mode, achieved a maximum accuracy of only 0.68. These results highlight the robustness, computational efficiency, and interpretability of feature-engineered models, suggesting their value as lightweight, transparent, and reliable components in decision-support systems where verifying content authenticity is critical.
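The abstract names readability indices, lexical richness, unigram frequencies and punctuation usage among the engineered features. The following is a minimal sketch of how a few such features might be computed with the Python standard library alone; it is not the authors' implementation. The function names (`count_syllables`, `stylometric_features`, `unigram_counts`), the regex tokenization and the vowel-group syllable heuristic are illustrative assumptions, and Flesch Reading Ease stands in as one common readability index because this page does not say which indices the study used.

```python
import re
import string
from collections import Counter

# Hypothetical tokenizer: runs of ASCII letters and apostrophes count as words.
WORD_RE = re.compile(r"[A-Za-z']+")


def count_syllables(word: str) -> int:
    """Approximate syllables by counting vowel groups (a heuristic, not a dictionary)."""
    n = len(re.findall(r"[aeiouy]+", word.lower()))
    # Treat a final silent "e" as non-syllabic when other vowel groups exist.
    if word.lower().endswith("e") and n > 1:
        n -= 1
    return max(n, 1)


def stylometric_features(text: str) -> dict:
    """Compute a few hand-crafted features of the kinds named in the abstract."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = WORD_RE.findall(text)
    n_sent = max(len(sentences), 1)  # guard against division by zero
    n_words = max(len(words), 1)
    syllables = sum(count_syllables(w) for w in words)
    punct = sum(1 for ch in text if ch in string.punctuation)

    # Flesch Reading Ease: 206.835 - 1.015 * (words/sentence) - 84.6 * (syllables/word)
    flesch = 206.835 - 1.015 * (n_words / n_sent) - 84.6 * (syllables / n_words)
    return {
        "flesch_reading_ease": flesch,
        # Lexical richness as type-token ratio (distinct lowercased words / all words).
        "type_token_ratio": len({w.lower() for w in words}) / n_words,
        "avg_sentence_length": n_words / n_sent,
        "punctuation_per_word": punct / n_words,
    }


def unigram_counts(text: str) -> Counter:
    """Lowercased unigram frequencies."""
    return Counter(w.lower() for w in WORD_RE.findall(text))


if __name__ == "__main__":
    sample = "The cat sat. The cat ran!"
    print(stylometric_features(sample))
    print(unigram_counts(sample).most_common(2))
```

In a pipeline like the one the abstract describes, per-text feature dictionaries of this kind would presumably be vectorized and combined with TF-IDF, POS and NER features before training a classifier such as a Random Forest (for example scikit-learn's `RandomForestClassifier`); that step, and the SHAP analysis, are omitted from this sketch.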


Topics: Topic Modeling · Text Readability and Simplification · Artificial Intelligence in Healthcare and Education