This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Efficient Classification of Human-Generated vs. Machine-Generated Text Using Lightweight Machine Learning Models

2025 · 0 citations · International Journal of Artificial Intelligence Tools


Abstract

This study presents an efficient and interpretable approach to distinguishing human-authored text from machine-generated content using traditional machine learning techniques, thereby avoiding the computational demands of transformer-based classifiers. Two datasets were employed to ensure generalizability: (1) ROCStories narratives paired with continuations generated by FALCON-7B under three creativity settings, and (2) short news articles from The Indian Times and The Guardian continued by LLaMA-7B under identical settings. Preprocessing involved Minimal Text Cleaning (MTC) and Advanced Text Normalization (ATN), followed by feature extraction from TF-IDF, Part-of-Speech distributions, Named Entity Recognition (NER), readability indices, lexical richness, n-gram frequencies, sentiment polarity, punctuation usage, and syntactic complexity. Random Forest (RF) consistently achieved top performance (accuracy up to 0.98, AUC/ROC up to 0.99), outperforming the Naïve Bayes baseline. To enhance transparency, SHAP-based explainability was applied, revealing that readability metrics, lexical richness, unigrams, and linguistic structures (POS and NER) were the strongest drivers of classification across both datasets. For comparison, GPT-4o and GPT-3.5-Turbo, tested in zero-shot mode, achieved a maximum accuracy of 0.68. These results highlight not only the robustness and computational efficiency of feature-engineered models but also their interpretability, suggesting their value as lightweight, transparent, and reliable components in decision-support systems where content authenticity verification is critical.
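To make the idea of hand-crafted features concrete, the following is a minimal sketch of a few of the feature families the abstract names (lexical richness, punctuation usage, and rough proxies for syntactic complexity and readability). The abstract does not give the paper's feature definitions, so the function name `stylometric_features` and every formula here are assumptions chosen for illustration, using only the Python standard library.

```python
import re
import string


def stylometric_features(text: str) -> dict[str, float]:
    """Compute a few hand-crafted features for one text.

    All definitions are illustrative assumptions, not the paper's.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_chars = len(text)
    n_punct = sum(c in string.punctuation for c in text)
    return {
        # Lexical richness as type-token ratio: unique words / total words.
        "type_token_ratio": len(set(words)) / n_words if n_words else 0.0,
        # Punctuation usage: share of characters that are punctuation.
        "punctuation_ratio": n_punct / n_chars if n_chars else 0.0,
        # Crude syntactic-complexity proxy: mean words per sentence.
        "mean_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # Crude readability proxy: mean characters per word.
        "mean_word_length": sum(map(len, words)) / n_words if n_words else 0.0,
    }


features = stylometric_features("The cat sat. The cat ran!")
```

Feature dictionaries of this kind could be stacked into a matrix and passed to a tree-based classifier such as the Random Forest mentioned in the abstract; how the study actually combined its features is not described on this page.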

Authors

Kian Jazayeri

Institutions

Cyprus International University (CY)

Topics

Topic Modeling; Text Readability and Simplification; Artificial Intelligence in Healthcare and Education
