This is an overview page with metadata for this scientific paper. The full article is available from the publisher.

Engineering evaluation of AI healthcare applications: Review mining, game-theoretic hybrid weight fusion, and GRA–TOPSIS-based alternative ranking

2026 · 0 citations · Results in Engineering · Open Access


Abstract

• An evidence-driven framework linked app reviews to decision-grade evaluation.
• LDA extracted 3 themes and 18 indicators from 7,041 AIMCA reviews.
• PSO-FAHP and improved CRITIC enabled robust game-theoretic hybrid weighting.
• GRA-TOPSIS ranked AIMCA schemes and identified the best-performing option.
• The proposed method outperformed FAHP and EWM in CR and MAE.

As generative and embedded AI technologies continue to diffuse rapidly, the evaluation of AI applications is increasingly challenged by fragmented indicator sources, overly subjective weighting, and limited support for alternative ranking. The innovation of this study lies in the construction of a closed-loop evaluation chain for artificial intelligence medical conversational agents (AIMCAs), integrating review mining, indicator generation, hybrid weighting, and comprehensive ranking. On this basis, an evidence-driven evaluation framework for AI applications is developed. Drawing on app store reviews from the past six months as recent market-based user feedback, this study collected and cleaned 7,041 valid user reviews. Latent Dirichlet allocation (LDA) was used to identify three topics, and 18 evaluation indicators were subsequently derived by integrating review-mining results with the literature, expert knowledge, and standardized items. Expert-side subjective weights were then obtained using PSO–FAHP, while user-rating-side objective weights were calculated using an improved CRITIC method. Game-theoretic combination weighting and GRA–TOPSIS were further applied to compare the alternatives. The results showed that, under the current indicator system, combined weights, and evaluation data, the baseline priority ranking of the four AIMCA alternatives was C > D > A > B, with alternative C achieving the highest comprehensive closeness coefficient (η = 0.5575).

Compared with conventional FAHP and the entropy weight method (EWM), the proposed method produced lower consistency ratio (CR) values and lower mean absolute error (MAE) relative to the internal compromise reference vector across 30 repeated experiments. Specifically, the CR was 0.020 ± 0.002, lower than that of conventional FAHP (0.034 ± 0.005), while the MAE was 0.019 ± 0.002, lower than those of conventional FAHP (0.031 ± 0.004) and the entropy weight method (0.048 ± 0.008). The average computation time was 1.017 ± 0.115 s. Sensitivity analysis further indicated that more than half of the weight-perturbation scenarios retained the baseline ranking, and ranking changes were mainly concentrated in a small number of highly sensitive indicators. Taken together, these results suggest that the proposed framework can transform recent user-side evidence into actionable evaluation indicators and alternative-priority results, while providing application-level evaluation evidence in terms of consistency, compromise closeness, and structural stability. The framework offers a decision-support reference for the design optimization, alternative comparison, and subsequent validation of AI applications.
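
The abstract does not give the formulas behind the weighting and ranking steps, so the following is only a minimal sketch of one common textbook formulation, not the authors' implementation. It assumes that game-theoretic combination weighting means solving the linear system that minimizes the deviation between the combined vector and each input weight vector, and that GRA–TOPSIS means blending normalized Euclidean distances with grey relational grades into a closeness coefficient η. The function names, the blending parameter `a`, the resolution coefficient `rho`, and all input numbers are hypothetical. PSO–FAHP and the improved CRITIC method are not reproduced, so their output weights are replaced by made-up vectors.

```python
import numpy as np

def game_theoretic_weights(weight_sets):
    """Combine several weight vectors (rows) into one compromise vector."""
    W = np.asarray(weight_sets, dtype=float)          # shape (L, n)
    gram = W @ W.T                                    # pairwise dot products w_i . w_k
    rhs = np.diag(gram)                               # w_i . w_i
    # First-order optimality condition: gram @ alpha = rhs (least squares for robustness)
    alpha = np.linalg.lstsq(gram, rhs, rcond=None)[0]
    alpha = np.abs(alpha) / np.abs(alpha).sum()       # normalized combination coefficients
    combined = alpha @ W
    return alpha, combined / combined.sum()

def gra_topsis(X, w, a=0.5, rho=0.5):
    """Closeness coefficient per alternative; all criteria are treated as benefit-type."""
    X = np.asarray(X, dtype=float)                    # shape (m alternatives, n criteria)
    V = X / np.sqrt((X ** 2).sum(axis=0)) * w         # vector-normalized, weighted matrix
    v_plus, v_minus = V.max(axis=0), V.min(axis=0)    # positive / negative ideal solutions

    # TOPSIS part: Euclidean distance to each ideal solution
    d_plus = np.linalg.norm(V - v_plus, axis=1)
    d_minus = np.linalg.norm(V - v_minus, axis=1)

    # GRA part: mean grey relational coefficient with respect to each ideal solution
    def grade(ref):
        delta = np.abs(V - ref)
        xi = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
        return xi.mean(axis=1)

    r_plus, r_minus = grade(v_plus), grade(v_minus)

    # Scale each measure by its maximum, then blend distance and grey relation
    s_plus = a * d_minus / d_minus.max() + (1 - a) * r_plus / r_plus.max()
    s_minus = a * d_plus / d_plus.max() + (1 - a) * r_minus / r_minus.max()
    return s_plus / (s_plus + s_minus)

# Hypothetical inputs: a subjective and an objective weight vector over 3 indicators,
# and scores of 3 alternatives on those indicators (illustrative numbers only).
w_subjective = [0.5, 0.3, 0.2]
w_objective = [0.2, 0.3, 0.5]
alpha, combined = game_theoretic_weights([w_subjective, w_objective])

scores = [[9, 8, 7], [5, 5, 5], [1, 2, 3]]
eta = gra_topsis(scores, combined)
ranking = np.argsort(-eta)                            # best alternative first
print("coefficients:", alpha)
print("combined weights:", combined)
print("closeness:", eta, "ranking:", ranking)
```

In this sketch a larger η means an alternative is closer to the positive ideal solution, in both distance and grey relational shape, than to the negative one. Cost-type criteria would need to be inverted before use, and the sketch assumes the alternatives are not all identical.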

Topics

Artificial Intelligence in Healthcare and Education · AI in Service Interactions · Explainable Artificial Intelligence (XAI)
