This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Engineering evaluation of AI healthcare applications: Review mining, game-theoretic hybrid weight fusion, and GRA–TOPSIS-based alternative ranking
Citations: 0
Authors: 7
Year: 2026
Abstract
• An evidence-driven framework linked app reviews to decision-grade evaluation.
• LDA extracted 3 themes and 18 indicators from 7,041 AIMCA reviews.
• PSO-FAHP and improved CRITIC enabled robust game-theoretic hybrid weighting.
• GRA-TOPSIS ranked AIMCA schemes and identified the best-performing option.
• The proposed method outperformed FAHP and EWM in CR and MAE.
As generative and embedded AI technologies continue to diffuse rapidly, the evaluation of AI applications is increasingly challenged by fragmented indicator sources, overly subjective weighting, and limited support for alternative ranking. The innovation of this study lies in the construction of a closed-loop evaluation chain for artificial intelligence medical conversational agents (AIMCAs), integrating review mining, indicator generation, hybrid weighting, and comprehensive ranking. On this basis, an evidence-driven evaluation framework for AI applications is developed. Drawing on recent market-based user feedback from app store reviews over the past six months, this study collected and cleaned 7,041 valid user reviews. Latent Dirichlet allocation (LDA) was used to identify three topics, and 18 evaluation indicators were subsequently derived by integrating review-mining results with the literature, expert knowledge, and standardized items. Expert-side subjective weights were then obtained using PSO–FAHP, while user-rating-side objective weights were calculated using an improved CRITIC method. Game-theoretic combination weighting and GRA–TOPSIS were further applied to compare the alternatives. The results showed that, under the current indicator system, combined weights, and evaluation data, the baseline priority ranking of the four AIMCA alternatives was C > D > A > B, with alternative C achieving the highest comprehensive closeness coefficient (η = 0.5575).
Compared with conventional FAHP and the entropy weight method, the proposed method produced lower CR values and lower MAE relative to the internal compromise reference vector across 30 repeated experiments. Specifically, the CR was 0.020 ± 0.002, lower than that of conventional FAHP (0.034 ± 0.005), while the MAE was 0.019 ± 0.002, lower than those of conventional FAHP (0.031 ± 0.004) and the entropy weight method (0.048 ± 0.008). The average computation time was 1.017 ± 0.115 s. Sensitivity analysis further indicated that more than half of the weight-perturbation scenarios retained the baseline ranking, and ranking changes were mainly concentrated in a small number of highly sensitive indicators. Taken together, these results suggest that the proposed framework can transform recent user-side evidence into actionable evaluation indicators and alternative-priority results, while providing application-level evaluation evidence in terms of consistency, compromise closeness, and structural stability. This framework offers decision-support reference for the design optimization, alternative comparison, and subsequent validation of AI applications.
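The last two stages named in the abstract, game-theoretic combination weighting and GRA–TOPSIS ranking, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the code assumes one common textbook formulation (combination coefficients from the first-order optimality condition, grey relational grades blended with Euclidean distances). The function names, the parameters `rho` and `beta`, the treatment of all indicators as benefit-type, and the toy data are illustrative assumptions and do not come from the paper.

```python
import numpy as np

def game_theory_weights(w_subj, w_obj):
    """Combine two weight vectors by minimising the deviation of their
    linear combination from each input vector (assumed formulation)."""
    W = np.vstack([w_subj, w_obj]).astype(float)  # shape (2, n_indicators)
    gram = W @ W.T                                # pairwise dot products
    rhs = np.diag(gram)                           # w_k . w_k for each k
    # Fails with LinAlgError if the two vectors are identical (singular system).
    alpha = np.linalg.solve(gram, rhs)
    alpha = np.abs(alpha) / np.abs(alpha).sum()   # normalised coefficients
    w = alpha @ W
    return w / w.sum()

def gra_topsis(X, w, rho=0.5, beta=0.5):
    """Closeness coefficients from a GRA-TOPSIS variant (assumed formulation).
    X: alternatives x indicators, all treated as benefit-type.
    rho: grey distinguishing coefficient; beta: weight of the distance part."""
    X = np.asarray(X, dtype=float)
    Z = X / np.linalg.norm(X, axis=0) * w         # weighted, vector-normalised
    z_pos, z_neg = Z.max(axis=0), Z.min(axis=0)   # ideal / anti-ideal points
    d_pos = np.linalg.norm(Z - z_pos, axis=1)     # distance to ideal
    d_neg = np.linalg.norm(Z - z_neg, axis=1)     # distance to anti-ideal

    def grey_grade(ref):
        # Assumes the alternatives are not all identical (delta.max() > 0).
        delta = np.abs(Z - ref)
        coef = (delta.min() + rho * delta.max()) / (delta + rho * delta.max())
        return coef.mean(axis=1)

    r_pos, r_neg = grey_grade(z_pos), grey_grade(z_neg)
    # Scale each measure by its maximum so the two parts are comparable.
    d_pos, d_neg, r_pos, r_neg = (v / v.max() for v in (d_pos, d_neg, r_pos, r_neg))
    s_pos = beta * d_neg + (1 - beta) * r_pos     # evidence of being good
    s_neg = beta * d_pos + (1 - beta) * r_neg     # evidence of being poor
    return s_pos / (s_pos + s_neg)

# Toy data (hypothetical, not from the paper): 3 alternatives, 3 indicators.
w = game_theory_weights([0.5, 0.3, 0.2], [0.2, 0.3, 0.5])
scores = np.array([[9, 8, 9], [5, 5, 5], [1, 2, 1]])
eta = gra_topsis(scores, w)
ranking = np.argsort(-eta)                        # best alternative first
print(w, eta, ranking)
```

In this sketch the subjective vector would come from a method such as PSO–FAHP and the objective vector from CRITIC; here both are simply hard-coded. An alternative that dominates on every indicator receives the highest closeness coefficient, which is the behaviour one would expect from any reasonable variant of the method.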
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,774 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,685 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 8,244 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,898 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations