This is an overview page with metadata for this scientific work. The full article is available from the publisher.
Accuracy of automated scoring of verbal paired associates in a remote data collection context
Citations: 0 · Authors: 4 · Year: 2020
Abstract
Background: The validation of remote testing methodologies has increased in relevance, given the impact of COVID‐19 on clinical trials. Verbal cognitive testing, which requires skilled raters, has not previously been feasible for remote testing. In previous research we demonstrated that remote verbal cognitive testing using automatic speech recognition (ASR) was possible and showed the expected pattern of results. Here, we manually score responses to a verbal paired associates (VPA) test and explore the impact of participant‐level demographic and technology‐related factors on scoring accuracy.

Methods: From a pool of 5,742 recordings of participants aged 17–86 years, 150 were randomly selected for manual review (age 30–70, M = 52.5). Participants were all fluent English speakers and completed the VPA test via a device‐agnostic web app on their own devices. We recorded participant demographics and information about the operating system, browser, and device on which the tasks were completed. Manual scoring was completed offline by trained raters through the Neurovocalix system.

Results: There was excellent agreement between the human scoring and ASR, with a Spearman correlation of 0.93 (p < 0.0001) and an ICC(A,1) agreement of 1 (F(148,148) = 6951, p < 0.0001). The distribution of ASR errors was skewed, with a median of 0 and a mean of 0.97. The maximum number of scoring errors was 10, observed in two cases where the ASR system did not detect correct responses due to very high levels of background noise or poor audio quality. We found no significant effect of age, gender, education, or device on scoring errors. Key themes reported by raters as affecting scoring accuracy were 1) slowness in responding, meaning that the whole word was not recorded, 2) the presence of significant background noise or poor audio quality, and 3) certain accents leading to misrecognition of specific words.
Conclusion: These results represent a comprehensive evaluation of the accuracy of automated scoring of verbal responses from data collected in a remote context. Overall, accuracy is excellent. These results demonstrate the potential utility of this approach to remote data collection and suggest avenues for further increasing the accuracy of automated scoring.
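The agreement statistics reported above can be reproduced from paired per-participant scores. The following is a minimal sketch, using invented example data (the study's actual scores are not published on this page), of how a Spearman correlation and the ASR error distribution (median, mean of absolute discrepancies) could be computed with SciPy and NumPy:

```python
import numpy as np
from scipy import stats

# Hypothetical per-participant VPA scores: human rater vs. ASR.
# These values are illustrative only, not data from the study.
human_scores = np.array([8, 6, 7, 5, 9, 4])
asr_scores = np.array([8, 6, 7, 5, 8, 4])

# Rank-based agreement between the two scoring methods.
rho, p_value = stats.spearmanr(human_scores, asr_scores)

# Per-participant scoring discrepancies, as in the abstract's
# skewed error distribution (median 0, mean 0.97 in the study).
errors = np.abs(human_scores - asr_scores)

print(f"Spearman rho = {rho:.3f} (p = {p_value:.4f})")
print(f"error median = {np.median(errors)}, mean = {errors.mean():.3f}")
```

An intraclass correlation such as ICC(A,1) would additionally require a mixed-model or ANOVA decomposition (e.g. via the `pingouin` package's `intraclass_corr`), which is omitted here for brevity.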
Related works
Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI
2019 · 8,549 citations
Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead
2019 · 8,443 citations
High-performance medicine: the convergence of human and artificial intelligence
2018 · 7,941 citations
BioBERT: a pre-trained biomedical language representation model for biomedical text mining
2019 · 6,792 citations
Proceedings of the 19th International Joint Conference on Artificial Intelligence
2005 · 5,781 citations