Comparing research trends through author-provided keywords with machine extracted terms: A ML algorithm approach using publications data on neurological disorders

Authors

DOI: https://doi.org/10.47909/ijsmc.36

Abstract

Objective. This study aimed to identify the primary research areas, countries, and organizations involved in publications on neurological disorders by analyzing human-assigned keywords. These results were then compared with terms extracted from the titles and abstracts of the publications by unsupervised and machine-algorithm-based methods, in order to identify the shortcomings of each technique. This comparison allowed us to assess how far machine-derived terms from titles and abstracts can substitute for the human-assigned keywords of scientific research articles.
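The abstract does not state how the two term lists were compared. One simple way to quantify such a comparison is exact-match overlap after normalization. The following sketch assumes that approach; the function names, the precision/recall/Jaccard measures and the toy term lists are illustrative assumptions, not the study's method or data.

```python
"""Hypothetical sketch: overlap between author keywords and machine-extracted terms.

Assumes exact matching after lower-casing and whitespace normalization;
the study itself may have compared the two term lists differently.
"""


def normalise(term: str) -> str:
    """Lower-case a term and collapse internal whitespace."""
    return " ".join(term.lower().split())


def compare_terms(author_keywords, extracted_terms):
    """Return shared terms plus precision, recall and Jaccard overlap.

    precision: share of extracted terms that also appear as author keywords
    recall:    share of author keywords recovered by the extraction
    jaccard:   size of the intersection divided by size of the union
    """
    authors = {normalise(t) for t in author_keywords if t.strip()}
    extracted = {normalise(t) for t in extracted_terms if t.strip()}
    shared = authors & extracted
    union = authors | extracted
    precision = len(shared) / len(extracted) if extracted else 0.0
    recall = len(shared) / len(authors) if authors else 0.0
    jaccard = len(shared) / len(union) if union else 0.0
    return {"shared": sorted(shared), "precision": precision,
            "recall": recall, "jaccard": jaccard}


if __name__ == "__main__":
    # Made-up term lists for illustration only (not data from the study).
    author = ["Alzheimer's disease", "COVID-19", "Neurodegeneration"]
    machine = ["covid-19", "alzheimer's disease", "cognitive decline", "long covid"]
    print(compare_terms(author, machine))
```

Low recall would indicate author keywords the extraction missed, while low precision would indicate extracted terms that authors did not choose, which may be either noise or the more specific terms the study reports.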

Design/Methodology/Approach. Significant research areas on neurological disorders were identified from the author-provided keywords of publications downloaded from Web of Science and PubMed. These results were then compared with terms extracted from titles and abstracts using unsupervised models such as VOSviewer and machine-algorithm-based techniques such as YAKE and CountVectorizer.
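Identifying research areas from author-provided keywords amounts to normalizing and counting keywords across records. The sketch below assumes, hypothetically, that each record's author keywords arrive as one semicolon-separated string. Other export formats would need a different parser. The function name and sample records are illustrative only.

```python
from collections import Counter


def count_author_keywords(records, top_n=10):
    """Count author keywords across records and return the most frequent ones.

    Each record is assumed (hypothetically) to be a single string of
    semicolon-separated author keywords.
    """
    counts = Counter()
    for raw in records:
        # Normalize case and whitespace, and deduplicate within a record
        # so that one paper counts each keyword at most once.
        keywords = {" ".join(k.lower().split()) for k in raw.split(";")}
        keywords.discard("")  # ignore empty fields and trailing separators
        counts.update(keywords)
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up records for illustration only.
    records = [
        "COVID-19; Neurological disorders; Stroke",
        "covid-19; Alzheimer's disease",
        "Stroke; COVID-19; stroke",
    ]
    print(count_author_keywords(records, top_n=2))
```

Case folding matters here because authors spell the same concept differently (for example "COVID-19" and "covid-19"). Synonym merging would need a further, study-specific step that this sketch does not attempt.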

Results/Discussion. We observed that research on various neurological disorders increased in the post-COVID-19 era, but authors still chose more generic than specific terms for their keyword lists. The unsupervised extraction tool VOSviewer identified many extraneous and insignificant terms alongside the significant ones. In contrast, our self-developed machine learning algorithm using CountVectorizer and YAKE produced precise results, provided that additional stop-words were added to the stop-word list of the NLTK toolkit.
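The effect of extending the stop-word list can be illustrated with a bag-of-n-grams count, the idea behind scikit-learn's CountVectorizer. This is a dependency-free sketch, not the authors' code. `BASE_STOPWORDS` is a tiny hypothetical stand-in for NLTK's English list, `DOMAIN_STOPWORDS` is a hypothetical set of domain additions, and the tokenizer only roughly approximates CountVectorizer's default.

```python
import re
from collections import Counter

# Tiny hypothetical stand-in for a general English stop-word list such as NLTK's.
BASE_STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on",
                  "the", "this", "to", "we", "with"}

# Hypothetical domain additions: frequent in abstracts but uninformative.
DOMAIN_STOPWORDS = {"analysis", "based", "patients", "results", "study", "using"}


def top_ngrams(texts, stopwords, ngram_range=(1, 2), top_n=5):
    """Count word n-grams across texts after removing stop-words.

    As in CountVectorizer, stop-words are removed first and n-grams are then
    formed from the remaining tokens; counts are summed over all texts.
    """
    lo, hi = ngram_range
    counts = Counter()
    for text in texts:
        tokens = [t for t in re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())
                  if t not in stopwords]
        for n in range(lo, hi + 1):
            for i in range(len(tokens) - n + 1):
                counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common(top_n)


if __name__ == "__main__":
    # Made-up abstracts for illustration only.
    docs = [
        "This study of patients with long COVID reports cognitive impairment.",
        "Results of this analysis link long COVID to cognitive impairment in patients.",
    ]
    print(top_ngrams(docs, BASE_STOPWORDS, top_n=6))
    print(top_ngrams(docs, BASE_STOPWORDS | DOMAIN_STOPWORDS, top_n=6))
```

With only the base list, generic words such as "patients" rank among the top terms. With the extended list, specific phrases such as "long covid" and "cognitive impairment" move up. In practice one would pass a list built from NLTK's `stopwords.words("english")` plus domain terms to `CountVectorizer(stop_words=..., ngram_range=(1, 2))`.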

Conclusion. Author-provided keywords play a vital role because authors assign them in a broad sense to increase readability; however, these concept terms lacked the specificity needed for in-depth analysis. We suggest that the ML algorithm, being better suited to unstructured data, is a valid alternative to author-generated keywords for obtaining more accurate results.

Originality/Value. To our knowledge, this is the first study to compare author-provided keywords with machine-extracted terms using real datasets, which may offer an important lead in the machine learning domain. Replicating these techniques with large datasets from different fields may provide a valuable knowledge resource for experts and stakeholders.


Author biographies

Priya Tiwari, Banaras Hindu University

Research Scholar, Dept. of Library & Information Science, Banaras Hindu University, Varanasi-221005, India

Saloni Chaudhary, Banaras Hindu University

Senior Research Scholar, Dept. of Library & Information Science, Banaras Hindu University, Varanasi-221005, India

Debasis Majhi, Banaras Hindu University

Research Scholar, Dept. of Library & Information Science, Banaras Hindu University, Varanasi-221005, India



Published

2023-05-16

How to cite

Tiwari, P., Chaudhary, S., Majhi, D., & Mukherjee, B. (2023). Comparing research trends through author-provided keywords with machine extracted terms: A ML algorithm approach using publications data on neurological disorders. Iberoamerican Journal of Science Measurement and Communication, 3(1). https://doi.org/10.47909/ijsmc.36