Comparative analysis of machine learning algorithms for author age and gender identification

Date

2023-05-19

Embargo

Advisor

Co-advisor

Journal title

Journal ISSN

Volume title

Publisher

Springer
Language
English

Research projects

Organizational units

Issue

Alternative title

Abstract

Author profiling is a part of information retrieval that infers characteristics of an author, such as native language, gender, and age, from the text they write. Text analysis techniques are used to extract this information, for example to identify authors on social media and in short text messages (SMS). Author profiling supports identification in security applications and on blogs by capturing authors’ writing behavior in messages, posts, comments, blogs, and chat logs. Most work in this area has been done on English and other native languages. Roman Urdu is also receiving attention for the author profiling task, but existing approaches must first convert Roman Urdu to English in order to extract important features, such as those obtained through Named Entity Recognition (NER), and other linguistic features. This conversion may lose important information, because converting one language into another has inherent limitations. This research explores machine learning techniques that can be applied to any language and so avoid the conversion limitation. The Vector Space Model (VSM) and Query Likelihood (Q.L.) are used to identify the author’s age and gender. Experimental results show that Q.L. produces better results in terms of accuracy.
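The two retrieval models named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each class (an age group or a gender) is represented by one profile built from all of its training texts, that the text to be profiled is treated as the query, that Q.L. uses add-one smoothing, and that VSM uses smoothed TF-IDF weights with cosine similarity. The function names, the labels `class_a` and `class_b`, and the toy texts are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenization only, so no language-specific resources
    # (translation, NER) are needed. This is an assumption of the sketch.
    return text.lower().split()


def build_profiles(labeled_texts):
    """Merge all training texts of each class into one term-count profile."""
    profiles = {}
    for label, text in labeled_texts:
        profiles.setdefault(label, Counter()).update(tokenize(text))
    return profiles


def query_likelihood(query, profiles):
    """Pick the class whose unigram model gives the query the highest log-likelihood."""
    vocab = set()
    for counts in profiles.values():
        vocab.update(counts)
    # One extra slot is reserved for terms never seen in training.
    smoothing_size = len(vocab) + 1
    scores = {}
    for label, counts in profiles.items():
        total = sum(counts.values())
        # Add-one smoothing (assumed) keeps unseen terms from zeroing the score.
        scores[label] = sum(
            math.log((counts[t] + 1) / (total + smoothing_size))
            for t in tokenize(query)
        )
    return max(scores, key=scores.get), scores


def cosine_vsm(query, profiles):
    """Pick the class whose TF-IDF vector has the highest cosine similarity to the query."""
    n_docs = len(profiles)
    doc_freq = Counter()
    for counts in profiles.values():
        doc_freq.update(counts.keys())

    def tfidf(counts):
        # Smoothed inverse document frequency (assumed weighting scheme).
        return {
            t: tf * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1)
            for t, tf in counts.items()
        }

    def norm(vec):
        return math.sqrt(sum(w * w for w in vec.values()))

    q_vec = tfidf(Counter(tokenize(query)))
    scores = {}
    for label, counts in profiles.items():
        d_vec = tfidf(counts)
        dot = sum(w * d_vec.get(t, 0.0) for t, w in q_vec.items())
        denom = norm(q_vec) * norm(d_vec)
        scores[label] = dot / denom if denom else 0.0
    return max(scores, key=scores.get), scores


# Hypothetical toy data; real experiments would use a labeled corpus.
train = [
    ("class_a", "cricket match score team win"),
    ("class_a", "team score goal match"),
    ("class_b", "recipe kitchen spice dinner"),
    ("class_b", "dinner recipe rice spice"),
]
profiles = build_profiles(train)
ql_label, ql_scores = query_likelihood("match score tonight", profiles)
vsm_label, vsm_scores = cosine_vsm("match score tonight", profiles)
print(ql_label, vsm_label)
```

Both functions work directly on surface tokens, which is why an approach of this kind does not depend on converting the text to English first. Comparing accuracy between the two models would require held-out labeled data, which this sketch does not include.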

Keywords

Vector space model, Query likelihood model, Information retrieval (I.R.), Text mining, Author profiling

Document type

conferenceObject

Publisher version

10.1007/978-981-19-9331-2_11

Dataset

Citation

Zainab, Z., Al-Obeidat, F., Moreira, F., Gul, H., & Amin, A. (2023). Comparative analysis of machine learning algorithms for author age and gender identification. In S. Anwar, A. Ullah, Á. Rocha, & M. J. Sousa (Eds.), Proceedings of International Conference on Information Technology and Applications (Lecture Notes in Networks and Systems, vol. 614), pp. 123-138. Springer. https://doi.org/10.1007/978-981-19-9331-2_11. Repositório Institucional UPT. http://hdl.handle.net/11328/4797

Identifiers


978-981-19-9331-2
978-981-19-9330-5

TID

Designation

Access type

Restricted access

Support

Description