Comparative analysis of machine learning algorithms for author age and gender identification

dc.contributor.author: Zainab, Zarah
dc.contributor.author: Al-Obeidat, Feras
dc.contributor.author: Gul, Haji
dc.contributor.author: Amin, Adnan
dc.contributor.author: Moreira, Fernando
dc.date.accessioned: 2023-05-25T09:50:09Z
dc.date.available: 2023-05-25T09:50:09Z
dc.date.issued: 2023-05-19
dc.description.abstract: Author profiling is a part of information retrieval in which characteristics of an author, such as native language, gender, and age, are inferred from the author's text. Various text-analysis techniques are used to extract this information, for example author identification on social media and in short text messages (SMS). Author profiling supports identification in security applications and on blogs by capturing authors’ writing behavior in messages, posts, blogs, comments, and chat logs. Most of the work in this area has been done on English and other native languages. Roman Urdu is also receiving attention for the author profiling task, but existing approaches must convert Roman Urdu to English in order to extract important features, such as those obtained through Named Entity Recognition (NER) and other linguistic features. This conversion may lose important information, because converting one language to another has inherent limitations. This research explores machine learning techniques that can be applied to any language, which avoids the conversion limitation. The Vector Space Model (VSM) and Query Likelihood (Q.L.) are used to identify the author’s age and gender. Experimental results show that Q.L. produces better results in terms of accuracy. [pt_PT]
dc.identifier.citation: Zainab, Z., Al-Obeidat, F., Moreira, F., Gul, H., & Amin, A. (2023). Comparative analysis of machine learning algorithms for author age and gender identification. In S. Anwar, A. Ullah, Á. Rocha, & M. J. Sousa (Eds.), Proceedings of International Conference on Information Technology and Applications (Lecture Notes in Networks and Systems, vol. 614), pp. 123-138. Springer. https://doi.org/10.1007/978-981-19-9331-2_11. Repositório Institucional UPT. http://hdl.handle.net/11328/4797 [pt_PT]
dc.identifier.doi: 10.1007/978-981-19-9331-2_11 [pt_PT]
dc.identifier.isbn: 978-981-19-9331-2
dc.identifier.isbn: 978-981-19-9330-5
dc.identifier.uri: http://hdl.handle.net/11328/4797
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Springer [pt_PT]
dc.relation.ispartofseries: Lecture Notes in Networks and Systems
dc.relation.publisherversion: https://link.springer.com/chapter/10.1007/978-981-19-9331-2_11 [pt_PT]
dc.rights: restricted access [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Vector space model [pt_PT]
dc.subject: Query likelihood model [pt_PT]
dc.subject: Information retrieval (I.R.) [pt_PT]
dc.subject: Text mining [pt_PT]
dc.subject: Author profiling [pt_PT]
dc.title: Comparative analysis of machine learning algorithms for author age and gender identification [pt_PT]
dc.type: conferenceObject [pt_PT]
degois.publication.firstPage: 123 [pt_PT]
degois.publication.lastPage: 138 [pt_PT]
degois.publication.title: Proceedings of International Conference on Information Technology and Applications [pt_PT]
degois.publication.volume: 614 [pt_PT]
dspace.entity.type: Publication [en]
person.affiliation.name: Universidade Portucalense
person.familyName: Moreira
person.givenName: Fernando
person.identifier.ciencia-id: 7B1C-3A29-9861
person.identifier.orcid: 0000-0002-0816-1445
person.identifier.rid: P-9673-2016
person.identifier.scopus-author-id: 8649758400
relation.isAuthorOfPublication: bad3408c-ee33-431e-b9a6-cb778048975e
relation.isAuthorOfPublication.latestForDiscovery: bad3408c-ee33-431e-b9a6-cb778048975e

Files

Original bundle

Name: P93.pdf
Size: 266.39 KB
Format: Adobe Portable Document Format