Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly

dc.contributor.author: García-Méndez, Silvia
dc.contributor.author: Malheiro, Benedita
dc.contributor.author: Burguillo-Rial, Juan Carlos
dc.contributor.author: Veloso, Bruno
dc.contributor.author: Chis, Adriana E.
dc.contributor.author: González-Vélez, Horacio
dc.contributor.author: Leal, Fátima
dc.date.accessioned: 2022-06-27T10:56:39Z
dc.date.available: 2022-06-27T10:56:39Z
dc.date.issued: 2022-06
dc.description.abstract: Data crowdsourcing is a data acquisition process where groups of voluntary contributors feed platforms with highly relevant data ranging from news, comments, and media to knowledge and classifications. It typically processes user-generated data streams to provide and refine popular services such as wikis, collaborative maps, e-commerce sites, and social networks. Nevertheless, this modus operandi raises severe concerns regarding ill-intentioned data manipulation in adversarial environments. This paper presents a simulation, modelling, and classification approach to automatically identify human and non-human (bots) as well as benign and malign contributors by using data fabrication to balance classes within experimental data sets, data stream modelling to build and update contributor profiles and, finally, autonomic data stream classification. By employing WikiVoyage – a free worldwide wiki travel guide open to contribution from the general public – as a testbed, our approach proves to significantly boost the confidence and quality of the classifier by using a class-balanced data stream, comprising both real and synthetic data. Our empirical results show that the proposed method distinguishes between benign and malign bots as well as human contributors with a classification accuracy of up to 92 %. [pt_PT]
dc.identifier.citation: García-Méndez, S., Leal, F., Malheiro, B., Burguillo-Rial, J. C., Veloso, B., Chis, A. E., & González-Vélez, H. (2022). Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly. Simulation Modelling Practice and Theory, 120, 102616, 1-13. https://doi.org/10.1016/j.simpat.2022.102616. Repositório Institucional UPT. http://hdl.handle.net/11328/4289 [pt_PT]
dc.identifier.doi: https://doi.org/10.1016/j.simpat.2022.102616 [pt_PT]
dc.identifier.issn: 1569-190X (Print)
dc.identifier.uri: http://hdl.handle.net/11328/4289
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Elsevier [pt_PT]
dc.rights: open access [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [pt_PT]
dc.subject: Classification [pt_PT]
dc.subject: Data reliability [pt_PT]
dc.subject: Stream processing [pt_PT]
dc.subject: Synthetic data [pt_PT]
dc.subject: Data fabrication [pt_PT]
dc.subject: Wiki contributors [pt_PT]
dc.title: Simulation, modelling and classification of wiki contributors: Spotting the good, the bad, and the ugly [pt_PT]
dc.type: journal article [pt_PT]
degois.publication.firstPage: 1 [pt_PT]
degois.publication.lastPage: 13 [pt_PT]
degois.publication.title: Simulation Modelling Practice and Theory [pt_PT]
degois.publication.volume: 120 [pt_PT]
dspace.entity.type: Publication [en]
person.affiliation.name: REMIT – Research on Economics, Management and Information Technologies
person.familyName: Leal
person.givenName: Fátima
person.identifier.ciencia-id: 2211-3EC7-B4B6
person.identifier.orcid: 0000-0003-4418-2590
person.identifier.rid: Y-3460-2019
person.identifier.scopus-author-id: 57190765181
relation.isAuthorOfPublication: 8066078f-1e30-4b0a-aa84-3b6a2af4185c
relation.isAuthorOfPublication.latestForDiscovery: 8066078f-1e30-4b0a-aa84-3b6a2af4185c

Files

Original bundle

Name: SIMPAT 2022.pdf
Size: 1023.84 KB
Format: Adobe Portable Document Format

Name: Imagem1.png
Size: 219.08 KB
Format: Portable Network Graphics