ZeroBERTo: Leveraging Zero-Shot Text Classification by topic modeling
dc.contributor.author | Alcoforado, Alexandre | |
dc.contributor.author | Ferraz, Thomas Palmeira | |
dc.contributor.author | Gerber, Rodrigo | |
dc.contributor.author | Bustos, Enzo | |
dc.contributor.author | Oliveira, André Seidel | |
dc.contributor.author | Veloso, Bruno | |
dc.contributor.author | Siqueira, Fabio Levy | |
dc.contributor.author | Costa, Anna Helena Reali | |
dc.date.accessioned | 2022-07-29T12:13:12Z | |
dc.date.available | 2022-07-29T12:13:12Z | |
dc.date.issued | 2022-03 | |
dc.description.abstract | Traditional text classification approaches often require a substantial amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This lack of labeled data has led to the rise of low-resource methods that assume low data availability in natural language processing. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but they suffer from two problems: high execution time and an inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo has better performance for long inputs and shorter execution time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset. | pt_PT |
dc.identifier.citation | Alcoforado, A., Ferraz, T. P., Gerber, R., Bustos, E., Oliveira, A. S., Veloso, B., Siqueira, F. L., & Costa, A. H. R. (2022). ZeroBERTo: Leveraging Zero-Shot Text Classification by topic modeling. In V. Pinheiro & P. Gamallo (Eds.), Proceedings of Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21-23, 2022 (pp. 125-136). Springer. https://doi.org/10.1007/978-3-030-98305-5_12. Repositório Institucional UPT. http://hdl.handle.net/11328/4379 | pt_PT |
dc.identifier.doi | https://doi.org/10.1007/978-3-030-98305-5_12 | pt_PT |
dc.identifier.isbn | 978-3-030-98304-8 | |
dc.identifier.uri | http://hdl.handle.net/11328/4379 | |
dc.language.iso | eng | pt_PT |
dc.peerreviewed | yes | pt_PT |
dc.publisher | Springer | pt_PT |
dc.rights | restricted access | pt_PT |
dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | pt_PT |
dc.subject | Artificial intelligence | pt_PT |
dc.subject | Machine learning | pt_PT |
dc.subject | Natural language processing | pt_PT |
dc.subject | Learning paradigms | pt_PT |
dc.subject | Supervised learning | pt_PT |
dc.subject | Supervised learning by classification | pt_PT |
dc.title | ZeroBERTo: Leveraging Zero-Shot Text Classification by topic modeling | pt_PT |
dc.type | conferenceObject | pt_PT |
degois.publication.firstPage | 125 | pt_PT |
degois.publication.lastPage | 136 | pt_PT |
degois.publication.location | Fortaleza, Brazil | pt_PT |
degois.publication.title | Proceedings of Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022 | pt_PT |
dspace.entity.type | Publication | en |