ZeroBERTo: Leveraging Zero-Shot Text Classification by topic modeling

dc.contributor.author: Alcoforado, Alexandre
dc.contributor.author: Ferraz, Thomas Palmeira
dc.contributor.author: Gerber, Rodrigo
dc.contributor.author: Bustos, Enzo
dc.contributor.author: Oliveira, André Seidel
dc.contributor.author: Veloso, Bruno
dc.contributor.author: Siqueira, Fabio Levy
dc.contributor.author: Costa, Anna Helena Reali
dc.date.accessioned: 2022-07-29T12:13:12Z
dc.date.available: 2022-07-29T12:13:12Z
dc.date.issued: 2022-03
dc.description.abstract: Traditional text classification approaches often require a substantial amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This lack of labeled data has led to the rise of low-resource methods, which assume low data availability in natural language processing. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but they suffer from two problems: high execution time and an inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo performs better on long inputs and has a shorter execution time, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset. [pt_PT]
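The abstract's central idea, running an unsupervised clustering step first so that the expensive classifier sees a few short topic descriptions instead of every (possibly long) document, can be illustrated with a small sketch. This is not the authors' implementation. The plain k-means clustering, the top-word topic representation, the hypothetical `zero_shot_scores` function (a keyword-overlap stand-in for a Transformer zero-shot classifier), the toy corpus and label keywords, and the combination rule P(label | doc) = Σ_t P(t | doc) · P(label | t) are all assumptions chosen for illustration.

```python
import math
from collections import Counter

# Toy corpus and label keywords: illustrative assumptions, not data from the paper.
DOCS = [
    "the team won the match with a late goal",
    "the striker scored a goal in the final match",
    "the central bank raised interest rates again",
    "markets fell after the bank announced higher rates",
]
LABELS = {"sports": {"goal", "match", "team"}, "economy": {"bank", "rates", "markets"}}
STOPWORDS = {"the", "a", "with", "in", "after", "again"}
CLASSIFIER_CALLS = 0  # counts how often the (expensive) zero-shot scorer runs


def tokenize(text):
    return [w for w in text.lower().split() if w not in STOPWORDS]


def unit_bow(tokens, vocab):
    # Bag-of-words vector scaled to unit length, so document length does not dominate.
    counts = Counter(tokens)
    vec = [float(counts[w]) for w in vocab]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def sq_dist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def kmeans(vectors, k, iters=10):
    # Plain k-means with a deterministic, evenly spaced initialization.
    step = max(1, len(vectors) // k)
    centroids = [list(v) for v in vectors[::step][:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        for c in range(k):
            members = [v for v, a in zip(vectors, assign) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centroids


def zero_shot_scores(topic_words):
    # Hypothetical stand-in for a Transformer zero-shot classifier:
    # keyword overlap between the topic words and each label, turned into P(label | topic).
    global CLASSIFIER_CALLS
    CLASSIFIER_CALLS += 1
    names = list(LABELS)
    return dict(zip(names, softmax([len(topic_words & LABELS[n]) for n in names])))


def classify(docs, k=2, top_n=3, temperature=0.5):
    tokens = [tokenize(d) for d in docs]
    vocab = sorted({w for t in tokens for w in t})
    vectors = [unit_bow(t, vocab) for t in tokens]
    assign, centroids = kmeans(vectors, k)

    # Compressed representation: each topic is just its top_n most frequent words.
    topic_label_probs = []
    for c in range(k):
        counts = Counter(w for t, a in zip(tokens, assign) if a == c for w in t)
        topic_words = {w for w, _ in counts.most_common(top_n)}
        topic_label_probs.append(zero_shot_scores(topic_words))  # one call per topic

    results = []
    for v in vectors:
        # Soft topic membership P(topic | doc) from distances to the centroids.
        p_topic = softmax([-sq_dist(v, c) / temperature for c in centroids])
        # P(label | doc) = sum over topics of P(topic | doc) * P(label | topic).
        p_label = {
            lab: sum(pt * tl[lab] for pt, tl in zip(p_topic, topic_label_probs))
            for lab in LABELS
        }
        results.append(max(p_label, key=p_label.get))
    return results


labels = classify(DOCS)
print(labels)  # ['sports', 'sports', 'economy', 'economy']
print("classifier calls:", CLASSIFIER_CALLS)  # 2: one per topic, not one per document
```

In this sketch the classifier runs once per topic (two calls for four documents), and each call sees only a few topic words however long the underlying documents are. Under the stated assumptions, that is the mechanism behind the two advantages the abstract names: shorter execution time and robustness to long inputs.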
dc.identifier.citation: Alcoforado, A., Ferraz, T. P., Gerber, R., Bustos, E., Oliveira, A. S., Veloso, B., Siqueira, F. L., & Costa, A. H. R. (2022). ZeroBERTo: Leveraging Zero-Shot Text Classification by topic modeling. In V. Pinheiro & P. Gamallo (Eds.), Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21-23, 2022, Proceedings (pp. 125-136). Springer. https://doi.org/10.1007/978-3-030-98305-5_12. Repositório Institucional UPT. http://hdl.handle.net/11328/4379 [pt_PT]
dc.identifier.doi: https://doi.org/10.1007/978-3-030-98305-5_12 [pt_PT]
dc.identifier.isbn: 978-3-030-98304-8
dc.identifier.uri: http://hdl.handle.net/11328/4379
dc.language.iso: eng [pt_PT]
dc.peerreviewed: yes [pt_PT]
dc.publisher: Springer [pt_PT]
dc.rights: restricted access [pt_PT]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/ [pt_PT]
dc.subject: Artificial intelligence [pt_PT]
dc.subject: Machine learning [pt_PT]
dc.subject: Natural language processing [pt_PT]
dc.subject: Learning paradigms [pt_PT]
dc.subject: Supervised learning [pt_PT]
dc.subject: Supervised learning by classification [pt_PT]
dc.title: ZeroBERTo: Leveraging Zero-Shot Text Classification by topic modeling [pt_PT]
dc.type: conferenceObject [pt_PT]
degois.publication.firstPage: 125 [pt_PT]
degois.publication.lastPage: 136 [pt_PT]
degois.publication.location: Fortaleza, Brazil [pt_PT]
degois.publication.title: Proceedings of Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022 [pt_PT]
dspace.entity.type: Publication [en]

Files

Original bundle

Name: 2201.01337.pdf
Size: 369.43 KB
Format: Adobe Portable Document Format