ZeroBERTo: Leveraging Zero-Shot Text Classification by topic modeling
Date
2022-03
Publisher
ACM
Language
English
Abstract
Traditional text classification approaches often require a substantial amount of labeled data, which is difficult to obtain, especially in restricted domains or less widespread languages. This scarcity of labeled data has led to the rise of low-resource methods in natural language processing, which assume low data availability. Among them, zero-shot learning stands out: it consists of learning a classifier without any previously labeled data. The best results reported with this approach use language models such as Transformers, but they suffer from two problems: high execution time and an inability to handle long texts as input. This paper proposes a new model, ZeroBERTo, which leverages an unsupervised clustering step to obtain a compressed data representation before the classification task. We show that ZeroBERTo performs better on long inputs and runs faster, outperforming XLM-R by about 12% in F1 score on the FolhaUOL dataset.
Keywords
Artificial intelligence, Machine learning, Natural language processing, Learning paradigms, Supervised learning, Supervised learning by classification
Document Type
conferenceObject
Citation
Alcoforado, A., Ferraz, T. P., Gerber, R., Bustos, E., Oliveira, A. S., Veloso, B., Siqueira, F. L., & Costa, A. H. R. (2022). ZeroBERTo: Leveraging Zero-Shot Text Classification by topic modeling. In V. Pinheiro & P. Gamallo (Eds.), [Proceedings of] Computational Processing of the Portuguese Language: 15th International Conference, PROPOR 2022, Fortaleza, Brazil, March 21-23, 2022 (pp. 125-136). ACM. https://doi.org/10.1007/978-3-030-98305-5_12. Repositório Institucional UPT. http://hdl.handle.net/11328/4379
Access Type
Restricted Access