Sodet-synthetic oversampling decision trees

dc.contributor.author: Costa, Joaquim Fernando Pinto da
dc.contributor.author: Alonso, Hugo
dc.date.accessioned: 2026-04-20T16:23:10Z
dc.date.available: 2026-04-20T16:23:10Z
dc.date.issued: 2026-04-19
dc.description.abstract: In this work, we present a novel methodology for decision trees that uses oversampling, not before tree construction (on the entire dataset), but inside each internal node (and corresponding input space region) of the tree. This strategy proves to be successful in fighting the greedy nature of decision trees. We also take into consideration the nature of the input variables, not just quantitative or binary, and introduce novel distances between instances that can also be used in other contexts. The application of our methodology to thirteen datasets, covering both balanced and imbalanced problems, shows the relevance of our approach when compared to CART and C5.0. Although our experiments were conducted on a standard computing platform, the proposed approach is well suited for high-performance computing environments, since node-level oversampling and distance computations can be efficiently parallelized, enabling the method to scale to large and high-dimensional datasets.
dc.identifier.citation: Costa, J. F. P., & Alonso, H. (2026). Sodet-synthetic oversampling decision trees. Journal of Supercomputing, 82, 359, 1-33. https://doi.org/10.1007/s11227-026-08503-8. Repositório Institucional UPT. https://hdl.handle.net/11328/7088
dc.identifier.issn: 0920-8542
dc.identifier.issn: 1573-0484
dc.identifier.uri: https://hdl.handle.net/11328/7088
dc.language.iso: eng
dc.publisher: Springer
dc.relation.hasversion: https://doi.org/10.1007/s11227-026-08503-8
dc.rights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Decision trees
dc.subject: Oversampling
dc.subject: Synthetic samples
dc.subject: Data augmentation
dc.subject: Rare classes
dc.subject.fos: Natural Sciences - Computer and Information Sciences
dc.subject.ods: 09 - industry, innovation and infrastructure
dc.title: Sodet-synthetic oversampling decision trees
dc.type: journal article
dcterms.references: https://link.springer.com/article/10.1007/s11227-026-08503-8#citeas
dspace.entity.type: Publication
oaire.citation.endPage: 33
oaire.citation.issue: Published online: 19 April 2026
oaire.citation.startPage: 1
oaire.citation.title: Journal of Supercomputing
oaire.version: http://purl.org/coar/version/c_970fb48d4fbd8a85
person.affiliation.name: REMIT – Research on Economics, Management and Information Technologies
person.familyName: Alonso
person.givenName: Hugo
person.identifier.ciencia-id: 5515-E921-89CF
person.identifier.orcid: 0000-0002-1599-5392
relation.isAuthorOfPublication: 0ad608d5-a1b0-42d1-88ab-17984e08e866
relation.isAuthorOfPublication.latestForDiscovery: 0ad608d5-a1b0-42d1-88ab-17984e08e866

Files

Main

Name: Costa_et_al-2026-The_Journal_of_Supercomputing.pdf
Size: 367.23 KB
Format: Adobe Portable Document Format