Sodet-synthetic oversampling decision trees
| dc.contributor.author | Costa, Joaquim Fernando Pinto da | |
| dc.contributor.author | Alonso, Hugo | |
| dc.date.accessioned | 2026-04-20T16:23:10Z | |
| dc.date.available | 2026-04-20T16:23:10Z | |
| dc.date.issued | 2026-04-19 | |
| dc.description.abstract | In this work, we present a novel methodology for decision trees that applies oversampling not before tree construction (on the entire dataset) but inside each internal node (and the corresponding region of the input space) of the tree. This strategy proves successful in counteracting the greedy nature of decision trees. We also take into account the nature of the input variables, which need not be only quantitative or binary, and introduce novel distances between instances that can also be used in other contexts. Applying our methodology to thirteen datasets, covering both balanced and imbalanced problems, shows the relevance of our approach when compared to CART and C5.0. Although our experiments were conducted on a standard computing platform, the proposed approach is well suited for high-performance computing environments, since node-level oversampling and distance computations can be efficiently parallelized, enabling the method to scale to large and high-dimensional datasets. | |
| dc.identifier.citation | Costa, J. F. P., & Alonso, H. (2026). Sodet-synthetic oversampling decision trees. Journal of Supercomputing, 82, 359, 1-33. https://doi.org/10.1007/s11227-026-08503-8. Repositório Institucional UPT. https://hdl.handle.net/11328/7088 | |
| dc.identifier.issn | 0920-8542 | |
| dc.identifier.issn | 1573-0484 | |
| dc.identifier.uri | https://hdl.handle.net/11328/7088 | |
| dc.language.iso | eng | |
| dc.publisher | Springer | |
| dc.relation.hasversion | https://doi.org/10.1007/s11227-026-08503-8 | |
| dc.rights | open access | |
| dc.rights.uri | http://creativecommons.org/licenses/by/4.0/ | |
| dc.subject | Decision trees | |
| dc.subject | Oversampling | |
| dc.subject | Synthetic samples | |
| dc.subject | Data augmentation | |
| dc.subject | Rare classes | |
| dc.subject.fos | Natural Sciences - Computer and Information Sciences | |
| dc.subject.ods | 09 - industry, innovation and infrastructure | |
| dc.title | Sodet-synthetic oversampling decision trees | |
| dc.type | journal article | |
| dcterms.references | https://link.springer.com/article/10.1007/s11227-026-08503-8#citeas | |
| dspace.entity.type | Publication | |
| oaire.citation.endPage | 33 | |
| oaire.citation.issue | Published online: 19 April 2026 | |
| oaire.citation.startPage | 1 | |
| oaire.citation.title | Journal of Supercomputing | |
| oaire.version | http://purl.org/coar/version/c_970fb48d4fbd8a85 | |
| person.affiliation.name | REMIT – Research on Economics, Management and Information Technologies | |
| person.familyName | Alonso | |
| person.givenName | Hugo | |
| person.identifier.ciencia-id | 5515-E921-89CF | |
| person.identifier.orcid | 0000-0002-1599-5392 | |
| relation.isAuthorOfPublication | 0ad608d5-a1b0-42d1-88ab-17984e08e866 | |
| relation.isAuthorOfPublication.latestForDiscovery | 0ad608d5-a1b0-42d1-88ab-17984e08e866 |
Files
Main
- Name: Costa_et_al-2026-The_Journal_of_Supercomputing.pdf
- Size: 367.23 KB
- Format: Adobe Portable Document Format