Imputation strategies for interval-censored data: From AFT models to machine learning and scaled redistribution

Date

2026-03-06

Publisher

AIMS Press

Language

English

Abstract

Interval-censored data pose challenges in survival analysis because event times are known only to fall within observation intervals. Traditional strategies, such as midpoint imputation, often fail to capture the uncertainty inherent in this type of censoring. This study compares classical, model-based, and machine learning approaches for imputing interval-censored event times. Specifically, we evaluate (i) standard midpoint imputation, (ii) accelerated failure time (AFT) model-based imputation, (iii) a machine learning method using XGBoost, and (iv) a new scaled linear redistribution method that constrains model-based imputations within censoring bounds while preserving their relative variability. A comprehensive simulation study under varying levels of right censoring was carried out to assess bias, accuracy, and concordance. Three real datasets were then analyzed to illustrate the practical behavior of the imputation methods. The XGBoost-based imputation performs stably across the censoring scenarios considered, yielding survival estimates close to those of the nonparametric Turnbull estimator. The midpoint method performs adequately when intervals are short or censoring is mild, whereas parametric models are more sensitive to distributional assumptions and may yield biased estimates under heavy censoring. Analyses of real data further revealed greater variability among parametric models under high right censoring and a flattening of survival curves when censoring occurs mainly at long event times. The proposed scaled linear redistribution method provides a way to map model-based predictions back to their observed censoring intervals while retaining their relative dispersion. The methods considered display complementary strengths across censoring regimes, with no single approach uniformly dominating.
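
As a rough illustration of two of the ideas named in the abstract, the sketch below contrasts midpoint imputation with one possible way of rescaling model predictions into their censoring intervals. The abstract does not give the formula of the proposed scaled linear redistribution method, so the min-max mapping used in `rescale_into_intervals` is an assumption made here for illustration only and should not be read as the authors' definition. The function names and the toy arrays are hypothetical, and the sketch assumes finite interval bounds (right-censored observations with an infinite upper bound are not handled).

```python
import numpy as np


def midpoint_impute(left, right):
    """Impute each event time as the midpoint of its interval [left, right]."""
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return (left + right) / 2.0


def rescale_into_intervals(pred, left, right):
    """Map model predictions into their own censoring intervals.

    Assumption (not taken from the paper): each prediction is first placed
    on a 0-1 scale by min-max normalisation over all predictions, and that
    relative position is then applied linearly inside [left_i, right_i].
    """
    pred = np.asarray(pred, dtype=float)
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    span = pred.max() - pred.min()
    if span == 0:
        # All predictions identical: fall back to the middle of each interval.
        position = np.full_like(pred, 0.5)
    else:
        position = (pred - pred.min()) / span
    return left + position * (right - left)


# Toy data: three observations with finite interval bounds.
left = np.array([1.0, 2.0, 4.0])
right = np.array([3.0, 6.0, 5.0])
pred = np.array([2.0, 10.0, 6.0])  # hypothetical model-based predictions

mid = midpoint_impute(left, right)
scaled = rescale_into_intervals(pred, left, right)
```

In this sketch every rescaled value lies inside its own interval by construction, while the ordering of the predictions on the 0-1 scale is kept. Midpoint imputation, by contrast, ignores the model predictions entirely.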

Keywords

interval-censored data, machine learning, XGBoost, imputation methods

Document Type

Article

Citation

Soutinho, G., & Meira-Machado, L. (2026). Imputation strategies for interval-censored data: From AFT models to machine learning and scaled redistribution. AIMS Mathematics, 11(3), 5719-5737. https://doi.org/10.3934/math.2026235. Repositório Institucional UPT. https://hdl.handle.net/11328/7000

Access Type

Open Access
