SCH: Interpretable survival analysis of complex longitudinal data


Survival analysis is a statistical technique for predicting the time until a specific event occurs, such as hospitalization, mechanical part failure, or customer churn. Its healthcare applications span public health, clinical practice, and medical research. To predict patient outcomes, clinicians must integrate complex longitudinal data from varied sources, including text, images, and lab values, collected at irregular intervals. Traditional survival analysis methods struggle with such data. This project aims to develop novel deep learning techniques (brain-inspired computer models for analyzing complex data) tailored for this purpose. Importantly, these techniques will offer interpretability specific to the healthcare domain, bolstering users’ confidence in the predictions. Building on prior work that successfully used X-rays and lab values to predict events such as intubation, death, and ICU admission or discharge, this project will benchmark the new methods on important clinical applications. This interdisciplinary proposal brings together researchers specializing in computer science, biostatistics, and cardiology to significantly enhance models for survival analysis in a crucial healthcare context. The research will yield new prediction methods and model interpretations, demonstrated on open datasets that explore healthcare challenges. The proposal also includes support for educational outreach programs centered on survival analysis. The investigators will collaborate with existing Cornell Tech outreach initiatives, targeting women and underrepresented minorities through partnerships with the City University of New York (CUNY) and the New York City Department of Education. By combining expertise, this project aims to drive innovation in survival analysis and promote inclusivity in STEM education.

Addressing large-scale, real-world healthcare challenges with survival analysis requires solving complex problems in data representation and modeling. Classical survival analysis methods such as the Cox model are well established, but they do not inherently support effective learning from long-term, irregular, and multi-modal inputs, particularly when interpretability is required. The core of this project is a unified deep learning model for survival analysis, built on a Transformer backbone designed to handle complex longitudinal data. The project focuses on four key aims essential to this domain: (1) providing a unified feature representation for multi-modal data, (2) handling long-term, irregularly spaced input, (3) supporting customized model interpretability through collaboration with domain experts, leveraging their expertise as soft priors, and (4) integrating this feature representation with more advanced survival analysis methodologies. The methods will be evaluated primarily on the publicly available MIMIC dataset, which consists of critical care patient records. In addition, the project team collaborates with clinicians working on heart failure, a collaboration that provides an additional dataset for applied evaluation.
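For readers unfamiliar with the Cox model mentioned above, the following is a minimal sketch of its negative partial log-likelihood, the objective commonly used to fit Cox-style models, including ones whose risk scores come from a neural network. It is not the project's method. The function name `cox_neg_log_partial_likelihood` and its arguments are hypothetical, chosen for illustration, and tied event times are handled in the simplest way: every patient whose observed time is at least the event time counts as at risk.

```python
import numpy as np


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Negative Cox partial log-likelihood, averaged over observed events.

    risk_scores: one score per patient (higher means higher hazard); in a
                 deep model this would be the network output (assumption).
    times:       observed time per patient (event time or censoring time).
    events:      1 if the event was observed, 0 if the patient was censored.
    """
    risk_scores = np.asarray(risk_scores, dtype=float)
    times = np.asarray(times, dtype=float)
    events = np.asarray(events, dtype=bool)

    if not events.any():
        raise ValueError("at least one observed event is required")

    # at_risk[i, j] is True when patient j is still under observation
    # at patient i's observed time.
    at_risk = times[None, :] >= times[:, None]

    # Log of the summed exp(score) over each risk set, shifted by the
    # maximum score for numerical stability.
    max_score = risk_scores.max()
    exp_shifted = np.exp(risk_scores - max_score)
    log_risk_sum = np.log((at_risk * exp_shifted[None, :]).sum(axis=1)) + max_score

    # Only patients with an observed event contribute a term; censored
    # patients enter solely through the risk sets.
    per_event = risk_scores[events] - log_risk_sum[events]
    return -per_event.mean()
```

A lower value means the scores rank patients with earlier events as higher risk. Censored patients still shape the loss through the risk sets, which is how the Cox objective uses incomplete follow-up.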

NSF Award Number: 2306556

Start Date: August 1, 2023

End Date: July 31, 2027 (Estimated)

Total Intended Award Amount: $1,168,412.00

History of Investigator:
Michele Santacatterina (Principal Investigator)
Yifan Peng (Co-Principal Investigator)
Alexander Rush (Co-Principal Investigator)
Jonathan Newman (Co-Principal Investigator)