Modeling Traffic Crash Severity in Complex Transportation Systems: An Efficient and Interpretable Tabular Learning Framework Under Class Imbalance

Published in Systems, 2026

Recommended citation: Li, Z., Cao, S., Miao, T., Fang, B. & Ye, Y.* (2026). "Modeling Traffic Crash Severity in Complex Transportation Systems: An Efficient and Interpretable Tabular Learning Framework Under Class Imbalance." Systems, in press.

Accurately predicting traffic crash severity is critical for intelligent transportation systems, where outcomes emerge from the interaction of infrastructure, environment, traffic control, and human behavior. However, existing approaches face three key challenges: severe class imbalance, computational inefficiency, and limited support for system-level risk understanding. To address these issues, this study proposes a unified and system-aware framework integrating Conditional Tabular Generative Adversarial Network (CTGAN), Tabular Prior-data Fitted Network (TabPFN), and eXplainable Artificial Intelligence (XAI) methods for data augmentation, efficient prediction, and interpretable analysis. CTGAN enhances rare but critical crash states while preserving feature dependencies; TabPFN enables accurate multi-class prediction with limited dataset-specific tuning; and XAI methods quantify the influence of key factors and their interactions. Experiments on a real-world crash dataset from Boston show that the proposed framework achieves competitive predictive performance with less reliance on dataset-specific hyperparameter tuning, while also providing complementary interpretability results from multiple perspectives. The results further reveal that crash severity is jointly shaped by visibility, traffic control, roadside features, and temporal dynamics, highlighting the interconnected nature of risk within the transportation system. By integrating predictive modeling with complementary interpretability analysis, the framework provides a systems-oriented basis for examining how environmental, infrastructural, and temporal conditions jointly relate to crash severity in the studied urban crash data, while offering a methodological reference for broader safety applications that require further validation.
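As a rough illustration of the augmentation step described above, the sketch below computes how many synthetic records each severity class would need before a generator such as CTGAN could bring it up to the size of the majority class. The class names, the counts, and the `target_ratio` parameter are hypothetical and are not taken from the paper, which does not state its augmentation budget here; the helper only does the bookkeeping and does not call CTGAN itself.

```python
from collections import Counter


def synthetic_rows_needed(labels, target_ratio=1.0):
    """Per class, count the synthetic rows needed to reach
    target_ratio * (size of the largest class).

    Hypothetical helper for illustration; not the authors' procedure.
    """
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    majority = max(counts.values())
    target = int(round(majority * target_ratio))
    # Classes already at or above the target need no synthetic rows.
    return {cls: max(0, target - n) for cls, n in counts.items()}


# Made-up severity distribution, heavily skewed toward the mildest class.
labels = ["property_damage"] * 900 + ["injury"] * 90 + ["fatal"] * 10

full_balance = synthetic_rows_needed(labels)
half_balance = synthetic_rows_needed(labels, target_ratio=0.5)

print(full_balance)
print(half_balance)
```

A partial target such as `target_ratio=0.5` is one way to limit how much of the training set is synthetic; whether full or partial balancing works better would have to be checked on held-out real crash records only.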