Text as data: narrative mining of non-collision injury incidents on public buses by structural topic modeling
Published in Travel Behaviour and Society, 2024
Recommended citation: Xu, P., Wang, Q., Ye, Y., Wong, S.C., & Zhou, H.* (2025). "Text as data: narrative mining of non-collision injury incidents on public buses by structural topic modeling." Travel Behaviour and Society, 39, 100981. https://doi.org/10.1016/j.tbs.2024.100981
Introduction: Although numerous studies have investigated collisions involving public buses, passenger injuries caused by non-collision incidents on public buses remain under-researched. One major obstacle is that manually extracting thematic information from massive document repositories is exceedingly labor intensive, cumbersome, and error-prone. This study therefore illustrates how to automatically characterize non-collision injury incidents on public buses by combining advanced language processing techniques with large-scale incident reports. Methods: Based on 12,823 textual narratives recorded by police in Hong Kong during 2010-2019, a structural topic model was developed to uncover underlying themes, quantify topic prevalence, and portray the interconnections among topics. Results: Thirty-three topics were successfully labeled, with the topic "stand and lost balance" being the most prevalent. Non-collision incidents were more likely to result in serious consequences when the bus skidded, when a passenger was boarding, and when a standing passenger lost balance. Six unique patterns were uncovered: the failure to hold handrails accompanied by inappropriate behaviors of bus drivers when approaching bus stations, loss of balance among standing passengers due to sharp braking by bus drivers in response to red traffic lights ahead, alighting passengers being hit by the door, passengers falling while climbing staircases, passengers being injured because of bus drivers' emergency maneuvers to avoid collisions with nearside pedestrians, and passengers being injured due to careless lane-changing by bus drivers when weaving through roundabouts. Conclusions: By leveraging emerging text mining techniques, unstructured narratives written by the police can provide valuable and organized information for regular injury surveillance. Tailor-made countermeasures were proposed to prevent non-collision injury incidents on public buses.