Data preparation for clinical data mining to identify patients at risk of readmission
Steadily rising numbers of emergency (unplanned) inpatient admissions have been the major source of pressure on the NHS over the past twenty years. There remains a strong need for a consistent predictive tool and for automated development of readmission risk profiles, in particular a tool that addresses both data preparation and predictive modelling. This paper proposes a data preparation framework for transforming raw transactional clinical data into well-formed data sets to which data mining can be applied. In this framework, rules are created according to the statistical characteristics of the data, the metadata that characterises the host information systems, and medical knowledge. These rules can be used for data pre-processing, attribute selection and data transformation in order to generate appropriately prepared data sets. The proposed framework combines automatic methods with heuristic pre-processing treatments to address the challenges likely to arise in large-scale development, and its applicability is not limited to clinical data.
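To make the idea of rule-driven attribute selection concrete, the following is a minimal sketch, not the framework's actual implementation. It assumes rules can be modelled as predicates over an attribute's name and values. All function names, the missing-value threshold, the attribute names and the toy records are hypothetical and chosen only for illustration: one rule is derived from a statistical characteristic (missing rate), one from another statistical characteristic (constant values), and one from system metadata (a list of identifier fields to exclude).

```python
from typing import Callable, Dict, List, Optional, Set

Record = Dict[str, Optional[str]]
# A rule takes an attribute name and its column of values and
# returns True if the attribute should be kept.
Rule = Callable[[str, List[Optional[str]]], bool]


def rule_max_missing(threshold: float) -> Rule:
    """Statistical rule (assumed): keep attributes whose missing rate is at most `threshold`."""
    def rule(name: str, values: List[Optional[str]]) -> bool:
        missing = sum(v is None for v in values)
        return missing / len(values) <= threshold
    return rule


def rule_not_constant(name: str, values: List[Optional[str]]) -> bool:
    """Statistical rule (assumed): drop attributes with a single distinct non-missing value."""
    return len({v for v in values if v is not None}) > 1


def rule_exclude_by_metadata(excluded: Set[str]) -> Rule:
    """Metadata rule (assumed): drop attributes flagged by the host system, e.g. identifiers."""
    def rule(name: str, values: List[Optional[str]]) -> bool:
        return name not in excluded
    return rule


def select_attributes(records: List[Record], rules: List[Rule]) -> List[str]:
    """Return the attribute names that satisfy every rule, in their original order."""
    kept = []
    for name in records[0]:
        values = [record.get(name) for record in records]
        if all(rule(name, values) for rule in rules):
            kept.append(name)
    return kept


# Made-up toy records; None marks a missing value.
records: List[Record] = [
    {"patient_id": "p1", "age_band": "60-69", "ward_code": "A", "free_text": None, "site": "X"},
    {"patient_id": "p2", "age_band": "70-79", "ward_code": "B", "free_text": None, "site": "X"},
    {"patient_id": "p3", "age_band": "60-69", "ward_code": None, "free_text": "note", "site": "X"},
]

rules: List[Rule] = [
    rule_max_missing(0.5),
    rule_not_constant,
    rule_exclude_by_metadata({"patient_id"}),
]

selected = select_attributes(records, rules)
print(selected)
```

In this sketch each rule is independent, so rules grounded in medical knowledge could be added to the same list without changing the selection logic. That is one plausible design choice among several.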