Identification of Burned-in Text Data in Low Resolution Medical Imaging Modalities using Deep Learning Technology
Medical image character recognition (MICR) has become a useful application of optical character recognition (OCR), driven by advances in computing resources, the growth of large databases of medical image records and the need for efficient information retrieval. However, medical image modalities (MIM) such as X-rays, ultrasound and Magnetic Resonance Imaging (MRI) present a unique challenge: patients’ demographics and clinical examination data are burned into the pixel content as small-font text within images of overall low resolution, so applying traditional OCR to these low-quality images results in poor accuracy. Traditional OCR engines cannot recognise this burned-in text under such conditions because they are designed mainly for bi-level text at resolutions of 150 DPI and above and for scanned documents of at least 300 DPI, whereas these MIM have a resolution of only 96 DPI. To address these challenges, this thesis explores deep learning techniques in deterministic modelling, semantic similarity learning and generative modelling to tackle the core problems in MICR: low resolution, small text, small sample size and background interference. This thesis developed an ensemble of Convolutional Neural Networks (CNNs) inspired by the classical LeNet-5 architecture to recognise burned-in text at the character level; experimental results are promising when compared with the state of the art. Furthermore, to improve the recognition rate of the CNN models on visually similar characters (VSC), this thesis proposed and designed a channel attention-based Siamese network that applies metric learning and few-shot techniques to recognise VSC while training on a small sample size per class. The evaluation showed that the Siamese network discriminates between VSC in MIM better than regular multi-class classifiers do. To deal with the small sample size problem caused by privacy concerns when acquiring MIM for deep learning tasks, this thesis proposed, deployed and evaluated a conditional variational autoencoder (CVAE) to generate synthetic image data. The evaluation shows improved accuracy of the deterministic models when they are trained with augmented images generated by the proposed CVAE. To support the generalisability of the findings, two datasets were used for the evaluation: an open-source medical image dataset and a privately collected medical image dataset whose collection was approved by the University of Hertfordshire’s ethics committee. An accurate MICR solution can improve health data analytics by enabling more accessible and accurate extraction of data from MIM, helping analysts identify patterns in image data and thereby improving patient care and diagnosis.
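The abstract summarises the channel attention-based Siamese network without implementation detail. As a rough illustration only, the sketch below shows one plausible way such a network for discriminating visually similar characters could be set up in PyTorch. The module names (`SEBlock`, `CharEmbedder`, `SiameseNet`, `contrastive_loss`), the squeeze-and-excitation form of channel attention, the 32×32 input crop size, the 64-dimensional embedding, and the contrastive loss with a unit margin are all assumptions made for this example and are not taken from the thesis.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SEBlock(nn.Module):
    """Squeeze-and-excitation channel attention: reweights feature channels (illustrative)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        # Global average pool over spatial dims -> one descriptor per channel
        s = x.mean(dim=(2, 3))
        s = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))
        return x * s.unsqueeze(-1).unsqueeze(-1)


class CharEmbedder(nn.Module):
    """Small CNN mapping a 1x32x32 grayscale character crop to a unit-norm embedding."""
    def __init__(self, embed_dim=64):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, 3, padding=1)
        self.conv2 = nn.Conv2d(16, 32, 3, padding=1)
        self.att = SEBlock(32)
        self.fc = nn.Linear(32 * 8 * 8, embed_dim)

    def forward(self, x):
        x = F.max_pool2d(F.relu(self.conv1(x)), 2)  # 32x32 -> 16x16
        x = F.max_pool2d(F.relu(self.conv2(x)), 2)  # 16x16 -> 8x8
        x = self.att(x)                              # channel attention on feature maps
        return F.normalize(self.fc(x.flatten(1)), dim=1)


class SiameseNet(nn.Module):
    """One shared embedder applied to both inputs; the pair distance drives the loss."""
    def __init__(self):
        super().__init__()
        self.embedder = CharEmbedder()

    def forward(self, a, b):
        return self.embedder(a), self.embedder(b)


def contrastive_loss(za, zb, same_class, margin=1.0):
    # same_class: 1.0 if both crops show the same character, else 0.0
    d = F.pairwise_distance(za, zb)
    return (same_class * d.pow(2) +
            (1 - same_class) * F.relu(margin - d).pow(2)).mean()
```

Under this kind of setup, a pair of character crops would be scored at inference time by embedding both and thresholding their distance, which is how a metric-learning model can separate visually similar characters even with few training samples per class.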
| Item Type | Thesis (Doctoral) |
|---|---|
| Keywords | Medical image character recognition, Siamese networks, conditional variational autoencoders, image analysis, medical imaging, low-resolution imaging |
| Date Deposited | 25 Sep 2025 14:06 |
| Last Modified | 25 Sep 2025 14:06 |