Date of Award
Doctor of Philosophy (PhD)
Jason H. Bates
The adoption of deep neural networks (DNNs) for lung cancer screening has been shown to improve detection of malignant nodules in sequential scans and to reduce screening time [1,2]. However, DNNs fail to properly classify images when applied to a single low-dose computed tomography scan of an indeterminate pulmonary nodule (4 mm to 20 mm in diameter). In addition, the limited size of most medical data sets used for deep learning leads to network overfitting and poor performance, making translation to clinical settings difficult [3,4]. With so little data, DNNs have difficulty identifying and evaluating features of interest and fail to generalize to novel data. Guiding a neural network toward biological features that are known to be pathophysiologically relevant may improve both classification accuracy and generalizability [5,6]. For example, idiopathic pulmonary fibrosis and emphysema are both associated with increased lung inflammation and are considered pre-malignant conditions [7–9]. The aim of this work is to identify biologically relevant features associated with increased risk of malignancy and to embed them into deep learning methodologies in order to evaluate their contributions toward classification and model generalizability. Relevant biological features are identified through three methodologies: least absolute shrinkage and selection operator (LASSO), support vector machine (SVM), and random forest (RF). Using these quantitative features as the basis for classification, we increase the emphasis of specific tumor features through kernel manipulations using discrete wavelet decomposition and evaluate whether DNNs place emphasis on these select features for nodule classification. Results suggest that quantitative parenchymal features carry significant classification information across all machine learning methodologies applied.
Combining parenchymal features with tumor-specific features significantly improves the classification performance of these methodologies compared to using tumor-specific features alone. Furthermore, attention maps of the network show that DNNs extract abstract features that resemble these biological features. Features capturing nodule maximal diameter, alongside textural and morphological features, appear to drive nodule classification. Using discrete wavelet decomposition to embed simplified features into convolutional neural networks (CNNs) improves the classification accuracy of the model and reduces training time. This demonstrates that guiding a DNN toward select features can improve its performance while minimizing overfitting. The findings suggest that known pathophysiologically relevant features can be encoded into DNNs to improve network classification and generalizability to novel data. Furthermore, evaluating models based on misclassified nodules provides avenues to identify over-emphasized features in the network and to correct them through image preprocessing. Overall, this body of work addresses several challenges in the application of DNNs for early nodule detection. The performance of the generated models for single-shot classification of indeterminate pulmonary nodules shows promise for deployment as clinician co-pilots. Further studies on the risks and benefits of these models in clinical settings are necessary to ensure proper performance prior to translation.
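For readers unfamiliar with the discrete wavelet decomposition mentioned above, the following is a minimal, illustrative sketch of a single-level 2D Haar decomposition in plain Python. It is not the dissertation's implementation: the Haar wavelet, the averaging normalization (dividing by 2 rather than by the square root of 2), and the function names `haar_step` and `haar_dwt2` are all assumptions chosen for illustration. The sketch splits an image patch into one low-frequency sub-band and three detail sub-bands, which is the kind of representation in which selected features could then be weighted before being passed to a network.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_step(values: List[float]) -> Tuple[List[float], List[float]]:
    """One Haar level on a 1D sequence: pairwise averages and half-differences.

    Assumption: averaging normalization (divide by 2), not the orthonormal form.
    """
    if len(values) % 2:
        raise ValueError("sequence length must be even")
    low = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    high = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return low, high


def transpose(matrix: Matrix) -> Matrix:
    return [list(column) for column in zip(*matrix)]


def filter_columns(matrix: Matrix) -> Tuple[Matrix, Matrix]:
    """Apply haar_step down every column and return (low, high) matrices."""
    lows, highs = [], []
    for column in transpose(matrix):
        low, high = haar_step(column)
        lows.append(low)
        highs.append(high)
    return transpose(lows), transpose(highs)


def haar_dwt2(image: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Single-level 2D Haar decomposition (hypothetical helper).

    Returns (ll, lh, hl, hh). The first letter is the filter applied along
    each row, the second the filter applied down each column, with
    l = low-pass (average) and h = high-pass (difference).
    """
    row_low, row_high = [], []
    for row in image:
        low, high = haar_step(row)
        row_low.append(low)
        row_high.append(high)
    ll, lh = filter_columns(row_low)
    hl, hh = filter_columns(row_high)
    return ll, lh, hl, hh


# Toy 4x4 patch standing in for a CT region of interest (made-up values).
patch = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [2, 4, 0, 0],
    [6, 8, 0, 0],
]
ll, lh, hl, hh = haar_dwt2(patch)
```

Here `ll` holds the mean of each 2x2 block, a half-resolution version of the patch, while `lh`, `hl`, and `hh` are nonzero only where intensity changes within a block. In practice a library such as PyWavelets would normally be used instead of hand-written loops.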
Number of Pages
Masquelin, Axel Herve, "Leveraging Domain Specific Knowledge to classify Indeterminate Lung Nodules in CT images using Machine Learning Methodologies" (2023). Graduate College Dissertations and Theses. 1756.
Available for download on Wednesday, September 11, 2024