Abstract

This thesis is divided into two parts. The first part addresses non-redundant clustering and feature selection for high-dimensional data. The second part applies learning techniques to image-guided radiotherapy of lung tumors.

In the first part, we investigate a new clustering paradigm for exploratory data analysis: finding all non-redundant clustering views of the data, where data points that share a cluster in one view can belong to different clusters in other views. Typical clustering algorithms output a single clustering of the data. However, in real-world applications, data can have different groupings that are reasonable and interesting from different perspectives. This is especially true for high-dimensional data, where different feature subspaces may reveal different structures of the data. We present a framework to solve this problem and suggest two approaches: (1) orthogonal clustering, and (2) clustering in orthogonal subspaces. The idea of removing redundancy between clustering solutions was inspired by our preliminary work on solving the feature selection problem via transformation methods. In particular, we developed a feature selection method based on a popular transformation approach, principal component analysis (PCA). PCA is a dimensionality reduction algorithm that does not explicitly indicate which variables are important. We designed a method that uses the PCA result to select the original features that are most correlated with the principal components and, through orthogonalization, as uncorrelated with each other as possible. We show that, as a consequence of orthogonalization, our feature selection method preserves the special property of PCA that the retained variance can be expressed as the sum of the variances of the orthogonal features that are kept.

In the second part, we design machine learning algorithms to aid lung tumor image-guided radiotherapy (IGRT). Precise real-time target localization is particularly important for gated radiotherapy. However, it is difficult to gate or track lung tumors because of the uncertainties of external surrogates and the risk of pneumothorax from implanted fiducial markers. We investigate algorithms for gating and for directly tracking the tumor. For gated radiotherapy, a previous approach uses template matching to localize the tumor position. Here, we investigate two ways to improve the precision of tumor target localization: (1) an ensemble of templates, where the representative templates are selected by Gaussian mixture clustering, and (2) a support vector machine (SVM) classifier with radial basis kernels. Template matching only considers images inside the gating window, but images outside the gating window might provide additional information. We take advantage of both states and recast the gating problem as a classification problem. For the tracking problem, we explore a multiple-template matching method to capture the varying tumor appearance throughout the different phases of the breathing cycle.
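
As a rough illustration of the ensemble-of-templates idea for gating, the sketch below scores an incoming image against several templates and turns the beam on when the best score is high enough. It is a minimal sketch under stated assumptions, not the method developed in the thesis: the similarity score (normalized cross-correlation), the threshold value, the function names `ncc` and `gate_on`, and the toy data are all hypothetical, and the templates are assumed to be given already rather than selected by Gaussian mixture clustering.

```python
import math


def ncc(a, b):
    """Normalized cross-correlation of two equal-length intensity vectors.

    Assumption: images are flattened to lists of floats; the abstract does
    not state which similarity score is used.
    """
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    # A constant image has zero variance; treat it as uncorrelated.
    return num / den if den > 0 else 0.0


def gate_on(image, templates, threshold=0.9):
    """Return True if the image matches at least one template well enough.

    The threshold of 0.9 is an illustrative value, not taken from the thesis.
    """
    best = max(ncc(image, t) for t in templates)
    return best >= threshold


# Toy data: flattened 2x2 "images" with made-up intensities.
templates = [[1.0, 2.0, 3.0, 4.0], [1.0, 3.0, 2.0, 4.0]]
inside = [2.0, 4.0, 6.0, 8.0]   # brighter copy of the first template
outside = [4.0, 3.0, 2.0, 1.0]  # reversed intensity pattern

print(gate_on(inside, templates))
print(gate_on(outside, templates))
```

Taking the maximum over templates lets any single representative appearance trigger the gate, which is one simple way to combine an ensemble. The classification view described above would instead train on labeled images from both inside and outside the gating window.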

