Subspace Segmentation and High-Dimensional Data Analysis

Sekmen, Ali Safak

Persistent Link: https://etd.library.vanderbilt.edu/etd-04022012-093720
http://hdl.handle.net/1803/11935

Date: 2012-04-18

Abstract

This thesis develops theory and associated algorithms for the subspace segmentation problem. Given a set of data W = {w_1, ..., w_N} in R^D drawn from a union of subspaces, we focus on determining a nonlinear model of the form U = {S_i}_{i in I}, where each S_i is a subspace, that is nearest to W. The model is then used to classify W into clusters. Our first approach is based on the binary reduced row echelon form of the data matrix. We prove that, in the absence of noise, this approach finds the number of subspaces, their dimensions, and an orthonormal basis for each subspace S_i. We provide a comprehensive analysis of the theory and determine its limitations and strengths in the presence of outliers and noise. Our second approach is based on nearness to local subspaces. It handles noise effectively, but it applies only to a special case of the general subspace segmentation problem: subspaces of equal and known dimension. This approach computes a binary similarity matrix for the data points. A local subspace is first estimated for each data point. Then, a distance matrix is generated by computing the distances between the local subspaces and the points. The distance matrix is converted to the similarity matrix by applying a data-driven threshold. The problem is thereby transformed into segmentation of subspaces of dimension 1 instead of subspaces of dimension d. The algorithm was applied to the Hopkins 155 dataset and generated the best results to date.
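
The second approach described in the abstract (local subspace estimation, distance matrix, thresholding) can be illustrated with a minimal sketch. This is not the thesis's implementation. The function name local_subspace_similarity, the neighbourhood size k, the choice of neighbours by absolute cosine similarity, the symmetrization step, and the largest-gap threshold rule are all assumptions made here for illustration; the abstract states only that the threshold is data-driven. The final clustering of the similarity matrix is left out.

```python
import numpy as np

def local_subspace_similarity(W, d, k, threshold=None):
    """Binary similarity matrix for the columns of a D x N data matrix W.

    d is the (known, common) subspace dimension; k is the number of
    neighbours used to estimate each local subspace (hypothetical parameter).
    Columns of W are assumed to be nonzero.
    """
    N = W.shape[1]
    # Assumption: normalize the points and pick neighbours by |cosine|.
    X = W / np.linalg.norm(W, axis=0, keepdims=True)
    G = np.abs(X.T @ X)

    dist = np.zeros((N, N))
    for i in range(N):
        # Point i together with its k most similar points.
        nbrs = np.argsort(-G[i])[:k + 1]
        # Local subspace: span of the top-d left singular vectors.
        U, _, _ = np.linalg.svd(X[:, nbrs], full_matrices=False)
        B = U[:, :d]
        # dist[i, j]: distance from point j to the local subspace of point i.
        residual = X - B @ (B.T @ X)
        dist[i] = np.linalg.norm(residual, axis=0)

    # Assumption: symmetrize by taking the larger of the two distances.
    dist = np.maximum(dist, dist.T)

    if threshold is None:
        # Placeholder data-driven rule: midpoint of the largest gap
        # in the sorted distances (not the rule used in the thesis).
        vals = np.sort(dist.ravel())
        j = np.argmax(np.diff(vals))
        threshold = 0.5 * (vals[j] + vals[j + 1])

    return (dist <= threshold).astype(int)
```

For noise-free points on well-separated subspaces, the returned matrix is expected to have ones for pairs of points from the same subspace and zeros otherwise, so that each row acts as an indicator of cluster membership.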

Files in this item

Name: Sekmen_PhD_Dissertation.pdf
Size: 1.518 MB
Format: PDF

This item appears in the following collection(s):

Electronic Theses and Dissertations