Algorithmic Enhancements to Data Colocation Grid Frameworks for Big Data Medical Image Processing

Bao, Shunxing

dc.creator	Bao, Shunxing
dc.date.accessioned	2020-08-23T15:51:11Z
dc.date.available	2018-11-27
dc.date.issued	2018-11-27
dc.identifier.uri	https://etd.library.vanderbilt.edu/etd-11222018-050053
dc.identifier.uri	http://hdl.handle.net/1803/14739
dc.description.abstract	Large-scale medical imaging studies to date have predominantly relied on in-house, laboratory-based, or traditional grid computing resources, where applications often use hierarchical data structures (e.g., Network File System file stores) or databases (e.g., COINS, XNAT) for storage and retrieval. For laboratory-based approaches, performance is impeded by standard network switches, since typical processing can saturate network bandwidth during transfer from storage to processing nodes even for moderate-sized studies. The grid, on the other hand, may be costly to use because of the dedicated resources required to execute tasks and its lack of elasticity. With the increasing availability of cloud-based big data frameworks, such as Apache Hadoop, cloud-based services for executing medical imaging studies have shown promise. Despite this promise, our studies have revealed that existing big data frameworks exhibit various performance limitations for medical imaging applications, which calls for new algorithms that optimize their performance and suitability for medical imaging. For instance, Apache HBase’s data distribution strategy of region split and merge is detrimental to the hierarchical organization of imaging data (e.g., project, subject, session, scan, slice). Big data medical image processing applications involving multi-stage analysis often exhibit significant variability in processing times, ranging from a few seconds to several days. Because traditional software technologies and platforms execute the analysis stages sequentially, errors in the pipeline are detected only at the later stages, even though they predominantly originate in the highly compute-intensive first stage. This wastes precious computing resources and incurs prohibitively high costs for re-executing the application.
To address these challenges, this research proposes a framework, Hadoop & HBase for Medical Image Processing (HadoopBase-MIP), which develops a range of performance optimization algorithms and employs system behavior modeling for data storage, data access, and data processing. We also describe how to build prototypes that support empirical verification of system behavior. Furthermore, we report a discovery made during the development of HadoopBase-MIP: a new type of contrast for enhancing deep brain structures in medical imaging. Finally, we show how to carry the Hadoop-based framework design forward into a commercialized big data / high-performance computing cluster with a low-cost, scalable, and geographically distributed file system.
dc.format.mimetype	application/pdf
dc.subject	cloud computing
dc.subject	grid computing
dc.subject	medical image processing
dc.subject	Apache Hadoop ecosystem
dc.subject	Big data infrastructure
dc.title	Algorithmic Enhancements to Data Colocation Grid Frameworks for Big Data Medical Image Processing
dc.type	dissertation
dc.contributor.committeeMember	Douglas C. Schmidt
dc.contributor.committeeMember	Alan Tackett
dc.contributor.committeeMember	Hongyang Sun
dc.type.material	text
thesis.degree.name	PHD
thesis.degree.level	dissertation
thesis.degree.discipline	Computer Science
thesis.degree.grantor	Vanderbilt University
local.embargo.terms	2018-11-27
local.embargo.lift	2018-11-27
dc.contributor.committeeChair	Bennett A. Landman
dc.contributor.committeeChair	Aniruddha Gokhale

Files in this item

Name: bao.pdf
Size: 8.811Mb
Format: PDF

This item appears in the following Collection(s)

Electronic Theses and Dissertations
Electronic theses and dissertations of masters and doctoral students submitted to the Graduate School.
