Large Scale Data Management for Enterprise Workloads
The continued proliferation of mobile devices, social media platforms, gaming consoles, and similar technologies, combined with an ever-increasing online user population, has resulted in a data deluge. Traditional data management solutions are inadequate to meet the storage and processing challenges posed by this large and complex data. Although several large-scale data management solutions have been proposed recently, data architects still face challenges in scaling these systems for enterprise workloads. To this end, we propose novel mechanisms to enable scalable data management for each enterprise workload class. For online transaction processing workloads, we have developed mechanisms for scalable transaction processing in relational and distributed databases. For relational databases, we present a simple mechanism to enable robust performance profiling of cloud-hosted databases. For distributed databases, we present the design of the Synergy system, which leverages materialized views and lightweight concurrency control on top of a NoSQL database to provide scalable data management with familiar relational conventions and more robust query expressiveness. For online analytical processing workloads, we empirically evaluate SQL-on-Hadoop and SQL-on-Object-Storage systems and illustrate their performance characteristics. For SQL-on-Hadoop systems, we demonstrate the size-up behavior, scale-up behavior, optimizer attributes, execution engine efficiency, and impact of file formats. For SQL-on-Object-Storage systems, we implement block range indexes and evaluate their performance impact. The knowledge gained from this thesis will enable data architects to address the scaling challenges posed by enterprise workloads.