CSR: System and Middleware Approaches to Predictable Services in Multi-Tenant Clouds (NSF CNS-1649502)

Project description and goals

Datacenter-based cloud services exhibit unpredictable performance variations due to multi-tenant interference and the heterogeneity of datacenter hardware. The investigators attribute this performance unpredictability to the absence of two important service guarantees from existing cloud providers: resource capacity and application agility. To provide guaranteed resource capacity and enhanced application agility, this project develops independent but complementary approaches at the system and middleware levels. These approaches reduce performance variations of in-cloud applications without compromising other objectives such as high datacenter utilization and good average performance. The deliverables are new system support in cloud resource management that accounts for interference and hardware heterogeneity in shared infrastructures, and middleware approaches that perform agile, non-invasive, and application-centric resource provisioning. The research methodology combines architectural knowledge of the complex interplay among simultaneous multithreading, multicore, and non-uniform memory access (NUMA) architectures with statistical learning algorithms to quantify interference and heterogeneity. It also integrates self-optimizing learning and control techniques to automate resource provisioning under dynamic workloads. The project broadens its impact by exploring interdisciplinary techniques in computer system design and by enhancing cloud services with predictability guarantees. Its success will guide resource management and metering in future cloud systems.

Participants

  • Dr. Jia Rao, Principal investigator

  • Dr. Xiaobo Zhou, Co-Principal investigator

  • Kun Suo, Ph.D. student, 2013-2017

  • Yong Zhao, Ph.D. student, 2014-2017

  • Xiaofeng Wu, Ph.D. student, 2017

  • Anthony Ayodele, Ph.D. student, 2013-2016

  • Sawyer Peterson, REU student, 2014-2015

  • Kevin Zarkovacki, REU student, 2014-2015

  • Khanh Nguyen, REU student, 2016-2017

  • Mason Moreland, REU student, 2016-2017

  • Scott Laue, REU student, 2017

Project-sponsored Publications

  • Characterizing and Optimizing Hotspot Parallel Garbage Collection on Multicore Systems
    Kun Suo, Jia Rao, Hong Jiang, and Witawas Srisa-an.
    To appear in the European Conference on Computer Systems (EuroSys), 2018

  • An Analysis and Empirical Study of Container Networks
    Kun Suo, Yong Zhao, Wei Chen, and Jia Rao.
    To appear in the IEEE International Conference on Computer Communications (INFOCOM), 2018

  • Scheduler Activations for Interference-resilient SMP Virtual Machine Scheduling
    Yong Zhao, Kun Suo, Luwei Cheng, and Jia Rao.
    In Proceedings of the ACM/IFIP/USENIX Conference on Middleware (Middleware), 2017

  • Preserving I/O Prioritization in Virtualized OSes
    Kun Suo, Yong Zhao, Jia Rao, Luwei Cheng, Xiaobo Zhou, and Francis C.M. Lau.
    In Proceedings of the Symposium on Cloud Computing (SoCC), 2017

  • Preemptive, Low Latency Datacenter Scheduling via Lightweight Virtualization
    Wei Chen, Jia Rao, and Xiaobo Zhou.
    In Proceedings of the USENIX Annual Technical Conference (ATC), 2017

  • Characterizing and Optimizing the Performance of Multithreaded Programs Under Interference
    Yong Zhao, Jia Rao, and Qing Yi.
    In Proceedings of the 25th International Conference on Parallel Architecture and Compilation Techniques (PACT), 2016

  • Time Capsule: Tracing Packet Latency across Different Layers in Virtualized Systems
    Kun Suo, Jia Rao, Luwei Cheng, and Francis C.M. Lau.
    In Proceedings of the 7th ACM SIGOPS Asia-Pacific Workshop on Systems (APSys), 2016
    Best paper award (2 out of 52 submissions)

  • vScale: Automatic and Efficient Processor Scaling for SMP Virtual Machines
    Luwei Cheng, Jia Rao, and Francis C.M. Lau.
    In Proceedings of the European Conference on Computer Systems (EuroSys), 2016

  • Resource and Deadline-aware Job Scheduling in Dynamic Hadoop Clusters
    Dazhao Cheng, Jia Rao, Changjun Jiang, and Xiaobo Zhou.
    In Proceedings of the IEEE International Parallel & Distributed Processing Symposium (IPDPS), 2015

  • StoreApp: A Shared Storage Appliance for Efficient and Scalable Virtualized Hadoop Clusters
    Yanfei Guo, Jia Rao, Dazhao Cheng, Changjun Jiang, Cheng-Zhong Xu, and Xiaobo Zhou.
    In Proceedings of the 34th IEEE Conference on Computer Communications (INFOCOM), 2015

  • Co-tenancy Interference Measurement and Performance Anomaly Detection in a Multi-tenant Cloud Computing Environment
    Anthony Ayodele, Terrance Boult, and Jia Rao.
    In Proceedings of the IEEE International Conference on Cloud Computing (CLOUD), 2015

  • Understanding Parallel Performance Under Interferences in Multi-tenant Clouds
    Yong Zhao, Jia Rao, Xiaobo Zhou, and Qing Yi.
    In Proceedings of the International Conference on Measurement and Modeling of Computer Systems (SIGMETRICS), Poster, 2015

  • Improving MapReduce Performance in Heterogeneous Environments with Adaptive Task Tuning
    Dazhao Cheng, Jia Rao, Yanfei Guo, and Xiaobo Zhou.
    In Proceedings of the ACM/IFIP/USENIX International Conference on Middleware (Middleware), 2014

  • Moving Hadoop into the Cloud with Flexible Slots
    Yanfei Guo, Jia Rao, Changjun Jiang, and Xiaobo Zhou.
    In Proceedings of the ACM/IEEE International Conference for High Performance Computing, Networking, Storage, and Analysis (SC), 2014

  • Towards Fair and Efficient SMP Virtual Machine Scheduling
    Jia Rao and Xiaobo Zhou.
    In Proceedings of the ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP), 2014

  • User-Centric Heterogeneity-Aware MapReduce Job Provisioning in the Public Cloud
    Eric Pettijohn, Yanfei Guo, Palden Lama, and Xiaobo Zhou.
    In Proceedings of the USENIX International Conference on Autonomic Computing (ICAC), 2014

Software release