Project Proposal in Cloud Computing - Research Methods
The project proposal must be on the topic: carbon footprint reduction in cloud data centers using natural direct-air free cooling.
At least 20 references are needed. Citations and references should be in IEEE style.
The project proposal document should be written in a LaTeX editor such as Overleaf. The generated PDF should be shared along with the LaTeX code. A PPT of the project proposal is also needed.
The required structure of the paper is attached, along with a couple of previous samples for reference.
Optimizing Docker container performance based on load metrics
over the Cloud
Rohan Dsouza x18139540
MSc in Cloud Computing
29th July 2019
Abstract
Virtualization technologies have advanced significantly over time. After the era of virtual machines, infrastructure shifted towards containers. Containers now dominate the virtual space, but managing them in a cluster is tedious as it frequently requires manual intervention. Container orchestration tools (Kubernetes, Docker Swarm, etc.) address this by automating jobs such as auto-scaling, load balancing and adding network overlays. However, they introduce a non-negligible, persistent overhead on the infrastructure. This research aims to solve the software overhead problem in containerized setups by eliminating the need for orchestration tools in the first place. While these tools fundamentally load-balance and auto-scale containers, this research proposes an algorithm that does the same without any orchestration tool. Since application load balancers would add to the overhead, a network-layer load balancer will be used to distribute incoming traffic.
Keywords: Virtualization, Containers, Container Orchestration tools, Kubernetes, Docker Swarm, Software Overhead, Performance overhead, Auto-scaling containers, Load-balancing containers
Contents
1 Introduction
2 Research Question
3 Literature Review
  3.1 Virtual Machines v/s Containers
  3.2 Software overhead by Orchestration tools
  3.3 Auto Scaling and Load balancing
  3.4 Summary
4 Research method and specification
  4.1 Proposed methodology
  4.2 Proposed Implementation
  4.3 Proposed Evaluation
5 Proposed Plan
6 Conclusion
1 Introduction
Over the last decade, containerization and its associated orchestration tools have come under significant limelight. Containers have successfully replaced conventional virtual machines (VMs) in many production environments because of their lightweight architecture and performance improvements [1] [2].
Of the several containerization platforms on the market, Docker is today the most broadly used. Because Docker out of the box did not support auto-scaling and load balancing, using it in clustered environments was tedious, as manual intervention was needed almost every time. After Google launched its open-source Kubernetes platform in 2014 (originally built for in-house use) and Docker released its own orchestration platform named “Swarm”, container orchestration (CO) tools built on Docker containers became a mandate in clustered environments. Kubernetes is currently the most broadly used CO tool [3]. However, CO tools like Kubernetes (K8s) and Swarm cause a significant amount of software overhead because of their predefined pods and services (running as Docker containers themselves), external network overlays, etc., and are hence not an optimal solution for small-scale setups [4] [5].
The main objective of this research is to propose an algorithm that optimizes the load among containers by auto-scaling and load-balancing incoming traffic without any container orchestration tool like Kubernetes, thus eliminating that software overhead. The algorithm will use load metrics such as CPU usage, memory usage and disk I/O as thresholds for scaling. Once a threshold is exceeded, containers will be auto-scaled and incoming traffic load-balanced among them. Once the incoming traffic (and with it the load metrics) drops back below the threshold, the excess auto-scaled containers will be destroyed. As we will be using Linux systems, our code will be written in the Bourne-again shell (Bash) native to GNU/Linux. We will use the Netfilter module of the Linux kernel for load balancing, as it performs better than external third-party packet filtering tools and because all packets entering a Linux system pass through that module anyway [6]. We will also use the native Docker APIs shipped with the Docker engine to fetch the load metrics.
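Since the Docker Engine API exposes raw CPU counters rather than a ready-made percentage, the listener will have to derive the percentage itself. The Bash sketch below shows that conversion (mirroring how the docker stats CLI computes its CPU column from two consecutive samples of the stats endpoint); the function name and sample values are ours, for illustration only:

```shell
#!/usr/bin/env bash
# Convert two consecutive samples of the Docker stats endpoint into a CPU
# percentage: cpu_delta is the change in the container's total CPU time,
# system_delta the change in host CPU time, ncpus the online CPU count.
cpu_percent() {
  awk -v cd="$1" -v sd="$2" -v n="$3" \
      'BEGIN { printf "%.1f\n", cd / sd * n * 100 }'
}

cpu_percent 50000000 1000000000 2   # prints: 10.0
```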
The remainder of this paper is organized as follows: the state of the art, the proposed approach, and the proposed implementation plan and schedule. Finally, we draw a conclusion for the entire proposal and outline future work that is unlikely to be completed within this project.
2 Research Question
• How can Docker containers auto-scale and load-balance without any orchestration tools, to reduce software overhead?
• To what extent can this influence performance?
The main aim of this research is to reduce the software overhead of containerized setups by proposing an algorithm to auto-scale and load-balance containers without the use of any container orchestration tool like Kubernetes or Docker Swarm, and without any third-party application load balancer. The results will be obtained from a performance evaluation comparing a Docker-only setup, a CO setup and a VM setup.
3 Literature Review
To better understand this research proposal and give some context on related work, we have broken the literature review into specific sections, each with a literature survey of its own.
3.1 Virtual Machines v/s Containers
Docker is an open-source operating-system-level virtualization platform written in Go. It virtualizes only a specific application using features of the existing Unix/Linux kernel, whereas a conventional virtual machine virtualizes a complete operating system and then runs applications on the guest operating system, as seen in Figure 1 [7].
Figure 1: Virtual Machines v/s Containers architecture. Source: [8]
Figure 1 shows the additional overhead created by the guest operating system as opposed to the Docker architecture. While the performance overhead is obvious, the performance of a Type 2 hypervisor and a Docker image running a database was reported to be the same [1]. However, the testbed for that study consisted of a single server; it should have included several servers with different hardware architectures. Furthermore, beyond the performance of a single service, the software isolation of tenants and the benchmarking of web servers and key-value stores were also evaluated, and Docker containers had the smallest memory footprint compared to native virtual machines [2].
Another study on big data with Apache Spark illustrated that Docker is more flexible, offers faster boot-up, better auto-scaling and greater flexibility for system administrators deploying their infrastructure [9]. In a thorough study using several benchmarking tools, the performance of Docker in disk I/O reads and writes, RAM reads and writes and CPU-intensive tasks surpassed that of virtual machines [10].
3.2 Software overhead by Orchestration tools
Orchestration tools like Docker Swarm and Kubernetes (K8s) have a lot of moving parts. Figure 2 shows the most basic setup of a Kubernetes cluster, involving just 3 nodes.
It illustrates that quite a few pods and services run out of the box compared to a plain installation of the Docker engine. Furthermore, a typical Kubernetes cluster requires at least 3 nodes. These pods and services are a prerequisite for Kubernetes to function and hence cannot be stopped. The prominent open-source Yahoo! Cloud Serving Benchmark (YCSB) tool was used to benchmark the CO tools, and it found
Figure 2: Kubernetes (K8’s) basic architecture. Source: [11]
that Docker Swarm and Kubernetes used a significant amount of memory compared to a Docker-only setup. Furthermore, since Kubernetes must use an overlay network such as Flannel or Calico for containers on different nodes to communicate, the network carried a significant amount of noise. Because of the software interrupts caused by Kubernetes, CPU usage was also substantial, creating a performance bottleneck compared to a Docker-only setup [4].
Another study on managing and deploying NoSQL databases stated that features of these container orchestration tools, such as node failure detection, volume management and task scheduling, themselves add overhead. Again, the YCSB tool was used for benchmarking, and it was found that read queries incurred little overhead from the CO tools, whereas insert and write queries showed a remarkable amount of overhead [5]. The performance of a particular Kubernetes network plugin that could make full use of hardware-assisted features was also monitored, and even so it had 30 percent additional overhead compared to a Docker-only installation [12]. Furthermore, one study noticed a substantial amount of software overhead during the scaling and deployment phases of pods in Kubernetes, possibly due to synchronization between different host systems [13].
3.3 Auto Scaling and Load balancing
An algorithm was previously proposed for auto-scaling and load balancing of containers. However, the main objective of that research was power consumption: when the load on the containers increased beyond the set threshold, new containers would start, and when the load dropped back below the threshold, the containers were not destroyed but put into a sleep state. As the primary focus was energy efficiency, the load balancing was not done accurately, as seen in the graphs presented in the paper [14]. Since the goal of our research is not energy efficiency but reducing overall software overhead, we will
destroy the containers once the load reduces. While we can use a slightly modified version of this algorithm for auto-scaling our containers, we will need a significantly different strategy for load balancing.
Figure 3: Kubernetes (K8’s) complete network flow architecture. Source: [15]
Figure 3 demonstrates the complex network architecture of the Kubernetes orchestration tool using Flannel, a simple third-party open-source overlay network. Docker containers are assigned a default NAT subnet of 172.17.0.0/16; with this subnet, containers can only communicate with other containers on the same machine [16]. Hence, Flannel creates a virtual overlay network on a subnet of your choice (e.g. 10.1.0.0/24) through which pods and containers can communicate even across multiple hosts. Kubernetes uses this service for load balancing, which therefore happens at the application layer and causes software overhead [17]. We will propose a solution for load balancing at the network layer, reducing the performance overhead discussed in Section 3.2, as application load balancers in general tend to cause overhead [18] [4] [5]. An algorithm to reduce response time between grids in a distributed computing environment was also proposed; however, though it increased performance for a distributed setup, no tests were done for a non-distributed setup [19].
Existing research in load balancing gives a good understanding of the load balancing algorithms, models and techniques that improve performance. The performance of each algorithm is checked against various metrics such as resource utilization and overhead to give a better understanding of the study [18]. A further experiment manipulated packets entering a system, comparing two open-source packet filtering tools: Apache with its Modfilter module, working at the application layer, and iptables, shipped with every Linux distribution and working at the network layer. The end result was that for most parts of the experiment iptables beat Apache, as it works at the network layer alongside the Linux kernel
[6]. Two popular open-source application web servers were used to load-balance traffic among containers when the difference in memory utilization among the containers was significant. Although the load was balanced equally among the containers, the overhead caused by the third-party web servers was not considered in the final results [20]. To summarize this section, our proposal will perform load balancing at the network layer as opposed to the application layer.
3.4 Summary
Reference & Citation | Context of the paper | Description
Y. Jin-Gang et al. [7] | Auto-scaling mechanism for Unified-Communication servers based on Docker | Proposal of an algorithm for Kubernetes that can predict the workload in advance as opposed to calculating it in real time.
G. Luo et al. [8] | Using Docker in an EDA environment | Proposal of a containerised setup to increase efficiency in an EDA environment.
W. Felter et al. [1] | Virtual machines v/s containers | Evaluation of the performance of a MySQL DB on containers and virtual machines.
M. Plauth et al. [2] | Containers v/s unikernels | Benchmarking of the performance of Docker and the KVM hypervisor.
Q. Zhang et al. [9] | Containers v/s VMs in a big data environment | Running Apache Spark jobs in Docker and a VM setup and evaluating performance based on multiple factors.
J. Shetty et al. [10] | Performance evaluation of VMs, containers and bare-metal servers | Performance evaluation considering several metrics and benchmarking tools for an OpenStack VM, a Docker container and bare-metal servers.
E. Truyen et al. [4] | Performance evaluation of CO tools, Docker and VMs | Performance evaluation of a database cluster deployment using a Docker-only setup, a VM setup, a Kubernetes setup and a Swarm setup.
E. Truyen et al. [5] | Performance evaluation of various CO tools and VMs | Performance evaluation of Swarm, Kubernetes, Mesos and an OpenStack VM to deploy and manage a MongoDB cluster.
D. Géhberger et al. [12] | Measuring overhead of a CO tool in networking | Measuring the network overhead created by Kubernetes under different types of network workloads.
V. Medel et al. [13] | Benchmarking Kubernetes | Benchmarking Kubernetes for various workloads, including CPU-, I/O- and network-intensive workloads.
M. Sureshkumar et al. [14] | Algorithm to auto-scale containers | An algorithm that auto-scales containers based on system metrics.
Y. Park et al. [17] | Classification of network topology in CO tools | Overview of the network architecture of a Kubernetes setup, with a performance graph.
B. Deepak et al. [18] | In-depth overview of load balancing | Classification of load balancing algorithms, metrics used, comparison of existing techniques, etc.
C. Wang et al. [6] | Packet filtering analysis using firewalls | Thorough research into the performance and drawbacks of two open-source packet filtering tools (Apache with its Modfilter module, and iptables) using different case studies.
J. C. Patni et al. [19] | Load balancing algorithm for grid computing | Load balancing of traffic in different types of heterogeneous grid setups to decrease response time and communication costs between nodes.
M. R. M. Bella et al. [20] | Load balancing of web traffic using a CO tool | Load balancing of web traffic using third-party open-source web servers and evaluating performance.
Table 1: A summarized table for the entire Literature review
The entire literature review for this proposal is summarized in Table 1, giving the context of each paper and a brief description. Due references for all related work are given to avoid plagiarism.
4 Research method and specification
We shall be breaking this section into the following parts to further explain the research proposal:
1. Proposed methodology
2. Proposed Implementation
3. Proposed Evaluation
4.1 Proposed methodology
Since we will not be using any CO tools, we will have to create listener code that monitors load metrics such as CPU usage, memory usage and disk I/O for each container, then auto-scales and load-balances the incoming traffic. Algorithm 1 below shows the proposed algorithm.
Algorithm 1: Proposed algorithm for auto-scaling and load balancing of containers [14]

Prerequisites: current CPU utilization, CPU; current memory utilization, MEM; total no. of containers, N; max. no. of containers, MAX; threshold for load metrics, t

 1  if CPU > t && MEM > t then
 2      auto-scale containers up to MAX and load-balance incoming traffic
        equally among the scaled containers       // via Docker API calls
 3      keep serving traffic until the load metrics drop below the threshold
 4  else if CPU < t && MEM < t then
 5      if N > 1 then
 6          destroy the excess auto-scaled containers and remove them from
            the load-balancing backends           // via Docker API calls and iptables
 7      else
 8          exit                                  // break
 9  else
10      exit                                      // break
A new container will be spawned by making a Docker API call, as opposed to an RMI service call. Furthermore, once CPU and memory usage fall below the threshold, the excess containers will be killed rather than put into a sleep state, as the goal is to reduce software overhead as much as possible. Load balancing will happen at the network layer as opposed to the application layer.
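Algorithm 1 can be sketched as a short Bash listener, the language this proposal targets. This is a minimal illustration under stated assumptions, not the final implementation: the container name prefix web-, the 75% threshold, the MAX of 4 and the Nginx image are all placeholders, and the decision logic is kept in a pure function so it can be exercised without a running Docker daemon.

```shell
#!/usr/bin/env bash
THRESHOLD=75   # load-metric threshold t, in percent (assumed value)
MAX=4          # maximum number of containers (assumed value)

# Pure decision function mirroring Algorithm 1: compares integer CPU and
# memory percentages against the threshold and the current container count.
decide_action() {
  local cpu=$1 mem=$2 n=$3
  if [ "$cpu" -gt "$THRESHOLD" ] && [ "$mem" -gt "$THRESHOLD" ] && [ "$n" -lt "$MAX" ]; then
    echo "scale-up"
  elif [ "$cpu" -lt "$THRESHOLD" ] && [ "$mem" -lt "$THRESHOLD" ] && [ "$n" -gt 1 ]; then
    echo "scale-down"
  else
    echo "hold"
  fi
}

# One iteration of the cron-driven loop: fetch metrics through the Docker
# CLI (a thin wrapper over the Docker Engine API) and act on the decision.
tick() {
  local n cpu mem
  n=$(docker ps -q --filter "name=web-" | wc -l)
  set -- $(docker stats --no-stream \
             --format '{{.CPUPerc}} {{.MemPerc}}' web-1 | tr -d '%')
  cpu=${1%.*}; mem=${2%.*}
  case "$(decide_action "$cpu" "$mem" "$n")" in
    scale-up)   docker run -d --name "web-$((n + 1))" nginx ;;
    scale-down) docker rm -f "web-$n" ;;
  esac
}
```

Note that the sketch samples only one container and makes explicit the n < MAX guard that Algorithm 1 states in prose ("up to MAX"); updating the iptables backends on each action belongs to the implementation described in Section 4.2.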
Figure 4 illustrates CPU load and memory consumption going beyond the threshold, causing 3 additional containers to spawn and serve the inbound traffic. Once the containers are spawned, the traffic is distributed equally among them.
Figure 4: Proposed architecture in the Amazon Web Services (AWS) public cloud
4.2 Proposed Implementation
For the implementation setup, we shall use the Amazon Web Services (AWS) cloud platform for demonstration, specifically its Elastic Compute Cloud (EC2) service along with the Elastic Block Store (EBS) service. Our instance will have the following specifications:
• Region: Ireland
• Operating System: CentOS 7 64 bit
• Instance Type: t2.micro (Free tier)
• CPU: 1 VCPU
• RAM: 1 GiB
• Storage: 30 GiB
We will use the Docker Community Edition (CE), as it is free and open source. We will use the open-source Nginx web server inside the containers we benchmark. Furthermore, we will use iptables, a utility for writing firewall rules against the Netfilter module of the Linux kernel, for load balancing the incoming traffic. Last but not least, the code to auto-scale and load-balance traffic in real time will run as a Bash shell script scheduled as a cron job.
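The Netfilter-based distribution described above can be achieved with the statistic match of iptables in nth mode, which deterministically round-robins matching packets. The sketch below only prints the rules, so it can run without root and the rule pattern stays visible; the backend container IPs are illustrative and would come from the Docker API at runtime:

```shell
#!/usr/bin/env bash
# Print DNAT rules that round-robin inbound TCP port 80 across the given
# backend container IPs. Rule i matches every (n - i + 1)-th packet of the
# traffic that reaches it, so each backend ends up with a 1/n share; the
# last rule catches everything that fell through the earlier matches.
emit_lb_rules() {
  n=$#; i=0
  for ip in "$@"; do
    i=$((i + 1))
    if [ "$i" -lt "$n" ]; then
      echo "iptables -t nat -A PREROUTING -p tcp --dport 80" \
           "-m statistic --mode nth --every $((n - i + 1)) --packet 0" \
           "-j DNAT --to-destination ${ip}:80"
    else
      echo "iptables -t nat -A PREROUTING -p tcp --dport 80" \
           "-j DNAT --to-destination ${ip}:80"
    fi
  done
}

emit_lb_rules 172.17.0.2 172.17.0.3 172.17.0.4
```

On a real host the printed commands would be executed as root; scaling down means deleting the matching rules before destroying a container.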
The flow diagram for the entire algorithm is shown in Figure 5. A diamond indicates a condition and the decision made on it, while a rectangle means computation occurs. The code will be present on local storage and will run with root privileges to prevent any user permission issues.
Figure 5: Flow Chart of the algorithm without the use of any Container Orchestration tools.
4.3 Proposed Evaluation
For benchmarking the efficiency of the algorithm, we will use the check_http plugin from Nagios, which is used for server monitoring. We will also use the open-source Ganglia tool to obtain graphs of HTTP GET requests. The same will be done on the Kubernetes setup and the KVM setup, and the performance will be evaluated. Finally, Linux command-line utilities like ps, top and iotop will be used to calculate the CPU usage, memory usage and disk I/O wait time of each container.
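For the per-container metrics, snapshots can be collected with docker stats --no-stream --format '{{.Name}},{{.CPUPerc}},{{.MemPerc}}' and averaged afterwards. A small helper for that aggregation step is sketched below; the function name and the CSV sample are fabricated for illustration:

```shell
#!/usr/bin/env bash
# Average a docker-stats CSV snapshot (name,cpu%,mem%) over all containers.
summarize() {
  awk -F, '{ gsub(/%/, ""); c += $2; m += $3; n++ } END { printf "avg_cpu=%.1f avg_mem=%.1f containers=%d\n", c / n, m / n, n }'
}

printf 'web-1,80.0%%,60.0%%\nweb-2,40.0%%,20.0%%\n' | summarize
# prints: avg_cpu=60.0 avg_mem=40.0 containers=2
```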
5 Proposed Plan
Figure 6: Timeline for the proposed Research project
Figure 6 illustrates the timeline for the proposed plan; the dates mentioned are tentative. Gantt charts are widely used in project management and in planning the software development life cycle (SDLC), as they give a broad picture of the status of each task over time. Figure 7 shows a Gantt chart plotted from the proposed tentative timeline. It will be developed further next semester after discussion with my research mentor.
Figure 7: Gantt Chart for the proposed Research project
6 Conclusion
This research proposal aims to solve the additional performance overhead incurred when using container orchestration tools like Kubernetes and Docker Swarm. The proposed algorithm is expected to reduce this overhead by being a lightweight alternative to the orchestration tools: the tools proposed for solving the problem are Linux utilities that come pre-installed in most Linux distributions, so no additional packages that would add to the performance overhead are needed. The proposed algorithm mainly targets individuals and organizations running small-scale containerized setups with a few nodes.
References
[1] W. Felter, A. Ferreira, R. Rajamony, and J. Rubio, “An updated performance comparison of virtual machines and linux containers,” in 2015 IEEE international symposium on performance analysis of systems and software (ISPASS), pp. 171–172, IEEE, 2015.
[2] M. Plauth, L. Feinbube, and A. Polze, “A performance evaluation of lightweight approaches to virtualization,” Cloud Computing, vol. 2017, pp. 15–20, 2017.
[3] D. Bartoletti and C. Dai, “The forrester new waveTM: Enterprise container platform software suites, q4 2018,” p. 8, 2018. Available at: https://www.redhat.com/cms/managed-files/cm-forrester-new-wave-enterprise-container-platform-software-suites-q42018-analyst-paper-f14768-201810-en.pdf.
[4] E. Truyen, D. Van Landuyt, B. Lagaisse, and W. Joosen, “Performance overhead of container orchestration frameworks for management of multi-tenant database deployments,” in Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, SAC ’19, (New York, NY, USA), pp. 156–159, ACM, 2019.
[5] E. Truyen, M. Bruzek, D. Van Landuyt, B. Lagaisse, and W. Joosen, “Evaluation of container orchestration systems for deploying and managing nosql database clusters,” in 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), pp. 468–475, IEEE, 2018.
[6] C. Wang, D. Zhang, H. Lu, J. Zhao, Z. Zhang, and Z. Zheng, “An experimental study on firewall performance: Dive into the bottleneck for firewall effectiveness,” in 2014 10th International Conference on Information Assurance and Security, pp. 71–76, IEEE, 2014.
[7] Y. Jin-Gang, Z. Ya-Rong, Y. Bo, and L. Shu, “Research and application of auto-scaling unified communication server based on docker,” in 2017 10th International Conference on Intelligent Computation Technology and Automation (ICICTA), pp. 152–156, IEEE, 2017.
[8] G. Luo, W. Zhang, J. Zhang, and J. Cong, “Scaling up physical design: Challenges and opportunities,” in Proceedings of the 2016 on International Symposium on Physical Design, pp. 131–137, ACM, 2016.
[9] Q. Zhang, L. Liu, C. Pu, Q. Dou, L. Wu, and W. Zhou, “A comparative study of containers and virtual machines in big data environment,” in 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), pp. 178–185, IEEE, 2018.
[10] J. Shetty, S. Upadhaya, H. Rajarajeshwari, G. Shobha, and J. Chandra, “An empirical performance evaluation of docker container, openstack virtual machine and bare metal server,” Indonesian Journal of Electrical Engineering and Computer Science, vol. 7, no. 1, pp. 205–213, 2017.
[11] “Introduction to kubernetes architecture.” Available at: https://x-team.com/blog/introduction-kubernetes-architecture/. Accessed: 2019-07-09.
[12] D. Géhberger, D. Balla, M. Maliosz, and C. Simon, “Performance evaluation of low latency communication alternatives in a containerized cloud environment,” in 2018 IEEE 11th International Conference on Cloud Computing (CLOUD), pp. 9–16, IEEE, 2018.
[13] V. Medel, O. Rana, J. Á. Bañares, and U. Arronategui, “Modelling performance & resource management in kubernetes,” in 2016 IEEE/ACM 9th International Conference on Utility and Cloud Computing (UCC), pp. 257–262, IEEE, 2016.
[14] M. Sureshkumar and P. Rajesh, “Optimizing the docker container usage based on load scheduling,” in 2017 2nd International Conference on Computing and Communications Technologies (ICCCT), pp. 165–168, IEEE, 2017.
[15] “Multi-host networking overlay with flannel.” Available at: https://docker-k8s-lab.readthedocs.io/en/latest/docker/docker-flannel.html. Accessed: 2019-07-12.
[16] “Docker container networking.” Available at: https://docs.docker.com/v17.09/engine/userguide/networking/. Accessed: 2019-07-12.
[17] Y. Park, H. Yang, and Y. Kim, “Performance analysis of cni (container networking interface) based container network,” in 2018 International Conference on Information and Communication Technology Convergence (ICTC), pp. 248–250, IEEE, 2018.
[18] B. Deepak, S. Shashikala, and V. Radhika, “Load balancing techniques in cloud computing: A study,” in International Journal of Computer Applications (0975–8887) International Conference on Information and Communication Technologies, 2014.
[19] J. C. Patni and M. S. Aswal, “Distributed load balancing model for grid computing environment,” in 2015 1st International Conference on Next Generation Computing Technologies (NGCT), pp. 123–126, IEEE, 2015.
[20] M. R. M. Bella, M. Data, and W. Yahya, “Web server load balancing based on memory utilization using docker swarm,” in 2018 International Conference on Sustainable Information Engineering and Technology (SIET), pp. 220–223, IEEE, 2018.