Hello Readers!! You may already know about the different components of the Hadoop ecosystem. This blog is about Apache Ambari, the component of the Hadoop ecosystem used for monitoring the Hadoop cluster. Let’s get started with…
What is Apache Ambari??
Apache Ambari, a software project of the Apache Software Foundation, is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari was a sub-project of Hadoop but is now a top-level project in its own right. It brings everything in the Hadoop ecosystem under one roof, either through its easy-to-use web-based user interface or through its collection of RESTful APIs. Ambari’s web interface was built with simplicity in mind; the goal of Ambari is to make provisioning, managing, and monitoring as easy as possible.
The web interface actually calls the Ambari APIs, which is where the magic really happens. These APIs can be used to automate a cluster installation with absolutely zero user interaction. Ambari is designed with a “server-agent” architecture. A single Ambari server is installed and run on one host. This server is the single entry point to the cluster, runs the web user interface, and provides Ambari’s RESTful APIs. Agents are installed during the provisioning step on each specified host in the cluster. The server then talks to the agents to carry out tasks such as installing new services and managing the cluster. The name “Ambari” comes from a Hindi word for the seat (howdah) one places on an elephant’s back, so you can think of Ambari as the ruler of the Hadoop stack, managing everything from above.
What Ambari does??
Ambari makes Hadoop management simpler by providing a consistent, secure platform for operational control. Ambari provides an intuitive Web UI as well as a robust REST API, which is particularly useful for automating cluster operations. With Ambari, Hadoop operators get the following core benefits:
Simplified Installation, Configuration and Management: Easily and efficiently create, manage and monitor clusters at scale. Takes the guesswork out of configuration with Smart Configs and Cluster Recommendations.
Centralized Security Setup: Reduces the complexity of administering and configuring cluster security across the entire platform. Helps automate the setup and configuration of advanced cluster security capabilities such as Kerberos and Apache Ranger.
Full Visibility into Cluster Health: Ensure your cluster is healthy and available with a holistic approach to monitoring. Configures predefined alerts — based on operational best practices — for cluster monitoring. Captures and visualizes critical operational metrics for analysis and troubleshooting. Integrated with Hortonworks SmartSense for proactive issue prevention and resolution.
Highly Extensible and Customizable: Enables Hadoop to fit seamlessly into your enterprise environment. Highly extensible with Ambari Stacks for bringing custom services under management, and with Views for customizing the Ambari Web UI.
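The REST API mentioned above is the piece that makes automation possible. As a minimal sketch of how a client might talk to it, the snippet below builds the URL and headers for a request against Ambari’s `/api/v1` endpoint. The host name, port, cluster name, and `admin/admin` credentials are assumptions (8080 and admin/admin are common defaults, but check your own installation):

```python
import base64

# Assumed deployment details -- replace with your own.
AMBARI_HOST = "ambari.example.com"
AMBARI_PORT = 8080
CLUSTER = "mycluster"

def cluster_url(resource=""):
    """Build the URL for a cluster-level API resource."""
    base = f"http://{AMBARI_HOST}:{AMBARI_PORT}/api/v1/clusters/{CLUSTER}"
    return f"{base}/{resource}" if resource else base

def auth_headers(user="admin", password="admin"):
    """HTTP Basic auth header, plus the X-Requested-By header that
    Ambari requires on modifying (PUT/POST/DELETE) requests."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}",
            "X-Requested-By": "ambari"}

# e.g. a GET to this URL (with these headers) returns HDFS service info
print(cluster_url("services/HDFS"))
```

Any HTTP client (curl, `requests`, etc.) can then issue the actual call with these pieces.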
Apache Ambari: Mission
Ambari enables System Administrators to:
- Provision a Hadoop Cluster
- Ambari provides a step-by-step wizard for installing Hadoop services across any number of hosts.
- Ambari handles configuration of Hadoop services for the cluster.
- Manage a Hadoop Cluster
- Ambari provides central management for starting, stopping, and reconfiguring Hadoop services across the entire cluster.
- Monitor a Hadoop Cluster
- Ambari provides a dashboard for monitoring health and status of the Hadoop cluster.
- Ambari leverages the Ambari Metrics System for metrics collection.
- Ambari leverages the Ambari Alert Framework for system alerting, and will notify you when your attention is needed (e.g., a node goes down, remaining disk space is low, etc.)
Ambari also enables Application Developers and System Integrators to:
- Easily integrate Hadoop provisioning, management, and monitoring capabilities into their own applications with the Ambari REST APIs.
Architecture of Ambari
Hadoop runs its processes across the cluster largely without user intervention, and Ambari was founded to simplify operating it. As shown in Figure 1, the Ambari architecture can be divided into several connected parts.
Figure 1: Architecture of Apache Ambari
First, the Ambari Web, which is the main platform for users to log in and submit the requests that Ambari should run. It forms the main interface for any interaction between user and application. All monitoring processes are visualized through Ambari Web.
Second, the Ambari Server, which is itself divided into smaller parts. Its REST API is connected to different web applications, the most important one being Ambari Web.
- Ambari Server – This is the master process, which communicates with the Ambari agents installed on each node participating in the cluster. It uses a PostgreSQL database instance to maintain all cluster-related metadata.
- Ambari Agent – These are Ambari’s acting agents on each node. Each agent periodically sends its own health status along with various metrics, the status of installed services, and more. Based on these reports, the master decides on the next action and conveys it back to the agent to act on.
Other interfaces, such as Microsoft System Center, are applications that let the user analyze the data further or integrate data and results into other programs. The results of an analysis are accessible through Ambari Web on the monitoring screens. The connection between Ambari and Hadoop is made by the Ambari Agents: the Ambari Server installs an Ambari Agent on each host, and every few seconds each agent sends a heartbeat to the server; the server answers with an instruction for the agent or simply acknowledges the agent’s current status. All hosts are joined into the cluster by Hadoop.
Figure 2 : High Level Architecture of Apache Ambari
As shown in Figure 2, Agents will send heartbeat to the master every few seconds and will receive commands from the master in the heartbeat responses.
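The heartbeat exchange described above can be modeled in a few lines. This is a toy in-process sketch, not Ambari’s actual wire protocol; the class, method, and message-field names are all illustrative:

```python
class Server:
    """Toy Ambari-server-like endpoint for agent heartbeats."""

    def __init__(self):
        self.pending = {}    # host -> list of commands queued for it
        self.last_status = {}  # host -> most recent reported status

    def queue_command(self, host, command):
        self.pending.setdefault(host, []).append(command)

    def heartbeat(self, host, status):
        """Record the agent's status; the response carries any
        queued commands, piggybacked on the acknowledgement."""
        self.last_status[host] = status
        return {"ack": True, "commands": self.pending.pop(host, [])}

server = Server()
server.queue_command("node1", {"action": "INSTALL", "service": "HDFS"})

# The agent's next heartbeat picks up the install command.
reply = server.heartbeat("node1", {"healthy": True})
print(reply["commands"])
```

A subsequent heartbeat from the same host would come back with an empty command list, since the queue was drained.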
Architecture of Apache Ambari Agent:
Heartbeat responses are the only way for the master to send a command to the agent. As shown in Figure 3, the command is queued in the action queue, from which it is picked up by the action executioner.
Figure 3 : Ambari Agent Architectural design
The action executioner picks the right tool (Puppet, Python, etc.) for execution depending on the command type and action type. Thus the actions sent in the heartbeat response are processed asynchronously at the agent. The action executioner puts the response or progress messages on the message queue, and the agent sends everything on the message queue to the master in the next heartbeat.
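The queue-driven flow above can be sketched with two queues and a worker thread. This is a simplified model under assumed names (`action_queue`, `message_queue`, a stand-in executioner), not the agent’s real implementation:

```python
import queue
import threading

action_queue = queue.Queue()   # commands arriving via heartbeat responses
message_queue = queue.Queue()  # results to ship back in the next heartbeat

def action_executioner():
    """Drain the action queue; a real agent would dispatch each command
    to Puppet, a Python script, etc., based on its type."""
    while True:
        cmd = action_queue.get()
        if cmd is None:        # sentinel: stop the worker
            break
        message_queue.put({"cmd": cmd, "status": "COMPLETED"})
        action_queue.task_done()

worker = threading.Thread(target=action_executioner)
worker.start()

action_queue.put("START HDFS")  # as if received in a heartbeat response
action_queue.join()             # wait for asynchronous processing
action_queue.put(None)
worker.join()

# Everything on the message queue goes out with the next heartbeat.
results = []
while not message_queue.empty():
    results.append(message_queue.get())
print(results)
```

The key point the sketch captures is the decoupling: the heartbeat handler only enqueues work, while execution and result reporting happen asynchronously.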
Design Goals of Apache Ambari
The system must architecturally support any hardware and operating system, e.g. RHEL, SLES, Ubuntu, Windows, etc. Components which are inherently dependent on a platform (e.g., components dealing with yum, rpm packages, debian packages, etc.) should be pluggable with well-defined interfaces.
The architecture must not assume specific tools and technologies; any specific tools and technologies must be encapsulated by pluggable components. The architecture focuses on the pluggability of Puppet (the provisioning and configuration tool of choice) and related components, and of the database used to persist state. The goal is not to immediately support replacements for Puppet, but the architecture should be easily extensible to do so in the future. The pluggability goal doesn’t encompass standardization of inter-component protocols or interfaces to work with third-party implementations of components.
Version Management & Upgrade
Ambari components running on various nodes must support multiple versions of the protocols to support independent upgrade of components. Upgrade of any component of Ambari must not affect the cluster state.
Extensibility
The design should support easy addition of new services, components and APIs. Extensibility also implies ease in modifying any configuration or provisioning steps for the Hadoop stack. The possibility of supporting Hadoop stacks other than HDP also needs to be taken into account.
Failure Recovery
The system must be able to recover from any component failure to a consistent state. The system should try to complete the pending operations after recovery. If certain errors are unrecoverable, the failure should still leave the system in a consistent state.
Security
Security implies:
1) authentication and role-based authorization of Ambari users (both API and Web UI),
2) installation, management, and monitoring of the Hadoop stack secured via Kerberos, and
3) authentication and encryption of over-the-wire communication between Ambari components (e.g., Ambari master-agent communication).
Error Trace
The design strives to simplify the process of tracing failures. Failures should be propagated to the user with sufficient details and pointers for analysis.
Near Real Time and Intermediate Feedback for Operations
For operations that take a while to complete, the system needs to provide the user with intermediate feedback about currently running tasks, the percentage of the operation completed, a reference to an operation log, etc., in a timely manner (near real time). In the previous version of Ambari this was not available, due to Puppet’s master-agent architecture and its status-reporting mechanism.
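In practice, Ambari exposes long-running operations as “request” resources that a client can poll. The sketch below shows the idea: build the polling URL (following the `/api/v1` convention) and aggregate per-task progress into one figure. The task records here are simulated, and their field names are illustrative:

```python
def request_url(cluster, request_id):
    """URL a client would poll for a long-running operation's status."""
    return f"/api/v1/clusters/{cluster}/requests/{request_id}"

def overall_progress(tasks):
    """Average per-task completion into a single percentage for the UI."""
    if not tasks:
        return 100.0
    return sum(t["progress_percent"] for t in tasks) / len(tasks)

# Simulated task list, as a polling response might report it.
tasks = [{"progress_percent": 100.0},
         {"progress_percent": 50.0},
         {"progress_percent": 0.0}]

print(request_url("mycluster", 42))  # where the client polls
print(overall_progress(tasks))       # 50.0
```

A client loop would GET that URL every few seconds and update a progress bar until the aggregate reaches 100%.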
What is the concept of monitoring and how is it used in general??
Monitoring is becoming popular nowadays. It is the process of managing, observing and intervening in other processes or systems. Its main goal is to control actions, as well as to prevent and fix failures. Monitoring can be used in different ways depending on the use case and environment. Next to distributed systems, one of the most common areas of use today is the medical field: operations, heart disease and blood pressure are continuously monitored, either to prevent failures or to help patients in need. For distributed systems, this technique gives a level of security and control that no person could provide. In Internet-scale Distributed Systems (ISDS), the core aim of monitoring is to fix and prevent faults quickly, which improves the quality of the system and its capabilities.
Monitoring applications are usually integrated into the communication structure between the distributed systems. If a virus or dangerous data were sent from one computer to another, monitoring would intervene and stop or isolate that machine from the whole system, to preserve the system’s health. Industries today face the problem of increasing complexity in the techniques and systems they use. To find and fix failures in such situations, monitoring is very useful: it identifies failures far more efficiently than a person possibly could.
Concept of monitoring in Apache Ambari….
With Ambari, the concept of monitoring is implemented in two specific ways. Ambari has knowledge of every cluster’s health and its data. It can report and improve the health of a Hadoop cluster and analyze its data in ways that are useful to the user, who gets an overview of the current situation through the web application.
Ambari divides the monitoring process into three parts. The first is a list of all the applications Ambari is running on each connected cluster; it shows the name and status of each, and the number of clusters that are faulty and need to be fixed.
The second part covers all the technical details about the application, such as storage space and running time.
The last part aims to show, illustratively, how long and how effectively an application has been running on each cluster. Based on this data the user can submit requests through the API to change the applications or add more clusters to one process. Ambari will then automatically install the new application on the cluster by sending the relevant commands in heartbeat responses to the Ambari Agent. Ambari is capable of installing software and giving a brief overview of the health of the system; it is not capable of analyzing the data qualitatively.
With the growing popularity of Hadoop, many developers jump into this technology to have a taste of it. But, as they say, Hadoop is not for the faint-hearted, and many developers cannot even cross the barrier of installing it. Many distributions offer a pre-installed sandbox VM to try things out, but that does not give you the feel of distributed computing. Installing a multi-node cluster, however, is not an easy task, and with a growing number of components it is very tricky to handle so many configuration parameters. Thankfully, Apache Ambari comes to our rescue!
Apache Ambari is a web-based tool for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides a dashboard for viewing cluster health, including heatmaps, and the ability to view MapReduce, Pig and Hive applications visually, along with features to diagnose their performance characteristics in a user-friendly manner. It has a very simple and interactive UI for installing various tools and performing management, configuration and monitoring tasks. Apache Ambari gives us a simpler interface and saves a lot of effort on installation, monitoring and management, which would otherwise be very tedious with so many components, each with its own installation steps and monitoring controls.
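For fully automated provisioning of the kind described earlier, Ambari offers Blueprints: a JSON description of the stack and its host groups, plus a host mapping, both posted to the REST API. The sketch below only assembles the two JSON documents; the stack version, component list, and host names are placeholders for illustration:

```python
import json

# A minimal single-node blueprint (POST /api/v1/blueprints/<name>).
blueprint = {
    "Blueprints": {"blueprint_name": "single-node",
                   "stack_name": "HDP",
                   "stack_version": "2.6"},
    "host_groups": [{
        "name": "master",
        "cardinality": "1",
        "components": [{"name": "NAMENODE"},
                       {"name": "DATANODE"},
                       {"name": "ZOOKEEPER_SERVER"}],
    }],
}

# The cluster-creation template maps real hosts onto the host groups
# (POST /api/v1/clusters/<cluster-name>).
cluster_template = {
    "blueprint": "single-node",
    "host_groups": [{"name": "master",
                     "hosts": [{"fqdn": "node1.example.com"}]}],
}

print(json.dumps(blueprint, indent=2))
```

Posting the blueprint and then the template is enough for Ambari to install and configure the whole stack with no further interaction.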