YAVA Ambari ver 2.2 - Installation Guide

1. Getting ready

This section describes the information and tools used to install a YAVA cluster with Ambari. Ambari provides an end-to-end management and monitoring solution for YAVA clusters. Using the Ambari Web UI and REST APIs, you can deploy and operate services, manage configuration changes, and monitor services on all cluster nodes from a central point.

1.1. Determine stack compatibility

Ambari 2.2.1 is compatible with YAVA 2.2.x releases.

1.2. Meet minimum system requirements

To run Hadoop, your system must meet the following minimum requirements:

  • Operating System Requirements
  • Browser Requirements
  • Software Requirements
  • JDK Requirements
  • Database Requirements
  • Recommended Maximum Open File Descriptors

Operating System Requirements

The following 64-bit Linux operating systems support YAVA:

  • Red Hat Enterprise Linux (RHEL) v7.x
  • CentOS v7.x

Browser requirements

Ambari runs as a browser-based web application. The minimum required browser versions are Firefox 18 and Google Chrome 26.

Software requirements

On each of your hosts:

  • yum
  • rpm
  • scp
  • curl
  • unzip
  • wget
  • OpenSSL (v1.01, build 16 or later)
  • python (ver 2.6 or ver 2.7)

JDK requirements

YAVA supports JDK 8 Java runtime environments, such as Oracle JDK 1.8 64-bit (minimum JDK 1.8.0) and OpenJDK 8 64-bit. It is recommended to use the Java runtime that is already available for the Linux distribution you will use (OpenJDK 8 64-bit).

Database requirements

Ambari requires a relational database to store information about the cluster configuration and topology. The following relational databases can be used:

  • PostgreSQL 8 and 9.x (9.1.13+, 9.3)
  • MySQL 5.6 or MariaDB 5.5.x

By default, Ambari uses PostgreSQL on the Ambari Server host; you can also use another option such as MySQL or Oracle. Using an Oracle database with Ambari is not recommended, because it may conflict with port 8080 (the Ambari default port). Some YAVA services also use a database; for example, the Hive service uses a MySQL database by default.

Memory requirements

Number of Hosts    Memory Available (MB)    Disk Space
1                  1024                     10 GB
10                 1024                     20 GB
50                 2048                     50 GB
100                4096                     100 GB
300                4096                     100 GB
500                8096                     200 GB
1000               12288                    200 GB
2000               16384                    500 GB

Table 1. Memory Requirements

To check available memory on any host, run:

 
   free -m
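
As a minimal sketch, the check can be compared against Table 1 automatically. This assumes the Memory Available column is in MB and that a cluster size falling between two rows should use the next larger row (the table does not say how to handle those sizes). required_memory_mb and available_memory_mb are hypothetical helper names; the second one assumes a procps-ng free that prints an "available" column, as on RHEL/CentOS 7.

```shell
#!/usr/bin/env bash
# Hypothetical helper: recommended memory (MB) for a cluster size, from Table 1.
# Assumption: a host count between two rows uses the next larger row, and
# counts above 2000 use the last row.
required_memory_mb() {
  awk -v n="$1" '
    $1 >= n { print $2; found = 1; exit }
    { last = $2 }
    END { if (!found) print last }
  ' <<'EOF'
1 1024
10 1024
50 2048
100 4096
300 4096
500 8096
1000 12288
2000 16384
EOF
}

# Hypothetical helper: memory currently available on this host, in MB,
# taken from the "available" column (7th field) of the Mem: line of free -m.
available_memory_mb() {
  free -m | awk '/^Mem:/ { print $7 }'
}

# Example:
# need=$(required_memory_mb 10); have=$(available_memory_mb)
# [ "$have" -ge "$need" ] || echo "only ${have} MB available, ${need} MB recommended"
```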
 

Check the maximum open file descriptors

The recommended maximum number of open file descriptors is 10000 or more. To check the current value set for the maximum number of open file descriptors, execute the following shell commands on each host:

 
  ulimit -Sn
  ulimit -Hn
 

If either value is less than 10000, run the following command to raise the limit for the current shell session:

 
  ulimit -n 10000
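
As a minimal sketch, the check above can be scripted so that it is easy to run on every host. fd_limit_ok and check_nofile are hypothetical helper names, and the 10000 threshold is the value recommended in this guide.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: check the open file descriptor limits against the
# minimum recommended in this guide.
MIN_NOFILE=10000

# Succeeds when the given limit is "unlimited" or at least MIN_NOFILE.
fd_limit_ok() {
  if [ "$1" = "unlimited" ]; then
    return 0
  fi
  # A non-numeric value makes [ fail, which is treated as "too low".
  [ "$1" -ge "$MIN_NOFILE" ] 2>/dev/null
}

# Prints the soft and hard limits of the current shell and checks both.
check_nofile() {
  local soft hard
  soft=$(ulimit -Sn)
  hard=$(ulimit -Hn)
  echo "soft=$soft hard=$hard"
  fd_limit_ok "$soft" && fd_limit_ok "$hard"
}

# Example:
# check_nofile || echo "open file limit is below $MIN_NOFILE"
```

Keep in mind that ulimit -n only affects the current shell and the processes started from it. To keep a higher limit across logins, it is commonly set in /etc/security/limits.conf, for example with a line such as "* - nofile 10000".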
 

1.3. Collect information

Before deploying a YAVA cluster, you should collect the following information:

  • The Fully Qualified Domain Name (FQDN) of each host in your system. Ambari uses FQDN host names to install the cluster. To check a host's FQDN, run hostname -f on that host.
  • A list of components you want to set up on each host.
  • The base directories you want to use as mount points for storing NameNode, DataNode, Secondary NameNode, Oozie, YARN, ZooKeeper, log, pid, and database files.

Deploying all YAVA components on a single host is possible, but is appropriate only for initial evaluation purposes. Typically, a minimum cluster needs at least three hosts: one master host and two slaves.
You must use base directories that provide persistent storage locations for your YAVA components and Hadoop data. Installing YAVA components in locations that may be removed from a host may result in cluster failure or data loss. For example, do not use /tmp in a base directory path.
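
Because Ambari relies on FQDNs, it can help to check what each host reports before you start. The sketch below is a rough heuristic, not a full RFC 1123 validator: is_fqdn and check_local_fqdn are hypothetical helpers that accept names with at least one dot, no leading, trailing, or doubled dot, and only letters, digits, hyphens, and dots.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a name looks like an FQDN (heuristic only).
is_fqdn() {
  case "$1" in
    *.*) ;;                      # needs at least one dot
    *) return 1 ;;
  esac
  case "$1" in
    .*|*.|*..*) return 1 ;;      # no leading, trailing, or doubled dot
    *[!A-Za-z0-9.-]*) return 1 ;; # only letters, digits, dots, hyphens
  esac
  return 0
}

# Hypothetical helper: check the FQDN reported by this host.
check_local_fqdn() {
  local fqdn
  fqdn=$(hostname -f)
  if is_fqdn "$fqdn"; then
    echo "OK: $fqdn"
  else
    echo "WARNING: '$fqdn' does not look like an FQDN" >&2
    return 1
  fi
}
```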

1.4. Prepare the environment

To deploy your Hadoop instance, you need to prepare your deployment environment:

  • Install EPEL Repository
  • Set Up Static IP
  • Set Up Password-less SSH
  • Install Java 1.8.0
  • Set Up Service User Accounts
  • Enable NTP on Cluster
  • Check DNS
  • Configure IPtables
  • Disable SELinux, PackageKit, and Check umask Value

Install EPEL repository

The epel-release package sets up the Extra Packages for Enterprise Linux (EPEL) repository; on CentOS it is available from the extras repository, which is enabled by default. To install it, run:

 
   yum install epel-release
 

Set up static IP

Ambari Server needs a static IP address, so that its IP address is always the same. The following example configures a static IP on the interface enp0s3 (use the name of your own network interface):

 
   vi /etc/sysconfig/network-scripts/ifcfg-enp0s3
 

Add or update the following settings, adjusting the values for your network:

 
IPADDR=192.168.1.177
NETMASK=255.255.255.0
GATEWAY=192.168.1.1
BOOTPROTO="static"
ONBOOT="yes"
DNS1=8.8.8.8
 
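If you prefer to script this step, the sketch below writes the same settings to an interface file. It is a hypothetical helper, not part of YAVA or Ambari: the interface name and addresses are parameters, it overwrites the whole file (so back up the original first, since it may contain keys such as UUID or HWADDR that you want to keep), and it uses DNS1, the ifcfg key for the first name server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a static-IP ifcfg file (overwrites the file).
# Arguments: file, device, IP address, netmask, gateway, DNS server.
write_ifcfg() {
  local file="$1" device="$2" ip="$3" netmask="$4" gateway="$5" dns="$6"
  cat > "$file" <<EOF
DEVICE=$device
BOOTPROTO=static
ONBOOT=yes
IPADDR=$ip
NETMASK=$netmask
GATEWAY=$gateway
DNS1=$dns
EOF
}

# Example (as root; back up the original file first), then apply the change:
# cp /etc/sysconfig/network-scripts/ifcfg-enp0s3 /root/ifcfg-enp0s3.bak
# write_ifcfg /etc/sysconfig/network-scripts/ifcfg-enp0s3 enp0s3 \
#   192.168.1.177 255.255.255.0 192.168.1.1 8.8.8.8
# systemctl restart network
```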

Set up password-less SSH

To have Ambari Server automatically install Ambari Agents on all your cluster hosts, you must set up password-less SSH connections between the Ambari Server host and all other hosts in the cluster. The Ambari Server host uses SSH public key authentication to remotely access and install the Ambari Agent. Follow these steps:

  1. Generate public and private SSH keys on the Ambari Server host
        
          ssh-keygen
       
      

    The key pair is usually created as /root/.ssh/id_rsa (private key) and /root/.ssh/id_rsa.pub (public key).

  2. Copy the SSH Public Key (id_rsa.pub) to the root account on your target host.
        
          scp ~/.ssh/id_rsa.pub root@[remote.target.host]:~/
       
      
  3. On the target host, add the SSH Public Key to the authorized_keys file.
        
          mkdir -p ~/.ssh
          cat ~/id_rsa.pub >> ~/.ssh/authorized_keys
       
      
  4. Depending on your version of SSH, you may need to set permissions on the .ssh directory (to 700) and on the authorized_keys file in that directory (to 600) on the target hosts.
        
          chmod 700 ~/.ssh
          chmod 600 ~/.ssh/authorized_keys
       
      
  5. From the Ambari Server, make sure you can connect to each host in the cluster using SSH, without having to enter a password.
        
          ssh root@[remote.target.host]
       
      

    where [remote.target.host] has the value of each host name in your cluster.

  6. Retain a copy of the SSH Private Key on the machine from which you will run the web-based Ambari install wizard.

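Steps 4 and 5 can be checked with a short sketch. ssh_perms_ok and check_passwordless are hypothetical helpers: the first checks the 700/600 permissions on a target host using GNU stat, and the second, run on the Ambari Server host, uses ssh -o BatchMode=yes so that a host that still asks for a password is reported as failed instead of prompting.

```shell
#!/usr/bin/env bash
# Hypothetical helper: check the permissions from step 4 (700 on ~/.ssh,
# 600 on ~/.ssh/authorized_keys). Uses GNU stat, as on RHEL/CentOS 7.
ssh_perms_ok() {
  local home="${1:-$HOME}" dir_mode file_mode
  dir_mode=$(stat -c '%a' "$home/.ssh" 2>/dev/null) || return 1
  file_mode=$(stat -c '%a' "$home/.ssh/authorized_keys" 2>/dev/null) || return 1
  [ "$dir_mode" = 700 ] && [ "$file_mode" = 600 ]
}

# Hypothetical helper, run on the Ambari Server host: BatchMode makes ssh fail
# instead of prompting, so a host that still asks for a password is reported.
check_passwordless() {
  local host
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "root@$host" true; then
      echo "OK: $host"
    else
      echo "FAILED: $host" >&2
    fi
  done
}
```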
Install Java 1.8.0

YAVA 2.2 requires JDK 8. To install and configure it, follow these steps:

  1. To check whether the OpenJDK packages are available from your repositories, run:
        
          yum search openjdk
       
      
  2. Install OpenJDK
        
          yum install java-1.8.0-openjdk.x86_64
          yum install java-1.8.0-openjdk-devel.x86_64
       
      
  3. Set JAVA_HOME in the ~/.bash_profile file. Open it with:
        
          vi ~/.bash_profile
       
      

    add the following lines:

        
          export JAVA_HOME=/usr/lib/jvm/java-1.8.0
          PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$HOME/bin
          export PATH
          umask 022
       
      
  4. Reload ~/.bash_profile so the changes apply to the current shell:
        
          source ~/.bash_profile
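
To confirm that the shell now picks up a Java 8 runtime, a minimal sketch such as the following can be used. The function names are hypothetical; it relies on java -version printing its banner to stderr, with Java 8 reporting versions of the form 1.8.0_<update>.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a "java -version" banner is for Java 8,
# which reports versions as 1.8.0_<update>.
is_java8_banner() {
  printf '%s\n' "$1" | grep -q 'version "1\.8\.0'
}

# Hypothetical helper: check the java under JAVA_HOME
# (java -version writes its banner to stderr, hence 2>&1).
check_java_home() {
  local banner
  if [ -z "$JAVA_HOME" ]; then
    echo "JAVA_HOME is not set" >&2
    return 1
  fi
  banner=$("$JAVA_HOME/bin/java" -version 2>&1) || return 1
  is_java8_banner "$banner"
}
```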
       
      

Set up service user accounts

Each YAVA service requires a service user account. The Ambari Install Wizard creates any service user accounts that do not exist yet, preserves existing ones, and uses these accounts when configuring Hadoop services. Service user account creation applies both to accounts on the local operating system and to LDAP/AD accounts.

Enable NTP on the cluster and on the browser host

The clocks of all the nodes in your cluster and the machine that runs the browser through which you access the Ambari Web interface must be able to synchronize with each other.
First, install NTP on each host:

 
  yum install -y ntp
 

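Then, run the following commands on each host to check whether ntpd starts automatically at boot, enable and start it, and confirm that it is running: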

 
  systemctl is-enabled ntpd
  systemctl enable ntpd
  systemctl start ntpd
  systemctl status ntpd
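
NTP keeps the clocks aligned, but it can be useful to confirm the result across hosts. As a rough sketch, assuming password-less SSH from the Ambari Server host already works, you can collect each host's clock as seconds since the epoch and compute the spread; max_skew is a hypothetical helper, and sequential SSH calls add some error of their own. For a more precise view on a single host, ntpq -p lists the peers ntpd is synchronizing with and their offsets.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read one epoch timestamp (seconds) per line and print
# the spread between the earliest and the latest value.
max_skew() {
  awk 'NR == 1 { min = $1; max = $1 }
       { if ($1 < min) min = $1; if ($1 > max) max = $1 }
       END { if (NR > 0) print max - min }'
}

# Example from the Ambari Server host (host names are placeholders):
# for h in yava01.solusi247.com yava02.solusi247.com yava03.solusi247.com; do
#   ssh root@"$h" date +%s
# done | max_skew
```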
 

Check DNS and NSCD

All hosts in your system must be configured for both forward and reverse DNS.
If you are unable to configure DNS in this way, edit the /etc/hosts file on every host in your cluster to contain the IP address and Fully Qualified Domain Name of each of your hosts. The following instructions are provided as an overview and cover a basic network setup for generic Linux hosts. Different versions and flavors of Linux might require slightly different commands and procedures; refer to the documentation for the operating system(s) deployed in your environment.
Hadoop relies heavily on DNS, and as such performs many DNS lookups during normal operation. To reduce the load on your DNS infrastructure, it’s highly recommended to use the Name Service Caching Daemon (NSCD) on cluster nodes running Linux. This daemon will cache host, user, and group lookups and provide better resolution performance, and reduced load on DNS infrastructure.
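NSCD is provided by the nscd package and runs as the nscd service. To configure the DNS server that each host uses, edit /etc/resolv.conf: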

 
  vi /etc/resolv.conf
 

Add a nameserver line, for example:

 
  nameserver 8.8.8.8
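
To verify forward and reverse resolution on a host, a minimal sketch based on getent can be used (getent follows /etc/nsswitch.conf, so it also reflects entries in /etc/hosts). forward_lookup, reverse_lookup, and check_dns are hypothetical helpers; run check_dns with each host's FQDN.

```shell
#!/usr/bin/env bash
# Hypothetical helpers built on getent, which resolves names through the
# sources listed in /etc/nsswitch.conf (normally /etc/hosts, then DNS).
forward_lookup() {   # name -> first IP address
  getent hosts "$1" | awk '{ print $1; exit }'
}

reverse_lookup() {   # IP address -> first host name
  getent hosts "$1" | awk '{ print $2; exit }'
}

# Succeeds when NAME resolves to an IP address that resolves back to NAME.
check_dns() {
  local name="$1" ip back
  ip=$(forward_lookup "$name")
  if [ -z "$ip" ]; then
    echo "no address found for $name" >&2
    return 1
  fi
  back=$(reverse_lookup "$ip")
  if [ "$back" = "$name" ]; then
    echo "OK: $name -> $ip -> $back"
  else
    echo "MISMATCH: $name -> $ip -> ${back:-(none)}" >&2
    return 1
  fi
}

# Example: check_dns yava.solusi247.com
```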
 
Edit the hosts file
  1. Using a text editor, open the hosts file on every host in your cluster. For example:
     
      vi /etc/hosts
     
    
  2. Add a line for each host in your cluster. The line should consist of the IP address and the FQDN, optionally followed by a short alias. For example:
     
       #IP                 FQDN                  Alias
       192.168.1.177       yava.solusi247.com    yava
     
    

    Do not remove the following two lines from your hosts file. Removing or editing them may cause various programs that require network functionality to fail. This is the default content of the file on CentOS 7, before any entries are added:

     
      127.0.0.1  localhost localhost.localdomain localhost4 localhost4.localdomain4
      ::1        localhost localhost.localdomain localhost6 localhost6.localdomain6
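
When many hosts are involved, the entries can be generated from a simple list instead of typed by hand. A minimal sketch, assuming an input of one "IP FQDN" pair per line; hosts_lines is a hypothetical helper that only prints the lines, so you can review them before adding them to /etc/hosts on every host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn "IP FQDN" pairs (one per line on stdin) into
# /etc/hosts entries of the form "IP FQDN alias", where the alias is the
# short host name (the FQDN up to the first dot).
hosts_lines() {
  local ip fqdn
  while read -r ip fqdn; do
    case "$ip" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    printf '%s %s %s\n' "$ip" "$fqdn" "${fqdn%%.*}"
  done
}

# Example (review the output before appending it to /etc/hosts):
# printf '192.168.1.177 yava.solusi247.com\n' | hosts_lines
```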
     
    
Set the Hostname
  1. Confirm that the hostname is set by running the following command:
     
      hostname -f
     
    

    This should return the host name you just set.

  2. Use the hostname command to set the host name on each host in your cluster. For example:
     
      hostname yava.solusi247.com
     
    
Edit network configuration file
  1. Using a text editor, open the network configuration file on every host and set the desired network configuration for each host. For example:
     
      vi /etc/sysconfig/network
     
    
  2. Modify the HOSTNAME property to set the fully qualified domain name.
     
      NETWORKING=yes
      HOSTNAME=yava.solusi247.com
     
    

Configuring IPTABLES

For Ambari to communicate during setup with the hosts it deploys to and manages, certain ports must be open and available. The easiest way to do this is to temporarily disable the firewall. On RHEL/CentOS 7 the firewall is managed by the firewalld service, so disable firewalld as follows:

  1. First, make sure firewalld is installed on each node:
     
       yum install firewalld
     
    
  2. Disable and stop firewalld, then check its status:
     
       systemctl disable firewalld
       systemctl stop firewalld
       systemctl status firewalld
     
    

You can re-enable the firewall after setup is complete. If the security protocols in your environment prevent disabling it, you can proceed with the firewall enabled, as long as all required ports are open and available.
Ambari checks whether iptables is running during the Ambari Server setup process. If iptables is running, a warning displays, reminding you to check that required ports are open and available. The Host Confirm step in the Cluster Install Wizard also issues a warning for each host that has iptables running.
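
If disabling firewalld is not allowed, one option is to keep it running and open the required ports with firewall-cmd. The sketch below is an illustration under assumptions: the helper names are hypothetical, and the port list covers only the Ambari Web port used in this guide (8080) and the ports Ambari Agents use to reach Ambari Server by default (8440 and 8441). The YAVA services you install need additional ports that are not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one firewall-cmd --add-port argument per TCP port.
firewall_port_args() {
  local port
  for port in "$@"; do
    printf '%s\n' "--add-port=${port}/tcp"
  done
}

# Hypothetical helper: open the listed ports permanently in the default zone,
# then reload firewalld so the permanent rules become active.
# Assumed ports: 8080 (Ambari Web) and 8440/8441 (Ambari Agent to Server).
open_ambari_ports() {
  local arg
  for arg in $(firewall_port_args 8080 8440 8441); do
    firewall-cmd --permanent "$arg" || return 1
  done
  firewall-cmd --reload
}
```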

Disable SELinux and packagekit and check the umask value

  1. You must disable SELinux for the Ambari setup to function. On each host in your cluster, open SELinux configuration file:
     
       vi /etc/selinux/config
     
    

    modify this part:

     
       SELINUX=disabled
     
    

    This ensures that SELinux does not turn itself on after you reboot the machine.

  2. On an installation host running RHEL/CentOS with PackageKit installed, open the file:
     
       vi /etc/yum/pluginconf.d/refresh-packagekit.conf
     
    

    modify this part:

     
       enabled=0
     
    
  3. UMASK (User Mask or User File creation MASK) sets the default permissions granted when a new file or folder is created on a Linux machine. Most Linux distros set 022 as the default umask value, and Ambari supports umask values of 022 or 027. A umask of 022 gives new folders permissions of 755 and new files 644; a umask of 027 gives new folders 750 and new files 640. To set the umask value to 022, run this command as root on each host:
     
       echo umask 0022 >> /etc/profile
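
The settings above that are easiest to get wrong can be checked with a short sketch. selinux_config_disabled and perms_for_umask are hypothetical helpers: the first looks for SELINUX=disabled in the SELinux configuration file, and the second computes the permissions a umask gives new directories (starting from 777) and new files (starting from 666).

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if an SELinux config file sets SELINUX=disabled.
# The default path is the file edited in step 1.
selinux_config_disabled() {
  grep -Eq '^[[:space:]]*SELINUX=disabled[[:space:]]*$' "${1:-/etc/selinux/config}"
}

# Hypothetical helper: octal permissions produced by a umask for a new
# directory (second argument "dir", base 777) or a regular file (base 666).
perms_for_umask() {
  local mask="$1" base=0666
  [ "$2" = dir ] && base=0777
  # Leading zeros make the shell treat both values as octal.
  printf '%o\n' $(( $base & ~0$mask ))
}

# Example:
# perms_for_umask 022 dir     # prints 755
# perms_for_umask 027 file    # prints 640
```

Editing /etc/selinux/config takes effect after a reboot. To stop enforcement on a running system before then, setenforce 0 switches SELinux to permissive mode, and getenforce shows the current mode.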
     
    

1.5. Using a local repository

Local repositories are frequently used in enterprise clusters that have limited outbound internet access. In these scenarios, having packages available locally provides more governance and better installation performance. These repositories are used heavily during installation for package distribution, as well as post-install for routine cluster operations such as service start/restart operations. The following section describes the steps required to set up and use a local repository:

Setting up a local repository

Before setting up a local repository, complete these preparation steps:

  1. Prepare the Ambari and YAVA installer RPM packages to be used.
  2. The mirror server should have the same operating system as the cluster, or you can use one of the servers in the cluster, for example the Ambari-Server.
  3. Enable network access from all hosts in the cluster to the mirror server.
  4. Make sure the mirror server has a package manager installed, such as yum (Yellowdog Updater, Modified), the built-in RHEL 7 (el7) tool used to download, install, and uninstall application packages.
  5. Install the yum utilities, createrepo, and the httpd server:
    
       yum install yum-utils createrepo httpd
     
  6. Start the httpd web server, enable it at boot, and check its status:
    
       systemctl start httpd
       systemctl enable httpd
       systemctl status httpd
     
Repository            Contents                                                                  RPM
Ambari Base RPM       Ambari package                                                            ambari-1-2.noarch.rpm
YAVA Base RPM         YAVA core package (HDFS, MapReduce, Hive, HBase, YARN, ZooKeeper, Oozie)  YAVA-1-2.noarch.rpm
YAVA-UTILS Base RPM   YAVA utility package                                                      YAVA-UTILS-1-2.noarch.rpm

Table 2. RPM for Repository

Install the repository RPMs:


   yum install ambari-1-2.noarch.rpm
   yum install YAVA-UTILS-1-2.noarch.rpm
   yum install YAVA-1-2.noarch.rpm
 

In the repository URLs that follow, <web.server> is the FQDN of the web server host, and the paths match the OS used (el7, that is rhel/7).

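Once httpd is serving the repositories, you can check from any cluster host that each base URL responds. This is a sketch assuming the repository layout used in the .repo files in the next step; yum repositories publish their metadata at repodata/repomd.xml under the base URL. repomd_url and repo_reachable are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Hypothetical helper: URL of the repository metadata for a yum base URL.
repomd_url() {
  printf '%s/repodata/repomd.xml\n' "${1%/}"
}

# Hypothetical helper: succeed if the metadata can be fetched
# (curl -f turns HTTP error statuses into a non-zero exit code).
repo_reachable() {
  curl -sSf -o /dev/null --max-time 10 "$(repomd_url "$1")"
}

# Example, replacing <web.server> with the mirror server's FQDN:
# repo_reachable "http://<web.server>/repos/ambari/rhel/7/x86_64/2.2.1/" && echo OK
# If a directory of RPMs has no repodata yet, "createrepo <directory>" creates it.
```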
Prepare the Ambari repository configuration file

  1. Create the ambari.repo, YAVA.repo, and YAVA-UTILS.repo files in the /etc/yum.repos.d/ directory on the machine that will be used as the Ambari Server. For example:
    • Create ambari.repo file:
      
        vi /etc/yum.repos.d/ambari.repo
       

      add the following content:

      
           [AMBARI-2.2.1]
           name=AMBARI-2.2.1
           baseurl=http://<web.server>/repos/ambari/rhel/7/x86_64/2.2.1/
           enabled=1
           gpgcheck=0
           gpgkey=http://<web.server>/repos/ambari/rhel/7/x86_64/RPM-GPG-KEY/RPM-GPG-KEY-J.Yava
       
    • Create YAVA.repo file:
      
        vi /etc/yum.repos.d/YAVA.repo
       

      add the following content:

      
        [YAVA-2.2.0.5]
        name=YAVA-2.2.0.5 
        baseurl=http://<web.server>/repos/YAVA/rhel/7/x86_64/2.2.0.5/
        path=/
        enabled=1
       
    • Create YAVA-UTILS.repo file:
      
        vi /etc/yum.repos.d/YAVA-UTILS.repo
       

      add the following content:

      
        [YAVA-UTILS-1.1]
        name=YAVA-UTILS-1.1
        baseurl=http://<web.server>/repos/YAVA-UTILS/rhel/7/x86_64/1.1/
        path=/
        enabled=1
       

    If GPG is not available, you can disable the GPG check by setting gpgcheck=0. Then test that the files are correct by refreshing the repository list and searching for a package. Run the following commands:

    
      yum clean all
      yum repolist
      yum search ambari
     

    If the configuration is correct, yum repolist shows the newly added repositories, and yum search shows matching packages from them.

  2. Continue the process to install Ambari Server and Setup Ambari Server.
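
If you manage several hosts, the .repo files can also be generated instead of edited with vi. A minimal sketch: write_repo_file is a hypothetical helper that writes a repository id, name, baseurl, enabled=1, and gpgcheck=0 (like ambari.repo above, without the gpgkey line). Replace <web.server> with the mirror server's FQDN when you use it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write a yum .repo file.
# Arguments: target file, repository id (also used as name), base URL.
write_repo_file() {
  local file="$1" id="$2" baseurl="$3"
  cat > "$file" <<EOF
[$id]
name=$id
baseurl=$baseurl
enabled=1
gpgcheck=0
EOF
}

# Example on the Ambari Server host (as root):
# write_repo_file /etc/yum.repos.d/ambari.repo AMBARI-2.2.1 \
#   "http://<web.server>/repos/ambari/rhel/7/x86_64/2.2.1/"
# yum clean all && yum repolist
```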

2. Installing Ambari

To install Ambari server on a single host in your cluster, complete the following steps:

  1. Install and Set Up Ambari Server
  2. Start Ambari Server

2.1. Install Ambari server

Before installing Ambari Server, make sure the repositories are prepared, including the repository configuration files (ambari.repo, YAVA.repo, and YAVA-UTILS.repo) on the machine where Ambari Server will be installed. To install Ambari Server, follow these steps:

  1. Log in to the host as root.
  2. Confirm the repository configuration you created earlier by checking the repository list. Run this command:
     
      yum repolist
     
    

    The output lists the available repositories, which should include the Ambari and YAVA repositories.

  3. Install Ambari Server. The default Ambari database, PostgreSQL, is installed as part of this step:
     
      yum install ambari-server
     
    
  4. Enter y when asked to confirm the transaction and dependency check. If the installation succeeds, the output shows:
     
      Installed ambari-server.x86_64.
     
    

2.2. Set up Ambari server

Before starting Ambari Server, you must set it up. Setup configures the connection to the Ambari database, the JDK, and the user account that runs the Ambari Server daemon. It is recommended to install JDK 1.8 (OpenJDK) first and configure the JAVA_HOME path in ~/.bash_profile. Then set up Ambari Server with the following command, where -j (or --java-home) points to the JDK directory:

 
  ambari-server setup -j <java-home-directory>
 

For example:

 
  ambari-server setup -j /usr/lib/jvm/java-1.8.0-openjdk
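
If you are not sure which directory to pass to -j, you can derive it from the java binary on the PATH. A minimal sketch under these assumptions: java_home_from_bin is a hypothetical helper, and the java binary lives either in <jdk>/bin/java or in <jdk>/jre/bin/java, which is typically the case for the CentOS 7 OpenJDK packages once symlinks are resolved.

```shell
#!/usr/bin/env bash
# Hypothetical helper: derive a JDK root from the path of a java binary,
# handling both <jdk>/bin/java and <jdk>/jre/bin/java layouts.
java_home_from_bin() {
  local home="${1%/bin/java}"
  printf '%s\n' "${home%/jre}"
}

# Example: resolve the java on PATH through its symlinks, then pass the
# result to Ambari setup.
# JAVA_BIN=$(readlink -f "$(command -v java)")
# ambari-server setup -j "$(java_home_from_bin "$JAVA_BIN")"
```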
 

Setup then asks a series of questions. Respond as follows:

  1. If SELinux is not disabled, a warning appears. Accept the default (y) to continue.
  2. By default, Ambari Server runs as root. To run it as root, choose the default (n) when the “Customize user account for ambari-server daemon” prompt appears. To run Ambari Server as another new or existing user, choose (y) at that prompt and enter the username, for example:
     
       ambari
     
    
  3. If iptables is not disabled, a warning appears. Accept the default (y) to continue.
  4. Choose (n) at “Enter advanced database configuration” to use the Ambari default database, PostgreSQL. The default database name is ambari, and the default username and password are ambari/bigdata. If you do not want to use the default database, choose (y) and select MySQL or Oracle. Using the Ambari default database is recommended because it is simpler.
  5. If prompted to proceed with configuring “remote database connection properties [y/n]”, choose (y).
  6. Setup is complete.

Note:
Prepare the Ambari Server installation requirements, such as disabling SELinux and iptables, before running setup, to avoid failures during the cluster installation process.

2.3. Start Ambari server

  • To run Ambari Server, run this command on Ambari Server host:
     
      ambari-server start
     
    
  • To check Ambari Server process, run this command:
     
      ambari-server status
     
    
  • To stop Ambari Server process, run this command:
     
      ambari-server stop
     
    

3. Installing, Configuring, and Deploying a YAVA cluster

3.1. Log in to Apache Ambari

After starting Ambari Server, open Ambari Web in a browser.

  1. Open this URL in your browser: http://<address.ambari.server>:8080,
    where <address.ambari.server> is the host name or IP address of the Ambari Server host.
    For example:

    
      http://yava.solusi247.com:8080
     
  2. Log in to Ambari using the default username/password: admin/admin

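Besides the web UI, you can confirm that Ambari Server is up through its REST API. A minimal sketch: the helper names are hypothetical, the admin/admin login is the default mentioned above, and /api/v1/clusters is the Ambari REST endpoint that lists the clusters managed by the server.

```shell
#!/usr/bin/env bash
# Hypothetical helper: URL of the Ambari REST endpoint that lists clusters.
# Arguments: Ambari Server host, optional port (default 8080).
ambari_api_url() {
  printf 'http://%s:%s/api/v1/clusters\n' "$1" "${2:-8080}"
}

# Hypothetical helper: succeed when the API answers with an HTTP success
# status, using the default admin/admin login given in this guide.
ambari_api_ok() {
  curl -sSf -o /dev/null --max-time 10 -u admin:admin "$(ambari_api_url "$1" "$2")"
}

# Example:
# ambari_api_ok yava.solusi247.com && echo "Ambari REST API is reachable"
```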
3.2. Launching the Ambari install wizard

From Ambari Welcome page, choose Launch Install Wizard.

Figure 1. Ambari welcome page

3.3. Set up Cluster name

  1. Enter the name of the cluster to create. Avoid white space or special characters in the name.
  2. Choose Next.

3.4. Select Stack

A service stack is a coordinated and tested set of YAVA components used to deploy the provided services. Choose the YAVA 2.2.0.5 stack, the stable stack, from the available options. To see the repository URLs for the stack, click Advanced Repository Options, and adjust them to the addresses you used earlier on the repository server. For example:

Figure 2. Base URL

Currently, the operating systems that support YAVA are Red Hat 7 and its derivatives, as described under Operating System Requirements in a previous section.

3.5. Install options

In order to build up the cluster, the install wizard prompts you for general information about how you want to set it up. You need to supply the FQDN of each of your hosts. The wizard also needs to access the private key file you created in Set Up Password-less SSH section. Using the host names and key file information, the wizard can locate, access, and interact securely with all hosts in the cluster.

  1. Use the Target Hosts text box to enter your list of host names, one per line. For example:
     
       yava01.solusi247.com
       yava02.solusi247.com
       yava03.solusi247.com
     
    
  2. If you want to let Ambari automatically install the Ambari Agent on all your hosts using SSH, select “Provide your SSH Private Key” and either use the Choose File button in the “Host Registration Information” section to find the private key file that matches the public key you installed earlier on all your hosts or cut and paste the key into the text box manually. If you do not want to use root, you must provide the username for an account that can execute sudo without entering a password.
  3. If you do not want Ambari to automatically install Ambari Agent, select “Perform manual registration”.
  4. Choose “Register and Confirm” to continue.

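For larger clusters, typing each name by hand is error-prone. A small sketch that generates the list for the Target Hosts box, assuming hosts follow a numbered naming scheme like the example above; host_list is a hypothetical helper. The Target Hosts box also accepts bracketed ranges such as yava[01-03].solusi247.com, which may be simpler.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print zero-padded, numbered host names, one per line.
# Arguments: prefix, domain, first number, last number.
host_list() {
  local prefix="$1" domain="$2" i="$3" last="$4"
  while [ "$i" -le "$last" ]; do
    printf '%s%02d.%s\n' "$prefix" "$i" "$domain"
    i=$((i + 1))
  done
}

# Example: host_list yava solusi247.com 1 3
```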
3.6. Confirm Hosts

Confirm Hosts prompts you to confirm that Ambari has located the correct hosts for your cluster and to check those hosts to make sure they have the correct directories, packages, and processes required to continue the installation.
If any hosts were selected in error, you can remove them by selecting the appropriate checkboxes and clicking the grey Remove Selected button. To remove a single host, click the small white Remove button in the Action column.

During Confirm Hosts, the status of the checks is displayed in a progress bar. If the progress bar turns yellow, a check has raised a warning, for example because a host does not meet a requirement such as having wget or curl installed. Select “Click here to see the warnings” to see which checks passed and which raised warnings.

The warnings page also provides a Python script that you can run to help resolve the reported problems, and a Rerun Checks option to repeat the checks after you have fixed the issues flagged as warnings. When the checks pass, the Next button becomes active; click it to continue.

3.7. Choose Services

Based on the stack chosen during Select Stack, you are presented with the services you can install on the cluster. The YAVA stack comprises many services. You may choose to install any available service now, or add services later. The install wizard selects all available services for installation by default, but installing all of them is not recommended.

  1. Choose none to clear all selections, or choose all to select all listed services.
  2. Choose or clear individual checkboxes to define a set of services to install now.
  3. After selecting the services to install now, choose Next.

3.8. Assign Masters

The Ambari install wizard assigns the master components for the selected services to appropriate hosts in your cluster and displays the assignments in Assign Masters. The left column shows services and current hosts. The right column shows current master component assignments by host, indicating the number of CPU cores and amount of RAM installed on each host.

  1. To change the host assignment for a service, select a hostname from the drop down menu for that service.
  2. To remove a ZooKeeper instance, click the green minus icon next to the host address you want to remove.
  3. When you are satisfied with the assignments, choose Next.

3.9. Assign Slaves and Clients

The Ambari install wizard assigns the slave components (DataNodes, NodeManagers, and RegionServers) to appropriate hosts in your cluster. It also attempts to select hosts for installing the appropriate set of clients.

  1. Use all or none to select all of the hosts in the column or none of the hosts, respectively.
    If a host has an asterisk sign next to it, that host is also running one or more master components. Hover your mouse over the asterisk sign to see which master components are on that host.
  2. Complete your selections by using the checkboxes next to specific hosts.
  3. When you are satisfied with your assignments, choose Next.

3.10. Customize Services

The Customize Services step presents you with a set of tabs that let you review and modify your YAVA cluster setup. The wizard attempts to set reasonable defaults for each of the options. You are strongly encouraged to review these settings as your requirements might be slightly different.

Browse through each service tab. By hovering your cursor over a property, you can see a brief description of what it does. The number of service tabs shown depends on the services you decided to install in your cluster. Each service has at least ten groups of configuration properties and options related to other components, such as database configuration for Hive and Oozie, admin username/password, and email alerts.

You must provide database passwords for the Hive and Oozie services, a Master Secret for Knox, and a valid email address to which system alerts are sent. Select each service that shows a red number and fill in the fields according to the service configuration requirements. Repeat this until all red warnings disappear.

For example, choose the Hive service and expand the required Hive Metastore section. Enter the database password, then enter the same value again in the confirmation field marked in red.

3.11. Review

The assignments you have made are displayed. Check to make sure everything is correct. If you need to make changes, use the left navigation bar to return to the appropriate screen.
To print your information for later reference, choose Print button.
When you are satisfied with your choices, choose Deploy.

3.12. Install, Start and Test

The progress of the installation is displayed on the screen. Ambari installs, starts, and runs a simple test on each component. The progress bar for each host shows the percentage completed. Do not refresh your browser during this process. Refreshing the browser may interrupt the progress indicators and may also cause the installation to fail.

To see specific information on what tasks have been completed per host, click the link in the Message column for the appropriate host. In the Task pop-up, click the individual task to see the related log files. You can select filter conditions by using the Show drop-down list. To see a larger version of the log contents, click the Open icon or to copy the contents to the clipboard, use the Copy icon.

When “Successfully installed and started the services” appears, choose Next to continue the process.

3.13. Complete

The Summary page lists the tasks that were completed. Choose Complete; the Ambari Web GUI is then displayed.

YAVA - BIG DATA SOLUTION WITHIN YOUR REACH

© 2017 Labs247. All rights reserved.
YAVA logo and HGrid247 logo are registered trademarks or trademarks of the Labs247 Company.
HADOOP, the Hadoop Elephant Logo, Apache, Flume, Ambari, Yarn, Bigtop, Phoenix, Hive, Tez, Oozie, HBase, Mahout, Pig, Solr, Storm, Spark, Sqoop, Impala, and ZooKeeper are registered trademarks or trademarks of the Apache Software Foundation.