YAVA Ambari ver 2.2 - Getting Started

1.Introduction

YAVA Data Management Platform, commonly referred to as YAVA, is a Hadoop distribution for big data management that includes Hadoop cluster management and monitoring. YAVA is one of the Hadoop distributions developed and supported by our experts at Labs247.

This section describes the information and tools needed to install the YAVA Sandbox using VirtualBox and provides a tutorial on how to use the YAVA Sandbox. It also contains a tutorial on how to run HGrid247 on YAVA, as well as basic tutorials on how to use MapReduce, Hive, and HBase, including creating a user and creating, deleting, or editing databases.

2.Sandbox Installation

2.1.Prerequisites

2.2.YAVA Sandbox Installation

To install YAVA SandBox in VirtualBox, complete the following steps:

  1. In the VirtualBox Manager, click the New button to create a new virtual machine.
    Create New

    Figure 1. Create New

     

  2. Set your virtual machine name and select Linux as the OS type. Choose Red Hat (64-bit) as the OS version for the YAVA Sandbox.
    Input Name, Select Type and Version

    Figure 2. Input Name, Select Type and Version

     

  3. Set the amount of memory for the VM (minimum xx GB).
    Set Memory

    Figure 3. Set Memory

     

  4. Now you will be prompted to create a virtual hard disk. This is an important step in running your .vmdk file. Select the option ‘Use existing hard disk’ and click ‘Choose a virtual hard disk file’.
    Import File VMDK

    Figure 4. Import File VMDK

     

  5. In the file selection window that opens, browse to and select the .vmdk file you wish to open.
    Select File VMDK

    Figure 5. Select File VMDK

     

  6. Now the ‘Use existing hard disk’ option will have the .vmdk file you selected. Click Create.
    Import File Finish

    Figure 6. Import File Finish

     

  7. Now the VirtualBox Manager will list the new virtual machine. Click ‘Start’ to run the VM.
    Running YAVA SandBox

    Figure 7. Running YAVA SandBox
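
The steps above can also be scripted with the VBoxManage command-line tool that ships with VirtualBox. Below is a minimal sketch; the VM name, memory size, and .vmdk path are placeholders, not values taken from this guide:

    # Create and register the VM with the OS type chosen in step 2
    VBoxManage createvm --name "YAVA-Sandbox" --ostype RedHat_64 --register
    # Assign memory and CPUs (set the memory to at least the minimum from step 3)
    VBoxManage modifyvm "YAVA-Sandbox" --memory 8192 --cpus 2
    # Attach the existing .vmdk file as the hard disk (steps 4 to 6)
    VBoxManage storagectl "YAVA-Sandbox" --name "SATA" --add sata
    VBoxManage storageattach "YAVA-Sandbox" --storagectl "SATA" --port 0 --device 0 --type hdd --medium /path/to/yava-sandbox.vmdk
    # Start the VM (step 7)
    VBoxManage startvm "YAVA-Sandbox"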

     

3.YAVA Tutorial

3.1.Startup Or Shutdown Service

Basic instructions to start up or shut down YAVA Data Management Platform services:

  1. Open a web browser and go to the Ambari UI at <hostname>:8080 or <ip-address>:8080
  2. Log in to Ambari using the username admin and the default password admin
  3. Start the available services; for example, select the HDFS service and choose Start under Service Actions
Startup Service

Figure 8. Startup Service
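
Services can also be started or stopped without the UI through the Ambari REST API. Below is a minimal curl sketch; the Ambari host and the cluster name (yava_cluster) are placeholders for your own environment:

    # Check the current state of the HDFS service
    $ curl -u admin:admin http://<ambari-host>:8080/api/v1/clusters/yava_cluster/services/HDFS
    # Start HDFS; use the state "INSTALLED" instead of "STARTED" to stop it
    $ curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Start HDFS"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' http://<ambari-host>:8080/api/v1/clusters/yava_cluster/services/HDFS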

3.2.HDFS Basic Command

Below are some basic HDFS commands:

  1. Inserting Data into HDFS
    • Download the sample data file sample_data.txt and save it in the /home/yava directory
    • Create an input directory in HDFS
              
               $ hadoop fs -mkdir /user/input
              
             
    • Transfer and store the data file from the local file system to the Hadoop file system using the put command.
              
               $ hadoop fs -put /home/yava/sample_data.txt /user/input
              
             
    • You can verify the file using the ls command.
              
               $ hadoop fs -ls /user/input 
              
             
  2. Retrieving Data from HDFS
    • View the sample_data.txt file in HDFS using the cat command.
              
               $ hadoop fs -cat /user/input/sample_data.txt
              
             
    • Download the file from HDFS to the local file system using the get command.
              
               $ hadoop fs -get /user/input/ /home/yava/
              
             
  3. Create New User in HDFS
    • Create a new OS user named user01 and add it to the hadoop group. Use root to execute the commands below (passwd will prompt for the new password, e.g. pass123)
              
               $ adduser user01 -g hadoop
               $ passwd user01
              
             
    • Create home directory in HDFS
               
               $ su - hdfs
               $ hadoop fs -mkdir /user/user01
               $ hadoop fs -chown -R user01:hadoop /user/user01
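
    The same basic file operations can also be performed over HTTP through the WebHDFS REST API, which is enabled by default. A short curl sketch is shown below; the NameNode host is a placeholder, and 50070 is the default NameNode web port on Hadoop 2.x:
              
               # List /user/input (equivalent to hadoop fs -ls)
               $ curl -i "http://<namenode-host>:50070/webhdfs/v1/user/input?op=LISTSTATUS&user.name=yava"
               # Read sample_data.txt (equivalent to hadoop fs -cat); -L follows the redirect to a DataNode
               $ curl -i -L "http://<namenode-host>:50070/webhdfs/v1/user/input/sample_data.txt?op=OPEN&user.name=yava"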
              
             

3.3.Compile and Execute MapReduce Program

This section demonstrates some basic MapReduce commands using the most widely used example for learning MapReduce: the WordCount application. WordCount is a simple application that counts the number of occurrences of each word in a file.

Source code WordCount.java

 
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable>{

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

 

Compiling WordCount.java

  1. Copy the file WordCount.java to /home/yava/wordcount
     
    $ mkdir -p /home/yava/wordcount 
    $ cp WordCount.java /home/yava/wordcount 
     
    
  2. Compile the source code
     
    $ javac -classpath $(hadoop classpath) /home/yava/wordcount/WordCount.java
    $ cd /home/yava/wordcount
    $ jar -cvf wordcount.jar *.class
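     
    Alternatively, the program can be compiled the way the Apache Hadoop MapReduce tutorial does it, using the hadoop command as a javac wrapper. This sketch assumes a full JDK is installed and JAVA_HOME points to it:
     
    $ export HADOOP_CLASSPATH=${JAVA_HOME}/lib/tools.jar
    $ cd /home/yava/wordcount
    $ hadoop com.sun.tools.javac.Main WordCount.java
    $ jar cf wordcount.jar WordCount*.class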
     
    

Running WordCount program

  1. Download the input file from here and save it as /home/yava/wordcount/input.txt
  2. Copy the input file to /user/yava/hadoop/input in HDFS
     
    $ hdfs dfs -mkdir -p /user/yava/hadoop/input
    $ hdfs dfs -copyFromLocal /home/yava/wordcount/input.txt /user/yava/hadoop/input
     
    
  3. Verify the input file in HDFS
     
    $ hdfs dfs -ls /user/yava/hadoop/input 
     
    
  4. Run the WordCount program and put the result in /user/yava/hadoop/output
     
    $ yarn jar wordcount.jar WordCount /user/yava/hadoop/input /user/yava/hadoop/output
     
    
  5. View the result
     
    $ hdfs dfs -ls /user/yava/hadoop/output
    $ hdfs dfs -cat /user/yava/hadoop/output/part-r-00000
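     
    Note that MapReduce will not overwrite an existing output directory, so to re-run the job remove the old output first:
     
    $ hdfs dfs -rm -r /user/yava/hadoop/output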
     
    

3.4.Explore The Data With HIVE

In this section we introduce simple usage of Hive. The data to be used is the output of the WordCount program, which is in tab-separated format and consists of two columns: the word and the word count.

  1. Connect to Hive by using Hive CLI
     
       $ hive
     
    
  2. Create database and use database
     
      > create database wordcount;
      > use wordcount;
     
    

Use WordCount program result as input data (refer to Compile and Execute MapReduce Program)

  1. Create table Wordcount
     
    > create table wordcount(
    > kata string,
    > jumlah_kata int)
    > row format delimited fields terminated by '\t';
     
    
  2. View table
     
    > show tables;
    > describe wordcount;
     
    
  3. Load input data to wordcount table
     
    > load data inpath '/user/yava/hadoop/output/part-r-00000' into table wordcount;
     
    
  4. View data in wordcount table
     
    > select * from wordcount;
     
    
  5. Some basic queries
     
    > select count(*) from wordcount;
    > select * from wordcount order by jumlah_kata;
    > select * from wordcount where jumlah_kata > 5;
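     
    The same queries can also be run non-interactively from the operating system shell with hive -e (or from a script file with hive -f), which is convenient for automation; for example:
     
    $ hive -e 'use wordcount; select * from wordcount order by jumlah_kata desc limit 10;'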
     
    

3.5.Working With HBase Shell

Here is an example of simple usage and an introduction to the HBase shell commands. The input data used is the result of the WordCount program from the MapReduce section. (Note that if you already loaded that file into Hive with LOAD DATA INPATH in the previous section, it was moved into the Hive warehouse, so you may need to produce it again by re-running the WordCount job.)

  1. Connect to HBase by using HBase interactive shell
     
    $ hbase shell
     
    
  2. Create table to save WordCount result from mapreduce section
     
    > create 'wordcount', 'kata', 'jumlah'
     
    
  3. Run a bulk load to upload the WordCount result data into the table. The word becomes the row key and the count is stored in the jumlah column family; ImportTsv's default field separator is already the tab character, so it does not need to be set
     
    $ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,jumlah:number wordcount /user/yava/hadoop/output/part-r-00000
     
    
  4. View the data in the table
     
    > scan 'wordcount'
     
    
  5. List the tables
     
    > list
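     
    A few other commonly used HBase shell commands can be tried on the same table; the row key 'hadoop' below is only an example word that may or may not appear in your data:
     
    > describe 'wordcount'          # show the table's column families
    > count 'wordcount'             # count the rows (one row per distinct word)
    > get 'wordcount', 'hadoop'     # fetch a single row by its row key
    > disable 'wordcount'           # a table must be disabled before it can be dropped
    > drop 'wordcount'              # permanently remove the table when you are finished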
     
    

3.6.HGrid247 Basic Data Processing

  • Create Workflow
    Create Workflow

    Figure 9. Create Workflow

  • Log in first as the hadoop user (hdfs)
    
    su - hdfs
     
  • Make sure you have the jar file to run.
  • Copy the jar file and the input file to the YAVA sandbox with these commands:
    
    cp SCHospital.jar /home/hdfs                        # jar file
    cp Hospital_General_Information.csv /home/hdfs      # input file
     
  • Copy the input file from the local file system into /user/yava in HDFS, with the command:
    
    hadoop fs -put Hospital_General_Information.csv /user/yava/
     
  • Then, run the jar file on the Hadoop machine with this command:
    
    hadoop jar <jar file> <main class> <input path> <output path>
     

    example:

    
    hadoop jar SCHospital.jar SCHospital.CountHospital /user/yava/Hospital_General_Information.csv /user/yava/output
    
    Run the Jar File

    Figure 10. Run the Jar File

  • After the process is done, open the output directory to check whether the output file exists with this command:
    
    hadoop fs -ls /user/yava/output
     
    Check Output File

    Figure 11. Check Output File

    If it appears as in the picture above, the output file exists. But if there is only a _SUCCESS file, the job produced no output data.

  • HGrid247 changes the default output file name from part-XXXXX to hgrid247-XXXXX. To check whether the content matches the filter, use this command:
    
    hadoop fs -cat /user/yava/output/h*
    
    Results of the output file

    Figure 12. Results of the output file

  • So, this is how to run HGrid247 on YAVA.
YAVA - BIG DATA SOLUTION WITHIN YOUR REACH

© 2017 Labs247. All rights reserved.
YAVA logo and HGrid247 logo are registered trademarks or trademarks of the Labs247 Company.
HADOOP, the Hadoop Elephant Logo, Apache, Flume, Ambari, Yarn, Bigtop, Phoenix, Hive, Tez, Oozie, HBase, Mahout, Pig, Solr, Storm, Spark, Sqoop, Impala, and ZooKeeper are registered trademarks or trademarks of the Apache Software Foundation.