YAVA Ambari ver 2.2 - Getting Started
1.Introduction
YAVA Data Management Platform, commonly referred to as YAVA, is a Hadoop distribution for big data management that includes Hadoop cluster management and monitoring. YAVA is one of the Hadoop distros developed and supported by our experts at Labs247.
This section describes the information and tools used to install the YAVA Sandbox using VirtualBox, and provides a tutorial on how to use the YAVA Sandbox. It also contains a tutorial on how to run HGrid247 on YAVA, and some basic tutorials on how to use MapReduce, Hive, and HBase, including creating a user and creating, deleting, or editing a database.
2.Sandbox Installation
2.1.Prerequisites
- YAVA Sandbox image, which can be downloaded from: yava.labs247.id/download_box
- VirtualBox 5.1.x
- 8 GB RAM
- 20 GB Hard disk storage
2.2.YAVA Sandbox Installation
To install YAVA SandBox in VirtualBox, complete the following steps:
- In the VirtualBox Manager, click the New button to create a new virtual machine.
Figure 1. Create New
- Set your virtual machine name and select Linux as the OS type. Choose Red Hat (64-bit) as the OS version for the YAVA SandBox.
Figure 2. Input Name, Select Type and Version
- Set the amount of memory for the VM; the prerequisites call for at least 8 GB
Figure 3. Set Memory
- Now you will be prompted to create a virtual hard disk. This is an important step in running your .vmdk file. Select the option ‘Use existing hard disk’ and click the ‘Choose a virtual hard disk file’ button.
Figure 4. Import File VMDK
- In the file selection window that opens, browse to and select the .vmdk file you wish to open.
Figure 5. Select File VMDK
- Now the ‘Use existing hard disk’ option will show the .vmdk file you selected. Click Create.
Figure 6. Import File Finish
- Now your VirtualBox Manager will have the new virtual machine listed. Click ‘Start’ to run the VM.
Figure 7. Running YAVA SandBox
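The same setup can also be scripted with the VBoxManage command-line tool that ships with VirtualBox. Below is a minimal sketch; the VM name "YAVA-Sandbox" and the .vmdk path are assumed example values, not names required by YAVA:
$ VBoxManage createvm --name "YAVA-Sandbox" --ostype RedHat_64 --register
$ VBoxManage modifyvm "YAVA-Sandbox" --memory 8192
$ VBoxManage storagectl "YAVA-Sandbox" --name "SATA" --add sata
$ VBoxManage storageattach "YAVA-Sandbox" --storagectl "SATA" --port 0 --device 0 --type hdd --medium /path/to/yava-sandbox.vmdk
$ VBoxManage startvm "YAVA-Sandbox"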
3.YAVA Tutorial
3.1.Startup Or Shutdown Service
Basic instructions to start up or shut down YAVA Data Management Platform services:
- Open a web browser and access Ambari at <hostname>:8080 or <ip-address>:8080
- Log into Ambari by using the username admin with the default password admin
- Start the available services; for example, select the HDFS service and choose Start under Service Actions

Figure 8. Startup Service
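Services can also be started or stopped through the Ambari REST API. Below is a minimal sketch that starts HDFS, assuming Ambari is reachable on localhost:8080 and the cluster is named yava (check the actual cluster name in the Ambari UI); to stop the service, send the state INSTALLED instead of STARTED:
$ curl -u admin:admin -H "X-Requested-By: ambari" -X PUT -d '{"RequestInfo":{"context":"Start HDFS via REST"},"Body":{"ServiceInfo":{"state":"STARTED"}}}' http://localhost:8080/api/v1/clusters/yava/services/HDFS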
3.2.HDFS Basic Commands
Below are some basic HDFS commands:
- Inserting Data into HDFS
- Download the sample data file sample_data.txt and save it in the /home/yava directory
- Create an input directory in HDFS
$ hadoop fs -mkdir /user/input
- Transfer and store the data file from the local file system to the Hadoop file system using the put command.
$ hadoop fs -put /home/yava/sample_data.txt /user/input
- You can verify the file using the ls command.
$ hadoop fs -ls /user/input
- Retrieving Data from HDFS
- View the sample_data.txt file in HDFS using the cat command.
$ hadoop fs -cat /user/input/sample_data.txt
- Download the file from HDFS to the local file system using the get command.
$ hadoop fs -get /user/input/ /home/yava/
- View the downloaded sample_data.txt file in the local file system using the cat command.
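A few other everyday HDFS commands, for reference (the paths reuse the ones from the steps above; the copy name sample_data_copy.txt is just an example):
$ hadoop fs -du -h /user/input
$ hadoop fs -cp /user/input/sample_data.txt /user/input/sample_data_copy.txt
$ hadoop fs -rm /user/input/sample_data_copy.txt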
- Create New User in HDFS
- Create a new OS user, user01, and add it to the hadoop group (set its password when prompted, e.g. pass123). Use root to execute the commands below
$ adduser user01 -g hadoop
$ passwd user01
- Create a home directory for the new user in HDFS and make user01 its owner
$ su - hdfs
$ hadoop fs -mkdir /user/user01
$ hadoop fs -chown -R user01:hadoop /user/user01
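As a quick check that the new account works (a sketch, assuming the steps above completed), switch to user01 and write a file into its HDFS home directory:
$ su - user01
$ hadoop fs -put /etc/hosts /user/user01/
$ hadoop fs -ls /user/user01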
3.3.Compile and Execute MapReduce Program
This section demonstrates some basic MapReduce commands using the most widely used example for learning MapReduce: the WordCount application. WordCount is a simple application that counts the number of occurrences of each word in a file.
Source code WordCount.java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
public class WordCount {
public static class TokenizerMapper
    extends Mapper<Object, Text, Text, IntWritable> {
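    // The listing above is truncated; the remainder below follows the standard
    // Apache Hadoop WordCount example (mapper body, reducer, and driver).
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    // Emit (word, 1) for every token in the input line
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    // Sum the counts emitted for each word
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  // Driver: configure and submit the job; args[0] is the input path, args[1] the output path
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}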
Compiling WordCount.java
- Copy the file WordCount.java to /home/yava/wordcount
$ mkdir -p /home/yava/wordcount
$ cp WordCount.java /home/yava/wordcount
- Compile the source code
$ javac -classpath $(hadoop classpath) /home/yava/wordcount/WordCount.java
$ cd /home/yava/wordcount
$ jar -cvf wordcount.jar *.class
Running the WordCount program
- Download the sample input file input.txt and save it in /home/yava/wordcount
- Copy the input file to /user/yava/hadoop/input in HDFS
$ hdfs dfs -mkdir -p /user/yava/hadoop/input
$ hdfs dfs -copyFromLocal /home/yava/wordcount/input.txt /user/yava/hadoop/input
- Verify the file in HDFS
$ hdfs dfs -ls /user/yava/hadoop/input
- Run the WordCount program and put the result in /user/yava/hadoop/output
$ yarn jar wordcount.jar WordCount /user/yava/hadoop/input /user/yava/hadoop/output
- View the result
$ hdfs dfs -ls /user/yava/hadoop/output
$ hdfs dfs -cat /user/yava/hadoop/output/part-r-00000
3.4.Explore The Data With HIVE
In this section we introduce simple usage of Hive. The data to be used is the output of the WordCount program, which is tab-separated text consisting of two columns: the word and the word count.
- Connect to Hive by using the Hive CLI
$ hive
- Create database and use database
> create database wordcount;
> use wordcount;
Use the WordCount program result as input data (refer to Compile and Execute MapReduce Program)
- Create the wordcount table
> create table wordcount(
>   kata string,
>   jumlah_kata int)
> row format delimited fields terminated by '\t';
- View table
> show tables;
> describe wordcount;
- Load the input data into the wordcount table
> load data inpath '/user/yava/hadoop/output/part-r-00000' into table wordcount;
- View the data in the wordcount table
> select * from wordcount;
- Some basic queries
> select count(*) from wordcount;
> select * from wordcount order by jumlah_kata;
> select * from wordcount where jumlah_kata > 5;
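The same queries can also be run through Beeline, the JDBC client that newer Hive releases favor over the Hive CLI. Below is a minimal sketch, assuming HiveServer2 is running on the sandbox on its default port 10000 and that connecting as the yava user is acceptable:
$ beeline -u jdbc:hive2://localhost:10000 -n yava
> use wordcount;
> select * from wordcount where jumlah_kata > 5;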
3.5.Working With HBase Shell
Here is a short introduction to basic HBase shell commands. The input data used is the result of the WordCount program from the MapReduce section.
- Connect to HBase by using the HBase interactive shell
$ hbase shell
- Create a table to store the WordCount result from the MapReduce section
> create 'wordcount', 'kata', 'jumlah'
- Run a bulk load to upload the WordCount result data into the table (ImportTsv uses tab as its default separator; the first column of the data becomes the row key)
$ hbase org.apache.hadoop.hbase.mapreduce.ImportTsv -Dimporttsv.columns=HBASE_ROW_KEY,jumlah:number 'wordcount' /user/yava/hadoop/output/part-r-00000
- View tables
> list
- View the data in the table
> scan 'wordcount'
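A few more basic shell operations on the same table (a sketch; the row key hadoop and the value 7 are just assumed example values):
> put 'wordcount', 'hadoop', 'jumlah:number', '7'
> get 'wordcount', 'hadoop'
> count 'wordcount'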
3.6.HGrid247 Basic Data Processing
- Create Workflow
Figure 9. Create Workflow
- Log in first as the hdfs user
su - hdfs
- Make sure you have the jar file to run.
- Copy the jar file and the input file to the YAVA sandbox, with the commands:
cp SCHospital.jar /home/hdfs (jar file)
cp Hospital_General_Information.csv /home/hdfs (input file)
- Copy the input file from the local file system to /user/yava in HDFS, with the command:
hadoop fs -put Hospital_General_Information.csv /user/yava/
- Then, run the jar file on the HDFS machine with this command:
hadoop jar <jar-file> <main-class> <input-path> <output-path>
example:
hadoop jar SCHospital.jar SCHospital.CountHospital /user/yava/Hospital_General_Information.csv /user/yava/output
Figure 10. Run the Jar File
- After the process is done, open the output directory to check whether the output file exists, with this command:
hadoop fs -ls /user/yava/output
Figure 11. Check Output File
If the listing looks like the picture above, the output file exists. But if there is only the _SUCCESS file, the job did not produce the expected output.
- HGrid247 changes the default output file name from part-XXXXX to hgrid247-XXXXX. To check whether the content matches the filter, use this command:
hadoop fs -cat /user/yava/output/h*
Figure 12. Results of the output file
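To keep a local copy of the result, the output files can also be merged and downloaded from HDFS (a sketch; the local target path is just an example):
hadoop fs -getmerge /user/yava/output /home/hdfs/hospital_output.txt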
- This is how to run HGrid247 on YAVA.
© 2017 Labs247. All rights reserved.
YAVA logo and HGrid247 logo are registered trademarks or trademarks of the Labs247 Company.
HADOOP, the Hadoop Elephant Logo, Apache, Flume, Ambari, Yarn, Bigtop, Phoenix, Hive, Tez, Oozie, HBase, Mahout, Pig, Solr, Storm, Spark, Sqoop, Impala, and ZooKeeper are registered trademarks or trademarks of the Apache Software Foundation.