Data Warehouse Offloading

One of our clients, a mass-transit electric train service provider, is committed to improving its service quality. One key aspect of that improvement is embracing data-driven operation and management. To build a strong data-driven culture, the client needs to establish and maintain a reliable, integrated data management system. One outcome of this effort is a data warehouse developed on a big data platform.

Big data is a term used to describe large and complex data processing and data management workloads that conventional systems are unable to cope with. It is largely based on open platforms with linear scalability, which avoids vendor lock-in and offers a broader range of choices and more cost-effective alternatives.

Existing challenges to deal with:

  1. Heavy load on the data warehouse
  2. Unscalable architecture
  3. Unreliable data source from mediation
  4. Unable to provide up-to-date reports
  5. High cost of hardware acquisition and maintenance

Methods to address the above points:

  1. Use commodity hardware
  2. Move ETL processes into Hadoop
  3. Keep raw & detail data
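
As an illustration of moving the extract step into Hadoop while keeping the raw data, the following minimal sketch builds a Sqoop import command that lands a source table in HDFS as Parquet files. It is not the client's actual job: the JDBC URL, user name, password file, table name, and target directory are hypothetical placeholders, and the helper name `build_sqoop_import` is introduced here only for illustration.

```python
import shlex


def build_sqoop_import(jdbc_url, username, password_file, table, target_dir, mappers=4):
    """Return the argument list for a Sqoop import into a raw HDFS directory.

    All argument values are supplied by the caller; nothing here is taken
    from the client's real environment.
    """
    return [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC URL of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table to copy
        "--target-dir", target_dir,        # raw landing directory in HDFS
        "--as-parquetfile",                # store the raw copy as Parquet
        "--num-mappers", str(mappers),     # number of parallel import tasks
    ]


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cmd = build_sqoop_import(
        jdbc_url="jdbc:mysql://warehouse-db.example.com/ticketing",
        username="etl_user",
        password_file="/user/etl_user/.db_password",
        table="trip_events",
        target_dir="/data/raw/trip_events/2017-01-01",
    )
    print(shlex.join(cmd))
```

Landing each load in its own dated directory is one possible way to keep the raw and detail data available for later reprocessing; downstream transformations could then run inside the cluster, for example as Hive queries.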

Yava and HGrid247 were chosen as the platform and the tool, respectively, for the following advantages:

1. Scalable architecture

The big data architecture is based on Hadoop, which is scalable by design: we can start with a small cluster and add more nodes as requirements grow.
The implementation can also proceed gradually, starting with training, a software trial, and a simple project, and then moving data management and analysis into the big data environment step by step, including the spatial data side.
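
To make the "start small, add nodes later" idea concrete, here is a rough sizing sketch. It assumes HDFS's default replication factor of 3; the usable disk per node, the free-space headroom, and the data volumes are hypothetical numbers chosen for illustration, not figures from this project.

```python
import math


def nodes_needed(raw_tb, usable_tb_per_node, replication=3, headroom=0.25):
    """Estimate how many data nodes are needed to hold `raw_tb` of data.

    replication: copies HDFS keeps of each block (3 is the HDFS default).
    headroom: fraction of each node's disk kept free (an assumption).
    """
    stored_tb = raw_tb * replication
    effective_tb_per_node = usable_tb_per_node * (1 - headroom)
    return math.ceil(stored_tb / effective_tb_per_node)


if __name__ == "__main__":
    # Hypothetical growth path: 40 TB now, 80 TB later, 24 TB usable per node.
    for raw_tb in (40, 80):
        print(raw_tb, "TB ->", nodes_needed(raw_tb, usable_tb_per_node=24), "nodes")
```

Under these assumptions, doubling the data volume roughly doubles the node count, which is what linear scalability means in practice: capacity is added by adding nodes rather than by replacing the system.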

2. Longer data retention

Data is stored and managed in a structured, systematic, and compressed format, so both source and output data can be kept for a longer retention period.
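
The effect of compression on retention can be sketched with a small experiment. The example below uses Python's standard `gzip` module as a stand-in; it does not reproduce the storage format or codec used by the platform, and the record layout (station, time, passenger count) is invented for illustration.

```python
import gzip


def sample_records(n):
    """Build n synthetic CSV rows (hypothetical schema, illustration only)."""
    lines = ["station_id,event_time,passengers"]
    for i in range(n):
        lines.append(f"ST{i % 20:03d},2017-01-01T{i % 24:02d}:00:00,{(i * 7) % 300}")
    return "\n".join(lines).encode("utf-8")


def compression_ratio(payload):
    """Return raw size divided by gzip-compressed size for `payload`."""
    compressed = gzip.compress(payload)
    # Compression is lossless: the original bytes are fully recoverable.
    assert gzip.decompress(compressed) == payload
    return len(payload) / len(compressed)


if __name__ == "__main__":
    ratio = compression_ratio(sample_records(10_000))
    print(f"compression ratio: {ratio:.1f}x")
```

With a fixed storage budget, a ratio of N means roughly N times as many days of data fit on the same disks, which is where the longer retention period comes from. Actual ratios depend heavily on the data and the codec.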

3. Best performance at minimum cost

Because the data is held in a structured, systematic, and compressed format, reports and data traces can be produced more quickly and concisely than with a typical license-based application, which tends to be slow to access and incurs recurring license costs.

4. Long implementation experience and total support

Yava is built on Apache Hadoop and its supporting ecosystem, which has wide community support and many adopters. Labs247’s strong experience and support in big data and Hadoop, combined with the open-source community, ensure a successful Yava implementation and total support for the delivered solutions.

Technology Used

Yava, HGrid247, Sqoop, Hive, HBase, Impala, Phoenix, and PHP.


© 2017 Labs247. All rights reserved.
YAVA logo and HGrid247 logo are registered trademarks or trademarks of the Labs247 Company.
HADOOP, the Hadoop Elephant Logo, Apache, Flume, Ambari, Yarn, Bigtop, Phoenix, Hive, Tez, Oozie, HBase, Mahout, Pig, Solr, Storm, Spark, Sqoop, Impala, and ZooKeeper are registered trademarks or trademarks of the Apache Software Foundation.