Coffing Data Warehousing Software

Hadoop – Hive, Impala and ZooKeeper for Wells Fargo

Tera-Tom here!

Being approved on the Universal Wells Fargo software list is extremely rare and special, and our October Nexus release promises to give Wells Fargo a Universal advantage over all competitors. We are announcing Nexus Version 12, with integration between Hadoop’s Hive, Impala, and ZooKeeper.

Let’s first discuss the differences between Hive and Impala.

Hive was incubated at Facebook and given to the Apache Software Foundation. It represents the earliest solution on Hadoop to work with SQL. It is written in Java, and Hive SQL is translated under the hood into MapReduce jobs. This makes it an excellent batch processing solution, but the high latency of MapReduce means slower queries.

Cloudera produced Impala and provided it to the Apache Software Foundation. It is designed like MPP platforms and built for speed. That is why Cloudera wrote Impala in C++ and designed it to use more memory. These design choices answer the need for speed, and queries can run 5% to 80% faster than on Hive.

Many companies take a hybrid approach to Hadoop by using Hive and Impala together. You see, anytime you create a table in Hive or Impala, its definition is stored in the Hive Metastore and its data in the Hadoop Distributed File System (HDFS). So, in a hybrid system, a table created with either engine can be queried with Hive or Impala, because both share the same metastore and the underlying data is in HDFS. You can therefore pick the engine per query, based upon your need for a long-running batch query or a lightning-speed interactive query.
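
The hybrid idea above can be sketched as a toy model in Python. This is only an illustration, not how Hive, Impala, or Nexus are implemented: the `SharedCatalog` and `Engine` classes, the `accounts` table, and the warehouse path are all hypothetical names, and plain dictionaries stand in for the Hive Metastore and HDFS.

```python
from dataclasses import dataclass, field


@dataclass
class SharedCatalog:
    """Toy stand-ins (assumptions): dicts instead of the real Metastore and HDFS."""
    metastore: dict = field(default_factory=dict)  # table name -> (columns, path)
    hdfs: dict = field(default_factory=dict)       # path -> list of row tuples


class Engine:
    """Hypothetical query engine that reads and writes through the shared catalog."""

    def __init__(self, name, catalog):
        self.name = name
        self.catalog = catalog

    def create_table(self, table, columns, rows):
        # The table definition goes to the metastore, the data to "HDFS".
        path = f"/user/hive/warehouse/{table}"  # illustrative path only
        self.catalog.metastore[table] = (columns, path)
        self.catalog.hdfs[path] = list(rows)

    def select_all(self, table):
        # Any engine can resolve the table because the catalog is shared.
        columns, path = self.catalog.metastore[table]
        return [dict(zip(columns, row)) for row in self.catalog.hdfs[path]]


catalog = SharedCatalog()
hive = Engine("hive", catalog)
impala = Engine("impala", catalog)

# Create the table through one engine, then query it through both.
hive.create_table("accounts", ["id", "balance"], [(1, 250.0), (2, 75.5)])
rows_from_impala = impala.select_all("accounts")
rows_from_hive = hive.select_all("accounts")
print(rows_from_impala)
```

In a real cluster the second engine may first need a metadata refresh (for example, Impala's INVALIDATE METADATA statement) before it sees a table created through Hive.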

ZooKeeper is the Apache project that provides a centralized coordination service for large Hadoop clusters of commodity servers, enabling synchronization across the whole cluster. Distributed applications require coordination services, such as naming services that allow one node to find a specific server in a cluster of thousands of servers, or serialized updates.
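
To make the naming-service idea concrete, here is a minimal in-memory sketch in Python. It is not the ZooKeeper API and needs no running cluster: the `ToyRegistry` class, the `/services/...` path layout, and the node addresses are hypothetical, chosen only to show how servers could register under a hierarchical name and be found in a serialized order.

```python
import itertools


class ToyRegistry:
    """In-memory stand-in for a coordination service (assumption, not ZooKeeper)."""

    def __init__(self):
        self._nodes = {}                   # path -> server address
        self._counter = itertools.count()  # one global sequence serializes updates

    def register(self, service, address):
        # Each registration gets an increasing, zero-padded sequence number,
        # so sorting the paths recovers the order in which servers joined.
        seq = next(self._counter)
        path = f"/services/{service}/member-{seq:010d}"
        self._nodes[path] = address
        return path

    def lookup(self, service):
        # Return every address registered under the service, in join order.
        prefix = f"/services/{service}/"
        return [addr for path, addr in sorted(self._nodes.items())
                if path.startswith(prefix)]


registry = ToyRegistry()
first = registry.register("impalad", "node-0017:21000")
second = registry.register("impalad", "node-0942:21000")
registry.register("hiveserver2", "node-0003:10000")

impala_nodes = registry.lookup("impalad")
print(impala_nodes)
```

A client that only knows the service name can call `lookup` to find live servers, which is the role the naming service plays in a cluster of thousands of nodes.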

We have been working with Wells Fargo for five years, and this past year we have worked to provide the perfect integration between all Hadoop systems and all traditional Wells Fargo systems, such as DB2, Oracle, SQL Server and Teradata. Be prepared to be amazed as you read on!

Nexus has developed four different Nexus foundations for Wells Fargo so everyone can work and collaborate.

  • Nexus converts table structures and data types automatically and moves a single table or an entire database between all Wells Fargo systems using a wide variety of strategies (QueryGrid, TPT, Bulk Copy, Sqoop, etc.).
  • Nexus shows tables/views visually and their relationships with other tables/views and builds the SQL automatically as users point-and-click. Nexus is considered the greatest visualization tool ever built because it also does cross-system joins between all Wells Fargo systems.
  • Nexus takes every answer set and places it in the Garden of Analysis where the user can join answer sets or re-query them with point-and-click templates to get analytics, graphs and charts and additional reports that are processed inside the user’s PC. No other tool can do this anywhere.
  • Nexus has BizStar, which allows users to receive reports, Excel spreadsheets, Word documents, videos and other unstructured data so other team members or the IT staff can share information. BizStar also has a series of menus that allow business users to run queries with a single click of a button. BizStar even has the Multi-Step Process Builder so users can perform a wide variety of repetitive tasks on data and automate the entire process from start to finish.

Imagine doing a cross-system join between any combination of Teradata, Aster Data, Oracle, SQL Server, DB2, Hive, Impala and even Excel in a single query that has been built automatically, merely by pointing at the tables and views needed and selecting the columns desired for the report.

Now imagine taking that answer set and using the Garden of Analysis to generate another 50 reports, plus graphs and charts in minutes and then sharing these with hundreds or even thousands of team members in their BizStar menus.

Now imagine setting up this entire process in the BizStar Multi-Step Process Builder so it is automated and can be scheduled to run.

I will be setting up multiple demos in September, or you can request a demo for just yourself or with your team. Thanks again for your support.


Tom Coffing
CEO, Coffing Data Warehousing
Phone: 513 300-0341