Saturday, August 24, 2013

What is Hadoop?


Hadoop 101
  1. Distributed System
  2. Open Source
  3. Consists of two primary components, which form the core of what Hadoop can do:
    1. HDFS: Distributed file system modeled after GFS (Google's File System); a small read/write sketch follows this list.
    2. MapReduce: Distributed batch processing modeled after Google's MapReduce; a word-count sketch follows the HDFS example below.
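
To make HDFS concrete, here is a minimal sketch of writing and reading a file through the HDFS Java API (org.apache.hadoop.fs.FileSystem). The /user/demo/hello.txt path is just an illustrative placeholder, and the example assumes a core-site.xml pointing at your cluster is on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Picks up the cluster address from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file; HDFS splits large files into blocks and
        // replicates each block across several DataNodes
        Path path = new Path("/user/demo/hello.txt"); // hypothetical path
        FSDataOutputStream out = fs.create(path, true);
        out.writeUTF("hello from hdfs");
        out.close();

        // Read it back through the same FileSystem abstraction
        FSDataInputStream in = fs.open(path);
        System.out.println(in.readUTF());
        in.close();
    }
}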
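
For MapReduce, the classic word-count job gives a feel for the programming model: a Mapper emits (word, 1) pairs, the framework groups them by key, and a Reducer sums the counts. This is a sketch using the org.apache.hadoop.mapreduce API; input and output HDFS paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in the input split
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output are HDFS paths passed on the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You typically package this as a jar and submit it with something like: hadoop jar wordcount.jar WordCount /input /output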
Hadoop's Wider Ecosystem
  1. HBase - A column-oriented database (a data store for structured data) modeled after Google's BigTable; see the client sketch after this list.
  2. ZooKeeper - A distributed coordination and locking service modeled after Google's Chubby lock service, used for maintaining configuration and distributed synchronization.
  3. Hive - A SQL-like language on top of Hadoop; it provides a SQL-style interface for querying data stored in Hadoop (see the JDBC sketch after this list).
  4. Cascading - A DSL meant to make it easier to work with data inside Hadoop; it is a framework for creating data-processing workflows on top of MapReduce.
  5. Pig - Another DSL with the same goal of making it easier to work with Hadoop; it is a high-level language (Pig Latin) for creating MapReduce programs.
  6. Flume - Useful for moving log data into Hadoop.
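
To show what "column-oriented" looks like in practice, here is a quick sketch using the HBase Java client API from the era of this post (HTable/Put/Get): values live in column families and are addressed by row key. The "users" table and "info" column family are hypothetical and assumed to already exist.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath to find the cluster
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical "users" table with an "info" column family
        HTable table = new HTable(conf, "users");

        // Write one cell: row key "user1", column info:email
        Put put = new Put(Bytes.toBytes("user1"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("email"),
                Bytes.toBytes("user1@example.com"));
        table.put(put);

        // Read it back by row key
        Get get = new Get(Bytes.toBytes("user1"));
        Result result = table.get(get);
        byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
        System.out.println(Bytes.toString(email));

        table.close();
    }
}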
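
And since Hive exposes itself through HiveServer2, you can query it over JDBC much like any other SQL database; behind the scenes the query is compiled into MapReduce jobs. A minimal sketch, assuming a HiveServer2 instance on localhost:10000 and a hypothetical page_views table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; the hostname, port, and table are assumptions
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
        Statement stmt = conn.createStatement();

        // A SQL-like query that Hive turns into MapReduce jobs under the hood
        ResultSet rs = stmt.executeQuery(
                "SELECT url, COUNT(*) AS hits FROM page_views GROUP BY url");
        while (rs.next()) {
            System.out.println(rs.getString("url") + "\t" + rs.getLong("hits"));
        }

        rs.close();
        stmt.close();
        conn.close();
    }
}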

