Saturday, August 24, 2013

What is Hadoop?


Hadoop 101
  1. Distributed System
  2. Open Source
  3. Consists of two primary components, which form the core of what Hadoop can do:
    1. HDFS: Distributed file system modeled after GFS (Google's File System); a small read/write sketch follows this list.
    2. MapReduce: Distributed batch processing modeled after Google's MapReduce; a word-count sketch follows the HDFS example below.
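
To make HDFS concrete, here is a minimal sketch of writing and reading a file through the HDFS Java API (org.apache.hadoop.fs.FileSystem). The /user/demo/hello.txt path is just an illustrative placeholder, and the example assumes a core-site.xml pointing at your cluster is on the classpath.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsExample {
    public static void main(String[] args) throws Exception {
        // Picks up the cluster address from core-site.xml on the classpath
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write a small file; HDFS splits large files into blocks and
        // replicates each block across several DataNodes
        Path path = new Path("/user/demo/hello.txt"); // hypothetical path
        FSDataOutputStream out = fs.create(path, true);
        out.writeUTF("hello from hdfs");
        out.close();

        // Read it back through the same FileSystem abstraction
        FSDataInputStream in = fs.open(path);
        System.out.println(in.readUTF());
        in.close();
    }
}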
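
For MapReduce, the classic word-count job gives a feel for the programming model: a Mapper emits (word, 1) pairs, the framework groups them by key, and a Reducer sums the counts. This is a sketch using the org.apache.hadoop.mapreduce API; input and output HDFS paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: emit (word, 1) for every word in the input split
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable ONE = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: sum the counts for each word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        // Input and output are HDFS paths passed on the command line
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

You typically package this as a jar and submit it with something like: hadoop jar wordcount.jar WordCount /input /output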
Hadoop's Wider Ecosystem
  1. HBase - A column-oriented database (a data store for structured data) modeled after Google's BigTable; see the client sketch after this list.
  2. ZooKeeper - A distributed coordination and locking service modeled after Google's Chubby lock service, used for maintaining configuration and distributed synchronization.
  3. Hive - A SQL-like language on top of Hadoop; it provides a SQL-style interface for querying data stored in Hadoop (see the JDBC sketch after this list).
  4. Cascading - A DSL meant to make it easier to work with data inside Hadoop; it is a framework for creating data-processing workflows on top of MapReduce.
  5. Pig - Another DSL with the same goal of making it easier to work with Hadoop; it is a high-level language (Pig Latin) for creating MapReduce programs.
  6. Flume - Useful for moving log data into Hadoop.
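
To show what "column-oriented" looks like in practice, here is a quick sketch using the HBase Java client API from the era of this post (HTable/Put/Get): values live in column families and are addressed by row key. The "users" table and "info" column family are hypothetical and assumed to already exist.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.util.Bytes;

public class HBaseExample {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath to find the cluster
        Configuration conf = HBaseConfiguration.create();
        // Hypothetical "users" table with an "info" column family
        HTable table = new HTable(conf, "users");

        // Write one cell: row key "user1", column info:email
        Put put = new Put(Bytes.toBytes("user1"));
        put.add(Bytes.toBytes("info"), Bytes.toBytes("email"),
                Bytes.toBytes("user1@example.com"));
        table.put(put);

        // Read it back by row key
        Get get = new Get(Bytes.toBytes("user1"));
        Result result = table.get(get);
        byte[] email = result.getValue(Bytes.toBytes("info"), Bytes.toBytes("email"));
        System.out.println(Bytes.toString(email));

        table.close();
    }
}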
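
And since Hive exposes itself through HiveServer2, you can query it over JDBC much like any other SQL database; behind the scenes the query is compiled into MapReduce jobs. A minimal sketch, assuming a HiveServer2 instance on localhost:10000 and a hypothetical page_views table:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryExample {
    public static void main(String[] args) throws Exception {
        // HiveServer2 JDBC driver; the hostname, port, and table are assumptions
        Class.forName("org.apache.hive.jdbc.HiveDriver");
        Connection conn = DriverManager.getConnection(
                "jdbc:hive2://localhost:10000/default", "", "");
        Statement stmt = conn.createStatement();

        // A SQL-like query that Hive turns into MapReduce jobs under the hood
        ResultSet rs = stmt.executeQuery(
                "SELECT url, COUNT(*) AS hits FROM page_views GROUP BY url");
        while (rs.next()) {
            System.out.println(rs.getString("url") + "\t" + rs.getLong("hits"));
        }

        rs.close();
        stmt.close();
        conn.close();
    }
}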

