Hadoop 101
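- A distributed system for storing and processing large data sets across clusters of commodity machines
- Open source (an Apache Software Foundation project)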
- Consists of two primary components, which form the core of what Hadoop can do:
  - HDFS (Hadoop Distributed File System): a distributed file system modeled after GFS (the Google File System).
  - MapReduce: a distributed batch-processing framework modeled after Google's MapReduce.
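A MapReduce job runs in three steps: a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that aggregates each group. The sketch below is a plain, single-process Java simulation of the classic word-count example, meant only to illustrate that data flow. It does not use the Hadoop API (a real job would subclass `org.apache.hadoop.mapreduce.Mapper` and `Reducer` and run across a cluster), and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory simulation of MapReduce word count (not the Hadoop API).
public class WordCountSketch {

    // A key/value pair emitted by the map phase.
    record Pair(String key, int value) {}

    // Map phase: emit (word, 1) for every word in one input line.
    static List<Pair> map(String line) {
        List<Pair> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                out.add(new Pair(word, 1));
            }
        }
        return out;
    }

    // Shuffle phase: group all emitted values by key, sorted by key.
    static Map<String, List<Integer>> shuffle(List<Pair> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Pair p : pairs) {
            grouped.computeIfAbsent(p.key(), k -> new ArrayList<>()).add(p.value());
        }
        return grouped;
    }

    // Reduce phase: sum the values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Runs map -> shuffle -> reduce sequentially over all input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Pair> intermediate = new ArrayList<>();
        for (String line : lines) {
            intermediate.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : shuffle(intermediate).entrySet()) {
            result.put(e.getKey(), reduce(e.getKey(), e.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> input = List.of("the quick brown fox", "the lazy dog", "The fox");
        // prints {brown=1, dog=1, fox=2, lazy=1, quick=1, the=3}
        System.out.println(wordCount(input));
    }
}
```

In real Hadoop the map calls run in parallel on the nodes holding each HDFS block, and the shuffle moves intermediate pairs over the network to the reducers; here everything runs in one loop purely to show the shape of the computation.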
Hadoop's Wider Ecosystem
- HBase - A column-oriented database (a data store for structured data) modeled after Google's Bigtable.
- ZooKeeper - A distributed locking and coordination service modeled after Google's Chubby lock service; it maintains configuration information and provides distributed synchronization.
- Hive - Provides a SQL-like query language (HiveQL) on top of Hadoop for querying data stored in Hadoop.
- Cascading - A DSL meant to make it easier to work with data inside Hadoop; it is a framework for building data-processing workflows.
- Pig - Another DSL with the same goal of making Hadoop easier to work with: a high-level language (Pig Latin) whose scripts are compiled into MapReduce jobs.
- Flume - A service for collecting log data and moving it into Hadoop.