Course Outline:
What is Big Data & Why Hadoop?
Hadoop Overview & Its Ecosystem
HDFS – Hadoop Distributed File System
MapReduce Anatomy
Developing MapReduce Programs
Advanced MapReduce Concepts
Advanced MapReduce Algorithms
Advanced Tips & Techniques
Monitoring & Management of Hadoop
Using Hive & Pig (Advanced)
HBase
NoSQL
Sqoop
Deploying Hadoop on Cloud
Hadoop Best Practices and Use Cases
Course Contents:
1. Big Data
The problem space and example applications
Why don’t traditional approaches scale?
Requirements
2. Hadoop Background
Hadoop History
The ecosystem and stack: HDFS, MapReduce, Hive, Pig…
Cluster architecture overview
3. Development Environment
Hadoop distribution and basic commands
Eclipse development
4. HDFS Introduction
The HDFS command line and web interfaces
The HDFS Java API (lab)
5. MapReduce Introduction
Key philosophy: move computation, not data
Core concepts: mappers, reducers, drivers
The MapReduce Java API (lab)
6. Real-World MapReduce
Optimizing with Combiners and Partitioners (lab)
More common algorithms: sorting, indexing and searching (lab)
Relational manipulation: map-side and reduce-side joins (lab)
Chaining Jobs
Testing with MRUnit
7. Higher-level Tools