The Best College Academy of Our Small City


Hadoop Big Data

Course Outline:

What is Big Data & Why Hadoop?
Hadoop Overview & Its Ecosystem
HDFS – Hadoop Distributed File System
MapReduce Anatomy
Developing MapReduce Programs
Advanced MapReduce Concepts
Advanced MapReduce Algorithms
Advanced Tips & Techniques
Monitoring & Management of Hadoop
Using Hive & Pig (Advanced)
HBase
NoSQL
Sqoop
Deploying Hadoop on Cloud
Hadoop Best Practices and Use Cases

Course Contents:

1. Big Data

  • The problem space and example applications

  • Why don’t traditional approaches scale?

  • Requirements

2. Hadoop Background

  • Hadoop History

  • The ecosystem and stack: HDFS, MapReduce, Hive, Pig…

  • Cluster architecture overview

3. Development Environment

  • Hadoop distribution and basic commands

  • Eclipse development

4. HDFS Introduction

  • The HDFS command line and web interfaces

  • The HDFS Java API (lab)

5. MapReduce Introduction

  • Key philosophy: move computation, not data

  • Core concepts: Mappers, reducers, drivers

  • The MapReduce Java API (lab)
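
As a rough illustration of the mapper, reducer, and driver roles listed above, here is a minimal word-count sketch in plain Java. It deliberately does not use the Hadoop API: the class name WordCountSketch and its methods are hypothetical, and the in-memory grouping step only stands in for the shuffle that a real cluster performs between the map and reduce phases.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the three MapReduce roles (no Hadoop dependency).
class WordCountSketch {

    // Mapper role: turn one input line into (word, 1) pairs.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                out.add(Map.entry(word, 1));
            }
        }
        return out;
    }

    // Reducer role: combine all values collected for one key.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    // Driver role: run the map step, group pairs by key (a stand-in
    // for the shuffle), then run the reduce step once per key.
    static Map<String, Integer> run(List<String> lines) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> kv : map(line)) {
                grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((key, values) -> result.put(key, reduce(key, values)));
        return result;
    }

    public static void main(String[] args) {
        // Prints {computation=2, data=1, move=2, not=1, the=1}
        System.out.println(run(List.of("move computation not data",
                                       "move the computation")));
    }
}
```

In a real Hadoop job the map and reduce functions would run on the nodes that hold the data, which is the point of "move computation, not data"; the single-process driver here only shows how the pieces fit together.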

6. Real-World MapReduce

  • Optimizing with Combiners and Partitioners (lab)

  • More common algorithms: sorting, indexing and searching (lab)

  • Relational manipulation: map-side and reduce-side joins (lab)

  • Chaining Jobs

  • Testing with MRUnit
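
To make the combiner and partitioner topics above a little more concrete, the following plain-Java sketch shows the idea behind each. It is not the Hadoop API: the class CombinerPartitionerSketch and its method names are hypothetical. The partition function uses the common hash-then-modulo approach, and the combine function pre-sums word counts for one map task so that fewer pairs would need to cross the network.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch of the combiner and partitioner ideas (no Hadoop dependency).
class CombinerPartitionerSketch {

    // Partitioner role: choose which reducer receives a key.
    // Masking with Integer.MAX_VALUE clears the sign bit, so the
    // result is always in the range [0, numReducers).
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Combiner role: locally sum the words emitted by a single map task,
    // producing one (word, partialCount) pair per distinct word.
    static Map<String, Integer> combine(List<String> wordsFromOneMapTask) {
        Map<String, Integer> partial = new TreeMap<>();
        for (String word : wordsFromOneMapTask) {
            partial.merge(word, 1, Integer::sum);
        }
        return partial;
    }

    public static void main(String[] args) {
        Map<String, Integer> partial = combine(List.of("a", "b", "a"));
        // Each combined pair is routed to a reducer by its key.
        partial.forEach((word, count) ->
            System.out.println(word + "=" + count
                + " -> reducer " + partition(word, 4)));
    }
}
```

Because summing is associative and commutative, applying it early in a combiner does not change the final reducer output; operations without those properties (a plain average, for example) cannot be reused as a combiner unchanged.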

7. Higher-level Tools

  • Patterns to abstract “thinking in MapReduce”

  • The Cascading library (lab)

  • The Hive database (lab)