Training Institute in Bhopal | DreamSoft Systems

Big Data & Hadoop Course Contents

  • Module 1: Introduction to Hadoop
    • What is Big Data
    • Need and significance of innovative technologies
    • What is Hadoop
    • 3 Vs (Characteristics)
    • History of Hadoop and its Uses
    • Different Components of Hadoop
    • Various Hadoop Distributions

  • Module 2: HDFS (Hadoop Distributed File System)
    • Significance of HDFS in Hadoop
    • HDFS Features
    • Daemons of Hadoop and their functions: NameNode, DataNode, JobTracker, TaskTracker, Secondary NameNode
    • Data Storage in HDFS: Blocks, Heartbeats, Data Replication, HDFS Federation, High Availability
    • Accessing HDFS: CLI (Command Line Interface), Unix and Hadoop Commands, Java-Based Approach
    • Data Flow: Anatomy of a File Read, Anatomy of a File Write
    • Hadoop Archives
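
The block and replication topics in Module 2 come down to simple arithmetic. A minimal sketch in Python, assuming a 128 MB block size and a replication factor of 3 (common defaults, not stated in this outline); the function name `hdfs_footprint` is hypothetical and is not part of any Hadoop API:

```python
import math

BLOCK_SIZE_MB = 128  # assumption: common default block size in Hadoop 2.x and later
REPLICATION = 3      # assumption: common default replication factor


def hdfs_footprint(file_size_mb, block_size_mb=BLOCK_SIZE_MB, replication=REPLICATION):
    """Return (blocks, block replicas, raw storage in MB) for one file."""
    # A file is cut into fixed-size blocks; the last block may be partly filled.
    blocks = math.ceil(file_size_mb / block_size_mb)
    # Every block is copied `replication` times across DataNodes.
    replicas = blocks * replication
    # A partly filled last block only occupies its actual size on disk.
    raw_mb = file_size_mb * replication
    return blocks, replicas, raw_mb


if __name__ == "__main__":
    print(hdfs_footprint(500))
```

Under these assumptions, a 500 MB file is split into 4 blocks, stored as 12 block replicas, and occupies about 1500 MB of raw disk space across the cluster.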

  • Module 3: MapReduce
    • Introduction to MapReduce
    • MapReduce Architecture
    • MapReduce Programming Model
    • MapReduce Algorithm and Phases
    • Data Types
    • Input Splits and Records
    • Blocks vs. Splits
    • Basic MapReduce Program: Driver Code, Mapper Code, Reducer Code, Combiner and Shuffler
    • Creating Input and Output Formats in MapReduce Jobs: File Input/Output Format, Text Input/Output Format, Sequence File Input/Output Format, etc.
    • How to Debug MapReduce Jobs in Local and Pseudo-Distributed Mode
    • The MapReduce Web UI
    • Introduction to MapReduce Streaming
    • Data Locality in MapReduce
    • Distributed Cache
    • Compression Mechanisms
    • Joins, Map-Side Joins, Reduce-Side Joins
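
The map, shuffle and reduce phases listed in Module 3 can be illustrated without a cluster. The sketch below is a plain-Python simulation of a word count, not Hadoop's Java API or Hadoop Streaming; the function names (`mapper`, `shuffle`, `reducer`, `word_count`) are chosen here for illustration only:

```python
from collections import defaultdict


def mapper(line):
    """Map phase: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Driver: wire the three phases together over an in-memory input."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    # Keys are sorted before reducing, mirroring the sorted order reducers see.
    return dict(reducer(key, values) for key, values in sorted(grouped.items()))


if __name__ == "__main__":
    print(word_count(["big data", "Big Hadoop data"]))
```

In a real job the mapper and reducer run on different nodes and the framework performs the shuffle; here everything runs in one process to keep the data flow visible.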

  • Module 4: Pig
    • Introduction to Apache Pig
    • MapReduce vs. Apache Pig
    • SQL vs. Apache Pig
    • Different Data types in Apache Pig
    • Modes of Execution in Apache Pig: Local Mode, MapReduce (Distributed) Mode
    • Execution Mechanisms: Grunt Shell, Script, Embedded
    • Data Processing Operators: Loading and Storing Data, Filtering Data, Grouping and Joining Data, Sorting Data, Combining and Splitting Data
    • How to write a simple Pig script
    • UDFs in Pig

  • Module 5: Sqoop
    • Introduction to Sqoop
    • Sqoop Architecture and Internals
    • MySQL client and server installation
    • How to connect to a relational database using Sqoop
    • Sqoop Commands
    • Different flavors of imports, exports, and Hive imports

  • Module 6: Hive
    • The Metastore
    • Comparison with Traditional Databases: Schema on Read versus Schema on Write; Updates, Transactions, and Indexes
    • HiveQL: Data Types, Operators and Functions
    • Tables: Managed Tables and External Tables, Partitions and Buckets, Storage Formats, Importing Data, Altering Tables, Dropping Tables
    • Querying Data: Sorting and Aggregating, MapReduce Scripts, Joins, Subqueries, Views
    • User-Defined Functions: Writing a UDF, Writing a UDAF

  • Module 7: HBase
    • Introduction to HBase
    • HBase vs. HDFS
    • Use Cases
    • Basic Concepts: Column Families, Scans
    • HBase Architecture
    • ZooKeeper
    • Clients: REST, Thrift, Java-Based, Avro
    • MapReduce integration
    • MapReduce over HBase
    • Schema definition
    • Basic CRUD Operations

  • Module 8: Introduction to Flume, Oozie, HCatalog, Mahout, Solr, Hue, Impala, Tableau

Course Duration: 120 Hrs

Two hours a day, 3 days a week
