Big Data / Hadoop Developer Certification

Big Data Training
Wrox Certified Big Data Training Program
WCBDD Program

The objective of the Wrox Certified Big Data Developer (WCBDD) program, a global training
initiative, is to create a new international breed of versatile Big Data programmers, developers, and
technology specialists who are thoroughly conversant with Big Data tools, platforms, and their
architecture, and who can help organizations store, manage, and process the huge volume and variety
of data efficiently. At the same time, the program builds the business and industry know-how
required to integrate, implement, customize, and manage a Big Data infrastructure effectively.

The WCBDD program aims to:

• Provide participants with skills for handling technology, storage, processing, management,
  and security infrastructure for Big Data
• Provide participants the experience of working with Hadoop and its component tools
• Enable participants to develop MapReduce and Pig programs, manipulate distributed files,
  and understand APIs supporting MapReduce programs
• Familiarize participants with some popular Hadoop commercial distribution systems such as
  Cloudera, Hortonworks, and Greenplum
• Give hands-on experience of installing and working with Big Data programming tools on real
  data-sets, through extensive lab-work and projects in a controlled environment
• Include an integrated live project at the end to allow participants to develop an integrated
  Big Data program

What sets apart Wrox IT Certifications?

• These are the World’s First Platform-Independent and Vendor-Neutral Qualifications for
  Job-Readiness and Reskilling.
• Innovative scenario-based learning combined with real-life case studies.
• Carefully designed, outcome-based session plans for maximum skill transfer in a minimum
  amount of time.
• Integrated assessments, projects, and hands-on lab sessions constantly measure your
  industry-relevant skills.

Confidential WCBDD Training
Key Features

• Allows participants to comprehend and experience the entire Big Data Program Development
• Familiarizes participants with the role and use of Big Data in various relevant industries
  through numerous case studies
• Provides experience of working with Hadoop and its component tools
• Enables participants to develop MapReduce programs, manipulate distributed files in the
  filesystem, and learn APIs supporting MapReduce programs in Hadoop
• Familiarizes participants with popular Hadoop commercial distribution systems such as
  Cloudera, Hortonworks, and Greenplum
• Gives hands-on experience of installing and working with Big Data programming tools on
  real data-sets, with extensive lab-work and projects in a controlled environment
• Includes an integrated live project at the end to allow participants to develop an integrated
  Big Data program

WCBDD program

The WCBDD program consists of 7 learning modules and a project.
Based on their skill levels, participants can opt for any number of the available modules
for skill-building in specific areas, as listed in the module objectives.
Entry-level participants are recommended to take all 7 modules and complete the project so
as to build adequate job-readiness competency as Big Data developers. Professionals or participants
who already possess some of the required skill-sets can opt for only those modules which
would help them enhance their skills in specific areas.
Each learning module can be covered in approximately 10 hours of training and 5 hours of lab sessions.
Hence the complete training program can be conducted in a flexible manner over 70 learning hours
for entry-level participants and in approximately 30-40 hours for professionals or candidates with
more advanced skills. The project can be completed in an estimated 10-15 hours.

Pre-requisites for the Program

• Basics of a programming language
  ◦ Concepts of OOP
  ◦ Basics of a scripting language (such as Perl or Ruby)
• Basics of Linux/Unix operating systems
• Good understanding of the Java programming language
  ◦ Core Java
• Understanding of basic SQL statements
• 2+ years of professional experience
Customised Schedule (Tentative)

Assuming participants meet the pre-requisites, the course can be customised and delivered in
14 days (6 hours per day). This is a tentative schedule and will be refined based on the actual skills
of the participants.
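The hour figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below uses only numbers stated in this document; it assumes (the text does not say so explicitly) that the 70-hour figure counts the 10-hour training sessions only, with lab sessions on top.

```python
# Figures quoted in the program description.
MODULES = 7
TRAINING_HOURS_PER_MODULE = 10
LAB_HOURS_PER_MODULE = 5
SCHEDULE_DAYS = 14
HOURS_PER_DAY = 6

# Assumption: "70 learning hours" refers to training sessions only.
training_hours = MODULES * TRAINING_HOURS_PER_MODULE
lab_hours = MODULES * LAB_HOURS_PER_MODULE
scheduled_hours = SCHEDULE_DAYS * HOURS_PER_DAY

print(training_hours, lab_hours, scheduled_hours)  # 70 35 84
```

Under that reading, the 84 hours of the customised schedule fall between the 70 training hours and the 105 hours of training plus lab sessions.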

Days Module Objective Lab
1 Introduction to Big Data
Objectives:
• Java programming
• OOP concepts
• Understand the role and importance of Big Data
• Discuss the use and applications of Big Data in various industries
• Discuss the major technologies associated with Big Data
• Explain the roles of the various components of the Hadoop ecosystem
• Explain the fundamental concepts of MapReduce and its use in the Hadoop ecosystem
Lab: Labs include brainstorming on how Big Data solutions can be provided for certain business
scenarios, and learning how to install Hadoop, Hive, and HBase locally.

Managing a Big Data
Objectives:
• Discuss the key technology foundations required for Big Data
• Compare traditional data management systems with Big Data management systems
• Evaluate the key framework requirements for Big Data analytics
• Discuss the process of integrating data
• Explain the relevance of real-time data
• Evaluate requirements to implement Big Data in an organization
• Explain how to use Big Data and real-time data as a business planning tool
Lab: Labs focus on hands-on experience of installing and configuring the various Hadoop
ecosystem tools


Storing and Processing Data: HDFS, HBase and MapReduce
Objectives:
• Analyze Hadoop’s data storage model with HDFS and HBase
• Develop basic MapReduce programs
• Leverage MapReduce extensibility for customizing execution
• Test and debug a MapReduce program at design time
• Implement a MapReduce program for a given scenario
Lab: Labs provide practice in manipulating data in HBase tables, implementing MapReduce
programs at various levels, and accessing HBase data through MapReduce


Increasing Efficiency with Hadoop Tools: Hive, Pig, Oozie
Objectives:
• Discuss the Hive data storage principle
• Perform operations with data in Hive
• Implement advanced query features of Hive
• Explain the file formats and record formats supported by Hive
• Use Pig to automate the design and implementation of MapReduce
• Analyze workflow design and management using Oozie
• Design and implement an Oozie workflow
Lab: Labs help implement the concepts learned in the session in various

Advanced Hadoop Features: ZooKeeper, Sqoop, Flume, YARN and Storm
Objectives:
• Use Apache ZooKeeper as a distributed coordination service
• Load data into Hive and HBase from non-Hadoop storage systems
• Describe the role of Flume
• Use Flume for data aggregation
• Explain the role of YARN and compare it with the role of MapReduce in Hadoop 1.0
• Explain how to manage real-time data on Hadoop with Storm on YARN
• Develop Storm on YARN applications
Lab: Labs provide hands-on experience of working with ZooKeeper, Flume, Sqoop, and Storm.

Leveraging NoSQL for Big Data
Objectives:
• Interface and interact with NoSQL
• Perform CRUD operations and querying in various NoSQL databases
• Analyze how security is implemented in Hadoop
• Configure Hadoop applications to run on Amazon Web Services (AWS)
• Design real-time applications of Hadoop
Lab: Labs provide hands-on experience of working with MongoDB and


Commercial Hadoop Distribution and Management Tools
Objectives:
• Explore the Cloudera Manager platform
• Use Cloudera Manager for adding and managing services
• Configure Hive metastore for various platforms
• Set up Cloudera Manager 4.5 for Hive
• Deploy Hortonworks Data Platform (HDP) clusters for Big Data analysis
• Use Talend Open Studio for data analysis
• Explain Greenplum Pivotal HD architecture
• Discuss and install InfoSphere BigInsights
• Discuss and install MapR and MapR Sandbox
• Prepare for appearing in job interviews
Lab: This module does not have any specific lab exercises, but participants are advised to
download and explore the various commercial distribution sandboxes or trial versions and
familiarize themselves with the distributions explained.


Main Project

Create an application that uses various Big Data tools to query library weblog data in order to understand customer preferences, choices, trends, etc.
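As a rough illustration of the kind of query such a project might involve, the sketch below counts weblog requests per book category in map, shuffle, and reduce steps. It is plain Python rather than Hadoop code, and the log format, the sample lines, and all function names are hypothetical; the actual project data and tooling are not described in this document.

```python
from collections import defaultdict

# Hypothetical weblog lines: "<user_id> <category> <book_title>"
SAMPLE_LOG = [
    "u1 fiction Dune",
    "u2 science Cosmos",
    "u1 fiction Emma",
    "u3 fiction Dune",
]

def map_phase(line):
    """Emit one (category, 1) pair per log line."""
    _user, category, _title = line.split(" ", 2)
    yield category, 1

def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Sum the counts collected for one key."""
    return key, sum(values)

def count_by_category(lines):
    """Run the three phases in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())

if __name__ == "__main__":
    print(count_by_category(SAMPLE_LOG))
```

In a real Hadoop job the map and reduce functions would run on separate nodes and the framework would perform the grouping; here everything runs in one process purely to show the data flow.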