Introduction and relevance
Uses of Big Data analytics in industries such as Telecom, E-commerce, Finance, and Insurance
Problems with Traditional Large-Scale Systems
Motivation for Hadoop
Different types of projects by Apache
Role of projects in the Hadoop Ecosystem
Key technology foundations required for Big Data
Limitations and Solutions of existing Data Analytics Architecture
Comparison of traditional data management systems with Big Data management systems
Evaluate key framework requirements for Big Data analytics
Hadoop Ecosystem & Hadoop 2.x core components
Explain the relevance of real-time data
Explain how to use big and real-time data as a Business planning tool
Hadoop Master-Slave Architecture
Data manipulation tools (operators, functions, procedures, control structures, loops, arrays, etc.)
The Hadoop Distributed File System - Concept of data storage
Explain different types of cluster setups (fully distributed, pseudo-distributed, etc.)
Hadoop cluster setup - Installation
Hadoop 2.x Cluster Architecture
A Typical enterprise cluster – Hadoop Cluster Modes
Understanding cluster management tools like Cloudera Manager and Apache Ambari
HDFS Overview & Data storage in HDFS
Getting data into Hadoop from a local machine (data loading techniques) and vice versa
MapReduce Overview (traditional way vs. MapReduce way)
Concept of Mapper & Reducer
Understanding MapReduce program Framework
Develop MapReduce Program using Java (Basic)
Develop MapReduce program with the streaming API (Basic)
Integrating Hadoop into an Existing Enterprise
Loading Data from an RDBMS into HDFS by Using Sqoop
Managing Real-Time Data Using Flume
Accessing HDFS from Legacy Systems
Apache Pig - MapReduce vs. Pig, Pig Use Cases
Pig’s Data Model
Pig Latin Program & Execution
Pig Latin: Relational Operators, File Loaders, Group Operator, COGROUP Operator, Joins and COGROUP, Union, Diagnostic Operators, Pig UDF
Writing Java UDFs
Embedding Pig in Java
Use Pig to automate the design and implementation of MapReduce applications
Use Pig to apply structure to unstructured Big Data
Apache Hive - Hive vs. Pig - Hive Use Cases
Discuss the Hive data storage principle
Explain the file formats and record formats supported by the Hive environment
Perform operations with data in Hive
HiveQL: Joining Tables, Dynamic Partitioning, Custom Map/Reduce Scripts
Hive Script, Hive UDF
Hive Persistence formats
Loading data in Hive - Methods
Serialization & Deserialization
Handling Text data using Hive
Integrating external BI tools with Hadoop Hive
What Is Spark
What Is Scala
Learn from our comprehensive collection of project case-studies, hand-picked by industry experts, to give you an in-depth understanding of how data science moves industries like telecom, transportation, e-commerce & more.
You will have access to 10-15 hours of e-learning exercises alongside instructor-led training, which help candidates get the most out of each subject and build the logic to handle any new requirement.
This program has been designed in collaboration with some of the most influential analytics leaders and top academicians in data science.
Thanks to the digital revolution that is sweeping the world, and India in particular, data scientists are now the professionals most sought after by big corporations and startups alike. Companies across industries are rewarding good data analysts and scientists with desirable career growth and salaries.
The field of data science is thriving as it is proving to be effective not just across industries but also across departments within organizations.
6 out of 10 developers are gaining or looking to gain skills in machine learning and deep learning.
Data scientists make around 75 Lakhs on average.
India alone will need around 2,00,000 data scientists by 2020.