
Pentaho and Hadoop Framework Fundamentals Training

Why take Pentaho and Hadoop Fundamentals Online Training?

  • There is strong demand for professionals skilled in data integration, and those skills translate into better salaries and excellent job opportunities.

Product Description

  • Training Course 

This course is designed to introduce you to various big data concepts with the Hadoop framework of technologies and Pentaho products. Building upon Pentaho Data Integration Fundamentals, you will learn how Pentaho works with the following Hadoop Framework technologies:

  • HDFS
  • Sqoop
  • Pig
  • Oozie
  • MapReduce
  • YARN
  • Hive
  • Impala
  • HBase
  • Flume

This course focuses heavily on labs, giving you practical, hands-on experience with the topics covered in each section.

  • Objective:

  • COURSE BENEFITS

  • Improve productivity by giving your data integration team the skills they need to use Pentaho Data Integration with Hadoop data sources.
  • Implement the Streamlined Data Refinery big data blueprint using PDI and Hadoop.
  • Interactive, hands-on training materials significantly improve skill development and maximize retention.

  • SKILLS ACHIEVED

At the completion of this course, you should be able to:

  • Use Hadoop technologies from the native command line and with Pentaho Data Integration
  • Employ data ingestion and processing best practices
  • Use Pentaho Interactive Reporting and Analyzer to report from Impala


DAY 1

  • MODULE 1: INTRODUCTION TO PENTAHO AND BIG DATA

Lesson 1: Big Data Architectures

Lesson 2: Streamlined Data Refinery Use Case

Lesson 3: Overview of Pentaho Tools

Exercise 1: Exploring the Environment

  • MODULE 2: HDFS 

Lesson 1: Basics of HDFS

  • Understanding How HDFS Reads/Writes Data
  • Data Replication and Fault-Tolerance

Lesson 2: HDFS Best Practices

  • File Sizes
  • File Types
  • Compression

Lesson 3: Pentaho Data Integration with HDFS

  • Import/Export Data Between Local File System and HDFS
  • File Management Steps
  • PDI/HDFS Best Practices
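
For a flavor of how Lesson 3's topics look in practice, the sketch below copies a local file into HDFS with Hadoop's Java FileSystem API; the NameNode address and paths are illustrative placeholders, not part of the course materials:

    // Minimal sketch: copy a local file into HDFS via the FileSystem API.
    // Equivalent to the command line: hdfs dfs -put /tmp/sales.csv /data/raw/
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class HdfsCopyExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            conf.set("fs.defaultFS", "hdfs://namenode:8020"); // placeholder NameNode address
            FileSystem fs = FileSystem.get(conf);
            fs.copyFromLocalFile(new Path("/tmp/sales.csv"), new Path("/data/raw/sales.csv"));
            fs.close();
        }
    }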

  • MODULE 3: DATA INGESTION

Lesson 1: Sqoop

  • Sqoop Basics
  • PDI and Sqoop

Exercise 3: Import/Export Data between a DB and Hadoop Using Sqoop

Lesson 2: Flume

  • Flume Basics
  • PDI and Flume

Lesson 3: Data Ingestion Best Practices

  • PDI vs. Sqoop vs. Flume
  • Aggregating Smaller Files into Bigger Files
  • Best Ways to Store Non-Splittable Files Like XML and JSON
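
As an illustration of the small-files best practice above, here is a minimal sketch (assuming SequenceFile as the container format, with illustrative paths) that packs many small local files into one splittable HDFS file, keyed by file name:

    // Minimal sketch: pack small files into one SequenceFile on HDFS,
    // keyed by the original file name (paths are assumptions, for illustration).
    import java.io.File;
    import java.nio.file.Files;
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.BytesWritable;
    import org.apache.hadoop.io.SequenceFile;
    import org.apache.hadoop.io.Text;

    public class SmallFilePacker {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(new Path("/data/packed/logs.seq")),
                    SequenceFile.Writer.keyClass(Text.class),
                    SequenceFile.Writer.valueClass(BytesWritable.class))) {
                for (File f : new File("/tmp/small-logs").listFiles()) {
                    byte[] bytes = Files.readAllBytes(f.toPath());
                    writer.append(new Text(f.getName()), new BytesWritable(bytes));
                }
            }
        }
    }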

  • MODULE 4: DATA PROCESSING

Lesson 1: MapReduce Concepts

  • Mapper
  • Reducer
  • Combiner
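
To make the Mapper/Reducer/Combiner roles concrete, here is a minimal sketch of the classic word-count job in the Hadoop Java MapReduce API (class names are illustrative):

    // Minimal word-count sketch showing the Mapper and Reducer roles from Lesson 1.
    import java.io.IOException;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    public class WordCount {
        // Mapper: emits (word, 1) for every token in a line of input.
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();
            @Override
            protected void map(LongWritable key, Text value, Context ctx)
                    throws IOException, InterruptedException {
                for (String token : value.toString().split("\\s+")) {
                    word.set(token);
                    ctx.write(word, ONE);
                }
            }
        }
        // Reducer (also usable as a Combiner): sums the counts for each word.
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable v : values) sum += v.get();
                ctx.write(key, new IntWritable(sum));
            }
        }
    }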

Lesson 2: MR1 Architecture

  • Driver
  • Job Tracker
  • Task Tracker
  • Shuffle/Sort
  • Partitioner

Lesson 3: YARN / MR2

  • YARN Basics
  • MR2 Architecture

Lesson 4: Using PDI to Write MapReduce

  • Developing MR Using PDI
  • MR1 vs. YARN/MR2 in PDI

Lesson 5: PDI/MR Best Practices

  • The Do’s and Don’ts of Writing PDI and MapReduce Apps
  • Using Hadoop’s Distributed Cache
  • Compression

Exercise 4: Using PDI to Develop PMR on MR2/YARN

Lesson 6: PDI/Carte on YARN

  • Basics of Carte on YARN

Exercise 5: Build a Transformation that Runs on YARN

Lesson 7: Pig

  • Pig Basics
  • PDI and Pig

Exercise 6: Run the Pig Application and Execute a Pig Script Using PDI
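
For context, the pattern Exercise 6 practices, driving Pig from another tool, can also be sketched with Pig's Java embedding API; the script, field names, and paths below are illustrative assumptions:

    // Minimal sketch: run an illustrative Pig Latin script through PigServer.
    import org.apache.pig.PigServer;

    public class PigEmbedExample {
        public static void main(String[] args) throws Exception {
            PigServer pig = new PigServer("mapreduce"); // use "local" for a quick test
            pig.registerQuery("lines = LOAD '/data/raw/sales.csv' USING PigStorage(',') "
                    + "AS (region:chararray, amount:double);");
            pig.registerQuery("totals = FOREACH (GROUP lines BY region) "
                    + "GENERATE group, SUM(lines.amount);");
            pig.store("totals", "/data/out/sales_by_region"); // triggers execution
        }
    }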

  • MODULE 5: JOB ORCHESTRATION

Lesson 1: Oozie Basics

Lesson 2: PDI Job Orchestration Features

Lesson 3: PDI and Oozie

Exercise 7: Run the Oozie Application and Execute an Oozie Script from PDI
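
As a rough illustration of what a submission like Exercise 7's involves under the hood, here is a minimal sketch using the Oozie Java client; the server URL, workflow path, and cluster addresses are placeholders:

    // Minimal sketch: submit and start an Oozie workflow programmatically.
    import java.util.Properties;
    import org.apache.oozie.client.OozieClient;

    public class OozieSubmitExample {
        public static void main(String[] args) throws Exception {
            OozieClient oozie = new OozieClient("http://oozie-host:11000/oozie");
            Properties conf = oozie.createConfiguration();
            conf.setProperty(OozieClient.APP_PATH, "hdfs://namenode:8020/apps/workflows/etl");
            conf.setProperty("nameNode", "hdfs://namenode:8020");       // placeholder
            conf.setProperty("jobTracker", "resourcemanager:8032");     // placeholder
            String jobId = oozie.run(conf); // submit and start the workflow
            System.out.println("Workflow " + jobId + " is " + oozie.getJobInfo(jobId).getStatus());
        }
    }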

  • MODULE 6: HADOOP AND SQL

Lesson 1: Traditional Hive

Lesson 2: Hive/TEZ

Lesson 3: Impala

Lesson 4: Using PDI with Hive and Impala

Lesson 5: SQL/Hadoop/PDI Best Practices

Exercise 8: Working with Impala using the Command Line, HUE, and PDI
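
For a flavor of Module 6 outside of PDI, here is a minimal JDBC sketch against HiveServer2 (Impala works the same way with its own JDBC driver and port, typically 21050); the host, credentials, and sales table are assumptions:

    // Minimal sketch: query Hive over JDBC, the same route a BI tool takes.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class HiveQueryExample {
        public static void main(String[] args) throws Exception {
            Class.forName("org.apache.hive.jdbc.HiveDriver"); // optional on modern drivers
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:hive2://hiveserver:10000/default", "user", "");
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                     "SELECT region, SUM(amount) FROM sales GROUP BY region")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + "\t" + rs.getDouble(2));
                }
            }
        }
    }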

  • MODULE 7: HBASE 

Lesson 1: HBase Basics

Lesson 2: PDI and HBase

Lesson 3: PDI/HBase Best Practices

Exercise 9: Working with HBase and PDI
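
As a taste of the HBase client API behind Module 7, the sketch below writes one cell and reads it back; the table, column family, and row key are illustrative:

    // Minimal sketch: write one cell to an HBase table and read it back.
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class HBasePutGetExample {
        public static void main(String[] args) throws Exception {
            try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
                 Table table = conn.getTable(TableName.valueOf("sales"))) {
                Put put = new Put(Bytes.toBytes("row-2024-001"));
                put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("amount"), Bytes.toBytes("129.95"));
                table.put(put);
                Result result = table.get(new Get(Bytes.toBytes("row-2024-001")));
                System.out.println(Bytes.toString(
                        result.getValue(Bytes.toBytes("d"), Bytes.toBytes("amount"))));
            }
        }
    }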

  • MODULE 8: REPORTING ON BIG DATA

Lesson 1: Using Pentaho Report Designer with Hadoop

Lesson 2: Using Pentaho Analyzer with Hadoop

Lesson 3: Best Practices for Reporting on Data in Hadoop

Exercise 10: Create a PRD and Analyzer Report Using Data in Hadoop

  • MODULE 9: ADDITIONAL PENTAHO AND BIG DATA TECHNOLOGIES

Lesson 1: Flume and PDI

Lesson 2: Storm and PDI

Lesson 3: Kafka and PDI
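
As a minimal illustration of the Kafka side of Lesson 3, the sketch below publishes one message with the Kafka Java producer; the broker address, topic, and payload are placeholders:

    // Minimal sketch: send one record to a Kafka topic.
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class KafkaSendExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker:9092"); // placeholder broker address
            props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                producer.send(new ProducerRecord<>("sales-events", "row-2024-001",
                        "{\"amount\": 129.95}"));
            }
        }
    }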

  • Duration: approx. 1 week

  • Sessions: 12 sessions of 2 hours each

Mildain Solutions

Plot No 17, C Block Market, Sec 36,
Noida (U.P.)-201301(India)
(+91) 1204326873
(+91) 7834800821