Hands-on data warehousing with Azure Data Factory : ETL techniques to load and transform data from various sources, both on-premises and on cloud / Christian Coté, Michelle Gutzait, Giuseppe Ciaburro.
2018
QA76.9.D37
Details
Title
Hands-on data warehousing with Azure Data Factory : ETL techniques to load and transform data from various sources, both on-premises and on cloud / Christian Coté, Michelle Gutzait, Giuseppe Ciaburro.
Author
ISBN
9781789130096 (electronic bk.)
1789130093 (electronic bk.)
9781789137620
1789137624
Published
Birmingham, UK : Packt Publishing, 2018.
Language
English
Description
1 online resource : illustrations
Call Number
QA76.9.D37
System Control No.
(OCoLC)1042342224
Summary
Leverage the power of Microsoft Azure Data Factory v2 to build hybrid data solutions.
About This Book
Combine the power of Azure Data Factory v2 and SQL Server Integration Services.
Design and enhance the performance and scalability of a modern ETL hybrid solution.
Interact with the loaded data in the data warehouse and data lake using Power BI.
Who This Book Is For
This book is for you if you are a software professional who develops and implements ETL solutions using Microsoft SQL Server or Azure cloud. It will be an added advantage if you are a software engineer, DW/ETL architect, or ETL developer, and know how to create a new ETL implementation or enhance an existing one with ADF or SSIS.
What You Will Learn
Understand the key components of an ETL solution using Azure Data Factory and Integration Services.
Design the architecture of a modern ETL hybrid solution.
Implement ETL solutions for both on-premises and Azure data.
Improve the performance and scalability of your ETL solution.
Gain thorough knowledge of new capabilities and features added to Azure Data Factory and Integration Services.
In Detail
ETL is one of the essential techniques in data processing. Given that data is everywhere, ETL will always be a vital process for handling data from different sources. Hands-On Data Warehousing with Azure Data Factory starts with the basic concepts of data warehousing and the ETL process. You will learn how Azure Data Factory and SSIS can be used to understand the key components of an ETL solution. With the help of practical examples, you will go through the different services offered by Azure that can be used by ADF and SSIS, such as Azure Data Lake Analytics, Machine Learning, and Databricks Spark. You will explore how to design and implement ETL hybrid solutions using different integration services with a step-by-step approach. Once you get to grips with all this, you will use Power BI to interact with data coming from different sources in order to reveal valuable insights. By the end of this book, you will have learned not only how to build your own ETL solutions but also how to address the key challenges faced while building them.
Style and approach
A step-by-step guide to developing data movement code using SSIS, Azure Data Factory, and database stored procedures for implementing intelligent BI solutions.
Downloading the example code for this book
You can download the example code files for all Packt books you have purchased from your account at http://www.PacktPub.com. If you ...
Formatted Contents Note
Cover
Title Page
Copyright and Credits
Packt Upsell
Contributors
Table of Contents
Preface
Chapter 1: The Modern Data Warehouse
The need for a data warehouse
Driven by IT
Self-service BI
Cloud-based BI - big data and artificial intelligence
The modern data warehouse
Main components of a data warehouse
Staging area
Data warehouse
Cubes
Consumption layer - BI and analytics
What is Azure Data Factory
Limitations of ADF V1.0
What's new in V2.0?
Integration runtime
Linked services
Datasets
Pipelines
Activities
Parameters
Expressions
Controlling the flow of activities
SSIS package deployment in Azure
Spark cluster data store
Summary
Chapter 2: Getting Started with Our First Data Factory
Resource group
Azure Data Factory
Datasets
Linked services
Integration runtimes
Activities
Monitoring the data factory pipeline runs
Azure Blob storage
Blob containers
Types of blobs
Block blobs
Page blobs
Replication of storage
Creating an Azure Blob storage account
SQL Azure database
Creating the Azure SQL Server
Attaching the BACPAC to our database
Copying data using our data factory
Summary
Chapter 3: SSIS Lift and Shift
SSIS in ADF
Sample setup
Sample databases
SSIS components
Integration services catalog setup
Sample solution in Visual Studio
Deploying the project on-premises
Leveraging our package in ADF V2
Integration runtimes
Azure integration runtime
Self-hosted runtime
SSIS integration runtime
Adding an SSIS integration runtime to the factory
SSIS execution from a pipeline
Summary
Chapter 4: Azure Data Lake
Creating and configuring Data Lake Store
Next Steps
Ways to copy/import data from a database to the Data Lake.
Ways to store imported data in files in the Data Lake
Easily moving data to the Data Lake Store
Ways to directly copy files into the Data Lake
Prerequisites for the next steps
Creating a Data Lake Analytics resource
Using the data factory to manipulate data in the Data Lake
Task 1 - copy/import data from SQL Server to a blob storage file using data factory
Task 2 - run a U-SQL task from the data factory pipeline to summarize data
Service principal authentication
Run U-SQL from a job in the Data Lake Analytics
Summary
Chapter 5: Machine Learning on the Cloud
Machine learning overview
Machine learning algorithms
Supervised learning
Unsupervised learning
Reinforcement learning
Machine learning tasks
Making predictions with regression algorithms
Automated classification using machine learning
Identifying groups using clustering methods
Dimensionality reduction to improve performance
Feature selection
Feature extraction
Azure Machine Learning Studio
Azure Machine Learning Studio account
Azure Machine Learning Studio experiment
Dataset
Module
Work area
Breast cancer detection
Get the data
Prepare the data
Train the model
Score and evaluate the model
Summary
Chapter 6: Introduction to Azure Databricks
Azure Databricks setup
Prepare the data to ingest
Setting up the folder in the Azure storage account
Self-hosted integration runtime
Linked service setup
Datasets setup
SQL Server dataset
Blob storage dataset
Linked service
Dataset
Copy data from SQL Server to sales-data
Publish and trigger the copy activity
Databricks notebook
Calling Databricks notebook execution in ADF
Summary
Chapter 7: Reporting on the Modern Data Warehouse
Different types of BI
Self-service - personal.
Team BI - sharing personal BI data
Corporate BI
Power BI Premium
Power BI Report Server
Power BI consumption
Creating our Power BI reports
Reporting with on-premise data sources
Incorporating Spark data
Summary
Index.
Digital File Characteristics
data file
Source of Description
Online resource; title from title page (Safari, viewed June 29, 2018).