Apache Spark for data science cookbook : over insightful 90 recipes to get lightning-fast analytics with Apache Spark / Padma Priya Chitturi.
2016
QA76.9.D343
Details
Title
Apache Spark for data science cookbook : over insightful 90 recipes to get lightning-fast analytics with Apache Spark / Padma Priya Chitturi.
ISBN
9781785288807 (electronic bk.)
1785288806 (electronic bk.)
9781785880100
Published
Birmingham, UK : Packt Publishing, 2016.
Language
English
Description
1 online resource (1 volume) : illustrations
Call Number
QA76.9.D343
System Control No.
(OCoLC)969355608
Summary
Over insightful 90 recipes to get lightning-fast analytics with Apache Spark.
About This Book: Use Apache Spark for data processing with these hands-on recipes. Implement end-to-end, large-scale data analysis better than ever before. Work with powerful libraries such as MLlib, SciPy, NumPy, and Pandas to gain insights from your data.
Who This Book Is For: This book is for novice and intermediate level data science professionals and data analysts who want to solve data science problems with a distributed computing framework. Basic experience with data science implementation tasks is expected. Data science professionals looking to skill up and gain an edge in the field will find this book helpful.
What You Will Learn: Explore the topics of data mining, text mining, natural language processing, information retrieval, and machine learning. Solve real-world analytical problems with large data sets. Address data science challenges with analytical tools on a distributed system like Spark (apt for iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale. Get hands-on experience with algorithms like classification, regression, and recommendation on real datasets using the Spark MLlib package. Learn about numerical and scientific computing using NumPy and SciPy on Spark. Use Predictive Model Markup Language (PMML) in Spark for statistical data mining models.
In Detail: Spark has emerged as the most promising big data analytics engine for data science professionals. The true power and value of Apache Spark lies in its ability to execute data science tasks with speed and accuracy. Spark's selling point is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations. It lets you tackle the complexities that come with raw unstructured data sets with ease. This guide will get you comfortable and confident performing data science tasks with Spark. You will learn about implementations including distributed deep learning, numerical computing, and scalable machine learning. You will be shown effective solutions to problematic concepts in data science using Spark's data science libraries such as MLlib, Pandas, NumPy, SciPy, and more. These simple and efficient recipes will show you how to implement algorithms and optimize your work.
Style and approach: This book contains a comprehensive range of recipes designed to help you learn the fundamentals and tackle the difficul...
Formatted Contents Note
Cover
Copyright
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Table of Contents
Preface
Chapter 1: Big Data Analytics with Spark
Introduction
Initializing SparkContext
Getting ready
How to do it...
How it works...
There's more...
See also
Working with Spark's Python and Scala shells
How to do it...
How it works...
There's more...
See also
Building standalone applications
Getting ready
How to do it...
How it works...
There's more...
See also
Working with the Spark programming model
How to do it...
How it works...
There's more...
See also
Working with pair RDDs
Getting ready
How to do it...
How it works...
There's more...
See also
Persisting RDDs
Getting ready
How to do it...
How it works...
There's more...
See also
Loading and saving data
Getting ready
How to do it...
How it works...
There's more...
See also
Creating broadcast variables and accumulators
Getting ready
How to do it...
How it works...
There's more...
See also
Submitting applications to a cluster
Getting ready
How to do it...
How it works...
There's more...
See also
Working with DataFrames
Getting ready
How to do it...
How it works...
There's more...
See also
Working with Spark Streaming
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 2: Tricky Statistics with Spark
Introduction
Working with Pandas
Variable identification
Getting ready
How to do it...
How it works...
There's more...
See also
Sampling data
Getting ready
How to do it...
How it works...
There's more...
See also
Summary and descriptive statistics
Getting ready
How to do it...
How it works...
There's more...
See also
Generating frequency tables
Getting ready
How to do it...
How it works...
There's more...
See also
Installing Pandas on Linux
Getting ready
How to do it...
How it works...
There's more...
See also
Installing Pandas from source
Getting ready
How to do it...
How it works...
There's more...
See also
Using IPython with PySpark
Getting ready
How to do it...
How it works...
There's more...
See also
Creating Pandas DataFrames over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Splitting, slicing, sorting, filtering, and grouping DataFrames over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Implementing co-variance and correlation using Pandas
Getting ready
How to do it...
How it works...
There's more...
See also
Concatenating and merging operations over DataFrames
Getting ready
How to do it...
How it works...
There's more...
See also
Complex operations over DataFrames
Getting ready
How to do it...
How it works...
There's more...
See also
Sparkling Pandas
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 3: Data Analysis with Spark
Introduction
Univariate analysis
Getting ready
How to do it...
How it works...
There's more...
See also
Bivariate analysis
Getting ready
How to do it...
How it works...
There's more...
See also
Missing value treatment
Getting ready
How to do it...
How it works...
There's more...
See also
Outlier detection
Getting ready
How to do it...
How it works...
There's more...
See also
Use case - analyzing the MovieLens dataset
Getting ready
How to do it...
How it works...
There's more...
See also
Use case - analyzing the Uber dataset
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 4: Clustering, Classification, and Regression
Introduction
Supervised learning
Unsupervised learning
Applying regression analysis for sales data
Variable identification
Getting ready
How to do it...
How it works...
There's more...
See also
Data exploration
Getting ready
How to do it...
How it works...
There's more...
See also
Feature engineering
Getting ready
How to do it...
How it works...
There's more...
See also
Applying linear regression
Getting ready
How to do it...
How it works...
There's more...
See also
Applying logistic regression on bank marketing data
Variable identification
Getting ready
How to do it...
How it works...
There's more...
See also
Data exploration
Getting ready
How to do it...
How it works...
There's more...
See also
Feature engineering
Getting ready
How to do it...
How it works...
There's more...
See also
Applying logistic regression
Getting ready
How to do it...
How it works...
There's more...
See also
Real-time intrusion detection using streaming k-means
Variable identification
Getting ready
How to do it...
How it works...
There's more...
See also
Simulating real-time data
Getting ready
How to do it...
How it works...
There's more...
See also
Applying streaming k-means
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 5: Working with Spark MLlib
Introduction
Working with Spark ML pipelines
Implementing Naive Bayes' classification
Getting ready
How to do it...
How it works...
There's more...
See also
Implementing decision trees
Getting ready
How to do it...
How it works...
There's more...
See also
Building a recommendation system
Getting ready
How to do it...
How it works...
There's more...
See also
Implementing logistic regression using Spark ML pipelines
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 6: NLP with Spark
Introduction
Installing NLTK on Linux
Getting ready
How to do it...
How it works...
There's more...
See also
Installing Anaconda on Linux
Getting ready
How to do it...
How it works...
There's more...
See also
Anaconda for cluster management
Getting ready
How to do it...
How it works...
There's more...
See also
POS tagging with PySpark on an Anaconda cluster
Getting ready
How to do it...
How it works...
There's more...
See also
NER with IPython over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Implementing openNLP - chunker over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Implementing openNLP - sentence detector over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Implementing Stanford NLP - lemmatization over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Implementing sentiment analysis using Stanford NLP over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 7: Working with Sparkling Water - H2O
Introduction
Features
Working with H2O on Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Implementing k-means using H2O over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Implementing spam detection with Sparkling Water
Getting ready
How to do it...
How it works...
There's more...
See also
Deep learning with airlines and weather data
Getting ready
How to do it...
How it works...
There's more...
See also
Implementing a crime detection application
Getting ready
How to do it...
How it works...
There's more...
See also
Running SVM with H2O over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 8: Data Visualization with Spark
Introduction
Visualization using Zeppelin
Getting ready
How to do it...
Installing Zeppelin
Customizing Zeppelin's server and websocket port
Visualizing data on HDFS - parameterizing inputs
Running custom functions
Adding external dependencies to Zeppelin
Pointing to an external Spark Cluster
How to do it...
How it works...
There's more...
See also
Creating scatter plots with Bokeh-Scala
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a time series MultiPlot with Bokeh-Scala
Getting ready
How to do it...
How it works...
There's more...
See also
Creating plots with the lightning visualization server
Getting ready
How to do it...
How it works...
There's more...
See also
Visualize machine learning models with Databricks notebook
Getting ready
How to do it...
How it works...
There's more...
See also
Chapter 9: Deep Learning on Spark
Introduction
Installing CaffeOnSpark
Getting ready
How to do it...
How it works...
There's more...
See also
Working with CaffeOnSpark
Getting ready
How to do it...
How it works...
There's more...
See also
Running a feed-forward neural network with DeepLearning4j over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Running an RBM with DeepLearning4j over Spark
Getting ready
How to do it...
How it works...
There's more...
See also
Running a CNN for learning MNIST with DeepLearning4j over Spark
Getting ready
How to do it...
Source of Description
Description based on online resource; title from cover (Safari, viewed January 17, 2017).