Introduction
Data scientists are big data wranglers. They take an enormous mass of messy data points (unstructured and structured) and use their formidable skills in math, statistics and programming to clean, massage and organize them. Then they apply all their analytic powers – industry knowledge, contextual understanding, skepticism of existing assumptions – to uncover hidden solutions to business challenges.
What is this course about?
This Data Science course covers the whole data life cycle: data acquisition and data storage using R-Hadoop concepts, modeling with machine learning algorithms in R, and data visualization using R's capabilities.
Who will benefit from this course?
The course is designed for anyone who wants to learn machine learning techniques implemented in the R language and to apply these techniques to Big Data. The following professionals will benefit from this course:
1. Developers aspiring to be a 'Data Scientist'
2. Analytics Managers who are leading a team of analysts
3. SAS/SPSS Professionals looking to gain understanding in Big Data Analytics
4. Business Analysts who want to understand Machine Learning (ML) Techniques
5. Information Architects who want to gain expertise in Predictive Analytics
6. 'R' professionals who want to capture and analyze Big Data
7. Hadoop Professionals who want to learn R and ML techniques
8. Analysts wanting to understand Data Science methodologies
What will I be able to do after completing this course?
This training focuses on the vital concepts of business analytics and R. By the end of the training, participants will be able to:
1. Work on data exploration, data visualization, and predictive modeling techniques with ease.
2. Gain fundamental knowledge of analytics and how it assists with decision making.
3. Work with confidence using the R language.
4. Understand and work on statistical concepts like linear & logistic regression, cluster analysis, and forecasting.
5. Develop a structured approach to using statistical techniques and the R language.
6. Perform sharp data analysis to make business decisions.
What background do I need?
The prerequisite for 'Mastering Data Analytics with R' is basic knowledge of statistics. We provide a complimentary course, "Statistics Essentials for R", to all participants who enroll in the Data Analytics with R Training. This course helps you brush up on your statistics skills.
I am from a non-technical background. Will I benefit from this course?
Yes, the course presents both the business and technical benefits of Big Data analytics and Data Visualization. The data mining and technical discussions are at a level that attendees with a business background can understand and apply. Where technical knowledge is required, sufficient guidance for all backgrounds is provided to enable activities to be completed and the learning objectives achieved.
Which Case-Studies will be a part of the Course?
Towards the end of the course, you will work on a live project. Here are a few case studies from industries such as Finance, Retail, Media, Aviation, and Sports that you can take up as your project work:
Project #1: Flight Delay Prediction
Industry : Aviation
Description : The goal of this project is to predict the arrival time of a flight given parameters such as "UniqueCarrier", "DepDelay", "AirTime", "Distance", "ArrDelay", etc. Do these attributes affect the arrival delay and, if so, to what extent? Construct a model and predict the arrival delay. Compute the mean scheduled, actual, and in-flight times for each source airport-destination airport pair with the help of MapReduce in R, and visualize the results using R.
Project #2: Stock Market Prediction
Industry : Finance
Description : This problem is about making predictions on stock market data. The dataset contains the daily quotes of the S&P 500 stock index from 1970-01-02 to 2009-09-15 (10,000+ daily sessions). For each day, information is given on the Open, High, Low and Close prices, as well as the Volume and adjusted close price.
Project #3: Twitter Analytics
Industry : Social Media
Description : This problem is about social media analytics, which can be defined as measuring, analyzing, and interpreting interactions and associations between people, topics and ideas. The dataset to be analyzed is captured by live Twitter streaming. The problem is mainly about using Twitter analytics to find meaningful data by performing sentiment analysis on the tweets obtained and visualizing the conclusions.
Project #4: Recommendation System
Industry : e-commerce
Description : Creating recommendations from a large data set of directly elicited ratings is an area with wide potential that has recently been boosted by players such as Amazon, Netflix, and Google, to name a few. In this project, you are given a collection of real-world data from different users, including the products they like, the ratings assigned to each product, etc., and you have to create recommendations for the users.
Project #5: NFL Data Analysis
Industry : Sports
Description : The dataset is a set of tweets by fans from an NFL game. This project is about analyzing the tweets posted by football fans all over the world on the NFL tournament semi-finals and finding insights such as the top 10 most popular topics being discussed, the most talked-about team, etc.
Methodology
PowerPoint Presentation, Handouts, Hands-on Lab Practice, Brainstorming
Contents of Training:
Section: 1
Course Introduction
1. Introduction to Course
2. Course Curriculum
3. What is Data Science?
4. Course FAQ
Section: 2
Course Best Practices
5. How to Get Help in the Course!
Quiz 1: Welcome to the Course.
6. Installation and Set-Up
Section: 3
Windows Installation Set-Up
7. Windows Installation Procedure
Section: 4
Development Environment Overview
10. Development Environment Overview
11. Course Notes
Section: 5
Introduction to R Basics
13. Introduction to R Basics
14. Arithmetic in R
15. Variables
16. R Basic Data Types
17. Vector Basics
18. Vector Operations
19. Vector Indexing and Slicing
20. Getting Help with R and RStudio
21. Comparison Operators
22. R Basics Training Exercise
23. R Basics Training Exercise - Solutions Walkthrough
Section: 6
R Matrices
24. Introduction to R Matrices
25. Creating a Matrix
26. Matrix Arithmetic
27. Matrix Operations
28. Matrix Selection and Indexing
29. Factor and Categorical Matrices
30. Matrix Training Exercise
31. Matrix Training Exercises - Solutions Walkthrough
Section: 7
R Data Frames
32. Introduction to R Data Frames
33. Data Frame Basics
34. Data Frame Indexing and Selection
35. Overview of Data Frame Operations - Part 1
36. Overview of Data Frame Operations - Part 2
37. Data Frame Training Exercise
38. Data Frame Training Exercises - Solutions Walkthrough
Section: 8
R Lists
39. List Basics
Section: 9
Data Input and Output with R
40. Introduction to Data Input and Output with R
41. CSV Files with R
42. Excel Files with R
43. SQL with R
44. Web Scraping with R
Section: 10
R Programming Basics
45. Introduction to Programming Basics
46. Logical Operators
47. if, else, and else if Statements
48. Conditional Statements Training Exercise
49. Conditional Statements Training Exercise - Solutions Walkthrough
50. While Loops
51. For Loops
52. Functions
53. Functions Training Exercise
54. Functions Training Exercise - Solutions
Section: 11
Advanced R Programming
55. Introduction to Advanced R Programming
56. Built-in R Features
57. Apply
58. Math Functions with R
59. Regular Expressions
60. Dates and Timestamps
Section: 12
Data Manipulation with R
61. Data Manipulation Overview
62. Guide to Using Dplyr
63. Guide to Using Dplyr - Part 2
64. Pipe Operator
65. Dplyr Training Exercise
66. Dplyr Training Exercise - Solutions Walkthrough
67. Guide to Using Tidyr
Section: 13
Data Visualization with R
68. Overview of ggplot2
69. Histograms
70. Scatterplots
71. Barplots
72. Boxplots
73. 2 Variable Plotting
74. Coordinates and Faceting
75. Themes
76. ggplot2 Exercises
77. ggplot2 Exercise Solutions
Section: 14
Data Visualization Project
78. Data Visualization Project
79. Data Visualization Project - Solutions Walkthrough - Part 1
80. Data Visualization Project - Solutions Walkthrough - Part 2
Section: 15
Interactive Visualizations with Plotly
81. Overview of Plotly and Interactive Visualizations
82. Resources for Plotly and ggplot2
Section: 16
Capstone Data Project
83. Introduction to Capstone Project
84. Capstone Project Solutions Walkthrough
Section: 17
Introduction to Machine Learning with R
85. Introduction to Machine Learning
Section: 18
Machine Learning with R - Linear Regression
86. Introduction to Linear Regression
87. Linear Regression with R - Part 1
88. Linear Regression with R - Part 2
89. Linear Regression with R - Part 3
Section: 19
Machine Learning Project - Linear Regression
90. Introduction to Linear Regression Project
91. ML - Linear Regression Project - Solutions Part 1
92. ML - Linear Regression Project - Solutions Part 2
Section: 20
Machine Learning with R - Logistic Regression
93. Introduction to Logistic Regression
94. Logistic Regression with R - Part 1
95. Logistic Regression with R - Part 2
Section: 21
Machine Learning Project - Logistic Regression
96. Introduction to Logistic Regression Project
97. Logistic Regression Project Solutions - Part 1
98. Logistic Regression Project Solutions - Part 2
99. Logistic Regression Project - Solutions Part 3
Section: 22
Machine Learning with R - K Nearest Neighbors
100. Introduction to K Nearest Neighbors
101. K Nearest Neighbors with R
Section: 23
Machine Learning Project - K Nearest Neighbors
102. Introduction to K Nearest Neighbors Project
103. K Nearest Neighbors Project Solutions
Section: 24
Machine Learning with R - Decision Trees and Random Forests
104. Introduction to Tree Methods
105. Decision Trees and Random Forests with R
Section: 25
Machine Learning Project - Decision Trees and Random Forests
106. Introduction to Decision Trees and Random Forests Project
107. Tree Methods Project Solutions - Part 1
108. Tree Methods Project Solutions - Part 2
Section: 26
Machine Learning with R - Support Vector Machines
109. Introduction to Support Vector Machines
110. Support Vector Machines with R
Section: 27
Machine Learning Project - Support Vector Machines
111. Introduction to SVM Project
112. Support Vector Machines Project - Solutions Part 1
113. Support Vector Machines Project - Solutions Part 2
Section: 28
Machine Learning with R - K-means Clustering
114. Introduction to K-Means Clustering
115. K Means Clustering with R
Section: 29
Machine Learning Project - K-means Clustering
116. Introduction to K Means Clustering Project
117. K Means Clustering Project - Solutions Walkthrough
Section: 30
Machine Learning with R - Natural Language Processing
118. Introduction to Natural Language Processing
119. Natural Language Processing with R - Part 1
120. Natural Language Processing with R - Part 2
Section: 31
Machine Learning with R - Neural Nets
121. Introduction to Neural Nets
122. Neural Nets with R
Section: 32
Machine Learning Project - Neural Nets
123. Introduction to Neural Nets Project
124. Neural Nets Project - Solutions
Section: 33
Statistics
Introduction
125. Qualitative Data
126. Frequency Distribution of Qualitative Data
127. Relative Frequency Distribution of Qualitative Data
128. Bar Graph
129. Pie Chart
130. Category Statistics
Quantitative Data
131. Frequency Distribution of Quantitative Data
132. Histogram
133. Relative Frequency Distribution of Quantitative Data
134. Cumulative Frequency Distribution
135. Cumulative Frequency Graph
136. Cumulative Relative Frequency Distribution
137. Cumulative Relative Frequency Graph
138. Stem-and-Leaf Plot
139. Scatter Plot
Numerical Measures
140. Mean
141. Median
142. Quartile
143. Percentile
144. Range
145. Interquartile Range
146. Box Plot
147. Variance
148. Standard Deviation
149. Covariance
150. Correlation Coefficient
151. Central Moment
152. Skewness
153. Kurtosis
Section: 34
Probability Distributions
154. Binomial Distribution
155. Poisson Distribution
156. Continuous Uniform Distribution
157. Exponential Distribution
158. Normal Distribution
159. Chi-squared Distribution
160. Student t Distribution
161. F Distribution
Descriptive Statistics
162. Using Base R to Generate Statistical Indicators
163. Descriptive Statistics with the psych Package
164. Descriptive Statistics with the pastecs Package
165. Determining the Skewness and Kurtosis
166. Computing Quantiles
167. Determining the Mode
168. Getting the Statistical Indicators by Group with DoBy
169. Getting the Statistical Indicators by Group with DescribeBy
170. Getting the Statistical Indicators by Group with stats
Section: 35
Creating Frequency Tables and Cross Tables
171. Frequency Tables in Base R
172. Frequency Tables with plyr
173. Building Cross Tables using xtabs
174. Building Cross Tables with CrossTable
Building Charts
175. Histograms
176. Cumulative Frequency Line Charts
177. Column Charts
178. Mean Plot Charts
179. Scatterplot Charts
180. Boxplot Charts
Checking Assumptions
181. Checking the Normality Assumption - Numerical Method
182. Checking the Normality Assumption - Graphical Methods
183. Detecting the Outliers
Performing Univariate Analyses
184. One-Sample T Test
185. Binomial Test
186. Chi-Square Test for Goodness-of-Fit
Section: 36
Database
Data Extraction, Filtering, and Aggregation
187. Getting Started
188. Writing your first query
189. Filters and Operands
190. Aggregate Functions
191. Grouping Aggregate Data with GROUP BY
Section: 37
Sorting, Conditional Filtering and Fuzzy Comparisons
192. Order By and Limit
193. Conditional Filtering with Case Statements
194. Comparisons using LIKE
195. Filtering the output of a query using HAVING
Section: 38
Multiple Tables and Dates
196. Joining tables together
197. Nested Queries
198. Working with Dates