Welcome to Introduction to R for Data Science Session 6: Linear Regression + EDA, and Normality tests [Linear Regression in R: Exploratory Data Analysis, assumptions of the simple linear model, correlation, and visualization. Predictions from the linear model. Confidence Intervals and Residuals. Inspecting the basic linear model. Influential cases and the Influence Plot.]

The course is co-organized by Data Science Serbia and Startit. You will find all course material (R scripts, data sets, SlideShare presentations, readings) on these pages.

Check out the Course Overview to access the learning material presented thus far.

Data Science Serbia Course Pages [in Serbian]
Startit Course Pages [in Serbian]

Lecturers

dipl. ing. Branko Kovač, Data Analyst at CUBE, Data Science Mentor at Springboard, Data Science Serbia
Goran S. Milovanović, PhD, DataScientist@DiploFoundation, Data Science Serbia

Summary of Session 6, 02. June 2016 :: Linear Regression + EDA, and Normality tests

Session 6 SlideShare
Session 6 R Script
Readings for Session 7

Intro to R for Data Science SlideShare :: Session 6

Introduction to R for Data Science :: Session 6 [Linear Regression in R] from Goran S. Milovanovic

R script :: Session 6

########################################################
# Introduction to R for Data Science
# SESSION 6 :: 2 June, 2016
# Simple Linear Regression in R
# Data Science Community Serbia + Startit
# :: Goran S. Milovanović and Branko Kovač ::
########################################################

# clear the workspace
rm(list = ls())

#### read data
library(datasets)
data(iris)

### iris data set description:
# https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/iris.html

### Exploratory Data Analysis (EDA)
str(iris)
summary(iris)

### EDA plots
# plot layout: 2 x 2, filled column by column
par(mfcol = c(2, 2))
# boxplot: iris$Sepal.Length
boxplot(iris$Sepal.Length,
        horizontal = TRUE,
        xlab = "Sepal Length")
# histogram: iris$Sepal.Length, on the density scale
hist(iris$Sepal.Length,
     main = "",
     xlab = "Sepal Length",
     probability = TRUE)
# overlay the kernel density estimate of iris$Sepal.Length on the histogram
lines(density(iris$Sepal.Length),
      lty = "dashed",
      lwd = 2.5,
      col = "red")
# boxplot: iris$Petal.Length
boxplot(iris$Petal.Length,
        horizontal = TRUE,
        xlab = "Petal Length")
# histogram: iris$Petal.Length, on the density scale
hist(iris$Petal.Length,
     main = "",
     xlab = "Petal Length",
     probability = TRUE)
# overlay the kernel density estimate of iris$Petal.Length on the histogram
lines(density(iris$Petal.Length),
      lty = "dashed",
      lwd = 2.5,
      col = "red")

# NOTE: Boxplot "fences" and outlier detection
# With the default range = 1.5, boxplot() in R draws as outliers
# those data points that are found beyond the INNER fences
# Source: http://www.itl.nist.gov/div898/handbook/prc/section1/prc16.htm
# Q3 = 75th percentile, Q1 = 25th percentile
# IQ = Q3 - Q1; Interquartile range
# lower inner fence: Q1 - 1.5*IQ
# upper inner fence: Q3 + 1.5*IQ
# lower outer fence: Q1 - 3*IQ
# upper outer fence: Q3 + 3*IQ
# A point beyond an inner fence on either side is considered a mild outlier
# A point beyond an outer fence is considered an extreme outlier

# plot variable density in general: Sepal Width
# plot layout
par(mfcol = c(1, 2))
# NOTE: this is kernel density estimation in R. You are not testing any distribution yet.
PLengthDensity