Welcome to Introduction to R for Data Science, Session 8: Intro to Text Mining in R, ML Estimation + Binomial Logistic Regression [Web-scraping with tm.plugin.webmining. The tm package corpora structures: assessing document metadata and content. Typical corpus transformations and Term-Document Matrix production. A simple binomial regression model with tf-idf scores as features and its shortcomings due to sparse data. Reminder: Maximum Likelihood Estimation with Nelder-Mead from optim().]

The course is co-organized by Data Science Serbia and Startit. You will find all course material (R scripts, data sets, SlideShare presentations, readings) on these pages. Check out the Course Overview to access the learning material presented thus far.

Data Science Serbia Course Pages [in Serbian]
Startit Course Pages [in Serbian]

Lecturers

dipl. ing Branko Kovač, Data Analyst at CUBE, Data Science Mentor at Springboard, Data Science Serbia
Goran S. Milovanović, PhD, DataScientist@DiploFoundation, Data Science Mentor at Springboard, Data Science Serbia

Summary of Session 8, 17. June 2016 :: Intro to Text Mining in R + Binomial Logistic Regression

Web-scraping with tm.plugin.webmining. The tm package corpora structures: assessing document metadata and content. Typical corpus transformations and Term-Document Matrix production. A simple binomial regression model with tf-idf scores as features and its shortcomings due to sparse data. Reminder: Maximum Likelihood Estimation with Nelder-Mead from optim().

Session 8 SlideShare
Session 8 R Script
Further Readings

Intro to R for Data Science SlideShare :: Session 8

Introduction to R for Data Science :: Session 8 [Intro to Text Mining in R, ML Estimation + Binomial Logistic Regression] from Goran S.
Milovanovic

R script :: Session 8

########################################################
# Introduction to R for Data Science
# SESSION 8 :: 16 June, 2016
# Binomial Logistic Regression + Intro to Text Mining in R
# Data Science Community Serbia + Startit
# :: Goran S. Milovanović and Branko Kovač ::
########################################################

# clear
rm(list=ls())

# libraries
library(tm)
library(tm.plugin.webmining)

#### NOTE
#### Suggestion: Skip WebCorpus() calls from the tm.plugin.webmining
#### (skip everything before: # START HERE load)
#### Data set is available from GitHub :: https://github.com/GoranMilovanovic/IntroRDataScience
#### File: Session8.RData
#### Download link: https://goo.gl/cgJ3J3

#### Part I Information Retrieval: ICT market

# 2 categories: dotcom vs hardware companies
# dotCom category:
# NASDAQ:GOOGL is Alphabet Inc, NASDAQ:AMZN is Amazon, NASDAQ:JD is JD.com, NASDAQ:FB is Facebook,
# NYSE:BABA is Alibaba
# hardware category:
# NYSE:HPQ is HP, NASDAQ:AAPL is Apple, KRX:005930 is Samsung Electronics, TPE:2354 is Foxconn,
# NYSE:IBM is IBM
# source: Google Finance

# search queries: .com vs. hardware companies
searchQueries