by Ali Zaidi, Data Scientist at Microsoft

In a previous post, we showcased the use of the sparklyr package for manipulating large datasets using a familiar dplyr syntax on top of Spark HDInsight Clusters. In this post, we will take a look at the RxSpark API for R, part of the RevoScaleR package and the Microsoft R Server distribution of R on HDInsight. We'll use RxSpark to visualize a dataset of 140M taxi rides between boroughs in New York City.

Dealing with data in distributed storage and programming with concurrent systems often require learning complicated new paradigms and techniques. Statisticians and...