In the last post, we focused on preparing a tidy dataset describing consumer perceptions of beverages. In this post, I'll describe some analyses I've been doing with these data in order to better understand how consumers perceive the beverage category. This type of analysis is often used in sensographics: companies that produce food products (chocolate, sauces, etc.) conduct research to understand the "product space," i.e. the way in which consumers organize a product category along relevant perceptive dimensions, and the place that different products occupy within that space.

In order to accomplish this goal, we will use PCA (principal components analysis) to analyze the beverage dataset produced in the previous post. Principal components analysis is a technique that summarizes a set of variables in a lower-dimensional space. In the current case, we have variables describing a number of consumer perceptive dimensions (e.g. happy, relaxed, etc.). PCA allows us to find a smaller number of uncorrelated components that describe most of the variation in these variables. Within this reduced-dimensional space, we will be able to better understand the relationships among the variables (i.e. the factors that underlie or group the consumer perceptions) and the attributes of the beverages (i.e. where the beverages lie within this lower-dimensional space).

For an excellent introduction to PCA, I highly recommend these wonderfully clear videos from the Hastie and Tibshirani "Introduction to Statistical Learning" online course. François Husson, a developer of the R package we'll use to do the PCA, has a great open course about sensographics on YouTube (in French only), and some interesting tutorials about PCA in English.

The Data

For a detailed description of how this dataset was created, please see the previous post. In sum, we have a dataset with 1 row per beverage.
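Before looking at the columns, here is a minimal sketch of the PCA idea described above. It is only an illustration under stated assumptions: it runs on simulated data (not the beverage dataset), the variable names are made up, and it uses base R's prcomp() rather than FactoMineR.

```r
# Toy illustration only: simulated data, not the beverage dataset.
set.seed(42)
n <- 200
latent <- rnorm(n)  # one hidden factor driving two of the variables

toy <- data.frame(
  happy  = latent + rnorm(n, sd = 0.3),  # driven by the hidden factor
  joyous = latent + rnorm(n, sd = 0.3),  # driven by the hidden factor
  tired  = rnorm(n)                      # unrelated to the hidden factor
)

# prcomp() is base R's PCA; scale. = TRUE standardizes each variable first
toy_pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# proportion of the total variance captured by each component
var_explained <- toy_pca$sdev^2 / sum(toy_pca$sdev^2)
print(round(var_explained, 2))

# correlations between the component scores
print(round(cor(toy_pca$x), 2))
```

In this toy setup, the first component should absorb most of the variation shared by the two correlated variables, and the correlations between the component scores are zero up to rounding.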
For each beverage, we have information on the following consumer perceptions: Creative, Energetic, Joyous, Focused, Happy, Relaxed, Tired and Excited. We also have columns representing the category that each beverage falls into (there are 3 categories of beverages in these data), the drink name, the overall rating score, and the number of consumers who rated each beverage. For this analysis, I have selected all of the beverages (rows) in the dataset that had more than 100 consumer ratings. This was an analytical choice, made so that we only analyze beverages for which we have a reasonable number of consumer evaluations.

The head of the dataset, which is called "feelings," looks like this:

Name       Category    Rating  Num_Obs  Creative  Energetic  Joyous  Focused  Happy  Relaxed  Tired  Excited
Drink 268  Category 2    6.71      383      0.40       0.22    0.49     0.22   0.35     0.03   0.11     0.38
Drink 483  Category 2    8.42      105      0.31       0.30    0.50     0.30   0.58     0.57   0.16     0.49
Drink 327  Category 2    8.19      159      0.52       0.37    0.50     0.29   0.50     0.30   0.18     0.50
Drink 79   Category 3    6.24      119      0.39       0.18    0.43     0.18   0.34     0.01   0.26     0.26
Drink 419  Category 1    8.14      173      0.45       0.34    0.54     0.24   0.57     0.31   0.12     0.55
Drink 698  Category 2    8.33      124      0.15       0.23    0.23     0.44   0.33     0.73   0.09     0.34

As mentioned in the previous post, the consumer perceptive dimensions here represent the proportion of consumer reviews that flagged a given attribute as present. As an example, 40% of consumers said that Drink 268 (the first row in our dataset) makes them feel "creative."

PCA

We will use the FactoMineR package to compute the PCA. FactoMineR is a really great package for exploratory data analysis, and it provides a great deal of output that we can use to visualize the results of the PCA.

Before we begin, let's go over the distinction between two important terms for the PCA implementation in FactoMineR. The first is the variables. These are the measured dimensions that we have information on; in the current example, the perceptive dimensions (e.g.
happy, relaxed) are our variables, and they are contained in the columns of our dataframe. The second is the individuals. These are the observations, or the units for which we have measured the information contained in the variables; in the current example, the beverages (e.g. Drink 1, Drink 2, etc.) are the individuals, and they are contained in the rows of our dataframe.

The PCA Code

We first make the beverage names the row names of our dataframe. (This is useful if we want to plot the names of the beverages in the built-in FactoMineR graphs.) We specify that we want to standardize (or scale) our variables; it's important to do this in PCA because the variance of each variable, which depends directly on its scale, heavily influences its contribution to the analysis. By standardizing our variables prior to analysis, we ensure that the PCA is not dominated by some variables purely because of the size of their variation. We specify that we want 5 principal components (via the ncp argument, which stands for "nombre de composantes principales"; the package developers are French). Finally, we indicate the "Category" variable as a supplementary variable. In FactoMineR, supplementary variables are not used in the analysis itself, but rather are used to interpret the results of the analysis. We store the results of our PCA in an object called "PCA_feelings."

# make the row names of the dataframe the beverage names
# useful if you want to plot the beverage names
# on the PCA plot
rownames(feelings)