There's a lot of talk about ggplot these days (we even wrote a Python version of it) and for good reason: it's a great plotting package that's easy to use. Despite this, I sometimes find myself wanting something even quicker than ggplot. When that's the case, I turn to base R plots. They're not as pretty and the syntax is a little unpleasant, but they're very fast, work on just about anything, and are often used by the pros. In those respects, they're a lot like UNIX tools such as grep, sed, and awk.
So sit back, relax, and get ready to have some fun with R base plots!
We're using the iris dataset. It's a tried and true classic and while it's not the most exciting data in the world, it's built into R (so you don't need to download anything) and easy to understand.
The other dataset we'll be using is USAccDeaths, which contains monthly totals of accidental deaths in the U.S. from 1973 to 1978. It's also built into R and is a good example of a time series dataset. This will let us show off some of R's handy built-in features for working with time series data.
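Before plotting anything, it helps to peek at what these two objects actually are. Here's a minimal sketch using only base R functions; the choice of which summaries to print is mine, not part of the original walkthrough.

```r
# Both datasets ship with base R, so there is nothing to install or download.

str(iris)               # 150 rows: four numeric measurements plus a Species factor
levels(iris$Species)    # the three species names

class(USAccDeaths)      # "ts", i.e. a time series object
start(USAccDeaths)      # first observation: year 1973, month 1
end(USAccDeaths)        # last observation: year 1978, month 12
frequency(USAccDeaths)  # 12 observations per year (monthly data)
```

Because USAccDeaths carries its own start date and frequency, R already knows how to label the time axis when you plot it.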
OK, first things first: the command to make plots is, you guessed it, plot. More good news: just about every data structure in R is plottable. That's not to say it'll look pretty or even make sense, but you can always try and find out.
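To see how far plot will go on its own, here's a small sketch that hands it three different kinds of objects. The particular objects are my picks for illustration.

```r
# A data frame: plot() draws a scatterplot matrix of every pair of columns.
plot(iris)

# A time series: plot() draws a line chart with time on the x axis.
plot(USAccDeaths)

# A single numeric vector: values are plotted against their position (index).
plot(iris$Sepal.Length)
```

Same function, three different pictures: plot looks at the class of what you pass in and picks a sensible default.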
You can add colors to your points by passing a value to the col parameter.
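As a rough sketch, col can take either a single color or one color per point. The columns and the palette below (species_cols) are assumptions chosen for illustration.

```r
# One color for every point.
plot(iris$Sepal.Length, iris$Sepal.Width, col = "blue")

# One color per species. as.integer() turns the Species factor into its
# level codes (1 = setosa, 2 = versicolor, 3 = virginica), which we use
# to index into a small palette.
species_cols <- c("green", "red", "blue")   # hypothetical palette
point_cols <- species_cols[as.integer(iris$Species)]
plot(iris$Sepal.Length, iris$Sepal.Width, col = point_cols)
```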
If you get tired of prefixing every column with iris$, you can "attach" the data frame: from then on, R looks up bare column names in the attached dataset. Just don't forget to detach it when you're done.
So as an example, let's say we want to plot specific columns on the x and y axes. Instead of having to prefix our variables with iris$, we'll use attach.
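Something along these lines should do it. Which columns go on which axis is my assumption; swap in whichever pair you care about.

```r
attach(iris)                       # iris columns are now visible by bare name
plot(Sepal.Length, Petal.Length)   # no iris$ prefix needed
detach(iris)                       # take iris off the search path again

# with() gives the same convenience for a single expression
# without touching the search path at all.
with(iris, plot(Sepal.Length, Petal.Length))
```

If you tend to forget the detach step, with is the safer habit, since the shortcut only lasts for that one call.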
You might have noticed there's a really weird circle with a cross in the middle of it on the points of our graph. You can assign different styles of points using the pch argument. Point styles can even be assigned to different categories (or "levels" in R) of a variable.
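Here's a quick sketch of both uses of pch. The specific style numbers are just ones I picked; run ?points to see the full set.

```r
# A single point style for every point: pch = 19 is a solid circle.
plot(iris$Sepal.Length, iris$Sepal.Width, pch = 19)

# One point style per level of Species:
# 1 = open circle, 2 = triangle, 3 = plus sign.
point_styles <- c(1, 2, 3)[as.integer(iris$Species)]
plot(iris$Sepal.Length, iris$Sepal.Width, pch = point_styles)
```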
One of my very favorite things about R: histograms! When I made the switch from Excel to R, I had heard tales of mad sorcery where I could replace catalogs of frequency tables with one line of R code.
Histograms are great. They're a super easy way to get a quick feel for what your dataset looks like. So while it's one of the first things I learned in R, it's also one of the things I use the most.
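That one line really is all it takes. A minimal sketch, assuming we want the distribution of sepal widths (the column, bin count, and labels are my choices):

```r
# One line gives a histogram with automatically chosen bins.
hist(iris$Sepal.Width)

# breaks is only a suggestion for the number of bins; R may round it
# to a nearby "pretty" value. The other arguments are cosmetic.
h <- hist(iris$Sepal.Width, breaks = 20, col = "lightblue",
          main = "Sepal width", xlab = "Width (cm)")

# hist() also returns the frequency table behind the picture.
h$breaks   # bin edges
h$counts   # number of observations in each bin
```

So if you ever do need the frequency table itself, it's sitting right there in the return value.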
Density Plots and Legends
To display distributions of different variables on the same plot, I recommend using density plots. The density function computes a kernel density estimate, which is an estimate of the pdf (probability density function) of your variable. This basically gives you a nice, continuous line representing the distribution of your data. We'll use the lines function to add individual distributions with different colors to our plot.
virginica <- subset(iris, Species == "virginica")
versicolor <- subset(iris, Species == "versicolor")
setosa <- subset(iris, Species == "setosa")
# plot distributions for each species
legend(2, 1.2, legend = c("virginica", "versicolor", "setosa"), fill = c("blue", "red", "green"))
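The subsetting and legend calls above leave out the plotting step in between. A complete version might look like the sketch below. Plotting Sepal.Width is my assumption (any numeric column works the same way), and I've used the "topright" keyword for the legend instead of fixed coordinates so it lands somewhere sensible whatever the axis ranges turn out to be.

```r
virginica  <- subset(iris, Species == "virginica")
versicolor <- subset(iris, Species == "versicolor")
setosa     <- subset(iris, Species == "setosa")

# Kernel density estimate for each species (Sepal.Width is an assumption).
d_virginica  <- density(virginica$Sepal.Width)
d_versicolor <- density(versicolor$Sepal.Width)
d_setosa     <- density(setosa$Sepal.Width)

# Size the axes so that all three curves fit on the same plot.
x_range <- range(d_virginica$x, d_versicolor$x, d_setosa$x)
y_max   <- max(d_virginica$y, d_versicolor$y, d_setosa$y)

# plot() starts the figure with the first curve; lines() adds the others.
plot(d_virginica, col = "blue", xlim = x_range, ylim = c(0, y_max),
     main = "Sepal width by species", xlab = "Sepal width (cm)")
lines(d_versicolor, col = "red")
lines(d_setosa, col = "green")

# Legend entries and colors are listed in the same order as the curves.
legend("topright", legend = c("virginica", "versicolor", "setosa"),
       fill = c("blue", "red", "green"))
```

Setting xlim and ylim up front matters because lines never rescales the axes: anything that falls outside the range set by the first plot call is simply cut off.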
So there you have it: the basics about base plots in R. That's all I'll cover today, but if you're interested in learning more here are some other resources: