Introduction to R and RStudio

A Beginner's Guide to Using the R Programming Language

Author

Stephen Colegate

Published

August 9, 2024

1 Introduction

R is a free, open-source software program that is available for Windows, macOS, and Linux operating systems (R Core Team 2013). Because of this, many statisticians, biostatisticians, and data scientists use R to meet their data processing needs. The RStudio Desktop software runs the R programming language in a user-friendly environment.

In this tutorial, we will show how to install R and RStudio on your computer, try out basic R commands, source files, save and load data, create an R project, and install and load R packages.

2 Installing R and RStudio

R is a programming language that is primarily used through a command prompt: you type in a command and R executes it, one command at a time. RStudio is an integrated development environment that runs on top of R. It is possible to run R without using RStudio; however, RStudio comes with so many useful features that having it is almost essential. In this vignette, we will be using R entirely within the RStudio program. When you are using RStudio, you are working with the R programming language, just with the added benefits and quality-of-life features that you would expect from any computer software.

Warning

You must install the R programming language first and the RStudio program second. While it is possible to use R without having to install or use RStudio, you must have R installed on your computer before you can install and use RStudio.

2.1 Installing R

The latest version of R available as of the time of this writing is R-4.4.1. You can install R on your computer by going to https://cran.r-project.org/ and selecting the R version that is appropriate for your operating system.

Download the base distribution of R from the website and follow the prompts to install R on your computer. This is what you want to click on if you are installing R for the very first time.

Click base to install R for the first time.

This should then take you to a splash page where you can install the version of R specific to your operating system. Click Download R-4.4.1 to download the executable file.

The Download R page for the Windows operating system.

Go to your downloads folder (or wherever your browser saves files downloaded from the internet) and look for the R-4.4.1 installer for your operating system (on Windows, this is an .exe file). Open the installer to begin the installation process. Follow the on-screen prompts to install R on your computer.

Note

Older versions of R for Windows supported both 32-bit (denoted i386) and 64-bit (denoted x64) operating systems. Computers that are 32-bit can only run 32-bit software, while 64-bit computers can run both 32-bit and 64-bit software. Since most computers today support 64-bit software, it is highly recommended that the 64-bit version of R is used. Starting with R version 4.2.0, only the 64-bit version is provided for Windows.
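Once R is installed and open, the version and build can be confirmed from the console. The following is a minimal sketch using only objects built into base R; the variable names r_version and pointer_bits are illustrative and not part of the original tutorial.

```r
# Report the installed R version as a single string
r_version <- R.version.string
print(r_version)

# Pointer size in bytes times 8: 64 on a 64-bit build, 32 on a 32-bit build
pointer_bits <- .Machine$sizeof.pointer * 8
print(pointer_bits)

# Architecture label reported by this build of R
print(R.version$arch)
```

If pointer_bits prints 64, you are running the 64-bit version of R.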

2.2 Installing RStudio

After installing R on your computer, you will also need to install RStudio Desktop - an integrated development environment (IDE) to help data scientists be more productive with R (RStudio Team 2020). RStudio is developed by Posit and makes R easier to use, especially if you are using R for the first time. When you open RStudio, you are also opening the R programming console within it.

Go to the RStudio Desktop webpage. If you have not installed R yet, you can click Download and Install R on the webpage to begin the installation process. Click Download RStudio Desktop for your operating system. The Posit website should automatically direct you to the version of RStudio you need to download and install on your computer. After you click the download link, the installer for RStudio will begin to download. Once it has downloaded, run the installer and follow the installation process.

The Posit site for downloading RStudio.

Once you have both R and RStudio installed, open up the RStudio software. Look for the RStudio icon in your list of programs and click the icon to open it.

3 Opening and Using R

A typical layout within RStudio has four different quadrants. Starting from the top-left clockwise:

The default RStudio layout for a typical R session.
  1. Source Pane: This panel displays all the R scripts you have opened. Code should be written in this panel. You can save R code within the panel to work on later. Any .R files you open with RStudio will be displayed in this panel. You can have multiple R scripts open at a time. R scripts with unsaved changes will have their tab titles written in red text. R scripts that have been saved will have their titles written in black text. You can change the name of the R script by right-clicking the R script tab name.

    Caution

    If you have opened RStudio for the very first time, no R scripts will be open. Therefore, the Source Pane (1) will not appear in the R environment. When you open an R script, the Source Pane will appear.

  2. Console Pane: This panel displays the actual R console, including all the code that has been run and the output produced so far. The command prompt is designated by the > symbol. You can type code here at the command prompt and then press Enter to execute what you typed. Try this out by typing 2 + 2 at the command prompt and pressing Enter. When you click Run in the Source Pane, the line of code from the R script is pasted into the command prompt and is automatically executed for you.

  3. Environment Pane: This panel houses several tabs. The Environment tab lists all the R objects and variables available during the R session. The History tab keeps track of all code that has been executed in the Console. More tabs may be displayed depending on the type of project you may be working on.

  4. Output Pane: This panel includes a file viewer, plots that have been made, list of packages available and currently loaded, help documentation, a viewer, and a presentation tab.

3.1 R Scripts

You can save code commands by typing code into a .R file. This kind of file is called an R script. A major advantage of using the R programming language is its reproducibility - the user can write code in an R script, execute the code in the console to produce output, save the R script, and return later to re-run the code in the script and obtain the same results as before. This makes it easy for someone else to take the code, run it, and obtain the same results on their computer. You can identify R scripts by the file extension .R. Common file types that R and RStudio use to save R commands include R scripts (.R), R Markdown documents (.Rmd) and Quarto documents (.qmd). This vignette, for example, was compiled from a .qmd file within RStudio.

To open a new R script in RStudio, click on File > New File > R Script. A new R script opens in the Source Pane. You can then write code into the R script and save it for later. To save your R script for the first time, click on the disk icon to bring up the navigation window. Type in a name of the file with the .R or .r suffix at the end of the name and click Save.

Important

Make sure you always include the .R file extension at the end of the file name when naming and saving R scripts. RStudio will not add this file extension on for you automatically. If you do not include the .R suffix, your computer may not recognize how to open the file.

Tip

As you make changes on the R script within RStudio, the file name in the tab turns red to let you know there are unsaved changes present. When you click on the disk icon to save your changes, the file name turns black.

If you close RStudio without saving, you will be asked if you’d like to save your changes to all your R scripts. RStudio will also ask if you’d like to save your Workspace Image - a temporary file that allows you to resume your R session after you close RStudio.

Tip

For most purposes in R, it is not necessary to save your Workspace Image, since all code can easily be re-run in a new R session, so click No when prompted.

By default, RStudio should open a .R file if you click on it. If your computer does not recognize the .R file extension, you must tell your computer to open the file using RStudio. You can open an .R file within RStudio by following these steps:

  1. Open RStudio.
  2. Click on File > Open File. A navigation window should appear.
  3. Navigate to the location of the .R file you want to open and select the file.
  4. Click Open.

3.2 Running R Code

R code is executed one line at a time. To run a line, place your cursor on the line you wish to run in the Source Pane and click on Run. The line of code is then sent to the Console Pane and is executed. This rendered HTML page of all the code and relevant output is hosted online on our GitHub page. All the R code that is available on this webpage can be copied and pasted into your R session. For each R code block, hover your mouse over the top-right corner and click Copy to Clipboard and then paste all the code from the code block into your R script. For example, try running the following code block below:

download.file("https://raw.githubusercontent.com/geomarker-io/purple_air_data_in_R/main/purpleair.R",
              destfile = "purpleair.R")

After copying and pasting this R code into the R script, run the entire command by putting your cursor at the beginning of the first line and clicking Run at the top of the Source Pane.

Tip

The above R command is a single command, even though it is written across two lines. R code that is too long to fit on one line may continue on the next line and so on. When the command is executed in the console, R recognizes that the command is not finished at the comma (the closing ) symbol completes it) and continues reading subsequent lines until the command is complete.
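To see this behavior with a smaller command, here is a short sketch that uses the base R function sum() instead of download.file(). The object names total and total_one_line are only for illustration.

```r
# The command is incomplete at the end of the first and second lines,
# so R keeps reading until it reaches the closing parenthesis
total <- sum(1,
             2,
             3)
print(total)

# The same command written on a single line gives the same result
total_one_line <- sum(1, 2, 3)
print(identical(total, total_one_line))
```

Both versions assign the same value, so how you break a long command across lines is mostly a matter of readability.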

The line of code is then immediately placed into the R console in the Console Pane and is executed. When you run this R command, R goes to this GitHub page, looks for a file on that page called purpleair.R and downloads it to your current working directory folder on your computer (see Section 4.1 for more details about working directories). You can then navigate to the Output Pane under the Files tab and look for the purpleair.R file there. Notice this is an R script because of the .R file extension! Clicking on this file within RStudio should open the R script in the Source Pane.

Tip

Try running all the available R code in this file to become familiar with the R programming language. Copy and paste each R code block within this webpage into an R script, then click Run after each command you paste in.

You can highlight multiple lines of code and run them all at once as well. Optionally, you can run lines of code by using the keyboard shortcut CTRL + ENTER (or equivalent). To get started, here are some simple lines of code:

# Assign values
x <- 2    # assign the letter 'x' the value 2
y <- 9    # assign the letter 'y' the value 9
Tip

If a line of code begins with the special character #, that line is called a comment. Comments are useful for describing what certain lines of code do. R will not execute lines that begin with #. Likewise, R will not execute anything that follows a # symbol on the same line. As a result, the # symbol can also be used to comment out lines of code that do not work or that you do not wish to execute, without having to delete them. In RStudio, comments are easily identified because they are written in green.

Highlight the first line of code in the Source Pane x <- 2. The special character <- assigns the value provided on the right (2) to an object on the left (x). Hence, this line will set the variable x with the value of 2. The next line of code will assign the value 9 to the variable y. You can see that these values have been declared by clicking on the Environment tab in the Environment Pane.

With the variables x and y now declared, we can then use them to perform some simple operations:

# Perform simple operations
x + y     # equivalent to 2 + 9
[1] 11
sqrt(y)   # sqrt = square-root
[1] 3
Warning

R is case sensitive, so be very careful with how you name objects, files, options, and data frames within R. For example, even though PM25, pm25, pm2.5, and PM2.5 look similar, R treats them as 4 different names. For convenience, we try to stick to lowercase letters when coding in R to avoid these discrepancies.
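The following sketch illustrates the point with three of the names mentioned above. The values assigned are arbitrary and chosen only so that the objects can be told apart.

```r
# Three similar-looking names that R treats as separate objects
pm25  <- 10
PM25  <- 20
pm2.5 <- 30

# Each name keeps its own value
print(c(pm25, PM25, pm2.5))

# A name with different capitalization has not been created
print(exists("Pm25"))
```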

Since x and y have been declared, R then calculates x + y and sqrt(y). These results change depending on what the values of x and y are. As an exercise, change the values of x and y and rerun these lines again. Here, sqrt() is a function - functions perform more complicated tasks within R.

Warning

Make sure you declare both the x and y variables first before running x + y and sqrt(y), or else R will report an error message.
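As a rough illustration of what happens when a variable has not been declared, the sketch below uses a hypothetical name, undeclared_value, that is never assigned. The tryCatch() function is used here only to capture the error message so that the rest of the code keeps running; it is not something the tutorial requires.

```r
# 'undeclared_value' is a hypothetical name that has never been assigned
msg <- tryCatch(
  undeclared_value + 1,
  error = function(e) conditionMessage(e)
)
print(msg)

# exists() checks whether a name has been declared without raising an error
print(exists("undeclared_value"))
```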

R can work with data set objects, called data frames. R has a built-in data frame called mtcars. You can learn more information about any function or built-in data set by opening the help window using the help() function:

# Display help documentation for 'mtcars'
help(mtcars)
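Besides reading the help page, a few base R functions give a quick overview of a data frame. This is a small sketch; the object name car_dims is illustrative.

```r
# Number of rows and columns in the data frame
car_dims <- dim(mtcars)
print(car_dims)

# Names of the columns
print(names(mtcars))

# Compact summary of each column and its type
str(mtcars)
```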

We display the first several rows of this data frame with the head() function. By default, the first 6 rows are displayed. We can change this by declaring an option - an additional argument you can provide to an R function to modify its behavior. For example, the option n=10 in the head() function changes the number of rows displayed to 10:

# Display first n=10 rows of the example cars data 
head(mtcars, n=10)
                   mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4         21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag     21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710        22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive    21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout 18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant           18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360        14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D         24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230          22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280          19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4

All options are specified inside the parentheses of a function call. Each option name is followed by the = operator and then the value to pass on to the function. You will see more examples of using options later on.
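Options can be given by name or, in many cases, by position. The sketch below compares the two styles using head() and the base R function round(); the object names are illustrative.

```r
# Passing the option by name
by_name <- head(mtcars, n = 3)

# Passing the same value by position (second argument of head())
by_position <- head(mtcars, 3)

# Both calls return the same rows
print(identical(by_name, by_position))

# Another example: round() with the 'digits' option
rounded <- round(3.14159, digits = 2)
print(rounded)
```

Naming the option, as in n = 3, usually makes code easier to read, especially for functions with many options.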

We can reference a column within a data frame by writing the name of the data frame, then the $ symbol, and then the name of the column. The $ operator is a special character in R that references a column by name within a data frame. In the example above, if we only want to examine the miles per gallon of these cars, we can reference the mpg column within mtcars as follows:

# Reference the 'mpg' column within 'mtcars'
mileage <- mtcars$mpg
head(mileage, n=20)
 [1] 21.0 21.0 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 17.8 16.4 17.3 15.2 10.4
[16] 10.4 14.7 32.4 30.4 33.9
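Once a column has been extracted, it behaves like any other vector of numbers. The sketch below also shows the double-bracket form, which is an alternative to $ in base R; the name mileage_alt is illustrative.

```r
# Extract the column with the $ operator
mileage <- mtcars$mpg

# Extract the same column using double brackets and the column name
mileage_alt <- mtcars[["mpg"]]
print(identical(mileage, mileage_alt))

# Summarize the extracted values
print(length(mileage))   # number of cars
print(mean(mileage))     # average miles per gallon
print(max(mileage))      # highest miles per gallon
```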

4 Saving/Loading

R is useful because you can load in your own data, functions, and R scripts. In order for R to read in your files, it needs to know where to look for those files. It is also convenient to set a file location on your computer where you will create a file to save your work and/or load in files into your R session. In this section, we will explain how to set the working directory and to demonstrate how you can save and load files into your R session.

4.1 Working Directory

The first thing to do when starting a new R session is to set the working directory - a specific folder location on your computer. Whenever R looks for a file on your computer, it will always check the working directory first. The default working directory may be different depending on how RStudio starts. You can verify the default working directory in RStudio by clicking on Tools > Global Options and looking under the Basic tab in the General section. However, if you start RStudio by clicking on an R script on your computer, RStudio should (by default) display that R script in the Source Pane and set the working directory to the folder where that R script is located. Because the working directory may change depending on how RStudio opens, it is always a good idea to set and verify the working directory before running any code.

Checking your working directory also ensures that your files will be read and saved in the correct location. If you try to load a file with the wrong working directory, R (probably) won’t find the file it is looking for and will print an error message in the console saying no such file exists there. In addition, saving files without verifying the working directory may cause those files to be saved in an unexpected location on your computer.

You can use the function getwd() to obtain the current working directory:

# Check the current working directory
getwd()
[1] "C:/Users/wiinu/OneDrive - cchmc/Documents/Rintro"

This should return the file path of a folder on your computer. Files created within the current R session will be saved to this location unless you specify a different path. By default, R searches this directory whenever it is asked to read in contents from a file.

If the file path displayed by getwd() is incorrect, you can change the working directory by following these steps:

  1. In the RStudio window at the top, click Session > Set Working Directory > Choose Directory…. A navigation window should appear, showing you the contents of the root folder of the working directory.

    Changing the working directory at the beginning of an R session.
  2. Navigate to the root folder where you would like to set your working directory.

  3. Click Open.

  4. R automatically sets the new working directory using the function setwd() in the Console Pane.

    Tip

    Feel free to copy this line of code from the Console Pane to an R script. Make sure to remove the > symbol (the command prompt) at the beginning of the line when doing so. By copying in the setwd() statement into the R script, you can then set the working directory by simply running this line of code.

  5. Verify that you have set your working directory correctly using the getwd() function again.
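The steps above can also be carried out entirely in code. The sketch below uses a temporary folder created with tempdir() so that it can be run safely on any computer; the folder name rintro_demo is hypothetical, and in practice you would use the path to your own folder.

```r
# Remember the current working directory so it can be restored later
old_dir <- getwd()

# Create a demonstration folder inside R's temporary directory
demo_dir <- file.path(tempdir(), "rintro_demo")
dir.create(demo_dir, showWarnings = FALSE)

# Set the working directory, then verify it
setwd(demo_dir)
print(basename(getwd()))

# Restore the original working directory
setwd(old_dir)
print(getwd())
```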

Important

Always start a brand new R session before changing the working directory. The working directory should always be set first before performing any tasks.

4.2 Save/Load Data

Suppose we wish to save mtcars in a file named cardata.rds. Here, the .rds file extension refers to an R Data Serialization file - a convenient way to save and read in data frame objects in R.

# Save data frame as a file
saveRDS(mtcars, file = "cardata.rds")

We can then use the same file name to load the file back into R.

# Load data frame into the R session
readRDS(file = "cardata.rds")
                     mpg cyl  disp  hp drat    wt  qsec vs am gear carb
Mazda RX4           21.0   6 160.0 110 3.90 2.620 16.46  0  1    4    4
Mazda RX4 Wag       21.0   6 160.0 110 3.90 2.875 17.02  0  1    4    4
Datsun 710          22.8   4 108.0  93 3.85 2.320 18.61  1  1    4    1
Hornet 4 Drive      21.4   6 258.0 110 3.08 3.215 19.44  1  0    3    1
Hornet Sportabout   18.7   8 360.0 175 3.15 3.440 17.02  0  0    3    2
Valiant             18.1   6 225.0 105 2.76 3.460 20.22  1  0    3    1
Duster 360          14.3   8 360.0 245 3.21 3.570 15.84  0  0    3    4
Merc 240D           24.4   4 146.7  62 3.69 3.190 20.00  1  0    4    2
Merc 230            22.8   4 140.8  95 3.92 3.150 22.90  1  0    4    2
Merc 280            19.2   6 167.6 123 3.92 3.440 18.30  1  0    4    4
Merc 280C           17.8   6 167.6 123 3.92 3.440 18.90  1  0    4    4
Merc 450SE          16.4   8 275.8 180 3.07 4.070 17.40  0  0    3    3
Merc 450SL          17.3   8 275.8 180 3.07 3.730 17.60  0  0    3    3
Merc 450SLC         15.2   8 275.8 180 3.07 3.780 18.00  0  0    3    3
Cadillac Fleetwood  10.4   8 472.0 205 2.93 5.250 17.98  0  0    3    4
Lincoln Continental 10.4   8 460.0 215 3.00 5.424 17.82  0  0    3    4
Chrysler Imperial   14.7   8 440.0 230 3.23 5.345 17.42  0  0    3    4
Fiat 128            32.4   4  78.7  66 4.08 2.200 19.47  1  1    4    1
Honda Civic         30.4   4  75.7  52 4.93 1.615 18.52  1  1    4    2
Toyota Corolla      33.9   4  71.1  65 4.22 1.835 19.90  1  1    4    1
Toyota Corona       21.5   4 120.1  97 3.70 2.465 20.01  1  0    3    1
Dodge Challenger    15.5   8 318.0 150 2.76 3.520 16.87  0  0    3    2
AMC Javelin         15.2   8 304.0 150 3.15 3.435 17.30  0  0    3    2
Camaro Z28          13.3   8 350.0 245 3.73 3.840 15.41  0  0    3    4
Pontiac Firebird    19.2   8 400.0 175 3.08 3.845 17.05  0  0    3    2
Fiat X1-9           27.3   4  79.0  66 4.08 1.935 18.90  1  1    4    1
Porsche 914-2       26.0   4 120.3  91 4.43 2.140 16.70  0  1    5    2
Lotus Europa        30.4   4  95.1 113 3.77 1.513 16.90  1  1    5    2
Ford Pantera L      15.8   8 351.0 264 4.22 3.170 14.50  0  1    5    4
Ferrari Dino        19.7   6 145.0 175 3.62 2.770 15.50  0  1    5    6
Maserati Bora       15.0   8 301.0 335 3.54 3.570 14.60  0  1    5    8
Volvo 142E          21.4   4 121.0 109 4.11 2.780 18.60  1  1    4    2
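In practice, you will usually want to assign the loaded data to an object rather than print it. The following sketch saves the file to R's temporary directory (an assumption made so the example does not depend on your working directory) and checks that the data survive the round trip; the names rds_path and cars_loaded are illustrative.

```r
# Build a file path inside the temporary directory
rds_path <- file.path(tempdir(), "cardata.rds")

# Save the data frame, then confirm the file now exists
saveRDS(mtcars, file = rds_path)
print(file.exists(rds_path))

# Load the file and assign the result to a new object
cars_loaded <- readRDS(file = rds_path)

# The loaded object matches the original data frame
print(identical(cars_loaded, mtcars))
```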
Important

By default, the cardata.rds file is saved on your computer in the current working directory (see Section 4.1 to learn how to change your working directory to save the file in a different location). If you move the cardata.rds file to a different location, you should also move its associated files with it, or else you will have to change your working directory during the R session - something to avoid doing in practice. To avoid problems like this, it is always a good idea to keep cardata.rds and its associated files together in one folder. Better yet, create an R project in RStudio, which always designates the folder that it was created in as the working directory. Whenever the R project is opened, the working directory will automatically be set to that folder’s location. See Section 5.1 to learn more about creating R projects within RStudio.

R can work with a variety of file formats besides the .rds format. Common formats include .csv, .txt, and SPSS (.sav) files. Excel spreadsheets can also be read in by using the readxl package. The easiest way to import a data file for the first time is to use the Import Dataset feature in the Environment tab of the Environment Pane. All code needed to read in data files for this vignette will be provided, but you may have to alter the file path locally for the code to work properly on your computer.
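As an example of another format, the sketch below writes mtcars to a .csv file and reads it back using the base R functions write.csv() and read.csv(). The file is placed in R's temporary directory, and the names csv_path and cars_csv are illustrative.

```r
# Build a file path inside the temporary directory
csv_path <- file.path(tempdir(), "cardata.csv")

# Write the data frame to a .csv file, keeping the car names as the first column
write.csv(mtcars, file = csv_path, row.names = TRUE)

# Read the file back in, using the first column as row names
cars_csv <- read.csv(csv_path, row.names = 1)

# Check the size and first rows of the imported data
print(dim(cars_csv))
print(head(cars_csv, n = 3))
```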

4.3 Source R files

Setting the working directory first allows you to also read in other R scripts from that file location. This can be achieved by using the source() function. This is a built-in function in R for loading .R files, so you should avoid creating your own function named source, since doing so would hide the built-in one. Special functions are easily identified in RStudio, as they are written with blue text.

Sourcing R files is a great way to save personal information in a separate R script. Whenever you need to pull sensitive information from another R script, you can then source (actually load in) the file and read in its entire contents. This keeps your sensitive information in a separate location, allowing you to keep your main R script free of personal data such as personal tokens or private URLs you do not wish to share with others.

As an example, suppose we need an Application Programming Interface (API) Key to access personal information. This API Key is a unique character string (text in quotes) that grants the user who knows the API Key access to a database. Only those who know what the API Key is can get access to the data. Those who do not know what the API Key is will not be able to pull the data. API Keys are treated like passwords - they should not be shared with anyone else! When working on an R script, avoid specifying the API Key directly. That way, whenever the R script is shared with others, personal information like API Keys will not be included and shared with others.

Here’s an example of how to source an R script:

  1. Create a new R script. Click File > New File > R Script. A new R script opens in the Source Pane.

  2. Copy and paste this one line into the new R script:

    API = "a1b2c3d4e5f6g7h8i9"

    The variable name API will contain our unique API Key a1b2c3d4e5f6g7h8i9. The = operator works just like the <- operator above. Do not add anything else into the R script.

    Tip

    There is an example API_KEY.R file on our GitHub page you can also download and save to your working directory. This example file has the same unique API Key as shown here, effectively bypassing this step.

  3. Save the R script with the file name API_KEY.R by clicking on File > Save As, navigating to your current working directory (see Section 4.1), and typing in the file name API_KEY.R.

  4. In the current R script, load in the API_KEY.R file using the source() function:

    # Load in the API_KEY.R file containing API Key
    source("API_KEY.R")
  5. When this line is run (assuming the working directory is set correctly and the file saved to that location), R should look inside the working directory and look for a file named API_KEY.R. When it finds it, R then reads all its contents. Examining the Environment Pane shows there is now a new variable API which contains the API Key as a character string. Now you can reference the variable API in any functions that require the use of an API Key.
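The steps above can be sketched in code as follows. To keep the example self-contained, the key file is written to R's temporary directory rather than your working directory, and it is sourced into a separate environment (key_env, an illustrative name) instead of the global environment used in the steps above. Both choices are assumptions made for this demonstration only.

```r
# Write a one-line R script containing the example API Key
key_file <- file.path(tempdir(), "API_KEY.R")
writeLines('API <- "a1b2c3d4e5f6g7h8i9"', key_file)

# Source the script into its own environment
key_env <- new.env()
source(key_file, local = key_env)

# The variable defined in the script is now available
print(ls(key_env))
print(nchar(key_env$API))
```

Calling source("API_KEY.R") without the local option, as in Step 4, places the variable API directly in your global environment instead.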

Tip

Another advantage of using the source() function to read in API Keys is that you never have to type the API Key directly in the main R script. You can then share the R script with others without having sensitive information contained within.

4.4 The .Renviron File

It is not ideal to leave sensitive data in an R script. If you share any documents that include your personal information with someone else, that information will become compromised. A better solution is to save sensitive information in a separate location that keeps it private. Then, whenever you need that information, R already has it available for you to use. You do not need to specify the sensitive information yourself.

One file suited to storing credentials is the .Renviron file. Unlike an R script, the .Renviron file holds environment variables rather than R objects, and it does not need to be in the working directory. When a new R session begins, either by opening RStudio or by restarting the R session manually, R reads the contents of the .Renviron file and sets each entry as an environment variable. These variables do not appear in the Environment Pane or in the user’s current working directory, but they can be retrieved within the R session using the Sys.getenv() function. This setup is useful for API credentials, since the user is able to keep their API key out of their R scripts while still being able to use it within an R session. Click here to learn more about setting up API credentials.

Follow these steps to set up a secure method of loading your API Key into your R environment:

  1. Open the .Renviron file in RStudio by running the following code:

    # Run this to open .Renviron file
    usethis::edit_r_environ()
    Caution

    You must have the usethis package installed in your User Library to use this function. If you do not have this package installed, run install.packages('usethis') before trying this line of code. See Section 6.1 to learn more about installing R packages and Section 6.2 to learn more about loading R packages.

    The edit_r_environ() function from the usethis package will open a tab named .Renviron in the Source Pane (see next image below).

  2. Copy and paste the following line into the .Renviron file that just opened. Click on the clipboard icon on the top right of the code block inside the vignette to copy the line.

    API = "a1b2c3d4e5f6g7h8i9"

    The .Renviron file when opened in RStudio. Copy the line of code provided above into this file (Step 2).
  3. Save the .Renviron file and then close the tab.

  4. Restart the R session by clicking Session > Restart R. You may be prompted to save any unsaved changes you have made to any R scripts you have opened or the .Renviron file.

    Caution

    Remember to reload any R packages that you may need using the library() function whenever you restart an R session. If a function within an R package is not found, it is likely because the R package that contains the function is not loaded. See Section 6.2 to learn how to load packages.

  5. When a new R session begins (either by restarting R or by closing and opening RStudio), R will check the contents of the .Renviron file and set the variable name API as the character string "a1b2c3d4e5f6g7h8i9". Verify that this is the case by running the following code:

    # Check that your API key is set
    Sys.getenv("API")
    [1] "a1b2c3d4e5f6g7h8i9"

    The Sys.getenv() function retrieves the environment variable API, which holds the value "a1b2c3d4e5f6g7h8i9" that you entered in Step 2.

    Caution

    The API Key that you pasted in the .Renviron file will still work even if you are outside the R project. Be careful that any variables that you paste inside the .Renviron file are not also used in the current R script.

This process only needs to be completed once for each API Key that you use. That is, the API Key that you pasted in the .Renviron file will be set every time a new R session begins. If you wish to use a different API Key, repeat Steps 1-5, replacing the old API Key in Step 2 with the new API Key.
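To experiment with environment variables without editing your real .Renviron file, the sketch below writes a temporary file and loads it with the base R function readRenviron(). The variable name DEMO_API_KEY and the temporary file are hypothetical and used only for this demonstration.

```r
# Write a temporary file in the same name=value format used by .Renviron
env_file <- tempfile(fileext = ".Renviron")
writeLines('DEMO_API_KEY="a1b2c3d4e5f6g7h8i9"', env_file)

# Load the file; each entry becomes an environment variable
readRenviron(env_file)

# Retrieve the value and confirm it is not empty
demo_key <- Sys.getenv("DEMO_API_KEY")
print(nzchar(demo_key))

# A variable that was never set returns the 'unset' value instead
missing_key <- Sys.getenv("DEMO_KEY_NOT_SET", unset = NA)
print(is.na(missing_key))

# Remove the demonstration variable from the current session
Sys.unsetenv("DEMO_API_KEY")
```

Checking for an empty or missing value before using an API Key is a simple way to catch a .Renviron file that was not saved or an R session that was not restarted.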

5 R Projects

If you are frequently reading in files and saving files to your computer, it is more convenient to create an R project to accomplish these tasks. R projects not only set your working directory every time the R project is opened, but they also house multiple R files, data files, and more all in one place. This is very convenient if you have multiple R scripts or data files you need to open and work with - opening the R project will automatically open up all these files from where you left off. R projects come as files with the file extension .Rproj. You can also easily switch back and forth between various R projects within RStudio - handy if you are working on multiple projects at a time.

5.1 New R Project

First open RStudio on your computer. At the top left, click File > New Project. A dialogue box should appear like the one shown below:

Create a new R project within RStudio.
Tip

If you already have the folder created where all project files will be kept, you can reference this folder by selecting Existing Directory. This loads a navigation box where you can select your folder.

Select a folder that already exists to house the R project.

Select New Directory from the message box. This brings up a list of various project types. Select New Project.

Click New Project to create a new R project.

On the next screen (shown below), enter the name of the project (e.g. PurpleAir). Beneath the directory name, browse to the location on your computer where the R project should be stored. This will create a folder with the name of the project at that location. All files created, saved, and loaded while working in this R project will come from this folder.

Type in the project name and select where to create the folder for the project.

Leave all the other fields untouched. Click Create Project to start the new R project. RStudio then restarts the current R session inside of the project folder.

Tip

You can verify the R project you are currently in by looking for the project name in the top-right corner of RStudio. Clicking on this icon brings up a menu to create new R projects, close the existing project, or switch to a new project.

6 Packages

R packages are extensions to the base R environment, allowing the user to import data, functions, code, documentation and objects other creators have compiled. These packages are hosted on a centralized software repository such as the Comprehensive R Archive Network (CRAN).

6.1 Installing R Packages

Installing packages is easy and usually only needs to be done once. In this tutorial, we will install the dplyr, ggplot2, usethis, and tidyverse packages to use with an R session.

Tip

A banner, like the one shown below, will appear in RStudio if a script uses packages that are not installed in the User Library.

The usethis package is required but is currently not installed, with the option to install the missing package.

Click Install to download and install the missing packages.

To install the dplyr, ggplot2, usethis and tidyverse packages, simply use the install.packages() function below. Here, the names of the packages are supplied as a character vector built with the c() function - the c standing for combine:

# Install the following packages - only need to run once
install.packages(c('dplyr', 'ggplot2', 'usethis', 'tidyverse'))
Warning

It can take several minutes for all the packages to download and install on your computer. Do not run any code or stop the process by clicking on the stop sign icon in the top right of the Console Pane. Doing so can cause problems with the installation.

Caution

You may need to restart the R session in order for these packages to successfully install (especially if the packages are already loaded). If this is the case, select Yes when prompted to restart the R session. You may have to rerun the R code chunks again after restarting the session.
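If you are unsure whether some of these packages are already installed, you can check before installing. The following is a rough sketch, not the only way to do this: it uses requireNamespace(), which reports whether a package can be loaded without attaching it to the session. The install.packages() call is left commented out so that running the sketch does not start any downloads.

```r
# Packages used in this tutorial
wanted <- c("dplyr", "ggplot2", "usethis", "tidyverse")

# requireNamespace() returns TRUE if a package is installed and can be loaded
is_installed <- vapply(wanted, requireNamespace, logical(1), quietly = TRUE)
print(is_installed)

# Names of the packages that still need to be installed
missing_pkgs <- wanted[!is_installed]

# Uncomment the next line to install only the missing packages
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```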

R already comes with its own packages (e.g., datasets and graphics). You do not need to install these packages, as they are installed along with R itself. You can view all the R packages installed on your computer by clicking on the Packages tab in the Output Pane in RStudio.
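The same information is available from the Console. As a small sketch, installed.packages() returns a table of installed packages, and its priority argument can restrict the result to the packages that ship with R:

```r
# Names of the packages that ship with every R installation
base_pkgs <- rownames(installed.packages(priority = "base"))
print(base_pkgs)

# Total number of packages installed across all libraries
print(nrow(installed.packages()))
```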

A package only needs to be loaded once per R session. Reloading a package in the same R session has no effect. You can check whether a package is loaded by clicking on the Packages tab in the Output Pane. From this list you can also load a package by selecting the empty box next to its name, browse package information on CRAN, and delete the package altogether.

List of Packages installed in the Output Pane. Loaded packages are indicated with a check mark.

Loading and unloading packages can be accomplished here by clicking on the check mark box.

Caution

All loaded packages are unloaded when RStudio is closed or a new R session begins. Unlike installation, which only needs to be done once, you must load these packages every time you start a new R session. When you quit your R session and start a fresh one, you must load the R packages again.

Note

Some R packages depend on other R packages in order to function properly. These dependencies will be installed automatically if they are not present in the User Library. For example, installing the tidyverse package will also install the following packages, which are then loaded along with it whenever the tidyverse package is loaded: dplyr, forcats, ggplot2, lubridate, purrr, readr, stringr, tibble, and tidyr.
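You can inspect the dependencies a package declares by reading its description file. The sketch below is one possible approach: declared_dependencies() is a hypothetical helper written for this illustration (it is not part of R), and it is demonstrated on the stats package because that package is always installed. Once the tidyverse package is installed, you could pass "tidyverse" instead.

```r
# Hypothetical helper: list the packages named in the Depends and Imports
# fields of an installed package's DESCRIPTION file
declared_dependencies <- function(pkg) {
  desc <- utils::packageDescription(pkg)
  raw <- c(desc$Depends, desc$Imports)
  raw <- raw[!is.na(raw)]
  if (length(raw) == 0) return(character(0))

  # Split the comma-separated lists and drop version requirements like (>= 3.5.0)
  parts <- trimws(unlist(strsplit(raw, ",")))
  parts <- trimws(sub("\\(.*\\)", "", parts))

  # R itself is often listed under Depends but is not a package
  setdiff(parts[nzchar(parts)], "R")
}

print(declared_dependencies("stats"))
```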

Caution

Sometimes, R will print a warning message if a package that is loaded could conflict with another package that is already loaded, like the one shown below:

A conflict in function names between the dplyr and stats packages.

This scenario arises when two or more packages contain functions that share the same name but perform different actions. By default, functions from the package loaded later mask the same-named functions from packages loaded earlier. For example, the filter() function appears in both the dplyr package and the stats package. The stats package is loaded automatically when a new R session begins. When the dplyr package (a package that the tidyverse package depends on, i.e. a dependency) is loaded into the session, this conflict message informs you that filter() now refers to the function in the dplyr package, not the stats package.

Tip

You can refer to functions included in an R package without having to load the package itself. Use the :: operator to name the package and the function you wish to use. From the previous caution box, dplyr::filter() refers to the filter() function in the dplyr package. This is handy if you require a specific function from an R package, but make sure you have the package installed first!
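As a brief illustration of the :: operator, the sketch below uses only packages that are installed with R, so it runs even before dplyr is installed. The inputs are arbitrary values chosen for the example.

```r
# Call filter() from the stats package explicitly.
# Here it computes a 3-point moving average of the numbers 1 to 5.
moving_avg <- stats::filter(1:5, rep(1 / 3, 3))
print(moving_avg)

# Use a function from the tools package, which is installed with R
# but not loaded by default, without calling library(tools)
ext <- tools::file_ext("analysis.R")
print(ext)
```

The same pattern applies to dplyr::filter() once the dplyr package is installed.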

Note

Some packages may need to be installed from source and compiled before they can be used. If this happens, a dialogue box will appear, like the one shown below.

Click No when asked whether to install packages that need compiling.

Select No to install the precompiled (binary) versions instead, which helps ensure the packages download and install properly.

Periodically, authors of R packages update their packages to fix bugs, add more functions and data sets, and improve compatibility with the latest version of R and other packages. You can use the install.packages() function to update (actually re-install) a package to the latest version. By default, the latest version of the package available on CRAN will be installed.
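Before re-installing, it can help to check which version of a package you currently have. A minimal sketch using packageVersion() follows; the stats package is used because it is always installed, and the version "4.0.0" is an arbitrary threshold chosen for illustration.

```r
# Version of an installed package (stats is always installed with R)
installed_version <- packageVersion("stats")
print(installed_version)

# Versions can be compared against a version string
is_recent <- installed_version >= "4.0.0"
print(is_recent)
```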

Important

Packages only need to be installed once. After a package has been installed, there is no reason to install it again unless an updated version is available.

6.2 Loading R Packages

Once R packages have been installed, you must then load them into the R environment. Loading an R package gives the user access to the data, functions, and help documentation that come with the package. To load an R package, use the function library() along with the package name:

# Load required R packages
library(dplyr)
library(ggplot2)
library(usethis)
library(tidyverse)

You can check whether a package is successfully loaded by clicking on the Packages tab in the Output Pane. Loaded packages are indicated with a check mark. You can also load a package by selecting the open box next to the package name, browse package information on CRAN, and delete the package altogether from this list. Loaded packages can be unloaded by clicking on the check mark. Typically, however, packages are never unloaded, as functions that are part of a package will then become unavailable once the package is unloaded. All open packages are unloaded when R or RStudio is closed or a new R session begins.
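The same check can be done from the Console. In this sketch, the tools package stands in for any package, since it is installed with R but not loaded by default; search() lists what is currently loaded, and detach() unloads a package again.

```r
# Load the tools package (installed with R, but not loaded by default)
library(tools)

# search() lists the loaded packages, each shown as "package:<name>"
attached_after_load <- "package:tools" %in% search()
print(attached_after_load)

# Unload the package again; its functions are no longer directly available
detach("package:tools")
attached_after_detach <- "package:tools" %in% search()
print(attached_after_detach)
```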

List of Packages installed in the Output Pane. Loaded packages are indicated with a check mark.

Warning

Unlike installation, which only needs to be done once, you must load these packages every time you start a new R session. When you quit your R session and start a fresh one, you must load the R packages again.

Sometimes, R will print a warning message if a package being loaded conflicts with another package that is already loaded. This scenario arises when, for example, two or more packages contain functions that share the same name but perform different actions. By default, functions from the package loaded later mask the same-named functions from packages loaded earlier.

Function conflicts when loading the dplyr package.

In the above example, the dplyr package has functions filter and lag that also appear in the stats package (automatically loaded when R opens). When the dplyr package is loaded, the filter and lag functions from the stats package are masked by the same-named functions in the dplyr package.
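To find out which version of a function is currently in use, you can ask R where the name is defined. The following is a small sketch using base R functions only; in a fresh session it reports the stats package, and after dplyr is loaded it would report dplyr first.

```r
# Packages on the search path that define a function named filter,
# listed in the order R searches them (the first one wins)
filter_sources <- find("filter")
print(filter_sources)

# The package whose version is used when filter() is called without ::
active_source <- environmentName(environment(filter))
print(active_source)
```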

To use a masked function from an earlier package, you must either unload the later package or reference the earlier package explicitly. The :: operator allows you to reference a function from a specific package without having to load that package first. This is handy if you only need an occasional function or data set from a package.

Tip

You can specify precisely which function to use with the :: operator, in the format package::function. For example, stats::filter references the filter() function from the stats package and dplyr::filter references the filter() function from the dplyr package. Note that the packages do not have to be loaded for this to work, but they must be installed.

7 Feedback

If you have any comments or suggestions on ways that this tutorial can be improved, I’d love to hear from you! Please email me your feedback at stephen.colegate@cchmc.org.

References

R Core Team. 2013. “R: A Language and Environment for Statistical Computing.”
RStudio Team. 2020. RStudio: Integrated Development Environment for R. Boston, MA: RStudio, PBC. http://www.rstudio.com/.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. Springer-Verlag New York. https://ggplot2.tidyverse.org.
Wickham, Hadley, Mara Averick, Jennifer Bryan, Winston Chang, Lucy D’Agostino McGowan, Romain François, Garrett Grolemund, et al. 2019. “Welcome to the tidyverse.” Journal of Open Source Software 4 (43): 1686. https://doi.org/10.21105/joss.01686.
Wickham, Hadley, Jennifer Bryan, Malcolm Barrett, and Andy Teucher. 2024. usethis: Automate Package and Project Setup. https://CRAN.R-project.org/package=usethis.
Wickham, Hadley, Romain François, Lionel Henry, Kirill Müller, and Davis Vaughan. 2023. dplyr: A Grammar of Data Manipulation. https://CRAN.R-project.org/package=dplyr.