Move Seamlessly from Excel to R -2 (opening large files)
In this video, I’ll address the number one complaint I hear from Excel users: the inability to load large files. While many users are generally satisfied with Excel, they often face the same frustration - trying to work with large files that either won’t open or are too slow to manage efficiently.
I recommend using R to process and reduce the file size, making it easier to work with in Excel. In this tutorial, I’ll walk you through four methods to do just that.
Method 1: Limit the Rows
Sometimes you only need a portion of your data. For example, you might want to keep the first 50 or 100 rows of a large file. This is a great option if you're just scanning through the data. In R, you can easily select and export the top rows to create a manageable file size.
Method 2: Select Specific Columns
If your file has hundreds of columns but you're only interested in a few, you can use R to discard the unnecessary ones. This way, you can open the reduced file in Excel, saving you time and effort.
Method 3: Filter Rows
In more complex situations, you might need to filter rows based on specific criteria. For example, if you have a dataset of the entire U.S. population but only need data for New York, you can filter out the irrelevant rows in R. This reduces the file size and makes it easier to work with in Excel.
Method 4: Create New Columns
Sometimes, you don’t need all the data, just a specific calculation. For instance, if you have students' scores across several subjects and only need the total score, you can create a new column with the totals and discard the rest of the data. This significantly reduces the file size while keeping only the information you need.
Let me show you how to apply these four methods with an example. For this exercise, I created a small file containing names, addresses, and zip codes of 500 people. While this is a relatively small file, the methods I’ll show you are applicable to larger datasets as well.
Step-by-Step Example
First, let’s change our working directory in R to where the file is located. In my case, it’s on the desktop. I’ll load the necessary libraries: tidyverse for general manipulation, and readxl for reading Excel files.
Now, let’s load the file into R. I’ll be using two file formats: .TSP and .XLS, but the process is the same for .CSV files. Here's how you read the .TSP file:
X <- read_tsv("data.tsp")
Once the file is loaded, I can view the first 10 rows with X, or view the entire dataset with X %>% as.data.frame(). For large datasets, this is really helpful as it allows you to quickly view a subset of the data without overwhelming your console.
Method 1: Limiting Rows
To reduce the dataset, you can keep just the first 50 rows (or any number you choose):
Y <- X %>% head(50)
You can then save this smaller dataset as a new file using:
write_tsv(Y, "data_reduced.tsv")
Now, you’ve got a manageable file that you can open easily in Excel.
Method 2: Selecting Specific Columns
Let’s say we’re only interested in the first, last, and zip columns. You can do this with the select() function:
Y <- X %>% select(first, last, zip)
Then, save the new dataset:
write_tsv(Y, "data_columns.tsv")
Method 3: Filtering Rows
If we only want data for people living in Maryland, we can filter based on the state column:
Y <- X %>% filter(state == "MD")
And save this filtered dataset:
write_tsv(Y, "data_maryland.tsv")
Method 4: Creating New Columns
Suppose we want to create a new column that combines the first and last names. You can do this using the mutate() function:
Y <- X %>% mutate(full_name = paste(first, last))
Now, let's keep just the new full_name and zip columns:
Y <- Y %>% select(full_name, zip)
Finally, save the new dataset:
write_tsv(Y, "data_fullname.tsv")
Sorting the Data
If you want to sort your data, say by zip code, you can use the arrange() function:
Y <- X %>% arrange(zip)
And save the sorted dataset:
write_tsv(Y, "data_sorted.tsv")
These four methods are simple yet powerful ways to reduce the size of large files, making them easier to handle in Excel. I’ll dive deeper into each of these commands and show you more advanced use cases in separate videos.

