Move Seamlessly from Excel to R - 3 (Vectors)
In the first video of this series, we covered the basics of R. In the second, we explored how to load large Excel files into R and make them more manageable by reducing their size. I introduced several methods for this, and now I want to dive deeper into these techniques. However, before we do that, there's an important concept we need to explore: vectors.
We briefly touched on vectors in the first video, but in this one, we'll go into more detail, discussing various ways to create them and how they’re connected to data frames. Vectors are central to R: almost everything you do in the language involves them. While there are many ways to create vectors, I’ll focus on just a few key methods here.
1. Using the Colon Operator (:)
The first method is using the colon operator. For example, 1:10 will generate a sequence of integers from 1 to 10. You can extend this to any range, like 45:75, which will give you all integers from 45 to 75.
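As a quick illustration of the colon operator, the sketch below builds the two sequences mentioned above and checks how many values they contain. The variable names (first_ten, mid_range, countdown) are illustrative only and do not come from the video.

```r
# Integers from 1 to 10
first_ten <- 1:10
print(first_ten)

# Integers from 45 to 75 (31 values, because both ends are included)
mid_range <- 45:75
print(length(mid_range))

# The colon operator counts down when the first number is the larger one
countdown <- 10:1
print(countdown)
```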
2. Using seq()
Next, you can use the seq()
function. For example, seq(1, 10, 0.1)
generates a sequence from 1 to 10 with increments of 0.1. This method allows you to specify non-integer steps, unlike the colon operator, which is limited to integer increments.
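A minimal sketch of seq() in use follows. The named arguments from, to, by, and length.out are part of base R; the variable names are illustrative.

```r
# From 1 to 10 in steps of 0.1 (91 values in total)
tenths <- seq(1, 10, 0.1)
print(tenths[1:5])
print(length(tenths))

# The same call with named arguments, which can be easier to read
tenths_named <- seq(from = 1, to = 10, by = 0.1)

# Instead of a step size, you can ask for a fixed number of evenly spaced values
quarters <- seq(0, 1, length.out = 5)
print(quarters)  # 0.00 0.25 0.50 0.75 1.00
```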
3. Using rep()
If you want to create a vector where a number is repeated multiple times, you can use the rep() function. For example, rep(2, 10) creates a vector where the number 2 is repeated 10 times.
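The sketch below shows the rep() call described above, plus two variations that go slightly beyond the video: rep() also accepts a vector as its first argument, and the times and each arguments control how the repetition is laid out. Variable names are illustrative.

```r
# The number 2, repeated 10 times
twos <- rep(2, 10)
print(twos)

# Repeat a whole vector three times: 1 2 1 2 1 2
alternating <- rep(c(1, 2), times = 3)
print(alternating)

# Repeat each element three times before moving on: 1 1 1 2 2 2
grouped <- rep(c(1, 2), each = 3)
print(grouped)
```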
4. Using c()
Finally, the c() function (short for "combine") allows you to create a vector by combining individual elements. For example, c(21, 75, 12.3, 21) will create a vector containing these four numbers. While this method works well for small vectors, it’s not practical for very large datasets where manually typing each number would be time-consuming. Later, I’ll show you how you can load numbers from a file and treat them as vectors.
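Here is a small sketch of c() in practice. Beyond the example in the text, it also shows that c() can join existing vectors and that mixing numbers with text converts everything to text, which matters later when columns of different types appear. Variable names are illustrative.

```r
# Four numbers combined into one vector
values <- c(21, 75, 12.3, 21)
print(length(values))  # 4
print(values[3])       # 12.3

# c() can also join existing vectors end to end
longer <- c(values, 1:3)
print(length(longer))  # 7

# A vector holds one type only: mixing numbers and text gives a character vector
mixed <- c(21, "Alex")
print(class(mixed))    # "character"
```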
Vector Operations in R
One of the advantages of vectors in R is that you can perform operations on entire vectors. For instance, if you have a vector x = 1:100, you can compute the square root of all its elements using sqrt(x). Similarly, sign(x) will return the sign of each element in x: 1 for positive values, -1 for negative values, and 0 for zero.
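To make the element-by-element behaviour concrete, the sketch below applies sqrt() and sign() to whole vectors. The small vector passed to sign() and the doubled example are illustrative additions, chosen so that all three possible signs show up.

```r
x <- 1:100

# sqrt() is applied to every element, so the result also has 100 values
roots <- sqrt(x)
print(length(roots))      # 100
print(roots[c(1, 4, 9)])  # 1 2 3

# sign() returns -1, 0, or 1 for each element
print(sign(c(-5, 0, 3)))  # -1 0 1

# Arithmetic works the same way: every element is doubled
doubled <- x * 2
print(doubled[1:5])       # 2 4 6 8 10
```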
R also allows you to apply functions to entire vectors that wouldn’t make sense for individual numbers. For example:
sum(x) calculates the sum of all elements in x.
plot(x) generates a plot of the vector x, showing each element's value on the y-axis against its index on the x-axis.
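The difference between a function like sqrt(), which returns one result per element, and a function like sum(), which reduces the whole vector to a single number, can be sketched as follows. The variable name total is illustrative.

```r
x <- 1:100

# sum() collapses the vector into one number
total <- sum(x)
print(total)            # 5050
print(length(total))    # 1

# sqrt() keeps one result per element
print(length(sqrt(x)))  # 100
```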
You can also work with two vectors at once. For example, if you have y = sqrt(x), plot(x, y) will create a scatter plot showing x on the x-axis and sqrt(x) on the y-axis.
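The plotting calls above can be tried with the short sketch below. The xlab, ylab, main, and type arguments are standard options of base R's plot(); the label text is an illustrative choice. When run as a script rather than interactively, R typically writes the plots to a file instead of opening a window.

```r
x <- 1:100
y <- sqrt(x)

# One vector: values on the y-axis, positions 1 to 100 on the x-axis
plot(x)

# Two vectors of equal length: a scatter plot of y against x
plot(x, y, xlab = "x", ylab = "sqrt(x)", main = "Square root of x")

# type = "l" draws a connected line instead of individual points
plot(x, y, type = "l")
```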
More Vector Functions
There are several other useful functions you can apply to vectors. For example:
head(x) gives you the first six elements of the vector.
tail(x) returns the last six elements.
mean(x) computes the mean of the vector.
median(x) finds the median.
sd(x) calculates the standard deviation.
var(x) gives the variance.
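Applied to the vector 1:100 used earlier, these functions behave as sketched below. The second argument to head() is an optional extra not mentioned in the video.

```r
x <- 1:100

print(head(x))     # 1 2 3 4 5 6
print(tail(x))     # 95 96 97 98 99 100
print(head(x, 3))  # an optional second argument changes how many you get: 1 2 3

print(mean(x))     # 50.5
print(median(x))   # 50.5

# The standard deviation is the square root of the variance
variance <- var(x)
std_dev <- sd(x)
print(variance)
print(std_dev)
```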
These functions are especially helpful when you’re working with vectors containing statistical data.
Vectors and Data Frames
Now, let’s talk about how vectors relate to data frames. A data frame in R is a table-like structure that can hold different types of data (such as numbers and strings) in columns. Vectors are the building blocks of data frames. You can create a data frame from vectors and extract vectors from an existing data frame.
Let’s create a data frame manually using vectors:
name <- c("Alex", "John", "Samantha")
weight <- c(80.2, 70.1, 92.3)
age <- c(20, 21, 22)
df <- data.frame(name, weight, age)
This creates a data frame df where each column corresponds to one of the vectors.
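One way to confirm that the data frame was built as expected is to inspect its shape and column names. The sketch below repeats the construction so it can be run on its own; nrow(), ncol(), names(), and str() are base R functions.

```r
name <- c("Alex", "John", "Samantha")
weight <- c(80.2, 70.1, 92.3)
age <- c(20, 21, 22)
df <- data.frame(name, weight, age)

# Print the whole table
print(df)

# Three rows (one per person) and three columns (one per vector)
print(nrow(df))   # 3
print(ncol(df))   # 3

# Column names are taken from the vector names
print(names(df))  # "name" "weight" "age"

# str() gives a compact summary of each column and its type
str(df)
```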
If you have a data frame (whether created manually or loaded from a file), you can extract individual vectors from it. For example, if you have a data frame df loaded from a CSV file, you can access the name column as a vector by using df$name.
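The sketch below walks through the CSV case end to end. Since no real file is available here, it first writes a small table to a temporary file (an assumption made purely so the example is self-contained) and then reads it back with read.csv(). The names people, csv_path, names_vec, and weights are illustrative, and stringsAsFactors = FALSE is passed explicitly so that text columns stay as character vectors on older versions of R as well.

```r
# A small table, written to a temporary CSV file as a stand-in for your own data
people <- data.frame(
  name = c("Alex", "John", "Samantha"),
  weight = c(80.2, 70.1, 92.3),
  age = c(20, 21, 22),
  stringsAsFactors = FALSE
)
csv_path <- tempfile(fileext = ".csv")
write.csv(people, csv_path, row.names = FALSE)

# Load the file into a data frame
df <- read.csv(csv_path, stringsAsFactors = FALSE)

# Each column comes out as an ordinary vector
names_vec <- df$name
weights <- df$weight
print(names_vec)

# Vector functions work on the extracted column
print(mean(weights))

# df[["weight"]] is an equivalent way to extract a column by name
print(identical(df[["weight"]], df$weight))  # TRUE
```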
Similarly, if you want to create a plot from data stored in a data frame, you can extract the vectors for plotting. For instance:
w <- df$weight
a <- df$age
plot(a, w)
This will plot weight versus age, with age on the x-axis and weight on the y-axis.
Conclusion
Vectors are foundational in R, and understanding how to work with them will help you efficiently manipulate and analyze data. Whether you’re creating vectors from scratch, performing operations on them, or using them within data frames, mastering vectors is key to mastering R.