Data frames in R - Getting a sense of your data

Alright, once we have our data, it would be good to know how to get a general sense of
it.
There are six essential functions we can use to grasp the shape of what we’re working
with.
I personally like to keep these close at hand because I use them often.
These are nrow(), ncol(), colnames(), rownames(), str(), which you already know,
and summary().
Alright, let’s see what insights each of these gives us.
I’ll use the Pokémon data from the past two lessons again.
It is big and full of wonder!
As you can guess, nrow() gives us the number of rows our data has, not counting the column
names.
Similarly, ncol() gives us the number of columns.
Let’s do these two together.
So, our data has 811 observations, and 14 variables.
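If you don’t have the Pokémon file to hand, here is a minimal sketch of the same two calls on mtcars, one of R’s built-in data sets. That data set is my stand-in for illustration, not the data from the video, and the dim() call at the end is an extra that returns both numbers at once.

```r
# mtcars ships with base R; it stands in for the Pokémon data here
n_rows <- nrow(mtcars)   # number of observations (the header is not counted)
n_cols <- ncol(mtcars)   # number of variables

print(n_rows)
print(n_cols)

# dim() returns both values in one vector: rows first, then columns
print(dim(mtcars))
```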
Now we have an idea of how large our current data set is.
And that’s something!
Moving on, we know we have 14 columns, but it would be better if we knew what’s inside
these columns.
Let’s look at their names with colnames().
Cool, so we have id, Pokémon, what the species id of that Pokémon is (which is probably
just the same as id), and so on.
Knowing the names of our variables makes it a lot easier when we need to slice and subset
our data to do operations on specific values from a column.
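As a rough sketch of why the names matter, here is colnames() on the built-in mtcars data set (again a stand-in, not the Pokémon data), followed by pulling one column out by its name. The variable names var_names and mpg_values are just ones I picked for the example.

```r
# mtcars is a built-in stand-in for the Pokémon data
var_names <- colnames(mtcars)
print(var_names)

# Once you know a column's name, you can pull it out by that name
mpg_values <- mtcars[["mpg"]]
print(head(mpg_values))

# A quick check that a column exists before you rely on it
print("hp" %in% var_names)
```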
Right.
The rownames() function is not much use here (and often won’t be, because it’s not
common practice to name your rows, especially if your data set is large), but
it is the natural counterpart of colnames(), so I must mention it.
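To see both situations, here is a small sketch using two built-in data sets of my choosing: iris, where the rows are unnamed and R just falls back to row numbers, and mtcars, which is one of the exceptions with meaningful row names.

```r
# iris has no meaningful row names, so R falls back to "1", "2", "3", ...
iris_rows <- rownames(iris)
print(head(iris_rows))

# mtcars is one of the exceptions: each row is named after a car model
car_rows <- rownames(mtcars)
print(head(car_rows, 3))
```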
Finally, we have the str() and summary() functions.
You are already familiar with str(): it gives you the compact version of your data structure.
It really comes in handy when you want to have a quick look at your data and how it’s
organised.
R returns the structure of our data, with the number of rows and columns, and for each column,
or variable, the basic data type, as well as the first few values.
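Here is what that looks like on the built-in iris data set, which I am using in place of the Pokémon data. The sapply() line is an optional extra, in case you want the column types as a value you can work with rather than just printed output.

```r
# iris is a built-in data set used here in place of the Pokémon data
str(iris)

# str() only prints its report, so to keep the per-column types around
# you can ask each column for its class
col_types <- sapply(iris, class)
print(col_types)
```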
Awesome.
If, when importing the data, we hadn’t set the stringsAsFactors argument to FALSE,
we would have a bunch of factors where we have character data now.
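To make the difference concrete, here is a minimal sketch that reads the same tiny, hand-typed CSV twice, once with each setting. The three rows and the names csv_text, as_chr and as_fct are made up for the example, and the text is passed directly so no file is needed. Keep in mind that since R 4.0.0 stringsAsFactors defaults to FALSE, so on a recent installation you only get factors if you ask for them.

```r
# A tiny hand-typed sample, passed as text so no file is needed (hypothetical data)
csv_text <- "name,type,attack
Bulbasaur,grass,49
Charmander,fire,52
Squirtle,water,48"

as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)

str(as_chr)   # name and type come through as chr
str(as_fct)   # name and type come through as Factor
```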
Great.
And last but not least, the summary() function.
Now, this one is truly a multipurpose function and it should be one of the first things you
consult when starting to work on a new data set.
summary() provides an excellent, well, summary, of the object you pass into it.
It is a bit more useful with numerical data, because it provides the essential descriptive
statistics (but we will go over this in a bit more detail in the statistics section
of the course).
Let’s see what it will tell us about my.pok.
Okay, so it took every numeric variable in our data set and computed a bunch of useful things,
like the mean and median of each.
It also provided us with range information like minimum and maximum values.
Awesome.
Our character variables, like the Pokémon names and their types, get a bit less out of it.
As you can see, all the function gives us for these is the length, class and mode of
the objects.
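You can see the two behaviours side by side with a small data frame. This is only a sketch: pok_sample and its three rows are hypothetical, typed in by hand, and are not the my.pok data from the video.

```r
# A small hypothetical data frame with one character and one numeric column
pok_sample <- data.frame(
  name   = c("Bulbasaur", "Charmander", "Squirtle"),
  attack = c(49, 52, 48),
  stringsAsFactors = FALSE
)

# On the whole data frame, each column is summarised according to its type
print(summary(pok_sample))

# Looking at the columns one at a time shows what each type gets
num_summary <- summary(pok_sample$attack)
chr_summary <- summary(pok_sample$name)
print(names(num_summary))   # the descriptive statistics
print(names(chr_summary))   # only length, class and mode
```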
Okay.
As usual, we can use summary() on a single variable or only a selection of variables,
if we are not interested in getting these basic descriptive stats for everything, en
masse, but we will learn how to slice through a data frame in the next lesson.
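As a quick preview, and only a sketch since slicing is properly covered in the next lesson, this is roughly what a narrower summary() call can look like. It uses the built-in mtcars data set rather than the Pokémon data, and the names one_var and some_vars are mine.

```r
# mtcars is a built-in stand-in for the Pokémon data
one_var   <- summary(mtcars$mpg)                 # a single column
some_vars <- summary(mtcars[, c("mpg", "hp")])   # a selection of columns

print(one_var)
print(some_vars)
```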
Alright, that’s it for this video, guys!
Have a play around with some of the data we have provided or with R’s pre-packaged data
sets, which you can find by calling data().
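If you would rather have that list as something you can search than as a pager window, here is one possible sketch. It limits the listing to the datasets package, which is an assumption on my part to keep the output short, and then checks the size of iris as an example of a data set to explore.

```r
# data() with no arguments lists the data sets of all attached packages;
# here the listing is limited to the "datasets" package that ships with R
catalogue <- data(package = "datasets")

# The listing is stored in a matrix: "Item" is the name, "Title" the description
items <- catalogue$results[, c("Item", "Title")]
print(head(items))

# Any listed data set can then be inspected with the functions from this lesson
print(dim(iris))
```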
Alright!
I’ll see you next time!