Looping over a dataframe accesses column headers
When you loop over a dataframe in pandas, the default behavior iterates over the column headers of the dataframe, not its rows.
If the goal is to process the data row by row, this isn't what we want.
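A minimal sketch of such a loop, assuming a small example dataframe `df` with columns `a` and `b` (its values are inferred from the printed output in this section; the `labels` list is only there for illustration):

```python
import pandas as pd

# Assumed example dataframe, reconstructed from the outputs in this section.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# Iterating over the dataframe itself yields the column headers.
labels = []
for col in df:
    labels.append(col)
    print(col)
```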
a
b
Each column header from the loop can then be used with bracket syntax to access the corresponding column of the dataframe.
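A sketch of that bracket access, assuming the same example dataframe `df` (the `columns` dictionary is a hypothetical helper used only to keep the results):

```python
import pandas as pd

# Assumed example dataframe, reconstructed from the outputs in this section.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

columns = {}
for col in df:
    # df[col] looks up one column by its header and returns a pandas Series.
    columns[col] = df[col]
    print(df[col])
```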
0 1
1 2
2 3
Name: a, dtype: int64
0 4
1 5
2 6
Name: b, dtype: int64
However, this is not always what we want. There are many occasions where we want to iterate over the rows of a dataframe rather than the columns.
How to iterate over the rows of a dataframe rather than the columns
Method 1: df.iterrows()
df.iterrows() can be used to loop over the rows of a dataframe just as we would expect. Each iteration yields an (index, row) tuple in which the row is a pandas Series, so the rows do not need to be looked up with bracket syntax the way columns are when looping over the dataframe directly.
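A minimal sketch of this method, assuming the same example dataframe `df` (the `rows` list is a hypothetical helper used only to collect the results):

```python
import pandas as pd

# Assumed example dataframe, reconstructed from the outputs in this section.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

rows = []
for row in df.iterrows():
    # Each item is an (index, Series) tuple.
    rows.append(row)
    print(row)
```

The tuple can also be unpacked in the loop header, as in `for idx, row in df.iterrows():`, after which individual values are available as `row["a"]` and `row["b"]`.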
(0, a 1
b 4
Name: 0, dtype: int64)
(1, a 2
b 5
Name: 1, dtype: int64)
(2, a 3
b 6
Name: 2, dtype: int64)
Method 2: Use df.values to access the numpy array version of the dataframe.
Every dataframe has an underlying two-dimensional numpy array equivalent. We can access this version of a dataframe by using df.values. Iterating over this array gives us row vectors instead of column vectors, which we can then easily loop over.
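A sketch of this approach, assuming the same example dataframe `df` (the `row_vectors` list is a hypothetical helper used only to collect the results):

```python
import pandas as pd

# Assumed example dataframe, reconstructed from the outputs in this section.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

# The whole dataframe as a two-dimensional numpy array.
print(df.values)

row_vectors = []
for row in df.values:
    # Iterating over a 2D array walks its first axis, so each item is one row.
    row_vectors.append(row.tolist())
    print(row)
```

The pandas documentation recommends `df.to_numpy()` over `df.values` for new code; for a dataframe like this one it returns the same array.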
[[1 4]
[2 5]
[3 6]]
[1 4]
[2 5]
[3 6]
Side note: if you're using df.values and you don't want row vectors, you can transpose the array using .T and you'll get the column vectors instead.
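A sketch of the transposed loop, assuming the same example dataframe `df` (the `column_vectors` list is a hypothetical helper used only to collect the results):

```python
import pandas as pd

# Assumed example dataframe, reconstructed from the outputs in this section.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})

column_vectors = []
for col in df.values.T:
    # .T swaps the axes, so each item is now one column of the dataframe.
    column_vectors.append(col.tolist())
    print(col)
```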
[1 2 3]
[4 5 6]