Difference between loc and iloc for Pandas DataFrame indexing


Are you serious? loc and iloc?

When you see this topic, you might scoff at it. Isn't using loc and iloc one of the most common subjects in Pandas?

Well, maybe.

As whatever-you-might-call-it practitioners who use Pandas on a daily basis, we all know how to use loc and iloc.

Are you sure about that?

In most cases, many of us use each for separate purposes.

First, loc, the more versatile tool, is often used to subset or filter a given DataFrame based on certain conditions. As for iloc, it is obviously used to get data at a given range of indices.

In summary, we often use these two for separate reasons, not for the same reason. However, we can also use loc to get data at a given range of indices, as iloc does.
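To make the two typical uses concrete, here is a minimal sketch on a small DataFrame with the default index. The variable names (df, filtered, by_position, by_label) are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 6]})

# Typical use of loc: filter rows with a boolean condition.
filtered = df.loc[df['a'] != 2]

# Typical use of iloc: take rows by integer position.
by_position = df.iloc[0:2]

# loc can take a range too, but the range is made of index labels.
by_label = df.loc[0:1]

print(filtered)
print(by_position)
print(by_label)
```

On this fresh DataFrame both ranges return the first two rows, even though the slice bounds are written differently.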

Okay, so?

Unfortunately, these two don't handle indexing in the same way.

iloc

As the name implies, iloc (integer location) is specialized in indexing a DataFrame by integer position. Therefore, it works the same way as indexing other data structures (e.g., lists).

The only caveat, or rather the difference from its counterpart, loc, is that it handles the index as if it had already been reset. It looks at row positions, not at index labels.

Look at the following example.

>>> import pandas as pd
>>> a = pd.DataFrame({'a':[1,2,3],'b':[3,5,6]})
>>> b = a.loc[a.a != 2]
>>> b
   a  b
0  1  3
2  3  6

What do you expect b.iloc[1] to return?

>>> b.iloc[1]
a    3
b    6
Name: 2, dtype: int64

And if we run b.iloc[-1], it returns the last row of the DataFrame (in this case, the same as b.iloc[1]).

>>> b.iloc[-1]
a    3
b    6
Name: 2, dtype: int64

Finally, let's do this seemingly obvious one before turning the page to loc.

>>> b.iloc[:0]
Empty DataFrame
Columns: [a, b]
Index: []
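One way to convince yourself of the "as if the index were reset" idea is to compare iloc against loc on a copy whose index really has been reset. This is only a sketch. b_reset and same_values are names made up for this example.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 6]})
b = a.loc[a['a'] != 2]          # index labels are now 0 and 2

# Resetting the index renumbers the rows 0, 1, ... so labels equal positions.
b_reset = b.reset_index(drop=True)

# Position 1 in b holds the same values as label 1 in the reset copy.
same_values = (b.iloc[1] == b_reset.loc[1]).all()
print(same_values)

# iloc ignores the label for lookup, but the result still carries it as its name.
print(b.iloc[1].name)
```

Note that iloc does not actually change the index. The returned row still carries its original label, 2.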

Now, let's see how loc works.

loc

Now, loc works very differently. First, it doesn't reset the index. It looks rows up by their index labels, so you get a row by specifying the label it kept in the subset DataFrame.

>>> a = pd.DataFrame({'a':[1,2,3],'b':[3,5,6]})
>>> b = a.loc[a.a != 2]
>>> b
   a  b
0  1  3
2  3  6
>>> b.loc[2]
a    3
b    6
Name: 2, dtype: int64
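The flip side is that a label which was filtered out can no longer be looked up, even though a row exists at that position. A small sketch of this follows. The label_found flag is just an illustrative helper.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 6]})
b = a.loc[a['a'] != 2]          # remaining labels: 0 and 2

# Position 1 exists (it is the second row)...
second_row = b.iloc[1]

# ...but label 1 was filtered out, so loc cannot find it.
try:
    b.loc[1]
    label_found = True
except KeyError:
    label_found = False

print(list(b.index))
print(label_found)
```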

Next, indexing by -1 doesn't work for loc, because -1 is treated as a label and no row has that label.

>>> b.loc[-1]
Traceback (most recent call last):
  File "/Users/minpark/opt/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1625, in pandas._libs.hashtable.Int64HashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1632, in pandas._libs.hashtable.Int64HashTable.get_item
KeyError: -1

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/minpark/opt/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 895, in __getitem__
    return self._getitem_axis(maybe_callable, axis=axis)
  File "/Users/minpark/opt/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1124, in _getitem_axis
    return self._get_label(key, axis=axis)
  File "/Users/minpark/opt/miniconda3/lib/python3.7/site-packages/pandas/core/indexing.py", line 1073, in _get_label
    return self.obj.xs(label, axis=axis)
  File "/Users/minpark/opt/miniconda3/lib/python3.7/site-packages/pandas/core/generic.py", line 3738, in xs
    loc = index.get_loc(key)
  File "/Users/minpark/opt/miniconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3082, in get_loc
    raise KeyError(key) from err
KeyError: -1
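If you want the last row but still prefer loc, one possible workaround (a sketch, not the only way) is to ask the index for its last label first and then pass that label to loc. last_label is a name introduced here for illustration.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 6]})
b = a.loc[a['a'] != 2]

# The Index object itself supports positional access, including -1.
last_label = b.index[-1]

last_by_label = b.loc[last_label]
last_by_position = b.iloc[-1]

print(last_label)
print(last_by_label.equals(last_by_position))
```

In practice b.iloc[-1] is the shorter route. The detour only shows that loc needs a real label.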

Finally, there is a surprising behavior when we slice a range with [int_a:int_b]. In Python indexing (and in many other languages too), we expect the element at the end of the range to be excluded.

However, loc doesn't work that way. It includes the row at the end label as well.

>>> a.loc[:0]
   a  b
0  1  3
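Putting the two slices side by side on the original DataFrame makes the difference easy to see. This is a minimal sketch. by_position and by_label are illustrative names.

```python
import pandas as pd

a = pd.DataFrame({'a': [1, 2, 3], 'b': [3, 5, 6]})

by_position = a.iloc[0:1]   # stop position 1 is excluded -> 1 row
by_label = a.loc[0:1]       # stop label 1 is included    -> 2 rows

print(len(by_position), len(by_label))   # prints: 1 2
```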

Therefore, when using these two for indexing, we need to be more careful than when subsetting, which is much more familiar to us.

CC BY-NC 4.0 © min park