Pandas: Selecting Rows (And Columns) With iloc[]
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})
Row By Number: iloc[]
Note the index column
… it has no explicit column name
Default index (unless configured explicitly): row numbers 0, 1, 2, …
⟶ integers
⟶ iloc[], for "integer location": selects a row by its position; with the default index, position and row number coincide
persons.iloc[1]
firstname Johanna
lastname Faschingbauer
email johanna@email.com
age 27
Name: 1, dtype: object
type(persons.iloc[1])
pandas.core.series.Series
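The "integer location" in iloc[] is a position, not an index label. A minimal sketch to check this, assuming we replace the default index with the firstname column via set_index() (something the original does not do): iloc[1] still returns the second row, whose label is now 'Johanna'.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

# Illustration only: use string labels instead of the default row numbers
by_name = persons.set_index('firstname')

# iloc[] ignores the labels; position 1 is still the second row
second = by_name.iloc[1]
print(second.name)       # the row's index label: Johanna
print(second['email'])   # johanna@email.com
```

Label-based access to the same row would be by_name.loc['Johanna'].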
Out-of-range access is not possible; it raises IndexError (see Pandas: Adding Rows)
persons.iloc[4]
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
Cell In[4], line 1
----> 1 persons.iloc[4]
...
IndexError: single positional indexer is out-of-bounds
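Positions may also be negative, counting from the end as with Python lists; valid positions run from -len(persons) up to len(persons) - 1, and anything outside raises IndexError. A small sketch; position_exists() is a hypothetical helper for illustration, not part of pandas.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

# Negative positions count from the end
last = persons.iloc[-1]
print(last['firstname'])   # Philipp

def position_exists(df, pos):
    """Hypothetical helper: True if df.iloc[pos] does not raise IndexError."""
    try:
        df.iloc[pos]
        return True
    except IndexError:
        return False

print(position_exists(persons, -4))   # True: first row
print(position_exists(persons, 4))    # False: one past the end
```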
So What Is A Row, Then?
row = persons.iloc[1]
type(row)
pandas.core.series.Series
row
firstname Johanna
lastname Faschingbauer
email johanna@email.com
age 27
Name: 1, dtype: object
A row is a Series
Its index is not the default (row numbers) ⟶ it consists of the column names
Best accessed using loc[], with the column name
Or by position, using iloc[] with the column number (clumsy though)
row.loc['firstname']
'Johanna'
row.iloc[0]
'Johanna'
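To see that the row's index really is made of the column names, a sketch comparing it with persons.columns and reading the same value once by label and once by position:

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

row = persons.iloc[1]

# The row's index labels are the DataFrame's column names
print(row.index.equals(persons.columns))   # True

# Same value, by label (loc) and by position (iloc)
print(row.loc['age'], row.iloc[3])         # 27 27
```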
Selecting Multiple Rows
Using a list of row numbers as the iloc[] subscript
persons.iloc[[0,1]]   # <--- a single list [0,1] inside []
   firstname       lastname                   email  age
0      Joerg  Faschingbauer  jf@faschingbauer.co.at   56
1    Johanna  Faschingbauer       johanna@email.com   27
Note how the index consists of integers again (the original row numbers)
⟶ two rows selected
⟶ DataFrame, not Series
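Whether iloc[] returns a Series or a DataFrame depends on the subscript: a single integer gives a Series, a list (even a one-element list) gives a DataFrame. The result keeps the original row numbers as its index, in the order the list asks for. A sketch:

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

one_as_series = persons.iloc[1]     # scalar position -> Series
one_as_frame = persons.iloc[[1]]    # list with one position -> DataFrame
print(type(one_as_series).__name__, type(one_as_frame).__name__)   # Series DataFrame
print(one_as_frame.shape)           # (1, 4)

# List order is respected; the original row numbers come along as index
picked = persons.iloc[[3, 0]]
print(picked.index.tolist())               # [3, 0]
print(picked['firstname'].tolist())        # ['Philipp', 'Joerg']
```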
Slicing
[0,1] (a contiguous range) can alternatively be expressed as the slice 0:2
0 … inclusive
2 … exclusive
persons.iloc[0:2]   # <--- note: no double brackets! 0:2 selects the same rows as [0,1]
   firstname       lastname                   email  age
0      Joerg  Faschingbauer  jf@faschingbauer.co.at   56
1    Johanna  Faschingbauer       johanna@email.com   27
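A sketch checking that the slice and the list select the same rows, plus a few Python-style slice forms (open ends, a step, a negative start). Unlike a single out-of-range position, a slice that runs past the end is clipped instead of raising IndexError, as the pandas iloc documentation states.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

by_slice = persons.iloc[0:2]
by_list = persons.iloc[[0, 1]]
print(by_slice.equals(by_list))            # True

print(persons.iloc[2:].index.tolist())     # [2, 3]  open end
print(persons.iloc[::2].index.tolist())    # [0, 2]  every second row
print(persons.iloc[-2:].index.tolist())    # [2, 3]  last two rows
print(persons.iloc[3:10].index.tolist())   # [3]     clipped, no IndexError
```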
Selecting Rows And Columns
iloc[] primarily selects rows
It can also select columns from those rows in the same step (second subscript, after the comma)
Example: row 1, column 2 (which is email)
persons.iloc[1, 2]
'johanna@email.com'
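Besides iloc[1, 2], the same cell can be reached in two steps (row first, then a position within the row Series), or through iat[], pandas' positional accessor for single values. A sketch showing that all three agree:

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

one_step = persons.iloc[1, 2]          # row 1, column 2 in one subscript
two_steps = persons.iloc[1].iloc[2]    # row 1 as Series, then its position 2
scalar_only = persons.iat[1, 2]        # iat[]: single cells only

print(one_step, two_steps, scalar_only)
```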
Example: rows 0 and 1 (i.e. two rows), column 2 (email)
persons.iloc[[0,1], 2]
0    jf@faschingbauer.co.at
1         johanna@email.com
Name: email, dtype: object
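As with rows, the kind of column subscript decides the result type: a scalar column position yields a Series named after the column, a list yields a DataFrame. A bare : in the row position selects all rows. A sketch:

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

col_series = persons.iloc[[0, 1], 2]     # scalar column position -> Series
col_frame = persons.iloc[[0, 1], [2]]    # list of column positions -> DataFrame
whole_column = persons.iloc[:, 2]        # all rows, column 2

print(col_series.name)                   # email
print(col_frame.columns.tolist())        # ['email']
print(len(whole_column))                 # 4
```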
Example …
persons.iloc[[0,1], [0, 2]]
   firstname                   email
0      Joerg  jf@faschingbauer.co.at
1    Johanna       johanna@email.com
Example: slices … note that the end is exclusive with iloc[] (as opposed to loc[]; see Pandas: Selecting Rows (And Columns) With loc[])
persons.iloc[0:2, 0:3]
   firstname       lastname                   email
0      Joerg  Faschingbauer  jf@faschingbauer.co.at
1    Johanna  Faschingbauer       johanna@email.com
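To make the exclusive/inclusive difference concrete, a sketch selecting the same block of cells both ways. With the default index, loc[] needs 0:1 and 'firstname':'email' (both ends included) where iloc[] needs 0:2 and 0:3 (end excluded):

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

by_position = persons.iloc[0:2, 0:3]                  # end exclusive
by_label = persons.loc[0:1, 'firstname':'email']      # end inclusive

print(by_position.equals(by_label))   # True
print(by_position.shape)              # (2, 3)
```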
Summary
Works with integers only
Cannot even specify columns by their names
Efficient though
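If you have column names but want to stay with iloc[], one possible workaround (not covered above) is to translate names into positions first, using Index.get_loc() for one name and Index.get_indexer() for several, both called on persons.columns. A sketch:

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

# One column name -> one position (unique column names assumed)
col = persons.columns.get_loc('email')
print(col)                         # 2
print(persons.iloc[1, col])        # johanna@email.com

# Several names -> array of positions, usable as iloc[] subscript
cols = persons.columns.get_indexer(['firstname', 'email'])
print(cols.tolist())               # [0, 2]
print(persons.iloc[[0, 1], cols])
```

For name-based selection in general, loc[] is the more direct tool.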