Pandas: Indexes¶
import pandas as pd
persons = pd.DataFrame({
'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp' ],
'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger' ],
'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
'age': [56, 27, 25, 37 ],
})
Default Index: Row Number¶
persons
| | firstname | lastname | email | age |
|---|---|---|---|---|
| 0 | Joerg | Faschingbauer | jf@faschingbauer.co.at | 56 |
| 1 | Johanna | Faschingbauer | johanna@email.com | 27 |
| 2 | Caro | Faschingbauer | caro@email.com | 25 |
| 3 | Philipp | Lichtenberger | philipp@email.com | 37 |
See how the rows are numbered
The number column has no name ⟶ this is the default index
persons.index
RangeIndex(start=0, stop=4, step=1)
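A minimal sketch of what the default index implies: with a RangeIndex, row labels and row positions coincide, so loc[] and iloc[] pick the same row. The two-row frame df is a made-up stand-in for persons, not part of the original notebook.

```python
import pandas as pd

# Hypothetical two-row frame, standing in for ``persons``
df = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna'],
    'age': [56, 27],
})

# Without an explicit index, pandas generates a RangeIndex
is_range = isinstance(df.index, pd.RangeIndex)

# With the default index, label 1 is also position 1
same_row = df.loc[1].equals(df.iloc[1])

print(is_range, same_row)
```

This equivalence only holds as long as the default index is in place; once a custom index is set, labels and positions diverge.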
Setting Custom Index¶
Notice how email appears to be unique ⟶ it could be used as an index
persons.set_index('email')
email | firstname | lastname | age |
---|---|---|---|
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
caro@email.com | Caro | Faschingbauer | 25 |
philipp@email.com | Philipp | Lichtenberger | 37 |
This does not modify persons itself
set_index() returns a modified copy (which could be assigned to another variable that you continue to work with, for example)
persons is still the same as before
persons
| | firstname | lastname | email | age |
|---|---|---|---|---|
| 0 | Joerg | Faschingbauer | jf@faschingbauer.co.at | 56 |
| 1 | Johanna | Faschingbauer | johanna@email.com | 27 |
| 2 | Caro | Faschingbauer | caro@email.com | 25 |
| 3 | Philipp | Lichtenberger | philipp@email.com | 37 |
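The two points above (email only "appears" unique, and set_index() returns a copy) can be sketched as follows. The frame df and the name by_email are assumptions made for illustration; they are not in the original notebook.

```python
import pandas as pd

# Hypothetical small frame, standing in for ``persons``
df = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com'],
})

# Check uniqueness instead of eyeballing it
email_is_unique = df['email'].is_unique

# set_index() returns a new DataFrame; keep it under a new name
by_email = df.set_index('email')

print(email_is_unique)
print(list(df.columns))       # the original still has the 'email' column
print(by_email.index.name)    # the copy uses 'email' as its index
```

set_index() also accepts verify_integrity=True, which raises an error if the new index contains duplicates; that may be preferable to a separate check when duplicates should never occur.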
Setting Custom Index, inplace=True¶
Many (but not all) DataFrame methods support an inplace parameter
Default: False ⟶ the original object is not changed
Returns a modified copy of the DataFrame object
Nice for experimenting on a large dataset that we don’t want to damage
Add inplace=True once everything works ⟶ the method then returns None
persons.set_index('email', inplace=True)
Modified object in-place
persons
email | firstname | lastname | age |
---|---|---|---|
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
caro@email.com | Caro | Faschingbauer | 25 |
philipp@email.com | Philipp | Lichtenberger | 37 |
Index has changed
persons.index
Index(['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'], dtype='object', name='email')
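A short sketch of the "no return value" behaviour, using a hypothetical two-row frame df in place of persons: with inplace=True the call returns None and the object itself is changed.

```python
import pandas as pd

# Hypothetical small frame, standing in for ``persons``
df = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com'],
})

# With inplace=True, the method modifies df and returns None
result = df.set_index('email', inplace=True)

print(result)           # None
print(df.index.name)    # email
```

A common slip that follows from this is writing df = df.set_index('email', inplace=True), which rebinds df to None.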
Custom Index, And loc[]¶
loc[] selects by row label (⟶ index)
Row labels are not row numbers anymore ⟶ row numbers cannot be used as row labels
persons.loc[0]
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/My-Environments/jfasch-home/lib64/python3.12/site-packages/pandas/core/indexes/base.py:3791, in Index.get_loc(self, key)
   3790 try:
-> 3791     return self._engine.get_loc(casted_key)
   3792 except KeyError as err:

File index.pyx:152, in pandas._libs.index.IndexEngine.get_loc()

File index.pyx:181, in pandas._libs.index.IndexEngine.get_loc()

File pandas/_libs/hashtable_class_helper.pxi:7080, in pandas._libs.hashtable.PyObjectHashTable.get_item()

File pandas/_libs/hashtable_class_helper.pxi:7088, in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 0

The above exception was the direct cause of the following exception:

KeyError                                  Traceback (most recent call last)
Cell In[9], line 1
----> 1 persons.loc[0]

File ~/My-Environments/jfasch-home/lib64/python3.12/site-packages/pandas/core/indexing.py:1153, in _LocationIndexer.__getitem__(self, key)
   1150 axis = self.axis or 0
   1152 maybe_callable = com.apply_if_callable(key, self.obj)
-> 1153 return self._getitem_axis(maybe_callable, axis=axis)

File ~/My-Environments/jfasch-home/lib64/python3.12/site-packages/pandas/core/indexing.py:1393, in _LocIndexer._getitem_axis(self, key, axis)
   1391 # fall thru to straight lookup
   1392 self._validate_key(key, axis)
-> 1393 return self._get_label(key, axis=axis)

File ~/My-Environments/jfasch-home/lib64/python3.12/site-packages/pandas/core/indexing.py:1343, in _LocIndexer._get_label(self, label, axis)
   1341 def _get_label(self, label, axis: AxisInt):
   1342     # GH#5567 this will fail if the label is not present in the axis.
-> 1343     return self.obj.xs(label, axis=axis)

File ~/My-Environments/jfasch-home/lib64/python3.12/site-packages/pandas/core/generic.py:4236, in NDFrame.xs(self, key, axis, level, drop_level)
   4234     new_index = index[loc]
   4235 else:
-> 4236     loc = index.get_loc(key)
   4238     if isinstance(loc, np.ndarray):
   4239         if loc.dtype == np.bool_:

File ~/My-Environments/jfasch-home/lib64/python3.12/site-packages/pandas/core/indexes/base.py:3798, in Index.get_loc(self, key)
   3793     if isinstance(casted_key, slice) or (
   3794         isinstance(casted_key, abc.Iterable)
   3795         and any(isinstance(x, slice) for x in casted_key)
   3796     ):
   3797         raise InvalidIndexError(key)
-> 3798     raise KeyError(key) from err
   3799 except TypeError:
   3800     # If we have a listlike key, _check_indexing_error will raise
   3801     # InvalidIndexError. Otherwise we fall through and re-raise
   3802     # the TypeError.
   3803     self._check_indexing_error(key)

KeyError: 0
New row label: email
persons.loc['jf@faschingbauer.co.at']
firstname Joerg
lastname Faschingbauer
age 56
Name: jf@faschingbauer.co.at, dtype: object
persons.loc[['jf@faschingbauer.co.at', 'johanna@email.com']]
email | firstname | lastname | age |
---|---|---|---|
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
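One way to deal with the KeyError shown above, sketched on a hypothetical two-row frame df (an assumption, not the original persons): test for membership in the index first, or catch the exception. The sketch also shows that a single label yields a Series while a list of labels yields a DataFrame.

```python
import pandas as pd

# Hypothetical small frame with 'email' as its index
df = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com'],
}).set_index('email')

# Membership tests against the index avoid the exception
has_zero = 0 in df.index
has_joerg = 'jf@faschingbauer.co.at' in df.index

# Alternatively, catch the KeyError raised for an unknown label
try:
    df.loc[0]
    raised = False
except KeyError:
    raised = True

# Single label -> Series, list of labels -> DataFrame
row = df.loc['jf@faschingbauer.co.at']
rows = df.loc[['jf@faschingbauer.co.at']]

print(has_zero, has_joerg, raised)
print(type(row).__name__, type(rows).__name__)
```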
Custom Index, And iloc[]¶
iloc[] selects by row number ⟶ still valid as before
persons.iloc[0]
firstname Joerg
lastname Faschingbauer
age 56
Name: jf@faschingbauer.co.at, dtype: object
persons.iloc[[0, 1]]
email | firstname | lastname | age |
---|---|---|---|
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
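To connect loc[] and iloc[], here is a small sketch that translates between labels and positions. It assumes a unique index (for which Index.get_loc() returns a plain integer) and uses a hypothetical two-row frame df in place of persons.

```python
import pandas as pd

# Hypothetical small frame with 'email' as its (unique) index
df = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com'],
}).set_index('email')

# Label -> position (an integer, because the index is unique)
pos = df.index.get_loc('johanna@email.com')

# Position -> label
label = df.index[0]

# Both access paths lead to the same row
same = df.iloc[pos].equals(df.loc['johanna@email.com'])

print(pos, label, same)
```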
Sorting DataFrame Object By Index Column¶
DataFrame.sort_index(): not in place by default ⟶ returns a modified copy
persons.sort_index(ascending=True)
email | firstname | lastname | age |
---|---|---|---|
caro@email.com | Caro | Faschingbauer | 25 |
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
philipp@email.com | Philipp | Lichtenberger | 37 |
Sorting in place
persons.sort_index(ascending=True, inplace=True)
persons
email | firstname | lastname | age |
---|---|---|---|
caro@email.com | Caro | Faschingbauer | 25 |
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
philipp@email.com | Philipp | Lichtenberger | 37 |
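A compact sketch of both sorting variants, again on a hypothetical frame df rather than the original persons. It also checks the result with Index.is_monotonic_increasing, which is an addition for illustration and not part of the original notebook.

```python
import pandas as pd

# Hypothetical three-row frame with 'email' as its index, unsorted
df = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com'],
}).set_index('email')

# Default: returns a sorted copy, df keeps its order
sorted_copy = df.sort_index()
order_after_copy = list(df.index)

# Descending order, also as a copy
descending = df.sort_index(ascending=False)

# In place: df itself is reordered
df.sort_index(inplace=True)
is_sorted = df.index.is_monotonic_increasing

print(list(sorted_copy.index))
print(list(descending.index))
print(is_sorted)
```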
Links¶
Corey Schafer: Python Pandas Tutorial (Part 3): Indexes - How to Set, Reset, and Use Indexes
Data School: How do I use the MultiIndex in pandas?