Pandas: Filters¶
import pandas as pd
persons = pd.DataFrame({
'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp' ],
'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger' ],
'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
'age': [56, 27, 25, 37 ],
})
Basic Filter Example¶
persons.loc[persons['lastname'] == 'Faschingbauer']
firstname | lastname | age | ||
---|---|---|---|---|
0 | Joerg | Faschingbauer | jf@faschingbauer.co.at | 56 |
1 | Johanna | Faschingbauer | johanna@email.com | 27 |
2 | Caro | Faschingbauer | caro@email.com | 25 |
Comprehensive
More operators ⟶ later
How does that work?
Filter? Bitmap!¶
A filter is a boolean expression
Returns a
Series
ofTrue
andFalse
values ⟶ a bitmapGiven to
loc[]
to select rows of interestflt = (persons['lastname'] == 'Faschingbauer') flt
0 True 1 True 2 True 3 False Name: lastname, dtype: bool
type(flt)
pandas.core.series.Series
Content of
persons
persons
firstname lastname email age 0 Joerg Faschingbauer jf@faschingbauer.co.at 56 1 Johanna Faschingbauer johanna@email.com 27 2 Caro Faschingbauer caro@email.com 25 3 Philipp Lichtenberger philipp@email.com 37 Filter applied to
persons
withloc[]
persons.loc[flt]
firstname lastname email age 0 Joerg Faschingbauer jf@faschingbauer.co.at 56 1 Johanna Faschingbauer johanna@email.com 27 2 Caro Faschingbauer caro@email.com 25
Confusion: Filter/Column Addressing¶
User friendliness: filter can be directly applied to
DataFrame
persons[flt]
firstname lastname email age 0 Joerg Faschingbauer jf@faschingbauer.co.at 56 1 Johanna Faschingbauer johanna@email.com 27 2 Caro Faschingbauer caro@email.com 25 Looks like columns were selected
⟶ confusion
Column Selection And Filters¶
Filters select rows, just like
persons.loc[0]
selects row by labelConsequently, can be combined with column selection
persons.loc[flt, 'firstname']
0 Joerg
1 Johanna
2 Caro
Name: firstname, dtype: object
Boolean Expressions: Bitwise Operators! Braces!¶
flt = (persons['firstname'] == 'Joerg') & (persons['lastname'] == 'Faschingbauer') | (persons['firstname'] == 'Philipp')
Confusion: braces around
==
are important!Bitwise
&
binds stronger than==
Wish they had used the normal boolean operators instead
But those cannot be overloaded (they are more like control flow tools)
Anyway,
persons.loc[flt]
firstname | lastname | age | ||
---|---|---|---|---|
0 | Joerg | Faschingbauer | jf@faschingbauer.co.at | 56 |
3 | Philipp | Lichtenberger | philipp@email.com | 37 |
Negation: ~
¶
Consequently 🙄, bitwise negation was chosen to mean logical
NOT
persons.loc[~flt]
firstname | lastname | age | ||
---|---|---|---|---|
1 | Johanna | Faschingbauer | johanna@email.com | 27 |
2 | Caro | Faschingbauer | caro@email.com | 25 |
Neat Helpers¶
flt = persons['firstname'].isin(['Caro', 'Philipp'])
persons[flt]
firstname | lastname | age | ||
---|---|---|---|---|
2 | Caro | Faschingbauer | caro@email.com | 25 |
3 | Philipp | Lichtenberger | philipp@email.com | 37 |
flt = persons['firstname'].str.startswith('J')
persons[flt]
firstname | lastname | age | ||
---|---|---|---|---|
0 | Joerg | Faschingbauer | jf@faschingbauer.co.at | 56 |
1 | Johanna | Faschingbauer | johanna@email.com | 27 |