Pandas: Adding/Modifying Columns
Example 1: Lowercasing A Column Of Strings
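Email addresses are, in practice, case-insensitive: the domain part is case-insensitive, and although RFC 5321 allows the local part (before the @) to be case-sensitive, most mail providers ignore case there too
The dataset has them in mixed case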
import pandas as pd
persons = pd.DataFrame({
'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp' ],
'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger' ],
'email': ['JF@faschingbauer.co.at', 'Johanna@email.com', 'Caro@email.com', 'PHILIPP@email.com'],
'age': [56, 27, 25, 37 ],
})
persons['email']
0 JF@faschingbauer.co.at
1 Johanna@email.com
2 Caro@email.com
3 PHILIPP@email.com
Name: email, dtype: object
Example 1: Modifying The email Column
Pull out the email column
email = persons['email']
Lowercase it, using the vectorized string methods of Series (the .str accessor)
email.str.lower()
0 jf@faschingbauer.co.at
1 johanna@email.com
2 caro@email.com
3 philipp@email.com
Name: email, dtype: object
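One reason to prefer the vectorized .str methods over element-wise Python calls is how they treat missing values. The sketch below uses a small hypothetical object-dtype Series with one missing entry (np.nan), which is not part of the original dataset: .str.lower() keeps the gap as a missing value, whereas applying Python's str.lower to every element calls it on the float NaN too and raises a TypeError.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype Series with one missing entry (np.nan); not part of the original data
emails = pd.Series(['JF@faschingbauer.co.at', np.nan, 'PHILIPP@email.com'],
                   name='email', dtype=object)

# Vectorized string method: the missing entry stays missing, no error is raised
lowered = emails.str.lower()
print(lowered)

# Element-wise Python method: on an object-dtype Series, str.lower() is also
# called on the float NaN, which raises TypeError
try:
    emails.apply(str.lower)
except TypeError as exc:
    print('apply(str.lower) failed:', exc)
```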
lower_email = email.str.lower()
Assign it back into the persons DataFrame
persons['email'] = lower_email
persons
  firstname       lastname                   email  age
0     Joerg  Faschingbauer  jf@faschingbauer.co.at   56
1   Johanna  Faschingbauer       johanna@email.com   27
2      Caro  Faschingbauer          caro@email.com   25
3   Philipp  Lichtenberger       philipp@email.com   37
In short
persons['email'] = persons['email'].str.lower()
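Assigning to persons['email'] modifies the DataFrame in place. If the original should stay untouched, DataFrame.assign() returns a new DataFrame instead. A minimal sketch on a trimmed copy of the dataset (lastname and age left out for brevity), using the callable form of assign() that is handy in method chains:

```python
import pandas as pd

# Trimmed copy of the example dataset (lastname and age omitted)
persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'email': ['JF@faschingbauer.co.at', 'Johanna@email.com', 'Caro@email.com', 'PHILIPP@email.com'],
})

# assign() returns a new DataFrame; the lambda receives the DataFrame being assigned to
lowered = persons.assign(email=lambda df: df['email'].str.lower())

print(persons['email'].tolist())  # unchanged, mixed case
print(lowered['email'].tolist())  # lowercased copy
```

Which style to use is a matter of taste: in-place assignment is shorter, while assign() avoids mutating a DataFrame that other code may still rely on.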
Example 2: Adding A normalized_email Column
import pandas as pd
persons = pd.DataFrame({
'firstname': ['Guido', 'Joerg', 'Johanna', 'Caro', 'Philipp'],
'lastname': ['Rentner', 'Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
'email': ['jf@old.com', 'JF@faschingbauer.co.at', 'Caro@email.com', 'Johanna@email.com', 'PHILIPP@email.com'],
'age': [69, 56, 27, 25, 37],
})
Adding a column is as simple as assigning to a column name that does not exist yet
persons['normalized_email'] = persons['email'].str.lower()
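A column created by assignment is appended at the end. If you would rather have normalized_email right next to email, DataFrame.insert() places a column at a given position and modifies the DataFrame in place. A sketch on a trimmed two-row copy of the dataset:

```python
import pandas as pd

# Trimmed copy of the example dataset (two rows, lastname omitted)
persons = pd.DataFrame({
    'firstname': ['Guido', 'Joerg'],
    'email': ['jf@old.com', 'JF@faschingbauer.co.at'],
    'age': [69, 56],
})

# Plain assignment appends the new column at the end ...
persons['normalized_email'] = persons['email'].str.lower()
print(list(persons.columns))

# ... whereas insert() puts it at position 2, i.e. directly after 'email'
persons = persons.drop(columns='normalized_email')
persons.insert(2, 'normalized_email', persons['email'].str.lower())
print(list(persons.columns))
```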
What If No Prebuilt Functionality Exists? apply() To The Rescue!
Simple example: Python’s built-in len() function takes one parameter and returns one value
s = 'Hello'
len(s)
5
Apply that to a Series, e.g. firstname
fn = persons['firstname']
fn
0 Guido
1 Joerg
2 Johanna
3 Caro
4 Philipp
Name: firstname, dtype: object
Length of each firstname
fn.apply(len)
0 5
1 5
2 7
3 4
4 7
Name: firstname, dtype: int64
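For string lengths specifically, a prebuilt vectorized method does exist: Series.str.len(). The sketch below compares it with apply(len) and also shows that apply() accepts a lambda for one-off functions (the initials example is illustrative, not from the original):

```python
import pandas as pd

fn = pd.Series(['Guido', 'Joerg', 'Johanna', 'Caro', 'Philipp'], name='firstname')

# Element-wise, via apply() and the built-in len()
lengths_apply = fn.apply(len)

# Vectorized string accessor: same result, no Python function needed
lengths_str = fn.str.len()

# apply() also takes a lambda, here picking the first character of each name
initials = fn.apply(lambda s: s[0])

print(lengths_apply.tolist())
print(lengths_str.tolist())
print(initials.tolist())
```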
apply()-ing Custom Functions
Write a single-parameter function (just like len())
def is_palindrome(s):
    # case-insensitive: compare the lowercased string with its reverse
    s = s.lower()
    return s == s[::-1]
persons
  firstname       lastname                   email  age        normalized_email
0     Guido        Rentner              jf@old.com   69              jf@old.com
1     Joerg  Faschingbauer  JF@faschingbauer.co.at   56  jf@faschingbauer.co.at
2   Johanna  Faschingbauer          Caro@email.com   27          caro@email.com
3      Caro  Faschingbauer       Johanna@email.com   25       johanna@email.com
4   Philipp  Lichtenberger       PHILIPP@email.com   37       philipp@email.com
Apply it
persons['lastname'].apply(is_palindrome)
0 True
1 False
2 False
3 False
4 False
Name: lastname, dtype: bool
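The boolean Series returned by apply() can be stored as a new column, or used directly as a mask to select the matching rows. A sketch on a trimmed copy of the dataset; the column name lastname_is_palindrome is hypothetical:

```python
import pandas as pd

def is_palindrome(s):
    # case-insensitive: compare the lowercased string with its reverse
    s = s.lower()
    return s == s[::-1]

# Trimmed copy of the example dataset (email and age omitted)
persons = pd.DataFrame({
    'firstname': ['Guido', 'Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname': ['Rentner', 'Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
})

# Store the apply() result as a new boolean column (hypothetical column name) ...
persons['lastname_is_palindrome'] = persons['lastname'].apply(is_palindrome)

# ... or use it directly as a boolean mask to select matching rows
palindromic = persons[persons['lastname'].apply(is_palindrome)]
print(palindromic)
```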