CSV Files¶
CSV in its Origins¶
Simple in principle
Line-based file
Each line has a record
Each record has the same number of fields
Each field has a meaning
Here is a simple CSV file in its very original form
5,Joerg,Faschingbauer
3,Sepp,Huber
7,Elizabeth,Queen
No heading
Commas (“,”) as field separators
ASCII encoded
The csv
Module: csv.reader
¶
csv.reader
has sane defaults (documentation)Use it in its basic form
import csv
f = open('simple.csv')
rdr = csv.reader(f)
for row in rdr:
print(row)
$ python read-simple.py
['5', 'Joerg', 'Faschingbauer']
['3', 'Sepp', 'Huber']
['7', 'Elizabeth', 'Queen']
And Tuple Unpacking?¶
One row is a list
Same number of elements (in the sane CSV case)
Tuple unpacking useful
import csv
f = open('simple.csv')
rdr = csv.reader(f)
for num, firstname, lastname in rdr:
print(f'ID: {num} --- First name: {firstname} - Last name: {lastname}')
$ python read-simple-tupleunpacked.py
ID: 5 --- First name: Joerg - Last name: Faschingbauer
ID: 3 --- First name: Sepp - Last name: Huber
ID: 7 --- First name: Elizabeth - Last name: Queen
Going Insane: CSV? SCSV?¶
CSV stands for “Comma Separated Values”
One popular CSV producer produces SCSV (“SemiColon Separated Values”) files though.
5;Joerg;Faschingbauer
3;Sepp;Huber
7;Elizabeth;Queen
One would configure the delimiter in such a case …
import csv
f = open('simple-semicolon.csv')
rdr = csv.reader(f, delimiter=';')
for num, firstname, lastname in rdr:
print(f'ID: {num} --- First name: {firstname} - Last name: {lastname}')
More Tuning Needed: Quoting¶
But let’s use our original CSV (that with the commas)
5,Joerg,Faschingbauer
3,Sepp,Huber
7,Elizabeth,Queen
And let’s say, Queen Elizabeth would jump up and require us to spell her full name, “Queen Elizabeth, II”
The resulting file would suddenly have four fields in the “Queen” row
5,Joerg,Faschingbauer
3,Sepp,Huber
7,Elizabeth, II,Queen
A correctly escaped CSV would look like so:
5,Joerg,Faschingbauer
3,Sepp,Huber
7,"Elizabeth, II",Queen
Tuning Options¶
Perhaps redundantly (the delimiter
is ‘,’ by default, and the
quotechar
is ‘”’ by default), we would read
escaped.csv
like so,
import csv
f = open('escaped.csv')
rdr = csv.reader(f, delimiter=',', quotechar='"')
for num, firstname, lastname in rdr:
print(f'ID: {num} --- First name: {firstname} - Last name: {lastname}')
$ python read-escaped.py
ID: 5 --- First name: Joerg - Last name: Faschingbauer
ID: 3 --- First name: Sepp - Last name: Huber
ID: 7 --- First name: Elizabeth, II - Last name: Queen
Many more tuning options are available; for a complete list see the documentation.
And Headings?¶
Another common practice among CSV producers is to export
the first line to contain the field names
remaining lines to contain spreadsheet data
If the original spreadsheet would look like so,
ID |
Firstname |
Lastname |
---|---|---|
5 |
Joerg |
Faschingbauer |
3 |
Sepp |
Huber |
7 |
Elizabeth, II |
Queen |
, the exported CSV would be,
ID,Firstname,Lastname
5,Joerg,Faschingbauer
3,Sepp,Huber
7,"Elizabeth, II",Queen
Enter csv.DictReader
¶
This - the column names come with the CSV - is a case for
csv.DictReader
(see documentation).csv.DictReader
, when given no explicitfieldnames
parameter, does exactly that: take the information from the first line as keys into a dictionary, and spit the rest out as dictionaries with this shape.
import csv
from pprint import pprint
f = open('headers.csv')
rdr = csv.DictReader(f, delimiter=',', quotechar='"')
for record in rdr:
pprint(record, width=30)
$ python read-headers.py
{'Firstname': 'Joerg',
'ID': '5',
'Lastname': 'Faschingbauer'}
{'Firstname': 'Sepp',
'ID': '3',
'Lastname': 'Huber'}
{'Firstname': 'Elizabeth, II',
'ID': '7',
'Lastname': 'Queen'}