File I/O¶
Python 2 vs. Python 3¶
Encoding, again: incompatibility alert!
Python 2 already had types str and bytes
… it just didn’t make a difference (bytes was merely an alias for str)
Files are inherently binary, at the lowest level
… and so were Python 2’s files
Python 3 won’t let you mix str and bytes
Hard rule: “Transform to string as early as possible”
⟹ Transformation must be done inside file I/O
⟹ Files know about their encoding
⟹ Python 2 vs. Python 3
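A minimal sketch of the "transform to string as early as possible" rule, outside of any file I/O. The sample text and the variable names are illustrative assumptions, not part of the original material.

```python
# Raw bytes, as they might arrive from a binary file or a socket.
raw = "Grüße\n".encode("utf-8")

# Python 3 refuses to mix str and bytes.
try:
    "prefix: " + raw
    mixing_failed = False
except TypeError:
    mixing_failed = True

# Decode once, at the boundary, and work with str from then on.
text = raw.decode("utf-8")
message = "prefix: " + text

print(mixing_failed)
print(len(raw), len(text))   # byte count vs. character count
```

The two lengths differ because `ü` and `ß` each occupy two bytes in UTF-8, which is exactly why the encoding has to be known at the point of conversion.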
Opening Files¶
Files are opened to obtain a handle
f = open('/etc/passwd')
f refers to an open file
Buffered I/O (as stdio in C)
Read-only (the default)
Python 3: text mode; content is decoded with the platform’s preferred encoding (commonly UTF-8) unless an encoding is specified
⟶ I/O is done in units of strings
f = open('/etc/passwd', encoding='ascii')
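The effect of the encoding parameter can be sketched as follows. The scratch file in a temporary directory and its content are assumptions made so that the example is self-contained; it does not touch `/etc/passwd`.

```python
import os
import tempfile

# Hypothetical scratch file, so the example runs anywhere.
path = os.path.join(tempfile.mkdtemp(), "greeting.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("Grüße\n")

# Same encoding on the way back in: the text round-trips.
with open(path, encoding="utf-8") as f:
    as_utf8 = f.read()

# Wrong encoding that accepts any byte: no error, but garbled text.
with open(path, encoding="latin-1") as f:
    as_latin1 = f.read()

# Wrong encoding that cannot represent the bytes: decoding fails.
try:
    with open(path, encoding="ascii") as f:
        f.read()
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

print(as_utf8 == "Grüße\n", as_latin1 == as_utf8, ascii_failed)
```

Passing `encoding=` explicitly avoids depending on whatever default the platform happens to use.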
Reading Files¶
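Method | Effect |
---|---|
read() | reads entire file content |
read(n) | reads up to n characters (bytes in binary mode) |
readline() | reads a line, including the terminating linefeed |
readlines() | reads entire file ⟶ list of lines |
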
while True:
    line = f.readline()
    if len(line) == 0:   # empty string: end of file
        break
    print(line)

for line in f.readlines():
    print(line)
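The reading methods all advance the same file position, which a short sketch can make visible. The three-line scratch file is an assumption for illustration.

```python
import os
import tempfile

# Hypothetical scratch file with three lines.
path = os.path.join(tempfile.mkdtemp(), "lines.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("one\ntwo\nthree\n")

with open(path, encoding="utf-8") as f:
    first_three = f.read(3)       # 'one'
    rest_of_line = f.readline()   # '\n' (what is left of line 1)
    next_line = f.readline()      # 'two\n'
    remaining = f.readlines()     # ['three\n']
    at_eof = f.read()             # '' (nothing left)

print(repr(first_three), repr(rest_of_line), repr(next_line), remaining, repr(at_eof))
```

An empty string is the end-of-file signal, which is what the `len(line) == 0` test in the loop above relies on.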
Reading Files: Pythonic¶
Iteration is a central theme in Python
Readability
Iterable: anything that can be iterated
Many things can be iterated
Fine-tunable behaviour and performance
Why shouldn’t we iterate over files?
for line in f:
    print(line)
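Iterating over a file yields one line at a time, linefeed included, without loading the whole file into memory. In the sketch below, `io.StringIO` stands in for an open text file and the passwd-like sample data is made up.

```python
import io

# io.StringIO behaves like a file opened in text mode (assumption: sample data).
f = io.StringIO("root:x:0:0\ndaemon:x:1:1\n")

names = []
for line in f:
    # Each line still carries its terminating linefeed; strip it first.
    fields = line.rstrip("\n").split(":")
    names.append(fields[0])

print(names)   # ['root', 'daemon']
```

Because the loop only needs an iterable, the same body works unchanged for a real file object, a list of strings, or a generator.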
Writing Files (1)¶
f = open('/tmp/some-file', 'w')
f.write('arbitrary content')
f.writelines(['one\n', 'two\n'])
print('one line (with automatic linefeed)', file=f)
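Since output is buffered, written data is only guaranteed to reach the file once it is flushed or closed. A possible way to make that automatic is a `with` block, sketched here with a hypothetical scratch file instead of `/tmp/some-file`.

```python
import os
import tempfile

# Hypothetical scratch file in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "some-file")

# The with block closes (and thereby flushes) the file on exit,
# even if an exception is raised inside it.
with open(path, "w", encoding="utf-8") as f:
    f.write("arbitrary content\n")      # write() adds no linefeed
    f.writelines(["one\n", "two\n"])    # neither does writelines()
    print("three", file=f)              # print() does

with open(path, encoding="utf-8") as f:
    content = f.read()

print(repr(content))
```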
Writing Files (2)¶
The beauty of iteration (again) …
writelines() does not add linefeeds (probably a misnomer)
Items can come from any iterable
⟶ Very cool!
src = open('/etc/passwd', 'r')
dst = open('/tmp/passwd', 'w')
dst.writelines(src)
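Because writelines() accepts any iterable, a generator expression can filter lines on the way from source to destination. This is a sketch with in-memory streams (`io.StringIO`) standing in for the two files; the sample data is made up.

```python
import io

# In-memory stand-ins for the source and destination files (assumption).
src = io.StringIO("# comment\nroot:x:0:0\n# another\ndaemon:x:1:1\n")
dst = io.StringIO()

# Lines are pulled from src one at a time and written immediately;
# no intermediate list is built.
dst.writelines(line for line in src if not line.startswith("#"))

result = dst.getvalue()
print(repr(result))
```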
File Modes¶
Available mode characters

Character | Meaning |
---|---|
'r' | open for reading (default) |
'w' | open for writing, truncating the file first |
'x' | open for exclusive creation, failing if the file already exists |
'a' | open for writing, appending to the end of the file if it exists |
'b' | binary mode (no encoding and decoding) |
't' | text mode (default) |
'+' | open a disk file for updating (reading and writing) |

Combinations and their meanings

Mode | Meaning |
---|---|
'w+' | read/write/truncate |
'r+' | read/write (write pointer at beginning) |
'a+' | read/write (write pointer at end) |
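The difference between creating, appending, and truncating can be tried out on a scratch file. The file name and content below are assumptions for illustration.

```python
import os
import tempfile

# Hypothetical scratch file; it does not exist yet.
path = os.path.join(tempfile.mkdtemp(), "modes.txt")

# 'x': exclusive creation succeeds only the first time.
with open(path, "x", encoding="utf-8") as f:
    f.write("first\n")
try:
    open(path, "x", encoding="utf-8")
    second_create_failed = False
except FileExistsError:
    second_create_failed = True

# 'a': existing content is kept, new content goes to the end.
with open(path, "a", encoding="utf-8") as f:
    f.write("second\n")

# 'r+': read/write without truncation.
with open(path, "r+", encoding="utf-8") as f:
    before_truncate = f.read()

# 'w': merely opening the file empties it.
with open(path, "w", encoding="utf-8"):
    pass
size_after_w = os.path.getsize(path)

print(second_create_failed, repr(before_truncate), size_after_w)
```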
Text vs. Binary Mode¶
Python 3 is Unicode clean. For file I/O this means …
Cannot pass bytes to a file opened in text mode
Cannot pass str to a file opened in binary mode
Unless otherwise specified (a b in the mode string, e.g. mode='rb'), files are in text mode
Python 2 is not Unicode clean
A b in the mode string means “No stupid CR/LF conversion on Windows”
bytes or str, no one cares
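Both restrictions show up as a TypeError at the write() call. The sketch below uses a hypothetical scratch file and collects the two failures in a list.

```python
import os
import tempfile

# Hypothetical scratch file.
path = os.path.join(tempfile.mkdtemp(), "data.bin")
rejected = []

# Text mode accepts str only.
with open(path, "w", encoding="utf-8") as f:
    try:
        f.write(b"raw bytes")
    except TypeError:
        rejected.append("bytes in text mode")

# Binary mode accepts bytes only; encoding is the caller's job.
with open(path, "wb") as f:
    try:
        f.write("a string")
    except TypeError:
        rejected.append("str in binary mode")
    f.write("Grüße".encode("utf-8"))

# Reading in binary mode returns the undecoded bytes.
with open(path, "rb") as f:
    data = f.read()

print(rejected, data)
```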
Standard Streams¶
Good Ol’ Unix …
Number | POSIX Macro | Python equivalent |
---|---|---|
0 | STDIN_FILENO | sys.stdin |
1 | STDOUT_FILENO | sys.stdout |
2 | STDERR_FILENO | sys.stderr |
Interactive shell: all three are associated with the terminal
Standard input and output used for I/O redirection and pipes
Standard error receives errors, warnings, and debug output
Important
Windows programmers: never write errors, warnings, or debug output to standard output!
import sys
print('An error occurred', file=sys.stderr)
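Why the separation matters can be sketched by capturing both streams: whatever a pipe or redirection picks up from standard output stays free of diagnostics. The `report` function and its messages are hypothetical, and `contextlib.redirect_stdout`/`redirect_stderr` merely simulate what a shell redirection would do.

```python
import contextlib
import io
import sys

def report(result, warning):
    """Hypothetical helper: payload to stdout, diagnostics to stderr."""
    print(result)                     # what the next program in a pipe sees
    print(warning, file=sys.stderr)   # what the user sees on the terminal

out, err = io.StringIO(), io.StringIO()
with contextlib.redirect_stdout(out), contextlib.redirect_stderr(err):
    report("42", "warning: input was truncated")

captured_out = out.getvalue()
captured_err = err.getvalue()
print(repr(captured_out), repr(captured_err))
```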