Python Grundlagen (13.5.2020 - 14.5.2020 bei einer Firma in Graz)¶
Environment¶
We will try to follow a draft plan, based upon up-front discussion. Please don’t take this as a hard rule - we will take ourselves the freedom to spontaneously go deeper into one topic, at the cost of another.
Unit Testing and Test Driven Development¶
Part of the requirements was to spend a few words on unit testing and test driven development. I take the opportunity to kind of invert the training scenario, into something that comes into being using the basis of all agile methodologies. Exercises will not have textual descriptions, for example, but will be formulated as unit tests that initially fail (naturally).
Python Installation¶
The language itself consists of the Python interpreter itself, and a rather complete set of modules (one says, “Python comes with batteries included”). This - the python installation - is the primary focus of this training. We might look into NumPy and/or Pandas a bit.
Note
While the training material covers Python versions 2 and 3, time has come to consider version 2 obsolete.
Please choose Python 3 when installing!
For the matter of this training, for diadactical purposes, I suggest we use the standard Python installation,
Download Windows installer from here, and go through the installation process. Take care to check the “add python to path” box.
(For Linuxers, Python usually comes as part of your favorite distribution and is already installed.)
If there is the need to install packages that are not contained in Python’s own set of packages, we will install them using
pip
.
Data scientists often use a distribution named Anaconda which brings the standard Python installation and a large set of set of pre-packaged external extensions [1] . If you are already familiar with Anaconda, then I don’t object.
Programming Environment¶
As we are all programmers to a certain extent, we know what tools to use. For example, the training does not dictate which IDE (or editor) a participant uses. The exercises are not voluminous enough to justify that, after all; a simple text editor like Nodepad++ is sufficient.
That said, here’s a list of IDEs/editors that are frequently used for Python programming. It is in no particular order, and far from being complete.
Visual Studio Code. Not to be confused with Visual Studio, Visual Studio Code is actually a modern text editor, not an IDE. Together with its configurabilty, it can be turned into one, but by itself does not dictate anything upon the user.
PyCharm. I frequently see people use it, so it cannot be all that bad.
Eclipse and PyDev. Definitely a heavy weight (regarding memory footprint at least) among IDEs, Eclipse knows how to handle Python.
Spyder. It is used by data scientists a lot. Running code in it feels like a Jupyter Notebook execution in that there are seemingly strange “cell” like dependencies. (Take this into account when you decide to go with it.)
Emacs. (I had to say that.) Your trainer will use it to do occasional live hacking demos. Watching someone use it is ok, but learning how to use it requires a nontrivial amount of patience.
Topics¶
Day 1: Language Basics¶
Unit testing and Test Driven Development (preparing the basics for the remainder of the training)
Very basics: syntax, datatypes, variables
Control flow constructs:
if
,while
,for
Complex datatypes:
list
,set
,dict
Mutability and immutabiliy:
tuple
Functions and parameter passing
Closures
Iteration and Generators
Day 2: Advanced Topics¶
More about slicing, and about its use in NumPy
Exception handling
Modules and packages (“namespaces”)
Maybe a larger group exercise, to consolidate news from two days.
Wrap-Up¶
How Was It?¶
The training was done online on Zoom, due to the Corona crisis. This was my second online experience (first one is here), and I must say online is not much different from face-to-face. Questions were asked at a normal rate, nobody slept over (at least I did not see anybody falling from their chairs). I would have liked to see faces more, though, and I am definitely missing the off-topic communication during breaks and lunch. All in all, though, I definitely can say that there is no reason to not do trainings online.
That said, we probably tried to squeeze a little too much into only two days. To make the bigger part of the audience more happy, we should have probably explicitly agreed to strip basics (which the plan had dedicated day one to), at the cost of some in the audience who were not so advanced. Such things happen from time to time in trainings, it would appear that it’s the trainer’s job to detect such situations more early. My takeaway is that it is very important to state facts clearly and early, especially in settings where you cannot rely on your nonverbal antennae.
Topics¶
Being a stubborn greybeard though, I use to insist in bringing big
pictures (which Python’s iteration, (im)mutability, and exec()
belong to, among others), which I definitely did.
Day two was dedicated to a walk through the unittest
module,
together with a sketch of what Test Driven Development could do for
you. We thereby saw what Python modules and packages are, and how
modularization is done in Python. $PYTHONPATH
and such. To wrap
this up, the sketch ended with a discussion of distutils
. We saw
what a setup.py
file adds, and discussed what (possibly
continuous?) integration and deployment is at such a small
scale. (Probably Azure DevOps is a rather heavyweight solution to that
little local problem; it might solve problems that kept out of reach
of this little local training though.)
Later in the afternoon of day two, we were only able to scratch the surface of parallel programming (not among the agreed topics) by discussing how threading is done in Python. We saw how the Global Interpreter Lock (GIL) enables simplicity, but also makes true parallism nearly impossible.
Some topics have only been covered on their surface, others not at all. Clearly two days can’t have it all, so what follows is a list of YouTube links. Opinionated recommendations of mine to expand all those topics that would have been interesting to cover, but which we haven’t had the time for.
Links¶
My favorite Python videos are those that are both entertaining and informative, and long. Among those, many are by David Beazley (a freelance trainer who teaches Python) and Raymond Hettinger (same, in addition to being a Python core developer). I am slightly biased towards Beazley because I like his sense of humor.
For short and to-the-point tutorials, below, I recommend (and cite below) Corey Schafer (for general topics), and Keith Galli for data science.
Anyway, here a couple of links
Modern dictionaries: Raymond Hettinger emphasizing on dictionaries, even more than I did. (Hehe, I just discovered that he’s bringing my quick hash table explanation to a conclusive end. Hard stuff for the unaware though.)
Understanding the Python GIL: David Beazley dissecting the Global Interpreter Lock, explaining why multiprocessing is better. At around minute 45, in the questions/answers, there a mention that using NumPy operations in multiple threads is truly parallel.
Concurrency: Raymond Hettinger covering most if not all aspects and possiblities of concurrency. Very informative, very concise, covering
Multithreading
Multiprocessing
Async; I didn’t even mention that. asyncio. Me big fan.
Modules and Packages. David Beazley has a three hour (!) really cool and in-depth look into the seemingly simple
import
mechanism.Unit Testing: Corey Schafer (he has a number of really good and short tutorial videos; look out for him as you search).
Virtual Environments Tutorial: Corey Schafer again. Virtual environments are kind of an isolated development sandbox, solving a similar problem as containers do, but much more lightweight and Python only.
Packaging, Deployment, PyPI, and pip: Chris Wilcox (of Google) talking about packaging and deployment, and related topics
Generators Tutorial: Corey Schafer again, this time 11 minutes on generators.
Generators: The Final Frontier: David Beazley, again a bit (a whopping four hours) more precise on that topic.
Decorators Tutorial: Corey Schafer on decorators and closures.
NumPy Tutorial: Keith Galli has a number of good data science tutorials.
Here and here you might want to read up on another Python training I gave last year; me getting in touch with NumPy more closely.
Simulating COVID-19 using Python, NumPy & Matplotlib: finishing the list, I found that funny. Apart from that, it is a good live hacking session that brings it all together. You might like Matplotlib, btw.
Footnotes