Linear Regression¶
Topic Overview¶
Linear Regression serves as an introductory example to “data science”. The algorithm is simple to use, and the problems (predictions, usually) that it solves are easy to understand.
Here we use weather history data [1] to demonstrate how
a model is built
a model is verified
a model is used to predict
To keep the demo simple, the set of input features is one-dimensional - containing only the minimum day temperature -, as is the set of output features - maximum day temperature. Normally, at least the input set is multi dimensional, and carefully chosen. The bigger part of data science revolves around just this discipline - choosing the set of input features.
Also, you will agree that is complete nonsense to choose “minimum day temperature” as single input feature, and to predict the maximum day temperature from it. We do just that.
Jupyter Notebook¶
A Jupyter Notebook (download:
linear_regression.ipynb
) is the heart of the topic. It has
much of the code, together with explanations.
Running Code¶
While such notebooks are cute for research papers and for trying around, one should not use them as an IDE and/or runtime environments - they are not.
Here’s a running program that does the same, but does not show the fancy graphs that nobody needs.
Dataset¶
Both the notebook and the program use a test dataset (download: history_data.csv
).
Footnotes