Machine Learning: Concepts and Terminology¶
This topic is an introduction. It only scratches the surface: there is a lot of terminology to learn, and that is what this topic covers. Data science, machine learning, and artificial intelligence are very broad fields, and there is much more to each of them.
See this video for a really good and thorough introduction. Beware though, it is a whopping 4 hours long.
Data scientists are not always known to be good (let alone diligent) programmers. I have seen people use Jupyter Notebooks as both editor and runtime environment for their programs. I like notebooks for playing around, and they produce nice-looking web pages with plots and charts as a side effect (really cool), but there is a point where one has to bite the bullet and start to program. That is the focus of these AI topics.
See Also¶
Concrete applications of the concepts and terminology introduced in this topic can be found here:
How Far Is Mankind from Creating God¶
Artificial Narrow Intelligence (ANI, Weak AI)
The stage we are at now: can solve specific problems
Weather forecast
Image recognition
Autonomous driving
Artificial General Intelligence (AGI, Strong AI)
Nowhere near there yet: can do everything a human can
Artificial Super Intelligence (ASI)
Terminator and such
Basic Terminology: Algorithm and Model¶
Algorithm. For example …
Linear regression
Decision tree
Random forest
(many many more)
Model. Trained by using an algorithm.
Uses the algorithm
Takes the input and maps it to output
Built through training.
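To make the difference between algorithm and model concrete, here is a minimal sketch in plain Python. It assumes a deliberately tiny algorithm, a one-level decision tree (a "stump") that learns a single threshold. The function name `train_stump` and the exam data are made up for illustration; real projects would use a library implementation.

```python
def train_stump(xs, labels):
    """Algorithm: try every candidate threshold, keep the one with the fewest errors."""
    best_threshold, best_errors = None, len(xs) + 1
    for threshold in sorted(set(xs)):
        # A candidate model predicts True for values at or above the threshold.
        errors = sum((x >= threshold) != label for x, label in zip(xs, labels))
        if errors < best_errors:
            best_threshold, best_errors = threshold, errors

    def model(x):
        """Model: maps an input to an output using the learned threshold."""
        return x >= best_threshold

    return model, best_threshold


# Hypothetical training data: hours studied -> passed the exam?
hours = [1, 2, 3, 4, 6, 7, 8, 9]
passed = [False, False, False, False, True, True, True, True]

model, threshold = train_stump(hours, passed)
print("learned threshold:", threshold)
print("prediction for 7 hours:", model(7))
print("prediction for 5 hours:", model(5))
```

`train_stump` is the algorithm; what it returns is the model. The model no longer needs the training data, only the threshold it learned from it.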
Basic Terminology: Features and Data¶
Input features or predictor variables
Set of variables used as input to the model
Output features or response/target variables
Set of variables calculated by the model, based on input features
Training Data.
Used to create the model (the more the better)
Divided (split) into two parts
Training data; used to actually create/train the model
Testing data; used to test the accuracy of the trained model on data it has not seen
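The split into training and testing data can be sketched in a few lines of plain Python. The helper `train_test_split`, the 25% test fraction, and the housing data are assumptions made up for this example (libraries such as scikit-learn ship a function of the same name, which this sketch does not use).

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of the rows, then cut it into a training and a testing part."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed: reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical data set of (input features, target) pairs.
# Input features: (floor area in square metres, number of rooms)
# Target variable: price in thousands
dataset = [
    ((50, 2), 150), ((65, 3), 200), ((80, 3), 240), ((95, 4), 290),
    ((110, 4), 330), ((120, 5), 370), ((140, 5), 420), ((160, 6), 480),
]

train, test = train_test_split(dataset)
print("training rows:", len(train))
print("testing rows:", len(test))
```

Shuffling before cutting matters: if the data happens to be sorted (as it is here, by size), an unshuffled split would test the model only on the largest houses.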
Types of Machine Learning¶
Supervised Learning.
Each input training datum has its known/desired output attached as a label.
Used for regression and classification
Unsupervised Learning.
Works on unlabeled data.
Creates clusters on its own, identifying features.
Used for association and clustering
Reinforcement Learning.
Agent learns from its own actions by measuring rewards. Rather advanced. No prepared training data set; the agent learns by trial and error.
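Supervised and unsupervised learning show up again in the problems that follow. Reinforcement learning does not, so here is a minimal sketch of the trial-and-error idea. It assumes a very simple setting, a "multi-armed bandit" with an epsilon-greedy agent; the function name `run_bandit`, the reward values, and the parameters are all made up for illustration.

```python
import random


def run_bandit(true_means, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent: mostly take the best-known action, sometimes explore."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_means)  # what the agent believes each action pays
    counts = [0] * len(true_means)       # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))  # explore: random action
        else:
            arm = max(range(len(true_means)), key=lambda i: estimates[i])  # exploit
        # The environment answers with a noisy reward; the agent never sees true_means.
        reward = true_means[arm] + rng.uniform(-0.1, 0.1)
        counts[arm] += 1
        # Running average of the rewards observed for this action.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates, counts


estimates, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(estimates)), key=lambda i: estimates[i])
print("action the agent prefers:", best)
print("times each action was tried:", counts)
```

Nobody tells the agent which action is correct. It only sees rewards, and its estimates improve as it tries things out.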
Problems Solved¶
Regression
Output: continuous quantity (usually a forecast of something)
Solved by supervised learning algorithms like Linear Regression.
See topic: Linear Regression
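As a rough idea of what regression looks like in code, the following sketch fits a straight line to a handful of points using the closed-form least-squares formulas. The function name `fit_line` and the data are assumptions for this example; it handles a single input feature only.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both without the 1/n factor).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data, roughly following y = 2x + 1 with some noise.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
prediction = slope * 6 + intercept  # continuous output for an unseen input
print("slope:", round(slope, 2))
print("intercept:", round(intercept, 2))
print("forecast for x = 6:", round(prediction, 2))
```

The output is a continuous quantity: the model can answer for any input, not only the ones it was trained on.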
Classification
Output: categorical quantity (“spam or not”)
Solved by supervised learning algorithms like
Support Vector Machines
Naive Bayes
Logistic Regression
K-Nearest Neighbors
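Of the classification algorithms listed, the nearest-neighbor idea is the easiest to sketch: classify a new point by a majority vote among the k closest labeled training points. The function name `knn_classify`, the choice k=3, and the "spam" features are made up for illustration.

```python
import math
from collections import Counter


def knn_classify(training_data, point, k=3):
    """Return the most common label among the k training points nearest to `point`."""
    neighbors = sorted(training_data, key=lambda item: math.dist(item[0], point))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical labeled data.
# Input features: (number of links, number of exclamation marks) in an email.
training_data = [
    ((0, 0), "ham"), ((1, 0), "ham"), ((0, 1), "ham"), ((1, 1), "ham"),
    ((7, 8), "spam"), ((8, 6), "spam"), ((9, 9), "spam"), ((6, 7), "spam"),
]

print(knn_classify(training_data, (8, 7)))
print(knn_classify(training_data, (1, 2)))
```

Note that there is no separate training step here: the "model" is the labeled data itself, which is why this counts as supervised learning.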
Clustering
Output: clusters of input data
Solved by unsupervised learning algorithms like K-means
See topic: K-Means
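A minimal K-means sketch, to show what "creates clusters on its own" means in practice. It assumes the caller passes the initial centroids (real implementations pick them randomly or with smarter heuristics) and runs a fixed number of iterations; the function name `kmeans` and the points are made up for this example.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points to centroids and moving the centroids."""
    for _ in range(iterations):
        # Assignment step: every point joins the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its old centroid.
        centroids = [
            tuple(sum(coord) / len(cluster) for coord in zip(*cluster)) if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical unlabeled data: two visibly separate groups of points.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]

# Assumption: use the first and the last point as initial centroids.
centroids, clusters = kmeans(points, [points[0], points[-1]])
print("centroids:", centroids)
print("clusters:", clusters)
```

No labels are involved anywhere. The only thing the caller decides is how many clusters to look for.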