This lesson is in the early stages of development (Alpha version)

Introduction to Machine Learning in R: Glossary

Key Points

Introduction to machine learning
  • Machine learning is a set of tools and techniques to find patterns in data.

  • Some machine learning techniques are useful for predicting something given some input data.

  • Some machine learning techniques are useful for classifying input data and working out which class it belongs to.

  • Artificial Intelligence is a broader term that refers to making computers show human-like intelligence.

  • Some people use the term Artificial Intelligence to mean machine learning.

  • All machine learning systems have limitations.

Clustering
  • Clustering is a form of unsupervised learning.

  • Unsupervised learning algorithms don’t need labelled training data.

  • Kmeans is a popular clustering algorithm.

  • Kmeans struggles where one cluster exists within another, such as concentric circles.

  • Spectral clustering is another technique which can overcome some of the limitations of Kmeans.

  • Spectral clustering is much slower than Kmeans.
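
The clustering points above can be sketched in base R. This is a minimal illustration, not the lesson's own code: the data is randomly generated, and the variable names (`blob_a`, `rings`, and so on) are made up for the example. It runs `kmeans()` from the stats package on two well-separated blobs, then on two concentric rings, where Kmeans tends to cut across both rings with a straight boundary instead of separating the inner ring from the outer one. Spectral clustering is not shown because it needs an add-on package.

```r
# Minimal sketch using base R only (stats::kmeans); all data is hypothetical.
set.seed(42)  # make the random data and the cluster starts reproducible

# Two well-separated blobs of 50 points each
blob_a <- matrix(rnorm(100, mean = 0, sd = 0.5), ncol = 2)
blob_b <- matrix(rnorm(100, mean = 5, sd = 0.5), ncol = 2)
blobs <- rbind(blob_a, blob_b)

# Ask for two clusters; nstart tries several random starts and keeps the best
fit <- kmeans(blobs, centers = 2, nstart = 10)
true_blob <- rep(c("a", "b"), each = 50)
print(table(true_blob, fit$cluster))

# Concentric circles: an inner ring (radius 1) inside an outer ring (radius 5)
angle <- runif(200, 0, 2 * pi)
radius <- rep(c(1, 5), each = 100)
rings <- cbind(radius * cos(angle), radius * sin(angle))

ring_fit <- kmeans(rings, centers = 2, nstart = 10)
true_ring <- rep(c("inner", "outer"), each = 100)
print(table(true_ring, ring_fit$cluster))
```

In the first table each blob should fall into a single cluster. In the second, the outer ring is split between both clusters, which is the limitation described above.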

Dimensionality Reduction
  • PCA is a linear dimensionality reduction technique for tabular data.

  • t-SNE is a non-linear dimensionality reduction technique for tabular data, so it can capture structure that a linear method such as PCA misses.
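
As a rough illustration of PCA, the sketch below uses base R's `prcomp()` on the built-in `iris` dataset. The choice of dataset and the variable names are assumptions made for this example. It reduces four numeric columns to two principal components. t-SNE is not shown because it is not part of base R.

```r
# Minimal sketch using base R's prcomp() on the built-in iris data.
measurements <- iris[, 1:4]  # the four numeric measurement columns

# Centre and scale each column so no single unit dominates the result
pca <- prcomp(measurements, center = TRUE, scale. = TRUE)

# Proportion of the total variance explained by each principal component
variance_explained <- pca$sdev^2 / sum(pca$sdev^2)
print(round(variance_explained, 3))

# Keep only the first two components: one row per flower, two columns
reduced <- pca$x[, 1:2]
print(dim(reduced))
```

The `reduced` matrix can then be plotted or passed to another algorithm in place of the original four columns.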

Regression
  • We can model linear data using a linear or least squares regression.

  • A linear regression model can be used to predict future values.

  • We should split up our training dataset and use part of it to test the model.

  • For some non-linear data, such as exponential growth, we can take logarithms to make the relationship linear.
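
The regression points above can be combined in one short sketch. It assumes made-up exponential data (the constants 2 and 0.3, the noise level, and the 80/20 split are all chosen for illustration). It holds back part of the data for testing, fits a least squares model with `lm()` on the log scale, and predicts the held-back values.

```r
# Minimal sketch using base R only; the data is hypothetical.
set.seed(1)

# y grows exponentially with x, with a little multiplicative noise
x <- seq(1, 10, length.out = 100)
y <- 2 * exp(0.3 * x) * exp(rnorm(100, sd = 0.05))
dat <- data.frame(x = x, y = y)

# Hold back 20 of the 100 rows to test the model
test_rows <- sample(nrow(dat), size = 20)
train <- dat[-test_rows, ]
test <- dat[test_rows, ]

# Taking logs makes the relationship linear: log(y) = log(2) + 0.3 * x
model <- lm(log(y) ~ x, data = train)
print(coef(model))

# Predict the held-back rows, then convert back from the log scale
predicted <- exp(predict(model, newdata = test))
rmse <- sqrt(mean((predicted - test$y)^2))
print(rmse)
```

The fitted slope should be close to 0.3 and the intercept close to log(2), because those are the values used to generate the data.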

Day 1 practical
Non-Linear Classifiers
  • We learned to use powerful libraries that implement machine learning functions.

  • We used non-linear machine learning models to predict results.

Neural Networks
  • Perceptrons are artificial neurons, the building blocks of neural networks.

  • A perceptron takes multiple inputs, multiplies each by a weight value and sums the weighted inputs. It then applies an activation function to the sum.

  • A single perceptron can solve simple functions which are linearly separable.

  • Multiple perceptrons can be combined to form a neural network which can solve functions that aren’t linearly separable.

  • Training a neural network requires some training data to show the network examples of what to learn.

  • To validate our training we split the data into a training set and a test set.

  • To ensure the whole dataset can be used in training and testing we can train multiple times with different subsets of the data acting as training/testing data. This is called cross-validation.

  • Several companies now offer cloud APIs where we can train neural networks on powerful computers.
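
The perceptron described above (weighted inputs, a sum, then an activation function) can be sketched in a few lines of base R. The step activation, the bias term, and the hand-picked weights are assumptions made for this example; a real network would learn its weights from training data. With these values the perceptron computes logical AND, which is linearly separable.

```r
# Minimal sketch of a single perceptron; weights and bias are hand-picked.

# Step activation: output 1 when the weighted sum is positive, otherwise 0
step_activation <- function(z) as.numeric(z > 0)

# Multiply each input by its weight, sum, add a bias, then apply the activation
perceptron <- function(inputs, weights, bias) {
  step_activation(sum(inputs * weights) + bias)
}

# Hypothetical values that make the perceptron behave like logical AND
and_weights <- c(1, 1)
and_bias <- -1.5

inputs <- list(c(0, 0), c(0, 1), c(1, 0), c(1, 1))
outputs <- sapply(inputs, perceptron, weights = and_weights, bias = and_bias)
print(outputs)  # 0 0 0 1
```

No single choice of weights and bias makes this function compute XOR, which is why functions that aren't linearly separable need several perceptrons combined into a network.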

Ethics and Implications of Machine Learning
  • Machine learning is often thought of as unbiased and impartial, but if the training data is biased, the resulting model will be too.

  • Many machine learning algorithms can’t explain how they arrived at a decision.

  • There is a lot of concern about how machine learning can be used for unethical purposes.

  • No machine learning system is 100% accurate, so think about the implications of false positives and false negatives.
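
To make false positives and false negatives concrete, the sketch below counts them for ten made-up cases. The labels and predictions are invented for illustration, with 1 meaning "positive" and 0 meaning "negative".

```r
# Minimal sketch with hypothetical labels; 1 = positive, 0 = negative.
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(1, 1, 0, 1, 0, 0, 1, 0, 0, 1)

true_positives  <- sum(predicted == 1 & actual == 1)
false_positives <- sum(predicted == 1 & actual == 0)  # flagged, but wrongly
false_negatives <- sum(predicted == 0 & actual == 1)  # missed a real case
true_negatives  <- sum(predicted == 0 & actual == 0)

accuracy <- (true_positives + true_negatives) / length(actual)

cat("False positives:", false_positives, "\n")  # 2
cat("False negatives:", false_negatives, "\n")  # 1
cat("Accuracy:", accuracy, "\n")                # 0.7
```

Accuracy alone hides which kind of mistake was made. Depending on the application, a false positive and a false negative can have very different consequences.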

Find out more
  • This course has only touched on a few areas of machine learning.

  • Machine learning is a large and growing field.

  • This course is designed to teach you just enough to do something useful.

  • Machine learning is a rapidly developing field and new tools and techniques are constantly appearing.

Day 2 practical

Glossary