Introduction to Machine Learning in R: Glossary

Key Points

Introduction to machine learning	Machine learning is a set of tools and techniques for finding patterns in data. Some machine learning techniques predict a value from input data. Others classify input data, working out which class it belongs to. Artificial Intelligence is a broader term that refers to making computers show human-like intelligence. Some people say Artificial Intelligence when they mean machine learning. All machine learning systems have limitations.
Clustering	Clustering is a form of unsupervised learning. Unsupervised learning algorithms don’t need labelled training data. K-means is a popular clustering algorithm. K-means struggles where one cluster sits inside another, such as concentric circles. Spectral clustering is another technique, which can overcome some of the limitations of k-means. Spectral clustering is much slower than k-means.
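
The concentric circles limitation can be sketched in a few lines of base R. This is a minimal illustration under our own assumptions, not the lesson's exercise: the ring radii, the noise level, the number of points and the names `rings`, `true_ring` and `agreement` are all made up for the example. Only `kmeans()` from the built-in stats package is used.

```r
set.seed(42)                       # fixed seed so the sketch is repeatable
n <- 200                           # points per ring (arbitrary choice)

# Two noisy concentric rings: radius 1 (inner) and radius 5 (outer).
angle  <- runif(2 * n, 0, 2 * pi)
radius <- rep(c(1, 5), each = n) + rnorm(2 * n, sd = 0.1)
rings  <- data.frame(x = radius * cos(angle), y = radius * sin(angle))
true_ring <- rep(c("inner", "outer"), each = n)

# Ask k-means for two clusters, trying 10 random starts.
fit <- kmeans(rings, centers = 2, nstart = 10)

# Cross-tabulate the true ring against the k-means cluster label.
confusion <- table(true_ring, cluster = fit$cluster)
print(confusion)

# Share of points that agree with the best matching of clusters to rings.
matched   <- max(sum(diag(confusion)), sum(confusion) - sum(diag(confusion)))
agreement <- matched / sum(confusion)
print(agreement)
```

With two centres, k-means divides the plane along a straight line, so no choice of centres can put the whole inner ring in one cluster and the whole outer ring in the other. The cross-tabulation should therefore show both rings spread across the clusters.
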
Dimensionality Reduction	PCA is a linear dimensionality reduction technique for tabular data. t-SNE is another dimensionality reduction technique for tabular data. It is non-linear, which makes it more general than PCA.
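
As a rough sketch of PCA on tabular data, the example below uses `prcomp()` from base R. The choice of the built-in `iris` dataset, the decision to scale the columns, and the names `measurements`, `reduced` and `variance_share` are our assumptions for illustration only.

```r
# iris ships with R; columns 1 to 4 hold the numeric measurements.
measurements <- iris[, 1:4]

# Centre and scale each column, then compute the principal components.
pca <- prcomp(measurements, center = TRUE, scale. = TRUE)

# Keep the first two components: four columns reduced to two.
reduced <- pca$x[, 1:2]
print(dim(reduced))

# Proportion of the total variance captured by each component.
variance_share <- pca$sdev^2 / sum(pca$sdev^2)
print(round(variance_share, 3))
```

Plotting `reduced` with `plot(reduced, col = iris$Species)` is one way to check how much structure survives the reduction.
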
Regression	We can model linear data using linear (least squares) regression. A linear regression model can be used to predict future values. We should split up our dataset and hold part of it back to test the model. For some non-linear data, such as exponential growth, taking logarithms can make the relationship linear.
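
The regression points above can be sketched together in base R. Everything specific here is an assumption made for the example: the synthetic exponential data, the 80/20 split, and the names `dat`, `train`, `test` and `mean_rel_error`. The functions `lm()` and `predict()` come from the built-in stats package.

```r
set.seed(1)                                   # repeatable random numbers

# Synthetic exponential growth with multiplicative noise (made-up data).
x   <- seq(1, 20, length.out = 100)
y   <- 2 * exp(0.3 * x) * exp(rnorm(100, sd = 0.1))
dat <- data.frame(x = x, y = y)

# Hold back 20% of the rows to test the model.
train_rows <- sample(nrow(dat), size = 80)
train <- dat[train_rows, ]
test  <- dat[-train_rows, ]

# Taking logs makes the relationship linear: log(y) = a + b * x.
model <- lm(log(y) ~ x, data = train)
print(coef(model))

# Predict on the held-out rows and undo the log transform.
predicted <- exp(predict(model, newdata = test))

# Mean relative error on data the model has not seen.
mean_rel_error <- mean(abs(predicted - test$y) / test$y)
print(mean_rel_error)
```

The fitted slope should land close to the 0.3 used to generate the data, and the error is measured only on rows the model never saw during fitting.
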
Day 1 practical
Non-Linear Classifiers	Powerful libraries can be used to implement machine learning functions. Non-linear machine learning models can be used to predict results.
Neural Networks	Perceptrons are artificial neurons, the building blocks of neural networks. A perceptron takes multiple inputs, multiplies each by a weight value and sums the weighted inputs. It then applies an activation function to the sum. A single perceptron can solve simple problems which are linearly separable. Multiple perceptrons can be combined to form a neural network which can solve problems that aren’t linearly separable. Training a neural network requires some training data to show the network examples of what to learn. To validate our training we split the data into a training set and a test set. To ensure the whole dataset can be used in training and testing we can train multiple times with different subsets of the data acting as training/testing data. This is called cross-validation. Several companies now offer cloud APIs where we can train neural networks on powerful computers.
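
The perceptron description above can be written out as a short function. This is a minimal sketch, not a trained model: the step activation, the `bias` term and the hand-picked weights are our assumptions, chosen so that the perceptron behaves like a logical AND, which is linearly separable.

```r
# A single perceptron: weight each input, sum, then apply an activation.
# The bias term and the step activation are assumptions for this sketch.
perceptron <- function(inputs, weights, bias) {
  weighted_sum <- sum(inputs * weights) + bias
  if (weighted_sum > 0) 1 else 0     # step activation function
}

# All four input combinations for a two-input logic gate.
and_inputs <- list(c(0, 0), c(0, 1), c(1, 0), c(1, 1))

# Hand-picked weights and bias that make the perceptron compute AND.
and_outputs <- sapply(and_inputs, perceptron, weights = c(1, 1), bias = -1.5)
print(and_outputs)                   # 0 0 0 1
```

No single choice of weights and bias reproduces XOR in the same way, because XOR is not linearly separable. That is the case where several perceptrons have to be combined into a network.
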
Ethics and Implications of Machine Learning	Machine learning is often thought of as unbiased and impartial, but if the training data is biased the resulting model will be too. Many machine learning algorithms can’t explain how they arrived at a decision. There is a lot of concern about how machine learning can be used for unethical purposes. No machine learning system is 100% accurate, so think about the implications of false positives and false negatives.
Find out more	This course has only touched on a few areas of machine learning. Machine learning is a large and growing field. This course is designed to teach you just enough to do something useful. Machine learning is a rapidly developing field and new tools and techniques are constantly appearing.
Day 2 practical