"Machine Learning" is a buzzword used in many applications nowadays: stock trading, data analysis, data prediction and so on. So what exactly is machine learning? Today we will see what machine learning is, look at the algorithms used for prediction in machine learning, and write a basic "Hello World" program of about 10 lines.
Pre-requisites:
- Basic knowledge of a programming language: Python
- Python (64-bit preferable) installed on the system. Download it here
- Python IDE: PyCharm (the basic Python IDLE can also be used, but PyCharm is preferable because Python libraries can be installed easily through it). Download it here
- Python libraries: pandas, scipy, numpy, sklearn (easily installable through PyCharm)
Machine-Learning:
The term Machine Learning can be traced back to 1959; however, it has come into the spotlight only in the last 5-7 years. The reason is that in the decades after 1959 hardware was not powerful enough to process huge data-sets, and the rate of data generation was far lower than it is today.
Machine Learning is broadly divided into two types: Supervised Machine Learning and Unsupervised Machine Learning. In supervised Machine Learning we provide the data as well as its outcome; the machine finds the pattern in the data, and the same pattern is then applied to new data to predict its outcome. In unsupervised Machine Learning only the data is provided, and the machine figures out the pattern by itself. We will first study supervised Machine Learning.
In supervised Machine Learning we train the machine by giving it 'features' and 'labels' as input. So what are features and labels? Suppose we write a machine learning program to distinguish a man from a woman. The features for this example are height, weight, foot size and so on; in other words, features are the important qualities that can be used to classify a piece of data. The labels in this case are Man and Woman, that is, what we want to predict.
We collect such data into a data-set containing features and labels and feed it to the machine. The machine (system) analyses this data and uses algorithms to classify it and make predictions about it.
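To make the idea concrete, here is a minimal sketch of how features and labels could be stored in Python. The variable names and the text labels are illustrative assumptions; the heights and weights are taken from the man/woman example used in this post.

```python
# Each row of `features` describes one person as [height_cm, weight_kg].
features = [[170, 75], [175, 70], [155, 60], [160, 55]]

# `labels` holds the known answer for the row at the same index.
labels = ["man", "man", "woman", "woman"]

# A data-set is simply the features paired with their labels.
dataset = list(zip(features, labels))

for (height, weight), label in dataset:
    print(f"height={height} cm, weight={weight} kg -> {label}")
```

The only rule that matters is that row i of the features and entry i of the labels describe the same person.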
Machine Learning Algorithms:
So how does the machine know how to classify the data? It uses specific algorithms that help it make particular predictions. Some machine learning algorithms are:
- Linear Regression
- Logistic Regression
- SVM (Support Vector Machine)
- Decision Tree
- Random Forest Classifier
Let us understand a couple of algorithms:
1. Linear Regression: This is one of the basic algorithms in machine learning. It takes the data-set as input, plots it on a graph and fits a straight line to it. Strictly speaking, linear regression uses that line to predict a numeric value; when a line is used to separate two classes, as in the example here, the technique is a linear classifier (logistic regression from the list above works this way). An example will make things clearer: consider the previous example of distinguishing whether a person is a man or a woman based on height and weight.
The data-set for men is: Man[height(cm), weight(kg)] = [[170,75],[175,70],[168,90],[180,85]].
The data-set for women is: Woman[height(cm), weight(kg)] = [[155,60],[160,55],[158,59],[145,55]]. These entries can be plotted on a graph as shown below
As we can see, men are indicated in blue and women in red. Because the two groups can be separated by a straight line, the data is said to be linearly separable, and a simple linear model is enough to classify it. If we obtain a new point on the graph and want to predict whether that person is a man or a woman, we simply check on which side of the line the point lies.
As we can see, the new green point is located to the left of the line, so that person is classified as female.
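The "which side of the line" test can be sketched in a few lines of plain Python. The line drawn in the figure is not reproduced here, so this sketch builds its own boundary with a nearest-centroid rule: a new point gets the label of the class whose average point is closer, which is equivalent to drawing a straight line halfway between the two averages. This is a swapped-in technique chosen for simplicity, not necessarily the line from the post, and `new_point` is a hypothetical person.

```python
men = [[170, 75], [175, 70], [168, 90], [180, 85]]
women = [[155, 60], [160, 55], [158, 59], [145, 55]]


def centroid(points):
    """Return the average [height, weight] of a list of points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]


def squared_distance(a, b):
    """Squared straight-line distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def classify(point, man_center, woman_center):
    """Label a point by whichever class centre is closer."""
    if squared_distance(point, man_center) < squared_distance(point, woman_center):
        return "man"
    return "woman"


man_center = centroid(men)      # [173.25, 80.0]
woman_center = centroid(women)  # [154.5, 57.25]

new_point = [150, 58]  # hypothetical new person: 150 cm, 58 kg
print(classify(new_point, man_center, woman_center))  # prints "woman"
```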
2. Decision Tree: A decision tree is another type of algorithm used to decide whether a particular entry belongs to a certain category or class. It takes the data-set as input and builds a tree structure from it. Each internal node of the tree acts as a 'test' on an attribute, and depending on the result of the test we follow either the left or the right branch. The leaves of the decision tree indicate the final decision.
As we can see from the above example, we can determine whether a person is fit based on age, pizza and exercise. We can construct this type of decision tree for other data as well, and more complex data can also be classified with the help of such decision trees.
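A decision tree like this is really just a set of nested tests, which can be written as plain if/else statements. The figure is not reproduced here, so the order of the tests and the age threshold of 30 in the sketch below are assumptions made for illustration; only the three attributes (age, pizza, exercise) come from the example.

```python
def is_fit(age, eats_lots_of_pizza, exercises_in_morning):
    """Walk a small hand-written decision tree and return 'fit' or 'unfit'."""
    # Root node: test on age (the threshold of 30 is an assumption).
    if age < 30:
        # Left branch: test on pizza; both outcomes are leaves.
        if eats_lots_of_pizza:
            return "unfit"
        return "fit"
    # Right branch: test on exercise; both outcomes are leaves.
    if exercises_in_morning:
        return "fit"
    return "unfit"


print(is_fit(25, eats_lots_of_pizza=True, exercises_in_morning=False))  # prints "unfit"
print(is_fit(45, eats_lots_of_pizza=True, exercises_in_morning=True))   # prints "fit"
```

The difference in machine learning is that we do not write these tests by hand: the algorithm works them out from the data-set.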
Basic Program of Machine Learning / Hello World of Machine Learning:
We will write a basic program to identify whether a particular fruit is an apple or an orange, given its weight and texture (smooth or rough). To start with the basics, the machine learning algorithms we use here work only on numeric data (integers or floats). If the data contains strings, each string first has to be converted to a number by assigning it a particular code. Consider the example below:
As we can see, the texture is a string, and so is the fruit type (orange or apple). So we need to convert both into integers.
Let Texture - Rough = 0
Texture - Smooth = 1
and Fruit - Orange = 0
Fruit - Apple = 1, so our new table will look like this:
So our features will be weight and texture, whereas the label will be the fruit type.
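The conversion can also be done in code instead of by hand. The sketch below is one possible way to do it with two lookup dictionaries; the names `TEXTURE_CODES`, `FRUIT_CODES` and `raw_rows` are made up for this example, and the four rows are decoded from the first four entries of the features and label arrays used in the program that follows.

```python
# Lookup tables for the string-to-integer codes defined above.
TEXTURE_CODES = {"rough": 0, "smooth": 1}
FRUIT_CODES = {"orange": 0, "apple": 1}

# Raw rows as (weight in grams, texture, fruit).
raw_rows = [
    (100, "rough", "orange"),
    (50, "smooth", "apple"),
    (110, "rough", "orange"),
    (40, "smooth", "apple"),
]

# Features keep the weight and replace the texture string with its code.
features = [[weight, TEXTURE_CODES[texture]] for weight, texture, _ in raw_rows]
# Labels replace the fruit name with its code.
labels = [FRUIT_CODES[fruit] for _, _, fruit in raw_rows]

print(features)  # [[100, 0], [50, 1], [110, 0], [40, 1]]
print(labels)    # [0, 1, 0, 1]
```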
Now that our data is ready, we will start with the program.
from sklearn import tree
This means that from the Python library scikit-learn (sklearn) we import 'tree', as we are using a decision tree to classify our data.
features = [[100,0],[50,1],[110,0],[40,1],[120,0],[45,1],[130,0],[56,0],[130,1]]
This is a list of lists in Python that holds our data, with weight as the first column and texture as the second column. The data must be numerical; strings are not allowed.
label = [[0],[1],[0],[1],[0],[1],[0],[1],[0]]
These are our labels, which contain 0 and 1, that is, whether a particular fruit is an orange or an apple.
pred = [[42,1]]
This is the prediction variable. We are going to train our machine on the features and labels shown above and then give it some data ([[42,1]]) that the machine has never seen: a fruit with a weight of 42 g and a smooth texture.
clf = tree.DecisionTreeClassifier()
From the 'tree' package imported above we choose the decision tree as our classifier and assign it to a variable called 'clf'.
clf.fit(features,label)
Now we train our machine by giving it the features and labels as input. The machine automatically builds a decision tree on its own in the background, which can then be used to make predictions.
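To get a feel for what happens in the background, the sketch below searches by hand for the single best yes/no test on the same data. It is a simplified illustration in plain Python, not scikit-learn's actual implementation: the helper names `gini` and `best_split` are made up for this example, the labels are written as a flat list, and only the first split of a tree is computed.

```python
features = [[100, 0], [50, 1], [110, 0], [40, 1], [120, 0],
            [45, 1], [130, 0], [56, 0], [130, 1]]
labels = [0, 1, 0, 1, 0, 1, 0, 1, 0]  # 0 = orange, 1 = apple


def gini(group):
    """Gini impurity of a list of 0/1 labels (0.0 means the group is pure)."""
    if not group:
        return 0.0
    p = sum(group) / len(group)
    return 2 * p * (1 - p)


def best_split(rows, targets):
    """Try every column/threshold pair; return (impurity, column, threshold) of the best."""
    best = None
    for col in range(len(rows[0])):
        values = sorted({row[col] for row in rows})
        # Candidate thresholds lie halfway between neighbouring distinct values.
        for low, high in zip(values, values[1:]):
            threshold = (low + high) / 2
            left = [t for row, t in zip(rows, targets) if row[col] <= threshold]
            right = [t for row, t in zip(rows, targets) if row[col] > threshold]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, col, threshold)
    return best


score, column, threshold = best_split(features, labels)
print(column, threshold)  # prints: 0 78.0

# The majority label on each side of the split becomes that side's prediction.
left_labels = [t for row, t in zip(features, labels) if row[column] <= threshold]
right_labels = [t for row, t in zip(features, labels) if row[column] > threshold]
left_class = max(set(left_labels), key=left_labels.count)
right_class = max(set(right_labels), key=right_labels.count)

sample = [42, 1]
print(left_class if sample[column] <= threshold else right_class)  # prints: 1
```

On this data a single test on weight (column 0, at 78 g) already separates all apples from all oranges, which is why a 42 g fruit lands on the apple side.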
x=clf.predict(pred)
After we have trained our machine, we need to test it by giving it something to predict. Here we call the function 'predict' on our 'clf' object (our decision tree classifier) and pass it the variable 'pred', that is, the data for which we want to predict whether the fruit is an orange or an apple. We save the result in the variable 'x'. Thus if x = 0 we know that the fruit is an orange, and if x = 1 it is an apple.
print(x)
We print the value of x to check our result. We gave weight = 42 g and texture = smooth as input, and every smooth fruit this light in our training data is an apple, so we expect the result to be apple. The value of 'x' should therefore be 1.
Output : [1]
We can also try a more complex example by giving any weight and texture value and letting the machine make its prediction.
Note: the more data the machine trains on, the better its accuracy generally is.
The complete code is uploaded to the CodeCenter GitHub page: https://github.com/codecenterorg/Machine_Learning/blob/master/orange.txt
Thank you