Hi Coders! In this blog, let us see how we can create our own machine learning model to find out whether a credit card transaction is legit or fraudulent. We have used Jupyter Notebook for building this model. You can download it by clicking this link.
Dataset:
You can download the dataset used for this project by clicking this link.
Code: Importing all the necessary Libraries
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt
from sklearn.metrics import accuracy_score
Code: Loading the Data
df = pd.read_csv("creditcard.csv")
We use the pandas library to load the data from the CSV file into a DataFrame.
Code: Understanding the Data
df.head(5)
Output: (the first five rows of the dataset; table not shown here)
Code: Describing the Data
print(df.shape)
df.describe()
Output: (the shape of the dataset followed by its summary statistics; table not shown here)
Code: Imbalance in the data
df["Class"].value_counts()
Output:
0 284315
1 492
Name: Class, dtype: int64
0 -> Legit
1 -> Fraud
The dataset has 284315 legit transactions and 492 fraudulent transactions, so fraud makes up only about 0.17% of all the transactions. The data is heavily imbalanced.
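To double-check the 0.17% figure, here is a small sketch of the arithmetic. The two counts are copied by hand from the value_counts() output above; the variable names are just for illustration.

```python
# Class counts copied from the value_counts() output above
legit_count = 284315
fraud_count = 492

total = legit_count + fraud_count
fraud_pct = 100 * fraud_count / total

print(f"Total transactions: {total}")    # Total transactions: 284807
print(f"Fraud share: {fraud_pct:.2f}%")  # Fraud share: 0.17%
```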
Code: Balancing the data
To balance the two classes, let us separate the dataset into legit and fraud transactions, randomly select 492 legit rows to match the 492 fraud rows, and then concatenate this legit sample with the fraud data.
legit = df[df["Class"]==0]
fraud = df[df["Class"]==1]
legit_sample = legit.sample(n=492)
new_data = pd.concat([legit_sample,fraud],axis=0)
new_data.head(5)
Output: (the first five rows of the balanced dataset; table not shown here)
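One thing to note: sample() picks different rows on every run unless you pass random_state. Below is a minimal sketch of the same undersampling idea with a fixed seed. It uses a small made-up DataFrame (the name demo, the Amount values, and the seed 42 are assumptions for illustration only), not the real creditcard.csv.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the real data: 1000 rows, 20 of them fraud (Class == 1)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Amount": rng.uniform(1, 500, size=1000),
    "Class": [1] * 20 + [0] * 980,
})

legit = demo[demo["Class"] == 0]
fraud = demo[demo["Class"] == 1]

# Draw as many legit rows as there are fraud rows;
# random_state makes the draw repeatable between runs
legit_sample = legit.sample(n=len(fraud), random_state=42)
balanced = pd.concat([legit_sample, fraud], axis=0)

print(balanced["Class"].value_counts())
```

Using len(fraud) instead of a hard-coded 492 keeps the two classes equal even if the dataset changes.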
Code: Separating the X and the Y values
We divide the data into the input features (x) and the output label (y).
x = new_data.drop("Class",axis=1)
y = new_data["Class"]
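Splitting the Data into Training and Testing Sets
We will be dividing the dataset into two groups: one for training the model and the other for testing our trained model's performance. Here test_size=0.2 keeps 20% of the rows for testing, and stratify=y keeps the share of legit and fraud rows the same in both groups.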
x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,stratify=y,random_state=2)
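To see what stratify does, here is a small sketch on placeholder data with the same size as our balanced dataset (984 rows, 492 per class). The feature values are random numbers and the names x_demo and y_demo are made up for this example.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder data shaped like the balanced dataset: 984 rows, 492 per class
rng = np.random.default_rng(0)
x_demo = rng.normal(size=(984, 3))
y_demo = np.array([0] * 492 + [1] * 492)

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# Number of rows in each group
print(x_tr.shape, x_te.shape)
# Rows per class in each group; stratify keeps them close to 50/50
print(np.bincount(y_tr), np.bincount(y_te))
```

With 984 rows and test_size=0.2, this gives 787 training rows and 197 testing rows, with both classes almost equally represented in each group.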
Building the Model:
For building this model, we will be using logistic regression to classify each transaction as legit (0) or fraud (1).
model = LogisticRegression()
model.fit(x_train,y_train)
Output:
LogisticRegression()
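If the feature columns are on very different scales, LogisticRegression() with its default settings may show a ConvergenceWarning. One common way to handle this is to scale the features first. The sketch below is an optional variation, not the model used in this post: it runs on synthetic data from make_classification, and the scaling step, max_iter=1000, and the name scaled_model are all assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic two-class data standing in for the balanced dataset
x_demo, y_demo = make_classification(n_samples=984, n_features=10, random_state=0)
x_demo[:, 0] *= 10_000  # blow up one column to mimic a badly scaled feature

x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

# Standardize every feature, then fit logistic regression on the scaled values
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(x_tr, y_tr)

# Accuracy on the held-out rows
print(scaled_model.score(x_te, y_te))
```

Putting the scaler inside the pipeline means it is fitted on the training rows only, so nothing from the testing rows leaks into training.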
Accuracy from Training Dataset
accuracy_score(y_train, model.predict(x_train))
Output:
0.9237611181702668
Accuracy from Testing Dataset
accuracy_score(y_test, model.predict(x_test))
Output:
0.9238578680203046
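The training and testing accuracies are almost the same (about 92.4% each), which suggests that the model is not overfitting, and an accuracy of about 92% on balanced data suggests it is not badly underfitting either. Keep in mind that both scores are measured on the balanced sample, not on the original imbalanced dataset.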
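Accuracy is a single number, so it does not tell us how many frauds were missed. A confusion matrix and a precision/recall report give a closer look. Here is a minimal sketch on synthetic data (make_classification stands in for the real dataset, and demo_model is a made-up name), so the printed numbers will not match the results of this post.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic, roughly balanced two-class data standing in for the real dataset
x_demo, y_demo = make_classification(n_samples=984, n_features=10, random_state=0)
x_tr, x_te, y_tr, y_te = train_test_split(
    x_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=2
)

demo_model = LogisticRegression(max_iter=1000).fit(x_tr, y_tr)
y_pred = demo_model.predict(x_te)

# Rows are the true classes, columns are the predicted classes
cm = confusion_matrix(y_te, y_pred)
print(cm)

# Precision and recall per class; recall for "Fraud" is the share of frauds caught
print(classification_report(y_te, y_pred, target_names=["Legit", "Fraud"]))
```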
Predicting Transactions:
model.predict(x_test)
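predict() returns an array of 0s and 1s, one per row. If you also want to know how confident the model is, predict_proba() gives the probability of each class. The sketch below shows both on synthetic data; demo_model, label_names, and the sample rows are made up for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data and a small model, only to demonstrate the calls
x_demo, y_demo = make_classification(n_samples=200, n_features=5, random_state=0)
demo_model = LogisticRegression(max_iter=1000).fit(x_demo, y_demo)

label_names = {0: "Legit", 1: "Fraud"}

sample = x_demo[:3]                             # pretend these are new transactions
preds = demo_model.predict(sample)              # class labels: 0 or 1
probs = demo_model.predict_proba(sample)[:, 1]  # probability of class 1 (fraud)

for pred, prob in zip(preds, probs):
    print(label_names[int(pred)], round(float(prob), 3))
```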
Hi Coders! If you have any doubts, feel free to ask us.