Credit Card Fraud Detection Project

Hi, Coders. In this blog, let us see how to build our own Machine Learning model that predicts whether a credit card transaction is legit or fraudulent. We used Jupyter Notebook to build this model. You can download it by clicking this link.


Dataset:

You can download the dataset used for this project by clicking this link.

Code: Importing all the necessary Libraries

import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import matplotlib.pyplot as plt 
from sklearn.metrics import accuracy_score

Code: Loading the Data

df = pd.read_csv("creditcard.csv")

We use the pandas library to load the data from the CSV file into a DataFrame.

Code: Understanding the Data

df.head(5)

Output:









Code: Describing the Data

print(df.shape)
df.describe()

Output:












Code: Imbalance in the data

df["Class"].value_counts()

Output:

0    284315
1       492
Name: Class, dtype: int64

0 -> Legit 

1 -> Fraud

The dataset has 284315 legit transactions and 492 fraudulent transactions, so fraud makes up only about 0.17% of all transactions. The data is heavily imbalanced.
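To see where the 0.17% figure comes from, here is a small sketch that redoes the arithmetic from the two counts printed above. The variable names are only illustrative. It also shows why plain accuracy can be misleading on the full dataset: a model that always answers "legit" would already be right almost every time.

```python
# Class counts reported by df["Class"].value_counts() above.
legit_count = 284315
fraud_count = 492

total = legit_count + fraud_count
fraud_pct = 100 * fraud_count / total

print(f"Total transactions: {total}")
print(f"Fraud share: {fraud_pct:.2f}%")

# A model that always predicts "legit" would still be right this often.
baseline_accuracy = legit_count / total
print(f"Always-legit baseline accuracy: {baseline_accuracy:.4f}")
```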

Code: Balancing the data

Let us separate the dataset into legit and fraud data. We then randomly select 492 legit transactions to match the 492 fraud transactions, and concatenate the sampled legit data with the fraud data to get a balanced dataset of 984 rows.
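This approach is usually called random undersampling. Here is a minimal sketch of the same idea on a made-up toy DataFrame, so you can run it without the real dataset. The `toy` data and the `random_state=42` seed are assumptions added for illustration. The seed only makes the random pick repeatable between runs.

```python
import pandas as pd

# Hypothetical toy data: 20 legit rows (Class 0) and 4 fraud rows (Class 1).
toy = pd.DataFrame({
    "Amount": range(24),
    "Class": [0] * 20 + [1] * 4,
})

toy_legit = toy[toy["Class"] == 0]
toy_fraud = toy[toy["Class"] == 1]

# Draw as many legit rows as there are fraud rows. The seed is an assumption
# added here so that repeated runs pick the same rows.
toy_legit_sample = toy_legit.sample(n=len(toy_fraud), random_state=42)

# Stack the sampled legit rows on top of the fraud rows.
balanced = pd.concat([toy_legit_sample, toy_fraud], axis=0)
print(balanced["Class"].value_counts())
```

Without a seed, every run selects different legit rows, so the accuracy numbers later on can change slightly from run to run.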

legit = df[df["Class"]==0]
fraud = df[df["Class"]==1]
legit_sample = legit.sample(n=492)
new_data = pd.concat([legit_sample,fraud],axis=0)
new_data.head(5)

Output:









Code: Separating the X and the Y values

We split the data into the input features (x) and the output label (y).

x = new_data.drop("Class",axis=1)
y = new_data["Class"]

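Splitting the Data into Training and Testing Sets

We divide the dataset into two groups: one for training the model and the other for testing the trained model's performance. Here 20% of the rows are held out for testing, and stratify=y keeps the ratio of legit to fraud rows the same in both groups.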

x_train,x_test,y_train,y_test = train_test_split(x,y,test_size=0.2,stratify=y,random_state=2)
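If you want to check what `stratify` does, here is a small sketch on made-up data (the `demo` frame and its single `feature` column are hypothetical). It splits 100 balanced rows 80/20 and prints the share of class 1 in each part.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical balanced data: 50 rows per class, one made-up feature.
demo = pd.DataFrame({"feature": range(100), "Class": [0] * 50 + [1] * 50})
demo_x = demo.drop("Class", axis=1)
demo_y = demo["Class"]

dx_train, dx_test, dy_train, dy_test = train_test_split(
    demo_x, demo_y, test_size=0.2, stratify=demo_y, random_state=2
)

# Shapes of the two parts, then the fraction of class 1 in each part.
print(dx_train.shape, dx_test.shape)
print(dy_train.mean(), dy_test.mean())
```

Both fractions come out equal because the split is stratified on the label.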

Building the Model:

To build this model, we will use logistic regression to classify each transaction as legit or fraud.

model = LogisticRegression()
model.fit(x_train,y_train)

Output:

LogisticRegression()
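Depending on your data, `LogisticRegression()` with default settings may print a convergence warning when the features are on very different scales. One common option, shown here only as a sketch and not as part of the model above, is to standardize the features in a pipeline and allow more iterations. The data below is synthetic (generated with `make_classification`), and `max_iter=1000` is an arbitrary choice.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-in data; this is not the credit card dataset.
demo_x, demo_y = make_classification(n_samples=200, n_features=5, random_state=0)
# Stretch one column so the features sit on very different scales.
demo_x[:, 0] *= 10_000

# The scaler is fitted together with the model, so the same scaling
# is applied automatically whenever the pipeline predicts.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scaled_model.fit(demo_x, demo_y)
print(scaled_model.score(demo_x, demo_y))
```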

Accuracy from Training Dataset

accuracy_score(y_train, model.predict(x_train))

Output:

0.9237611181702668

Accuracy from Testing Dataset

accuracy_score(y_test, model.predict(x_test))

Output:

0.9238578680203046

The training and testing accuracies are almost the same (about 92.4% each), which suggests that the model is not overfitting. Keep in mind that both scores are measured on the balanced sample of 984 rows, not on the full imbalanced dataset.
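Accuracy alone does not tell you how many fraud cases were missed. As a sketch, scikit-learn's `confusion_matrix`, `precision_score`, and `recall_score` can break the result down by class. The labels and predictions below are made up for illustration; in the notebook you would pass `y_test` and `model.predict(x_test)` instead.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Hypothetical labels and predictions, made up for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 1, 0]

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)
print(cm)

# Precision: of the transactions flagged as fraud, how many really were fraud.
precision = precision_score(y_true, y_pred)
# Recall: of the real fraud cases, how many the model caught.
recall = recall_score(y_true, y_pred)
print(precision, recall)
```

For fraud detection, recall on class 1 is often the number to watch, because a missed fraud usually costs more than a false alarm.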

Predicting Transactions:

model.predict(x_test)
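`model.predict` returns an array of 0s and 1s. Here is a small sketch of how you might turn those into readable labels and also look at the estimated fraud probability with `predict_proba`. The one-feature data and the `demo_model` name are hypothetical stand-ins so the example runs on its own.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical one-feature data: small values are legit (0), large are fraud (1).
demo_x = np.array([[0.1], [0.2], [0.3], [0.4], [2.6], [2.7], [2.8], [2.9]])
demo_y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
demo_model = LogisticRegression().fit(demo_x, demo_y)

new_rows = np.array([[0.0], [3.0]])
labels = demo_model.predict(new_rows)
# Column 1 of predict_proba is the estimated probability of class 1 (fraud).
fraud_prob = demo_model.predict_proba(new_rows)[:, 1]

label_names = {0: "Legit", 1: "Fraud"}
for label, prob in zip(labels, fraud_prob):
    print(label_names[int(label)], round(float(prob), 3))
```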
teamcoderadmin
