Credit Card Fraud Detection Programming Task
As discussed in Lesson 27, you can use a supervised or unsupervised algorithm to detect credit card fraud. This assignment lets you try it yourself if you are interested. It can also replace your lowest quiz score if your percentage score on this assignment is higher.
Step 0: Clean Data
Obtain the data from Kaggle. Remove all duplicates. For the attribute “Class”, change 0 to -1 so that -1 represents normal and +1 represents fraud. After this, the resulting dataset should contain 473 fraud and 283253 normal transactions.
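A minimal sketch of this cleaning step, assuming pandas and a label column named "Class". The helper name clean_transactions and the toy frame are illustrative only; with the real data you would load the downloaded CSV first (the file name in the comment is an assumption).

```python
import pandas as pd


def clean_transactions(df: pd.DataFrame) -> pd.DataFrame:
    """Drop exact duplicate rows, then relabel Class 0 as -1 (normal)."""
    cleaned = df.drop_duplicates().reset_index(drop=True)
    cleaned["Class"] = cleaned["Class"].replace({0: -1})
    return cleaned


# Tiny synthetic stand-in for the Kaggle file. With the real data, something
# like pd.read_csv("creditcard.csv") would go here (file name is an assumption).
toy = pd.DataFrame({
    "Time": [0.0, 0.0, 1.0, 2.0],
    "Amount": [10.0, 10.0, 250.0, 3.5],
    "Class": [0, 0, 1, 0],
})

cleaned = clean_transactions(toy)
# Count fraud (+1) and normal (-1) rows to check against the expected totals.
print(cleaned["Class"].value_counts())
```

On the real dataset, the printed counts are what you would compare with the 473 fraud and 283253 normal transactions stated above.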
Step 1: Scale Time & Amount
All other features were PCA transformed, except for Time and Amount. However, the ranges of these 2 features differ widely. Therefore, the data under ‘Time’ and ‘Amount’ need to be scaled so that neither feature outweighs the others. Choose an appropriate scaler (you can refer to "Compare the effect of different scalers on data with outliers") to scale them.
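As one possible choice, the sketch below uses scikit-learn's RobustScaler, which centers each column on its median and divides by its interquartile range, so a few very large amounts have less influence on the scaling. The helper name scale_time_amount and the toy values are assumptions for illustration; another scaler may suit your data better.

```python
import pandas as pd
from sklearn.preprocessing import RobustScaler


def scale_time_amount(df: pd.DataFrame, columns=("Time", "Amount")):
    """Return a copy of df with the given columns scaled by median and IQR."""
    scaled = df.copy()
    cols = list(columns)
    scaler = RobustScaler()
    scaled[cols] = scaler.fit_transform(scaled[cols])
    return scaled, scaler


# Toy data with one extreme Amount to mimic an outlier.
toy = pd.DataFrame({
    "Time": [0.0, 10.0, 20.0, 30.0, 40.0],
    "Amount": [1.0, 2.0, 3.0, 4.0, 1000.0],
    "Class": [-1, -1, -1, -1, 1],
})

scaled, scaler = scale_time_amount(toy)
print(scaled)
```

Swapping RobustScaler for StandardScaler or MinMaxScaler in this helper is one way to set up the scaler comparison described later in the assignment.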
Step 2: Re-sample Data
The data set is extremely unbalanced: only 0.17% of data entries are fraud transactions, which is expected since fraud should be abnormal. Partition the data so that 20% of the data is the testing data and 80% of the data is the training data (make sure both parts have fraud transactions!). Optionally, you can apply 5-fold cross-validation (you can refer to sklearn.model_selection.KFold). For the training data only, choose appropriate resampling technique(s) (you can refer to under-sampling methods and over-sampling methods) to resample the training data. Do not resample the testing data.
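The sketch below shows one way to do the split and resampling, under stated assumptions: it uses train_test_split with stratify so both parts contain fraud, and a hand-written random under-sampling of the normal class in the training part only. The helper name split_and_undersample and the synthetic data are illustrative; imbalanced-learn provides ready-made samplers (for example RandomUnderSampler) that could replace the manual step.

```python
import numpy as np
from sklearn.model_selection import train_test_split


def split_and_undersample(X, y, test_size=0.2, seed=0):
    """Stratified 80/20 split, then balance the TRAINING part by under-sampling."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=seed
    )
    rng = np.random.default_rng(seed)
    fraud_idx = np.flatnonzero(y_train == 1)
    normal_idx = np.flatnonzero(y_train == -1)
    # Keep every fraud row and an equally sized random subset of normal rows.
    keep_normal = rng.choice(normal_idx, size=len(fraud_idx), replace=False)
    keep = rng.permutation(np.concatenate([fraud_idx, keep_normal]))
    # The testing part is returned untouched.
    return X_train[keep], y_train[keep], X_test, y_test


# Synthetic stand-in: 1000 rows, 20 of them labeled fraud (+1).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = np.full(1000, -1)
y[:20] = 1

X_res, y_res, X_test, y_test = split_and_undersample(X, y)
print("train fraud/normal:", (y_res == 1).sum(), (y_res == -1).sum())
print("test fraud/normal:", (y_test == 1).sum(), (y_test == -1).sum())
```

Under-sampling discards most normal rows; over-sampling the fraud class is the alternative direction and makes a natural second setting to compare.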
Step 3: Train Model
Pick a model that can be used on this problem (you can refer to "Credit Card + EDA + (25+) Models For Beginners"). Train the model using the resampled training data.
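As an illustration only, the sketch below trains a logistic regression classifier, one of many models that could be used here. The balanced synthetic arrays stand in for the resampled training data and are an assumption of this example, as are the variable names.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Balanced synthetic training set standing in for the resampled training data:
# fraud (+1) points cluster around (2, 2), normal (-1) points around (-2, -2).
rng = np.random.default_rng(0)
X_fraud = rng.normal(loc=2.0, scale=1.0, size=(50, 2))
X_normal = rng.normal(loc=-2.0, scale=1.0, size=(50, 2))
X_train = np.vstack([X_fraud, X_normal])
y_train = np.array([1] * 50 + [-1] * 50)

# Fit the model on the (resampled) training data only.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Predict labels for two new points, one near each cluster center.
pred = model.predict(np.array([[2.0, 2.0], [-2.0, -2.0]]))
print(pred)
```

Any scikit-learn classifier with fit and predict methods can be dropped in at the same place, which keeps a later model comparison simple.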
Step 4: Analyze Result
Use your testing data to score your model. At a minimum, compute the accuracy, precision, and recall. Feel free to explore more scoring options.
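A small worked example of the three scores, assuming scikit-learn's metric functions and treating +1 (fraud) as the positive class. The label lists are made up for illustration: 2 fraud and 8 normal transactions, with one fraud missed and one normal transaction flagged by mistake.

```python
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical true labels and predictions (+1 = fraud, -1 = normal).
y_true = [1, 1, -1, -1, -1, -1, -1, -1, -1, -1]
y_pred = [1, -1, 1, -1, -1, -1, -1, -1, -1, -1]

# Accuracy: share of all predictions that are correct (8 of 10 here).
accuracy = accuracy_score(y_true, y_pred)
# Precision: of the transactions flagged as fraud, the share that are fraud.
precision = precision_score(y_true, y_pred, pos_label=1)
# Recall: of the actual fraud transactions, the share that were flagged.
recall = recall_score(y_true, y_pred, pos_label=1)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Because only about 0.17% of the data is fraud, a model that labels everything as normal would score roughly 99.8% accuracy while catching no fraud, which is why precision and recall matter here.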
Finally, choose 1 thing to adjust, and compare the results. That is, keep everything except…
Scaler in step 1, to check the effect of different scalers (or the same scaler with different parameters);
Or resampling method in step 2, to check the effect of different resampling methods (or the same method with different parameters);
Or the model in step 3, to check the effect of different models (or the same model with different parameters).
Adjusting and testing one of them is enough. If you do more, you may get some extra credit. Also, try to explain the reason for the different results you see.
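One way to organize such a comparison is to hold the split and everything else fixed and loop over the single setting being changed. The sketch below varies only the class_weight parameter of a logistic regression model on synthetic data. The data, the choice of model, and the choice of parameter are all assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data: 100 fraud (+1) and 900 normal (-1) rows.
rng = np.random.default_rng(0)
X = np.vstack([
    rng.normal(1.0, 1.0, size=(100, 2)),
    rng.normal(-1.0, 1.0, size=(900, 2)),
])
y = np.array([1] * 100 + [-1] * 900)

# Same stratified split for every run, so only the adjusted setting differs.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

results = {}
for weight in (None, "balanced"):
    model = LogisticRegression(class_weight=weight, max_iter=1000)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[str(weight)] = {
        "precision": precision_score(y_test, pred, pos_label=1, zero_division=0),
        "recall": recall_score(y_test, pred, pos_label=1),
    }

for name, scores in results.items():
    print(name, scores)
```

The same loop structure works for scalers or resampling methods: replace the varied parameter and keep the remaining steps unchanged.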
Resources to Use
You will probably want to write your code in Python, since pre-written machine learning packages (scikit-learn, imbalanced-learn, etc.) are available for Python. A Jupyter notebook should be enough for you to write and run your code.
RUBRIC
Introduction
Excellent Quality (95-100%), 45-41 points: The background and significance of the problem and a clear statement of the research purpose is provided. The search history is mentioned.
Average Score (50-85%), 40-38 points: More depth/detail for the background and significance is needed, or the research detail is not clear. No search history information is provided.
Poor Quality (0-45%), 37-1 points: The background and/or significance are missing. No search history information is provided.
Literature Support
Excellent Quality (95-100%), 91-84 points: The background and significance of the problem and a clear statement of the research purpose is provided. The search history is mentioned.
Average Score (50-85%), 83-76 points: Review of relevant theoretical literature is evident, but there is little integration of studies into concepts related to problem. Review is partially focused and organized. Supporting and opposing research are included. Summary of information presented is included. Conclusion may not contain a biblical integration.
Poor Quality (0-45%), 75-1 points: Review of relevant theoretical literature is evident, but there is no integration of studies into concepts related to problem. Review is partially focused and organized. Supporting and opposing research are not included in the summary of information presented. Conclusion does not contain a biblical integration.
Methodology
Excellent Quality (95-100%), 58-53 points: Content is well-organized with headings for each slide and bulleted lists to group related material as needed. Use of font, color, graphics, effects, etc. to enhance readability and presentation content is excellent. Length requirements of 10 slides/pages or less is met.
Average Score (50-85%), 52-49 points: Content is somewhat organized, but no structure is apparent. The use of font, color, graphics, effects, etc. is occasionally detracting to the presentation content. Length requirements may not be met.
Poor Quality (0-45%), 48-1 points: There is no clear or logical organizational structure. No logical sequence is apparent. The use of font, color, graphics, effects etc. is often detracting to the presentation content. Length requirements may not be met.