R Programming Logistic/Linear Regression – 50 points (LO3)
Completion requirements
For this assignment you will write an R program to complete the tasks given below. You will hand in two files for this assignment.
- A file with your R program. This file should contain only the code (no output) and must have the standard .R extension. No other file extensions will be accepted, because the assignment will be graded on your R code and not on the output file. The output file will be used to verify the code commands. Also, please make sure that all comments, discussion, and conclusions regarding results are annotated as part of your code.
- A PDF/DOC file with your output. We are giving you more flexibility regarding how you want to present your output (tables, plots, etc.). You can either use RMD files that combine code, narrative text, and plots, or you can use a Word document with copy and paste from the R platform you are using. However, please remember that all output (tables, plots, comments, conclusions, etc.) shown in this file has to be generated by the same R code that you submit. This is important! Output that is generated using separate code, or that is not supported by the submitted code, will not be graded. Screenshots will not be accepted.
- Use the following file
Note: The HMEQ_Scrubbed.csv file is a simple scrubbed file from the previous week's homework. If you did more advanced scrubbing of the data last week, you may use your own data file instead. You might get better accuracy! If you decide to use your own version of HMEQ_Scrubbed.csv, please hand it in along with the other deliverables.
This assignment is an extension of the Week 5 assignment. We will now incorporate Regression Analysis into the problem.
Step 1: Use the Decision Tree / Random Forest / Gradient Boosting code from Week 5 as a Starting Point
In this assignment, we will build off the models developed in Week 5. Now we will add Regression to the models.
Step 2: Classification Models
- Using the code discussed in the lecture, split the data into training and testing data sets.
- Do not use TARGET_LOSS_AMT to predict TARGET_BAD_FLAG.
- Create a LOGISTIC REGRESSION model using ALL the variables to predict the variable TARGET_BAD_FLAG
- Create a LOGISTIC REGRESSION model using BACKWARD VARIABLE SELECTION.
- Create a LOGISTIC REGRESSION model using a DECISION TREE and FORWARD STEPWISE SELECTION.
- List the important variables from the Logistic Regression Variable Selections.
- Compare the variables from the Logistic Regression with those of the Random Forest and the Gradient Boosting.
- Using the testing data set, create ROC curves for all models. They must all be on the same plot.
- Display the Area Under the ROC curve (AUC) for all models.
- Determine which model performed best and why you believe this.
- Write a brief summary of which model you would recommend using. Note that this is your opinion. There is no right answer. You might, for example, select a less accurate model because it is faster or easier to interpret.
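The Step 2 workflow can be sketched roughly as follows. This is a minimal illustration and not the lecture code: it runs on synthetic data so that it is self-contained, the predictors LOAN, DEBTINC, and NOISE and the 70/30 split are assumptions, and the ROC and AUC helpers are written in base R rather than with whatever package the lecture uses. Only the TARGET_BAD_FLAG and TARGET_LOSS_AMT names come from the assignment.

```r
# Synthetic stand-in for HMEQ_Scrubbed.csv. Only the TARGET_* names come from
# the assignment; LOAN, DEBTINC, and NOISE are hypothetical predictors.
set.seed(1)
n <- 1000
df <- data.frame(
  LOAN    = runif(n, 5000, 40000),
  DEBTINC = rnorm(n, mean = 34, sd = 8),
  NOISE   = rnorm(n)
)
df$TARGET_BAD_FLAG <- rbinom(n, 1, plogis(-6 + 0.15 * df$DEBTINC))
df$TARGET_LOSS_AMT <- df$TARGET_BAD_FLAG * 0.5 * df$LOAN

# Remove TARGET_LOSS_AMT so it cannot leak into the classification models.
df_flag <- df[, names(df) != "TARGET_LOSS_AMT"]

# 70/30 train/test split (the ratio is an assumption).
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- df_flag[train_idx, ]
test  <- df_flag[-train_idx, ]

# Model 1: logistic regression on all remaining variables.
full_model <- glm(TARGET_BAD_FLAG ~ ., family = binomial, data = train)

# Model 2: backward selection by AIC, starting from the full model.
back_model <- step(full_model, direction = "backward", trace = 0)

# Model 3: forward stepwise selection, starting from the intercept-only model.
# For a decision-tree variant, `candidates` would hold only the variables
# that the tree selected.
candidates <- setdiff(names(train), "TARGET_BAD_FLAG")
upper <- reformulate(candidates, response = "TARGET_BAD_FLAG")
null_model <- glm(TARGET_BAD_FLAG ~ 1, family = binomial, data = train)
fwd_model <- step(null_model, scope = list(lower = ~ 1, upper = upper),
                  direction = "both", trace = 0)

# ROC points: sweep the threshold from the highest score to the lowest.
roc_points <- function(score, label) {
  lab <- label[order(score, decreasing = TRUE)]
  data.frame(fpr = c(0, cumsum(lab == 0) / sum(lab == 0)),
             tpr = c(0, cumsum(lab == 1) / sum(lab == 1)))
}

# AUC via the rank-sum (Mann-Whitney) identity.
auc <- function(score, label) {
  n1 <- sum(label == 1)
  n0 <- sum(label == 0)
  (sum(rank(score)[label == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

models <- list(full = full_model, backward = back_model, forward = fwd_model)
scores <- lapply(models, predict, newdata = test, type = "response")

# All ROC curves on one plot, with the diagonal as a reference line.
plot(c(0, 1), c(0, 1), type = "l", lty = 2,
     xlab = "False positive rate", ylab = "True positive rate")
for (i in seq_along(scores)) {
  pts <- roc_points(scores[[i]], test$TARGET_BAD_FLAG)
  lines(pts$fpr, pts$tpr, col = i + 1)
}
legend("bottomright", legend = names(scores),
       col = seq_along(scores) + 1, lty = 1)

auc_values <- sapply(scores, auc, label = test$TARGET_BAD_FLAG)
print(round(auc_values, 3))
```

Calling names(coef(back_model)) and names(coef(fwd_model)) lists the variables each selection kept, which is one way to approach the "important variables" item. The Week 5 tree-based models can be added to the same comparison by appending their test-set probabilities to `scores`.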
Step 3: Linear Regression
- Using the code discussed in the lecture, split the data into training and testing data sets.
- Do not use TARGET_BAD_FLAG to predict TARGET_LOSS_AMT.
- Create a LINEAR REGRESSION model using ALL the variables to predict the variable TARGET_LOSS_AMT
- Create a LINEAR REGRESSION model using BACKWARD VARIABLE SELECTION.
- Create a LINEAR REGRESSION model using a DECISION TREE and FORWARD STEPWISE SELECTION.
- List the important variables from the Linear Regression Variable Selections.
- Compare the variables from the Linear Regression with those of the Random Forest and the Gradient Boosting.
- Using the testing data set, calculate the Root Mean Square Error (RMSE) for all models.
- Determine which model performed best and why you believe this.
- Write a brief summary of which model you would recommend using. Note that this is your opinion. There is no right answer. You might, for example, select a less accurate model because it is faster or easier to interpret.
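A comparable sketch for Step 3 is shown here, again as an illustration under stated assumptions rather than the lecture code. The data are synthetic, the predictors LOAN, DEBTINC, and NOISE are hypothetical, and the 70/30 split and the `rmse` helper are choices made for this example.

```r
# Synthetic stand-in for HMEQ_Scrubbed.csv. Only the TARGET_* names come from
# the assignment; LOAN, DEBTINC, and NOISE are hypothetical predictors.
set.seed(2)
n <- 1000
df <- data.frame(
  LOAN    = runif(n, 5000, 40000),
  DEBTINC = rnorm(n, mean = 34, sd = 8),
  NOISE   = rnorm(n)
)
df$TARGET_BAD_FLAG <- rbinom(n, 1, plogis(-6 + 0.15 * df$DEBTINC))
df$TARGET_LOSS_AMT <- df$TARGET_BAD_FLAG *
  pmax(0, 0.5 * df$LOAN + rnorm(n, sd = 2000))

# Remove TARGET_BAD_FLAG so it cannot be used to predict the loss amount.
df_amt <- df[, names(df) != "TARGET_BAD_FLAG"]

# 70/30 train/test split (the ratio is an assumption).
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- df_amt[train_idx, ]
test  <- df_amt[-train_idx, ]

# Model 1: linear regression on all remaining variables.
full_lm <- lm(TARGET_LOSS_AMT ~ ., data = train)

# Model 2: backward selection by AIC, starting from the full model.
back_lm <- step(full_lm, direction = "backward", trace = 0)

# Model 3: forward stepwise selection from the intercept-only model.
# For a decision-tree variant, `candidates` would hold only the variables
# that the tree selected.
candidates <- setdiff(names(train), "TARGET_LOSS_AMT")
upper <- reformulate(candidates, response = "TARGET_LOSS_AMT")
null_lm <- lm(TARGET_LOSS_AMT ~ 1, data = train)
fwd_lm <- step(null_lm, scope = list(lower = ~ 1, upper = upper),
               direction = "both", trace = 0)

# Root Mean Square Error between actual and predicted values.
rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))

models <- list(full = full_lm, backward = back_lm, forward = fwd_lm)
rmse_values <- sapply(models, function(m) {
  rmse(test$TARGET_LOSS_AMT, predict(m, newdata = test))
})
print(round(rmse_values, 1))
```

The RMSE is computed on the testing rows only, so the values are comparable across models. RMSE values for the Week 5 tree-based models can be appended to `rmse_values` with the same helper.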
Step 4: Probability / Severity Model (Push Yourself!)
- Using the code discussed in the lecture, split the data into training and testing data sets.
- Use any LOGISTIC model from Step 2 in order to predict the variable TARGET_BAD_FLAG
- Use a LINEAR REGRESSION model to predict the variable TARGET_LOSS_AMT using only records where TARGET_BAD_FLAG is 1.
- List the important variables for both models.
- Using your models, predict the probability of default and the loss given default.
- Multiply the two values together for each record.
- Calculate the RMSE value for the Probability / Severity model.
- Comment on how this model compares to using the model from Step 3. Which one would you recommend using?
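The Probability / Severity calculation described above can be sketched as follows. This is an illustrative example on synthetic data, not a reference solution: the predictors LOAN, DEBTINC, and NOISE, the 70/30 split, and the object names are all assumptions. A direct linear model is fitted alongside it only so that the two RMSE values can be printed side by side.

```r
# Synthetic stand-in for HMEQ_Scrubbed.csv. Only the TARGET_* names come from
# the assignment; LOAN, DEBTINC, and NOISE are hypothetical predictors.
set.seed(3)
n <- 1000
df <- data.frame(
  LOAN    = runif(n, 5000, 40000),
  DEBTINC = rnorm(n, mean = 34, sd = 8),
  NOISE   = rnorm(n)
)
df$TARGET_BAD_FLAG <- rbinom(n, 1, plogis(-6 + 0.15 * df$DEBTINC))
df$TARGET_LOSS_AMT <- df$TARGET_BAD_FLAG *
  pmax(0, 0.5 * df$LOAN + rnorm(n, sd = 2000))

# 70/30 train/test split (the ratio is an assumption).
train_idx <- sample(seq_len(n), size = floor(0.7 * n))
train <- df[train_idx, ]
test  <- df[-train_idx, ]

predictors <- c("LOAN", "DEBTINC", "NOISE")
flag_formula <- reformulate(predictors, response = "TARGET_BAD_FLAG")
loss_formula <- reformulate(predictors, response = "TARGET_LOSS_AMT")

# Probability model: logistic regression fitted on all training records.
prob_model <- glm(flag_formula, family = binomial, data = train)

# Severity model: linear regression fitted only where TARGET_BAD_FLAG is 1.
train_bad <- train[train$TARGET_BAD_FLAG == 1, ]
sev_model <- lm(loss_formula, data = train_bad)

# Predict both quantities for every test record and multiply them.
p_default          <- predict(prob_model, newdata = test, type = "response")
loss_given_default <- predict(sev_model, newdata = test)
expected_loss      <- p_default * loss_given_default

rmse <- function(actual, predicted) sqrt(mean((actual - predicted)^2))
rmse_prob_sev <- rmse(test$TARGET_LOSS_AMT, expected_loss)

# Step 3 style comparison: one linear model fitted on all training records.
direct_model <- lm(loss_formula, data = train)
rmse_direct  <- rmse(test$TARGET_LOSS_AMT, predict(direct_model, newdata = test))

print(c(prob_severity = rmse_prob_sev, direct = rmse_direct))
```

Both RMSE values are measured against the actual TARGET_LOSS_AMT on the same test records, which keeps the comparison with Step 3 like for like.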
Essential Activities:
- Watch all the training videos
- Execute the example code while watching the training videos.
Notes:
- This assignment is due Sunday at 11:59 PM EST
February 4 2026, 3:13 PM