Python Question

Assignment Description

Please note: this is an individual assignment.

For the final course project, students should create a single replicable notebook that builds an ML model to predict house prices in the test dataset (Ames_test.csv), training the model on the training dataset (Ames_train.csv). To earn points, students must complete the tasks listed below and explain them in the comments. You can reuse your own work from the activities and the Week 2 assignment.

Steps to Follow

Follow these steps to complete the assignment:

1. Train-test separation and any other subset needed (encoding, validation)
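A minimal sketch of step 1, assuming scikit-learn is available. A small synthetic frame stands in for Ames_train.csv so the snippet runs on its own; the column names (GrLivArea, Neighborhood, SalePrice) and the 80/20 split are assumptions, not requirements of the assignment.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic stand-in for pd.read_csv("Ames_train.csv"); column names are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "GrLivArea": rng.integers(600, 3000, size=200),
    "Neighborhood": rng.choice(["NAmes", "OldTown", "Edwards"], size=200),
})
df["SalePrice"] = 50_000 + 100 * df["GrLivArea"] + rng.normal(0, 10_000, size=200)

# Hold out 20% of the training rows as a validation subset for tuning.
# Ames_test.csv stays untouched until the final evaluation.
train_df, valid_df = train_test_split(df, test_size=0.2, random_state=42)

print(len(train_df), len(valid_df))
```

Fixing random_state keeps the split reproducible, which matters for a notebook that is meant to be replicable. A further subset for fitting the target encoding can be carved out of train_df in the same way.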

2. Variable transformation in both train and test:

  • Apply one-hot encoding to at least one variable
  • Create a new variable from scratch and explain why you think it can be useful
  • Apply target encoding to at least one variable
  • Correctly apply one of the algorithms seen in the course
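The three transformations in step 2 might look like the sketch below. The tiny frames and the column names (BldgType, Neighborhood, YearBuilt, YrSold, SalePrice) are hypothetical stand-ins for the Ames files, and add_features is an illustrative helper, not part of any library. The key idea is that every encoding is learned from the training rows only and then applied unchanged to the test rows.

```python
import pandas as pd

# Tiny hypothetical frames standing in for the train and test data.
train = pd.DataFrame({
    "Neighborhood": ["NAmes", "NAmes", "OldTown", "Edwards"],
    "BldgType": ["1Fam", "Duplex", "1Fam", "1Fam"],
    "YearBuilt": [1960, 1975, 1920, 2001],
    "YrSold": [2008, 2009, 2008, 2010],
    "SalePrice": [150_000, 170_000, 110_000, 210_000],
})
test = pd.DataFrame({
    "Neighborhood": ["OldTown", "Gilbert"],  # "Gilbert" never appears in train
    "BldgType": ["1Fam", "TwnhsE"],          # "TwnhsE" never appears in train
    "YearBuilt": [1930, 2005],
    "YrSold": [2009, 2010],
})

# Everything below is learned from the training rows only, to avoid leakage.
onehot_columns = pd.get_dummies(train["BldgType"], prefix="BldgType", dtype=int).columns
neighborhood_means = train.groupby("Neighborhood")["SalePrice"].mean()
global_mean = train["SalePrice"].mean()


def add_features(df):
    """Apply the same transformations to any subset (train, validation, test)."""
    out = df.copy()
    # New variable: age of the house at the time of sale.
    out["HouseAge"] = out["YrSold"] - out["YearBuilt"]
    # Target encoding: mean training price per neighborhood;
    # levels unseen in train fall back to the global training mean.
    out["Neighborhood_te"] = out["Neighborhood"].map(neighborhood_means).fillna(global_mean)
    # One-hot encoding, aligned to the columns seen in train;
    # levels unseen in train are dropped, missing levels become 0.
    dummies = pd.get_dummies(out["BldgType"], prefix="BldgType", dtype=int)
    dummies = dummies.reindex(columns=onehot_columns, fill_value=0)
    return pd.concat([out.drop(columns=["Neighborhood", "BldgType"]), dummies], axis=1)


train_X = add_features(train)
test_X = add_features(test)
print(test_X)
```

Note that the target-encoded value of each training row here includes that row's own price. In a real notebook, fitting the encoding on a separate encoding subset (or out-of-fold) reduces that leakage, which is presumably why step 1 mentions an encoding subset.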

3. Choose at least two parameters to tune:

  • Explain the intuition: How is the algorithm affected by the parameter being higher or lower?
  • Explain how you chose the final values for that parameter
  • Explain the error metric used for performance assessment and the values you obtained in the different subsets
  • Apply your model to the test dataset and measure your error
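One way to approach step 3 is a small grid search scored on a validation subset. The sketch below uses a decision tree purely as a stand-in for whichever course algorithm you pick, synthetic data instead of the Ames files, and RMSE as the error metric; all of these are assumptions. The two tuned parameters are max_depth (higher values allow a more complex tree, lowering training error but risking overfitting) and min_samples_leaf (higher values force larger leaves and smoother predictions).

```python
import itertools

import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the transformed Ames features.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(300, 3))
y = 20 * X[:, 0] + 5 * X[:, 1] ** 2 + rng.normal(0, 10, size=300)

X_train, X_valid, y_train, y_valid = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return float(np.sqrt(mean_squared_error(y_true, y_pred)))


# Try every combination of the two parameters and record both errors.
results = []
for max_depth, min_samples_leaf in itertools.product([2, 4, 8, None], [1, 5, 20]):
    model = DecisionTreeRegressor(
        max_depth=max_depth, min_samples_leaf=min_samples_leaf, random_state=0
    )
    model.fit(X_train, y_train)
    results.append({
        "max_depth": max_depth,
        "min_samples_leaf": min_samples_leaf,
        "train_rmse": rmse(y_train, model.predict(X_train)),
        "valid_rmse": rmse(y_valid, model.predict(X_valid)),
    })

# Pick the combination with the lowest validation error, not the lowest training error.
best = min(results, key=lambda r: r["valid_rmse"])
print(best)
```

Comparing train_rmse with valid_rmse for each row shows the overfitting pattern directly: the unconstrained tree fits the training rows almost perfectly but does worse on validation. Once the final values are chosen, the model can be refit and scored a single time on the test dataset.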

4. Identify the top five most important variables in your model and provide your own interpretation:

  • Why are they important, and how do they impact your prediction? (Is it intuitive or counterintuitive? Is the predicted value higher or lower when this variable's value increases?)
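For step 4, tree-based models in scikit-learn expose a feature_importances_ attribute that can be ranked directly. The sketch below assumes a random forest and synthetic data with hypothetical Ames-style column names. Because an importance score has no sign, it also computes the correlation between each top feature and the model's predictions as a rough indicator of direction.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic data with hypothetical column names; only the first three drive the price.
rng = np.random.default_rng(0)
n = 400
X = pd.DataFrame({
    "GrLivArea": rng.uniform(600, 3000, n),
    "OverallQual": rng.integers(1, 11, n),
    "HouseAge": rng.uniform(0, 100, n),
    "GarageCars": rng.integers(0, 4, n),
    "LotArea": rng.uniform(2000, 20000, n),
    "Noise1": rng.normal(size=n),
    "Noise2": rng.normal(size=n),
})
y = (
    80 * X["GrLivArea"]
    + 15_000 * X["OverallQual"]
    - 800 * X["HouseAge"]
    + rng.normal(0, 5_000, n)
)

model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)

# Impurity-based importances sum to 1; keep the five largest.
importances = pd.Series(model.feature_importances_, index=X.columns)
top5 = importances.sort_values(ascending=False).head(5)

# Rough direction of effect: correlation of each top feature with the predictions.
preds = pd.Series(model.predict(X), index=X.index)
direction = X[top5.index].corrwith(preds)

print(pd.DataFrame({"importance": top5, "corr_with_prediction": direction}))
```

A positive correlation suggests the predicted price rises with the variable, a negative one that it falls. Impurity-based importances can favor high-cardinality features, so sklearn.inspection.permutation_importance on a validation subset is a reasonable cross-check before writing the interpretation.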

5. If possible, add some external data to solve this problem: what other factors not included in this dataset can help to predict our target?

Submission

The assignment must be delivered in a Python notebook.

Rubric

Assignment A3.1 Rubric

Criteria, ratings, and points are listed below.

Data Preparation (20 pts)

Train-test separation and any other subset needed (encoding, validation).

  • Full Points (20 pts): The student correctly separates the data into training and testing sets and any other subsets needed. The process is clearly explained in the comments.
  • Partial Points (10 pts): The student separates the data, but the process is not clearly explained or there are minor errors.
  • No Points (0 pts): The student does not separate the data correctly or at all.

Variable Transformation (20 pts)

Apply one-hot encoding to at least one variable.

  • Full Points (20 pts): The student correctly applies one-hot encoding to at least one variable and explains the process clearly in the comments.
  • Partial Points (10 pts): The student applies one-hot encoding, but the process is not clearly explained or there are minor errors.
  • No Points (0 pts): The student does not apply one-hot encoding correctly or at all.

Algorithm Application (20 pts)

Choose at least two parameters to tune.

  • Full Points (20 pts): The student correctly chooses at least two parameters to tune and explains the intuition behind their choice and how the algorithm is affected by the parameter being higher or lower.
  • Partial Points (10 pts): The student chooses parameters to tune, but the explanation is not clear or there are minor errors.
  • No Points (0 pts): The student does not choose parameters to tune or does not provide an explanation.

Model Evaluation (20 pts)

Explain the error metric used for performance assessment and the values you obtained in the different subsets.

  • Full Points (20 pts): The student correctly explains the error metric used, provides the values obtained in the different subsets, and applies the model to the test dataset.
  • Partial Points (10 pts): The student explains the error metric and provides the values, but there are minor errors or the explanation is not clear.
  • No Points (0 pts): The student does not explain the error metric or does not provide the values.

Interpretation and Improvement (20 pts)

Identify the top five most important variables in your model and provide your own interpretation of why those are important and how they impact your prediction.

  • Full Points (20 pts): The student correctly identifies the top five important variables and provides a clear interpretation of why they are important and how they impact the prediction.
  • Partial Points (10 pts): The student identifies the important variables and provides an interpretation, but there are minor errors or the explanation is not clear.
  • No Points (0 pts): The student does not identify the important variables or does not provide an interpretation.

Total Points: 100
