Python Question

Please note: this is an individual assignment.

Assignment Description

This week’s assignment involves a feature engineering challenge comprising three distinct components. Your task is to develop a unified and reproducible notebook that addresses all three parts.

Steps to Follow

Follow these steps to carry out the assignment:

Part 1: Using the train dataset, you must create at least three new variables:

  • a non-linear mathematical transformation
  • a creative new variable based on business knowledge
  • a target encoding transformation

Then, you must be able to replicate that same feature engineering on the test dataset. You can reuse your work from Activity 5 (Creating New Variables) if it meets the requirements.
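As a rough illustration of Part 1, the sketch below builds the three kinds of variables on a training set and then replays them on a test set. The toy housing data, the column names (sqft, bedrooms, neighborhood, price), and the helpers fit_target_encoding and add_features are all hypothetical and are not part of the assignment. The target encoding shown is the plain per-category mean learned from train only; smoothed or out-of-fold variants are common alternatives when leakage is a concern.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: every column name here is an illustrative assumption.
train = pd.DataFrame({
    "sqft": [800, 1200, 1500, 2000, 950, 1700],
    "bedrooms": [2, 3, 3, 4, 2, 3],
    "neighborhood": ["A", "B", "A", "C", "B", "C"],
    "price": [150_000, 210_000, 260_000, 400_000, 170_000, 330_000],
})
test = pd.DataFrame({
    "sqft": [1000, 1800],
    "bedrooms": [2, 4],
    "neighborhood": ["A", "D"],  # "D" never appears in train
})


def fit_target_encoding(df, column, target):
    """Learn per-category target means and the global mean from train only."""
    return df.groupby(column)[target].mean(), df[target].mean()


def add_features(df, category_means, global_mean):
    """Apply the same three transformations to any dataset (train or test)."""
    out = df.copy()
    # 1) Non-linear mathematical transformation.
    out["log_sqft"] = np.log1p(out["sqft"])
    # 2) Business-knowledge variable: living space available per bedroom.
    out["sqft_per_bedroom"] = out["sqft"] / out["bedrooms"]
    # 3) Target encoding: categories unseen in train fall back to the global mean.
    out["neighborhood_te"] = (
        out["neighborhood"].map(category_means).fillna(global_mean)
    )
    return out


# Fit on train, then reuse the learned values for both datasets.
category_means, global_mean = fit_target_encoding(train, "neighborhood", "price")
train_fe = add_features(train, category_means, global_mean)
test_fe = add_features(test, category_means, global_mean)
```

Keeping the learned values (category_means, global_mean) separate from the function that applies them is one way to make sure the test set never contributes to the encoding.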

Part 2: Include your newly engineered features in your ML model and use variable importance to check whether they are helping you to predict better. Include a small paragraph in the notebook comments with your conclusion.
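One possible way to approach Part 2 is sketched below, assuming scikit-learn is available and that a random forest is an acceptable model (the assignment does not prescribe one). The synthetic data, the column names, and the SEED value are made up for illustration. The ranking uses the forest's impurity-based feature_importances_.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

SEED = 42  # arbitrary value, fixed so the ranking is the same on every run
rng = np.random.default_rng(SEED)

# Hypothetical synthetic data: the target depends mostly on sqft.
n = 300
X = pd.DataFrame({
    "sqft": rng.uniform(500, 3000, n),
    "bedrooms": rng.integers(1, 6, n),  # integers from 1 to 5
    "noise": rng.normal(size=n),        # irrelevant column, for contrast
})
y = 100 * X["sqft"] + rng.normal(0, 5_000, n)

# Engineered features (same hypothetical names as a Part 1 style pipeline).
X["log_sqft"] = np.log1p(X["sqft"])
X["sqft_per_bedroom"] = X["sqft"] / X["bedrooms"]

model = RandomForestRegressor(n_estimators=100, random_state=SEED)
model.fit(X, y)

# Rank all variables, original and engineered, by importance.
importances = (
    pd.Series(model.feature_importances_, index=X.columns)
    .sort_values(ascending=False)
)
top_four = importances.head(4)
print(importances)
```

Impurity-based importances tend to split credit between correlated columns (here sqft and log_sqft carry the same ordering information), so a low score for an engineered feature does not by itself mean it is useless. Permutation importance on a held-out set, or comparing validation scores with and without the new features, are alternative checks.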

Part 3: Take your top four variables by importance and compare them to your business understanding. (For example, if my model finds Number of bathrooms to be important, I could say: Based on my (real estate) business knowledge, it makes sense that the more bathrooms a house has, the more expensive it is.)

I recommend including lots of comments since you will be able to reuse most of this for your course final project. And remember to make it replicable: every time you (or anyone else: a professor, a colleague, etc.) run your notebook, you must get the same results.
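For the replicability requirement, a common pattern is to fix every source of randomness near the top of the notebook. The sketch below is one minimal way to do that, assuming the notebook uses Python's random module and NumPy; the SEED value and the set_seeds helper are hypothetical names chosen for illustration.

```python
import random

import numpy as np

SEED = 42  # arbitrary; any fixed integer works


def set_seeds(seed=SEED):
    """Reset the random number generators this notebook relies on."""
    random.seed(seed)
    np.random.seed(seed)


# Drawing after each reset yields the same values both times.
set_seeds()
first = (random.random(), np.random.rand())
set_seeds()
second = (random.random(), np.random.rand())
print(first == second)
```

Libraries with their own randomness usually need the seed passed explicitly as well, for example random_state=SEED in scikit-learn estimators and in train_test_split.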

Submission

The assignment must be delivered as a Python notebook.

Rubric

Assignment A2.1 Rubric

Criteria, ratings, and points

Data Understanding and Preparation (20 pts)
Understanding the dataset and preparing it for feature engineering

  • Full Points (20 pts): The student demonstrates a clear understanding of the dataset and prepares it correctly for feature engineering.
  • Partial Points (10 pts): The student demonstrates some understanding of the dataset and makes some preparation for feature engineering.
  • No Points (0 pts): The student does not demonstrate an understanding of the dataset or does not prepare it for feature engineering.

Feature Engineering – Part 1 (20 pts)
Creating at least three new variables (a non-linear mathematical transformation, a business-knowledge creative new variable, and a target encoding transformation)

  • Full Points (20 pts): The student correctly creates all three new variables and explains their reasoning in the comments.
  • Partial Points (10 pts): The student creates some of the new variables, but their reasoning is not clear or there are minor errors.
  • No Points (0 pts): The student does not create the new variables or does not provide reasoning in the comments.

Feature Engineering – Part 2 (20 pts)
Including the newly engineered features in the ML model and using variable importance to check their effectiveness

  • Full Points (20 pts): The student correctly includes the new features in the model, checks their importance, and provides a clear conclusion in the comments.
  • Partial Points (10 pts): The student includes the new features and checks their importance, but their conclusion is not clear or there are minor errors.
  • No Points (0 pts): The student does not include the new features in the model or does not check their importance.

Model Interpretation (20 pts)
Taking the top four variables by importance and comparing them to business understanding

  • Full Points (20 pts): The student correctly identifies the top four variables, provides a clear comparison to their business understanding, and explains their reasoning in the comments.
  • Partial Points (10 pts): The student identifies the top four variables and provides some comparison to their business understanding, but their reasoning is not clear or there are minor errors.
  • No Points (0 pts): The student does not identify the top four variables or does not provide a comparison to their business understanding.

Replicability and Documentation (20 pts)
Making the notebook replicable and including lots of comments

  • Full Points (20 pts): The notebook is fully replicable, and the student includes clear and helpful comments throughout.
  • Partial Points (10 pts): The notebook is somewhat replicable, and the student includes some comments, but they are not always clear or helpful.
  • No Points (0 pts): The notebook is not replicable, or the student does not include comments.

Total Points: 100
