Please note: this is an individual assignment.
Assignment Description
This week’s assignment involves a feature engineering challenge comprising three distinct components. Your task is to develop a unified and reproducible notebook that addresses all three parts.
Steps to Follow
Follow these steps to carry out the assignment:
Part 1: Using the train dataset, you must create at least three new variables:
- a non-linear mathematical transformation
- a business-knowledge, creative new variable
- a target encoding transformation
Then, you must be able to replicate that feature engineering on the test dataset. You can reuse your work from Activity 5 (Creating New Variables) if it meets the requirements.
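As an illustration of Part 1, here is a minimal sketch of the three kinds of variables, fitted on the train dataset and then replayed on the test dataset. The toy data, the column names (`area`, `rooms`, `neighborhood`, `price`), the helper functions, and the smoothing value are all assumptions made up for this example; they are not part of the assignment data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are illustrative only.
train = pd.DataFrame({
    "area": [50.0, 80.0, 120.0, 200.0, 65.0, 150.0],
    "rooms": [2, 3, 4, 6, 2, 5],
    "neighborhood": ["A", "B", "A", "C", "B", "A"],
    "price": [100.0, 150.0, 210.0, 400.0, 130.0, 260.0],
})
test = pd.DataFrame({
    "area": [70.0, 90.0],
    "rooms": [3, 3],
    "neighborhood": ["A", "D"],  # "D" never appears in train
})


def fit_target_encoding(df, column, target, smoothing=5.0):
    """Learn smoothed per-category target means from the training data only."""
    global_mean = df[target].mean()
    stats = df.groupby(column)[target].agg(["mean", "count"])
    smoothed = (stats["count"] * stats["mean"] + smoothing * global_mean) / (
        stats["count"] + smoothing
    )
    return smoothed.to_dict(), global_mean


def add_features(df, encoding, global_mean):
    """Apply the same three transformations to any dataset (train or test)."""
    out = df.copy()
    # 1. Non-linear mathematical transformation.
    out["log_area"] = np.log1p(out["area"])
    # 2. Business-knowledge variable: average space per room.
    out["area_per_room"] = out["area"] / out["rooms"]
    # 3. Target encoding; categories unseen in train fall back to the global mean.
    out["neighborhood_te"] = out["neighborhood"].map(encoding).fillna(global_mean)
    return out


# Fit on train only, then apply the identical steps to both datasets.
encoding, global_mean = fit_target_encoding(train, "neighborhood", "price")
train_fe = add_features(train, encoding, global_mean)
test_fe = add_features(test, encoding, global_mean)
print(test_fe)
```

Keeping the fitted values (`encoding`, `global_mean`) separate from the function that applies them is one way to make sure the test dataset never uses its own target. Note that encoding the train rows with means computed from those same rows can leak target information; an out-of-fold encoding is a common alternative.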
Part 2: Include your newly engineered features in your ML model and use variable importance to check whether they are helping you to predict better. Include a small paragraph in the notebook comments with your conclusion.
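One possible way to approach Part 2 is sketched below, assuming a scikit-learn random forest and its impurity-based `feature_importances_`. The synthetic data, the column names, and the choice of model are assumptions for illustration only; your own model and dataset will differ.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic, hypothetical data generated with a fixed seed.
rng = np.random.default_rng(42)
n = 300
X = pd.DataFrame({
    "area": rng.uniform(40, 250, n),
    "rooms": rng.integers(1, 7, n),
    "noise": rng.normal(size=n),  # a column with no relation to the target
})
X["log_area"] = np.log1p(X["area"])  # engineered feature added to the model
y = 2.0 * X["area"] + 10.0 * X["rooms"] + rng.normal(scale=5.0, size=n)

# random_state makes the fitted forest identical on every run.
model = RandomForestRegressor(n_estimators=200, random_state=42)
model.fit(X, y)

# Rank all features, original and engineered, by importance.
importance = pd.Series(model.feature_importances_, index=X.columns)
importance = importance.sort_values(ascending=False)
print(importance)
```

When an engineered feature is strongly correlated with an original one (as `log_area` is with `area` here), impurity-based importance tends to be shared between them, so a modest score for the new feature does not by itself mean it is useless. Comparing a validation metric with and without the new features is a useful complementary check.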
Part 3: Take your top four variables by importance and compare them to your business understanding. (For example, if in my model I find Number of bathrooms to be important, I could say: Based on my (real estate) business knowledge, it makes sense that the more bathrooms a house has, the more expensive it is.)
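For Part 3, selecting the top four variables can be as simple as the sketch below. The importance scores and feature names are invented placeholders, not results from any real model; in your notebook they would come from your own fitted model.

```python
import pandas as pd

# Hypothetical importance scores, made up purely for illustration.
importance = pd.Series({
    "area": 0.41,
    "neighborhood_te": 0.22,
    "bathrooms": 0.15,
    "area_per_room": 0.10,
    "year_built": 0.07,
    "noise": 0.05,
})

# Keep the four largest scores, in descending order.
top_four = importance.nlargest(4)

for name, score in top_four.items():
    # Write your business interpretation of each variable in a comment or markdown cell.
    print(f"{name}: {score:.2f}")
```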
I recommend including lots of comments since you will be able to reuse most of this for your course final project. And remember to make it replicable: every time you (or anyone else: a professor, a colleague, etc.) run your notebook, you must get the same results.
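Replicability usually comes down to fixing every source of randomness. The sketch below assumes NumPy is the only random source; the `SEED` constant and the `run_experiment` function are hypothetical names used for illustration.

```python
import random

import numpy as np

SEED = 42  # a single, documented seed reused everywhere in the notebook


def run_experiment(seed):
    """Toy stand-in for a notebook run: a random split followed by a statistic."""
    random.seed(seed)                   # Python's built-in generator
    rng = np.random.default_rng(seed)   # NumPy generator local to this run
    data = rng.normal(size=100)
    shuffled = rng.permutation(100)     # e.g. a random train/validation split
    train_idx = shuffled[:80]
    return data[train_idx].mean()


first = run_experiment(SEED)
second = run_experiment(SEED)
print(first == second)  # the two runs give identical results
```

Libraries with their own randomness need the seed passed explicitly as well, for example through the `random_state` argument that scikit-learn estimators and splitters accept.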
Submission
The assignment must be delivered as a Python notebook.
Rubric
Assignment A2.1 Rubric
| Criteria | Ratings | Pts |
|---|---|---|
| Data Understanding and Preparation: Understanding the dataset and preparing it for feature engineering | Full Points (20 pts): The student demonstrates a clear understanding of the dataset and prepares it correctly for feature engineering. Partial Points (10 pts): The student demonstrates some understanding of the dataset and makes some preparation for feature engineering. No Points (0 pts): The student does not demonstrate an understanding of the dataset or does not prepare it for feature engineering. | 20 pts |
| Feature Engineering – Part 1: Creating at least three new variables (a non-linear mathematical transformation, a business-knowledge creative new variable, and a target encoding transformation) | Full Points (20 pts): The student correctly creates all three new variables and explains their reasoning in the comments. Partial Points (10 pts): The student creates some of the new variables, but their reasoning is not clear or there are minor errors. No Points (0 pts): The student does not create the new variables or does not provide reasoning in the comments. | 20 pts |
| Feature Engineering – Part 2: Including the newly engineered features in the ML model and using variable importance to check their effectiveness | Full Points (20 pts): The student correctly includes the new features in the model, checks their importance, and provides a clear conclusion in the comments. Partial Points (10 pts): The student includes the new features and checks their importance, but their conclusion is not clear or there are minor errors. No Points (0 pts): The student does not include the new features in the model or does not check their importance. | 20 pts |
| Model Interpretation: Taking the top four variables by importance and comparing them to business understanding | Full Points (20 pts): The student correctly identifies the top four variables, provides a clear comparison to their business understanding, and explains their reasoning in the comments. Partial Points (10 pts): The student identifies the top four variables and provides some comparison to their business understanding, but their reasoning is not clear or there are minor errors. No Points (0 pts): The student does not identify the top four variables or does not provide a comparison to their business understanding. | 20 pts |
| Replicability and Documentation: Making the notebook replicable and including lots of comments | Full Points (20 pts): The notebook is fully replicable, and the student includes clear and helpful comments throughout. Partial Points (10 pts): The notebook is somewhat replicable, and the student includes some comments, but they are not always clear or helpful. No Points (0 pts): The notebook is not replicable, or the student does not include comments. | 20 pts |
Total Points: 100