Principal Component Analysis
TASK
Your task is to conduct PCA on the numeric features of the MLData2026 dataset,
demonstrating the process and interpreting the results in a video presentation.
This assessment is due before midnight on Friday 17 April 2026 (Week 7).
Perform PCA and Visualise Data
(i) Download the Assignment 2.ipynb file, along with MLData2026.csv from the
Assignment 1 folder on Canvas. To help you begin your assignment, the Assignment 2
Google Colab file contains some starter code (the same as for Assignment 1) to:
1. Mount your drive
2. Import the relevant packages. You may need to add more packages as your
assignment progresses.
3. Upload MLData2026.csv to your Google Colab storage folder, then read it into a
DataFrame. Consider this the Master data set.
4. Randomly select 600 sub-samples from the Master data set. Make sure to use
your Student ID to set the random seed. This means that every student should
have their own unique set of sub-samples, i.e. mydata, to work on.
You are required to perform PCA on the relevant features in mydata using Python on
Google Colab and report your findings.
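The sub-sampling in step 4 can be sketched as follows. This is a minimal illustration rather than the starter code itself: the helper name draw_subsample, the stand-in Master data set and the STUDENT_ID value are all assumptions made for the example. In Colab, the Master data set would come from reading the uploaded MLData2026.csv.

```python
import numpy as np
import pandas as pd


def draw_subsample(master: pd.DataFrame, student_id: int, n: int = 600) -> pd.DataFrame:
    """Return n rows drawn without replacement, seeded by the student ID."""
    return master.sample(n=n, random_state=student_id)


# Stand-in for the Master data set (hypothetical); in Colab this would be
# the DataFrame read from the uploaded MLData2026.csv.
master = pd.DataFrame({"x": np.arange(1000)})

STUDENT_ID = 10123456  # hypothetical ID, replace with your own
mydata = draw_subsample(master, STUDENT_ID)
print(mydata.shape)
```

Because the seed is fixed, re-running the cell returns the same 600 rows, which keeps the results shown in the video consistent with the submitted code.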
(ii) Extract only the numeric features and the Class feature from mydata and store them
in a DataFrame. Refer to Assignment 1 for the feature description if needed.
(iii) Clean the extracted data based on the feedback received from Assignment 1.
(iv) Remove the incomplete cases, e.g. using dropna(), to make it usable for PCA.
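Parts (ii) and (iv) can be sketched as below. The column names and Class labels in the toy DataFrame are hypothetical, as is the helper extract_numeric_with_class; the real feature names are described in Assignment 1. The cleaning in part (iii) depends on your own Assignment 1 feedback, so it is not shown.

```python
import numpy as np
import pandas as pd


def extract_numeric_with_class(df: pd.DataFrame, class_col: str = "Class") -> pd.DataFrame:
    """Keep the numeric columns plus the class column, in that order."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    numeric_cols = [c for c in numeric_cols if c != class_col]
    return df[numeric_cols + [class_col]].copy()


# Toy stand-in for mydata; all names and values are hypothetical.
toy = pd.DataFrame({
    "bytes_sent": [120.0, np.nan, 310.0, 95.0],
    "duration": [1.2, 0.4, 3.3, 2.1],
    "protocol": ["tcp", "udp", "tcp", "tcp"],
    "Class": ["Malicious", "Non-malicious", "Malicious", "Non-malicious"],
})

extracted = extract_numeric_with_class(toy)  # part (ii)
complete = extracted.dropna()                # part (iv): drop incomplete cases
print(complete)
```

Note that select_dtypes picks up every column stored as a number, so a categorical feature coded as integers would also be selected. Check the result against the Assignment 1 feature description.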
(v) Perform PCA in Python using Google Colab, but only on the numeric features (i.e.
ignore Class in this step).
– Explain why you believe the data should or should not be scaled, i.e.
standardised, when performing PCA.
– Display and describe the individual and cumulative proportions of variance (3
decimal places) explained by each of the principal components.
– Outline how many principal components are adequate to explain at least 50%
of the variability in your data.
Assignment 2 – Video Presentation (35%)
ECU Internal Information
– Display and interpret the coefficients (or loadings) to 3 decimal places for PC1,
PC2 and PC3. Describe which features (based on the loadings) are the key
drivers for each of these three principal components.
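One way to produce the outputs listed in part (v) is sketched below using scikit-learn, which is an assumption since the brief does not name a package. The data are synthetic, and the feature names f1 to f4 are hypothetical. The features are standardised here because f2 sits on a much larger scale than the others. Whether to scale your own data is the question you are asked to argue.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned numeric features (hypothetical names).
rng = np.random.default_rng(0)
n = 200
base = rng.normal(size=n)
X = pd.DataFrame({
    "f1": base + rng.normal(scale=0.3, size=n),
    "f2": 1000 * (base + rng.normal(scale=0.3, size=n)),  # much larger scale
    "f3": rng.normal(size=n),
    "f4": rng.normal(size=n),
})

# Standardise each feature to mean 0 and unit variance, then fit PCA.
Z = StandardScaler().fit_transform(X)
pca = PCA().fit(Z)

pc_names = [f"PC{i + 1}" for i in range(pca.n_components_)]
cumulative = np.cumsum(pca.explained_variance_ratio_)

# Individual and cumulative proportions of variance, to 3 decimal places.
variance_table = pd.DataFrame(
    {"proportion": pca.explained_variance_ratio_, "cumulative": cumulative},
    index=pc_names,
).round(3)

# Smallest number of components whose cumulative proportion reaches 50%.
n_for_50 = int(np.argmax(cumulative >= 0.5)) + 1

# Coefficients (unit-length eigenvectors): rows are features, columns are PCs.
loadings = pd.DataFrame(pca.components_.T, index=X.columns, columns=pc_names).round(3)

print(variance_table)
print("Components needed for at least 50%:", n_for_50)
print(loadings[["PC1", "PC2", "PC3"]])
```

Here the loadings are the eigenvector coefficients stored in components_, matching the brief's "coefficients (or loadings)". Some texts instead scale them by the square root of the eigenvalue, so state which convention you use.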
(vi) Create and display the biplot for PC1 vs. PC2 to visualise the PCA results in the first
two dimensions. Colour-code the points based on the Class feature. Explain the biplot
by commenting on the PCA plot and the loadings plot individually, and then both
plots combined (see Slides 28-29 of Module 3 notes). Finally, comment on and justify
which (if any) features can help distinguish malicious activity.
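A biplot for part (vi) could be drawn along the following lines with matplotlib and scikit-learn (both assumed). The biplot helper, the synthetic data, the feature names and the Class labels are all hypothetical. The loading arrows are stretched by an arbitrary factor so that they are visible on the same axes as the scores; only their directions and relative lengths are meaningful.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler


def biplot(scores, components, labels, feature_names, ax=None):
    """Scatter PC1 vs PC2 scores coloured by class, with loading arrows on top."""
    if ax is None:
        _, ax = plt.subplots(figsize=(7, 6))
    labels = np.asarray(labels)
    for cls in np.unique(labels):
        mask = labels == cls
        ax.scatter(scores[mask, 0], scores[mask, 1], s=12, alpha=0.6, label=str(cls))
    # Stretch the unit-length loading vectors to the scale of the scores.
    scale = np.abs(scores[:, :2]).max()
    for j, name in enumerate(feature_names):
        x, y = components[0, j] * scale, components[1, j] * scale
        ax.arrow(0, 0, x, y, color="black", head_width=0.03 * scale,
                 length_includes_head=True)
        ax.text(x * 1.08, y * 1.08, name)
    ax.axhline(0, color="grey", linewidth=0.5)
    ax.axvline(0, color="grey", linewidth=0.5)
    ax.set_xlabel("PC1")
    ax.set_ylabel("PC2")
    ax.legend(title="Class")
    return ax


# Synthetic two-class data (hypothetical feature names and labels).
rng = np.random.default_rng(1)
labels = np.repeat(["Malicious", "Non-malicious"], 100)
shift = np.where(labels == "Malicious", 2.0, 0.0)
X = np.column_stack([
    shift + rng.normal(size=200),
    shift + rng.normal(size=200),
    rng.normal(size=200),
])
feature_names = ["f1", "f2", "f3"]

pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(X))
ax = biplot(scores, pca.components_, labels, feature_names)
```

In this toy example the arrows for f1 and f2 point along the direction that separates the two colours, which is the kind of evidence the brief asks you to comment on.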
(vii) Based on the results from parts (v) and (vi), describe
– whether PC1 or PC2 (choose one) best assists in classifying malicious activity.
Hint: Project all points in the PCA plot onto the PC1 axis (i.e. consider the PC1
scores only) and assess whether there is a clear separation between malicious
and non-malicious samples. Then, project onto the PC2 axis (i.e. consider the
PC2 scores only) and evaluate whether the separation is better than in PC1.
– the key features in this dimension that can drive this process (Hint: based on
your decision above, examine the loadings from part (v) of your chosen PC and
choose those whose absolute loading (i.e. disregard the sign) is greater than
0.3).
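The two hints in part (vii) can be supported numerically, as sketched below. The separation index used here (absolute difference in class means divided by the pooled standard deviation) is an assumption added for illustration and is not prescribed by the brief; a visual check of the projected scores remains the primary evidence. The scores, labels and loadings are synthetic, and both helper names are hypothetical.

```python
import numpy as np
import pandas as pd


def separation_index(scores: pd.Series, labels: pd.Series) -> float:
    """Absolute difference in class means divided by the pooled standard deviation."""
    groups = [g for _, g in scores.groupby(labels)]
    if len(groups) != 2:
        raise ValueError("expected exactly two classes")
    a, b = groups
    pooled_var = ((len(a) - 1) * a.var() + (len(b) - 1) * b.var()) / (len(a) + len(b) - 2)
    return float(abs(a.mean() - b.mean()) / np.sqrt(pooled_var))


def key_features(pc_loadings: pd.Series, threshold: float = 0.3) -> pd.Series:
    """Loadings whose absolute value exceeds the threshold, largest first."""
    keep = pc_loadings[pc_loadings.abs() > threshold]
    return keep.reindex(keep.abs().sort_values(ascending=False).index)


# Synthetic scores: the classes differ along PC1 but not along PC2.
rng = np.random.default_rng(2)
labels = pd.Series(np.repeat(["Malicious", "Non-malicious"], 100))
pc1 = pd.Series(np.where(labels == "Malicious", 1.5, -1.5) + rng.normal(size=200))
pc2 = pd.Series(rng.normal(size=200))

sep_pc1 = separation_index(pc1, labels)
sep_pc2 = separation_index(pc2, labels)
print(f"PC1 separation: {sep_pc1:.3f}, PC2 separation: {sep_pc2:.3f}")

# Hypothetical loadings for the chosen component.
chosen_loadings = pd.Series({"f1": 0.62, "f2": -0.55, "f3": 0.12, "f4": -0.30})
drivers = key_features(chosen_loadings)
print(drivers)
```

The threshold comparison is strict, so a loading of exactly 0.3 in absolute value is not selected, following the "greater than 0.3" wording of the hint.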
Video Presentation Checklist
1. In your video presentation, you must
a. Run your code in Google Colab corresponding to parts (i) to (vii)
b. Display the relevant output on screen
c. Interpret the output in response to parts (i) to (vii)
2. Your video presentation must include a camera shot of yourself in the video
capture, unless there is an exceptional reason supported by a Learning
Assessment Plan (LAP). If this is not included, 20% is automatically deducted
from your final mark. If you choose to record with another application, you
must make sure that this feature is included.
3. Your video presentation must be between 6 and 8 minutes long.
4. Submit the recording via the Panopto link on Canvas. Please ensure you follow
the instructions carefully.
5. Submit your Python code (Google Colab) using a separate submission link.
Marking Rubrics
Working Code (7%)
Fail 2 (<30%): Code is substantially incomplete, does not run or contains major flaws, preventing any meaningful PCA analysis. Documentation is minimal or absent.
Fail 1 (30-49%): Code does not run properly or contains major flaws, preventing meaningful PCA analysis. Documentation is minimal.
Pass (50-59%): Code has significant errors or omissions that affect PCA output. Poor documentation and some redundancy.
Credit (60-69%): Code has a few errors and/or does not fully achieve intended PCA and relevant analyses. Documentation is present but could be improved.
Distinction (70-79%): Code runs with minor issues but still performs PCA and relevant tasks correctly. Minimal redundancy and good documentation.
High Distinction (80-100%): Code runs flawlessly, correctly performs PCA and relevant tasks, and produces meaningful outputs. No errors, redundant code, or inefficiencies.

Interpretation of results (18%)
Fail 2 (<30%): Fails to interpret the PCA results meaningfully or demonstrate any understanding of PCA, and draws incorrect conclusions.
Fail 1 (30-49%): Fails to interpret the PCA results meaningfully for the most part or provides incorrect conclusions.
Pass (50-59%): Interpretation is vague, lacks depth, and/or has major inaccuracies or errors.
Credit (60-69%): Provides a basic interpretation with some inaccuracies or missing key insights.
Distinction (70-79%): Provides a strong and mostly accurate interpretation of PCA results with minor omissions or inaccuracies.
High Distinction (80-100%): Provides an in-depth, clear, and accurate interpretation of PCA results, including the significance of principal components and key loadings. Justifies conclusions with evidence.

Presentation skills (7%)
Fail 2 (<30%): The presentation is unclear and does not engage the audience. The presenter makes little to no attempt at expression, and the pace and tone require substantial improvement.
Fail 1 (30-49%): The presentation is unclear. The presenter made an attempt at expression, but the pace and tone need improvement to better engage the audience.
Pass (50-59%): The presentation lacks structure. Presenter made a good attempt, but the expression, pace, and tone could be improved.
Credit (60-69%): The presentation is understandable and delivered at a good pace. However, there is minimal confidence in the presentation style.
Distinction (70-79%): Clear and structured presentation with minor pacing or engagement issues. Presenter was fluent and displayed good confidence.
High Distinction (80-100%): The presenter was dynamic, natural, and persuasive, with an appropriate tone. Delivery was clear, confident, and well-structured, with effective pacing and engagement that maintained a high level of confidence throughout.

Timing (3%)
Fail 2 (<30%): Presentation is less than 3 minutes or more than 12 minutes.
Fail 1 (30-49%): Presentation is between 3 and 4 minutes, or between 11 and 12 minutes.
Pass (50-59%): Presentation is between 4 and 5 minutes, or between 10 and 11 minutes.
Credit (60-69%): Presentation is between 5 and 6 minutes, or between 9 and 10 minutes.
Distinction (70-79%): Presentation is between 8 and 9 minutes.
High Distinction (80-100%): Presentation is between 6 and 8 minutes.
Academic Misconduct
Edith Cowan University regards academic misconduct of any form as unacceptable.
Academic misconduct, which includes but is not limited to plagiarism, unauthorised
collaboration, cheating in examinations, theft of other students' work, collusion,
and inadequate and incorrect referencing, will be dealt with in accordance with the
ECU Rule 40 Academic Misconduct (including Plagiarism) Policy. Ensure that you are
familiar with the Academic Misconduct Rules.
Assignment Extensions
To formally lodge an assignment extension request, follow the instructions on the
ECU Online Extension Request and Tracking System. The link is also available on
Canvas in the Assignment section.
Normal work commitments, family commitments and extra-curricular activities are
NOT accepted as grounds for granting you an extension of time because you are
expected to plan ahead for your assessment due dates.
Where the assignment is submitted not more than 7 days late, the penalty shall, for each
day that it is late, be 5% of the maximum assessment available for the assignment.
Where the assignment is more than 7 days late, a mark of zero shall be awarded.