Principal Component Analysis

TASK

Your task is to conduct PCA on the numeric features of the MLData2026 dataset, demonstrating the process and interpreting the results in a video presentation.

This assessment is due before midnight on Friday of Week 7, 17 April 2026.

Perform PCA and Visualise Data

(i) Download the Assignment 2.ipynb file, along with MLData2026.csv from the Assignment 1 folder on Canvas. To help you begin your assignment, the Assignment 2 Google Colab file contains some starter code (the same as for Assignment 1) to:

1. Mount your drive.

2. Import the relevant packages. You may need to add more packages as your assignment progresses.

3. Upload MLData2026.csv to your Google Colab storage folder, then read and convert the data into a DataFrame. Consider this the Master data set.

4. Randomly select 600 sub-samples from the Master data set. Make sure to use your Student ID to set the random seed. This means that every student should have their own unique set of sub-samples, i.e. mydata, to work on.
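As a rough illustration of step 4, the sketch below draws 600 rows reproducibly with pandas. It is not the starter code: the stand-in `master` DataFrame, its column names, the class labels and the `STUDENT_ID` value are all placeholders, and in Colab the Master data set would instead come from reading MLData2026.csv.

```python
import numpy as np
import pandas as pd

# Stand-in for the Master data set. In Colab this would be read from
# MLData2026.csv instead; the columns and labels here are hypothetical.
rng = np.random.default_rng(0)
master = pd.DataFrame({
    "feature_a": rng.normal(size=1000),
    "feature_b": rng.normal(size=1000),
    "Class": rng.choice(["Malicious", "Normal"], size=1000),
})

STUDENT_ID = 12345678  # placeholder: replace with your own Student ID

# Draw 600 rows without replacement. Using the Student ID as the seed
# makes the draw reproducible and different for each student.
mydata = master.sample(n=600, random_state=STUDENT_ID)

print(mydata.shape)
```

Re-running the cell with the same seed returns the same 600 rows, which is what lets a marker reproduce your results.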

You are required to perform PCA on the relevant features in mydata using Python on Google Colab and report your findings.

(ii) Extract only the numeric features and the Class feature from mydata and store them in a DataFrame. Refer to Assignment 1 for the feature description if needed.

(iii) Clean the extracted data based on the feedback received from Assignment 1.

(iv) Remove the incomplete cases, e.g. using dropna(), to make it usable for PCA.
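Parts (ii) and (iv) could look something like the sketch below, which uses a tiny made-up `mydata` with hypothetical column names. The cleaning in part (iii) is not shown because it depends on your own Assignment 1 feedback; it would sit between the extraction and the `dropna()` call.

```python
import numpy as np
import pandas as pd

# Small stand-in for mydata; all column names and values are hypothetical.
mydata = pd.DataFrame({
    "num_1": [1.0, 2.5, np.nan, 4.0, 5.5],
    "num_2": [10, 20, 30, 40, 50],
    "category_1": ["a", "b", "a", "c", "b"],  # non-numeric, excluded below
    "Class": ["Normal", "Malicious", "Normal", None, "Malicious"],
})

# (ii) Keep the numeric columns plus the Class label.
numeric_cols = mydata.select_dtypes(include="number").columns.tolist()
subset = mydata[numeric_cols + ["Class"]]

# (iii) Cleaning based on Assignment 1 feedback would go here.

# (iv) Drop every row that has a missing value in any retained column.
complete = subset.dropna().reset_index(drop=True)

print(complete.shape)
print(complete.isna().sum())
```

Check the dtypes before relying on `select_dtypes`: a numeric feature stored as text will be silently left out, and an identifier stored as a number will be silently included.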

(v) Perform PCA in Python using Google Colab, but only on the numeric features (i.e. ignore Class in this step).

– Explain why you believe the data should or should not be scaled, i.e. standardised, when performing PCA.

– Display and describe the individual and cumulative proportions of variance (3 decimal places) explained by each of the principal components.

– Outline how many principal components are adequate to explain at least 50% of the variability in your data.

Assignment 2 – Video Presentation (35%)

ECU Internal Information

– Display and interpret the coefficients (or loadings) to 3 decimal places for PC1, PC2 and PC3. Describe which features (based on the loadings) are the key drivers for each of these three principal components.
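One possible way to produce the outputs requested in part (v) is sketched below with scikit-learn. The data are synthetic, with hypothetical feature names on deliberately different scales, and the choice to standardise is an assumption you would still need to justify for your own data. Here "loadings" means the unit-length eigenvector coefficients that scikit-learn stores in `components_`; some texts instead scale these by the square root of the eigenvalue.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned numeric features (names are hypothetical).
rng = np.random.default_rng(1)
n = 200
base = rng.normal(size=n)
X = pd.DataFrame({
    "f1": base * 1000 + rng.normal(scale=100, size=n),  # large scale
    "f2": base + rng.normal(scale=0.5, size=n),
    "f3": rng.normal(size=n),
    "f4": rng.normal(scale=0.01, size=n),               # tiny scale
})

# Standardise so each feature has mean 0 and unit variance before PCA.
X_scaled = StandardScaler().fit_transform(X)

pca = PCA()
scores = pca.fit_transform(X_scaled)
pc_names = [f"PC{i + 1}" for i in range(pca.n_components_)]

# Individual and cumulative proportions of variance explained, to 3 d.p.
cumulative = np.cumsum(pca.explained_variance_ratio_)
variance = pd.DataFrame(
    {"proportion": pca.explained_variance_ratio_, "cumulative": cumulative},
    index=pc_names,
).round(3)
print(variance)

# Smallest number of components whose cumulative proportion reaches 50%.
n_needed = int(np.argmax(cumulative >= 0.5) + 1)
print("Components needed for at least 50%:", n_needed)

# Loadings: components_ has one row per PC, so transpose to get
# one row per feature and one column per PC.
loadings = pd.DataFrame(pca.components_.T, index=X.columns, columns=pc_names).round(3)
print(loadings[["PC1", "PC2", "PC3"]])
```

Fitting `PCA()` on `X` directly instead of `X_scaled` is a quick way to see the effect of scaling: without standardisation, the feature with the largest raw variance tends to dominate PC1.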

(vi) Create and display the biplot for PC1 vs. PC2 to visualise the PCA results in the first two dimensions. Colour-code the points based on the Class feature. Explain the biplot by commenting on the PCA plot and the loadings plot individually, and then both plots combined (see Slides 28-29 of Module 3 notes). Finally, comment on and justify which (if any) features can help distinguish malicious activity.
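A biplot of this kind can be drawn with matplotlib by overlaying loading arrows on a scatter of the PC scores. The sketch below is one way to do it, not necessarily the layout used in the Module 3 notes; the data, feature names and class labels are synthetic placeholders, and the arrow stretch factor is purely cosmetic.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; feature names and class labels are hypothetical.
rng = np.random.default_rng(2)
n = 150
labels = np.array(["Malicious"] * (n // 2) + ["Normal"] * (n - n // 2))
shift = np.where(labels == "Malicious", 2.0, 0.0)
X = pd.DataFrame({
    "f1": shift + rng.normal(size=n),
    "f2": shift + rng.normal(size=n),
    "f3": rng.normal(size=n),
})

pca = PCA(n_components=2)
scores = pca.fit_transform(StandardScaler().fit_transform(X))
loadings = pca.components_.T  # rows: features, columns: PC1 and PC2

fig, ax = plt.subplots(figsize=(7, 6))

# PCA plot: one point per sample, coloured by class.
for cls in np.unique(labels):
    mask = labels == cls
    ax.scatter(scores[mask, 0], scores[mask, 1], s=15, alpha=0.6, label=cls)

# Loadings plot: one arrow per feature, stretched so the arrows are
# visible on the same axes as the scores.
stretch = np.abs(scores).max()
for (x, y), name in zip(loadings * stretch, X.columns):
    ax.arrow(0, 0, x, y, color="black", head_width=0.08, length_includes_head=True)
    ax.text(x * 1.08, y * 1.08, name, ha="center", va="center")

ax.axhline(0, color="grey", linewidth=0.5)
ax.axvline(0, color="grey", linewidth=0.5)
ax.set_xlabel(f"PC1 ({pca.explained_variance_ratio_[0]:.1%} of variance)")
ax.set_ylabel(f"PC2 ({pca.explained_variance_ratio_[1]:.1%} of variance)")
ax.set_title("Biplot: PC1 vs PC2")
ax.legend(title="Class")
plt.show()
```

When reading the result, arrows pointing towards a cluster of one class suggest features that take higher values for that class, and arrows pointing in similar directions suggest correlated features.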

(vii) Based on the results from parts (v) and (vi), describe:

– whether PC1 or PC2 (choose one) best assists in classifying malicious activity. Hint: Project all points in the PCA plot onto the PC1 axis (i.e. consider the PC1 scores only) and assess whether there is a clear separation between malicious and non-malicious samples. Then, project onto the PC2 axis (i.e. consider the PC2 scores only) and evaluate whether the separation is better than in PC1.

– the key features in this dimension that can drive this process (Hint: based on your decision above, examine the loadings from part (v) of your chosen PC and choose those whose absolute loading (i.e. disregard the sign) is greater than 0.3).
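The hint describes a visual comparison, but a simple number can back it up. The sketch below uses a standardised mean difference between the two classes along each PC, which is a substitute measure chosen here for illustration and not something the brief prescribes. It then keeps the features whose absolute loading on the chosen PC exceeds 0.3. The data, feature names, class labels and the `separation` helper are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data; feature names and class labels are hypothetical.
rng = np.random.default_rng(3)
n = 200
labels = pd.Series(["Malicious"] * 100 + ["Normal"] * 100, name="Class")
shift = np.where(labels == "Malicious", 2.5, 0.0)
X = pd.DataFrame({
    "f1": shift + rng.normal(size=n),
    "f2": shift + rng.normal(size=n),
    "f3": rng.normal(size=n),
    "f4": rng.normal(size=n),
})

pca = PCA()
scores = pd.DataFrame(
    pca.fit_transform(StandardScaler().fit_transform(X)),
    columns=[f"PC{i + 1}" for i in range(X.shape[1])],
)

def separation(pc_scores, classes):
    """Absolute difference in class means divided by the pooled standard deviation."""
    a, b = [pc_scores[classes == c] for c in classes.unique()]  # assumes two classes
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd

# Project onto each axis in turn, i.e. look at one column of scores at a time.
for pc in ["PC1", "PC2"]:
    print(pc, "separation:", round(separation(scores[pc], labels), 3))

best_pc = max(["PC1", "PC2"], key=lambda pc: separation(scores[pc], labels))

# Key drivers: features whose absolute loading on the chosen PC exceeds 0.3.
loadings = pd.DataFrame(pca.components_.T, index=X.columns, columns=scores.columns)
key_features = loadings.loc[loadings[best_pc].abs() > 0.3, best_pc].round(3)
print("Chosen component:", best_pc)
print(key_features)
```

A per-class histogram or boxplot of `scores["PC1"]` and `scores["PC2"]` shows the same comparison visually and is closer to what the hint asks you to describe.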

Video Presentation Checklist

1. In your video presentation, you must

a. Run your code in Google Colab corresponding to parts (i) to (vii)

b. Display the relevant output on screen

c. Interpret the output in response to parts (i) to (vii)

2. Your video presentation must include a camera shot of yourself in the video capture, unless there is an exceptional reason supported by a Learning Assessment Plan (LAP). 20% is automatically deducted from your final mark if this is not included in your video presentation. If you choose to record with another application, you must make sure that this feature is included.

3. Your video presentation must be between 6 and 8 minutes long.

4. Submit the recording via the Panopto link on Canvas. Please ensure you follow the instructions carefully.

5. Submit your Python code (Google Colab) using a separate submission link.

Marking Rubrics

Grade bands: Fail 2 (<30%), Fail 1 (30-49%), Pass (50-59%), Credit (60-69%), Distinction (70-79%), High Distinction (80-100%)

Working Code (7%)

– Fail 2 (<30%): Code is substantially incomplete, does not run or contains major flaws, preventing any meaningful PCA analysis. Documentation is minimal or absent.

– Fail 1 (30-49%): Code does not run properly or contains major flaws, preventing meaningful PCA analysis. Documentation is minimal.

– Pass (50-59%): Code has significant errors or omissions that affect PCA output. Poor documentation and some redundancy.

– Credit (60-69%): Code has a few errors and/or does not fully achieve intended PCA and relevant analyses. Documentation is present but could be improved.

– Distinction (70-79%): Code runs with minor issues but still performs PCA and relevant tasks correctly. Minimal redundancy and good documentation.

– High Distinction (80-100%): Code runs flawlessly, correctly performs PCA and relevant tasks, and produces meaningful outputs. No errors, redundant code, or inefficiencies.

Interpretation of results (18%)

– Fail 2 (<30%): Fails to interpret the PCA results meaningfully or demonstrate any understanding of PCA, and draws incorrect conclusions.

– Fail 1 (30-49%): Fails to interpret the PCA results meaningfully for the most part or provides incorrect conclusions.

– Pass (50-59%): Interpretation is vague, lacks depth, and/or has major inaccuracies or errors.

– Credit (60-69%): Provides a basic interpretation with some inaccuracies or missing key insights.

– Distinction (70-79%): Provides a strong and mostly accurate interpretation of PCA results with minor omissions or inaccuracies.

– High Distinction (80-100%): Provides an in-depth, clear, and accurate interpretation of PCA results, including the significance of principal components and key loadings. Justifies conclusions with evidence.

Presentation skills (7%)

– Fail 2 (<30%): The presentation is unclear and does not engage the audience. The presenter makes little to no attempt at expression, and the pace and tone require substantial improvement.

– Fail 1 (30-49%): The presentation is unclear. The presenter made an attempt at expression, but the pace and tone need improvement to better engage the audience.

– Pass (50-59%): The presentation lacks structure. Presenter made a good attempt, but the expression, pace, and tone could be improved.

– Credit (60-69%): The presentation is understandable and delivered at a good pace. However, there is minimal confidence in the presentation style.

– Distinction (70-79%): Clear and structured presentation with minor pacing or engagement issues. Presenter was fluent and displayed good confidence.

– High Distinction (80-100%): The presenter was dynamic, natural, and persuasive, with an appropriate tone. Delivery was clear, confident, and well-structured, with effective pacing and engagement that maintained a high level of confidence throughout.

Timing (3%)

– Fail 2 (<30%): Presentation is less than 3 minutes or more than 12 minutes.

– Fail 1 (30-49%): Presentation is between 3 and 4 minutes, or between 11 and 12 minutes.

– Pass (50-59%): Presentation is between 4 and 5 minutes, or between 10 and 11 minutes.

– Credit (60-69%): Presentation is between 5 and 6 minutes, or between 9 and 10 minutes.

– Distinction (70-79%): Presentation is between 8 and 9 minutes.

– High Distinction (80-100%): Presentation is between 6 and 8 minutes.

Academic Misconduct

Edith Cowan University regards academic misconduct of any form as unacceptable. Academic misconduct, which includes but is not limited to plagiarism, unauthorised collaboration, cheating in examinations, theft of other students' work, collusion, and inadequate and incorrect referencing, will be dealt with in accordance with the ECU Rule 40 Academic Misconduct (including Plagiarism) Policy. Ensure that you are familiar with the Academic Misconduct Rules.

Assignment Extensions

To formally lodge an assignment extension request, use the ECU Online Extension Request and Tracking System, where instructions for applying are available. The link is also available on Canvas in the Assignment section.

Normal work commitments, family commitments and extra-curricular activities are NOT accepted as grounds for granting you an extension of time because you are expected to plan ahead for your assessment due dates.

Where the assignment is submitted not more than 7 days late, the penalty shall, for each day that it is late, be 5% of the maximum assessment available for the assignment. Where the assignment is more than 7 days late, a mark of zero shall be awarded.
