Category: Python

Why is Python great?

1. Very beginner-friendly
Python reads almost like English, so it's easy to learn compared to many other languages.

2. Extremely versatile
You can use it for:
- Web development
- Game development
- Automation
- Data science / AI (this is where it really shines)

3. Huge community
If you get stuck, there are tons of tutorials, forums, and libraries to help you.

4. Powerful libraries
Tools like numpy, pandas, and tensorflow make complex tasks much easier.

Python is amazing for learning and building real projects fast.
If your goal is:
- beginner programming: perfect
- automation / scripts: perfect
- AI / data: one of the best
But if you're aiming for:
- high-performance systems
- mobile apps
you might eventually need other languages too.
March 29, 2026
Python Question

Rewrite to Lower Similarity
Fix the References
Clean Up the Code
Proofread for Typos and Clarify Sampling

The code is 850 words long and the report is 1,375 words.
The changes are very critical, and I will upload the files once the task is assigned.

Task

Download the Assignment 1.ipynb file, along with MLData2026.csv, from the Assignment 1 folder on Canvas. To help you begin your assignment, the Assignment 1 Google Colab file contains some starter code to

1. Mount your drive
2. Import the relevant packages. You may need to add more packages as your assignment progresses.
3. Upload the MLData2026.csv to your Google Colab storage folder, then read and convert the data into a dataFrame. Consider this as the Master data set.
4. Randomly select 600 sub-samples from the Master data set. Make sure to use your Student ID to set the random seed. This means that every student should have their own unique set of sub-samples, i.e. mydata, to work on.

You are required to perform basic descriptive analysis on the relevant features in mydata in Python on Google Colab and report your findings.
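
Step 4 can be sketched roughly as follows. This is a minimal illustration, not the starter code from the notebook: the small `master` frame is a made-up stand-in for MLData2026.csv, and `STUDENT_ID` is a placeholder value. It assumes pandas' `DataFrame.sample` with `random_state` is an acceptable way to seed the draw; the unit's starter code may do this differently.

```python
import pandas as pd

# Hypothetical stand-in for the Master data set so the sketch runs anywhere.
# In Colab you would instead read the real file, e.g. pd.read_csv("MLData2026.csv").
master = pd.DataFrame({"id": range(1000), "value": range(1000, 2000)})

STUDENT_ID = 12345678  # placeholder: replace with your own Student ID

# random_state seeds the draw, so the same ID always selects the same 600 rows.
# Sampling is without replacement by default, so no row is picked twice.
mydata = master.sample(n=600, random_state=STUDENT_ID)

print(mydata.shape)
```

Because the seed is fixed, re-running the cell reproduces the same `mydata`, which is what lets a marker confirm reported results against the code.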

Exploratory Data Analysis and Data Cleaning

(i) For each categorical variable, determine the frequency N and percentage (%) of instances in each category and summarise the results in a table as follows. You do not need to recreate the table in Python; your code only needs to generate the statistics required to populate it. You may export or copy the values to Microsoft Excel and format the table as shown below. State all percentages to 1 decimal place.

ECU Internal Information

Categorical Feature   Category      N (%)
Feature 1             Category 1    10 (10.0%)
                      Category 2    30 (30.0%)
                      Category 3    50 (50.0%)
                      Missing       10 (10.0%)
Feature 2             Yes           75 (75.0%)
                      No            25 (25.0%)
                      Missing       0 (0.0%)
Feature k             Category 1    25 (25.0%)
                      Category 2    25 (25.0%)
                      Category 3    15 (15.0%)
                      Category 4    30 (30.0%)
                      Missing       5 (5.0%)
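
One possible way to generate the N (%) values for a single categorical feature is sketched below. The `feature` series is invented for illustration and is not taken from MLData2026.csv; labelling missing entries as "Missing" before counting is an assumption about how the table should be populated.

```python
import pandas as pd

# Hypothetical categorical column with two missing entries.
feature = pd.Series(
    ["Yes", "No", "Yes", None, "Yes", "No", None, "Yes"], name="Feature 2"
)

# Label missing values explicitly so they are counted as their own category.
counts = feature.fillna("Missing").value_counts()

# Percentages are taken over all rows, including missing ones, to 1 decimal place.
percent = (counts / len(feature) * 100).round(1)

for category, n in counts.items():
    print(f"{category}: {n} ({percent[category]:.1f}%)")
```

Looping the same steps over each categorical column of `mydata` would give the figures needed for the whole table.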

(ii) Summarise each of your numeric variables in a table as follows. State all decimal values to 1 decimal place.

Continuous Feature   N (%) missing   Min   Max   Mean   Median   Skewness
Feature 1
Feature 2
...
Feature k

N (%) missing = Number and percentage of missing values

Note: The tables for parts (i) and (ii) should be based on the original sub-sample of 600 uncleaned observations.
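
The statistics for one numeric feature could be computed along these lines. The series `x` is made up for illustration. Note that pandas' `Series.skew()` returns a bias-corrected sample skewness, which may differ slightly from the value given by other tools, so check which definition your unit expects.

```python
import pandas as pd

# Hypothetical numeric column with one missing value.
x = pd.Series([2.0, 4.0, 4.0, 5.0, None, 30.0], name="Feature 1")

n_missing = int(x.isna().sum())

# pandas skips missing values by default when computing these statistics.
summary = {
    "N missing": n_missing,
    "% missing": round(n_missing / len(x) * 100, 1),
    "Min": round(x.min(), 1),
    "Max": round(x.max(), 1),
    "Mean": round(x.mean(), 1),
    "Median": round(x.median(), 1),
    "Skewness": round(x.skew(), 1),
}

print(summary)
```

A mean well above the median, as in this toy column, is a first hint of right skew and possible high-end outliers.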

(iii) Examine the values in the tables in parts (i) and (ii). Are there any invalid categories/values for the categorical variables? If so, how will you deal with them and why? Is there any evidence of outliers for any of the numeric variables? If so, how many and what percentage are there and how will you deal with them? Justify your decision in the treatment of outliers (if any).

Note: You may use plots/graphs to further support your observations/decisions.
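
Counting outliers and their percentage might look like the sketch below. It uses the 1.5 × IQR rule, which is one common convention and is not prescribed by the task itself; the unit may teach a different criterion. The data in `x` are invented.

```python
import pandas as pd

# Hypothetical numeric column with one suspiciously large value.
x = pd.Series([10, 12, 11, 13, 12, 11, 10, 95], name="Feature 1")

# Quartiles and interquartile range.
q1, q3 = x.quantile(0.25), x.quantile(0.75)
iqr = q3 - q1

# Fences of the 1.5 * IQR rule (an assumed criterion, see the note above).
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

is_outlier = (x < lower) | (x > upper)
n_outliers = int(is_outlier.sum())

# Percentage is taken over the non-missing observations.
pct_outliers = round(n_outliers / x.notna().sum() * 100, 1)

print(n_outliers, pct_outliers, x[is_outlier].tolist())
```

The flagged values still need a judgement call: whether to keep, cap, or set them to missing should be justified from the context of the feature rather than from the rule alone.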


What to Submit

1. A single report (standard margins, minimum font size 11, not exceeding 4 pages, excluding the cover page, contents page and reference page, if any) containing:
   a. Two summary tables of all the features in the dataset
   b. A list of data issues (if any) with appropriate actions
2. A copy of your Python code as a Google Colab notebook AND in PDF format.

The report must be submitted through Turnitin and checked for originality. The Google Colab file is to be submitted via a separate Canvas submission link.

Note that no marks will be given if the results you have provided cannot be confirmed by your code. Any use of generative AI must be acknowledged and used responsibly and ethically.

Marking Criteria

Criterion   Contribution to assignment mark

Correct implementation of descriptive analysis in Python (Google Colab)
- Working code
- Good documentation/commentary
- External sources referenced in APA 7 referencing style (if applicable)
- Acknowledgement of use of Gen AI (if applicable)
Note: At least 80% of the code must align with unit content. Otherwise, a mark of zero will be awarded for this component.

Tabulation of descriptive statistics
- Properly formatted tables (NO direct screenshots from the output in Google Colab)
- Features are correctly placed in the appropriate table
- Tables are populated with the correct statistics
- Tables are appropriately captioned and referenced in-text
- Relevant decimal values are rounded to the correct number of decimal places

Correct explanation and justification in the identification and treatment of missing and/or invalid observations in the data
- Justifications should be initially based on the values in the tables. You may use plots/graphs to further support your observations and/or decisions. Screenshots of graphs are acceptable
- Provide appropriate actions to treat problematic values
- Spelling and grammatical errors should be kept to a minimum.

7%

Relevant sources referenced in APA 7 referencing style (

March 23, 2026

Project Description & Submission Guidelines

Project Overview

This project is an individual project and involves cleaning a dataset and creating visualizations using Python. You will select a dataset from Kaggle.com or another reputable source, download it, and use it as the foundation for your project.

The primary objective of this assignment is to enhance your skills in data cleaning and visualization using Python. Through this process, you will gain hands-on experience in handling real-world datasets, identifying and resolving data inconsistencies, and effectively presenting insights through visualizations.
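
As a rough illustration of what resolving data inconsistencies can involve, the sketch below cleans a tiny made-up table. The column names and values are hypothetical and the three steps shown (normalising labels, dropping duplicates, dropping rows with a missing key) are only one reasonable choice; the right treatment depends on the dataset you select.

```python
import pandas as pd

# Hypothetical raw data: inconsistent labels, duplicate rows, and a missing city.
raw = pd.DataFrame({
    "city": [" Perth", "perth", "Sydney", "Sydney", None],
    "sales": [100.0, 100.0, 250.0, 250.0, 80.0],
})

cleaned = raw.copy()  # keep the original untouched so both versions can be submitted

# Normalise labels: trim whitespace and use consistent capitalisation.
cleaned["city"] = cleaned["city"].str.strip().str.title()

# Remove rows that are exact duplicates after normalisation.
cleaned = cleaned.drop_duplicates()

# Drop rows where the city is missing.
cleaned = cleaned.dropna(subset=["city"])

print(cleaned)
```

Working on a copy keeps the original data available alongside the cleaned version, which matches the requirement to submit both CSV files.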

Minimum Requirements for Submission

To successfully complete this project, you must submit the following:

Project Summary Document (1-3 paragraphs)
- Provide a clear and concise summary of your data selection, cleaning process, and visualization techniques.
- Start by mentioning the name and description of the dataset you chose.
- Highlight challenges encountered (e.g., missing values, data inconsistencies) and explain how you resolved them.
- Summarize the visualization techniques used and how they contribute to understanding the data.
Google Colab Notebook
- Submit a link to your Google Colab notebook, ensuring that it is accessible.
- Include clear and detailed comments (# comments) within your code to explain your thought process and methodologies.
CSV Files
- Upload all CSV files that are used within your code.
- Ensure that the original dataset (as downloaded) and any cleaned versions are included.
Project Explanation Video (5-7 minutes)
- Record a 5-7 minute video explaining your project.
- Walk through your dataset, data cleaning process, key challenges, and visualizations.
- Provide a brief code walkthrough, highlighting important parts of your notebook.

Presentation Requirements

No in-class presentation is required for this project.

Submission Instructions

Organize your project files in a single folder labeled as:
MIS315_YourLastName_YourFirstName_Project1
Include all necessary components:
- Google Colab Notebook (with comments)
- CSV files (original & processed data)
- Project Summary Document
- Visualization Images (if applicable)
- 5-7 minute project explanation video
Compress the folder into a ZIP file.
Upload the zipped folder to Canvas under the designated project submission section.

IMPORTANT NOTE:

Ensure your folder is named correctly before compressing it.
Submissions that do not follow the correct naming format will NOT be graded.

This project is an opportunity to demonstrate your ability to clean, analyze, and visualize data effectively using Python. Be sure to follow all submission guidelines carefully to receive full credit.

1,000 Points Possible
