Category: Data science

  • midterm

    For this project, you will conduct a comprehensive descriptive analysis of a real-world dataset using Microsoft Excel. You will organize, summarize, visualize, and interpret data to tell a compelling story about what the data reveals. This project allows you to apply the skills learned in Weeks 1-6 and demonstrate your ability to extract meaningful insights from data.

    Weight: 30% of final grade (300 points)

    Submission: Excel workbook (.xlsx) + Written Report (PDF or Word document)

    This midterm project assesses your mastery of the following course learning outcomes:

    1. Organize, summarize, and visualize data using Excel to extract meaningful patterns and insights
    2. Apply fundamental descriptive statistical techniques (measures of central tendency, variation, frequency distributions) to analyze real-world datasets
    3. Ethically interpret and communicate data insights using appropriate visual and written forms, including consideration of data limitations and responsible use

    By completing this project, you will demonstrate your ability to conduct a complete data analysis workflow, from data cleaning and exploration through statistical analysis to professional communication of findings.

    STEP 1: Select Your Dataset (Week 5)

    Choose ONE dataset from the approved options below. All datasets are available for free download. You may need to create a free Kaggle account to access some datasets.

    STEP 2: Create Your Excel Workbook (Weeks 5-6)

    Your Excel file must include five sheets with the following content:

    Sheet 1: Raw Data

    • Import your original dataset exactly as downloaded
    • Do NOT modify this sheet
    • Label tab clearly as “Raw Data”

    Sheet 2: Data Cleaning & Organization

    • Copy the raw data and document all cleaning steps:
    • Handle missing values (delete, fill with the average, or mark as “Unknown”)
    • Remove duplicates
    • Create new calculated fields if needed
    • Rename variables for clarity
    • Filter to relevant subset if dataset is very large
    • Add text boxes or comment cells explaining what you did and why
    • Example: “Removed 15 rows with missing graduation rate data because this variable is essential to my analysis”

    Sheet 3: Summary Statistics

    Create at least THREE summary tables including:

    • Numerical variables: Mean, median, standard deviation, min, max, range, count
    • Categorical variables: Frequency counts, percentages
    • Grouped summaries: Use pivot tables or formulas to summarize by categories (e.g., average temperature by city, median tuition by public/private)
    • Use Excel formulas (AVERAGE, MEDIAN, STDEV.S, COUNT, COUNTIF, SUMIF, etc.) – no manual calculations
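    As a cross-check of the Excel formulas listed above, the same measures can be computed outside Excel. The sketch below uses Python's standard library only; the graduation-rate values and institution types are invented for illustration, not from any approved dataset.

```python
import statistics
from collections import Counter

# Hypothetical sample: graduation rates (numeric) and institution
# types (categorical). Values are illustrative only.
grad_rates = [52.0, 61.5, 68.0, 74.5, 81.0, 88.5, 93.0]
inst_types = ["Public", "Private", "Public", "Public", "Private", "Private", "Public"]

# Numerical summary: the same measures as AVERAGE, MEDIAN,
# STDEV.S, MIN, MAX, and COUNT
numeric_summary = {
    "mean": statistics.mean(grad_rates),
    "median": statistics.median(grad_rates),
    "stdev": statistics.stdev(grad_rates),  # sample st. dev., like STDEV.S
    "min": min(grad_rates),
    "max": max(grad_rates),
    "range": max(grad_rates) - min(grad_rates),
    "count": len(grad_rates),
}

# Categorical summary: frequency counts and percentages, like COUNTIF
freq = Counter(inst_types)
pct = {k: 100 * v / len(inst_types) for k, v in freq.items()}

# Grouped summary: average graduation rate by institution type,
# the same idea as a pivot table
by_type = {}
for t, r in zip(inst_types, grad_rates):
    by_type.setdefault(t, []).append(r)
group_means = {t: statistics.mean(rs) for t, rs in by_type.items()}
```

    If your Excel results disagree with an independent recomputation like this, that usually signals a formula-range or cleaning error worth documenting on Sheet 2.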

    Sheet 4: Visualizations

    Create at least FIVE different charts including:

    • At least one histogram or bar chart (distribution/comparison)
    • At least one scatterplot (relationship between variables)
    • At least three different chart types total
    • Every chart must have: descriptive title, axis labels with units, legend (if needed), appropriate scale, professional colors, data source note

    Examples: histogram of graduation rates, bar chart of average sales by product, scatterplot of tuition vs. graduation rate with trendline, line chart of temperature trends, pie chart of public vs. private proportions

    Sheet 5: Analysis Notes (Optional but Recommended)

    • Document interesting findings, surprises, patterns, or questions as you work
    • Helps you write your report later

    STEP 3: Write Your Report (Week 6-7)

    Submit a written report of 3-5 pages, double-spaced, 12-point font (Times New Roman or Arial) with these sections:

    1. Introduction (0.5-1 page)

    • What dataset did you choose and why?
    • What is the data source? Is it credible?
    • What research question(s) are you exploring? Be specific.
    • What do you hope to learn?

    2. Data Description (0.5-1 page)

    • How many observations (rows) and variables (columns)?
    • What are your key variables? Define them clearly.
    • What time period or population does the data represent?
    • What data cleaning steps did you perform and why?

    3. Descriptive Analysis & Findings (2-3 pages; MOST IMPORTANT SECTION)

    • Organize by findings, NOT by methods
    • Good: “Graduation Rates Vary Widely Across Institutions”
    • Poor: “First I Made a Histogram”
    • For each finding: state it clearly, present specific statistics as evidence, reference visualizations, interpret what it means
    • Required: Discuss central tendency for 2+ variables, variation/spread, group comparisons, relationships between variables
    • Reference all five visualizations from your Excel file
    • Use specific numbers but explain them in context

    4. Limitations & Ethical Considerations (0.5 page)

    • Data limitations: What’s missing? How representative? How current?
    • Interpretation cautions: What can you NOT conclude?
    • Ethical issues: Privacy concerns? Potential biases? Who’s represented/missing? How might findings be misused?

    5. Conclusion (0.5 page)

    • Summarize 2-3 most important findings
    • What “story” does your data tell?
    • What questions remain unanswered?
    • How might this analysis be useful in real-world decision-making?

    6. References

    • Cite your dataset source in APA format (with URL and access date)
    • Include any additional sources consulted
    Submission Checklist

    • Excel Workbook filename: LastName_FirstName_Midterm.xlsx
    • All 5 sheets properly labeled
    • Formulas intact (NOT pasted as values)
    • Charts professional and formatted
    • Written Report filename: LastName_FirstName_MidtermReport.pdf or .docx
    • 3-5 pages, double-spaced, 12-point font
    • Proofread for grammar and clarity
    • Charts embedded OR clearly referenced by figure number
  • Business Research Assessment

    Instructions attached

    Attached Files (PDF/DOCX): Assignment_ResearchMethods.pdf

    Note: Content extraction from these files is restricted; please review them manually.

  • Applied analytic methods on a policy case: Urban Air Quality…

    *Please read the attached instruction files, as they are crucial for accomplishing the assignment.

    *The CSV file is optional to open; use it if you want to analyse the data yourself or if any of my data seems off to you.

    *The writing tone should not be too academic, as this is a policy document intended for non-experts to read in a short period of time. Making sentences simple doesn’t mean dumbing down here.

    Aim

    Apply, appraise and recommend a range of analytic methods in informing decisions about a complex science, technology and public policy issue.

    Objectives

    • Design and undertake exploratory data analysis using quantitative and/or qualitative techniques.
    • Generate evidence describing the behaviour of policy systems influencing air quality.
    • Evaluate how uncertainty affects your findings and propose ways to communicate this in policy contexts.
    • Use data visualisation to enhance understanding and communication of data insights.
    • Judge the suitability of analytic methods used for analysis.

    Marking

    This assignment is 50% of the module. The marking scheme is available on the school website and covers the following relevant marking criteria:

    • Conceptual Understanding (40%)
    • Reasoning & Critical Analysis (40%)
    • Communication, Structure & Clarity (20%)

    Required Length

    The word limit is 2,000 words. The penalties that apply for going more than 10% above or below this limit are outlined in the MPA Handbook. References, and footnotes solely containing references, are not included in the word count.

    1. Project context

    Imagine that London has recently joined a newly established ‘Attractive Cities’ network – a global network of the mayors of capital cities taking action to become attractive places to both live and work. A small group of core member cities have been asked to take the lead in developing shared research and policy action on public policy themes for the network’s members. Yesterday the news came through that London has been assigned as the thematic city lead on “Air Quality and Citizen Health”.

    This means that London’s Mayor now has responsibility for an entirely new policy portfolio, as well as, in essence, overnight becoming an influential and expert global voice on science, technology and innovation around air quality and health policy issues and their interrelationship. The Mayor is very conscious that while London has its own challenges and experiences with managing air quality, the real challenge at hand will be the development of evidence-based insights into air quality improvement as a common policy agenda, yet one with many different local realities and considerations.

    As it turns out, air quality is a topic the current Mayor knows little about, nor has she previously had much policy exposure to it. By background, she is a lawyer. She is, however, a keen advocate for evidence-based approaches to policy analysis and has often been frustrated when there has been insufficient rigour in analyses used to design and inform policy decisions. She has informally been warned that her Analysis team has recently had a few issues with analysis credibility. She has therefore asked that a special analysis adviser be appointed immediately to support her over the next weeks in preparing for this new role. Your CV impressed and this is now your role.

    Part 1. Exploratory data analysis [50%]

    The Mayor would like to have insight into any potential interrelationships between air quality and citizen health to best understand the agenda of the network she has joined. She also wants to understand whether and how context matters and how any interrelationships vary across cities. As a first step, she has asked you to undertake some exploratory data analysis to understand, where possible:

    a. What are key trends, patterns and/or anomalies in the air quality and citizen health of cities?
    b. How do the phenomena of air quality and citizen health interact or interrelate? Are there potential causal relationships?
    c. How do air quality and health differ between cities with different characteristics (e.g., by population size, exercise levels and/or modal split)?
    d. How does London compare to other cities around the world?
    e. Are there any immediate issues with data uncertainties, outliers, and/or results confidence?
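    As a hedged illustration of how question (b) might begin, the sketch below computes a Pearson correlation between two hypothetical city-level series. The variable names (pm25, life_expectancy) and values are assumptions, not the actual codebook fields in aqhealthcities.csv, and a correlation alone does not establish causation.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical city-level values; a real analysis would read them
# from aqhealthcities.csv instead.
pm25 = [8.0, 12.5, 20.0, 35.0, 55.0]              # annual mean PM2.5 (µg/m³)
life_expectancy = [82.1, 81.4, 79.8, 77.0, 74.5]  # years

r = pearson(pm25, life_expectancy)
# A strongly negative r would suggest (not prove) that higher PM2.5
# goes with lower life expectancy across these cities.
```

    Plotting the same pairs as a scatterplot, and checking whether single outlier cities drive the coefficient, would speak directly to questions (a) and (e).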

    2. The Data

    Your predecessor has left you a data file containing the latest data that was about to be analysed to brief the Mayor before they switched roles. This file ‘aqhealthcities.csv’ is available for download under the Assignment 2 Moodle page. Your predecessor also forwarded an annotated data dictionary/codebook as well as a short annotated set of references. These two resources are included in Appendix A as Tables 1 and 2 respectively.

    Given the incredibly tight timeframes, the Mayor has asked you to focus your analysis in the next two weeks on the contents of the inherited ‘aqhealthcities.csv’ data. She is happy for you to seek out other theories, knowledges, data sets, etc. beyond this data set if they help you with your analysis, but she has stressed she wants you to do this proportionately – i.e., her priority is analysis of the .csv file data and she is not expecting you to spend much time looking at other sources. Further analysis of other data sources is welcome, but depending on your capacity may have to wait until the start of the new year.

    Q1. Produce for the Mayor [~50%]:

    1. Exploratory data analysis informing her interests (a.-e.) outlined above. Explore the patterns, trends and possible observations and inferences to make from the data. Choose a set of final summative information points and messages you want to communicate to the Mayor. Include at least two visualisations, though feel free to use more than this if appropriate.

    2. A short opening or closing summary of key recommendations derived from the exploratory data analysis. You may, for example, want to highlight some of the similarities and differences between and within cities she should be aware of whilst steering this network. Or you may have ideas for policy action based on your analysis. Or you may want to raise key issues about data, analysis and confidence you think the Mayor should be aware of at this point. For any recommendation you make, make clear your assessment of the quality and reliability of the data.

    Communicate your work as a short analysis report in a way appropriate for quick reference and understanding. This means you can use a mixture of prose, bulleted text, tables, figures, headers, labels, etc. Use data visualisation effectively to support your analytical narrative.

    Part 2: Informing policy decisions [50%]

    Following your initial exploratory analysis, the Mayor has asked for your advice on some analytical and methodological questions that have emerged as she prepares to lead the Attractive Cities network.

    Q2. Interpreting Probabilistic Analysis [5%]

    The Mayor is evaluating two policy interventions to improve air quality and health outcomes in London. As she now leads the Attractive Cities network’s air quality theme, she knows other network cities will be watching London’s choices with interest, but her primary responsibility is to make the best decision for London. An external consultancy has undertaken a Monte Carlo simulation comparing the two options:

    Option A: Comprehensive Air Quality Monitoring Network

    o Deploy high-density monitoring infrastructure across all boroughs
    o Estimated 5-year cost: £50-90M
    o Provides real-time data for enforcement and public information

    Option B: Clean Air Zones Expansion with Technology Fund

    o Expand the Ultra Low Emission Zone (ULEZ) to all London boroughs
    o Create an innovation fund for clean transport solutions
    o Estimated 5-year cost: £30-120M
    o Includes enforcement tech and electric vehicle charging infrastructure

    The simulation results show probability distributions for total costs over 5 years:

    Fig 1. Monte Carlo simulation of the total cost profile for the 2 policy options, from the consultants’ report

    Based on these simulation results:

    a. What do the distributions tell you about the relative uncertainty of each option?
    b. Which option would you advise the Mayor to pursue for London, and why?
    c. What caveats should the Mayor be aware of when interpreting these probabilistic estimates?
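    To make the comparison in questions a.-c. concrete, here is a minimal sketch of the kind of simulation behind Fig 1. The cost distributions (uniform for Option A, triangular for Option B) and all parameters are illustrative assumptions, not the consultancy's actual model.

```python
import random
import statistics

random.seed(42)  # reproducible draws
N = 10_000

# Assumed cost models in £M over 5 years -- illustrative only:
# Option A: any cost in the quoted 50-90 range equally likely
# Option B: 30-120 range with an assumed most-likely cost of 60
costs_a = [random.uniform(50, 90) for _ in range(N)]
costs_b = [random.triangular(30, 120, 60) for _ in range(N)]

for name, costs in (("A", costs_a), ("B", costs_b)):
    mean = statistics.mean(costs)
    spread = statistics.stdev(costs)
    p90 = sorted(costs)[int(0.9 * N)]  # rough 90th-percentile cost
    print(f"Option {name}: mean ~£{mean:.0f}M, st. dev. ~£{spread:.0f}M, "
          f"90th pct ~£{p90:.0f}M")

# A wider spread (Option B here) means more cost uncertainty even when
# the means are similar -- the comparison questions a.-c. ask about.
```

    The key caveat the Mayor should hear: results like these are only as good as the assumed input distributions, which is exactly what question c. probes.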

    Q3. Additional Analytical Advice [5%]

    Before the Mayor makes her final decision between Option A and Option B for London, what one additional piece of analysis or evidence would you recommend she commission, and why?

    Q4. Deliberative Approaches to Policy Evaluation [10%]

    Beyond the immediate monitoring vs expansion decision, the Mayor faces ongoing choices about how to allocate London’s air quality budget across multiple competing priorities. She has £5M available for the next financial year and must decide between investments such as:

    • School street expansions (car-free zones around schools during drop-off)
    • Green infrastructure (green walls, urban greening for pollution absorption)
    • Low-emission zone enforcement technology upgrades
    • Public transport fare subsidies to encourage modal shift
    • Community air quality monitoring and engagement programmes
    • Support for small businesses to transition to low-emission vehicles

    These priorities have different beneficiaries, different evidence bases, different time horizons, and involve different values and trade-offs. They cannot all be funded fully.

    Her team has suggested using Multi-Criteria Analysis (MCA) with an Expert Advisory Board to evaluate these options. The Mayor has read briefings on MCA and understands the methodology. She is also conscious that her approach to this decision may inform how other Attractive Cities network members handle similar allocation challenges.

    Provide advice to the Mayor addressing:

    a) Should she use MCA for this London budget allocation decision? Consider: Is MCA appropriate for this type of decision? Why or why not? What are the key strengths of MCA that make it suitable (or not) for managing this budget allocation for London?

    b) If she proceeds with MCA, identify and discuss THREE critical design choices from the following areas:

    • Expert Board composition (who participates?)
    • Deliberative process structure (how is it organised?)
    • Criteria selection and weighting (how are priorities determined?)
    • Bias management (how are biases mitigated?)
    • Stakeholder inclusion (which London communities are represented?)
    • Transparency and documentation (how is the process communicated?)

    For each of your three chosen areas, explain the specific choice she must make and why it matters for decision quality.

    Note: The Mayor does not need you to explain how MCA works procedurally. She needs practical, context-specific advice on whether and how to use it effectively for this London budget allocation challenge.
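    To show why criteria weighting is such a critical design choice, here is a minimal weighted-sum MCA sketch. The criteria, weights, option names, and 0-10 scores are all invented for illustration; they are not a recommendation for the actual allocation.

```python
# Hypothetical criteria weights (sum to 1) and 0-10 option scores.
weights = {"health_impact": 0.4, "equity": 0.3, "cost_effectiveness": 0.3}

scores = {
    "School streets": {"health_impact": 7, "equity": 8, "cost_effectiveness": 6},
    "Green infrastructure": {"health_impact": 5, "equity": 6, "cost_effectiveness": 4},
    "Fare subsidies": {"health_impact": 6, "equity": 9, "cost_effectiveness": 5},
}

def weighted_total(option_scores, weights):
    """Weighted-sum aggregation: the core arithmetic of a simple MCA."""
    return sum(weights[c] * s for c, s in option_scores.items())

totals = {opt: weighted_total(s, weights) for opt, s in scores.items()}
ranking = sorted(totals, key=totals.get, reverse=True)
# The ranking can flip entirely under different weights -- which is why
# who sets the weights, and how, deserves deliberate design.
```

    Running the same scores under several weight sets (a simple sensitivity analysis) is one practical way the Expert Advisory Board could stress-test its own judgements.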

    Q5. Critical Reflexivity on Your Analytical Choices [15%]

    Throughout this STEP0020 module, we have explored how analytical methods are not neutral tools but embody particular worldviews, values, and assumptions about what counts as knowledge and how it should be produced.

    Reflecting specifically on your exploratory data analysis in Part 1, reflect on your key methodological choices: which methods did you use for your analysis (e.g., visualisation types, statistical approaches, ways of grouping or comparing data); which variables did you focus on and which did you set aside; and how you defined and measured relationships between air quality and health.

    a. Share a reflection on what your choices privileged and what they obscured. You can consider:

    • What did your analytical approach emphasise or make visible? What might it have obscured or marginalised?
    • What assumptions were embedded in your choices (e.g., about causality, about what’s measurable, about relationships between variables)?
    • Whose perspectives were centred in your analysis? What voices or experiences were excluded?

    b. Consider whose knowledge matters.

    • Your analysis relied on aggregate city-level data. What might be missed or obscured when relying primarily on quantitative, aggregate data?
    • Given that the Attractive Cities network represents diverse cities with different contexts, capacities, and knowledge traditions, what additional approaches (drawing on methods from this course) might complement your quantitative analysis to incorporate broader perspectives?

    c. Reflect on your own positionality.

    • How might your own background, training, and assumptions have shaped what you chose to analyse and how you interpreted it?
    • If someone with a different background or perspective (e.g., a community health worker, an environmental justice advocate, a city official from a Global South city) were to analyse this data, what might they emphasise differently?

    Use specific examples from your Part 1 analysis to ground your reflections. Move beyond generic critiques to engage substantively with the actual analytical choices you made and their implications for a global network on air quality & health.

    Q6. Managing Deep Uncertainty in Policy Analysis [15%]

    In her role as thematic lead for the Attractive Cities network, the Mayor recognises that significant deep uncertainties affect the network’s collective approach to air quality and health policy:

    • Future air quality trends are uncertain (climate change impacts on pollution, changing mobility patterns post-pandemic, technological disruptions in transport and energy systems)
    • Health impacts are uncertain (emerging evidence on pollution exposure pathways, changing population health trends, new epidemiological understanding)
    • Political and economic contexts vary enormously across network cities and are themselves changing (policy commitment, resource availability, governance capacity)
    • Policy effectiveness is uncertain (what works in one city may not work in others, unexpected implementation challenges, behaviour change uncertainties)

    These uncertainties cannot easily be quantified probabilistically. The Mayor cannot assign reliable probabilities to different futures, nor can she rely on historical data to predict unprecedented changes.

    Advise the Mayor on using scenarios to manage these deep uncertainties:

    a. What are scenarios and why are they useful for this type of uncertainty?
    b. How could scenarios help her make more robust decisions for the network?
    c. What are key limitations or challenges of scenario-based approaches?

    Ground your advice in the Mayor’s specific context – leading a diverse global network of cities facing climate, health, and technological uncertainties. Explain how scenarios would work as a practical analytical tool, not just a conceptual framework.

    Attached Files (PDF/DOCX): Sample essay for ass2.pdf, Instructions Data analysis.docx, AMP class notes.docx, Instructions Writing Guide.docx

    Note: Content extraction from these files is restricted; please review them manually.

  • Problem Set 1

    Answer the problem set with R. Upload your code as a .R file. This file should include every line of code you wrote for the assignment. For full credit, you must leave thorough comments in the code explaining what you are doing. Thorough documentation skills are important for data scientists to have, and are something employers will look for! Write your code so that it can be easily understood by someone who reads it later.

    Attached Files (PDF/DOCX): ECON_4970_Problem_Set_1_S26.pdf

    Note: Content extraction from these files is restricted; please review them manually.

  • DATA ANALYTICS Discussion Responses (Classmate Feedback)

    **Must understand coding/RStudio and the data preparation phase of analytics. I am attaching my paper for reference if needed, as well as the data sets related to each student’s paper for review. I only need a response to each student’s paper. This is a peer review discussion: give positive feedback and offer suggestions. 2-3 full paragraphs per student should be plenty.

    Assignment:

    After completing the Data Understanding and Data Preparation phases for your analytic plan, post your milestone two draft to the Analytic Plan Peer Review topic in a new thread by Thursday of Module Three. (This part has already been completed).

    Then, select two of your peers’ drafts to review as follow-up discussion topic posts that you should submit by Sunday of Module Three in order to give yourself time to reflect upon the peer review experience in this module’s discussion. Select drafts that have not been reviewed or drafts with the fewest reviews.

    Attached Files (PDF/DOCX): Student 2.docx, Student 1.docx, DAT 690 Milestone Two.docx

    Note: Content extraction from these files is restricted; please review them manually.

  • Data visualization and analysis using Tableau

    Instructions

    Proof of Data Selection and Upload into Tableau (40 Points Total)

    In this assignment, you will demonstrate successful upload of your selected dataset into Tableau and begin to analyze the dataset’s structure and potential for visualization.

    Part 1: Upload and Screenshot (10 Points)

    Select a dataset that contains clear and measurable Key Performance Indicators (KPIs) suitable for analysis. Recommended sources include Data.gov and Kaggle, though you are welcome to use other reputable sources. Upload your chosen dataset into Tableau Desktop. Submit a screenshot clearly showing the data loaded into Tableau; this serves as proof of successful upload. If you encounter any issues during the upload (e.g., formatting problems, missing values), provide a brief written summary describing the problem and how you resolved it. Note: References are not required for this portion.

    Part 2: Data Visualization Planning (30 Points)

    In a well-written summary, respond to the following:

    • Visualization Techniques (15 Points): What types of visualizations will you use to explore and present your data (e.g., bar charts, scatter plots, line graphs, heat maps)? Justify your choices based on the type of data and intended audience. Include at least one scholarly or professional reference to support your use of visualization techniques.
    • Key Data Items (15 Points): Identify and describe the key data fields or variables in your dataset. Explain how these variables relate to your KPIs or business research question. Indicate any planned calculated fields, groupings, or filters you anticipate using in Tableau.

    Deliverables:

    • Screenshot of data uploaded into Tableau
    • Summary of upload issues (if applicable)
    • Written discussion of visualization techniques (with references) and key data items for analysis

    Rubric:

    • 10/10 Upload your selected data to Tableau (show screenshot)
    • 15/15 Discuss the visualization techniques you will use on this data
    • 15/15 Discuss the key data items in the data that you will use

    Attached Files (PDF/DOCX): Assignment Instructions data visualisation.docx

    Note: Content extraction from these files is restricted; please review them manually.

  • Configuring and securing a Linux Server

    Using VirtualBox, download and use the most updated version of Ubuntu to configure essential server services and implement basic security measures. The attached document states that either an Apache or Nginx server can be configured; I do not have a personal preference.
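    A minimal sketch of the kind of setup the project asks for, assuming the Nginx route is chosen (the Apache route would be analogous); exact package choices and security policies should follow the attached project document, not this fragment.

```shell
#!/usr/bin/env bash
# Sketch: basic web server setup and hardening on a fresh Ubuntu VM.
set -euo pipefail

# 1. Update the system and install the web server (Nginx chosen here)
sudo apt update && sudo apt upgrade -y
sudo apt install -y nginx

# 2. Enable the service now and on boot, then verify it is running
sudo systemctl enable --now nginx
systemctl status nginx --no-pager

# 3. Basic firewall: allow SSH and HTTP/HTTPS, deny other inbound traffic
sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow OpenSSH
sudo ufw allow 'Nginx Full'
sudo ufw --force enable

# 4. Harden SSH: disable direct root login, then reload the daemon
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo systemctl reload ssh

# 5. Automatic security updates
sudo apt install -y unattended-upgrades
```

    Screenshots of `systemctl status nginx` and `sudo ufw status verbose` are useful evidence for the submission template.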

    Attached Files (PDF/DOCX): Project 2 Submission Template.docx, Project 2 Configuring and Securing a Linux Server.docx

    Note: Content extraction from these files is restricted; please review them manually.

  • CAPSTONE Milestone 1 part 2 FLOWCHART Creation!

    **ABSOLUTELY must understand and know how to use RStudio. I did 2 pages and graphic add-ons just to make the pricing more fair, since there was not an option for creating a flowchart. I will provide as many helpful resources as I can, including my final project from a previous class that aligns with this capstone.

    Resources (if needed):

    Draw.io tutorial – https://drawio-app.com/tutorials/interactive-tutorials/

    Draw.io step by step guide – https://drawio-app.com/tutorials/step-by-step-guides/

    Draw.io – https://app.diagrams.net/

    Flowcharts on Draw.io – https://drawio-app.com/blog/flowcharts-in-draw-io-how-to-go-with-the-flow/

    Overview

    Now that the analytic plan is complete, it is time to map out the initial steps needed to create a clean, well-described analytic data set from which you will eventually build your final model.

    Prompt

    Using PowerPoint, Draw.io, Word, or a similar flowcharting tool, create a visual diagram of the steps to be taken for examining the source data. Indicate the source(s) of the data, quality checks, and data cleaning to be performed. Additionally, include written notes explaining the flowchart and capturing your data approach.

    If you have any questions after reading through the feedback on this assignment, reach out to your instructor. Remember that your instructor is a resource you should utilize throughout the course.

    Make sure to include the following critical elements in your flowchart:

    • Identify source for each data set
    • Identify any necessary steps for importing or converting data
    • Indicate steps for checking data quality, including missing or invalid data
    • Indicate steps for exploring distributions of numeric variables
    • Indicate steps for exploring levels of categorical variables
    • Indicate any steps in which alterations to the data may be performed
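    The quality-check steps in the list above will eventually become code (this capstone expects RStudio). As a hedged sketch of the same logic, here is a stdlib-Python version; the records and field names are invented stand-ins, not the actual source data.

```python
# Sketch of the quality checks the flowchart should capture.
# Records and field names are hypothetical stand-ins for the source data.
records = [
    {"id": 1, "income": 52000, "region": "North"},
    {"id": 2, "income": None,  "region": "South"},
    {"id": 3, "income": 61000, "region": "North"},
    {"id": 4, "income": -500,  "region": ""},  # invalid income, missing region
]

# Step: check for missing or invalid data
missing_income = [r["id"] for r in records if r["income"] is None]
invalid_income = [r["id"] for r in records
                  if r["income"] is not None and r["income"] < 0]
missing_region = [r["id"] for r in records if not r["region"]]

# Step: explore the distribution of a numeric variable
valid = [r["income"] for r in records
         if r["income"] is not None and r["income"] >= 0]
lo, hi = min(valid), max(valid)

# Step: explore the levels of a categorical variable
levels = sorted({r["region"] for r in records if r["region"]})

# Step: alteration -- drop rows that failed any check (one possible policy)
clean = [r for r in records
         if r["income"] is not None and r["income"] >= 0 and r["region"]]
```

    Each block above corresponds to one box in the flowchart, which makes the written notes easy to align with the diagram.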

    What to Submit

    Submit your flowchart in whichever flowcharting tool you prefer (e.g., PowerPoint, Draw.io, Word). Include the image of the flowchart along with the written notes as either a Microsoft Word document or a PDF.

    Attached Files (PDF/DOCX): DAT650 Final Project.docx, CREDIT RISK CASE.docx, GE Culture and Analytics.pdf, INSTRUCTOR NOTES FLOWCHART.docx

    Note: Content extraction from these files is restricted; please review them manually.

  • CAPSTONE – Data Analytics Milestone One

    **ABSOLUTELY must know how to use RStudio and understand data analytics. I have provided several resources along with my previous final project which relates to this Capstone project.

    Resources:

    Data Cleaning Techniques – https://www.upgrad.com/blog/data-cleaning-techniques/

    Types of Data – https://builtin.com/data-science/data-types-statistics

    Overview

    You are a data analyst, and your manager has assigned you a project to develop a predictive model that will support a business problem and be implemented into production. You are responsible for taking the project through the phases of the CRISP-DM methodology.

    Prompt

    In this milestone, you will write your project summary and analytic plan. The project summary will identify the business problem, state the research question being modeled, and discuss how the solution will help the business. The analytic plan will describe each CRISP-DM phase and the activities that will be performed for each step in the project. Note that your audience is your data analytic team and data analytic manager. Refer to the CRISP-DM graphic in this week’s Module Overview for clarification of the phases.

    If you have any questions after reading through the feedback on this milestone, reach out to your instructor. Remember that your instructor is a resource you should utilize throughout the course.

    While you may reflect on your prior coursework, your submission must consist only of DAT 690 coursework to avoid self-plagiarism. Make sure to include the following critical elements in your paper:

    • Describe the CRISP-DM Business Understanding Phase: Identify the business problem
    • Describe the CRISP-DM Business Understanding Phase: State the research question
    • Describe the CRISP-DM Business Understanding Phase: Discuss how the solution will help the business
    • Describe the CRISP-DM Data Understanding Phase: describe, explore, and verify the data
    • Describe the CRISP-DM Data Preparation Phase: select, clean, construct, and integrate the data
    • Describe the CRISP-DM Modeling Phase: select, generate, build, and assess the model
    • Describe the CRISP-DM Evaluation Phase: evaluate the results, review the process, and determine next steps
    • Describe the CRISP-DM Deployment Phase: how the model will work in production
    • Clear Communication: Submission has no major errors related to citations, grammar, spelling, syntax, or organization

    What to Submit

    Your paper must be submitted as a two- to three-page Microsoft Word document with double spacing, 12-point Times New Roman font, and one-inch margins. Be sure to cite any sources in APA format.

  • Touchstone 6

    A. Analysis of TechGear Inc.

    Step 1: Read the Scenario

    SCENARIO: As a data analyst at TechGear Inc., a company specializing in electronic gadgets and accessories, your task is to analyze historical sales data, build predictive models, and use prescriptive analytical methods to provide actionable insights for improving decision-making. The company has been experiencing fluctuating sales and aims to optimize its marketing strategies and production processes to maximize profits and enhance customer satisfaction. Your analysis will help TechGear Inc. understand the factors influencing its sales, forecast future sales trends, assess financial risks associated with different business scenarios, and determine the optimal allocation of its marketing budget and production resources. Ultimately, your work will enable the company to make data-driven decisions, enhancing its sales and marketing strategies, and leading to improved profitability and customer satisfaction.

    Step 2: Look Over the Data

    • Questions 1-5 (Linear Regression) and 7 (Machine Learning): Use the data in the techgear_sales_data.xlsx Excel file, which can be found at the following GitHub link:
    • Question 6 (Forecasting): Use the data in the techgear_sales_data_monthly.xlsx Excel file, which is available at this GitHub link:
    • This file contains the same data as techgear_sales_data.xlsx, but the last row only includes a date with missing values for all other columns. These missing values are intended for you to apply forecasting methods for the upcoming time period.
    • Questions 8 and 9: Since Question 8 focuses on Monte Carlo simulations and Question 9 focuses on linear programming, all necessary data is provided in the problem statement.

    This dataset contains monthly sales and advertising spend data for TechGear from January 2020 to December 2024. It includes the following columns:

    • Date: The month and year for each data entry (MM/DD/YYYY)
    • Sales: The total sales generated in that month (number of sales)
    • Ad_Spend_Facebook: The amount of money spent on Facebook advertising in that month (dollars)
    • Ad_Spend_Instagram: The amount of money spent on Instagram advertising in that month (dollars)
    • Discount_Rate: The discount rate applied to sales in that month (percentage)

    A snapshot of the first few rows of the dataset is provided below:

    Step 3: Read TechGear Inc. Questions

    Question 1: Exploring Data Structures and Averages in Advertising Spend and Discounts

    Before conducting an analysis, use Python to create a pandas DataFrame named sales from the dataset.

    • What key features of the dataset can you summarize, such as the number of rows and columns?
    • What is the average amount spent on advertising for each social media platform (Facebook and Instagram)?
    • What is the average discount provided to customers?
    • What insights can you draw from this summary regarding advertising spend and discount trends?
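    The exploration above can be sketched with pandas. This is a minimal illustration, not the graded solution: the sample rows below are hypothetical stand-ins so the snippet runs on its own; in practice you would load the provided workbook with pd.read_excel("techgear_sales_data.xlsx").

```python
import pandas as pd

# In practice: sales = pd.read_excel("techgear_sales_data.xlsx")
# A tiny hypothetical sample stands in here so the sketch is self-contained.
sales = pd.DataFrame({
    "Date": pd.to_datetime(["2020-01-01", "2020-02-01", "2020-03-01"]),
    "Sales": [5200, 4800, 6100],
    "Ad_Spend_Facebook": [1500.0, 1200.0, 1800.0],
    "Ad_Spend_Instagram": [900.0, 1100.0, 1000.0],
    "Discount_Rate": [0.10, 0.05, 0.15],
})

n_rows, n_cols = sales.shape                  # dataset dimensions
avg_fb = sales["Ad_Spend_Facebook"].mean()    # average Facebook spend
avg_ig = sales["Ad_Spend_Instagram"].mean()   # average Instagram spend
avg_discount = sales["Discount_Rate"].mean()  # average discount rate

sales.info()               # column types and non-null counts
print(sales.describe())    # summary statistics for numeric columns
print(f"Rows: {n_rows}, Columns: {n_cols}")
print(f"Avg Facebook spend: ${avg_fb:,.2f}, Avg Instagram spend: ${avg_ig:,.2f}")
print(f"Avg discount rate: {avg_discount:.1%}")
```

Comparing the two platform averages side by side is usually the quickest first insight: it shows where the budget currently leans before any modeling is done.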

    Question 2: Visualizing Relationships

    • How can you visualize the relationships between sales and each advertising spend variable (Facebook and Instagram) as well as discount rates?
    • What types of plots (e.g., scatter plots, line plots, or histograms) would be most effective in identifying patterns or correlations between these variables?
    • What do these visualizations reveal about the impact of advertising spend and discount rates on sales?
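    One reasonable approach, sketched with matplotlib on hypothetical sample values: scatter plots put one month per point, so linear trends between each driver and sales are visible directly, and a correlation table gives them a numeric companion.

```python
import matplotlib
matplotlib.use("Agg")   # render off-screen; call plt.show() when working interactively
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for the loaded `sales` DataFrame.
sales = pd.DataFrame({
    "Sales": [5200, 4800, 6100, 5500],
    "Ad_Spend_Facebook": [1500, 1200, 1800, 1600],
    "Ad_Spend_Instagram": [900, 1100, 1000, 1200],
    "Discount_Rate": [0.10, 0.05, 0.15, 0.08],
})

# One scatter plot per candidate driver of sales.
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, col in zip(axes, ["Ad_Spend_Facebook", "Ad_Spend_Instagram", "Discount_Rate"]):
    ax.scatter(sales[col], sales["Sales"])
    ax.set_xlabel(col)
    ax.set_ylabel("Sales")
    ax.set_title(f"Sales vs. {col}")
fig.tight_layout()
fig.savefig("sales_relationships.png")

# Pairwise correlations with Sales summarize the plots numerically.
print(sales.corr()["Sales"])
```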

    Question 3: Simple Linear Regression

    TechGear wants to optimize its marketing strategy.

    • How can you develop a simple linear regression model in Python to predict sales based on Facebook ad spend?
    • What do the coefficients of the model indicate?
    • Specifically, how does the slope describe the relationship between Facebook ad spend and sales?
    • What does the R² value tell you about how well the model explains the variability in sales?
    • How does the regression output from Python support your interpretation of the model's performance?
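    A minimal fit with scikit-learn on hypothetical numbers shows where the slope, intercept, and R² come from; the values here are invented purely to make the sketch runnable.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical monthly observations standing in for the real columns.
fb_spend = np.array([1200, 1400, 1500, 1700, 1800, 2000]).reshape(-1, 1)
sales = np.array([4700, 5100, 5300, 5800, 5900, 6400])

model = LinearRegression().fit(fb_spend, sales)

slope = model.coef_[0]        # extra sales expected per extra dollar of FB spend
intercept = model.intercept_  # baseline sales at zero FB spend (an extrapolation)
r2 = model.score(fb_spend, sales)  # share of sales variance the line explains

print(f"Sales ≈ {intercept:.1f} + {slope:.3f} × Ad_Spend_Facebook, R² = {r2:.3f}")
```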

    Question 4: Assessing the Fit of the Simple Linear Regression Model

    • How can you evaluate the performance of your simple linear regression model by analyzing residuals?
    • What insights do residual plots provide about the model's accuracy?
    • Do they suggest any patterns, heteroscedasticity, or violations of linear regression assumptions?
    • How might these findings impact the reliability of the model's predictions?
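    A residual plot can be produced in a few lines; the data below are hypothetical stand-ins. The shape of the cloud is the diagnostic: a patternless band around zero is healthy, a funnel suggests heteroscedasticity, and a curve suggests a missed nonlinearity.

```python
import matplotlib
matplotlib.use("Agg")   # off-screen rendering
import matplotlib.pyplot as plt
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical data standing in for Facebook spend and sales.
fb_spend = np.array([1200, 1400, 1500, 1700, 1800, 2000]).reshape(-1, 1)
sales = np.array([4700, 5100, 5300, 5800, 5900, 6400])

model = LinearRegression().fit(fb_spend, sales)
fitted = model.predict(fb_spend)
residuals = sales - fitted   # what the fitted line fails to explain

fig, ax = plt.subplots()
ax.scatter(fitted, residuals)
ax.axhline(0, linestyle="--")
ax.set_xlabel("Fitted sales")
ax.set_ylabel("Residual")
ax.set_title("Residuals vs. fitted values")
fig.savefig("residual_plot.png")

print(f"Mean residual: {residuals.mean():.2e} (≈ 0 by construction for OLS)")
```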

    Question 5: Multiple Linear Regression Model

    The simple linear regression model provides insights into Facebook ad spend.

    • How can you develop a multiple linear regression model to predict monthly sales using Facebook ad spend, Instagram ad spend, and discount rates?
    • How do the coefficients of this model compare to the simple linear regression model? What do they reveal about the combined influence of these factors on sales?
    • Which model performs better in predicting sales?
    • How can you compare the effectiveness using statistical metrics (such as R² and RMSE)?
    • Based on this comparison, what recommendations can you provide to TechGear for optimizing its advertising strategy?
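    The model comparison can be sketched as follows, again on hypothetical months. For nested OLS models, in-sample R² can only rise when predictors are added, which is why out-of-sample checks (Question 7) matter for the final recommendation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Hypothetical months; columns: FB spend, IG spend, discount rate.
X = np.array([
    [1200,  900, 0.10],
    [1400, 1100, 0.05],
    [1500, 1000, 0.15],
    [1700, 1200, 0.08],
    [1800,  950, 0.12],
    [2000, 1300, 0.06],
])
y = np.array([4700, 5200, 5300, 5900, 5800, 6500])

simple = LinearRegression().fit(X[:, [0]], y)   # FB spend only
multiple = LinearRegression().fit(X, y)         # all three predictors

for name, model, feats in [("simple", simple, X[:, [0]]),
                           ("multiple", multiple, X)]:
    pred = model.predict(feats)
    rmse = mean_squared_error(y, pred) ** 0.5
    print(f"{name}: R² = {model.score(feats, y):.3f}, in-sample RMSE = {rmse:.1f}")

print("Multiple-model coefficients (FB, IG, discount):", multiple.coef_)
```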

    Question 6: Forecasting

    Using historical sales data, how can you construct:

    • A 3-month moving average forecast for January 2025?
    • An exponential smoothing forecast with a smoothing parameter of 0.80 for January 2025?

    Given TechGear's preference for emphasizing recent sales trends:

    • Which forecasting method provides the most reliable prediction for January 2025?
    • What key differences exist between the two forecasting methods, and what do they imply for forecasting accuracy?

    Based on your analysis, consider:

    • What actionable recommendations can you provide to TechGear to improve its marketing strategies and production planning?
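    Both forecasts reduce to a few lines of arithmetic; the monthly values below are hypothetical stand-ins for the tail of the real series. Note how α = 0.80 makes exponential smoothing weight the latest month heavily, which is the property that matches TechGear's stated preference for recent trends.

```python
import numpy as np

# Hypothetical last six months of 2024 sales; the real file ends December 2024.
sales = np.array([5600, 5900, 6200, 5800, 6100, 6400])

# 3-month moving average: the mean of the last three observations.
ma_forecast = sales[-3:].mean()

# Exponential smoothing with alpha = 0.80: each new observation gets 80% of the
# weight, so the smoothed level tracks recent months closely.
alpha = 0.80
level = sales[0]            # initialize the level at the first observation
for y in sales[1:]:
    level = alpha * y + (1 - alpha) * level
es_forecast = level         # one-step-ahead forecast for January 2025

print(f"3-month moving average forecast: {ma_forecast:.1f}")
print(f"Exponential smoothing (α=0.80) forecast: {es_forecast:.1f}")
```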

    Question 7: Machine Learning

    TechGear needs a reliable model to predict future sales.

    • How can you build and compare different predictive models to achieve this?
    • How can you develop a multiple linear regression model using 5-fold cross-validation to predict future sales?
    • How can you develop a decision tree model using 5-fold cross-validation to predict future sales?
    • How do the two models compare in terms of RMSE, and which model should TechGear choose?

    TechGear requires a minimum of $6,500 in sales each month to remain profitable.

    • If the best model predicts sales of $4,200, how can the RMSE value be used to determine the range within which actual sales may fall?
    • What are the implications of this for decision-making and risk assessment?
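    The cross-validated comparison can be sketched as below. The data are synthetic stand-ins generated for the example; only the workflow (5-fold CV, RMSE comparison, prediction ± RMSE band) carries over to the real dataset.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)

# 60 hypothetical months standing in for the real dataset.
n = 60
X = np.column_stack([
    rng.uniform(1000, 2000, n),   # FB spend
    rng.uniform(800, 1400, n),    # IG spend
    rng.uniform(0.0, 0.2, n),     # discount rate
])
y = 1000 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(0, 300, n)

models = {
    "linear regression": LinearRegression(),
    "decision tree": DecisionTreeRegressor(random_state=0),
}
rmse = {}
for name, model in models.items():
    # cross_val_score maximizes, so sklearn exposes RMSE as a negated score.
    scores = cross_val_score(model, X, y, cv=5,
                             scoring="neg_root_mean_squared_error")
    rmse[name] = -scores.mean()
    print(f"{name}: 5-fold CV RMSE = {rmse[name]:.1f}")

best = min(rmse, key=rmse.get)
print(f"Lower RMSE → choose the {best} model.")

# A point prediction ± RMSE gives a rough band for where actual sales may fall,
# which can then be compared with the $6,500 profitability threshold.
prediction = 4200
print(f"Predicted {prediction}, plausible range ≈ "
      f"[{prediction - rmse[best]:.0f}, {prediction + rmse[best]:.0f}]")
```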

    Question 8: Monte Carlo Simulations

    TechGear has experienced significant fluctuations in sales, making accurate predictions challenging.

    • How can you use Monte Carlo simulations to estimate future sales?
    • How can you estimate the average and median monthly sales by running 1,000 simulations?
    • What visuals (e.g., histograms or box plots) can you generate to summarize the results?
    • If daily sales are assumed to follow a uniform distribution between the minimum and maximum observed sales over the past 60 months, how does this impact the simulation results? You can assume that the value for minimum sales observed over 60 months is 2,299 and the maximum value is 7,702.
    • How can you interpret the standard deviation of simulated sales, and what does it reveal about TechGear's sales variability?
    • How can TechGear use these insights to improve budgeting, sales forecasting, and operational decision-making?
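    Using the bounds given in the question (minimum 2,299, maximum 7,702), a uniform Monte Carlo simulation is a short NumPy exercise. For Uniform(a, b) the theoretical mean is (a+b)/2 ≈ 5,000.5 and the standard deviation is (b−a)/√12 ≈ 1,560, so the simulated spread directly mirrors TechGear's month-to-month volatility.

```python
import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)

# Bounds from the problem: min and max monthly sales over the past 60 months.
low, high = 2299, 7702
n_sims = 1000

# Each run draws one month of sales from the assumed uniform distribution.
simulated = rng.uniform(low, high, n_sims)

print(f"Average simulated sales:  {simulated.mean():.0f}")
print(f"Median simulated sales:   {np.median(simulated):.0f}")
print(f"Std. dev. of simulations: {simulated.std(ddof=1):.0f}")

# A histogram summarizes the distribution of simulated outcomes.
plt.hist(simulated, bins=30)
plt.xlabel("Simulated monthly sales")
plt.ylabel("Frequency")
plt.title("Monte Carlo simulation of monthly sales (1,000 runs)")
plt.savefig("mc_sales_hist.png")
```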

    Question 9: Linear Programming

    TechGear wants to optimize its advertising spend across Facebook and Instagram to maximize its monthly sales. They have a fixed advertising budget and need to determine the optimal allocation of this budget to achieve the highest possible sales. The sales generated from advertising on each platform are influenced by the amount spent on that platform.

    TechGear has a monthly advertising budget of $10,000. The estimated sales generated from advertising on Facebook and Instagram are given by the following linear equations:

    • Sales from Facebook advertising: where F is the amount spent on Facebook advertising
    • Sales from Instagram advertising: where I is the amount spent on Instagram advertising

    TechGear must spend at least $2,000 on Facebook advertising to maintain its presence on the platform. Additionally, it must spend a minimum of $1,000 and no more than $7,000 on Instagram advertising due to platform-specific constraints. The amount spent on Instagram advertising must be at least 50% of the amount spent on Facebook advertising to ensure balanced marketing efforts.

    • What is the optimal budget allocation for Facebook and Instagram, and what is the maximum sales revenue TechGear can achieve under these conditions?
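    The constraint set above translates directly into scipy.optimize.linprog. The per-dollar sales rates come from the assignment's equations, which are not reproduced here, so c_f and c_i below are placeholder values chosen only to make the sketch runnable; substitute the coefficients from the problem statement.

```python
from scipy.optimize import linprog

# Placeholder per-dollar sales rates (hypothetical; use the assignment's values).
c_f, c_i = 2.5, 3.0

# linprog minimizes, so negate the objective to maximize c_f*F + c_i*I.
c = [-c_f, -c_i]

# Inequalities in A_ub @ x <= b_ub form, with x = [F, I]:
A_ub = [
    [1, 1],      # F + I <= 10000        (total budget)
    [0.5, -1],   # I >= 0.5*F  rewritten as  0.5*F - I <= 0
]
b_ub = [10000, 0]
bounds = [(2000, None),   # F >= 2000
          (1000, 7000)]   # 1000 <= I <= 7000

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
F_opt, I_opt = res.x
print(f"Optimal: Facebook = ${F_opt:,.0f}, Instagram = ${I_opt:,.0f}")
print(f"Maximum sales = {-res.fun:,.0f}")
```

With these placeholder rates the solver spends the full budget, caps Instagram at its $7,000 ceiling, and puts the remaining $3,000 on Facebook; the real optimum depends on the actual coefficients.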

    Step 4: Using the PowerPoint Template, Analyze Data for TechGear Inc.

    • Your task is to analyze historical sales data for TechGear Inc. using various analytical techniques.
    • You'll apply concepts from linear regression, forecasting, machine learning, and prescriptive analytics.
    • The goal is to provide actionable insights to help TechGear make data-driven decisions.
    • Include Python code snippets in your slides for data exploration, regression models, forecasting, machine learning, Monte Carlo simulation, and linear programming tasks.
    • Your Python code should be accurate and well-documented to demonstrate how each analysis step was performed.
    • Your findings will be presented in a PowerPoint presentation, with speaker notes explaining your approach and insights.

    Review each question and then follow the directions outlined on each slide to summarize and present your findings for each question.

    Step 5: Review the Grading Rubric to Ensure All Criteria are Met

    Review the rubric to ensure that you understand how you will be evaluated. Also review the requirements to ensure that your Touchstone is complete.

    Step 6: Submit Your Touchstone

    Submit your completed Touchstone (as a .pptx file) using the blue button at the top of this page.

    B. Rubric

    Each criterion is scored at five levels: Advanced (100%), Proficient (85%), Acceptable (75%), Needs Improvement (50%), and Non-Performance (0%).

    Python Analysis (Shown at Key Steps) (5%)

    The inclusion of well-documented, accurate Python code for data exploration, regression models, forecasting, machine learning, Monte Carlo simulation, and linear programming.

    • Advanced (100%): Python code is shown for all major steps, including data exploration, visualization, regression models, forecasting, machine learning, Monte Carlo simulation, and linear programming. Code is well-documented and accurate.
    • Proficient (85%): Python code is shown for most key steps. Minor issues with code documentation or accuracy.
    • Acceptable (75%): Python code is shown for some steps, but critical components are missing or incomplete.
    • Needs Improvement (50%): Python code is partially shown but lacks key analyses or is significantly incorrect.
    • Non-Performance (0%): No Python code is provided.

    Data Exploration and Summary (Slide 2) (10%)

    Clear summary of data structure, accurate calculation of averages, and key insights from data exploration. Python analysis is included and well-integrated.

    • Advanced (100%): There is a comprehensive summary of data structure with accurate calculation of averages and clear insights from the exploration. Python analysis is included and well-integrated.
    • Proficient (85%): Data summary is mostly accurate, with minor errors or missing insights. Python analysis is included.
    • Acceptable (75%): Basic summary provided, but some key features are missing or inaccurate. Python analysis is incomplete.
    • Needs Improvement (50%): Minimal data exploration with several inaccuracies and no significant insights. Python analysis is missing or incorrect.
    • Non-Performance (0%): No data exploration is provided.

    Visualizing Relationships (Slide 3) (10%)

    Accurate and clear visualizations showing relationships between sales, ad spend, and discount rate. Proper interpretation of patterns and correlations.

    • Advanced (100%): Clear and accurate visualizations for all specified variables with detailed insights into patterns and correlations. Python-generated plots are used.
    • Proficient (85%): Visualizations are mostly accurate and provide useful insights. Minor errors in interpretation or plot generation.
    • Acceptable (75%): Basic visualizations are provided, but significant patterns or correlations are overlooked. Python plots are incomplete.
    • Needs Improvement (50%): Visualizations are unclear or inaccurate with limited analysis. Missing Python plots.
    • Non-Performance (0%): No visualizations are provided.

    Simple Linear Regression & Model Fit (Slides 4 & 5) (10%)

    Well-implemented regression model with correct interpretation of coefficients and R² value. Assessment of model fit through residual analysis.

    • Advanced (100%): Accurate regression model with clear interpretation of coefficients and R² value. Residual plots are well-explained, and the fit is thoroughly assessed. Python output included.
    • Proficient (85%): Regression model and assessment are mostly accurate, with minor errors or incomplete explanations.
    • Acceptable (75%): Basic model output provided, but interpretations and model fit assessments are incomplete or contain errors.
    • Needs Improvement (50%): Model is poorly developed, with incorrect interpretations and no reliable assessment of fit.
    • Non-Performance (0%): No regression model or assessment is provided.

    Multiple Linear Regression (Slide 6) (10%)

    Complete multiple regression analysis, including variable interpretation and comparison to simple regression. Python output included.

    • Advanced (100%): Complete and accurate multiple linear regression analysis, with well-explained coefficients and comparison to the simple linear regression model. Python output included.
    • Proficient (85%): Multiple regression analysis is mostly accurate, with minor errors or incomplete comparisons.
    • Acceptable (75%): Basic multiple regression is provided, but interpretations and comparisons are incomplete or partially inaccurate.
    • Needs Improvement (50%): Incomplete or incorrect multiple regression model with minimal explanation.
    • Non-Performance (0%): No multiple regression model is provided.

    Forecasting (Slide 7) (10%)

    Implementation of both forecasting methods, clear comparison, and justified selection of the best method based on business needs.

    • Advanced (100%): Both forecasting methods are accurately implemented and compared. The recommendation is well-justified and aligned with TechGear's preferences. Python output included.
    • Proficient (85%): Forecasting analysis is mostly accurate, with minor errors or incomplete justification of the chosen method.
    • Acceptable (75%): Basic forecasting analysis is provided, but one method may be missing, or justification is unclear.
    • Needs Improvement (50%): Minimal forecasting analysis with significant errors and no clear recommendation.
    • Non-Performance (0%): No forecasting analysis is provided.

    Machine Learning Models (Slide 8) (10%)

    Accurate implementation of multiple regression and decision tree models with RMSE comparison and well-supported model selection.

    • Advanced (100%): Both models are accurately built and compared using RMSE. Clear model recommendation with actionable insights. Python output included.
    • Proficient (85%): Machine learning analysis is mostly accurate, with minor errors in the comparison or recommendation.
    • Acceptable (75%): Basic models are provided, but the comparison and recommendation are incomplete or unclear.
    • Needs Improvement (50%): Models are incomplete or contain major errors. Limited or no comparison is provided.
    • Non-Performance (0%): No machine learning analysis is provided.

    Monte Carlo Simulations (Slide 9) (10%)

    Simulation correctly executed with proper assumptions, visualizations, and interpretation of results. Actionable insights are provided.

    • Advanced (100%): Simulation is well-executed with clear visualizations and interpretation of results. Actionable insights are provided. Python output included.
    • Proficient (85%): Simulation is mostly accurate, with minor errors or incomplete insights.
    • Acceptable (75%): Basic simulation is provided, but interpretation is incomplete or unclear.
    • Needs Improvement (50%): Simulation is incomplete or incorrect with minimal explanation.
    • Non-Performance (0%): No simulation is provided.

    Linear Programming (Slide 10) (10%)

    Accurate optimization model that meets constraints and clearly explains the best budget allocation for maximum sales.

    • Advanced (100%): Linear programming solution is accurate and fully meets all constraints. Clear explanation of the optimal budget allocation and maximum achievable sales. Python output included.
    • Proficient (85%): Solution is mostly accurate, with minor errors in constraints or explanation.
    • Acceptable (75%): Basic linear programming solution is provided but contains errors or incomplete explanations.
    • Needs Improvement (50%): Incomplete or incorrect solution with minimal explanation.
    • Non-Performance (0%): No linear programming solution is provided.

    Presentation Quality & Speaker Notes (15%)

    Well-organized slides with readable formatting and professional layout. Speaker notes effectively explain analysis and insights.

    • Advanced (100%): Slides are visually appealing and well-organized, with clear speaker notes that thoroughly explain the analysis and findings.
    • Proficient (85%): Slides are mostly clear and organized. Speaker notes are informative but may lack detail.
    • Acceptable (75%): Basic slides with limited visual appeal. Speaker notes are incomplete or too brief.
    • Needs Improvement (50%): Poorly organized slides with missing or unclear speaker notes.
    • Non-Performance (0%): No presentation or speaker notes provided.

    C. Requirements

    The following requirements must be met for your submission:

    • Hand in a .pptx file with slides listed above.
    • Use a readable 11- or 12-point font.
    • All writing must be appropriate for an academic context. Follow academic writing conventions (correct grammar, spelling, punctuation, and formatting).
    • Plagiarism of any kind is strictly prohibited.
    • Submission must include your name and the date (included in the template).

    This assignment provides practical experience in business analytics, honing skills essential for data-driven decision-making in business environments. Your analysis and recommendations will help TechGear optimize its operations.

    Good luck, and enjoy uncovering insights for TechGear!