Category: Statistics

  • Binomial Probability Distribution, Normal Curves, and Sampli…

    MODULE 4: THE BINOMIAL PROBABILITY DISTRIBUTION, NORMAL CURVES, AND SAMPLING DISTRIBUTIONS ASSIGNMENT

    Questions are taken directly from Brase, Brase, Dolor, and Seibert, Chapters 5 and 6, pages 225 and 301, respectively. In case an eBook page number differs, the questions are listed below.

    Chapter 5:

    1. Discuss what we mean by a binomial experiment. As you can see, a binomial process or binomial experiment involves a lot of assumptions! For example, all the trials are supposed to be independent and repeated under identical conditions. Is this always true? Can we always be completely certain that the probability of success does not change from one trial to the next? In the real world, there is almost nothing we can be absolutely sure about, so the theoretical assumptions of the binomial probability distribution often will not be completely satisfied. Does that mean we cannot use the binomial distribution to solve practical problems? Looking at this chapter, the answer seems to be that we can indeed use the binomial distribution even if not all the assumptions are exactly met. We find in practice that the conclusions are sufficiently accurate for our intended application. List three applications of the binomial distribution for which you think, although some of the assumptions are not exactly met, there is still adequate reason to apply the binomial distribution.
    2. Why do we need to learn the formula for the binomial probability distribution? Using the formula repeatedly can be very tedious. To cut down on tedious calculations, most people will use a binomial table such as the one found in Appendix II of this book.
       a. However, there are many applications for which a table in the back of any book is not adequate. For instance, compute P(r = 3) where n = 5 and p = 0.735. Can you find the result in the table? Do the calculation by using the formula. List some other situations in which a table might not be adequate to solve a particular binomial distribution problem.
       b. The formula itself also has limitations. For instance, consider the difficulty of computing P(r ≥ 285) where n = 500 and p = 0.6. What are some of the difficulties you run into? Consider the calculation of P(r = 285). You will be raising 0.6 and 0.4 to very high powers; this will give you very, very small numbers. Then, you need to compute C(500, 285), which is a very, very large number. When combining extremely large and extremely small numbers in the same calculation, much of the accuracy is lost unless you carry a huge number of significant digits. If this isn't tedious enough, consider the steps you need to compute P(r ≥ 285) = P(r = 285) + P(r = 286) + … + P(r = 500). Does it seem clear that we need a better way to estimate P(r ≥ 285)? In Chapter 6, you will learn a much better way to estimate binomial probabilities when the number of trials is large.
    3. In Chapter 3, we learned about means and standard deviations. In Section 5.1, we learned that probability distributions also can have a mean and standard deviation. Discuss what is meant by the expected value and standard deviation of a binomial distribution. How does this relate back to the material we learned in Chapter 3 and Section 5.1?
    4. In Chapter 2, we looked at the shapes of distributions. Review the concepts of skewness and symmetry, then categorize the following distributions as to skewness or symmetry:
       a. A binomial distribution with n = 11 trials and p = 0.50.
       b. A binomial distribution with n = 11 trials and p = 0.10.
       c. A binomial distribution with n = 11 trials and p = 0.90.
       In general, does it seem true that binomial probability distributions in which the probability of success is close to 0 are skewed right, whereas those with probability of success close to 1 are skewed left?

    Chapter 6:

    1. If you look up the word "empirical" in a dictionary, you will find that it means relying on experiment and observation rather than on theory. Discuss the empirical rule in this context. The empirical rule certainly applies to the normal distribution, but does it also apply to a wide variety of other distributions that are not exactly (theoretically) normal? Discuss the terms mound-shaped and symmetric. Draw several sketches of distributions that are mound-shaped and symmetric. Draw sketches of distributions that are not mound-shaped or symmetric. To which distributions will the empirical rule apply?
    2. Why are standard z values so important? Is it true that z values have no units of measurement? Why would this be desirable for comparing data sets with different units of measurement? How can we assess differences in quality or performance by simply comparing z values under a standard normal curve? Examine the formula for computing standard z values. Notice that it involves both the mean and the standard deviation. Recall that in Chapter 3, we commented that the mean of a data collection is not entirely adequate to describe the data; you need the standard deviation as well. Discuss this topic again in light of what you now know about normal distributions and standard z values.
    3. Most companies that manufacture a product have a division responsible for quality control or quality assurance. The purpose of the quality-control division is to make reasonably certain that the products manufactured are up to company standards. Write a brief essay in which you describe how the statistics you have learned so far could be applied to an industrial application (such as control charts and the Antlers Lodge example).
    4. Most people would agree that increased information should give better predictions. Discuss how sampling distributions actually enable better predictions by providing more information. Examine Theorem 6.1 again. Suppose that x is a random variable with a normal distribution. Then x̄, the sample mean based on random samples of size n, also will have a normal distribution for any value of n = 1, 2, 3, … What happens to the standard deviation of the x̄ distribution as n (the sample size) increases? Consider the following table for different values of n. Due to space, see the book to complete this question.
    5. In a way, the central limit theorem can be thought of as a kind of grand central station. It is a connecting hub or center for a great deal of statistical work. We will use it extensively in Chapters 7, 8, and 9. Put in a very elementary way, the central limit theorem states that as the sample size n increases, the distribution of the sample mean x̄ will always approach a normal distribution, no matter where the original x variable came from. For most people, it is the complete generality of the central limit theorem that is so awe-inspiring: It applies to practically everything. List and discuss at least three variables from everyday life for which you expect the variable x itself not to follow a normal or bell-shaped distribution. Then, discuss what would happen to the sampling distribution of x̄ if the sample size were increased. Sketch diagrams of the x̄ distributions as the sample size n increases.

    General Instructions

    As doctoral students, your assignments are expected to follow the principles of high-quality scientific standards and promote knowledge and understanding in the field of criminal justice. You should apply a rigorous and critical assessment of a body of theory and empirical research, articulating what is known about the phenomenon and ways to advance research about the topic under review. Research syntheses should identify significant variables, a systematic and reproducible search strategy, and a clear framework for studies included in the larger analysis. Assignments may be written in first person (I). All assignments should be clearly and concisely written, with technical material set off. Please do not use jargon, slang, idioms, colloquialisms, or bureaucratese. Use acronyms sparingly and spell them out the first time you use them. Please do not construct acronyms from phrases you repeat frequently in the text.

    Structure of Assignment Paper

    For purposes of this assignment, there is no layout structure required as far as the setup of this paper, with one exception. I would appreciate it if you used separate headers for question 1 and question 2. Sub-headers are also allowed but not required. Each answer should be no less than 250 words, with no maximum limit. I expect all papers to be in the latest APA edition, properly cited, and all tables attached. Note: Your assignment will be checked for originality via the Turnitin plagiarism tool.
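
    For Chapter 5, question 2, the calculations can be sketched in Python. This is a minimal illustration and not part of the assignment: the function names are made up for the example, the log-space form is one common way around the very large and very small factors, and the normal approximation with a continuity correction is assumed here to be the kind of Chapter 6 method the question alludes to.

```python
import math
from statistics import NormalDist


def binom_pmf(r, n, p):
    """Binomial formula: P(r) = C(n, r) * p^r * (1 - p)^(n - r)."""
    return math.comb(n, r) * p**r * (1 - p) ** (n - r)


def binom_pmf_log(r, n, p):
    """Same probability, computed with logarithms so that the huge
    C(n, r) and the tiny powers never appear as separate numbers."""
    log_comb = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return math.exp(log_comb + r * math.log(p) + (n - r) * math.log(1 - p))


# Question 2a: n = 5, p = 0.735 is unlikely to be in a table, but the formula works.
p_r3 = binom_pmf(3, 5, 0.735)

# Question 2b: P(r >= 285) for n = 500, p = 0.6 is a sum of 216 terms.
n, p = 500, 0.6
tail_exact = sum(binom_pmf_log(r, n, p) for r in range(285, n + 1))

# Normal approximation with continuity correction (assumed Chapter 6 approach).
mu = n * p
sigma = math.sqrt(n * p * (1 - p))
tail_normal = 1 - NormalDist(mu, sigma).cdf(284.5)

print(f"P(r = 3)            = {p_r3:.4f}")
print(f"P(r >= 285), exact  = {tail_exact:.4f}")
print(f"P(r >= 285), normal = {tail_normal:.4f}")
```

    Comparing the last two printed values gives a feel for how close the approximation is when the number of trials is large.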

    Attached Files (PDF/DOCX): The Binomial Probability Distribution Normal Curves.docx

    Note: Content extraction from these files is restricted, please review them manually.

  • Statistics Question

    Attached.

    Requirements:

  • Statistics Question

    The purpose of this assignment is to:

    • Create and interpret a scatterplot
    • Calculate and interpret the line of best fit (linear regression)
    • Evaluate the relationship between two variables

    Requirements: however needed
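
    As a rough illustration of the line-of-best-fit step, the least-squares slope, intercept, and correlation can be computed directly from their definitions. The five data points below are hypothetical and stand in for whatever data the attached assignment supplies; drawing the scatterplot itself would need a plotting tool and is not shown.

```python
import math

# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means.
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)

slope = s_xy / s_xx                  # change in y per one-unit change in x
intercept = mean_y - slope * mean_x  # predicted y when x = 0
r = s_xy / math.sqrt(s_xx * s_yy)    # strength and direction of the linear relationship

print(f"y-hat = {intercept:.2f} + {slope:.2f} x,  r = {r:.3f}")
```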

  • Statistics Question

    Requirements: follow instructions

  • Project 5

    Please submit your assignment on Canvas.

    Eastern vs. Western States Spending on K-12 Education in 2014

    Using the data below from the US Census Bureau in 2014, compare how much the states East of the Mississippi spend per year on their K-12 students with how much states West of the Mississippi spend on their K-12 students. Make sure to create a numerical display (make a picture!), describe each distribution (in terms of shape, center and spread) and answer the question: who spends more?

    Data in .

    Eastern State     $ per Student    Western State   $ per Student
    Alabama                   9,028    Alaska                 18,416
    Connecticut              17,745    Arizona                 7,528
    Delaware                 13,938    Arkansas                9,616
    D.C.                     18,485    California              9,595
    Florida                   8,755    Colorado                8,985
    Georgia                   9,202    Hawaii                 12,485
    Illinois                 13,077    Idaho                   6,621
    Indiana                   9,548    Iowa                   10,688
    Kentucky                  9,312    Kansas                  9,972
    Maine                    12,707    Louisiana              10,749
    Maryland                 14,003    Minnesota              11,464
    Massachusetts            15,087    Missouri                9,875
    Michigan                 11,110    Montana                11,017
    Mississippi               8,263    Nebraska               11,726
    New Hampshire            14,335    Nevada                  8,414
    New Jersey               17,907    New Mexico              9,734
    New York                 20,610    North Dakota           12,358
    North Carolina            8,512    Oklahoma                7,829
    Ohio                     11,354    Oregon                  9,945
    Pennsylvania             13,961    South Dakota            8,881
    Rhode Island             14,767    Texas                   8,593
    South Carolina            9,732    Utah                    6,500
    Tennessee                 8,630    Washington             10,202
    Vermont                  16,988    Wyoming                15,797
    Virginia                 10,973
    West Virginia            11,260
    Wisconsin                11,186

    Data based on US Census Bureau. Source:

    1. Develop a thesis statement: You can complete the statement below or develop one of your own. You can also change your thesis statement later if your observations about the data warrant it.
       • Example: The Eastern States spend more on K-12 students than the Western States.
    2. Include a description of the data that you are using. What is the source of the data? What is the variable of interest?
    3. Include graphs of both Eastern and Western States distributions with clear and accurate descriptions of each. Make observations about shape, center, spread and outliers (if there are any), for both the eastern states and the western states. Here is of these sort of observations. Include any other observations that will be useful in supporting your thesis.
    4. Write the analysis: Write one or more paragraphs that use your observations to analyze whether or not your thesis may be true.

    Use the rubric to make sure that you include all the required parts of this project. Please ask if you have any questions.
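
    One possible way to get the center, spread, and outlier observations is sketched below using the spending figures listed above. The `summarize` helper is a made-up name, the 1.5 × IQR fence is one common outlier convention, and quartile values can differ slightly between tools because they use different quartile rules.

```python
import statistics

# $ per student, copied from the data listed for this project.
east = [9028, 17745, 13938, 18485, 8755, 9202, 13077, 9548, 9312, 12707,
        14003, 15087, 11110, 8263, 14335, 17907, 20610, 8512, 11354, 13961,
        14767, 9732, 8630, 16988, 10973, 11260, 11186]
west = [18416, 7528, 9616, 9595, 8985, 12485, 6621, 10688, 9972, 10749,
        11464, 9875, 11017, 11726, 8414, 9734, 12358, 7829, 9945, 8881,
        8593, 6500, 10202, 15797]


def summarize(values):
    """Center, spread, and 1.5*IQR outliers for one group of states."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
        "iqr": iqr,
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


for name, values in (("East", east), ("West", west)):
    print(name, summarize(values))
```

    A histogram or boxplot of each list would still be needed for the shape description; the numbers here only support it.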

    How-To Video:

  • Height data analysis

    Begin with the 10 heights provided to you by your instructor. To add 10 more values to your lab data, survey or measure 10 people to find their heights. Add these heights to the Week 5 Lab Template along with the 10 provided by your instructor.

    Determine the mean and standard deviation for the 20 values (your instructor’s 10 from Step 1A above plus your 10) by using the Week 3 Excel Spreadsheet. Post a screenshot of the portion of the spreadsheet that helped you determine these values. How does your height compare to the mean (average) height of the 20 values? Is your height taller, shorter, or the same as the mean of the sample?

    Give some background information on the group of people you used in your study. Use the following questions to guide your answers:

    • How did you choose the participants for your study? What was the sampling method: systematic, convenience, cluster, stratified, simple random?
    • What part of the country did your study take place in?
    • What are the age ranges of your participants?
    • How many of each gender did you have in your study?
    • What are other interesting factors about your group?

    Use the Week 5 Excel Spreadsheet for the following:

    • Use the Empirical Rule tab from the spreadsheet. Then, determine the 68%, 95%, and 99.7% values of the Empirical Rule in terms of the 20 heights in your height study. Post a screenshot of your work from the Week 5 Excel spreadsheet. What do these values tell you?
    • Post another screenshot of your work from the Normal Probability tab from the Week 5 Excel spreadsheet. Based on your study results, what percent of the study participants are shorter than you? What percent are taller than you? (Example: If my height is 73 inches, then 20.86% of the relevant population is shorter. The other 79.14%, of course, is taller).
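
    The spreadsheet calculations can be cross-checked with a short script. This is only a sketch: the 20 heights below are hypothetical placeholders, and using the sample standard deviation is an assumption about what the course spreadsheets compute.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical heights in inches; replace with the 20 values from your lab.
heights = [62, 64, 65, 65, 66, 66, 67, 67, 68, 68,
           68, 69, 69, 70, 70, 71, 71, 72, 73, 75]
my_height = 73  # example height, in inches

m = mean(heights)
s = stdev(heights)  # sample standard deviation (assumed to match the spreadsheet)

# Empirical Rule: mean plus or minus 1, 2, and 3 standard deviations.
empirical = {}
for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%")):
    empirical[label] = (m - k * s, m + k * s)
    print(f"About {label} of heights fall between "
          f"{empirical[label][0]:.1f} and {empirical[label][1]:.1f} inches")

# Normal probability: share of a normal population shorter / taller than my_height.
shorter = NormalDist(m, s).cdf(my_height)
taller = 1 - shorter
print(f"Shorter: {shorter:.2%}, taller: {taller:.2%}")
```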

    Attached Files (PDF/DOCX): 2024SEP_MATH225_Week_5_Lab_Assignment_Template.docx

  • Discussion: Confidence Intervals

    The B&K Real Estate Company sells homes and is currently serving the Southeast region. It has recently expanded to cover the Northeast states. The B&K realtors are excited to now cover the entire East Coast and are working to prepare their southern agents to expand their reach to the Northeast.

    B&K has hired your company to analyze the Northeast home listing prices in order to give information to their agents about the mean listing price at 95% confidence. Your company offers three analysis packages: one based on a sample size of 100 listings, one based on 1,000 listings, and another based on a sample size of 4,000 listings. Because there is an additional cost for data collection, your company charges more for the package with 4,000 listings than for the package with 100 listings.

    Bronze Package – Sample size of 100 listings:

    • 95% confidence interval for the mean of the Northeast house listing price has a margin of error of $24,500
    • Cost for service to B&K: $2,000

    Silver Package – Sample size of 1,000 listings:

    • 95% confidence interval for the mean of the Northeast house listing price has a margin of error of $7,750
    • Cost for service to B&K: $10,000

    Gold Package – Sample size of 4,000 listings:

    • 95% confidence interval for the mean of the Northeast house listing price has a margin of error of $3,900
    • Cost for service to B&K: $25,000

    The B&K management team does not understand the tradeoff between confidence level, sample size, and margin of error. B&K would like you to come back with your recommendation of the sample size that would provide the sales agents with the best understanding of Northeast home prices at the lowest cost for service to B&K.

    In other words, which option is preferable?

    • Spending more on data collection and having a smaller margin of error
    • Spending less on data collection and having a larger margin of error
    • Choosing an option somewhere in the middle

    For your initial post:

    • Formulate a recommendation and write a confidence statement in the context of this scenario. For the purposes of writing your confidence statement, assume the sample mean house listing price is $310,000 for all packages. “I am [#] % confident the true mean . . . [in context].”
    • Explain the factors that went into your recommendation, including a discussion of the margin of error
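
    The tradeoff between sample size and margin of error can be made concrete with a short calculation. The sketch below assumes the usual result that, at a fixed confidence level and population standard deviation, the margin of error shrinks in proportion to 1/√n; the package figures and the $310,000 sample mean come from the scenario, while the variable names are illustrative.

```python
import math

# (sample size, quoted margin of error, cost) from the package descriptions.
packages = {
    "Bronze": (100, 24_500, 2_000),
    "Silver": (1_000, 7_750, 10_000),
    "Gold": (4_000, 3_900, 25_000),
}
sample_mean = 310_000  # sample mean assumed in the prompt for all packages

base_n, base_moe, _ = packages["Bronze"]
predicted = {}   # margin of error implied by 1/sqrt(n) scaling from Bronze
intervals = {}   # 95% confidence interval for each package

for name, (n, moe, cost) in packages.items():
    predicted[name] = base_moe * math.sqrt(base_n / n)
    intervals[name] = (sample_mean - moe, sample_mean + moe)
    low, high = intervals[name]
    print(f"{name}: n = {n}, cost = ${cost:,}, "
          f"95% CI = ${low:,} to ${high:,}, "
          f"1/sqrt(n) scaling predicts a margin of about ${predicted[name]:,.0f}")
```

    The quoted margins track the 1/√n pattern closely, which is why a 40-fold increase in sample size buys only about a 6-fold reduction in margin of error.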

    For your response posts to your peers, choose two different packages for your responses. Do you think the agents would prefer a different confidence interval than their management? What advantages and disadvantages would there be in having different confidence intervals for the agents? Explain your thought process and reasoning in your response.

    Undergraduate Discussion Rubric

    Overview

    Your active participation in the discussions is essential to your overall success this term. Discussion questions will help you make meaningful connections between the course content and the larger concepts of the course. These discussions give you a chance to express your own thoughts, ask questions, and gain insight from your peers and instructor.

    Directions

    For each discussion, you must create one initial post and follow up with at least two response posts.

    For your initial post, do the following:

    • Write a post of 1 to 2 paragraphs.
    • In Module One, complete your initial post by Thursday at 11:59 p.m. Eastern.
    • In Modules Two through Eight, complete your initial post by Thursday at 11:59 p.m. of your local time zone.
    • Consider content from other parts of the course where appropriate. Use proper citation methods for your discipline when referencing scholarly or popular sources.

    For your response posts, do the following:

    • Reply to at least two classmates outside of your own initial post thread.
    • In Module One, complete your two response posts by Sunday at 11:59 p.m. Eastern.
    • In Modules Two through Eight, complete your two response posts by Sunday at 11:59 p.m. of your local time zone.
    • Demonstrate more depth and thought than saying things like "I agree" or "You are wrong." Guidance is provided for you in the discussion prompt.

  • Module Five Assignment

    Scenario

    You have been hired by the Regional Real Estate Company to help them analyze real estate data. One of the company's Pacific region salespeople is working to design a new advertisement. The initial draft of the advertisement states that the average cost per square foot of home sales in the (Pacific region) is $280. The salesperson claims that the average cost per square foot in the Pacific region is less than $280. He wants you to make sure he can make that statement (that the average cost per square foot is less than $280) before asking for the advertisement text to be changed. In order to test his claim, you will generate a random sample of size 750 using data for the (Pacific region) and use this data to perform a hypothesis test.

    Prompt

    Generate a sample of 750 houses using data for the (Pacific region). Then, design a hypothesis test and interpret the results using significance level α = .05. You will work with this sample in this assignment. Briefly describe how you generated your random sample.

    Use the document attached below to help support your work on this assignment. You may also use the and tutorials for support.

    Specifically, you must address the following rubric criteria:

    • Introduction: Describe the purpose of this analysis and how you generated your random sample of 750 houses.
    • Hypothesis Test Setup: Define your population parameter, including hypothesis statements, and specify the appropriate test.
      • Define your population parameter.
      • Write the null and alternative hypotheses.
      • Specify the name of the test you will use.
      • Identify whether it is a left-tailed, right-tailed, or two-tailed test.
    • Data Analysis Preparations: Describe sample summary statistics, provide a histogram and summary, check assumptions, and identify the test significance level.
      • Provide the descriptive statistics (sample size, mean, median, and standard deviation).
      • Provide a histogram of your sample.
      • Summarize your sample by writing a sentence describing the shape, center, and spread of your sample.
      • Check whether the assumptions to perform your identified test have been met.
      • Identify the test significance level. For example, α = .05.
    • Calculations: Calculate the p value, describe the p value and test statistic in regard to the normal curve graph, discuss how the p value relates to the significance level, and compare the p value to the significance level to reject or fail to reject the null hypothesis.
      • Calculate the sample mean and standard error.
      • Determine the appropriate test statistic, then calculate the test statistic.
        • Note: This calculation is (mean − target)/standard error. In this case, the mean is your regional mean (Pacific), and the target is 280.
      • Calculate the p value using one of the following tests.
        • Choose your test from the following:
          • =T.DIST.RT([test statistic], [degree of freedom]): right-tailed test
          • =T.DIST([test statistic], [degree of freedom], 1): left-tailed test
          • =T.DIST.2T([test statistic], [degree of freedom]): two-tailed test
        • Note: The degree of freedom is calculated by subtracting 1 from your sample size.
      • Using the normal curve graph as a reference, describe where the p value and test statistic would be placed.
    • Test Decision: Compare the relationship between the p value and the significance level, and decide to reject or fail to reject the null hypothesis.
      • Compare the relationship between the p value and significance level.
      • Decide to reject or fail to reject the null hypothesis.
    • Conclusion: Discuss how your test relates to the hypothesis and discuss the statistical significance.
      • Explain in one paragraph how your test decision relates to your hypothesis and whether your conclusions are statistically significant.
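
    The calculation steps in the rubric can be sketched outside Excel as follows. The sample mean and standard deviation below are hypothetical placeholders, not results from the Pacific data, and `t_cdf` is a small hand-rolled helper (Simpson's rule on the t density) standing in for Excel's left-tailed =T.DIST(x, df, 1), so small numerical differences from Excel are possible.

```python
import math


def t_pdf(x, df):
    """Density of Student's t distribution, using log-gamma to avoid overflow."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    a = abs(x)
    h = a / steps
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


# Hypothetical summary statistics; use the values from your own sample.
n = 750
sample_mean = 265.0   # assumed sample mean cost per square foot
sample_sd = 160.0     # assumed sample standard deviation
target = 280          # value stated in the advertisement draft
alpha = 0.05

standard_error = sample_sd / math.sqrt(n)
t_stat = (sample_mean - target) / standard_error   # (mean - target) / standard error
df = n - 1
p_value = t_cdf(t_stat, df)                        # left-tailed test: H1 is mean < 280
reject = p_value < alpha

print(f"t = {t_stat:.3f}, df = {df}, p = {p_value:.4f}")
print("Reject H0" if reject else "Fail to reject H0")
```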

    Attached Files (PDF/DOCX): MAT 240 Module Five Assignment Template (2).docx

  • Computing T-tests and ANOVAS.

    Instructions

    t-tests:

    • Run your independent samples t-test examining the differences between men and women in how much hurt they reported experiencing immediately after a breakup occurred (columns A & B in the dataset).
    • Summary paragraph: Explain the result of your t-test in a paragraph while addressing the following information:
      • Was the t-test significant or not significant? And what does that mean for a t-test?
      • What were the group means for each group? Include them as you explain what you found.
    • Create a graph or chart to help visualize this data and paste it in the same worksheet.
    • Reference your graph in your summary.
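
    For readers who want to see what the equal-variances t-test computes, here is a minimal sketch of the pooled-variance test statistic. The hurt scores are hypothetical stand-ins for columns A and B; the p value is left to the Excel output rather than computed here.

```python
import math
from statistics import mean, variance

# Hypothetical hurt ratings, for illustration only (not the course dataset).
men = [4, 5, 6, 5, 4, 6]
women = [6, 7, 8, 7, 6, 8]

n1, n2 = len(men), len(women)
mean_men, mean_women = mean(men), mean(women)

# Pooled variance: weighted average of the two sample variances.
pooled_var = ((n1 - 1) * variance(men) + (n2 - 1) * variance(women)) / (n1 + n2 - 2)
standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

t_stat = (mean_men - mean_women) / standard_error
df = n1 + n2 - 2

print(f"Mean (men) = {mean_men:.2f}, mean (women) = {mean_women:.2f}")
print(f"t({df}) = {t_stat:.3f}")
```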

    Oneway ANOVA:

    • Run a oneway ANOVA (single factor) examining how the medium used to enact a breakup (in-person/phone call/email/text message) impacts the current levels of happiness in regard to the breakup (columns C-F).
    • Paste your output in the same workbook to the right of your t-test output.
    • If the ANOVA is significant, run post hoc tests.
    • Calculate the effect size for the ANOVA (equation: Between group SS/Total SS).
    • Summary paragraph: Explain the results of the ANOVA in a short paragraph while addressing the following:
      • Was it significant or not significant?
      • If it was significant, what did the post hoc results tell you?
      • Make sure to explain which pairs of groups differed significantly while including the group means as you explain the results.
      • What was the effect size? What does that mean?
    • Create a graph or chart to help visualize the data and make reference to it in your write-up.
    • Submit your assignment as an Excel file.
    • Steps for running an independent samples t-test in Excel:
      • Make sure that
      • Choose t-test: Two sample assuming equal variances from Data Analysis tab
      • Click the labels box so they will be included
      • Enter the range of variable 1 (group 1's scores) and range of variable 2 (group 2's scores)
      • Choose a cell for the output table to go
    • Steps for running oneway ANOVA in Excel:
    1. Format columns for IV groups with their DV scores (if not done already)
    2. Choose Anova: Single Factor from the Data Analysis tab
    3. Click box for labels in first row
    4. Grouping: make sure columns is selected
    5. Input range: Highlight the data in the columns with your different groups' scores (including headings)
    6. Output options: select the output range option and select the cell you want the results table to show up
    7. Examine table to determine if F-test/ANOVA was significant.
    8. If significant, continue with the following steps.
    • Post hoc results if ANOVA is significant (if ANOVA was not significant, don't proceed):
    1. Calculate the number of pairwise comparisons needed using this formula: k*(k-1)/2, where k = # of groups.
    2. Run “ind samples t-tests for unequal variances” from the Data Analysis tab for each pair of groups.
       • Example: group 1-group 2, group 1-group 3, and group 2-group 3 = 3 tests
       • Keep output range to same worksheet and paste output tables near each other (nearby cells)
       • (Should end up with a t-test table for each pair of groups)
    3. Next, calculate the Bonferroni correction to adjust the alpha level, because running the same test multiple times increases our chance of committing a Type I error.
       • Formula: Divide .05 by the number of t-tests (pairwise comparisons) you did
       • Example: if you ran 3 t-tests to compare your groups, then you would divide .05/3 = .0167
    4. Compare the Bonferroni corrected alpha value to the p value (two-tailed) for each t-test.
       • If the p value (two-tailed) of an ind samples t-test is equal to or less than the corrected alpha, that pair of groups is significantly different from each other
    • Please make sure to read chapters 12 and 14 from the textbook as well.
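
    The effect size and Bonferroni arithmetic described above can be checked with a few lines of code. This is a sketch under stated assumptions: the happiness scores are hypothetical stand-ins for columns C-F, and the significance of the F test and of each pairwise t-test is still read from the Excel output.

```python
# Hypothetical happiness scores by breakup medium, for illustration only.
groups = {
    "in-person": [7, 8, 6, 7, 8],
    "phone call": [6, 5, 7, 6, 6],
    "email": [4, 5, 3, 4, 4],
    "text message": [3, 2, 4, 3, 3],
}

all_scores = [x for scores in groups.values() for x in scores]
grand_mean = sum(all_scores) / len(all_scores)
group_means = {name: sum(s) / len(s) for name, s in groups.items()}

# Sums of squares as they appear in Excel's Anova: Single Factor table.
ss_between = sum(len(s) * (group_means[name] - grand_mean) ** 2
                 for name, s in groups.items())
ss_within = sum((x - group_means[name]) ** 2
                for name, s in groups.items() for x in s)
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)

k = len(groups)
n_total = len(all_scores)
f_stat = (ss_between / (k - 1)) / (ss_within / (n_total - k))

# Effect size from the assignment: Between group SS / Total SS.
effect_size = ss_between / ss_total

# Post hoc bookkeeping: number of pairwise t-tests and Bonferroni-corrected alpha.
comparisons = k * (k - 1) // 2
alpha_corrected = 0.05 / comparisons

print(f"F({k - 1}, {n_total - k}) = {f_stat:.2f}, effect size = {effect_size:.3f}")
print(f"{comparisons} pairwise tests, corrected alpha = {alpha_corrected:.4f}")
```

    With four groups there are six pairwise comparisons, so each two-tailed p value is compared against .05/6 rather than .05.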

    Grading Criteria (20pts)

    Grading Rubric

    • Content (20pts)
      • Content of t-test (10pts)
        • Was the test run with the correct variables? (1pt)
        • Was the significance of the test addressed correctly? (2pts)
        • Did they create a graph or chart to help explain results? (2pts)
        • t-test summary paragraph:
          • Did the results get explained while including the group means? (4pts)
          • Was the effect size mentioned? (1pt)
      • ANOVA analyses (5pts)
        • Correct variables were used to compute results (1pt)
        • Post hoc tests were computed if F test was significant (1pt)
        • Effect size was computed (1pt)
        • A graph was created to help explain results (1pt)
      • ANOVA summary paragraph (5pts):
        • Significance of the ANOVA addressed? (1pt)
        • Post hoc results used to describe significant pairwise differences? (2pts)
        • Accurately described group differences found (while including group means) (1pt)
        • Effect size mentioned? (1pt)

    Submission details:

    • Assignment is submitted as an Excel file