MUST HAVE SAS software
Instructions included in the attachment and below:
The dataset generated from a recent class survey is posted on Blackboard as an Excel workbook named
Survey. Use this dataset to answer the items listed below.
Before answering the questions below, you will need to clean the data in the Excel spreadsheet to
prepare it for import to SAS. Clean it according to the specific instructions listed here:
Delete the rows corresponding to respondents who answered fewer than 80% of the questions
(i.e., 6 questions). The percent of questions answered is in column B, and the number answered
is in column C.
Clean columns F through J (the columns where the survey question requested numbers only) as
follows:
o If you find nonnumeric data in these columns, replace the cell contents with the numeric
value that is indicated if possible (e.g., for age, "24 years" becomes just the number 24; a
height of 5'1" would be read as 5 feet 1 inch, which in inches is just 61).
If conversion is necessary, round to the nearest 0.1 (e.g., 200cm would be 78.7
in the height column that requested inches). Conversion values are in the
question text (original row 1 values).
If a number in the specified units cannot be identified, delete the contents of
those cells.
o If any reported weights are less than 70 (pounds), assume that the entry is in error and
delete the value.
o [Note: column D does not need to be made numeric, as "more than 2" was an option on a
multiple choice question.]
Replace text in row 1 (which currently has the full text of survey questions) with short names to
be used as variable names for your SAS dataset. Do not include spaces or special characters other
than underscore (_).
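The cleaning rules for the numeric columns can be sketched in a few lines of Python. This is only an illustration of the logic, not part of the assignment, which asks for the cleaning to be done in Excel and the analysis in SAS. The function names are hypothetical, and the factor of 2.54 cm per inch is the standard one; the assignment says to use the conversion value given in the question text.

```python
import re

# Standard conversion factor (assumption: the survey question text gives the same value).
CM_PER_INCH = 2.54


def leading_number(raw):
    """Pull the leading number out of an entry such as '24 years'; None if there is none."""
    m = re.match(r"\s*(\d+(?:\.\d+)?)", str(raw))
    return float(m.group(1)) if m else None


def height_to_inches(raw):
    """Return height in inches rounded to 0.1, or None if it cannot be identified."""
    text = str(raw).strip().lower()
    # Feet-and-inches entries such as 5'1" or 5 ft 1 in
    m = re.fullmatch(r"(\d+)\s*(?:'|ft|feet)\s*(\d+(?:\.\d+)?)?\s*(?:\"|in|inch|inches)?", text)
    if m:
        inches = float(m.group(2)) if m.group(2) else 0.0
        return round(int(m.group(1)) * 12 + inches, 1)
    # Centimeter entries such as 200cm
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*cm", text)
    if m:
        return round(float(m.group(1)) / CM_PER_INCH, 1)
    # Bare numbers (optionally followed by an inch unit) are taken as inches
    m = re.fullmatch(r"(\d+(?:\.\d+)?)\s*(?:\"|in|inch|inches)?", text)
    if m:
        return round(float(m.group(1)), 1)
    return None  # nothing usable: the cell would be left blank


def clean_weight(pounds):
    """Blank out missing weights and weights under 70 lb, which are assumed to be errors."""
    if pounds is None or pounds < 70:
        return None
    return pounds


print(height_to_inches("5'1\""))  # 61.0
print(height_to_inches("200cm"))  # 78.7
print(leading_number("24 years"))  # 24.0
print(clean_weight(65))  # None
```

Entries that match none of the patterns come back as `None`, which corresponds to deleting the cell contents.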
After cleaning the spreadsheet, upload it to SAS Studio and import it into a SAS dataset. Verify that all
numeric columns (F,G,H,I,J) have been imported as variables with the number type (and not as
text). If that is not the case, try to correct the spreadsheet and then upload and import again.
Use SAS to answer all of the questions below. For questions that require you to create a new variable
(questions #3 and #4), you can do this in SAS (if you know how) or you can create a new column in
Excel with the new variable and import the data again. But you must use SAS to produce the answers to
the questions.
1. Provide these descriptive statistics:
i. number of nonmissing responses (n);
ii. mean;
iii. standard deviation;
iv. median;
v. first quartile; and
vi. third quartile
for each of the following variables:
a. Number of hours of exercise per week
b. Number of hours of TV/movie viewing per week
2. Produce a frequency distribution for the following variables, ordered from the most frequent to
the least frequent:
a. Place of residence (Where do you live?)
b. Pet preference (Cat person/dog person?)
3. Create a new variable for place of residence, classifying everyone as residing in Manhattan or
somewhere else (i.e., group Queens, Brooklyn, etc., into a single Other category). Then, use this
new variable to answer the following questions.
a. What is the mean number of hours of TV and movie viewing of those living in Manhattan?
b. What is the mean number of hours of TV and movie viewing of those living somewhere
other than Manhattan?
c. Assuming that our respondents are a representative sample of graduate students at Mount
Sinai, test whether students who are Manhattan residents have a different mean TV/movie
viewing time than students living elsewhere. Assume that viewing time is normally
distributed within each of these populations. Use a significance level of α = 0.05. Be sure to
state the following:
(i) null and alternative hypotheses
(ii) the value of the test statistic
(iii) the p value for the test
(iv) the conclusion, both in terms of whether or not to reject the null hypothesis, and in
terms of the study variables.
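The recoding step in question 3 (collapsing place of residence into Manhattan versus Other, then comparing group means) can be sketched as follows. This Python snippet is only a logic check under assumptions: the function names and the sample rows are made up for illustration, and the answers themselves must still be produced in SAS.

```python
from statistics import mean


def residence_group(residence):
    """Collapse place of residence to 'Manhattan' or 'Other'; missing stays None."""
    if residence is None or not str(residence).strip():
        return None
    return "Manhattan" if str(residence).strip().lower() == "manhattan" else "Other"


def mean_by_group(rows):
    """rows: iterable of (residence, tv_hours) pairs; rows with a missing value are skipped."""
    groups = {}
    for residence, tv_hours in rows:
        group = residence_group(residence)
        if group is None or tv_hours is None:
            continue
        groups.setdefault(group, []).append(tv_hours)
    return {group: mean(values) for group, values in groups.items()}


# Made-up rows for illustration only; these are not survey data.
sample = [("Manhattan", 10), ("Queens", 4), ("Brooklyn", 8), ("manhattan", 6), ("Bronx", None)]
print(mean_by_group(sample))
```

Respondents with a missing residence or a missing viewing time drop out of both group means, which is also how they would be excluded from the two-sample comparison.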
4. Body mass index (BMI) is one measure of body size commonly used in research. It is calculated as
$\mathrm{BMI} = \dfrac{\text{weight (kg)}}{[\text{height (m)}]^2}$ or $\mathrm{BMI} = \dfrac{\text{weight (lb)}}{[\text{height (in)}]^2} \times 703$.
From reported weight and height in the class
survey dataset, calculate each respondent's BMI (note: BMI should be recorded as missing (.) if
either height or weight is missing, and respondents with missing BMI should be excluded from
this analysis). Use the calculated BMI variable to answer the following (among those with a
calculable BMI):
a. What proportion of respondents had BMI < 25?
b. What proportion had BMI of at least 25 but less than 30?
c. What proportion had BMI of 30 or greater?
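As a cross-check on the BMI arithmetic, here is a minimal Python sketch of the pounds-and-inches formula (weight divided by height squared, times 703) and the three BMI categories. The function names and sample values are hypothetical, and the assignment still requires the final proportions to come from SAS output.

```python
def bmi_from_pounds_inches(weight_lb, height_in):
    """BMI from weight in pounds and height in inches; None if either input is missing."""
    if weight_lb is None or height_in is None:
        return None
    return weight_lb / height_in ** 2 * 703


def bmi_proportions(bmis):
    """Proportions in the three categories, among respondents with a calculable BMI."""
    valid = [b for b in bmis if b is not None]
    n = len(valid)
    if n == 0:
        return None
    return {
        "under_25": sum(b < 25 for b in valid) / n,
        "25_to_under_30": sum(25 <= b < 30 for b in valid) / n,
        "30_or_more": sum(b >= 30 for b in valid) / n,
    }


# Made-up values for illustration only; these are not survey data.
print(round(bmi_from_pounds_inches(150, 65), 2))  # 24.96
print(bmi_proportions([22.0, 27.5, 31.0, None, 24.9]))
```

The denominator counts only respondents with a calculable BMI, so the three proportions always sum to 1.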
5. Produce a table of (Pearson) correlation coefficients representing the correlation of age (column
H of the original spreadsheet) with all other numeric variables (columns F, G, I, and J). Using this
table, answer the following:
a. Which variable(s), if any, is/are significantly correlated with age? (That is, for which
variables would you reject the null hypothesis of a zero population correlation? Use an
alpha level of 0.05. Just list the variable(s) that meet this condition; you needn't do more.)
b. Find the variable with the largest direct (positive) correlation with age (largest positive r,
regardless of significance level).
(i) Which variable is it?
(ii) For this variable, what is the value of r, the Pearson correlation (with age)?
(iii) For this variable, what is the p value for testing a null hypothesis of zero
correlation? (Do not write out the whole hypothesis test, just provide the p
value from your output.)
Requirements: Short and Concise