SAS Biostats Project

MUST HAVE SAS software

Instructions included in the attachment and below:

The dataset generated from a recent class survey is posted on Blackboard as an Excel workbook named Survey. Use this dataset to answer the items listed below.

Before answering the questions below, you will need to clean the data in the Excel spreadsheet to prepare it for import to SAS. Clean it according to the specific instructions listed here:

• Delete the rows corresponding to respondents who answered fewer than 80% of the questions (i.e., fewer than 6 questions). The percent of questions answered is in column B, and the number answered is in column C.

• Clean columns F through J (the columns where the survey question requested numbers only) as follows:

  o If you find nonnumeric data in these columns, replace the cell contents with the numeric value that is indicated, if possible (e.g., for age, "24 years" becomes just the number 24; a height of 5'1" would be read as 5 feet 1 inch, which in inches is 61).

    - If conversion is necessary, round to the nearest 0.1 (e.g., 200 cm would be 78.7 in the height column, which requested inches). Conversion values are in the question text (original row 1 values).

    - If a number in the specified units cannot be identified, delete the contents of those cells.

  o If any reported weights are less than 70 (pounds), assume that the entry is in error and delete the value.

  o [Note: column D does not need to be made numeric, as "more than 2" was an option on a multiple-choice question.]

• Replace the text in row 1 (which currently has the full text of the survey questions) with short names to be used as variable names for your SAS dataset. Do not include spaces or special characters other than underscore (_).
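
The numeric-cleaning rules above can be sketched in code. The Python snippet below is only an illustration of the logic, not part of the assignment (the cleaning itself is done in Excel). The function names `clean_height` and `clean_weight` and the input formats they accept are assumptions, and the factor of 2.54 cm per inch is the standard one, which may differ slightly from the conversion value given in the question text.

```python
import re

CM_PER_INCH = 2.54  # standard factor (assumption); row 1 of the survey may state its own

# Accepted formats are assumptions made for this sketch
FEET_INCHES = re.compile(r"""(\d+)\s*(?:'|ft|feet)\s*(\d+(?:\.\d+)?)?\s*(?:"|in(?:ch(?:es)?)?)?""")
CENTIMETRES = re.compile(r"(\d+(?:\.\d+)?)\s*cm")
PLAIN_INCHES = re.compile(r"""(\d+(?:\.\d+)?)\s*(?:"|in(?:ch(?:es)?)?)?""")
POUNDS = re.compile(r"(\d+(?:\.\d+)?)\s*(?:lbs?|pounds)?")


def clean_height(raw):
    """Return height in inches rounded to 0.1, or None if it cannot be identified."""
    text = str(raw).strip().lower()
    m = FEET_INCHES.fullmatch(text)  # e.g. 5'1" or 5 ft 1 in
    if m:
        return round(int(m.group(1)) * 12 + float(m.group(2) or 0), 1)
    m = CENTIMETRES.fullmatch(text)  # e.g. 200cm
    if m:
        return round(float(m.group(1)) / CM_PER_INCH, 1)
    m = PLAIN_INCHES.fullmatch(text)  # e.g. 61 or 61 in
    if m:
        return round(float(m.group(1)), 1)
    return None  # unidentifiable: the cell would be left empty


def clean_weight(raw):
    """Return weight in pounds, or None if unidentifiable or under 70."""
    m = POUNDS.fullmatch(str(raw).strip().lower())
    if not m:
        return None
    pounds = round(float(m.group(1)), 1)
    return pounds if pounds >= 70 else None  # values under 70 are treated as errors
```

For instance, `clean_height("200cm")` applies the centimetre branch and `clean_weight("65")` returns `None` because of the 70-pound rule.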

After cleaning the spreadsheet, upload it to SAS Studio and import it into a SAS dataset. Verify that all numeric columns (F, G, H, I, J) have been imported as variables with the number type (and not as text). If that is not the case, try to correct the spreadsheet and then upload and import again.

Use SAS to answer all of the questions below. For questions that require you to create a new variable (questions #3 and #4), you can do this in SAS (if you know how) or you can create a new column in Excel with the new variable and import the data again. But you must use SAS to produce the answers to the questions.

1. Provide these descriptive statistics:

   i. number of nonmissing responses (n);
   ii. mean;
   iii. standard deviation;
   iv. median;
   v. first quartile; and
   vi. third quartile

   for each of the following variables:

   a. Number of hours of exercise per week
   b. Number of hours of TV/movie viewing per week
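
To make the six requested statistics concrete, here is a small Python sketch using only the standard library. It is illustrative only: the reported answers must come from SAS, the sample values are made up, and the helper name `describe` is hypothetical. Quartile definitions vary between packages, so the quartiles computed here may not match SAS output exactly.

```python
import statistics


def describe(values):
    """Summarize a list in which None marks a missing response."""
    present = [v for v in values if v is not None]  # drop missing responses
    q1, median, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "n": len(present),                  # number of nonmissing responses
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),   # sample standard deviation (n - 1 divisor)
        "median": median,
        "q1": q1,
        "q3": q3,
    }


# Hypothetical weekly exercise hours; None marks a skipped question
summary = describe([2, 4, None, 6, 8, 10])
```

Missing responses are excluded before anything is computed, which is why n counts only the nonmissing values.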

2. Produce a frequency distribution for the following variables, ordered from the most frequent to the least frequent:

   a. Place of residence (Where do you live?)
   b. Pet preference (Cat person/dog person?)
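
A frequency distribution ordered from most to least frequent can be sketched as follows. This is an illustration in Python, not the SAS output the assignment asks for; the function name `frequency_table` and the sample answers are hypothetical, and blank answers are assumed to be missing.

```python
from collections import Counter


def frequency_table(responses):
    """Return (value, count, percent) rows, most frequent first."""
    counts = Counter(r for r in responses if r)  # skip blank or missing answers
    total = sum(counts.values())
    return [(value, n, round(100 * n / total, 1)) for value, n in counts.most_common()]


# Hypothetical answers to "Where do you live?"
table = frequency_table(
    ["Manhattan", "Queens", "Manhattan", "", "Brooklyn", "Manhattan", "Queens"]
)
```

Percentages here are computed among nonmissing answers only.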

3. Create a new variable for place of residence, classifying everyone as residing in Manhattan or somewhere else (i.e., group Queens, Brooklyn, etc., into a single "Other" category). Then, use this new variable to answer the following questions.

   a. What is the mean number of hours of TV and movie viewing of those living in Manhattan?
   b. What is the mean number of hours of TV and movie viewing of those living somewhere other than Manhattan?
   c. Assuming that our respondents are a representative sample of graduate students at Mount Sinai, test whether students who are Manhattan residents have a different mean TV/movie viewing time than students living elsewhere. Assume that viewing time is normally distributed within each of these populations. Use a significance level of α = 0.05. Be sure to state the following:

      (i) null and alternative hypotheses
      (ii) the value of the test statistic
      (iii) the p value for the test
      (iv) the conclusion, both in terms of whether or not to reject the null hypothesis, and in terms of the study variables.
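
The recoding and the two-group comparison can be sketched as below. This is a hedged illustration in Python with made-up data, and the helper names are hypothetical. It uses the unequal-variance (Welch) form of the two-sample t statistic as one possible choice; the assignment does not say whether equal variances should be assumed, and the p value is left to SAS.

```python
import math
import statistics


def recode_residence(place):
    """Collapse place of residence to 'Manhattan' or 'Other'; missing stays None."""
    if not place:
        return None
    return "Manhattan" if place.strip().lower() == "manhattan" else "Other"


def welch_t(a, b):
    """Welch two-sample t statistic and its approximate degrees of freedom."""
    va = statistics.variance(a) / len(a)  # squared standard error, group a
    vb = statistics.variance(b) / len(b)  # squared standard error, group b
    t = (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical (residence, weekly TV/movie hours) pairs
rows = [("Manhattan", 4), ("Queens", 10), ("Manhattan", 6),
        ("Brooklyn", 12), ("Manhattan", 8), ("Bronx", 14)]

groups = {"Manhattan": [], "Other": []}
for place, hours in rows:
    group = recode_residence(place)
    if group is not None and hours is not None:
        groups[group].append(hours)

t_stat, df = welch_t(groups["Manhattan"], groups["Other"])
```

The group means for parts a and b are simply `statistics.mean(groups["Manhattan"])` and `statistics.mean(groups["Other"])`.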

4. Body mass index (BMI) is one measure of body size commonly used in research. It is calculated as

   BMI = weight (kg) / [height (m)]^2   or   BMI = weight (lb) / [height (in)]^2 × 703.

   From reported weight and height in the class survey dataset, calculate each respondent's BMI (note: BMI should be recorded as missing (.) if either height or weight is missing, and respondents with missing BMI should be excluded from this analysis). Use the calculated BMI variable to answer the following (among those with a calculable BMI):

   a. What proportion of respondents had BMI < 25?
   b. What proportion had BMI of at least 25 but less than 30?
   c. What proportion had BMI of 30 or greater?
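
The BMI calculation and the three bands can be sketched as follows, assuming weight is recorded in pounds and height in inches, as the cleaning instructions suggest. This is an illustration in Python with made-up values; the function names and band labels are hypothetical, and the reported proportions must come from SAS.

```python
def bmi(weight_lb, height_in):
    """BMI from pounds and inches; None (missing) if either input is missing."""
    if weight_lb is None or height_in is None:
        return None
    return weight_lb / height_in ** 2 * 703


def bmi_proportions(pairs):
    """Proportion in each BMI band among respondents with a calculable BMI."""
    values = [b for b in (bmi(w, h) for w, h in pairs) if b is not None]
    if not values:
        return {}  # nobody has a calculable BMI
    bands = {"<25": 0, "25 to <30": 0, ">=30": 0}
    for b in values:
        if b < 25:
            bands["<25"] += 1
        elif b < 30:
            bands["25 to <30"] += 1
        else:
            bands[">=30"] += 1
    return {label: count / len(values) for label, count in bands.items()}


# Hypothetical (weight in lb, height in inches) pairs; None marks a missing value
props = bmi_proportions([(150, 68), (180, 66), (220, 65), (None, 70), (130, None)])
```

The denominator is the number of respondents with a calculable BMI, not the full sample, which matches the exclusion rule in the question.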

5. Produce a table of (Pearson) correlation coefficients representing the correlation of age (column H of the original spreadsheet) with all other numeric variables (columns F, G, I, and J). Using this table, answer the following:

   a. Which variable(s), if any, is/are significantly correlated with age? (That is, for which variables would you reject the null hypothesis of a zero population correlation? Use an alpha level of 0.05. Just list the variable(s) that meet this condition; you needn't do more.)
   b. Find the variable with the largest direct (positive) correlation with age (largest positive r, regardless of significance level).

      (i) Which variable is it?
      (ii) For this variable, what is the value of r, the Pearson correlation (with age)?
      (iii) For this variable, what is the p value for testing a null hypothesis of zero correlation? (Do not write out the whole hypothesis test; just provide the p value from your output.)
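
For reference, the Pearson correlation and the t statistic behind the zero-correlation test can be sketched as below. This is an illustration in Python with made-up data; the function names are hypothetical, and the sketch assumes that each correlation uses only respondents with both values present. The r and p values to report are the ones in the SAS output.

```python
import math


def pearson_r(x, y):
    """Pearson r and the number of pairs used (both values must be present)."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    n = len(pairs)
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy), n


def t_for_r(r, n):
    """t statistic (n - 2 degrees of freedom) for testing zero correlation."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical ages and weekly exercise hours; None marks a missing value
age = [22, 25, 30, None, 35, 41]
exercise = [5, 6, 4, 3, 3, 2]
r, n = pearson_r(age, exercise)
```

A negative r, as in this made-up example, indicates an inverse relationship, so such a variable would not be a candidate for part b.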

Requirements: Short and Concise
