05-15, 2012

Assignment writing: Statistical Analysis and Decision Making G Assignment

Answer all questions in this assignment

Answers should be written or typed on A4 paper and stapled at the top left hand corner. Each page should be numbered and have your ID number on it.

Do NOT place your answer in a plastic sleeve or folder.

The assignments are to be the individual work of the submitting student. Submitted assignments are not to be group efforts. However, this statement is not meant to discourage students from discussing mutual problems that arise in answering the assignment questions. Such discussions should be verbal and not written. All calculations and computer entries should be done individually. If two or more assignments contain the same (or marginally changed) sentences and/or the same numerical errors, they will be treated as a single assignment and the total marks awarded will be divided among the contributing students.

Your assignment should be handed to Dr. Shuang Liu on 16 Dec.

Late assignments, without an acceptable reason, will be penalised at a rate of 10% per day or part of day (including weekends) late. The maximum possible penalty for late assignments is 100% (i.e. a mark of zero). Note that ‘the printers were out of paper’ or other printing issues are not acceptable reasons for a late assignment.


Question 1 (22 marks)

Canberrans have been concerned with the consumption of water and the factors that affect the water consumption level. In a recent study, researchers wanted to know whether Canberra's daily water consumption (ML) could be predicted by the daily maximum temperature (°C). They investigated data for January, 2011. They used Excel to analyse the results. Some of the Excel output is shown below:

Figure 1: Water Usage against Daily Maximum Temperature in Canberra, January, 2011

Source: The Canberra Times

Figure 2: Excel Output, Regression Analysis of Water Usage against Daily Maximum Temperature in Canberra, January, 2011

a)What are the independent and dependent variables?

The variable whose value depends on the other variable is called the dependent variable. The other variable is referred to as the independent variable.

Water usage (ML) = dependent variable, because it is the quantity being predicted

Daily maximum temperature (°C) = independent variable, because it is used to make the prediction

b)Write down the estimated equation.

Y= -4.76+4.92X

c)For January, 2011, the average water usage was 137.1 ML. On January 10th, the daily maximum temperature was 26.5 °C and the water usage was 111 ML. For January 10th, calculate:

i)the predicted water consumption

Y= -4.76+4.92X = -4.76+ 4.92*26.5 = -4.76 + 130.38 = 125.62 ML

ii)the residual

r = y- y^ = 111ML – 125.62ML = - 14.62ML

iii)the explained deviation from the mean

Explained deviation = y^ - mean y = 125.62ML - 137.1 ML = -11.48 ML

iv)the observed deviation from the mean

Observed deviation = y - mean y = 111ML - 137.1ML = -26.1 ML
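The four quantities in (c) can be checked with a short sketch. The variable names are illustrative; the numbers come from the question and from the estimated equation in (b). The last line checks the identity that links the four values: the observed deviation is the explained deviation plus the residual.

```python
# Values taken from the question and the estimated equation in part b).
intercept = -4.76      # estimated intercept (ML)
slope = 4.92           # estimated slope (ML per degree C)
mean_usage = 137.1     # average water usage for January 2011 (ML)
temp_jan10 = 26.5      # daily maximum temperature on January 10th (degrees C)
usage_jan10 = 111.0    # observed water usage on January 10th (ML)

predicted = intercept + slope * temp_jan10   # i) predicted consumption
residual = usage_jan10 - predicted           # ii) observed minus predicted
explained = predicted - mean_usage           # iii) predicted minus mean
observed = usage_jan10 - mean_usage          # iv) observed minus mean

print(round(predicted, 2), round(residual, 2),
      round(explained, 2), round(observed, 2))

# The observed deviation splits into explained deviation plus residual.
assert abs(observed - (explained + residual)) < 1e-9
```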

d)Draw a diagram to show how the four values you calculated in (c) are related

e)The value of the coefficient of determination is missing from the Excel output. Calculate the value of the coefficient of determination and in one or two sentences, provide an interpretation of this value.

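Regression sum of squares (RSS) = 13468.02987

Total sum of squares (TSS) = 21648.68966

R² = RSS/TSS = 13468.02987 / 21648.68966 = 0.622

About 62.2% of the variation in daily water consumption is explained by the variation in daily maximum temperature.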

f)The value of the correlation coefficient is missing from the Excel output. Calculate the correlation coefficient and in one or two sentences, provide an interpretation of this value.
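No worked answer to (f) is given here. As a minimal sketch of one way to obtain it: in a simple linear regression, the correlation coefficient is the square root of the coefficient of determination, carrying the sign of the slope. The sketch assumes the value labelled RSS in (e) is the regression (explained) sum of squares; the variable names are illustrative.

```python
import math

# Figures quoted in part e); "RSS" is assumed to be the regression sum of squares.
regression_ss = 13468.02987
total_ss = 21648.68966
slope = 4.92  # from the estimated equation in part b)

r_squared = regression_ss / total_ss
# r takes the sign of the slope in a simple linear regression.
r = math.copysign(math.sqrt(r_squared), slope)

print(round(r_squared, 3), round(r, 3))
```

Under these assumptions r is about 0.79, which would indicate a fairly strong positive linear relationship between daily maximum temperature and water consumption.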

g)Use Excel to generate your regression output for the data given with earlier months than those used in Figure 1. This must be done using the method given in the Excel exercises in Week 4 on Moodle.

where y = the daily water consumption (ML)

x = the temperature (ºC)

Print out your Excel output. Do not submit more than one page of this.

SUMMARY OUTPUT

Regression Statistics

Multiple R           0.781807654
R Square             0.611223208
Adjusted R Square    0.608936286
Standard Error       3.602446345
Observations         172

ANOVA

             df    SS          MS          F           Significance F
Regression   1     3468.514    3468.514    267.2689    1.03732E-36
Residual     170   2206.195    12.97762
Total        171   5674.709

            Coefficients   Standard Error   t Stat     P-value    Lower 95%     Upper 95%   Lower 95.0%   Upper 95.0%
Intercept   5.324651686    1.303233         4.085726   6.76E-05   2.752048818   7.897255    2.752049      7.897255
140         0.14926083     0.00913          16.34836   1.04E-36   0.131238022   0.167284    0.131238      0.167284
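The entries in the summary output are related to one another, and a short sketch can confirm that they are mutually consistent. The numbers are copied from the output above; the variable names are illustrative, and the formulas are the standard ones for a regression with a single predictor.

```python
import math

# Figures copied from the ANOVA and coefficient tables.
ss_regression, df_regression = 3468.514, 1
ss_residual, df_residual = 2206.195, 170
ss_total = 5674.709
slope, slope_se = 0.14926083, 0.00913

r_square = ss_regression / ss_total                # R Square
multiple_r = math.sqrt(r_square)                   # Multiple R
ms_regression = ss_regression / df_regression      # MS for regression
ms_residual = ss_residual / df_residual            # MS for residual
f_stat = ms_regression / ms_residual               # F
standard_error = math.sqrt(ms_residual)            # Standard Error
t_slope = slope / slope_se                         # t Stat for the slope
# Adjusted R Square = 1 - (1 - R^2)(n - 1)/(n - k - 1), with n - 1 = 171.
adj_r_square = 1 - (1 - r_square) * (df_regression + df_residual) / df_residual

print(round(r_square, 4), round(multiple_r, 4), round(adj_r_square, 4))
print(round(f_stat, 2), round(standard_error, 4), round(t_slope, 2))
```

With a single predictor, the F statistic equals the square of the slope's t statistic, up to rounding in the quoted figures.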

RESIDUAL OUTPUT

Observation   Predicted 31.7   Residuals
1             24.72855953      8.57144
2             25.02708119      6.272919
3             24.43003787      -0.03004
4             24.13151621      -5.83152
5             22.63890792      -2.33891
6             25.77338534      -1.27339
7             29.50490608      -0.60491
8             28.46008027      3.13992
9             30.1019494       2.898051

Question 2 (14 marks)

An internet blogger is interested in the amount of time readers spend browsing his blog during one particular visit. For each reader who visited his blog, the blogger recorded whether the amount was ‘less than 5 minutes’, ‘between 5 and 10 minutes inclusive’ or ‘more than 10 minutes’.

a)What is the sample space?

{‘less than 5 minutes’, ‘between 5 and 10 minutes inclusive’, ‘more than 10 minutes’}

b)After recording data for 400 people, the blogger found that 283 spent ‘less than 5 minutes’, 99 spent ‘between 5 and 10 minutes inclusive’ and 18 spent ‘more than 10 minutes’. Construct the probability distribution for the amount of time spent browsing the blog.

less than 5 minutes = 283/ 400 = 0.7075

between 5 and 10 minutes inclusive = 99 /400 = 0.2475

more than 10 minutes = 18/400 = 0.045

c)In a few words, please specify the approach to probability you have used in b).

There are three approaches to probability: classical, relative frequency and subjective.

I have used relative frequency in b.

d)What is the probability that a visitor to the blog spends ‘less than 10 minutes (inclusive)’ browsing?

The probability that a visitor to the blog spends ‘less than 10 minutes (inclusive)’ browsing is 283/400 +99/400 = 0.7075+ 0.2475 = 0.955 = 95.5%
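The relative-frequency calculations in (b) and (d) can be reproduced with a short sketch. The dictionary keys and variable names are illustrative; the counts are the ones given in the question.

```python
# Observed counts for the 400 recorded visitors.
counts = {
    "less than 5 minutes": 283,
    "between 5 and 10 minutes inclusive": 99,
    "more than 10 minutes": 18,
}

total = sum(counts.values())

# Relative frequency approach: probability = count / total observations.
distribution = {outcome: n / total for outcome, n in counts.items()}

for outcome, p in distribution.items():
    print(f"{outcome}: {p:.4f}")

# A valid probability distribution sums to 1.
assert abs(sum(distribution.values()) - 1.0) < 1e-12

# d) The first two outcomes are mutually exclusive, so their probabilities add.
p_at_most_10 = (distribution["less than 5 minutes"]
                + distribution["between 5 and 10 minutes inclusive"])
print(f"10 minutes or less: {p_at_most_10:.3f}")
```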

e)The internet blogger also recorded whether the customer ‘clicked on an ad’ or ‘did not click on an ad’. The probability that a visitor ‘clicked on an ad’ was 0.0025 and the probability that a visitor ‘clicked on an ad and spent more than 10 minutes browsing’ was 0.00185. Are the events ‘clicked on an ad’ and ‘spent more than 10 minutes browsing’ independent events?

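Let A = ‘clicked on an ad’ and B = ‘spent more than 10 minutes browsing’. From the question, P(A and B) = 0.00185 and P(A) = 0.0025; from b), P(B) = 0.045.

If A and B were independent, P(A and B) would equal P(A) × P(B) = 0.0025 × 0.045 = 0.0001125. Since 0.00185 does not equal 0.0001125, the events ‘clicked on an ad’ and ‘spent more than 10 minutes browsing’ are not independent events.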
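The independence test in (e) compares the joint probability with the product of the two marginal probabilities. A minimal sketch, with illustrative variable names, also shows the equivalent check using a conditional probability.

```python
import math

p_click = 0.0025             # P(clicked on an ad), from the question
p_long = 18 / 400            # P(more than 10 minutes), from part b)
p_click_and_long = 0.00185   # P(clicked on an ad and more than 10 minutes)

# Under independence the joint probability equals the product of the marginals.
product = p_click * p_long
independent = math.isclose(p_click_and_long, product, rel_tol=1e-9)

# Equivalent check: under independence P(long | click) would equal P(long).
p_long_given_click = p_click_and_long / p_click

print(product, independent, round(p_long_given_click, 2))
```

The conditional probability of a long visit given a click is far larger than the unconditional 0.045, which is another way of seeing that the two events are dependent.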

Question 3 (14 marks)

The University of Orkutschk has collected information on the background knowledge of its MBA students. 55% of MBA students at the University have previously studied Statistics, 45% have previously studied Management and 40% have previously studied both.

a)Display the above information in a Venn or tree diagram.

b)What is the probability that a randomly selected MBA student has previously studied neither Statistics nor Management?

P(neither) = 1 - P(S or M) = 1 - (55% + 45% - 40%) = 1 - 60% = 40%

c)An MBA student is randomly selected. It is found that this student has studied Statistics. What is the probability that this student has also studied Management?

P(M | S) = P(S and M) / P(S) = 40% / 55% ≈ 72.73%
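Parts (b) and (c) rest on the addition rule and the definition of conditional probability. A short sketch of the arithmetic, with illustrative variable names and the percentages from the question:

```python
p_stats = 0.55   # P(studied Statistics)
p_mgmt = 0.45    # P(studied Management)
p_both = 0.40    # P(studied both)

# Addition rule: P(S or M) = P(S) + P(M) - P(S and M).
p_either = p_stats + p_mgmt - p_both

# b) Complement rule: P(neither) = 1 - P(S or M).
p_neither = 1 - p_either

# c) Conditional probability: P(M | S) = P(S and M) / P(S).
p_mgmt_given_stats = p_both / p_stats

print(round(p_either, 4), round(p_neither, 4), round(p_mgmt_given_stats, 4))
```

The same numbers fill in a Venn diagram for (a): 15% Statistics only, 40% both, 5% Management only, and 40% outside both circles.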

Next door to the University of Orkutschk lies the University of Qapla. Whilst the University of Qapla does not offer an MBA program, it does offer a 5-day intensive management training program. In this program, 60% of the trainees are female and 40% are male. Of the female trainees, 70% have previously studied Statistics and of the male trainees, 65% have previously studied Statistics.

d)Display the above probabilities in a Venn or tree diagram.

e)What is the probability that a randomly selected trainee is female and has studied Statistics?

0.6*0.7 = 0.42 = 42%

f)What is the probability that a randomly selected trainee has studied Statistics?

0.42+0.4*0.65 = 68%

g)A randomly selected trainee has studied Statistics, what is the probability that the trainee is female?

P(female | Statistics) = 42%/68% ≈ 61.76%
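Parts (e) to (g) follow the branches of a tree diagram: multiply along a branch, add across branches, then apply Bayes' rule. A minimal sketch with illustrative variable names and the percentages from the question:

```python
p_female = 0.60
p_male = 0.40
p_stats_given_female = 0.70
p_stats_given_male = 0.65

# e) Multiplication rule along the female branch of the tree.
p_female_and_stats = p_female * p_stats_given_female

# f) Law of total probability: add the two branches that end in Statistics.
p_male_and_stats = p_male * p_stats_given_male
p_stats = p_female_and_stats + p_male_and_stats

# g) Bayes' rule: P(female | Statistics).
p_female_given_stats = p_female_and_stats / p_stats

print(round(p_female_and_stats, 4), round(p_stats, 4),
      round(p_female_given_stats, 4))
```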
