
Recoding missing values into another category

The missing value group has the highest median and average systolic blood pressure. In order to create a new variable with three levels type:

> saltadd1 <- saltadd

> levels(saltadd1) <- c("no", "yes", "missing")

> saltadd1[is.na(saltadd)] <- "missing"

> summary(saltadd1)

     no     yes missing
     37      43      20
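The recoding steps above can be tried on a small stand-alone example. The following is a minimal sketch that assumes a toy factor named 'salt' with invented values (not the study data); it also shows 'addNA' as one possible shortcut for the same result.

```r
# Toy factor standing in for 'saltadd' (values invented for illustration)
salt <- factor(c("no", "yes", NA, "yes", NA, "no", "yes"))

# Same steps as in the text: add a third level, then recode the NAs
salt1 <- salt
levels(salt1) <- c(levels(salt), "missing")
salt1[is.na(salt)] <- "missing"
summary(salt1)

# Possible shortcut: addNA() makes NA a level, which is then relabelled
salt2 <- addNA(salt)
levels(salt2)[is.na(levels(salt2))] <- "missing"
table(salt1, salt2)   # all counts fall on the diagonal
```

Appending to 'levels(salt)' rather than retyping all three labels avoids accidentally swapping the order of the existing levels.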

> summary(aov(age ~ saltadd1))

            Df  Sum Sq Mean Sq F value Pr(>F)
saltadd1     2   114.8    57.4  0.4484   0.64
Residuals   97 12421.8   128.1

Since there is not enough evidence that the missing group is important and for additional reasons of simplicity, we will ignore this group and continue the analysis with the original 'saltadd' variable consisting of only two levels. Before doing this however, a simple regression model and regression line are first fitted.

> lm1 <- lm(sbp ~ age)

> summary(lm1)

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  65.1465    14.8942   4.374 3.05e-05
age           1.8422     0.2997   6.147 1.71e-08

Residual standard error: 33.56 on 98 degrees of freedom
Multiple R-Squared: 0.2782, Adjusted R-squared: 0.2709
F-statistic: 37.78 on 1 and 98 DF, p-value: 1.712e-08

Although the R-squared is not very high, the P value is small, indicating that age is significantly associated with systolic blood pressure.

A scatterplot of age against systolic blood pressure is now shown with the regression line added using the 'abline' function, previously mentioned in chapter 11. This function can accept many different argument forms, including a regression object. If this object has a 'coef' method, and it returns a vector of length 1, then the value is taken to be the slope of a line through the origin, otherwise the first two values are taken to be the intercept and slope, as is the case for 'lm1'.

> plot(age, sbp, main = "Systolic BP by age", xlab = "Years", ylab = "mm.Hg")

> coef(lm1)

(Intercept)         age
    65.1465      1.8422

> abline(lm1)

[Figure: scatterplot "Systolic BP by age" with the fitted regression line; x-axis "Years" (30 to 70), y-axis "mm.Hg" (100 to 200)]
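The two ways of passing a line to 'abline' described above can be compared directly. This is a minimal sketch on simulated data; the variables below are generated for illustration and are not the study data.

```r
# Simulated stand-in data (not the study data)
set.seed(1)
age <- round(runif(50, 30, 70))
sbp <- 65 + 1.8 * age + rnorm(50, sd = 30)

fit <- lm(sbp ~ age)
cf <- coef(fit)   # named vector of length 2: (Intercept), age

plot(age, sbp, xlab = "Years", ylab = "mm.Hg")
abline(fit)                                          # line taken from the model object
abline(a = cf[1], b = cf[2], lty = 2, col = "red")   # same line, explicit intercept and slope
```

The dashed red line should lie exactly on top of the solid one, since both use the same two coefficients.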

Subsequent exploration of residuals suggests a non-significant deviation from normality and no pattern. Details of this can be adopted from the techniques discussed in the previous chapter and are omitted here. The next step is to provide different plot patterns for different groups of salt habits.

> lm2 <- lm(sbp ~ age + saltadd)

> summary(lm2)

====================

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  63.1291    15.7645   4.005 0.000142
age           1.5526     0.3118   4.979 3.81e-06
saltaddyes   22.9094     6.9340   3.304 0.001448
---

Residual standard error: 30.83 on 77 degrees of freedom
Multiple R-Squared: 0.3331, Adjusted R-squared: 0.3158
F-statistic: 19.23 on 2 and 77 DF, p-value: 1.68e-07

On average, each additional year of age is associated with an increase in systolic blood pressure of about 1.55 mmHg, adjusted for salt use. Adding table salt is associated with a significantly higher systolic blood pressure, by approximately 23 mmHg at any given age.
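The meaning of the 'saltaddyes' coefficient can be checked with 'predict'. The sketch below uses simulated data (the generating values are assumptions chosen to resemble the output above, not the study data) and compares two hypothetical subjects of the same age who differ only in salt habit.

```r
# Simulated stand-in data (not the study data)
set.seed(2)
n <- 80
age <- round(runif(n, 30, 70))
saltadd <- factor(sample(c("no", "yes"), n, replace = TRUE))
sbp <- 63 + 1.5 * age + 23 * (saltadd == "yes") + rnorm(n, sd = 30)

fit2 <- lm(sbp ~ age + saltadd)

# Two hypothetical subjects aged 50, one in each salt group
new <- data.frame(age = c(50, 50),
                  saltadd = factor(c("no", "yes"), levels = levels(saltadd)))
pred <- predict(fit2, newdata = new)

diff(pred)                  # predicted difference, yes minus no
coef(fit2)["saltaddyes"]    # the same number
```

Changing the age from 50 to any other value leaves the difference unchanged, because the model has no interaction term.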

Similar to the method used in the previous chapter, the following step creates an empty frame for the plots:

> plot(age, sbp, main="Systolic BP by age", xlab="Years", ylab="mm.Hg", type="n")

Add blue hollow circles for subjects who did not add table salt.

> points(age[saltadd=="no"], sbp[saltadd=="no"], col="blue")

Then add red solid points for those who did add table salt.

> points(age[saltadd=="yes"], sbp[saltadd=="yes"], col="red", pch = 18)

Note that the red dots corresponding to those who added table salt are higher than the blue circles. The final task is to draw a separate regression line for each group.

Since model 'lm2' contains 3 coefficients, the command abline now requires the argument 'a' as the intercept and 'b' as the slope.

> coef(lm2)

(Intercept)         age  saltaddyes
  63.129112    1.552615   22.909449

We now have two regression lines to draw, one for each group. The intercept for non-salt users will be the first coefficient and for salt users will be the first plus the third. The slope for both groups is the same. Thus the intercept for the non-salt users is:

> a0 <- coef(lm2)[1]

For the salt users, the intercept is the first plus the third coefficient:

> a1 <- coef(lm2)[1] + coef(lm2)[3]

For both groups, the slope is fixed at:

> b <- coef(lm2)[2]

Now the first (lower) regression line is drawn in blue, then the other in red.

> abline(a = a0, b = b, col = "blue")

> abline(a = a1, b = b, col = "red")

Note that the X-axis does not start at zero. Thus the intercepts are outside the plot frame.

The red line is for the red points of salt adders and the blue line is for the blue points of non-adders. In this model, age has a constant independent effect on systolic blood pressure.
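That the two lines are parallel can be confirmed numerically. This short sketch re-enters the coefficients printed by 'coef(lm2)' above and evaluates both lines at three arbitrarily chosen ages; the names 'cf2', 'ages' and 'gap' are introduced here for illustration only.

```r
# Coefficients as printed by coef(lm2) in the text
cf2 <- c("(Intercept)" = 63.129112, "age" = 1.552615, "saltaddyes" = 22.909449)

ages <- c(30, 50, 70)   # arbitrary ages within the plotted range
no_salt  <- unname(cf2["(Intercept)"] + cf2["age"] * ages)
yes_salt <- no_salt + unname(cf2["saltaddyes"])   # same slope, shifted intercept

gap <- yes_salt - no_salt
data.frame(age = ages, no_salt = no_salt, yes_salt = yes_salt, gap = gap)
```

The 'gap' column is the same at every age, which is what a model without an interaction term implies.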

Look at the distribution of the points of the two colours: the red points are higher than the blue ones, but mainly in the right half of the graph. To fit lines with different slopes, a new model with an interaction term is created.

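[Figure: scatterplot "Systolic BP by age" with two parallel regression lines, blue for non-salt adders and red for salt adders; x-axis "Years" (30 to 70), y-axis "mm.Hg" (100 to 200)]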

The next step is to prepare a model with different slopes (or different 'b' for the abline arguments) for different lines. The model needs an interaction term between 'saltadd' and 'age'.

> lm3 <- lm(sbp ~ age * saltadd)

> summary(lm3)

Call:
lm(formula = sbp ~ age * saltadd)

===============

Coefficients:
               Estimate Std. Error t value Pr(>|t|)
(Intercept)     78.0066    20.3981   3.824 0.000267 ***
age              1.2419     0.4128   3.009 0.003558 **
saltaddyes     -12.2540    31.4574  -0.390 0.697965
age:saltaddyes   0.7199     0.6282   1.146 0.255441
---

Multiple R-Squared: 0.3445, Adjusted R-squared: 0.3186
F-statistic: 13.31 on 3 and 76 DF, p-value: 4.528e-07

In the formula part of the model, 'age * saltadd' is the same as 'age + saltadd + age:saltadd'. The four coefficients are displayed in the summary of the model. They can also be checked as follows.

> coef(lm3)

   (Intercept)            age     saltaddyes age:saltaddyes
    78.0065572      1.2418547    -12.2539696      0.7198851

The first coefficient is the intercept of the fitted line among non-salt users.

For the intercept of the salt users, the second and fourth terms are both zero (since age is zero), but the third must be kept. This term is negative, so the intercept of the salt users is lower than that of the non-users.

> a0 <- coef(lm3)[1]

> a1 <- coef(lm3)[1] + coef(lm3)[3]

For the slope of the non-salt users, the second coefficient alone is enough: the first and third coefficients do not change with age, and the fourth term vanishes because 'saltaddyes' is 0. The slope for the salt users includes both the second and the fourth coefficients, since 'saltaddyes' is 1.

> b0 <- coef(lm3)[2]

> b1 <- coef(lm3)[2] + coef(lm3)[4]

These terms are used to draw the two regression lines.

Redraw the graph but this time with black representing the non-salt adders.

> plot(age, sbp, main="Systolic BP by age", xlab="Years", ylab="mm.Hg", pch=18, col=as.numeric(saltadd))

> abline(a = a0, b = b0, col = 1)

> abline(a = a1, b = b1, col = 2)

> legend("topleft", legend = c("Salt added", "No salt added"), lty=1, col=c("red","black"))

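[Figure: scatterplot "Systolic BP by age" with two regression lines of different slopes; x-axis "Years" (30 to 70), y-axis "mm.Hg" (100 to 200); legend: "Salt added" (red), "No salt added" (black)]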

Note that 'as.numeric(saltadd)' converts the factor levels into the integers 1 (black) and 2 (red), representing the non-salt adders and the salt adders, respectively. These colour codes come from the R colour palette.

This model suggests that at young ages the systolic blood pressures of the two groups do not differ much, as the two lines are close together on the left of the plot. For example, at the age of 25 the difference is 5.7 mmHg. The difference between the two groups increases with age: at 70 years it is as great as 38 mmHg. (For simplicity, the procedures for computing these two differences are skipped in these notes.) In this respect, age modifies the effect of adding table salt.
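One possible way to obtain the two differences quoted above is sketched here. It re-enters the coefficients printed by 'coef(lm3)' and uses the fact that, at a given age, the difference between the two fitted lines is the 'saltaddyes' coefficient plus the interaction coefficient times age. The helper 'salt_diff' is a name made up for this illustration.

```r
# Coefficients as printed by coef(lm3) in the text
cf <- c("(Intercept)" = 78.0065572, "age" = 1.2418547,
        "saltaddyes" = -12.2539696, "age:saltaddyes" = 0.7198851)

# Difference (salt adders minus non-adders) in fitted systolic BP at a given age
salt_diff <- function(age) {
  unname(cf["saltaddyes"] + cf["age:saltaddyes"] * age)
}

round(salt_diff(c(25, 70)), 1)
# 5.7 38.1
```

With the model object available, the same figures could presumably be obtained by applying 'predict' to a data frame containing both salt groups at each age.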

On the other hand, the slope of age is 1.24 mmHg per year among those who did not add salt but becomes 1.24 + 0.72 = 1.96 mmHg per year among the salt adders. Thus, salt adding modifies the effect of age. Interaction is a statistical term whereas effect modification is the equivalent epidemiological term.

The coefficient of the interaction term 'age:saltaddyes' is, however, not statistically significant (P = 0.255). The difference between the two slopes may therefore be due to chance.
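The test of the interaction term can also be framed as a comparison of the model with parallel lines against the model with separate slopes. The sketch below uses simulated data (all generating values are assumptions, not the study data). It also confirms that the 'age * saltadd' shorthand and the expanded formula give the same fit.

```r
# Simulated stand-in data (not the study data)
set.seed(3)
n <- 80
age <- round(runif(n, 30, 70))
saltadd <- factor(sample(c("no", "yes"), n, replace = TRUE))
sbp <- 70 + 1.3 * age + 0.6 * age * (saltadd == "yes") + rnorm(n, sd = 30)

fit_add  <- lm(sbp ~ age + saltadd)                 # parallel lines
fit_int  <- lm(sbp ~ age * saltadd)                 # separate slopes
fit_long <- lm(sbp ~ age + saltadd + age:saltadd)   # expanded form of the same model

all.equal(coef(fit_int), coef(fit_long))            # same coefficients either way

# F test for the single added term versus the t test in the summary table
cmp <- anova(fit_add, fit_int)
p_f <- cmp[2, "Pr(>F)"]
p_t <- summary(fit_int)$coefficients["age:saltaddyes", "Pr(>|t|)"]
c(F_test = p_f, t_test = p_t)                       # the two P values coincide
```

Because the two models differ by exactly one coefficient, the F test and the t test are equivalent here; 'anova' becomes more useful when several interaction terms are added at once.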

Exercise_________________________________________________

Plot systolic and diastolic blood pressures of the subjects, using red for males and blue for females, as shown in the following figure. [Hint: segments]

[Figure: "Systolic and diastolic blood pressure of the subjects"; x-axis "Index" (0 to 100), y-axis "blood pressure" (0 to 200)]

Check whether there is any significant difference of diastolic blood pressure among