
² Total Energy Expenditure

³ Proof of Concept

Figure 3.1: Example of biometric data and PPG measurement sections from the application, with bottom navigation bar

The dataset used to create the metabolic syndrome classifier is the SAPBA dataset collected by researchers from North-West University [46]. It contains over 1200 fields and measurements for 409 participants. Many of these fields record the same condition measured with different techniques or different equipment. Since these different measurement techniques fall outside the scope of this study, only a single relevant measurement for each condition was included. Measurements that could not feasibly be collected on a smartphone and do not apply to any of the standard definitions of MetS (discussed in Section 4.2.2) were also excluded. If any of the applicable fields for a participant was blank, the participant was excluded. This left 24 applicable fields for 402 participants. A description of every field and the method used to measure it is provided in Appendix A.
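The participant exclusion rule described above is a complete-case (listwise deletion) filter. The following is a minimal Python sketch, assuming each participant is represented as a dictionary of field values with None marking a blank entry; the field names are hypothetical and are not the SAPBA variable names.

```python
from typing import Dict, List, Optional, Sequence

# A participant record: field name -> value, where None marks a blank entry.
Record = Dict[str, Optional[float]]


def complete_cases(records: Sequence[Record], fields: Sequence[str]) -> List[Record]:
    """Keep only the participants that have a value for every applicable field.

    A participant missing any applicable field is dropped entirely
    (listwise deletion); fields not listed in `fields` are ignored.
    """
    return [r for r in records if all(r.get(f) is not None for f in fields)]


# Hypothetical example: three participants, two applicable fields.
participants = [
    {"age": 45.0, "waist_cm": 92.0, "unused": None},  # kept: blank field is not applicable
    {"age": 51.0, "waist_cm": None},                   # dropped: waist missing
    {"age": None, "waist_cm": 80.0},                   # dropped: age missing
]
kept = complete_cases(participants, ["age", "waist_cm"])
print(len(kept))  # 1
```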

Figure 3.2: Correlation matrix for data from SAPBA dataset

The correlation matrix of all the applicable variables in the SAPBA dataset can be seen in Figure 3.2. The correlation between two variables gives a good initial indication of whether a linear relationship exists between them. Lack of correlation does not necessarily rule out a relationship, though; it may instead indicate that any relationship that does exist is non-linear. Because many of the available inputs may relate to MetS non-linearly, an ANN will be used to determine MetS, since neural networks are particularly well suited to non-linear classification.
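To illustrate why a low correlation does not rule out a relationship, the following Python sketch computes Pearson's coefficient on synthetic data (not SAPBA values). It assumes the matrix in Figure 3.2 uses Pearson's coefficient, which the text does not state explicitly. A variable that depends perfectly but quadratically on another scores r = 0, while a linear dependence scores r = 1.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
linear = [2.0 * v + 1.0 for v in x]  # y depends linearly on x
quadratic = [v ** 2 for v in x]      # y depends fully on x, but not linearly

print(round(pearson(x, linear), 6))     # 1.0
print(round(pearson(x, quadratic), 6))  # 0.0
```

A non-linear model such as an ANN can still exploit the second kind of dependence, which a linear correlation measure misses entirely.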

As can be seen from Figure 3.2, the variables that correlate the most with MetS are (in order of highest absolute correlation to lowest):

1. Waist circumference (0.51)
2. Body mass (0.47)
3. Serum triglyceride levels (0.45)
4. Serum HDL cholesterol (0.43)
5. Diastolic blood pressure (0.42)
6. BMI (0.39)
7. Systolic blood pressure (0.36)
8. Blood glucose (0.30)

As would be expected, the variables that correlate most closely with MetS are those ordinarily used to make a diagnosis. All of them have an absolute correlation between 0.3 and 0.6, which is a moderate correlation but not strong enough for any one of them to support a valid diagnosis on its own. The high correlation of body mass and waist circumference is promising, since both of these values are provided by the user in the application.

Figure 3.3 shows the distribution of some of the features for participants with and without MetS. Comparing the two distributions for each feature may provide some insight into how that feature relates to MetS. From Figure 3.3, it can be seen that below the age of 40-45 the likelihood of having MetS is lower, but above the age of 45 the two distributions are very similar. BMI, HDL cholesterol and SBP all have close to normal distributions, with only a shift in mean between positive and negative MetS diagnoses. Specifically, the mean BMI is about 5 kg/m² higher for a positive diagnosis, the mean HDL cholesterol is about 0.6 mmol/L higher for a negative diagnosis, and the mean SBP differs by about 10 mmHg between the two groups. These differences are substantial, but not to such an extent that a single feature could be used to accurately screen for MetS.
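The mean shifts quoted above are differences between group means. A minimal Python sketch of that comparison follows, assuming labels are coded 1 for a positive and 0 for a negative MetS diagnosis; the values are synthetic and only illustrate the calculation, not the SAPBA results.

```python
from statistics import mean
from typing import Sequence


def mean_difference(values: Sequence[float], labels: Sequence[int]) -> float:
    """Mean of the MetS-positive group minus the mean of the MetS-negative group.

    `labels` holds 1 for a positive MetS diagnosis and 0 for a negative one.
    """
    positive = [v for v, y in zip(values, labels) if y == 1]
    negative = [v for v, y in zip(values, labels) if y == 0]
    return mean(positive) - mean(negative)


# Synthetic values for six participants (not SAPBA data).
labels = [1, 1, 1, 0, 0, 0]
bmi_values = [33.0, 31.0, 35.0, 27.0, 26.0, 28.0]  # kg/m²
hdl_values = [0.9, 1.0, 1.1, 1.6, 1.5, 1.7]        # mmol/L

print(mean_difference(bmi_values, labels))            # 6.0
print(round(mean_difference(hdl_values, labels), 2))  # -0.6
```

A positive result means the feature is higher on average for participants with MetS; a negative result, as for HDL here, means it is higher for those without.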

Figure 3.3: Distribution of metabolic syndrome risk factors

3.4 Preprocessing and feature selection

3.4.1 Main features

The initial features chosen to determine MetS were largely based on [15] and on the features available in the SAPBA dataset. However, since the training data used in this dissertation is different from that used by [15], the results may also differ somewhat. The following features were chosen for the first model:

1. Age
2. Gender
3. BMI
4. Waist-to-height ratio
5. SBP
6. DBP
7. Heart rate

These features were chosen because they are all non-invasive: each can potentially be measured by a smartphone or would already be known by the user.
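Two of these features, BMI and waist-to-height ratio, are derived from values the user supplies rather than measured directly. A minimal Python sketch of the two standard formulas follows, assuming body mass in kilograms, height in metres for BMI, and waist circumference and height in the same unit for the ratio; the function names are illustrative only.

```python
def bmi(mass_kg: float, height_m: float) -> float:
    """Body mass index: mass in kilograms divided by the square of height in metres."""
    return mass_kg / height_m ** 2


def waist_to_height(waist: float, height: float) -> float:
    """Waist-to-height ratio; waist and height must be in the same unit (e.g. cm)."""
    return waist / height


# Example: 72 kg, 1.50 m tall (150 cm), waist circumference 90 cm.
print(bmi(72.0, 1.5))                 # 32.0
print(waist_to_height(90.0, 150.0))   # 0.6
```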

3.4.2 Lifestyle factors

A second model was built that included all the other information that was available in the SABPA dataset that may be useful for determining MetS. The following features were selected and included with the previous features:

1. Medical history
2. Alcohol use
3. Smoking
4. Activity level

Neither medical history nor activity level is directly available in the dataset, so both required some preprocessing to determine. Activity level is discussed in Section 3.4.3. Medical history combines the disease-related fields provided in the SAPBA dataset, such as diabetes or stroke, by means of an OR operation: it is 1 if any of the included conditions is present and 0 otherwise. The conditions included in the OR operation are:

• Cardiovascular disease history
• Stroke history
• Myocardial infarction/cardiac events history
• Kidney disease history
• Atrial fibrillation
• Use of anti-hypertensive drugs
• Use of anti-diabetic drugs
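The OR operation above reduces seven yes/no conditions to a single binary feature. A minimal Python sketch follows; the flag names are hypothetical stand-ins for the corresponding SAPBA fields, and treating an absent flag as "not present" is an assumption (in practice participants with blank fields were already excluded).

```python
from typing import Mapping

# Hypothetical flag names for the seven conditions in the list above.
MEDICAL_HISTORY_FLAGS = (
    "cardiovascular_disease",
    "stroke",
    "myocardial_infarction",
    "kidney_disease",
    "atrial_fibrillation",
    "anti_hypertensive_drugs",
    "anti_diabetic_drugs",
)


def medical_history(flags: Mapping[str, bool]) -> int:
    """Return 1 if any listed condition is present (logical OR), otherwise 0.

    A flag missing from `flags` is treated as not present (an assumption).
    """
    return int(any(flags.get(name, False) for name in MEDICAL_HISTORY_FLAGS))


print(medical_history({"stroke": True}))  # 1
print(medical_history({}))                # 0
```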

Note that although some of these features (such as medical history) are not necessarily lifestyle-related, for simplicity the two models will hereafter be distinguished by whether or not they include lifestyle factors.

3.4.3 Activity level

Several studies have shown the impact of exercise on MetS [7][3][17]. The SAPBA dataset does not include any direct indicator of activity level; however, it does include TEE (Total Energy Expenditure). This is a measure of the total number of calories an individual expends in a day, and was measured using an activity tracker. The BMR⁴, which is the number of calories required per day without any physical activity, can be estimated with the revised Harris-Benedict equations [66]:

Table 3.1: Harris-Benedict equations as revised by Mifflin and St Jeor

Men: BMR = (10 × weight in kg) + (6.25 × height in cm) - (5 × age in years) + 5
Women: BMR = (10 × weight in kg) + (6.25 × height in cm) - (5 × age in years) - 161

The activity level is then determined from the ratio of TEE to BMR. The following classification is defined by the original Harris-Benedict equations [66]:

• Little/no exercise: TEE/BMR = 1.2
• Light exercise: TEE/BMR = 1.375
• Moderate exercise (3-5 days/wk): TEE/BMR = 1.55
• Very active (6-7 days/wk): TEE/BMR = 1.725
• Extra active (very active & physical job): TEE/BMR = 1.9

With this classification, it will be easy to assign an activity level to users of the application.
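The BMR estimate and the TEE/BMR classification can be sketched in Python as follows. The BMR function uses the published Mifflin-St Jeor coefficients, in which the age term is subtracted. The text gives only one nominal ratio per level, so assigning a measured ratio to the level with the closest factor is an assumption made here for illustration, not necessarily the rule applied to the SAPBA data.

```python
# Nominal TEE/BMR factor for each activity level, as listed above.
ACTIVITY_FACTORS = {
    "Little/no exercise": 1.2,
    "Light exercise": 1.375,
    "Moderate exercise": 1.55,
    "Very active": 1.725,
    "Extra active": 1.9,
}


def bmr(weight_kg: float, height_cm: float, age_years: float, male: bool) -> float:
    """Estimated basal metabolic rate in kcal/day (Mifflin-St Jeor, age term subtracted)."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age_years
    return base + (5 if male else -161)


def activity_level(tee_kcal: float, bmr_kcal: float) -> str:
    """Return the level whose nominal factor is closest to the measured TEE/BMR ratio.

    Nearest-factor matching is an assumption: the text does not define
    boundaries between the levels.
    """
    ratio = tee_kcal / bmr_kcal
    return min(ACTIVITY_FACTORS, key=lambda level: abs(ACTIVITY_FACTORS[level] - ratio))


# Example: a 40-year-old man, 80 kg and 180 cm, with a measured TEE of 2700 kcal/day.
estimate = bmr(80, 180, 40, male=True)
print(estimate)                        # 1730.0
print(activity_level(2700, estimate))  # Moderate exercise
```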

In the model, the TEE/BMR ratio will be normalised so that each option a user can select corresponds to the following value:

• Little/no exercise: 0
• Light exercise: 0.25
• Moderate exercise (3-5 days/wk): 0.5
• Very active (6-7 days/wk): 0.75
• Extra active (very active & physical job): 1
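These five values are consistent with a linear rescaling of the TEE/BMR factor from the range 1.2 to 1.9 onto 0 to 1, that is (TEE/BMR - 1.2) / (1.9 - 1.2). The text does not state this formula explicitly, so the following Python sketch is an interpretation; clipping measured ratios that fall outside 1.2 to 1.9 is an additional assumption.

```python
LOW, HIGH = 1.2, 1.9  # TEE/BMR factors of the lowest and highest activity levels


def normalise_ratio(ratio: float) -> float:
    """Map a TEE/BMR ratio linearly from [1.2, 1.9] onto [0, 1], clipping values outside it."""
    scaled = (ratio - LOW) / (HIGH - LOW)
    return min(1.0, max(0.0, scaled))


# The five nominal factors map to 0.0, 0.25, 0.5, 0.75 and 1.0 (after rounding).
for factor in (1.2, 1.375, 1.55, 1.725, 1.9):
    print(factor, round(normalise_ratio(factor), 6))
```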

3.4.4 Scaling

All of the features were scaled to lie between 0 and 1. Binary features take a value of either 0 or 1, and activity level takes one of five values ranging from 0 to 1. Continuous (analog) features were scaled to the range 0 to 1 using min-max scaling, with the minimum and maximum of each feature taken from the training set.
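Min-max scaling maps each value v of a feature to (v - min) / (max - min), with min and max taken from the training set only, which keeps the test data from influencing the scaling. A minimal Python sketch follows; the SBP values are synthetic, and mapping a constant feature to 0 is an assumption added to avoid division by zero.

```python
from typing import List, Sequence, Tuple


def fit_min_max(train_column: Sequence[float]) -> Tuple[float, float]:
    """Return (min, max) of one feature, computed on the training set only."""
    return min(train_column), max(train_column)


def min_max_scale(values: Sequence[float], lo: float, hi: float) -> List[float]:
    """Scale values with the training-set min and max.

    Training values land in [0, 1]; unseen values outside the training
    range (e.g. in a test set) can fall outside [0, 1].
    """
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]  # constant feature: map everything to 0 (assumption)
    return [(v - lo) / span for v in values]


train_sbp = [110.0, 120.0, 150.0, 130.0]  # synthetic SBP values in mmHg
lo, hi = fit_min_max(train_sbp)
print(min_max_scale(train_sbp, lo, hi))  # [0.0, 0.25, 1.0, 0.5]
print(min_max_scale([170.0], lo, hi))    # [1.5]
```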

A description of each feature's value range is given in Table 3.2.

⁴ Basal Metabolic Rate

Table 3.2: Value ranges of metabolic syndrome features

Feature                  Value range
Age                      0.0-1.0
Gender                   0 or 1
BMI                      0.0-1.0
Waist-to-height ratio    0.0-1.0
SBP                      0.0-1.0
DBP                      0.0-1.0
Heart rate               0.0-1.0
Medical history          0 or 1
Alcohol use              0 or 1
Smoking                  0 or 1
Activity level           0.0-1.0