4.2 Method
4.2.4 Data processing
4.2.5.1 Introduction of typical classification models
In the theory of statistical learning, models are divided into supervised learning models and unsupervised learning models according to whether the data are labeled. In this study, each participant was labeled as a faller or a non-faller based on fall history, so we used supervised learning models to classify fallers and non-fallers. Many supervised learning models have been proposed for classification. Ordered by flexibility, these classification models include linear models, tree models, neural network models, and support vector machines (Friedman et al., 2001).
The basic linear model for classification is the logistic regression model, a regression model that measures the relationship between a categorical variable and one or more independent variables by estimating probabilities with a logistic function (Hilbe, 2009). Logistic regression learns p(y|x) directly, i.e. a mapping from the space of inputs x to the labels y; models of this kind are called discriminative learning models. Another class of models, generative learning models, instead model p(x|y) and use Bayes' rule to derive the posterior distribution of y given x:

p(y|x) = p(x|y)p(y) / p(x)
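The generative view can be made concrete with a small numeric sketch of Bayes' rule for the faller/non-faller posterior; all probabilities below are illustrative values, not results from the study's data.

```python
# Posterior via Bayes' rule: p(y|x) = p(x|y) * p(y) / p(x).
# The priors and likelihoods here are hypothetical, for illustration only.

p_y = {"faller": 0.3, "non-faller": 0.7}          # class priors p(y)
p_x_given_y = {"faller": 0.8, "non-faller": 0.2}  # likelihood p(x|y) for one observed x

# Evidence p(x) = sum over classes of p(x|y) p(y)
p_x = sum(p_x_given_y[y] * p_y[y] for y in p_y)

# Posterior p(y|x) for each class
posterior = {y: p_x_given_y[y] * p_y[y] / p_x for y in p_y}
print(posterior)  # the two posteriors sum to 1
```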
Linear discriminant analysis, also called Gaussian discriminant analysis, assumes that p(x|y) is a multivariate Gaussian distribution, where x is a vector of continuous variables. Another generative learning model is the naive Bayes model, which is also based on Bayes' theorem but adds independence assumptions between predictor variables.
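The two generative models just named can be contrasted in a minimal scikit-learn sketch; the two-dimensional Gaussian data below are synthetic stand-ins, not the study's measurements.

```python
# Linear discriminant analysis vs. Gaussian naive Bayes on synthetic data:
# LDA assumes a shared covariance across classes (linear boundary), while
# naive Bayes assumes the features are independent within each class.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)
# Two Gaussian classes in 2-D: class 0 centred at (0, 0), class 1 at (2, 2)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

lda = LinearDiscriminantAnalysis().fit(X, y)
nb = GaussianNB().fit(X, y)

print(lda.score(X, y), nb.score(X, y))  # training accuracies
```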
Tree-based models partition the feature space into a set of rectangles, and then fit a simple model in each one. A popular and simple tree-based model is the classification and regression tree (CART). Basic tree models are easy to explain, mirror human decision-making more closely than regression-based classification approaches do, and can be displayed graphically. Unfortunately, basic trees generally do not reach the same level of predictive accuracy. However, by aggregating many basic decision trees with methods such as bagging, random forests, and boosting, the predictive performance of trees can be substantially improved. The boosted regression tree differs fundamentally from basic tree models (e.g. CART) that produce a single 'best' model: it uses the technique of boosting to combine large numbers of relatively simple tree models adaptively, to optimize predictive performance (Elith et al., 2008). Unlike boosted tree models, random forests add an additional layer of randomness to bagging. In addition to constructing each tree from a different bootstrap sample of the data, random forests change how the classification or regression trees are constructed (Liaw and Wiener, 2002).
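The three tree-based approaches discussed above can be sketched side by side with scikit-learn; the data, the nonlinear labelling rule, and the hyper-parameters below are illustrative choices only.

```python
# A single CART, a random forest (bagging plus feature randomness), and a
# boosted tree ensemble, fitted on the same synthetic classification task.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 5))
y = (X[:, 0] + X[:, 1] ** 2 > 1).astype(int)  # nonlinear labelling rule
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for name, model in [
    ("CART", DecisionTreeClassifier(random_state=0)),
    ("Random forest", RandomForestClassifier(n_estimators=100, random_state=0)),
    ("Boosted tree", GradientBoostingClassifier(random_state=0)),
]:
    model.fit(X_tr, y_tr)
    print(name, model.score(X_te, y_te))  # held-out accuracy
```

In practice the two ensembles usually outperform the single tree on held-out data, which is the improvement from aggregation described above.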
Neural network models in artificial intelligence are usually known as artificial neural networks (ANNs).
An ANN is an information-processing paradigm inspired by the way biological nervous systems, such as the brain, process information (Anderson, 1995). A neural network has several input, hidden, and output nodes.
Each node applies a function (e.g. linear, logistic) and returns an output. Every node in the succeeding layer takes a weighted average of the outputs of the previous layer, until an output is reached. The reasoning is that multiple nodes can collectively gain insight into a problem (such as classification) that an individual node cannot. The weights between nodes are adjusted to minimize a cost function specific to this type of model. However, ANNs often converge on local minima rather than global minima, meaning that they sometimes "miss the big picture", or miss the forest for the trees. ANNs also often overfit if training goes on too long: for a given pattern, an ANN may start to treat the noise as part of the pattern.
A more advanced model that overcomes these disadvantages of ANNs is the support vector machine (SVM), which constructs a hyperplane or set of hyperplanes in a high- or infinite-dimensional space that can be used for classification, regression, or other tasks (Cristianini and Shawe-Taylor, 2000). In detail, an SVM fits a hyperplane between two different classes subject to a maximum-margin criterion: the hyperplane separates the classes so that each falls on one side of the plane, by a specified margin. A specific cost function for this kind of model adjusts the plane until the error is minimized.
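A minimal SVM sketch with a radial basis function kernel is given below; the circular decision boundary and the hyper-parameter values are illustrative assumptions, chosen only to show where the kernel helps.

```python
# RBF-kernel SVM: the kernel lifts the data into a higher-dimensional space
# where a separating hyperplane exists; C trades margin width against
# training error.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(3)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # circular boundary

svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(svm.score(X, y))  # a linear kernel could not separate this boundary
```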
There is a trade-off between prediction accuracy and model interpretability (James et al., 2014). Models with low flexibility, such as logistic regression, have good interpretability and low variance in prediction accuracy, but lower prediction accuracy and restrictive assumptions. On the other hand, highly flexible models such as SVMs offer high prediction accuracy, but with high variance in that accuracy and low interpretability. We therefore selected typical models spanning different levels of flexibility. In this study, six typical statistical models were used to assess fall risk (Table 4.8): the linear models logistic regression and linear discriminant analysis, the basic tree model CART (classification and regression tree), the advanced tree models boosted tree and random forest, and the support vector machine (SVM) with radial basis function kernel.
TABLE 4.8: Typical fall risk assessment models.

Model type                      Typical model
Linear model                    Logistic regression
                                Linear discriminant analysis
Basic tree model                CART: classification and regression tree
Advanced tree model             Boosted tree
                                Random forest
Support vector machine (SVM)    SVM radial basis function (SVMRBF)
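A hedged sketch of how the six models in Table 4.8 could be compared under cross-validation follows; the dataset is a synthetic stand-in for the labelled faller/non-faller data, and all hyper-parameters are defaults, not the settings used in this study.

```python
# Five-fold cross-validated accuracy for the six model types of Table 4.8
# on synthetic data, illustrating the comparison procedure only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
X = rng.normal(size=(200, 6))
y = (X[:, 0] - X[:, 1] + rng.normal(0, 0.5, 200) > 0).astype(int)

models = {
    "Logistic regression": LogisticRegression(),
    "Linear discriminant analysis": LinearDiscriminantAnalysis(),
    "CART": DecisionTreeClassifier(random_state=0),
    "Boosted tree": GradientBoostingClassifier(random_state=0),
    "Random forest": RandomForestClassifier(random_state=0),
    "SVM RBF": SVC(kernel="rbf"),
}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: {scores.mean():.3f}")
```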