I would like to thank the current head of the Department of Computer Science and Engineering for providing an excellent research environment. All the proposed models are evaluated on two different relation classification tasks, namely clinical relation classification (CRC) and drug-drug interaction (DDI) extraction.
Problem Formulation
Sequence Labeling Problem: Sequence labeling is the task of assigning to each element of a sequence a label from a predefined label set L. In the literature, the named entity recognition task is modeled as a sequence labeling problem, where each word in a text is assigned a label from a predefined set.
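As an illustration, the sequence-labeling view of NER assigns one label per token; the sentence and the BIO-style label set below are made up for the example and are not taken from any of the thesis datasets.

```python
# Illustrative only: a tokenized sentence paired with labels from a hypothetical
# BIO label set L = {B-Disease, I-Disease, O}.
tokens = ["The", "patient", "was", "diagnosed", "with", "breast", "cancer", "."]
labels = ["O", "O", "O", "O", "O", "B-Disease", "I-Disease", "O"]

for token, label in zip(tokens, labels):
    print(f"{token}\t{label}")
```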
Standard Methods
High performance levels have also been achieved through this method [Li, 2012, Bjorne et al., 2013]. Feature-based methods use sentences with predefined entities to construct a feature vector through feature extraction [Hong, 2005, Minard et al., 2011, Rink et al., 2011].
Contribution of the Thesis
We evaluate the performance of the proposed models on clinical relation classification and drug-drug interaction extraction tasks, using the I2B2/VA-2010 clinical relation classification and SemEval 2013 DDI extraction challenge datasets.
Outline of the Thesis
This chapter briefly describes all relevant neural networks used in our proposed models and their training mechanisms. Finally, we conclude this chapter by reporting on the corpus and preprocessing used to obtain the pre-trained word vectors used in our experiments.
Neural Networks
For computing the output h_t, the RNN uses the input x_t as well as the hidden state from time step (t−1). In the LSTM, the final output of a hidden state is calculated from the current state of the memory cell and the output gate.
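For reference, a standard textbook formulation of these two updates (the symbols may differ slightly from those used in the thesis) is:

```latex
% Vanilla RNN hidden-state update
h_t = \tanh\left(W_{xh} x_t + W_{hh} h_{t-1} + b_h\right)

% LSTM output: the hidden state is computed from the memory cell c_t and the output gate o_t
o_t = \sigma\left(W_{xo} x_t + W_{ho} h_{t-1} + b_o\right), \qquad h_t = o_t \odot \tanh(c_t)
```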
Training of Neural Network
Cross Entropy Loss
Backpropagation Algorithm
Parameter Update
Word Embedding
CBOW: CBOW takes a window of preceding and following words as its input and predicts the center word as its output. Skip-gram: In contrast, the skip-gram model takes a single word as input and predicts the surrounding context words as output.
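A minimal sketch of training both variants with the gensim library (gensim ≥ 4 API) on a toy corpus, rather than the original C tool and the biomedical corpus used in the thesis:

```python
# Illustrative only: CBOW (sg=0) vs. skip-gram (sg=1) embeddings with gensim.
from gensim.models import Word2Vec

sentences = [
    ["aspirin", "inhibits", "platelet", "aggregation"],
    ["warfarin", "interacts", "with", "aspirin"],
]

cbow_model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=0)      # CBOW
skipgram_model = Word2Vec(sentences, vector_size=50, window=2, min_count=1, sg=1)  # skip-gram

print(cbow_model.wv["aspirin"][:5])  # first 5 dimensions of the learned vector
```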
GloVe
For our experiments, we downloaded the word2vec tool, which is efficiently implemented in C. It should be noted here that when the number of co-occurrences between two words is zero, the weighting function also returns zero.
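For completeness, the weighting function commonly used in GloVe is shown below; the cutoff and exponent values follow the original GloVe paper and may differ from the settings used here:

```latex
f(x) =
\begin{cases}
  \left( x / x_{\max} \right)^{\alpha} & \text{if } x < x_{\max} \\
  1 & \text{otherwise}
\end{cases}
\qquad \text{with, e.g., } x_{\max} = 100,\ \alpha = 0.75
```

so that f(0) = 0 and zero-co-occurrence pairs contribute nothing to the GloVe loss.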
Corpus Data and Preprocessing
Evaluation Metrics of NER and RE
Conclusion
Named entity recognition is one of the necessary steps in some biomedical and clinical information extraction tasks. We compare the performance of the proposed model with that of existing state-of-the-art models on benchmark datasets for all three tasks.
Introduction
Although the proposed methods achieve strong results on several generic-domain sequence tagging tasks, their performance fails to surpass the state of the art in the biomedical domain [Yao et al., 2015]. It has been observed that word-level embeddings preserve the syntactic and semantic properties of a word but may fail to retain morphological information, which can also play an important role in biomedical entity recognition [dos Santos and Zadrozny, 2014, Lample et al., 2016, Mahbub Chowdhury and Lavelli, 2010, Leaman and Gonzalez, 2008]. The learned feature vectors are used in a CRF layer to predict the correct label sequence for a sentence.
Model Architecture
- Features Layer
- Word BLSTM Layer
- CRF Layer
- Training and Implementation
The output of the feature layer is a sequence of vectors corresponding to the sequence of words in a sentence. Thus, a fixed-length window limits the ability of the learned vectors to acquire knowledge of the entire sentence. The output of the Word BLSTM layer is again a sequence of vectors, which now carry contextual and semantic information.
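A minimal PyTorch sketch of how such a feature layer and word-level BLSTM could be wired together; the layer sizes, the char_feats input standing in for character-level features, and the tag count are assumptions, and the CRF layer that would consume the emission scores is omitted:

```python
# Hypothetical sketch: feature layer + word BLSTM producing per-token emission scores.
import torch
import torch.nn as nn

class WordBLSTMFeatures(nn.Module):
    def __init__(self, vocab_size, word_dim=100, char_feat_dim=30, hidden=100, num_tags=9):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        # char_feat_dim stands in for a character-level feature vector per word.
        self.blstm = nn.LSTM(word_dim + char_feat_dim, hidden,
                             bidirectional=True, batch_first=True)
        self.emission = nn.Linear(2 * hidden, num_tags)  # scores to be fed to a CRF layer

    def forward(self, word_ids, char_feats):
        # word_ids: (batch, seq_len); char_feats: (batch, seq_len, char_feat_dim)
        x = torch.cat([self.word_emb(word_ids), char_feats], dim=-1)
        h, _ = self.blstm(x)      # (batch, seq_len, 2*hidden): contextual features
        return self.emission(h)   # per-token emission scores
```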
The Benchmark Tasks
Recognizing disease name entities in text is crucial for gaining knowledge about diseases [Bundschus et al., 2008, Agarwal and Searls, 2008]. This dataset has four drug entity types, namely drug, brand, group, and drug_n. The publicly available (under license) I2B2/VA challenge dataset is used to identify clinical entities [Uzuner et al., 2011].
Results and Discussion
- Experimental Design
- Baseline Methods
- Comparison with Baseline
- Comparison with Other Methods
- Feature Ablation Study
- Effects of CRF and BLSTM
- Analysis of Learned Word Embeddings
NA indicates that the word is not present in the vocabulary of the GloVe vectors. Next, we analyze the characteristics of the learned word embeddings after training the proposed model. After training, we obtained a character-based word embedding for each word in the drug NER dataset.
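A minimal way to carry out such a nearest-neighbor inspection is a cosine-similarity lookup over the learned embedding matrix; the function below is an illustrative sketch, with vocab and embeddings as placeholder names rather than the thesis's code:

```python
# Illustrative nearest-neighbor lookup over learned embeddings via cosine similarity.
import numpy as np

def nearest_neighbors(word, vocab, embeddings, k=5):
    """vocab: list of words; embeddings: (len(vocab), dim) array of learned vectors."""
    idx = vocab.index(word)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sims = normed @ normed[idx]          # cosine similarity to the query word
    order = np.argsort(-sims)            # most similar first
    return [vocab[i] for i in order if i != idx][:k]
```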
Conclusion
In the previous chapter, we examined how different neural networks can be used in a hierarchy to learn rich morphological and contextual feature representations for various biomedical and clinical entity recognition tasks. This chapter investigates the use of convolutional neural networks (CNNs) in another information extraction subtask, namely the relation classification task. Here, we assume that all the entities of interest in the texts are already given, and we need to classify the semantic relation between the entities into one of a set of predefined categories.
Introduction
Feature-based methods apply feature extraction techniques to the contextual information present in a sentence containing predefined entities in order to obtain a vector representation [Zelenko et al., 2003, Culotta and Sorensen, 2004, Hong, 2005, Minard et al., 2011, Qian and Zhou, 2012, Zeng et al., 2014]. Another problem faced by these methods is that feature extraction must be tailored to the data source. CNNs have been shown to be dominant models for solving problems in image processing and computer vision [Krizhevsky et al., 2012, Karpathy and Fei-Fei, 2014].
Model Architecture
P1: The distance to the first entity in terms of the number of words [Collobert and Weston, 2008]. This value is zero for all words included in the first entity. The length (m − c + 1) of the convolution layer output varies with the number of words m in the sentence, where c is the filter length.
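An illustrative sketch of how the P1 distance feature could be computed per token; the function name and the inclusive-span convention are assumptions for the example:

```python
# Illustrative P1 position feature: signed word distance of each token from the
# first entity; tokens inside the entity get distance 0.
def position_feature(num_tokens, entity_start, entity_end):
    """entity_start/entity_end are inclusive token indices of the first entity."""
    distances = []
    for i in range(num_tokens):
        if i < entity_start:
            distances.append(i - entity_start)   # negative: before the entity
        elif i > entity_end:
            distances.append(i - entity_end)     # positive: after the entity
        else:
            distances.append(0)                  # inside the entity
    return distances

# e.g. a 7-token sentence whose first entity spans tokens 2..3
print(position_feature(7, 2, 3))  # [-2, -1, 0, 0, 1, 2, 3]
```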
Implementation
Tasks and Datasets
Advice: The text states an opinion or recommendation regarding the simultaneous use of the two medicines, for example: “alpha blockers should not be combined with uroxatral”. Effect: The sentence states the effect of the drug interaction or the pharmacodynamic mechanism of the interaction. As a preprocessing step, we replace the texts of the entities in the I2B2 dataset with the corresponding entity type labels.
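The entity-blinding step could be sketched as follows; the token spans and the placeholder type labels (DRUG_GROUP, DRUG) are illustrative, not the exact labels used in the datasets:

```python
# Illustrative entity blinding: replace entity mentions with their type labels.
def blind_entities(tokens, entities):
    """entities: list of (start, end, type) with inclusive token spans."""
    out = list(tokens)
    # Replace from right to left so earlier spans' indices stay valid.
    for start, end, etype in sorted(entities, reverse=True):
        out[start:end + 1] = [etype]
    return out

tokens = ["alpha", "blockers", "should", "not", "be", "combined", "with", "uroxatral"]
print(blind_entities(tokens, [(0, 1, "DRUG_GROUP"), (7, 7, "DRUG")]))
# ['DRUG_GROUP', 'should', 'not', 'be', 'combined', 'with', 'DRUG']
```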
Experiment Design
Hyper-parameters
Baseline Methods for comparison
Results and Discussion
- Influence of Filter Lengths
- Class wise Performance
- Feature Ablation Study
- Comparison with Baseline
- Error Analysis
The performance on the CRC task is significantly better for the TeRP, TrAP, and PIP classes than for the TeCP and TrCP classes. When the position embedding feature is removed from CNN-RE (row 2), we observe relative reductions of 4%, 2.4%, and 9.3% in the F scores obtained for the DDI, DDIC, and CRC tasks, respectively. It can be observed that our most successful CNN-RE model, with filter lengths [3, 4, 5], is competitive with the state of the art on the DDI and DDIC tasks.
Conclusion
In the previous chapter, we investigated the performance of the CNN model on various biomedical and clinical relation classification tasks. We noted that CNNs can be used as an alternative to conventional feature-based methods for relation classification tasks. In this chapter, our main goal is to improve the performance of the relation classification task by using models with more powerful representation learning abilities.
Introduction
Therefore, when we apply aggregation to the output of the BLSTM, we can obtain features that capture the context of the entire sentence. BLSTM-RE uses max pooling, while ABLSTM-RE uses attentive pooling over the outputs of the BLSTM to obtain fixed-length features for the whole sentence. Joint ABLSTM-RE, as a combination of BLSTM-RE and ABLSTM-RE, uses two BLSTMs, one with max pooling and the other with attentive pooling.
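A small PyTorch sketch contrasting the two pooling strategies over BLSTM outputs; the dimensions and the single-layer attention form are assumptions for illustration, not the thesis's exact parameterization:

```python
# Illustrative max pooling (BLSTM-RE) vs. attentive pooling (ABLSTM-RE) over BLSTM outputs.
import torch
import torch.nn as nn

hidden = 100
blstm = nn.LSTM(input_size=120, hidden_size=hidden, bidirectional=True, batch_first=True)
attn = nn.Linear(2 * hidden, 1)            # scores each time step for attentive pooling

x = torch.randn(8, 40, 120)                # (batch, seq_len, input_dim)
h, _ = blstm(x)                            # (batch, seq_len, 2*hidden)

max_pooled = h.max(dim=1).values           # fixed-length sentence feature, (batch, 2*hidden)

weights = torch.softmax(attn(h), dim=1)    # (batch, seq_len, 1): attention over time steps
attn_pooled = (weights * h).sum(dim=1)     # weighted sum, (batch, 2*hidden)
```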
Model Architecture
BLSTM-RE Model
Here we use BLSTM instead of the convolutional neural network used in the CNN-RE model. These features are then fed into a fully connected neural network, followed by the softmax layer to produce the final classification.
ABLSTM-RE Model
Joint ABLSTM-RE Model
Training and Implementation
Results and Discussion
- Comparison with Baseline Methods
- Class Wise Performance Analysis
- Feature Analysis
- LSTM vs CNN models
- Error Analysis
- Visual Analysis
Furthermore, the McNemar test also suggests that there is no significant difference in the performance of the CNN-RE, BLSTM-RE and ABLSTM-RE models on the CRC task. We compare the class-wise performance of the proposed models with existing models on the DDIC task (Table 5.3). On the other hand, removing the position features decreases the performance of the model by about 1.1%.
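As an illustration of such a significance check, McNemar's test can be run on the paired predictions of two models; the statsmodels implementation is used here, and the contingency counts are made up rather than thesis results:

```python
# Illustrative McNemar's test on paired predictions of two models (made-up counts).
from statsmodels.stats.contingency_tables import mcnemar

# 2x2 table of sentence counts:
#   rows: model A correct / wrong; columns: model B correct / wrong
table = [[510, 32],
         [28, 130]]

result = mcnemar(table, exact=False, correction=True)
print(result.statistic, result.pvalue)  # no significant difference if p > 0.05
```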
Conclusion
Among the three proposed models, the Joint ABLSTM-RE model outperforms the others on all three tasks. Comparing the CNN-RE and LSTM-based models, the LSTM models are generally found to make better predictions for longer sentences than the CNN-RE model.
Overview
Introduction
Intuitively, it is preferable for the source task to be as similar as possible to the target task. On the other hand, it is also possible that no bijection exists between the labels of the source and target tasks. An example of such a scenario would be if the target task required multi-class classification of DDIs, while the source task only involved binary classification.
Model Architectures
It is quite clear that this model is only applicable in cases where a bijective mapping exists between the labels of the source and target tasks. We transfer the entire set of network parameters if such a bijective mapping exists between the source and target label sets. In the case of T-BLSTM-Mixed and T-BLSTM-Multi, one of the source and target tasks is chosen with equal probability, and then one batch of instances is drawn from the corresponding training set.
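A schematic sketch of this alternating batch-selection scheme; train_step, source_batches, and target_batches are hypothetical placeholders, not the thesis's implementation:

```python
# Illustrative mixed training: each step picks the source or target task with equal
# probability and draws one mini-batch from its training set.
import random

def mixed_training(source_batches, target_batches, train_step, num_steps=1000):
    for _ in range(num_steps):
        if random.random() < 0.5:
            batch, task = random.choice(source_batches), "source"
        else:
            batch, task = random.choice(target_batches), "target"
        train_step(batch, task)   # shared BLSTM parameters updated on either task
```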
Task Definitions and Used Datasets
BankDDIC: This dataset is the same as BankDDI, with the addition of class labels for the DDI relationships [Segura-Bedmar et al., 2013]. ADE: For ADE extraction we used the dataset described in [Gurulingappa et al., 2012b, Gurulingappa et al., 2012a]. CRC: For the classification of clinical relationships, we used the I2B2/VA 2010 Clinical Information Extraction Challenge dataset [Uzuner et al., 2011].
Results and Discussion
- Performance on Same Label Set Transfer
- Performance on Disparate Label Set Transfer
- Analyzing Similarity between Source and Target Tasks
- Analyzing Size of Source Task Dataset
- Comparison with State-of-the-art Results
On the other hand, T-BLSTM-Multi fails to exploit the full knowledge present in the training data of the source task. Variability in the size of the source training data could have influenced the observed performance differences. In this section, we investigate the effect of source data size on the performance improvement of the T-BLSTM-Mixed and T-BLSTM-Multi models.
Conclusions
Clinical entity recognition example
Statistics NER dataset
Comparison with baseline
Performance comparison of disease NER
Performance comparison of drug NER
Performance of the model and its variants
Statistics of OoV words
Lists of words and their five nearest neighbors obtained through pre-trained word vectors
Effects of CRF, WLL and BLSTM in the model
Analysis of learned word vectors
Statistics of relation classification datasets
Values of the different regularization parameters
Performance of the CNN-RE model with different filter lengths
Class-wise performance on the DDIC task
Class-wise performance on the CRC task
Performance of the CNN-RE model in a feature ablation study
Performance comparison of CNN-RE with baseline methods
Effect of sentence length in TP and FN
Values of different regularization parameters used in three models
Performance comparison of our models with all baseline methods
Class-wise performance of our models with all baseline methods on the DDIC task
Performance of Joint ABLSTM-RE model in feature ablation study
Effect of sentence length in TP and FN
Statistics of source task
Statistics of target task
Baseline results
Results of TL I
Results of TL II
Comparison with state-of-the-art
Single Neuron
Feed Forward Neural Network
Convolutional Neural Network
Block diagram of RNN
CWBLSTM Model
Char BLSTM model
CNN-RE Model
BLSTM-RE Model
Comparison of CNN and LSTM based models on the basis of sentence length
Visualization of attention weights
BLSTM-RE and T-BLSTM-Mixed Model
Same size dataset transfer
Different dataset size