Feature selection for event extraction in biomedical text

In this paper we report our work on a multiobjective optimization (MOO) based feature selection approach for event extraction in biomedical texts. Event extraction deals with the detection and classification of expressions that represent complex biological phenomena involving genes and proteins. We perform feature selection within the framework of a robust machine learning algorithm, namely Conditional Random Field (CRF). We implement a set of diverse features that exploit lexical, shallow syntactic and contextual information. We first develop a single objective optimization (SOO) based feature selection technique in which we optimize the F-measure.

Thereafter we develop two different models of MOO based feature selection, each optimizing a different pair of objective functions: recall and precision, and feature count and F-measure. We carry out experiments on the benchmark setup of the BioNLP-2013 shared task. We obtain the best performance with overall average recall, precision and F-measure values of 57.04%, 75.08% and 64.77%, respectively. Evaluation shows that the classifier can achieve a good level of performance when trained with an effective feature set. We also observe that the MOO based approach can indeed perform better than the SOO based approach.
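The difference between the SOO and MOO formulations can be illustrated with a minimal sketch. It is not the implementation used in this work: the candidate feature subsets, their (recall, precision) scores, and the helper names `f_measure`, `dominates` and `pareto_front` are all hypothetical, and the search procedure that would generate such candidates is not shown. SOO keeps the single candidate with the highest F-measure, whereas MOO over recall and precision keeps every candidate that no other candidate dominates.

```python
from typing import List, Tuple

Score = Tuple[float, float]  # (recall, precision) of one candidate feature subset


def f_measure(recall: float, precision: float) -> float:
    """Balanced F-measure: harmonic mean of recall and precision."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


def dominates(a: Score, b: Score) -> bool:
    """True if a is at least as good as b on both objectives and better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(points: List[Score]) -> List[Score]:
    """Keep the candidates that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]


# Hypothetical scores for four candidate feature subsets (illustrative values only).
candidates: List[Score] = [(0.50, 0.80), (0.60, 0.70), (0.55, 0.65), (0.40, 0.85)]

# SOO: one winner, chosen by F-measure alone.
best_soo = max(candidates, key=lambda rp: f_measure(*rp))

# MOO: a set of trade-off solutions over recall and precision.
front = pareto_front(candidates)

print("SOO choice:", best_soo)
print("MOO Pareto front:", front)
```

In this toy setting the MOO front retains a high-precision and a high-recall subset alongside the F-measure winner, which is the kind of trade-off set a single objective cannot expose.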
