Title: An End-to-End Deep Learning Architecture for Extracting Protein-Protein Interactions Affected by Genetic Mutations
Presenter: Tung Tran, PhD candidate
Abstract: As part of BioCreative VI Track IV, we built a supervised relation extraction model that takes a test article and returns a list of interacting protein pairs identified by their Entrez Gene IDs. Each pair represents two proteins participating in a binary protein-protein interaction (PPI) relation in which the interaction is additionally affected by a genetic mutation (PPIm). In this study, we explore PPIm relation extraction with a three-component pipeline that combines deep learning-based named entity recognition and relation classification models with a knowledge-based approach to gene normalization. We propose several recall-focused improvements to our original challenge entry, which placed second in the competition. On exact matching, the new system achieved test results of 37.78% micro-F1 with a precision of 38.22% and a recall of 37.34%, an improvement of approximately 3 micro-F1 points. When matching on HomoloGene IDs, we report similarly competitive test results of 46.17% micro-F1 with a precision of 46.67% and a recall of 45.59%, an improvement of more than 8 micro-F1 points over the prior best result.
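For readers unfamiliar with the metric, the following is a minimal sketch of how micro-averaged precision, recall, and F1 could be computed over document-level protein pairs. It is not the official BioCreative scorer. The function names, the document keys, and the gene IDs ("G1", "G2", ...) are hypothetical placeholders, and the sketch assumes that a pair is unordered and that counts are pooled across all documents before the ratios are taken.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def micro_prf(gold_by_doc, pred_by_doc):
    """Micro-averaged precision, recall, and F1 over unordered ID pairs.

    Both arguments map a document ID to a list of (id_a, id_b) tuples.
    True positives, false positives, and false negatives are summed over
    all documents first, then the ratios are computed once.
    """
    tp = fp = fn = 0
    for doc_id in set(gold_by_doc) | set(pred_by_doc):
        # frozenset makes (a, b) and (b, a) count as the same pair (assumption).
        gold = {frozenset(pair) for pair in gold_by_doc.get(doc_id, [])}
        pred = {frozenset(pair) for pair in pred_by_doc.get(doc_id, [])}
        tp += len(gold & pred)
        fp += len(pred - gold)
        fn += len(gold - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_from_pr(precision, recall)


# Toy data with made-up placeholder IDs, for illustration only.
gold = {"doc1": [("G1", "G2")], "doc2": [("G3", "G4"), ("G5", "G6")]}
pred = {"doc1": [("G2", "G1")], "doc2": [("G3", "G4"), ("G3", "G5")]}

precision, recall, f1 = micro_prf(gold, pred)
print(f"P={precision:.4f} R={recall:.4f} F1={f1:.4f}")
```

Applied to the exact-match precision and recall quoted above, `f1_from_pr(38.22, 37.34)` comes out within 0.01 of the reported 37.78% micro-F1, which is consistent with the inputs having been rounded to two decimals.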