Informatics Seminar Recordings

FactsFerret

FactsFerret: Facilitating Clinical Research from a Multi-center Clinical Dataset

The validity and generalizability of clinical research can be hampered by the limited availability of clinical data from multiple health centers. The Cerner HealthFacts database helps address this by aggregating de-identified clinical data from across its electronic health record installations. In this presentation, we will describe the HealthFacts database, present examples of research questions that have been addressed using the database, and introduce a tool (FactsFerret) that we have developed to ease exploration and interrogation of the data by researchers.

Link to recording:

Streaming:
https://gwu.webex.com/gwu/ldr.php?RCID=8a8edae4c458f6f8023a9349b43dd928

Download:
https://gwu.webex.com/gwu/lsr.php?RCID=a028c6d1ec2e63138ea2767ef1aa96fc


The CRISP Health Information Exchange

HIE-Supported Research: the CRISP Health Information Exchange Research Initiative Experience

In 2016, CRISP, the health information exchange (HIE) serving Maryland, DC, West Virginia, and the region, received regulatory approval as the Maryland state-designated HIE to support clinical research. Since that time, CRISP has established a basic capability for offering clinical researchers access to CRISP tools and services, and has supported more than a dozen studies. The most common study type involves following a cohort of consented patients using two core services: the CRISP Encounter Notification Service (ENS) to receive real-time alerts when a patient has been admitted to, discharged from, or transferred within one of more than 100 acute care hospitals, long-term care facilities, or outpatient facilities in the region; and the CRISP query portal to review and download clinical documents (such as discharge summaries, surgical, radiology and encounter reports, laboratory reports, medication lists, and care summaries) related to these encounters. Ross D. Martin, MD, MHA, FAMIA, Program Director of the CRISP Research Initiative, will present on CRISP's experience and the policy, technical, and process challenges in making HIE-mediated data available to researchers. He will also discuss plans for developing new capabilities to support researchers as they seek access to data sets that are not currently available.

Link to recording:

Streaming:
https://gwu.webex.com/gwu/ldr.php?RCID=9b52c2765f17e2fab6ac783e2ea477d4

Download:
https://gwu.webex.com/gwu/lsr.php?RCID=bdb085b70b76f230d332dc2af22b74a1


Feature Importance Distributions

On the Discovery of Feature Importance Distributions: An Overlooked Area

Detecting feature importance (predictive power) is a key problem in Machine Learning. Previous methods have focused on providing a single value as the estimate of the importance. However, the meaning of such a value is not always obvious. Moreover, in reality a feature's importance may vary dramatically across the feature's values. A point estimate of the importance cannot capture such variations. We propose a new definition of feature importance, which directly measures a feature's predictive power. We also propose an approach to detect a high-resolution distribution of a feature's importance across the feature's values. The key novelty is a feature importance model that allows identifying significant changes of importance between adjacent feature values, and a cost function that permits separating the importance of different features. Empirical results on real-world medical datasets (Breast Cancer, Parkinson's, and Drug Consumption) show that the proposed work could help discover better knowledge, build better models, and make better decisions.

Link to recording:

Streaming:
https://gwu.webex.com/gwu/ldr.php?RCID=d17fd2fa198b64edfc1b104beb81e177

Download:
https://gwu.webex.com/gwu/lsr.php?RCID=8779f75b8ec437b179c195c1697e1aab


Butterfly Effect in Claims Data

Butterfly Effect in Claims Data: Small Changes in Design Elements, Large Impacts on Causal Inference

Cohort studies using real-world evidence from claims databases have been part of medical product post-market safety assessment for over a decade. In these studies, design elements are tailored to the main inference question of whether a drug exposure causes an adverse outcome. Some design elements are universal to all data sources while others are unique to claims data (e.g., the pharmacy dispensing record stockpiling algorithm). Our study investigated whether small changes in small design elements, coming for example from different interpretations of the same published information, can impact causal inference. This study used a multi-factorial design to assess the impact of co-varying multiple design elements on different stages of the estimation process, from cohort identification to risk assessment. The data source and main design elements of a test case remained fixed, but some elements co-varied across different study designs. Our results show that small changes in the use of the Index Date and the stockpiling algorithm impact cohort size, length of follow-up, and causal estimates. Standardizing definitions of these design elements will help minimize a study’s potential bias and facilitate replication of study findings.
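
The abstract does not define the stockpiling algorithm. As a rough illustration only, the sketch below assumes one common convention in claims-based studies: when a refill is dispensed before the previous supply runs out, the new supply is taken to start the day after the old one ends. The function name and the two dispensing records are hypothetical, not taken from the study.

```python
from datetime import date, timedelta

def apply_stockpiling(dispensings):
    """Turn (dispense_date, days_supply) records into exposure intervals.

    Assumed rule (not from the seminar): an early refill does not overlap
    the previous supply; it starts the day after that supply is used up.
    """
    intervals = []
    prev_end = None
    for dispensed, days_supply in sorted(dispensings):
        start = dispensed
        if prev_end is not None and dispensed <= prev_end:
            # Early refill: carry the leftover days forward.
            start = prev_end + timedelta(days=1)
        end = start + timedelta(days=days_supply - 1)
        intervals.append((start, end))
        prev_end = end
    return intervals

# Hypothetical patient: second 30-day fill is picked up 6 days early.
records = [(date(2020, 1, 1), 30), (date(2020, 1, 25), 30)]
covered = apply_stockpiling(records)
for start, end in covered:
    print(start, "to", end)
```

Under this assumed rule the second supply ends six days later than it would if the overlap were ignored, which is the kind of shift in follow-up length the abstract describes.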

Link to recording: 

Streaming: 
https://gwu.webex.com/gwu/ldr.php?RCID=ee51c3a1f550a6162cd92e018bfe8a4f

Download:
https://gwu.webex.com/gwu/lsr.php?RCID=548599af42c940ad1800f9165b25ad69


Case-Based Causality

Case-Based Causality: An Application of Artificial Intelligence to Epidemiology and Public Health

Causality in public health is a complex and controversial issue, involving epidemiological and toxicological studies and a family of methods discussed and debated for decades: the general scientific method, study design and statistical methods, and research synthesis methods such as the systematic narrative review, meta-analysis, and criteria-based methods of causation. These have been applied to occupational, environmental, and lifestyle exposures as well as diverse outcomes such as cancer, neurological disorders, cardiovascular diseases, and psychiatric conditions. We bring an artificial intelligence (AI) method, Case-Based Reasoning (CBR), to bear on the issue of causality. The 5 “Rs” of Case-Based Causality (CBC) will be described and applied to current issues in epidemiology and public health. The relationship of CBC to existing methods of causal inference will also be noted, as well as the links between CBC and the concept of reliability. Fundamentally, Case-Based Causality is a method for examining whether a body of evidence can be considered causal by the extent to which its characteristics are similar to bodies of evidence from known (i.e., established) causal relationships.

Link to recording: 

Streaming: 
https://gwu.webex.com/gwu/ldr.php?RCID=d3db218878f1cd90f4e221d972e71857

Download: 
https://gwu.webex.com/gwu/lsr.php?RCID=a3ccbb56ba20d583f881d467e1780820


Mining the literature for genes associated with placenta-mediated maternal diseases

Mining the literature for genes associated with placenta-mediated maternal diseases

Automated literature analysis could significantly speed up understanding of the role of the placenta and the impact of its development and functions on the health of the mother and the child. To facilitate automatic extraction of information about placenta-mediated disorders from the literature, we manually annotated genes and proteins, the associated diseases, and the functions and processes involved in the development and function of the placenta in a collection of PubMed/MEDLINE abstracts. We developed three baseline approaches to finding sentences containing this information: one based on supervised machine learning (ML) and two based on distant supervision: 1) using automated detection of named entities and 2) using MeSH. We compare the performance of several well-known supervised ML algorithms and identify two approaches, Support Vector Machines (SVM) and Generalized Linear Models (GLM), which yield up to 98% recall, precision, and F1 score. We demonstrate that distant supervision approaches could be used at the expense of missing up to 15% of relevant documents.

Link to recording: 

Streaming: 
https://gwu.webex.com/gwu/ldr.php?RCID=b7ef4be94c9c2d69b70975f002d535b1

Download: 
https://gwu.webex.com/gwu/lsr.php?RCID=d846cb48f5ee03e860873a03217dd73e


Public Health Information Credibility in the Era of “Fake News” 

Helping the Public Evaluate Health Information Credibility in the Era of “Fake Health News”

With the emergence of new Web media platforms and the ubiquity of social media, critical evaluation of online health information has taken on a new dimension and urgency. At the same time, many established information quality evaluation guidelines address information characteristics other than the content (e.g., authority, currency) and do not address information presented via novel Web technologies. This talk will describe a research program that develops a methodological approach for analyzing diverse online health information sources. It will also present a window into the universe of non-evidence-based online health information, particularly as it pertains to the possibility of curing type 2 diabetes. The presentation will use the above evaluation criteria to describe how these sites portray the complexity of type 2 diabetes, characterize the healthcare establishment, use language and emotional cues, discuss medical research, and convey certainty. It will also address the potential role of technology in supporting users in the changing digital health ecosystem.

Link to recording: 

Streaming: 
https://gwu.webex.com/gwu/ldr.php?RCID=a1ea8babf0249e775dce7a3f078429ba

Download: 
https://gwu.webex.com/gwu/lsr.php?RCID=256af5d61b5070bc7f63015a576b61cb


Deep Learning Architecture for Extracting Protein-Protein Interactions

An End-to-End Deep Learning Architecture for Extracting Protein-Protein Interactions Affected by Genetic Mutations

As part of the BioCreative VI Track IV, we built a supervised relation extraction model capable of taking a test article and returning a list of interacting protein pairs identified by their Entrez Gene IDs. Such pairs represent proteins participating in a binary protein-protein interaction (PPI) relation where the interaction is additionally affected by a genetic mutation (PPIm). In this study, we explored PPIm relation extraction by deploying a three-component pipeline involving deep learning-based named entity recognition and relation classification models along with a knowledge-based approach for gene normalization. We propose several recall-focused improvements to our original challenge entry, which placed 2nd in the competition. On exact matching, the new system achieved test results of 37.78% micro-F1 with a precision of 38.22% and recall of 37.34%, which corresponds to an improvement of approximately 3 micro-F1 points. When matching on HomoloGene IDs, we report similarly competitive test results at 46.17% micro-F1 with a precision and recall of 46.67% and 45.59%, corresponding to an improvement of more than 8 micro-F1 points over the prior best result.
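
For readers unfamiliar with the metric, micro-F1 is the harmonic mean of micro-averaged precision and recall. The sketch below is only an illustration of that formula using the exact-match figures quoted in the abstract; the function name is ours, and because the published precision and recall are already rounded, the recomputed value can differ from the reported 37.78% in the last digit.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (both on the same scale)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Exact-match precision and recall as quoted in the abstract (percent).
exact_match_f1 = f1_score(38.22, 37.34)
print(f"recomputed exact-match micro-F1: {exact_match_f1:.2f}%")
```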

Link to recording: 

Streaming:
https://gwu.webex.com/gwu/ldr.php?RCID=0c2ec6daa0c891665a16f30f3a651f6e

Download:
https://gwu.webex.com/gwu/lsr.php?RCID=97c4c13c636e27bf1203d47ebe002b2e


Characterization of Critically Ill Patients

Characterization of Critically Ill Patients: A Clinical Application of the Health Facts Data Set 

Using the Health Facts EMR data, critically ill pediatric patients who had at least one admission to the Intensive Care Unit (ICU) were characterized in terms of the number of hours they received drugs that are usually administered to patients who are intubated or on mechanical ventilation. The study analyzed the vectors containing the number of hours each combination of medicines was administered to each patient during different periods of ICU admission and floor admission, using a class of Bayesian regression models with the Dirichlet-Multinomial distribution for the response and random effects to capture the inherent variability of each encounter and hospital, adjusting for demographic information. During this seminar, we will describe the process of cohort and records selection, the model and the interpretation of the parameters, and the results of the characterization. We will also explain how the model can be used for treatment comparisons for similar patients in different hospitals.

Link to recording: 

Streaming:
https://gwu.webex.com/gwu/ldr.php?RCID=44c5abef9c8ac1e13a70902e7513a357

Download:
https://gwu.webex.com/gwu/lsr.php?RCID=70ce253ecb4d0b757b46297cd6abae6f


Clinical Outcome Prediction through Deep Learning

Clinical Outcome Prediction through Deep Learning

Accurately predicting clinical outcomes in advance can benefit both healthcare providers and patients, though it remains a challenging task. Artificial Intelligence (AI) has recently generated much excitement due to the breakthroughs made by Deep Neural Networks (DNNs) on many tasks such as image classification and speech recognition. Inspired by those developments, there has been great interest in applying DNNs to the biomedical domain. In this seminar, I will present a DNN-based predictive modeling approach applied to two clinical use cases. DNN models are usually considered black boxes, which can hinder their acceptance by clinicians. Therefore, we also developed a novel method for explaining the predictions of our DNN models.

Link to recording: 

Streaming: https://gwu.webex.com/gwu/ldr.php?RCID=6b4f78fcca2d6e47c6f6576c3e8c613e

Download: https://gwu.webex.com/gwu/lsr.php?RCID=535725886993615bce6077a2e09a7a1f


The FABRIC environment

The FABRIC environment: Architectural Features and Big Data Analytics

The Flexible Architecture for Building Research Informatics Collaborations (FABRIC) is an informatics platform (in development) that offers a service-oriented research toolbox which investigators, clinicians, and patient advocates can use to easily access a wide array of data repositories integrated with customizable query tools. This cloud environment supports the rapid formation of dedicated cross-domain research teams, the sharing of raw and de-identified datasets in secured enclaves, and access to a suite of advanced analytics tools commonly used throughout clinical translational research (CTR). This seminar will focus on some of the architectural features of FABRIC, its connection to the Colonial One HPC cluster, and the integration of the HealthFacts database.

Link to recording: 

Streaming: https://gwu.webex.com/gwu/ldr.php?RCID=660e0fb4a361e2dbe81583aeceb2d7a7

Download: https://gwu.webex.com/gwu/lsr.php?RCID=c5289e6cdc48877f01df247dd04b5704