
Workshop: Rapid Biomedical Knowledge Base Construction from Text

  • Stanford University, Stanford, CA

Do you want to automatically identify biomarkers reported within the scientific literature that are related to a particular disease?

Do you have a large collection of text-based documents (e.g., articles, webpages, reports, catalogs) from which you want to create a database of experimentally derived parameters, like P53 concentration levels or tissue stiffness?

Do you want to analyze clinical notes to extract patient-reported functional capabilities related to a given treatment?

The Mobilize Center, an NIH Big Data to Knowledge Center of Excellence, invites you to participate in our upcoming workshop on rapidly creating biomedical knowledge bases from unstructured data. You will learn how to use a tool called Snorkel to automatically extract information from data sources, such as the scientific literature and clinical notes.

When:   November 6-7, 2018
Where:  Stanford University, Stanford, CA
Registration: The workshop is free to attend, but registration is required and space is limited. 
Travel Awards: Travel awards are available. Please visit the workshop webpage for more information. 
Deadline for Applications:  Friday, September 21st, 2018

Over 80% of the data available in the world today is currently unreadable by computers. These “dark data” are unstructured and include a wide range of invaluable information sources, from the text of scientific articles to the notes written by your doctor. Transforming these data into a form readable by machines is called knowledge base construction and is a vital process for unlocking the potential found in these resources.
Current approaches for automatically building knowledge bases require large, labeled datasets for training. These gold standard datasets are difficult to come by, particularly in biomedicine, limiting our ability to create new knowledge bases that can be analyzed.
Snorkel was created in response to this challenge. Developed in Christopher Ré’s lab at Stanford University, Snorkel constructs knowledge bases from “dark data.” Unlike other approaches, which require precisely labeled data to train and build their models, Snorkel can work with just a set of user-defined rules.
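
To give a rough sense of what rules supplied by a user might look like, here is a minimal sketch in plain Python. It does not use the Snorkel library or its API. The rule names, the regular expressions, and the simple majority vote are all illustrative assumptions. Snorkel itself combines such rules with a learned statistical model that estimates how reliable each rule is, and the majority vote below is only a simplified stand-in for that step.

```python
import re

# Label values used by the hypothetical rules below.
POSITIVE, NEGATIVE, ABSTAIN = 1, 0, -1


def lf_biomarker_phrase(sentence):
    """Vote POSITIVE if the sentence uses an explicit biomarker phrase."""
    if re.search(r"\bbiomarkers? (for|of)\b", sentence, re.I):
        return POSITIVE
    return ABSTAIN


def lf_negation(sentence):
    """Vote NEGATIVE if the sentence denies an association."""
    if re.search(r"\b(no|not) (significantly )?associated\b", sentence, re.I):
        return NEGATIVE
    return ABSTAIN


def lf_elevated(sentence):
    """Vote POSITIVE if a quantity is reported as raised in patients."""
    if re.search(r"\b(elevated|increased) in patients\b", sentence, re.I):
        return POSITIVE
    return ABSTAIN


# Illustrative rule set; a real project would write many more rules.
LABELING_FUNCTIONS = [lf_biomarker_phrase, lf_negation, lf_elevated]


def majority_vote(sentence):
    """Combine the rules' votes; abstain when no rule fires or on a tie."""
    votes = [lf(sentence) for lf in LABELING_FUNCTIONS]
    pos = votes.count(POSITIVE)
    neg = votes.count(NEGATIVE)
    if pos == neg:
        return ABSTAIN
    return POSITIVE if pos > neg else NEGATIVE


if __name__ == "__main__":
    examples = [
        "P53 is a candidate biomarker for ovarian cancer.",
        "Tissue stiffness was not associated with disease stage.",
        "The study enrolled 40 participants.",
    ]
    for text in examples:
        print(majority_vote(text), text)
```

Each rule is allowed to abstain, so no single rule has to cover every sentence. The noisy labels produced this way can then be used to train a model without hand-labeling each example.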

On the first day, participants will learn about the Snorkel workflow through brief lectures and hands-on activities. On the second day, participants will apply Snorkel to a real-world problem using the scientific literature or electronic health record data.


This workshop is designed for individuals who are interested in applying state-of-the-art machine reading approaches to extracting information from the text and tables of documents. You do not need to know anything about machine reading or machine learning, but you should have some basic Python programming skills.

To learn more and apply, visit
Application deadline:  Friday, September 21st, 2018