Translational Biomedical Informatics Center

The Translational Biomedical Informatics Center (TBIC) works to make unstructured clinical data easier to access and use, for both primary and secondary purposes.

Most clinical data (i.e., patient electronic health record data) is unstructured text. Reuse typically relies on manual chart abstraction, a costly and lengthy process. More efficient approaches based on Natural Language Processing (NLP) enable or ease reuse by extracting structured and coded information from clinical text, or by transforming clinical text so that it is easier to share while protecting patient privacy.

About TBIC

Main objectives

Development, implementation, and management of Natural Language Processing (NLP) infrastructure to:
- enable and enhance clinical text secondary use/re-use
- extract structured and coded data from unstructured data sources
Creation of a recharge center to enable and enhance unstructured clinical data use, and establishment of local, statewide, and regional collaborations, either to add NLP expertise to other projects or to develop NLP expertise collaboratively.
Development of a team of NLP and text mining experts for collaborative efforts and consulting with MUSC, state, and regional collaborators.

Clinical Data Reuse Vision

A more efficient, scalable, and precise approach to unstructured clinical data reuse or secondary use could involve the following:

Clinical Information Extraction Improvements:
- Real-time, accurate, and generalizable extraction of structured data from narrative text
- Customization to specific data reuse needs (e.g., specific terminologies and formats)
- Interactive training, testing, and configuration workflows
- Preservation of the original text and formatting, to retain the highest level of detail over the long term
Patient Privacy Protection Improvements:
- Reliable de-identification and anonymization of structured and unstructured data in general, with optimal clinical data preservation

People

Current Members

Paul M. Heider, Ph.D., Assistant Professor
- ResearchGate
- Google Scholar

Past Members

Stéphane Meystre, M.D., Ph.D., FACMI
Youngjun Kim, Ph.D.

Projects and Collaborations

Patient Screening for Trial Eligibility

Insufficient patient enrollment in clinical trials remains a serious and costly problem, and is often considered the most critical issue for the clinical trials community to solve. Frequently cited reasons include health care providers' lack of awareness of appropriate trials and the difficulty of correlating eligibility criteria with patient characteristics. Eligibility criteria specify the characteristics of study participants and provide a checklist for screening and recruiting those participants. They are essential to every clinical research study. Computable representations of eligibility criteria can significantly accelerate electronic screening of clinical research study participants and improve research recruitment efficiency. The adoption of Electronic Health Record (EHR) systems is growing at a fast pace in the U.S., and this growth is making very large quantities of patient clinical data available in electronic format. Secondary use of clinical data is essential to realize its potential for effective scientific research. Our hypothesis is that an automated process based on natural language processing (NLP) can detect patients eligible for a specific clinical trial by linking the information extracted from the narrative description of the trial's eligibility criteria to the corresponding clinical information extracted from the EHR, and then alerting the clinicians taking care of the patient.

System developed for the National NLP Clinical Challenges (n2c2) Task 1: Cohort Selection for Clinical Trials
Pilot focused on breast cancer trial recruitment initiated in collaboration with Hollings Cancer Center and SCTR
Pilot focused on patients with HIV and fatigue initiated in collaboration with the College of Nursing

Improvements & Automation to the Problems & Allergens List

Medical errors are recognized as the cause of numerous deaths, and even if some are difficult to avoid, many are preventable. Computerized physician order-entry systems with decision support have been proposed to reduce the risk of medication errors, but these systems rely on structured and coded information in the electronic health record (EHR). Unfortunately, a substantial proportion of the information available in the EHR is only mentioned in narrative clinical documents. Electronic lists of problems and allergies are available in most EHRs, but users must manage them manually: adding new problems, modifying existing ones, and removing those that are no longer relevant. Consequently, these electronic lists are often incomplete, inaccurate, and out of date. As a solution to these problems, we are developing, implementing, and evaluating a new system to automatically extract structured and coded medical problems and allergies from clinical narrative text in the EHR of patients with cancer. This not only helps health care providers maintain complete and timely lists of problems and allergies, giving them an efficient overview of a patient, but also helps health care organizations meet meaningful use requirements. This project is funded by the National Cancer Institute.

In collaboration with Dr. Brandon Welch

CliniDeID: Automated Clinical Text De-Identification

Secondary use of clinical data is essential to realize its potential for high-quality health care, improved health care management, and effective clinical research. De-identification of patient data has been proposed as a way to both facilitate secondary uses of clinical data and protect patient data confidentiality. The majority of clinical data found in the EHR consists of narrative clinical notes, and manually de-identifying clinical text is tedious and costly. To address these issues, we are developing and evaluating a new system to automatically de-identify clinical notes found in the EHR, in order to improve both the availability of clinical text for secondary uses and the protection of patient data confidentiality. This will give clinical researchers access to richer, more detailed, and more accurate clinical data. It will also ease research data sharing and help health care organizations protect patient data confidentiality. This project is funded by the National Institute of General Medical Sciences (NIGMS).

Built in collaboration with Clinacuity Inc., a start-up company founded and led by Dr. Stéphane Meystre

Treatment Performance & Quality Measures Assessment

The development of high-quality health care and improved health care management relies on methods to encourage and assess adherence to evidence-based care. This assessment is typically based on quality or performance measures such as the Healthcare Effectiveness Data and Information Set (HEDIS) published by NCQA, or the CMS Quality Measures. Similarly, as part of multiple U.S. federal incentives for "meaningful" adoption and use of Electronic Health Record (EHR) systems, various performance measures have been developed and implemented.

In general, these quality or performance measures rely on clinical data that can often only be found in the unstructured part of the EHR: clinical notes. Manual chart review is therefore the most common way to compute these measures. To enable more scalable and sensitive approaches, NLP-based clinical information extraction methods are being developed, with the eventual goal of automatically computing quality and performance metrics from the EHR content.

Software and Resources

Publicly available code is hosted under our MUSC-TBIC@GitHub.com account. Please see those pages for the latest updates and patches. Feel free to report any issues you find while using our software. If you like these tools, help us out by spreading the word.

Ensemble Method Framework

Ensemble methods can provide more flexible and robust integration of information extraction (IE) models. For clinical note de-identification, we applied a varied set of classifiers based on different types of learning algorithms, including RNN, CRF, MIRA, and SVM. To compensate for less accurate outputs from individual classifiers, we created a voting ensemble that generates more accurate predictions. If you're interested in using our ensemble system, you can download a copy from ensemble@GitHub.com.
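
As a rough illustration of the voting idea, the sketch below combines per-token labels from several classifiers by simple majority vote. It is not the code from our ensemble repository: the function name `majority_vote`, the `min_votes` threshold, the "O" (outside) default label, and the toy label sequences are all assumptions made for this example.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(
    predictions: Sequence[Sequence[str]],
    default: str = "O",
    min_votes: int = 2,
) -> List[str]:
    """Combine per-token labels from several classifiers by majority vote.

    `predictions` holds one label sequence per classifier, all aligned to
    the same tokens. A non-default label is kept only if at least
    `min_votes` classifiers agree on it (hypothetical threshold).
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(labels) != length for labels in predictions):
        raise ValueError("all classifiers must label the same tokens")

    combined = []
    for labels in zip(*predictions):
        # Count only votes for an actual category, not the default label.
        votes = Counter(label for label in labels if label != default)
        if votes:
            # On a tie, the label seen first among the classifiers wins.
            label, count = votes.most_common(1)[0]
            combined.append(label if count >= min_votes else default)
        else:
            combined.append(default)
    return combined


# Toy outputs for the tokens: "Mr.", "Smith", "seen", "on", "03/04/2019"
crf = ["O", "NAME", "O", "O", "DATE"]
svm = ["O", "NAME", "O", "O", "O"]
rnn = ["O", "O", "O", "DATE", "DATE"]

print(majority_vote([crf, svm, rnn]))
```

Raising `min_votes` trades recall for precision. For de-identification, where a missed identifier is usually costlier than a false alarm, a lower threshold may be the safer choice.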

ETUDE Engine

Evaluation Tool for Unstructured Data and Extractions Engine
A Python-based tool to help with scoring and evaluating text extraction performance
If you're interested in using the ETUDE engine, you can download a copy from etude-engine@GitHub.com.
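
To give a sense of the kind of scoring involved, here is a minimal sketch of exact-match precision, recall, and F1 over annotated spans. It does not use the ETUDE engine's interface: the `Span` type, the `score_exact` function, and the sample annotations are hypothetical and exist only for this example.

```python
from typing import Dict, NamedTuple, Set


class Span(NamedTuple):
    """A labeled character span in a document (hypothetical structure)."""
    start: int
    end: int
    label: str


def score_exact(reference: Set[Span], system: Set[Span]) -> Dict[str, float]:
    """Score system spans against reference spans using exact matching.

    A system span counts as a true positive only if its offsets and
    label are all identical to those of a reference span.
    """
    tp = len(reference & system)   # found and correct
    fp = len(system - reference)   # found but not in the reference
    fn = len(reference - system)   # in the reference but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}


# Toy annotations for one note.
reference = {Span(4, 9, "NAME"), Span(18, 28, "DATE"), Span(40, 52, "PHONE")}
system = {Span(4, 9, "NAME"), Span(18, 28, "DATE"), Span(30, 35, "NAME")}

scores = score_exact(reference, system)
print(scores)
```

Exact matching is the strictest option. Scoring that gives credit for partially overlapping spans would need a different comparison than the set intersection used here.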

Third-Party Tools & Resources

Natural Language Processing (NLP) Tools
- UIMA
- cTAKES
Ontologies and Vocabularies
- UMLS
Annotation Tools
- WebAnno

TBIC Services

The Translational Biomedical Informatics Center focuses on unstructured clinical data reuse: enabling effective and scalable use of clinical note content for research and other secondary uses.

The TBIC combines expertise in Natural Language Processing, Computational Linguistics, Machine Learning, and Health Care.

As part of our objectives, the TBIC offers consulting services to MUSC, state, and regional collaborators for the development of NLP and text mining resources that enable and enhance unstructured clinical data reuse.

Contact Us