Research Data Request

Need Data for Research?

Researchers can query clinical data for research purposes to assist with recruitment, chart review, or complex feasibility analysis (data preparatory to research). There are two methods to access MUSC’s electronic medical data for research: 1) Self-Service Tools and 2) Honest Broker Services.

Method 1 Self Service Tools - For De-Identified Patient Counts

Epic SlicerDicer:

Epic users may complete the required MyQuest training to access SlicerDicer for research. This tool allows de-identified query access to all patients available in Epic (Ambulatory 5/2012, Emergency 11/2012, Inpatient 7/2014).

i2b2 (Informatics for Integrating Biology and the Bedside) and TriNetX:

MUSC faculty and sponsored staff members may also access our i2b2 or TriNetX Query Tool. All non-faculty users must be sponsored by a MUSC faculty member; please send a completed i2b2 Sponsorship Form (Standard PDF) / (Electronic PDF) to datarequest@musc.edu. Access is generally granted within 1 to 2 business days.

Resources:

Method 2 Honest Broker Services - For Identified Data Access & More Complex De-identified Queries

To start the data request process, submit a “Data Delivery Services” service request through the SPARC system. Below is a screenshot showing how to locate the “Data Delivery Services” service in SPARC.

Once you select the service and click “Continue” you will be taken through several steps where you will be prompted to enter information about your study. When you have provided all of the necessary information you will be brought to a final screen where you will click “Submit to Start Services”.

Upon submitting your service request in SPARC, you will be contacted by a research data request administrator who will provide the link to a BMIC Intake REDCap survey. With the information you provide in this form, we will be able to triage your request to the appropriate analyst to schedule your Pre-Data Request Consultation.

Please do not hesitate to contact the SCTR SUCCESS center (843-792-8300, success@musc.edu) if you have any questions or would like any assistance with this process.

Need Data for a Quality Improvement Project?

The data access process for quality improvement projects is different. Determine if your project is quality improvement here.

General Questions

For additional help with using i2b2 for your project or study, please check out the resources available to you on this page. If you still need help, you can submit a Self-Service Research Data & Feasibility Consultation service request at sparc.musc.edu.

The data that populates i2b2 is a limited data set. All HIPAA identifiers have been removed except for the dates of service. Any query that returns 25 or fewer patients reports a patient count of 0, which prevents users from analyzing small patient sets with the plugin tools.
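As a rough illustration of the small-count rule above, the reported count can be thought of as follows. This is only a sketch; the function name and constant are hypothetical and are not part of i2b2.

```python
# Hypothetical helper illustrating the small-count rule described above.
SUPPRESSION_THRESHOLD = 25  # counts at or below this value are reported as 0


def reported_patient_count(actual_count: int) -> int:
    """Return the patient count a user would see for a query result."""
    if actual_count <= SUPPRESSION_THRESHOLD:
        return 0
    return actual_count
```

For example, a query matching 25 patients would display 0, while one matching 26 patients would display 26.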

Please email datarequest@musc.edu to report an issue or suggest an enhancement.

We are using web client version 1.7.08a. Details can be found in the Release Notes.

i2b2 stands for Informatics for Integrating Biology and the Bedside. You can find out more by visiting the website at i2b2.

All MUSC faculty and sponsored staff members have access. All non-faculty users must be sponsored by a MUSC faculty member; please send a completed Sponsorship Form to datarequest@musc.edu. Access is generally granted within 1 to 2 business days.

The data for the MUSC i2b2 project includes a subset of the data from the Research Data Warehouse.

Navigating in the Application

The concepts that appear in the Navigate Terms portion of the i2b2 Query & Analysis Tool are sometimes part of multiple medical ontologies and therefore appear multiple times.

The items that appear in the Navigate Terms portion of the i2b2 Query & Analysis Tool are built from standard medical ontologies or customized to match MUSC’s data in the Research Data Warehouse. The terms are meant to be understood by researchers familiar with our organization’s data.

The concepts that appear in the Navigate Terms are described below:

Navigation Folder Name	Ontology	Additional Information
Allergies	Variation of SNOMED	SNOMED website
Assessments	Custom	Based on subset of MUSC observations
Cancer Registry	Based on NAACCR	NAACCR website
Demographics	MUSC Local Codes	Epic
Diagnoses	ICD9-CM and ICD10-CM	ICD9 – prior to October 2015 and ICD10 – after October 2015
Immunizations	CDC	CDC website
Labs	LOINC	LOINC website
Medications	RxNorm	RxNorm website
Problem List	ICD9-CM	
Procedures	CPT, ICD9-CM, ICD10-PCS	
Research Permissions	MUSC Local Codes	Epic
Visit	MUSC Local Codes	Epic
Vital Signs	Custom	

Any query you execute is saved in the Previous Queries section. The default name is built from the first characters of the items placed in the Group boxes. You can overwrite the default name, or rename the query after it is saved by right-clicking it. Only the most recent queries are visible in the Previous Queries section; older queries roll off. By default 20 queries are displayed; to change this, press the ‘Set Options’ button and overwrite the Max Number of Queries value. To keep a query long term, drag it to your personal folder. You can also right-click a previous query to delete it.

Items in the Workplace SHARED folder are queries that are visible to other users with access to the same project. If you want to share a query with others in your group, drag the query to the SHARED folder. These queries won’t be deleted. Items in the NetID folder are personal queries that won’t be deleted. If you want to save a query for personal use, drag it to your personal folder.

The steps below are a reminder of how to set up a Temporal Query. Full instructions with pictures are in the i2b2 User Guide, available by pressing the Help button in the upper right section of the screen.

Change the Temporal Constraint from Treat all groups independently to Define sequence of events.
A second entry appears with a drop-down displaying the default selection of Population in which events occur. Select your base set of patients. (If you don’t provide a filter here, the entire patient population is used.)
Press the drop-down arrow next to Population in which events occur and select Event 1. Define your Event 1.
Change the drop-down from Event 1 to Event 2. (This step is optional, and you can add events if you need more than two.)
Change the drop-down from Event # to Define order of events. A new set of columns appears, allowing you to define the rules relating the events.
Press the Run Query button.
Temporal queries have a (t) in front of the name in the Previous Queries panel.
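To make the idea of an ordered sequence of events concrete, here is a minimal sketch of one possible rule: the first occurrence of Event 1 falls before the first occurrence of Event 2. The function and the data layout are hypothetical; the web client lets you choose among several ordering rules, and this sketch does not reproduce how i2b2 evaluates them internally.

```python
from datetime import date


def patients_with_ordered_events(event1, event2):
    """Return patients whose first Event 1 precedes their first Event 2.

    event1 and event2 map a patient id to the list of dates on which
    that event was recorded (an assumed layout for illustration).
    """
    matches = []
    for patient_id, dates1 in event1.items():
        dates2 = event2.get(patient_id)
        if not dates1 or not dates2:
            continue  # the patient must have both events
        if min(dates1) < min(dates2):
            matches.append(patient_id)
    return sorted(matches)
```

A patient with only one of the two events never matches under this rule.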

A user guide for i2b2 is available. To access it, press Help in the upper right section of the web client.

There are two options for viewing data from a previous query: you can replay the query to see the prior results, or re-run it against current data. Dragging the previous query to the “Query Name” bar replays the results of the original query. Dragging the query to the “Group 1” panel lets you re-run the query against the current data. If the data has changed, the result sets will be different. For details, reference the i2b2 help guide or select the “i2b2 Previous Queries” section from the i2b2 user guide in the web client.

Understanding the Data

The current date range, available domains, and patient counts are displayed in the i2b2 interface.

The Diagnosis and Procedures data are based on final billing codes.

The chronic conditions are derived monthly. We implemented a variation of the 27 chronic condition algorithms from the CMS Chronic Conditions Data Warehouse (CCW). We use only claims-related diagnosis codes and apply inclusion/exclusion rules regarding the claims (any DX on the claim, ONLY first or second DX on the claim, or ONLY principal DX on the claim). We do not restrict the reference period or the number or type of claims. More details are available from the CMS Chronic Conditions Data Warehouse (CCW), the CCW Condition Algorithms (PDF), and the CCW Chronic Condition Reference List (PDF).

The medication information exposed in i2b2 consists of medication orders.

Only lab results that have an associated LOINC code are exposed in i2b2.

Medications are represented through the VA Drug Classes and RxNorm ingredients. If a medication has multiple ingredients, the medication will have a fact for each ingredient.
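The one-fact-per-ingredient rule can be pictured with a short sketch. The function, the field names, and the example ingredients are hypothetical and do not reflect the actual Research Data Warehouse schema.

```python
def medication_facts(patient_id, order_date, ingredients):
    """Expand one medication order into one fact per ingredient.

    The dictionary keys are illustrative only, not real column names.
    """
    return [
        {"patient_id": patient_id, "order_date": order_date, "ingredient": name}
        for name in ingredients
    ]


# A combination product with two ingredients yields two facts.
facts = medication_facts(101, "2020-02-01", ["acetaminophen", "hydrocodone"])
```

A practical consequence is that a query on either ingredient would match the same order.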

For vital signs, the minimum, maximum, and median measurement per day are available.
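As an illustration of what a per-day summary means, the sketch below reduces a list of readings for one patient and one vital sign to a daily minimum, maximum, and median. The function name and input layout are assumptions made for this example.

```python
from collections import defaultdict
from statistics import median


def daily_summary(measurements):
    """Summarize (day, value) pairs into min, max, and median per day."""
    by_day = defaultdict(list)
    for day, value in measurements:
        by_day[day].append(value)
    return {
        day: {"min": min(values), "max": max(values), "median": median(values)}
        for day, values in by_day.items()
    }
```

Three systolic readings of 120, 130, and 110 on the same day would be summarized as a minimum of 110, a maximum of 130, and a median of 120.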

A monthly incremental procedure runs on the 3rd of each month for data entered or updated during the previous month. Use the date range displayed in the i2b2 interface to confirm the date range of data loaded.
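If it helps to reason about which period a given refresh covers, the sketch below computes the previous calendar month for a run date. It assumes "previous month" means the full calendar month before the run; the function name is hypothetical, and the date range shown in the i2b2 interface remains the authoritative source.

```python
from datetime import date, timedelta


def previous_month_window(run_date):
    """Return the first and last day of the calendar month before run_date."""
    first_of_this_month = run_date.replace(day=1)
    last_of_previous = first_of_this_month - timedelta(days=1)
    return last_of_previous.replace(day=1), last_of_previous
```

Under this assumption, a run on March 3, 2024 would cover February 1 through February 29, 2024.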

The “Age” concept under Demographics is calculated based on the current age of the patient on the day you are running the query. The “Age at Visit” concept under Visit is the age of the patient at the visit date.
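The difference between the two concepts comes down to the reference date. The sketch below assumes age is counted in completed years, which this page does not state; the function name is hypothetical.

```python
from datetime import date


def age_in_years(birth_date, on_date):
    """Completed years between birth_date and on_date."""
    years = on_date.year - birth_date.year
    # Subtract one year if the birthday has not yet occurred in on_date's year.
    if (on_date.month, on_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


# "Age" would use today's date; "Age at Visit" would use the visit date:
#   age_in_years(birth_date, date.today())
#   age_in_years(birth_date, visit_date)
```

Because "Age" moves with the day the query runs, the same query can return different patients over time, whereas "Age at Visit" is fixed for each visit.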

We calculate two Charlson Comorbidity scores - one with an adjustment for age and one without an adjustment for age. Below are the score adjustments for age:

Age Range	Score
< 50	0
50 to 59	1
60 to 69	2
70 to 79	3
>= 80	4

This table (PDF) lists the diagnosis codes for each Charlson category; see the ICD-10 and Enhanced ICD-9-CM columns of Table 1, “ICD-9-CM and ICD-10 Coding Algorithms for Charlson Comorbidities”.

The weights for the Charlson categories are listed below:

Weight	Charlson Category
1	Cerebrovascular disease; Chronic pulmonary disease; Congestive heart failure; Dementia; Diabetes with chronic complication; Diabetes without chronic complication; Mild liver disease; Myocardial infarction; Peptic ulcer disease; Peripheral vascular disease; Rheumatic disease
2	Any malignancy, including lymphoma and leukemia, except malignant neoplasm of skin; Hemiplegia or paraplegia; Renal disease
3	Moderate or severe liver disease
6	AIDS/HIV; Metastatic solid tumor
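The weights and age adjustments above can be combined as in the following sketch. It simply sums the weight of each distinct category present and, optionally, adds the age adjustment. It assumes that ages under 50 add 0 and that no hierarchy rules apply between related categories (for example mild versus moderate or severe liver disease); the actual derivation in the Research Data Warehouse may differ. All names are hypothetical.

```python
CHARLSON_WEIGHTS = {
    "Cerebrovascular disease": 1,
    "Chronic pulmonary disease": 1,
    "Congestive heart failure": 1,
    "Dementia": 1,
    "Diabetes with chronic complication": 1,
    "Diabetes without chronic complication": 1,
    "Mild liver disease": 1,
    "Myocardial infarction": 1,
    "Peptic ulcer disease": 1,
    "Peripheral vascular disease": 1,
    "Rheumatic disease": 1,
    "Any malignancy": 2,
    "Hemiplegia or paraplegia": 2,
    "Renal disease": 2,
    "Moderate or severe liver disease": 3,
    "AIDS/HIV": 6,
    "Metastatic solid tumor": 6,
}


def age_adjustment(age):
    """Score added for age (assumes ages under 50 add 0)."""
    if age < 50:
        return 0
    if age < 60:
        return 1
    if age < 70:
        return 2
    if age < 80:
        return 3
    return 4


def charlson_score(categories, age=None):
    """Sum the weights of distinct categories; add the age score if age is given."""
    score = sum(CHARLSON_WEIGHTS[name] for name in set(categories))
    if age is not None:
        score += age_adjustment(age)
    return score
```

For a 65-year-old patient with congestive heart failure and renal disease, this gives 3 without the age adjustment and 5 with it.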

The clinical trial data comes from the RSCH record in Epic. Not all trials are entered into Epic, so the trial data in the Research Data Warehouse and i2b2 covers only a subset of the patients enrolled in trials across the institution. Only patients associated with a study that has a valid NCT number, and who have one of the active ENROLLMENT_STATUS values below at the time of the monthly incremental procedure, are included.

Consented – In Screening
Enrolled Follow-Up Only
Enrolled – Receiving Treatment AND/OR Intervention
Enrolled – Without Treatment AND/OR Intervention
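The inclusion rule above could be expressed roughly as follows. The sketch treats an NCT number as valid when it is "NCT" followed by eight digits, which is the ClinicalTrials.gov identifier format; whether the warehouse applies exactly this check is an assumption, and the function name is hypothetical.

```python
import re

ACTIVE_ENROLLMENT_STATUSES = {
    "Consented – In Screening",
    "Enrolled Follow-Up Only",
    "Enrolled – Receiving Treatment AND/OR Intervention",
    "Enrolled – Without Treatment AND/OR Intervention",
}

# Assumed definition of "valid": the ClinicalTrials.gov identifier format.
NCT_PATTERN = re.compile(r"NCT\d{8}")


def is_included(nct_number, enrollment_status):
    """True when the study has a valid NCT number and the status is active."""
    has_valid_nct = bool(nct_number) and NCT_PATTERN.fullmatch(nct_number) is not None
    return has_valid_nct and enrollment_status in ACTIVE_ENROLLMENT_STATUSES
```

A patient on a study without an NCT number is excluded regardless of enrollment status.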

The data that populates the Cancer Registry fields comes from the registry managed by MUSC’s Hollings Cancer Center (HCC). All data from patients diagnosed 5/12/2012 or later are available in i2b2.

Using Plugins

To place all of a patient’s data on the same row in the Export XLS plugin output, change the Formatting at the top of the plugin’s Output Options tab by selecting ‘1 row per patient, 1 column per observation set’ from the drop-down. The default Formatting option is ‘1 row per observation (duplicates removed, 1 column per observation set)’. The following details are available on the “Plugin Help” tab.

1 row per observation (duplicates removed, 1 column per observation set): A new row is created for each observation. All observation details (concept code, value, unit, ...) are written into one cell. One column is created for each concept that has been dragged onto the input box in step 1. Attention: Duplicate entries are removed. This format only returns a list of the different observations that were found.
1 row per observation (all, with timestamps, 1 column per observation set): Similar to the option above, but: timestamps of the observations are tabulated as well. Therefore, duplicates are not possible and nothing is removed.
1 row per observation (detailed, 1 column per observation detail): This is the most detailed option. A new row is created for each observation and all observation details (concept code, value, unit, ...) are written to dedicated columns.
1 row per patient, 1 column per observation set: A new row is created for each patient. One column is created for each concept that has been dragged onto the input box. All observations of a patient are then written into one cell (with respect to the concept column). Note: This is the only output option where the first column starting number will match the specified value of 'Starting Patient'.
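To show how the ‘1 row per patient, 1 column per observation set’ layout differs from the per-observation layouts, the sketch below pivots a list of observations into one row per patient, with all of a patient's values for a concept joined into a single cell. The function name, the input layout, and the "; " separator are assumptions for illustration and do not reproduce the plugin's exact output.

```python
def one_row_per_patient(observations, concepts):
    """Pivot (patient_id, concept, value) tuples into one row per patient.

    Each row is [patient_id, cell for concepts[0], cell for concepts[1], ...].
    """
    rows = {}
    for patient_id, concept, value in observations:
        row = rows.setdefault(patient_id, {name: [] for name in concepts})
        if concept in row:
            row[concept].append(str(value))
    return [
        [patient_id] + ["; ".join(row[name]) for name in concepts]
        for patient_id, row in sorted(rows.items())
    ]
```

A patient with two lab values and one vital sign would appear as a single row with both lab values in the same cell.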

Access the plugins page by clicking Analysis Tools in the top right-hand corner of the screen. Click once on the desired plugin from the Plugins field at the bottom of the screen to select it. Select the Specify Data tab to provide input, and drag and drop patient sets from the Previous Queries box and concepts from the Navigate Terms box. Select the View Results tab to see results. Select the Plugin Help tab for more detailed information about the plugin.

Note: You must have created a patient set beforehand to use plugins. See below.

Before using plugins, you must run a query in the Find Patients tab. Select the patient set option in the Run Query pop-up box to produce a patient set from your query results.

To locate your patient set(s), expand a query folder and then the results folder under Previous Queries in the lower left-hand corner of the web client.

The i2b2 community offers the following suggestions to avoid timeout issues while using the Export XLS Plugin:

If a query would return a very large result set, the server automatically pages the result. This causes a considerable delay and sometimes fails or hangs due to timeouts. If you encounter this problem, page the query manually by setting the 'Query Subgroup Size' value on the "Specify Data" tab. This is still slower than an 'at-once' query, but it is faster than automatic paging and avoids server overload. The right value cannot be predicted in general and depends strongly on the number of observations returned, but 20 to 50 is a reasonable starting point. Higher values result in faster processing but a higher risk of server overload. Press the HELP button for more details.
If a patient set contains thousands of patients and you are not sure whether the concepts you are specifying will result in a long run time or a large result set, start with a subset of the patient set (e.g. 'Starting Patient': 1 and 'Number of Patients': 500). You can always run again with subsequent subsets (e.g. 'Starting Patient': 501 and 'Number of Patients': 500). Press the HELP button for more details.
Avoid checking ‘Resolve Concept/Modifier Codes’ or ‘Include Ontology Path of Concepts’ on the “Output Options” tab until the final export of data.
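The subset approach suggested above amounts to walking through the patient set in fixed-size windows. The sketch below generates the 'Starting Patient' and 'Number of Patients' values for each run; the function name is hypothetical, and the values would still be entered by hand in the plugin.

```python
def patient_subsets(total_patients, subset_size=500):
    """Yield (starting_patient, number_of_patients) pairs covering the set."""
    start = 1  # 'Starting Patient' is 1-based
    while start <= total_patients:
        yield start, min(subset_size, total_patients - start + 1)
        start += subset_size


# For a patient set of 1,200 patients exported 500 at a time:
runs = list(patient_subsets(1200, 500))
```

Here the three runs would start at patients 1, 501, and 1001, with the last run covering the remaining 200 patients.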

For more detailed information, refer to the User Guide to i2b2 Plugins.

General Questions

What if I would like additional help in navigating how to use i2b2 for my project or study?

How is patient data protected?

How/where do I report an issue or enhancement?

What version of i2b2 are we using?

What does i2b2 stand for and where can I get more information?

Who has access to the i2b2 project? How do I gain access?

What is the scope of data for the MUSC i2b2 project?

Navigating in the Application

Why are there duplicate items in the Navigate Terms hierarchy?

Where do the items in Navigate Terms come from?

What ontologies are utilized?

What are the items in the Previous Queries section?

What is the SHARED folder in the Workplace section for? What about the folder with my id?

How do I perform a Temporal Query?

Is there a user guide for i2b2?

How do you view data from previous queries?

Understanding the Data

What is the date range of data?

What is the scope of the Diagnosis and Procedures data?

How are the chronic condition concepts derived under the computable phenotypes category?

What is the scope of the Medications data?

Not all lab results seem to be available, why is this?

How do I utilize the RXNORM ontology for Medications in my queries?

What vitals are included in i2b2?

When is i2b2 refreshed?

There are several age concepts and values, which one should I use?

How are the Charlson Comorbidity scores calculated?

Where does the clinical trial data come from?

What is the scope of the Cancer Registry data and where does it come from?

Using Plugins

How can I format the Export XLS Plugin output so all of the data is on the same row for the same patient?

How do I use the i2b2 plugins?

How do I make a patient set and where can I locate it?

Are there suggestions to avoid timeout issues while using the Export XLS Plugin?

Where can I get more details on specific plugins?