Organizing and mapping disparate data sources to a Central Breast Cancer Database that is used for Machine Learning, clinical care and research

It is an unfortunate reality that most discrete medical data is held in multiple isolated databases that use different fields, codes and terminology.  Even if the various data sets could be brought into one place, they would still be talking in different languages that makes consolidation difficult.  This has created the approach of a Data Lake, where the data sets are co-located.  The next step is mapping these data sets into a common language (The medical equivalent of Esperanto) so that a Central Breast Database can be created and used for research and clinical care and for training of AI models. 

Once this approach has been optimized for breast disease, it can then be extrapolated to other disease types.

sample data map