MEDomicsLab-docs
V0

Step 3: Prepare ML tables

Feb 12 – Feb 26 | Prepare ML tables


If you completed Step 2 - Extract Data, your data are ready for Step 3 - Prepare ML tables.

However, before proceeding to Step 3 - Prepare ML tables, we recommend that you replace your own output data from Step 2 - Extract Data (the extracted_features folder) with the data that we prepared for you (MEDomicsLab_TestingPhase_Step3.zip). This will ensure consistency of results across all participants of the Testing Phase.

An invitation to access the MEDomicsLab_TestingPhase_Step3.zip data was sent by email.

Recommendations

Reminder: Make sure to save your datasets after updating column names by clicking the 'Save' button icon (an example is shown at 16:08 in the tutorial video).

If you do not click the 'Save' button icon after modifying a CSV file in the app, your changes will not be applied to your workspace.

Instructions for Step 3 - Prepare ML tables

Step 3 - Prepare ML tables is divided into five parts, which prepare Machine Learning tables from the features extracted in Step 2 - Extract Data of the Testing Phase:

  1. Reduce Extracted Features: Use the Input Module to reduce the large CSV files obtained in the previous step via Principal Component Analysis (PCA) and Spearman correlation.

  2. Merge All Data: Use the Input Module to combine the reduced extracted features with demographic embeddings into a master CSV table. Then, create MEDprofiles from the master table.

  3. Visualize Data: Use the MEDprofiles figure to visualize the data.

  4. Define Static Time Points: Use the MEDprofiles figure to set static time points and export the data as static CSV tables.

  5. Create Learning and Holdout Sets: Use the Input Module to generate the Learning and Holdout sets.

We acknowledge that using Spearman correlation with the target variable to drastically reduce the dimension of the feature set on the whole dataset is not a machine learning best practice: because the target values of all samples, including those later used for evaluation, influence which features are kept, the evaluation results can be optimistically biased.

If needed as a feature set reduction method, this Spearman correlation process should normally be performed "on-the-fly" on the training sets of the Learning set (and, ideally, so should the PCA process).

Here, we decided to apply Spearman correlation to the whole dataset during the Reduce extracted features process to work around some difficulties we currently have in handling large feature sets in downstream processes.

However, please note that we are actively working on improving the scalability of our application to eliminate the need to apply Spearman correlation to the whole dataset in the future.
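Following the recommendation above on Spearman correlation, a minimal sketch of an "on-the-fly" reduction could look like the code below: the data are split into Learning and Holdout sets first, and the Spearman filter and the PCA are fitted on the Learning rows only, then applied unchanged to the Holdout rows. It uses NumPy alone and is not the Input Module's implementation: the helper names (`average_ranks`, `spearman`, `learning_holdout_split`, `SpearmanPCAReducer`), the |ρ| ≥ 0.2 threshold, the 20 % holdout fraction, the number of components and the filter-then-PCA order are all hypothetical choices. In a complete workflow, the reducer would be refitted inside each training fold of the Learning set.

```python
import numpy as np


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values), dtype=float)
    start = 0
    while start < len(values):
        end = start
        while end + 1 < len(values) and sorted_vals[end + 1] == sorted_vals[start]:
            end += 1
        # Sorted positions start..end (0-based) get the average 1-based rank.
        ranks[order[start:end + 1]] = (start + end) / 2.0 + 1.0
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the tie-averaged ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    if rx.std() == 0 or ry.std() == 0:
        return 0.0  # a constant column carries no monotonic association
    return float(np.corrcoef(rx, ry)[0, 1])


def learning_holdout_split(n_rows, holdout_fraction=0.2, seed=0):
    """Hypothetical helper: shuffle row indices, return (learning, holdout)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_rows)
    n_holdout = int(round(n_rows * holdout_fraction))
    return idx[n_holdout:], idx[:n_holdout]


class SpearmanPCAReducer:
    """Hypothetical reducer: keep features with |rho| >= threshold, then PCA."""

    def __init__(self, min_abs_rho=0.2, n_components=2):
        self.min_abs_rho = min_abs_rho
        self.n_components = n_components

    def fit(self, X, y):
        # Both steps look only at the rows passed here (the Learning set).
        rhos = np.array([spearman(X[:, j], y) for j in range(X.shape[1])])
        self.selected_ = np.flatnonzero(np.abs(rhos) >= self.min_abs_rho)
        X_sel = X[:, self.selected_]
        self.mean_ = X_sel.mean(axis=0)
        # Principal axes = right singular vectors of the centered data.
        _, _, vt = np.linalg.svd(X_sel - self.mean_, full_matrices=False)
        self.components_ = vt[: self.n_components]
        return self

    def transform(self, X):
        # Reuse the selection, mean and axes learned in fit().
        return (X[:, self.selected_] - self.mean_) @ self.components_.T


# Synthetic example: 200 patients, 50 features, binary target driven by
# features 0 and 1 (all other features are pure noise).
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 50))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

learn_idx, holdout_idx = learning_holdout_split(len(y))
reducer = SpearmanPCAReducer().fit(X[learn_idx], y[learn_idx])
Z_learn = reducer.transform(X[learn_idx])
Z_holdout = reducer.transform(X[holdout_idx])
print(reducer.selected_, Z_learn.shape, Z_holdout.shape)
```

Because the Holdout rows never influence which features are kept or how the principal axes are oriented, performance measured on them is not inflated by the reduction step. The current Reduce extracted features process differs on purpose: it applies Spearman correlation to the whole dataset, for the scalability reasons given above.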

The goal of defining static time points is to simulate a longitudinal CDSS (Clinical Decision Support System) scenario using data aggregated over time. In the Create Model step of the Testing Phase, we will attempt to identify the point in time at which we reach sufficient predictive power (the point in time when, in real life, we could potentially intervene).
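To make the idea of static time points concrete, the following sketch collapses time-stamped observations into one row per patient for a given time point T, using only observations recorded at or before T, and exports each table as CSV text. It is a hypothetical illustration in plain Python, not how MEDprofiles aggregates data: the record layout `(patient_id, time, features)`, the hour-based times, the mean as aggregation function and the helper names `static_table` and `to_csv` are all assumptions.

```python
import csv
import io
from collections import defaultdict
from statistics import mean


def static_table(records, time_point):
    """Aggregate each patient's observations recorded at or before time_point.

    records: iterable of (patient_id, time, {feature_name: value}) tuples
    (hypothetical layout). Returns {patient_id: {feature_name: mean_value}}.
    Patients with no observation up to time_point are left out.
    """
    buckets = defaultdict(lambda: defaultdict(list))
    for patient_id, time, features in records:
        if time <= time_point:  # never look past the chosen time point
            for name, value in features.items():
                buckets[patient_id][name].append(value)
    return {
        pid: {name: mean(values) for name, values in feats.items()}
        for pid, feats in buckets.items()
    }


def to_csv(table, feature_names):
    """Write one static table as CSV text; missing features become empty cells."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["patient_id", *feature_names])
    for pid in sorted(table):
        writer.writerow([pid, *(table[pid].get(f, "") for f in feature_names)])
    return buffer.getvalue()


# Hypothetical observations, with times in hours since admission.
records = [
    ("p1", 0, {"heart_rate": 80, "lactate": 2.0}),
    ("p1", 12, {"heart_rate": 100}),
    ("p1", 36, {"heart_rate": 120, "lactate": 4.0}),
    ("p2", 6, {"heart_rate": 70}),
    ("p2", 30, {"heart_rate": 90}),
]

# One static table per time point, e.g. 0 h, 24 h and 48 h after admission.
tables = {t: static_table(records, t) for t in (0, 24, 48)}
print(to_csv(tables[24], ["heart_rate", "lactate"]))
```

Fitting and comparing one model per static table (for example at 0 h, 24 h and 48 h) is one way to look for the earliest time point with sufficient predictive power.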

Before proceeding with Step 3 - Prepare ML tables of the MEDomicsLab Testing Phase, we recommend consulting the documentation of the Input Module.

Video chapters:

Intro (0:00)

Reduce extracted features (0:50)

Merge all our data (8:24)

Visualize MEDprofiles (10:52)

Define static time points (12:01)

Create learning and holdout sets (14:13)