Step 4: Explore Data
Feb 26 – Mar 11 | Explore Data
The MEDomicsLab_TestingPhase_Step4.zip archive also includes a new_demographic_embeddings CSV file that you will need for Step 4 - Explore Data. The rationale for providing this new file is explained in the "Set the demographics in T1 only" section of the video.
The current Step 4 - Explore Data step is divided into seven parts, and involves exploring the learning set we obtained from of the Testing Phase as follows:
1. Analyze the learning set using YData profiling: Employ the YData profiling tool from the to delve into your learning set. Record the percentages of missing values for each class across all time points.
2. Set demographic embeddings in T1 only: Based on the insights from part 1, eliminate demographic embeddings from all time point CSV files, including learning and holdout sets. Consolidate all demographic data into the T1 time point using the new_demographic_embeddings CSV file. Conduct this operation in the .
3. Remove chart events from T1: Referring to the analysis in part 1, eliminate chart events from the T1 datasets (both learning and holdout) using the .
4. Transform procedure events: Building on the findings from part 1, transform the procedure events columns in all time point datasets (learning and holdout sets) using the .
5. Analyze the learning set using D-Tale: Leverage the D-Tale tool from the to scrutinize your learning set. Explore the inter-variable correlation matrices for each time point.
6. Analyze the learning set using SweetViz: Utilize the SweetViz tool from the to study your learning set. Identify sets of variables exhibiting a high correlation rate, considering the observations made with D-Tale.
7. Remove highly correlated columns from the time point datasets: In the , eliminate the variables identified as having high correlation rates from the time point datasets (learning and holdout sets), aligning with the insights gained in part 6.
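If you want to cross-check the missing-value percentages from part 1 outside the application, a short pandas script is one option. The sketch below is only an illustration and is not how MEDomicsLab computes its reports. The column prefixes (`demo_`, `chart_`, `proc_`), the helper name `missing_pct_by_group`, and the toy data are all hypothetical, so adapt them to the column names in your own time point CSV files.

```python
import pandas as pd


def missing_pct_by_group(df: pd.DataFrame, prefixes: dict) -> pd.Series:
    """Return the percentage of missing cells for each named column group."""
    result = {}
    for group, prefix in prefixes.items():
        # Select the columns of this group by their (assumed) name prefix.
        cols = [c for c in df.columns if c.startswith(prefix)]
        if cols:
            # Mean of the boolean "is missing" mask over all cells of the group.
            result[group] = float(df[cols].isna().to_numpy().mean() * 100)
    return pd.Series(result, name="missing_pct")


# Toy stand-in for one time point of the learning set (hypothetical columns).
toy_t1 = pd.DataFrame({
    "demo_attr0": [0.1, 0.2, None, 0.4],
    "demo_attr1": [0.5, None, None, 0.8],
    "chart_attr0": [None, None, None, None],
    "proc_attr0": [1.0, 2.0, 3.0, 4.0],
})

summary = missing_pct_by_group(
    toy_t1,
    {"demographics": "demo_", "chart events": "chart_", "procedure events": "proc_"},
)
print(summary)
```

With real data you would load each time point with `pd.read_csv` and call the helper once per file, which gives one row of percentages per time point to compare against the YData profiling report.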
Please note that the CSV files for the time points obtained from are already tagged. This is done by the when exporting the data as time points.
We encourage you not only to follow the video but also to independently utilize the for exploring the learning set. This self-directed analysis will prove valuable in Step 8 - Challenge.
Before proceeding with Step 4 - Explore Data of the MEDomicsLab Testing Phase, we recommend consulting the documentation of the and .
Please consider the warnings mentioned on the page: we are continuously working on enhancing the MEDomicsLab platform.
Intro
YData-profiling
Set the demographics in T1 only
Remove chart events from T1
Transform procedure events
D-Tale
SweetViz
Remove Highly Correlated Columns
Please note that the last part, 'Remove Highly Correlated Columns', is optional because it can be time-consuming. We are actively working to enhance the Delete Columns tool in the Input Module to speed up this procedure in the future. Even if you choose to skip this last part, we will provide you with our own output data from for use in .
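To get a feel for what removing highly correlated columns involves, the following pandas sketch flags columns on a toy learning set and then drops the same columns from both the learning and holdout sets. It is an assumption-laden illustration, not the procedure used by the Delete Columns tool: the 0.9 threshold, the helper name `find_correlated_columns`, the use of Pearson correlation, and the toy columns are all hypothetical choices.

```python
import numpy as np
import pandas as pd


def find_correlated_columns(df: pd.DataFrame, threshold: float = 0.9) -> list:
    """List columns whose absolute Pearson correlation with an earlier column exceeds the threshold."""
    corr = df.select_dtypes("number").corr().abs()
    # Keep only the strict upper triangle so each pair is checked once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    return [col for col in upper.columns if (upper[col] > threshold).any()]


# Toy learning and holdout sets sharing the same (hypothetical) columns.
learning = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, 5.0],
    "b": [2.0, 4.0, 6.0, 8.0, 10.0],  # exact multiple of "a"
    "c": [5.0, 1.0, 4.0, 2.0, 3.0],
})
holdout = pd.DataFrame({"a": [6.0, 7.0], "b": [12.0, 14.0], "c": [1.0, 2.0]})

# Decide on the learning set only, then apply the same drop to both sets.
to_drop = find_correlated_columns(learning, threshold=0.9)
learning_reduced = learning.drop(columns=to_drop)
holdout_reduced = holdout.drop(columns=to_drop)
print(to_drop)
```

Choosing the columns from the learning set alone and reusing that list for the holdout set keeps both datasets aligned and avoids letting holdout data influence the selection.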