Input Module
The Input module consolidates all the tools necessary for preprocessing tabular data.
We are continuously enhancing the MEDomicsLab platform. The following improvements are in progress (i.e. not yet implemented):
Definition of Empty Cells: While we often refer to empty cells as NaN (Not A Number) values, it is important to note that empty does not necessarily mean NaN.
Display in Simple Cleaning Tool: In the Simple Cleaning tool, we currently display the percentages of non-NaN values. However, we acknowledge that this can be confusing, and we plan to improve it by showing the percentage of NaN values instead.
Cleaning Columns and Rows in Simple Cleaning Tool: When columns and rows are cleaned simultaneously in the Simple Cleaning tool, the two cleaning steps are currently applied independently (as opposed to sequentially, where the output of one step influences the other), and all the columns and rows displayed in red are removed. We are working on enhancing this tool. Additionally, please be aware that imputation methods are available in the .
Holdout Set Creation Tool: In the Holdout Set Creation tool, the NaN method is applied only to rows that contain NaN values in the columns selected for "Stratify". We plan to enhance the NaN handling method by introducing options such as mean fill, median fill, and mode fill.
Feature Reduction Tool: The Feature Reduction tool currently has only basic utilities. We are committed to improving it, for example by allowing PCA (Principal Component Analysis) transformations to be transferred through the .
We appreciate your understanding as we work towards making MEDomicsLab even more effective and user-friendly.
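To make the difference between independent and sequential cleaning concrete, here is a minimal pandas sketch. It is not the Simple Cleaning tool's actual implementation: the toy table, the 50% threshold, and the columns-first order used for the sequential variant are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy table (not taken from MEDomicsLab).
df = pd.DataFrame(
    {
        "a": [1.0, 2.0, 3.0, 4.0],
        "b": [np.nan, np.nan, np.nan, 4.0],
        "c": [np.nan, np.nan, np.nan, 4.0],
    }
)
threshold = 0.5  # assumed minimum fraction of non-NaN values to keep a row/column

# Fraction of non-NaN values per column and per row, both from the ORIGINAL table.
col_fill = df.notna().mean(axis=0)
row_fill = df.notna().mean(axis=1)

# Independent cleaning: row and column masks are both computed on the original table.
independent = df.loc[row_fill >= threshold, col_fill >= threshold]

# Sequential cleaning (columns first, an assumed order): rows are re-evaluated
# on the table that remains after the sparse columns are dropped.
cols_kept = df.loc[:, col_fill >= threshold]
sequential = cols_kept.loc[cols_kept.notna().mean(axis=1) >= threshold]

print(independent.shape, sequential.shape)
```

In this sketch, independent cleaning drops rows that are sparse only because of columns that are themselves being removed, so it keeps fewer rows than the sequential variant.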
Content
Introduction
Merge tool
Grouping/Tagging tool
Simple Cleaning tool
Holdout Set Creation tool
Subset Creation tool
Feature Reduction tool
The Merge tool provides a visual interface to the merge function of the pandas Python library. Follow these steps to merge dataframes:
Select the merge type. For additional information about merge types, consult the pandas documentation.
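As a rough illustration of how the merge type changes the result, the sketch below calls pandas merge directly on two small tables. The table contents and the "patient_id" key column are hypothetical, and the tool may expose merge types other than the four shown here.

```python
import pandas as pd

# Hypothetical example tables sharing a "patient_id" key column.
left = pd.DataFrame({"patient_id": [1, 2, 3], "age": [54, 61, 47]})
right = pd.DataFrame({"patient_id": [2, 3, 4], "glucose": [5.4, 6.1, 7.0]})

# Run the same merge with four common merge types.
merged = {
    how: pd.merge(left, right, on="patient_id", how=how)
    for how in ("inner", "left", "right", "outer")
}

for how, result in merged.items():
    print(how, len(result))
```

An inner merge keeps only keys present in both tables, left and right merges keep every key from one side, and an outer merge keeps the union, filling missing values with NaN.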
The Holdout Set Creation tool provides a visual interface to the train_test_split function of the scikit-learn Python package's model_selection module. Follow these steps to create a holdout set:
If Shuffle is selected, you can also choose to Stratify the holdout set based on selected columns. Refer to the scikit-learn documentation for additional information.
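The following sketch shows the underlying scikit-learn call with shuffling and stratification enabled. The dataframe, the "label" column, the 20% holdout size, and the random seed are assumptions for illustration and do not reflect the tool's defaults.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataset with a balanced binary "label" column.
df = pd.DataFrame(
    {
        "feature": range(20),
        "label": [0] * 10 + [1] * 10,
    }
)

# Stratifying on "label" preserves the class proportions in both sets.
# scikit-learn only supports stratify together with shuffle=True.
learning_set, holdout_set = train_test_split(
    df,
    test_size=0.2,
    shuffle=True,
    stratify=df["label"],
    random_state=42,
)

print(holdout_set["label"].value_counts().to_dict())
```

With 20 balanced rows and a 20% holdout, the holdout set contains four rows, two from each class.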