This page provides an overview of the Federated Learning module in MEDomicsLab, offering insights into both the application's interface and the backend package employed for conducting experiments.
The Federated Learning Module in MEDomicsLab simulates the process of federated learning and allows for training models in a decentralized manner using multiple datasets. This approach preserves privacy and enhances data security by ensuring that data never leaves its original location.
- Decentralized Training: Models are trained across multiple nodes without transferring raw data.
- Privacy Preservation: Techniques such as differential privacy help ensure data confidentiality.
- Hyperparameter Optimization: Tools to automatically tune model hyperparameters for improved performance.
- Transfer Learning: Allows the user to initialize the central server with a pre-trained model to improve model performance.
The Federated Learning module in the MEDomicsLab application uses MEDfl, a standalone Python package designed for simulating federated learning, as its backend.
You can also use MEDfl independently of the app to create your networks and pipelines directly in code. Below is a brief example demonstrating how to do that.
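The sketch below is only a rough outline of what such a pipeline can look like in code, mirroring the nodes described later on this page; the module paths, class names, and arguments are assumptions rather than the verified MEDfl API, so use the tutorials in the GitHub repository as the reference for working code.

```python
# Rough sketch only: module paths, class names, and signatures below are
# assumptions and may not match the actual MEDfl API. See the MEDfl GitHub
# tutorials for verified, runnable examples.
from MEDfl.NetManager import Network, FLsetup, FederatedDataset      # assumed layout
from MEDfl.LearningManager import Model, FLstrategy, FLpipeline      # assumed layout

# 1. Create a federated network of clients around a central server.
network = Network(name="demo_network")

# 2. Configure the FL setup and build the federated dataset
#    (train/validation/test loaders for every client).
fl_setup = FLsetup(name="demo_setup", description="toy example", network=network)
fed_dataset = FederatedDataset(fl_setup, val_frac=0.1, test_frac=0.2)

# 3. Define the initial model and the server aggregation strategy.
model = Model(num_layers=3, hidden_size=32, optimizer="Adam", learning_rate=1e-3)
strategy = FLstrategy(name="FedAvg", fraction_fit=1.0, min_available_clients=2)

# 4. Run the federated pipeline and collect the results.
pipeline = FLpipeline(name="demo_pipeline", model=model, strategy=strategy, dataset=fed_dataset)
results = pipeline.run()
```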
For more detailed examples, you can check the tutorials on the GitHub repository.
The interface of the MEDfl module in the MEDomicsLab application provides a user-friendly space where you can visually manage and connect multiple nodes to create your federated learning pipelines. Each node type in the interface has a specific role and attributes, allowing you to build and customize your federated learning networks seamlessly.
Below is a table explaining the role, input, and output of each node; illustrative code sketches for several of the nodes follow the table:
| Node | Description | Input | Output |
|---|---|---|---|
| Dataset | The Dataset Node is where you specify the master dataset for your experiment. The master dataset is used differently depending on the type of network you create:<br>• Auto Network: the master dataset is split across the created client nodes based on the values of a specified column.<br>• Manual Network: the master dataset is used to validate the schema of the dataset selected for each node.<br>To select a master dataset, click the "Select Dataset" button, choose the file, and specify the target column of the dataset. | / | Dataset |
| Network | The Network Node is responsible for creating the federated network. A new screen will appear when you click on it, displaying additional node types: the Client Node and the Server Node. You will have the option to add multiple clients and a central server that will aggregate the results. | Dataset | Network |
| FL Setup | The FL Setup Node is responsible for configuring the federated learning setup. The user only needs to specify the name and description of the setup. | Network | FL setup |
| FL Dataset | The FL Dataset Node creates the federated dataset, which generates train, test, and validation loaders from the clients' datasets. To create a federated dataset, the user must specify two parameters:<br>• Validation fraction: the fraction of the data used for validation.<br>• Test fraction: the fraction of the data used for testing. | FL setup | FL dataset |
| Model | The Model Node is responsible for creating the model that initializes the federated learning process. The options depend on whether transfer learning is activated:<br>• Transfer learning activated: specify a pre-trained model and additional parameters such as the optimizer and learning rate.<br>• Transfer learning deactivated: either use the custom models provided by MEDfl, specifying parameters like the number of layers, hidden size, optimizer, and learning rate (these can optionally be filled from the results of a hyperparameter optimization experiment), or create a model from scratch in the code editor. | FL dataset | Model |
| Optimize | The Optimize Node is responsible for hyperparameter optimization. Users can optimize hyperparameters using the following methods:<br>• Grid search optimization: a straightforward method where the user specifies the hyperparameters to optimize, such as the number of layers and the hidden size.<br>• Optuna central optimization: uses Optuna to optimize parameters on the central server. Users can specify Optuna settings such as the metric, direction, optimization algorithm, and an interval for each hyperparameter.<br>• Optuna federated optimization: uses Optuna for hyperparameter optimization in a federated manner; optimization occurs during the execution of the federated pipeline, adapting parameters based on the distributed data.<br>For more details, see the Optuna documentation. | Model + Dataset | Model |
| FL Strategy | The FL Strategy Node is responsible for creating the server strategy used to aggregate client updates and manage the clients in the network. This includes defining:<br>• Aggregation algorithm: chosen from a list of algorithms provided by the Flower package.<br>• fraction_fit: the fraction of clients sampled for training the model.<br>• fraction_evaluate: the fraction of clients sampled for model evaluation (validation).<br>• min_fit_clients: the minimum number of clients sampled for training in each round.<br>• min_evaluate_clients: the minimum number of clients sampled for evaluation in each round.<br>• min_available_clients: the minimum number of clients that must be available to initiate a federation round. | Model | FL strategy |
| Train Model | The Train Model Node is used to define the client resources for training, specifying whether to utilize GPU or CPU resources during the training process. | FL strategy | Train results |
| Save Results | The Save Results Node saves the training results produced by the Train Model Node to a results file. | Train results | Save results |
| Merge Results | This node is used to merge two or more results files into a single file. | Save results | None |
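To make the Dataset Node's Auto Network behaviour concrete, here is a minimal pandas sketch of splitting a master dataset by a column value; the file names and the `hospital_id` column are hypothetical, and MEDfl's own splitting logic may differ.

```python
import pandas as pd

# Hypothetical master dataset with one row per patient and a column
# identifying the site (e.g. hospital) each row belongs to.
master = pd.read_csv("master_dataset.csv")

# One sub-dataset per distinct value of the splitting column;
# each sub-dataset becomes the local dataset of one client node.
client_datasets = {
    site: group.drop(columns=["hospital_id"])
    for site, group in master.groupby("hospital_id")
}

for site, df in client_datasets.items():
    df.to_csv(f"client_{site}.csv", index=False)
```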
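The FL Dataset Node's two fractions translate into a standard train/validation/test split for each client. A minimal PyTorch sketch of that idea (not MEDfl's actual implementation):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, random_split

def make_loaders(features, labels, val_frac=0.1, test_frac=0.2, batch_size=32):
    """Split one client's data into train, validation, and test loaders."""
    dataset = TensorDataset(torch.as_tensor(features, dtype=torch.float32),
                            torch.as_tensor(labels, dtype=torch.float32))
    n = len(dataset)
    n_val, n_test = int(n * val_frac), int(n * test_frac)
    n_train = n - n_val - n_test
    train_set, val_set, test_set = random_split(dataset, [n_train, n_val, n_test])
    return (DataLoader(train_set, batch_size=batch_size, shuffle=True),
            DataLoader(val_set, batch_size=batch_size),
            DataLoader(test_set, batch_size=batch_size))
```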
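For the Model Node, the custom-model parameters (number of layers, hidden size, optimizer, learning rate) correspond to a small fully connected network such as the PyTorch sketch below; the pre-trained checkpoint path shown for transfer learning is hypothetical.

```python
import torch
from torch import nn

def build_mlp(in_features, num_layers=3, hidden_size=32, out_features=1):
    """Fully connected model parameterized like the Model Node's custom options."""
    layers, width = [], in_features
    for _ in range(num_layers):
        layers += [nn.Linear(width, hidden_size), nn.ReLU()]
        width = hidden_size
    layers.append(nn.Linear(width, out_features))
    return nn.Sequential(*layers)

model = build_mlp(in_features=20)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# With transfer learning activated, the central model would instead start
# from pre-trained weights (hypothetical checkpoint path):
# model.load_state_dict(torch.load("pretrained_model.pt"))
```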
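The two Optuna-based options of the Optimize Node come down to defining a search space, a metric, an optimization direction, and a sampler. A generic Optuna sketch with an illustrative search space and a placeholder objective standing in for the real federated evaluation:

```python
import optuna

def objective(trial):
    # Illustrative search space over the same hyperparameters the node exposes.
    num_layers = trial.suggest_int("num_layers", 1, 4)
    hidden_size = trial.suggest_int("hidden_size", 16, 128)
    learning_rate = trial.suggest_float("learning_rate", 1e-4, 1e-1, log=True)
    # A real objective would train and evaluate the (federated) model and
    # return the chosen metric; this placeholder just returns a dummy score.
    return hidden_size * learning_rate / num_layers

# Direction and sampler correspond to the node's "direction" and
# "optimization algorithm" settings.
study = optuna.create_study(direction="maximize",
                            sampler=optuna.samplers.TPESampler())
study.optimize(objective, n_trials=20)
print(study.best_params)
```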
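The FL Strategy Node's fields map directly onto the arguments of Flower's aggregation strategies; for example, with FedAvg (the values below are arbitrary):

```python
import flwr as fl

# Each argument corresponds to one field of the FL Strategy Node.
strategy = fl.server.strategy.FedAvg(
    fraction_fit=0.8,          # fraction of clients sampled for training
    fraction_evaluate=0.5,     # fraction of clients sampled for evaluation
    min_fit_clients=2,         # minimum clients used for training each round
    min_evaluate_clients=2,    # minimum clients used for evaluation each round
    min_available_clients=3,   # minimum clients required to start a round
)
```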
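For the Train Model Node, the GPU/CPU choice corresponds to the per-client resources handed to Flower's simulation engine; the sketch below assumes the `client_fn` and `strategy` objects are built elsewhere and only illustrates the resource setting.

```python
import flwr as fl

def run_training(client_fn, strategy, num_clients=3, num_rounds=5, use_gpu=False):
    """Launch a simulated federation, choosing CPU-only or GPU resources per client."""
    client_resources = {"num_cpus": 2, "num_gpus": 0.5 if use_gpu else 0.0}
    return fl.simulation.start_simulation(
        client_fn=client_fn,                    # builds one Flower client per client id
        num_clients=num_clients,
        config=fl.server.ServerConfig(num_rounds=num_rounds),
        strategy=strategy,
        client_resources=client_resources,      # what the Train Model Node controls
    )
```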
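Finally, merging results files, as the last node does, amounts to concatenating them; a minimal pandas sketch with hypothetical file names:

```python
import pandas as pd

# Hypothetical results files produced by separate experiments.
paths = ["results_experiment_1.csv", "results_experiment_2.csv"]

merged = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)
merged.to_csv("merged_results.csv", index=False)
```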