Architecture
GenerativeProteomics follows a modular architecture that promotes flexibility and scalability. It is composed of several main classes, each responsible for a different task in the processing and imputation of large proteomics datasets. These tasks include data processing, imputation of missing values, generation of synthetic data, and metrics calculation.
Below, you can find a class diagram that shows how these modules are connected and how they interact with each other.
Overview
This diagram illustrates the overall architecture of GenerativeProteomics, showing how the different components interact during the imputation process.
GenerativeProteomics: The main entry point that initializes all classes.
Data: Handles dataset loading and preparation, as well as the creation of the hint matrix, the mask matrix and the synthetic reference dataset.
Network: Trains the model using the attributes from the Data class.
Params: Stores hyperparameters and passes them to the network.
Metrics: Computes key metrics such as loss values for both the Discriminator and Generator.
Utils: Provides auxiliary functions like indexing, output generation, and CSV creation.
ImputationManager: Manages the selection and execution of different imputation methods.
ImputationModel: Provides a common API for the different pre-trained models that can be used, ensuring consistency and compatibility with the rest of the codebase.
GainDannImputationModel: Contains the logic of the GAIN-DANN imputation method.
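To make the mask and hint matrices mentioned for the Data class more concrete, here is a minimal sketch in the style of the original GAIN formulation, where the mask marks observed entries with 1 and the hint reveals a random subset of the mask. The function names `make_mask` and `make_hint`, the `hint_rate` parameter, and the use of plain Python lists are assumptions made for illustration; the actual Data class may build these matrices differently.

```python
import math
import random


def make_mask(data):
    """Return a matrix with 1.0 where a value is observed and 0.0 where it is missing (NaN)."""
    return [[0.0 if math.isnan(value) else 1.0 for value in row] for row in data]


def make_hint(mask, hint_rate, rng):
    """Keep each mask entry with probability hint_rate; all other entries become 0.0."""
    return [
        [m * (1.0 if rng.random() < hint_rate else 0.0) for m in row]
        for row in mask
    ]


nan = float("nan")

# Hypothetical protein intensities with two missing values.
intensities = [
    [21.4, nan, 19.8],
    [nan, 22.1, 20.3],
]

mask = make_mask(intensities)
hint = make_hint(mask, hint_rate=0.9, rng=random.Random(0))

print(mask)  # [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
```

Because the hint is the mask multiplied by a random 0/1 draw, a hint entry can only be 1.0 where the corresponding value was actually observed.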
Execution Flow
The GenerativeProteomics module orchestrates the imputation process.
The Data module loads the dataset, which is used by the Network.
The Network requires hyperparameters from the Params class.
The Metrics class computes evaluation metrics during training, such as the loss values for the Generator and the Discriminator.
The Utils class provides helper functions for tasks like file management.
The model writes output files such as impute.csv and test_imputed.csv, along with performance metrics such as loss_G and loss_D.
The ImputationManager class allows users to select and run different imputation methods.
The ImputationModel class serves as a base for various imputation models, ensuring a consistent interface.
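The relationship between ImputationManager and ImputationModel described above can be sketched as an abstract base class plus a small registry. This is only an illustration of the pattern, not the package's real API: the method names `impute`, `register`, and `run` are hypothetical, and `ColumnMeanModel` is a toy stand-in for a real model such as GainDannImputationModel.

```python
from abc import ABC, abstractmethod
import math


class ImputationModel(ABC):
    """Common interface every imputation model implements (method name is hypothetical)."""

    @abstractmethod
    def impute(self, data):
        """Return a copy of data with missing values (NaN) filled in."""


class ColumnMeanModel(ImputationModel):
    """Toy model: replaces each missing value with the mean of its column."""

    def impute(self, data):
        n_cols = len(data[0])
        means = []
        for col in range(n_cols):
            observed = [row[col] for row in data if not math.isnan(row[col])]
            # Fall back to 0.0 when a column has no observed values at all.
            means.append(sum(observed) / len(observed) if observed else 0.0)
        return [
            [means[c] if math.isnan(v) else v for c, v in enumerate(row)]
            for row in data
        ]


class ImputationManager:
    """Registers models by name and runs the one the user selects."""

    def __init__(self):
        self._models = {}

    def register(self, name, model):
        if not isinstance(model, ImputationModel):
            raise TypeError("model must implement ImputationModel")
        self._models[name] = model

    def run(self, name, data):
        if name not in self._models:
            raise KeyError(f"unknown imputation method: {name!r}")
        return self._models[name].impute(data)


manager = ImputationManager()
manager.register("column_mean", ColumnMeanModel())

nan = float("nan")
result = manager.run("column_mean", [[1.0, nan], [3.0, 4.0]])
print(result)  # [[1.0, 4.0], [3.0, 4.0]]
```

With this layout, adding a new imputation method only requires a new subclass and one `register` call; the manager and the calling code stay unchanged.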
This structure ensures modularity, maintainability, and scalability, making it easier to extend GenerativeProteomics.