How to use GenerativeProteomics

If you just want to impute a dataset, the simplest way to use GenerativeProteomics is to run:

python generativeproteomics.py -i /path/to/file_to_impute.csv

Running the script this way triggers two separate training phases.

Evaluation run:
In this run, a percentage of the values (10% by default) is concealed during training, and the dataset is then imputed. The RMSE (root mean square error) is calculated using the hidden values as targets. At the end of the training phase, a test_imputed.csv file is created containing the original hidden values and their imputed counterparts, which gives you an estimate of the imputation accuracy.
Imputation run:
Afterwards, a full training phase takes place using the entire dataset. An imputed.csv file is created containing the imputed dataset.
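
To make the evaluation run more concrete, here is a minimal sketch of the hide-impute-score idea in plain Python. It is not the GenerativeProteomics implementation: the function names are hypothetical, and a column-mean imputer stands in for the trained model purely for illustration.

```python
import math
import random


def hide_values(rows, miss_rate=0.1, seed=0):
    """Conceal a fraction of the observed cells (hypothetical helper).

    Returns a masked copy of `rows` plus a list of
    (row_index, col_index, original_value) for every hidden cell.
    Missing cells are represented as None.
    """
    rng = random.Random(seed)
    observed = [(r, c) for r, row in enumerate(rows)
                for c, value in enumerate(row) if value is not None]
    n_hide = int(round(miss_rate * len(observed)))
    masked = [list(row) for row in rows]
    hidden = []
    for r, c in rng.sample(observed, n_hide):
        hidden.append((r, c, rows[r][c]))
        masked[r][c] = None
    return masked, hidden


def column_mean_impute(rows):
    """Placeholder imputer (NOT the real model): fill gaps with column means."""
    means = []
    for c in range(len(rows[0])):
        values = [row[c] for row in rows if row[c] is not None]
        means.append(sum(values) / len(values) if values else 0.0)
    return [[means[c] if value is None else value
             for c, value in enumerate(row)] for row in rows]


def rmse_on_hidden(imputed, hidden):
    """Root mean square error computed only on the concealed cells."""
    squared = [(imputed[r][c] - value) ** 2 for r, c, value in hidden]
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    data = [[float(r + c) for c in range(4)] for r in range(10)]
    masked, hidden = hide_values(data, miss_rate=0.1)
    imputed = column_mean_impute(masked)
    print("hidden cells:", len(hidden))
    print("RMSE on hidden cells:", rmse_on_hidden(imputed, hidden))
```

Because the score is computed only on cells whose true values are known, it estimates how well the imputer would recover genuinely missing values.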

However, you may want to change some of the arguments. You can do this with a parameters.json file (an example is available in GenerativeProteomics/breast/parameters.json) or by passing them directly on the command line.

Run with a parameters.json file:

python generativeproteomics.py --parameters /path/to/parameters.json

Run with command line arguments:

python generativeproteomics.py -i /path/to/file_to_impute.csv -o imputed_name --ofolder ./results/ --it 2001

Arguments:

-i: Path to file to impute
-o: Name of imputed file
--ofolder: Path to the output folder
--it: Number of iterations to train the model
--miss: The percentage of values to be concealed during the evaluation run (from 0 to 1)
--outall: Set this argument to 1 if you want to output every metric
--override: Set this argument to 1 if you want to delete the previously created files when writing the new output
--model: Choose the model to use (None if GenerativeProteomics, otherwise provide name of the pre-trained model)
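
If you launch the script from another Python program, the arguments above can be assembled programmatically. The following is a small sketch under assumptions: build_command and run_imputation are hypothetical helper names, only flags listed above are used, and the script is assumed to sit in the current working directory.

```python
import subprocess
import sys


def build_command(input_path, output_name=None, output_folder=None,
                  iterations=None, miss=None,
                  script="generativeproteomics.py"):
    """Assemble the command line as a list of strings (hypothetical helper).

    Only options documented in the argument list are emitted, and an option
    is added only when a value is supplied.
    """
    cmd = [sys.executable, script, "-i", str(input_path)]
    if output_name is not None:
        cmd += ["-o", str(output_name)]
    if output_folder is not None:
        cmd += ["--ofolder", str(output_folder)]
    if iterations is not None:
        cmd += ["--it", str(iterations)]
    if miss is not None:
        cmd += ["--miss", str(miss)]
    return cmd


def run_imputation(**kwargs):
    """Run the script and raise CalledProcessError if it exits with an error."""
    return subprocess.run(build_command(**kwargs), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe to try.
    print(build_command("/path/to/file_to_impute.csv",
                        output_name="imputed_name",
                        output_folder="./results/",
                        iterations=2001))
```

Passing the command as a list rather than a single string avoids shell quoting problems with paths that contain spaces.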

If you want to assess the accuracy of the imputation, you can provide a reference file containing a complete version of the dataset (without missing values):

python generativeproteomics.py -i /path/to/file_to_impute.csv --ref /path/to/complete_dataset.csv

Running the script this way calculates the RMSE of the imputation relative to the complete dataset.
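
As an illustration of what such a comparison involves, the sketch below computes an RMSE between an imputed matrix and a complete reference. It is an assumption, not a description of the tool's internals: the function name is hypothetical, and restricting the score to cells that were missing in the input is one reasonable convention that the script may or may not follow.

```python
import math


def rmse_vs_reference(original, imputed, reference):
    """RMSE over the cells that were missing (None) in `original`.

    All three arguments are lists of rows with identical shapes.
    """
    squared = []
    for orig_row, imp_row, ref_row in zip(original, imputed, reference):
        for orig, imp, ref in zip(orig_row, imp_row, ref_row):
            if orig is None:  # score only values the imputer had to fill in
                squared.append((imp - ref) ** 2)
    if not squared:
        raise ValueError("the original dataset has no missing values")
    return math.sqrt(sum(squared) / len(squared))


if __name__ == "__main__":
    original = [[1.0, None], [None, 4.0]]
    imputed = [[1.0, 2.5], [2.0, 4.0]]
    reference = [[1.0, 2.0], [3.0, 4.0]]
    print(rmse_vs_reference(original, imputed, reference))
```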
