Changelog
v0.2 (June 17, 2025)
Implemented Features
General Refactoring & Modularity:
- Clear separation between models, optimizers, datasets, and experiment configs.
- Modular imports for data, models, segmentation, metrics, and configs.
- Procedural code replaced by class-based logic (notably, a new `Trainer` class in `train.py`).
Dynamic Path Management:
- Centralized path definitions via `settings`.
- Output directories organized by `DataPart` enumerations.
Description Files:
- Metadata for each dataset partition saved as CSV during the `ddos` process.
Comet.ml Integration:
- Centralized experiment tracking, logging parameters, metrics, and visualizations.
- Tabular logging of training/validation metrics.
Dataset & Dataloader Handling:
- Dynamic splitting into train/validate/test, explicit DR5 handling.
- Scalable `data.create_dataloaders` method.
- Ready-to-use datasets for IR/microwave data now available for download/training.
- Only IR or microwave data can be used per run; they are not combined in this version.
Code Modularity:
- Reusable components, class-based `ClusterDataset`, enums for `DataPart` and `DataSource`.
Configuration Management:
- Centralized via the `settings` module.
Randomization & Reproducibility:
- Consistent random seed (`settings.SEED`).
Model Selection:
- Models loaded via unified `load_model`.
- New Baseline, CNN_MLP, and YOLO classification models (YOLO still malfunctions).
- Fixed structure for all models except YOLO classification.
Optimizer Selection:
- Added NAdam and RAdam; removed Adagrad, Adadelta, and LBFGS.
- Modular optimizer selection via the `all_optimizers` dictionary.
Parameterization & Device Management:
- All script parameters now configurable via CLI.
- Automatic GPU/CPU assignment.
Training Pipeline:
- Training logic encapsulated in `Trainer`.
- Learning rate schedulers with dynamic scheduler step updates.
- Improved checkpoint management and naming.
- Learning rate finder integration (`Trainer.find_lr`).
Unified `compute_all` Method:
- Centralized computation of loss, predictions, and accuracy.
Refined Training Workflow:
- Metrics logged per epoch to tables and external tools.
- Integrated validation and a dedicated `test` method for predictions/accuracies in DataFrame format.
Predictor Class:
- New class for inference; outputs a DataFrame with predictions, probabilities, and metadata.
Performance Enhancements:
- `torch.cuda.empty_cache()` calls to free GPU memory.
- Dynamic learning rate schedulers.
Segmentation Features:
- `segmentation.create_segmentation_plots` for post-training visualizations.
Logging & Metrics:
- Enhanced tracking, JSON-serialized metrics, and a combined CSV for benchmarking.
- `metrics.combine_metrics` for aggregation.
- New metrics: probability histograms, ROC/PR curves, confusion matrices, recall by redshift, training progress plots, metrics aggregation.
- Multi-page PDFs for consolidated visualizations.
Removed Features:
- Procedural training/validation loops, redundant `val_results`, static file paths, and commented-out code.
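The combined benchmarking CSV mentioned under Logging & Metrics above can be sketched with the standard library. The function name mirrors `metrics.combine_metrics`, but the real signature, columns, and file layout are not shown in this changelog, so everything below is an assumption:

```python
import csv
import io

def combine_metrics(per_model_metrics):
    """Merge per-model metric dicts into one CSV string.

    Hypothetical stand-in for the repository's `metrics.combine_metrics`;
    the actual interface and columns may differ.
    """
    # Take the union of metric names so every row shares the same columns.
    fields = ["model"] + sorted({k for m in per_model_metrics.values() for k in m})
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
    writer.writeheader()
    for model, values in per_model_metrics.items():
        writer.writerow({"model": model, **values})
    return buf.getvalue()

# Demo: two models' metrics flattened into one benchmarking table.
combined = combine_metrics({
    "ResNet18": {"accuracy": 0.84, "loss": 0.29},
    "AlexNet_VGG": {"accuracy": 0.79, "loss": 0.35},
})
```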
Bugs
Not all optimizers work.
YOLO classification model does not show proper results.
v0.1 (May 7, 2024)
Implemented Features
main.py
Dataloader Creation: Utilizes `data.py` to create a dataloader and split it into train and validation loaders.
Model Selection: Offers a list of easily modifiable models for training.
Optimizer Selection: Provides a choice of optimizers from a predefined list.
Training Script: Executes training with a pre-selected model, number of epochs, optimizer, and optimizer settings such as learning rate (`lr`) and momentum (`mm`).
Training Progress: Displays training progress and statistics during execution.
Results Saving: Saves the results of training for further analysis.
train.py
train(): Displays progress and statistics during training. Saves best model weights, checkpoints, and results.
validate(): Validates the model and provides statistics.
continue_training(): Allows resuming training from checkpoints.
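The resume-from-checkpoint flow behind `continue_training()` can be sketched as follows. The project itself presumably persists checkpoints with `torch.save`/`torch.load`; this dependency-free version uses `pickle`, and the checkpoint path and field names are illustrative only:

```python
import os
import pickle
import tempfile

def save_checkpoint(epoch, model_state, best_acc, path):
    # The real project would use torch.save(); pickle keeps this sketch
    # dependency-free. Field names are illustrative, not the repo's.
    with open(path, "wb") as f:
        pickle.dump({"epoch": epoch, "model_state": model_state,
                     "best_acc": best_acc}, f)

def continue_training(path):
    """Resume from the last checkpoint if it exists, else start fresh."""
    if not os.path.exists(path):
        return {"epoch": 0, "model_state": None, "best_acc": 0.0}
    with open(path, "rb") as f:
        return pickle.load(f)

# Demo: pretend training stopped after epoch 3, then resume.
ckpt_path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
save_checkpoint(3, {"w": [0.1, 0.2]}, 0.87, ckpt_path)
resumed = continue_training(ckpt_path)
```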
data.py
Queries the GAIA Star Catalog asynchronously (`read_gaia`).
Reads data from the ACT_DR5 and MaDCoWS catalogs, pre-downloading them if necessary.
Creates positive and negative samples for training: `createNegativeClassDR5`, `create_data_dr5`.
Includes data transformations: resizing, rotation, reflection, and normalization.
segmentation.py
Implements an image dataset class and segmentation map creation: `create_samples`, `formSegmentationMaps`, `printSegMaps`, `printBigSegMap`.
Predicts probabilities for images using `predict_folder` and `predict_tests`.
legacy_for_img.py
Downloads image cutouts using multithreading (`grab_cutouts`).
Supports VLASS and unWISE image downloads.
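A minimal sketch of the multithreaded download pattern that `grab_cutouts` uses. The real function issues HTTP requests to the VLASS/unWISE cutout services and its signature likely differs; here a stand-in `fetch` callable is injected so the sketch stays network-free:

```python
from concurrent.futures import ThreadPoolExecutor

def grab_cutouts(targets, fetch, max_workers=8):
    """Fetch one cutout per target in parallel.

    `fetch` is whatever callable retrieves a single cutout; injecting it
    keeps this sketch runnable without network access. The repository's
    actual grab_cutouts interface is assumed, not copied.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() preserves input order, so results align with targets.
        return list(pool.map(fetch, targets))

# Demo with a stand-in fetcher instead of real downloads.
coords = [(10.68, 41.27), (83.82, -5.39), (201.37, -43.02)]
results = grab_cutouts(coords, fetch=lambda c: f"cutout@{c[0]:.2f},{c[1]:.2f}")
```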
metrics.py
Includes plotting functions: ROC curve (`plot_roc_curve`) and precision-recall curve (`plot_precision_recall`).
`modelPerformance`: Calculates and displays metrics such as accuracy, precision, recall, and F1-score.
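The metric arithmetic behind `modelPerformance` can be illustrated for binary labels. This is a hand-rolled stand-in (the real function may also handle plotting and multi-class cases); only the accuracy/precision/recall/F1 computation is shown:

```python
def model_performance(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (0/1).

    Hypothetical stand-in for the repository's `modelPerformance`;
    only the metric arithmetic is reproduced here.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against zero denominators when a class is never predicted.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```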
Known Issues
Not all optimizers are functional.
Incorrect loss and accuracy calculations with Adam optimizer.
segmentation.py: Paths for loaded weights need fixing.
data.py: Ensure the data folder exists before running.
main.py: Models are loaded simultaneously; memory usage should be optimized. Some models need fixes (e.g., `ViTL16`).
Directory Structure
.
├── data
│ ├── DATA
│ │ ├── test_dr5
│ │ ├── test_madcows
│ │ ├── train
│ │ └── val
├── models
├── notebooks
├── results
├── screenshots
├── state_dict
└── trained_models
Performance
Initial startup: ~1h 9m 43s (Internet speed-dependent).
Subsequent runs: ~6m 55s.
Usage
Run the script with the following command:
python3 main.py --model MODEL_NAME --epoch NUM_EPOCH --optimizer OPTIMIZER --lr LR --mm MOMENTUM
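The flag set above can be reproduced with `argparse`. The defaults and help text below are illustrative guesses, not the project's actual values:

```python
import argparse

def build_parser():
    # Flags mirror the documented command; defaults are assumptions.
    p = argparse.ArgumentParser(description="Train a model via main.py")
    p.add_argument("--model", required=True,
                   choices=["ResNet18", "AlexNet_VGG",
                            "SpinalNet_VGG", "SpinalNet_ResNet"])
    p.add_argument("--epoch", type=int, default=10)
    p.add_argument("--optimizer", default="SGD",
                   choices=["SGD", "Adam", "RMSprop",
                            "AdamW", "Adadelta", "DiffGrad"])
    p.add_argument("--lr", type=float, default=0.01)
    p.add_argument("--mm", type=float, default=0.9, help="momentum")
    return p

# Demo: parse the documented example invocation.
args = build_parser().parse_args(
    ["--model", "AlexNet_VGG", "--epoch", "10",
     "--optimizer", "SGD", "--lr", "0.01", "--mm", "0.9"])
```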
Supported models:
`ResNet18`, `AlexNet_VGG`, `SpinalNet_VGG`, `SpinalNet_ResNet`.
Supported optimizers:
`SGD`, `Adam`, `RMSprop`, `AdamW`, `Adadelta`, `DiffGrad`.
Example output for AlexNet training with SGD optimizer:
Epoch 1/10. Training AlexNet with SGD optimizer: acc=0.683, loss=0.598
Validation Loss: 0.0075, Validation Accuracy: 0.7926
...
Epoch 10/10. Training AlexNet with SGD optimizer: acc=0.884, loss=0.286
Validation Loss: 0.0057, Validation Accuracy: 0.8445
Bugs
Not all optimizers work.
Loss and accuracy calculations incorrect with Adam optimizer.
Additional feedback from users is welcome.