galaxy.data
Script to get dataloaders
Attributes
- datasets_collection: Generates objects that are not present in any of the currently included datasets.
Classes
- ShiftAndMirrorPadTransform: Applies random shifts to coordinates for data augmentation.
- ClusterDataset: Custom PyTorch Dataset for cluster data.
- DatasetsInfo: Class to manage and cache datasets for training and evaluation.
Functions
- shift_and_mirror_pad: Shifts the image by (shift_x, shift_y) and applies mirror-padding.
- download_data: Downloads required data files.
- get_positive_class: Combines multiple datasets to create the positive class.
- get_cluster_catalog: Creates a SkyCoord catalog of known clusters.
- get_negative_class: Combines datasets to create the negative class.
- get_non_cluster_catalog: Creates a SkyCoord catalog of non-cluster objects.
- get_stars: Creates a SkyCoord catalog of Gaia catalog objects.
- filter_candidates: Filters candidate objects based on angular distance and galactic latitude.
- generate_random: Generates random sky coordinates for candidates.
- generate_random_based: Generates random candidates based on existing datasets.
- train_val_test_split: Splits the dataset into training, validation, and test sets.
- ddos: Generates cutouts for all data splits and saves them.
- download_sample: Downloads dataset requested by the user.
- create_dataloaders: Creates datasets and dataloaders for each data part.
Module Contents
- galaxy.data.shift_and_mirror_pad(image: torch.Tensor, shift_x: int, shift_y: int) torch.Tensor
Shifts the image by (shift_x, shift_y) and applies mirror-padding.
- Args:
image (Tensor): Input tensor of shape (C, H, W).
shift_x (int): Horizontal shift (positive: right, negative: left).
shift_y (int): Vertical shift (positive: down, negative: up).
- Returns:
Tensor: Augmented image tensor of shape (C, H, W).
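The shift-then-pad behaviour described above can be illustrated with a minimal sketch. It assumes reflection padding through `torch.nn.functional.pad`; the name `shift_with_reflect_pad` is hypothetical, and the actual `shift_and_mirror_pad` may differ in detail (for example, in whether the edge pixel itself is repeated).

```python
import torch
import torch.nn.functional as F


def shift_with_reflect_pad(image: torch.Tensor, shift_x: int, shift_y: int) -> torch.Tensor:
    """Shift a (C, H, W) image and fill the exposed border by reflection.

    Hypothetical helper; reflection padding requires |shift| < image size.
    """
    _, height, width = image.shape
    # Pad only on the side that the shift exposes.
    left, right = max(shift_x, 0), max(-shift_x, 0)
    top, bottom = max(shift_y, 0), max(-shift_y, 0)
    # Add a batch dimension so that the last two dimensions are padded.
    padded = F.pad(image.unsqueeze(0), (left, right, top, bottom), mode="reflect").squeeze(0)
    # Crop a window of the original size, offset against the shift direction.
    y0, x0 = max(-shift_y, 0), max(-shift_x, 0)
    return padded[:, y0:y0 + height, x0:x0 + width]


image = torch.arange(12, dtype=torch.float32).reshape(1, 3, 4)
shifted = shift_with_reflect_pad(image, shift_x=1, shift_y=0)
print(shifted[0, 0])  # first row after moving the content one pixel to the right
```

Padding only the exposed side keeps the intermediate tensor small, and the output always has the same shape as the input.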
- class galaxy.data.ShiftAndMirrorPadTransform(max_shift_x: int = 20, max_shift_y: int = 20)
Applies random shifts to coordinates for data augmentation.
- max_shift_x = 20
- max_shift_y = 20
- shift_x = 0
- shift_y = 0
- apply_shift(ra_deg: float, dec_deg: float) tuple[float, float]
Applies the generated shift to given coordinates.
- Args:
ra_deg (float): Right ascension in degrees.
dec_deg (float): Declination in degrees.
- Returns:
tuple[float, float]: Shifted coordinates (RA, Dec).
- __call__(image)
- class galaxy.data.ClusterDataset(images_dir_path: str, description_csv_path: str, transform=None)
Bases:
torch.utils.data.Dataset
Custom PyTorch Dataset for cluster data.
- images_dir_path
- description_df
- transform = None
- __len__() int
Returns the total number of samples.
- __getitem__(idx: int) dict
Gets a single sample by index.
- Args:
idx (int): Index of the sample.
- Returns:
dict: Dictionary containing image, label, and metadata.
- static _read_img(fits_path: pathlib.Path) torch.Tensor
Reads a FITS image from the given path.
- Args:
fits_path (Path): Path to the FITS file.
- Returns:
torch.Tensor: Tensor representation of the image.
- show_img(idx: int)
- galaxy.data.download_data() None
Downloads required data files.
- galaxy.data.get_positive_class() pandas.DataFrame
Combines multiple datasets to create the positive class.
- Returns:
pd.DataFrame: Combined positive class dataset.
- galaxy.data.get_cluster_catalog() astropy.coordinates.SkyCoord
Creates a SkyCoord catalog of known clusters.
- Returns:
coord.SkyCoord: Catalog of clusters as a SkyCoord object.
- galaxy.data.get_negative_class() pandas.DataFrame
Combines datasets to create the negative class.
- Returns:
pd.DataFrame: Combined dataset of negative class.
- galaxy.data.get_non_cluster_catalog() astropy.coordinates.SkyCoord
Creates a SkyCoord catalog of non-cluster objects.
- Returns:
coord.SkyCoord: Catalog of non-cluster objects as a SkyCoord object.
- galaxy.data.get_stars() astropy.coordinates.SkyCoord
Creates a SkyCoord catalog of Gaia catalog objects.
- Returns:
coord.SkyCoord: Catalog of stars as a SkyCoord object.
- class galaxy.data.DatasetsInfo
Class to manage and cache datasets for training and evaluation.
- _clusters = None
- _expanded_clusters = None
- _stars = None
- _non_clusters = None
- _test_sample = None
- load_clusters() pandas.DataFrame
Loads the positive class dataset.
- Returns:
pd.DataFrame: Positive class dataset.
- load_test_sample() pandas.DataFrame
Loads the test sample dataset.
- Returns:
pd.DataFrame: Test sample dataset.
- load_stars() pandas.DataFrame
Loads the stars dataset for use in segmentation.
- Returns:
pd.DataFrame: Bright stars dataset.
- load_non_clusters() pandas.DataFrame
Loads the negative class dataset. Currently uses SGA and TYC2 catalogues.
- Returns:
pd.DataFrame: Negative class dataset.
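The `_clusters = None` style attributes together with the `load_*` methods suggest a load-once-then-cache pattern. The sketch below illustrates that pattern under this assumption only; `LazyCatalog` and `expensive_loader` are hypothetical names, and plain lists stand in for DataFrames.

```python
from typing import Callable, Optional


class LazyCatalog:
    """Load a dataset on first access and reuse the cached result afterwards."""

    def __init__(self, loader: Callable[[], list]) -> None:
        self._loader = loader
        self._data: Optional[list] = None  # nothing is loaded until first use

    def load(self) -> list:
        if self._data is None:  # first call: do the expensive work once
            self._data = self._loader()
        return self._data  # later calls return the cached object


calls = []


def expensive_loader() -> list:
    # Hypothetical stand-in for reading and combining catalogue files.
    calls.append("loaded")
    return ["cluster_a", "cluster_b"]


catalog = LazyCatalog(expensive_loader)
catalog.load()
catalog.load()
print(len(calls))  # the loader ran only once
```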
- galaxy.data.datasets_collection
Generates objects that are not present in any of the currently included datasets.
- galaxy.data.filter_candidates(candidates: astropy.coordinates.SkyCoord, max_len: int) astropy.coordinates.SkyCoord
Filters candidate objects based on angular distance and galactic latitude.
- Args:
candidates (coord.SkyCoord): SkyCoord object containing candidate objects.
max_len (int): Maximum number of filtered candidates to return.
- Returns:
coord.SkyCoord: Filtered SkyCoord object.
- galaxy.data.generate_random(n_sim: int = 10000, max_len=2000) pandas.DataFrame
Generates random sky coordinates for candidates.
- Args:
n_sim (int, optional): Number of random coordinates to simulate. Defaults to 10000.
max_len (int, optional): Maximum number of candidates to keep. Defaults to 2000.
- Returns:
pd.DataFrame: DataFrame containing the generated candidates.
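How the coordinates are drawn is not specified here. One common approach, shown as a sketch under that caveat, is to sample right ascension uniformly and declination through the arcsine of a uniform variable, which gives equal probability per unit area of sky and keeps declination within [-90, 90] degrees. The function name is hypothetical.

```python
import math
import random


def random_sky_coordinates(n: int, seed: int = 0) -> list[tuple[float, float]]:
    """Return n (ra_deg, dec_deg) pairs distributed uniformly over the sphere."""
    rng = random.Random(seed)
    coords = []
    for _ in range(n):
        ra_deg = rng.uniform(0.0, 360.0)
        # Sampling sin(Dec) uniformly avoids crowding points near the poles.
        dec_deg = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        coords.append((ra_deg, dec_deg))
    return coords


points = random_sky_coordinates(5)
```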
- galaxy.data.generate_random_based(dataset, required_num: int = 100) dict[galaxy.util.DataPart, pandas.DataFrame]
Generates random candidates based on existing datasets. Mainly needed to normalize data against random fields.
- Args:
required_num (int, optional): Number of candidates to generate. Defaults to 100.
- Returns:
dict[DataPart, pd.DataFrame]: Dictionary mapping data parts to DataFrames of generated candidates.
Technical problems? The following error was raised by Astropy's latitude validation (_validate_angles in /usr/local/lib/python3.10/dist-packages/astropy/coordinates/angles/core.py, line 649):
ValueError: Latitude angle(s) must be within -90 deg <= angle <= 90 deg, got -100.8107 deg <= angle <= 106.2399 deg
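The ValueError above shows declinations outside [-90, 90] degrees reaching Astropy. Its cause is not documented; if it comes from adding offsets to existing declinations (an assumption), one possible remedy is to reflect out-of-range values across the pole before building a SkyCoord. The helper below is a hypothetical sketch of that idea, not the project's fix.

```python
def normalize_ra_dec(ra_deg: float, dec_deg: float) -> tuple[float, float]:
    """Map an arbitrary (RA, Dec) pair onto valid sky coordinates."""
    # Fold declination into [-180, 180) first.
    dec = (dec_deg + 180.0) % 360.0 - 180.0
    ra = ra_deg
    if dec > 90.0:
        # Crossing the north pole: mirror Dec and flip to the opposite meridian.
        dec = 180.0 - dec
        ra += 180.0
    elif dec < -90.0:
        # Crossing the south pole: same idea in the other direction.
        dec = -180.0 - dec
        ra += 180.0
    return ra % 360.0, dec


fixed = normalize_ra_dec(10.0, 106.2399)
```

Reflecting across the pole preserves the point's physical position on the sphere, whereas simply clipping to 90 degrees would pile candidates up at the poles.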
- galaxy.data.train_val_test_split() dict[galaxy.util.DataPart, pandas.DataFrame]
Splits the dataset into training, validation, and test sets.
- Returns:
dict[DataPart, pd.DataFrame]: Dictionary mapping data parts to DataFrames.
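The shape of the returned mapping can be sketched as follows. The split fractions, the seed, and the `Part` enum (a stand-in for `galaxy.util.DataPart`, whose members are not listed here) are assumptions, and plain lists stand in for DataFrames.

```python
import random
from enum import Enum


class Part(Enum):  # hypothetical stand-in for galaxy.util.DataPart
    TRAIN = "train"
    VAL = "val"
    TEST = "test"


def split_rows(rows, val_frac: float = 0.1, test_frac: float = 0.1, seed: int = 0) -> dict:
    """Shuffle the rows once, then cut them into three disjoint parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_val = round(len(shuffled) * val_frac)
    n_test = round(len(shuffled) * test_frac)
    return {
        Part.VAL: shuffled[:n_val],
        Part.TEST: shuffled[n_val:n_val + n_test],
        Part.TRAIN: shuffled[n_val + n_test:],  # everything that is left
    }


parts = split_rows(range(100))
```

Shuffling with a fixed seed makes the split reproducible between runs.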
- galaxy.data.ddos(dataset, already_ddosed=False, parts=None) None
Generates cutouts for all data splits and saves them.
- galaxy.data.download_sample(dataset)
Downloads the dataset requested by the user. If the download by link fails, falls back to fetching the WISE data directly with ddos (requires a VPN).
- galaxy.data.create_dataloaders(dataset) tuple[dict[galaxy.util.DataPart, torch.utils.data.Dataset], dict[galaxy.util.DataPart, torch.utils.data.DataLoader]]
Creates datasets and dataloaders for each data part.
- Returns:
tuple[dict[DataPart, Dataset], dict[DataPart, DataLoader]]: Dictionaries of datasets and of the corresponding dataloaders, keyed by data part.
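A minimal sketch of building one dataloader per data part is given below. String keys stand in for `galaxy.util.DataPart`, `TensorDataset` stands in for `ClusterDataset`, and the batch size and the choice to shuffle only the training part are assumptions.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset


def build_loaders(datasets: dict, batch_size: int = 4) -> dict:
    """Wrap every dataset in a DataLoader, shuffling only the training part."""
    return {
        part: DataLoader(dataset, batch_size=batch_size, shuffle=(part == "train"))
        for part, dataset in datasets.items()
    }


# Toy data: ten single-channel 8x8 images with labels per part.
datasets = {
    part: TensorDataset(torch.zeros(10, 1, 8, 8), torch.zeros(10))
    for part in ("train", "val", "test")
}
dataloaders = build_loaders(datasets)
images, labels = next(iter(dataloaders["val"]))
```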