galaxy.data

Script to get dataloaders

Attributes

datasets_collection

Generates objects that are not present in any of the currently included datasets.

Classes

ShiftAndMirrorPadTransform

Applies random shifts to coordinates for data augmentation.

ClusterDataset

Custom PyTorch Dataset for cluster data.

DatasetsInfo

Class to manage and cache datasets for training and evaluation.

Functions

shift_and_mirror_pad(→ torch.Tensor)

Shifts the image by (shift_x, shift_y) and applies mirror-padding.

download_data(→ None)

Downloads required data files.

get_positive_class(→ pandas.DataFrame)

Combines multiple datasets to create the positive class.

get_cluster_catalog(→ astropy.coordinates.SkyCoord)

Creates a SkyCoord catalog of known clusters.

get_negative_class(→ pandas.DataFrame)

Combines datasets to create the negative class.

get_non_cluster_catalog(→ astropy.coordinates.SkyCoord)

Creates a SkyCoord catalog of non-cluster objects.

get_stars(→ astropy.coordinates.SkyCoord)

Creates a SkyCoord catalog of Gaia catalog objects.

filter_candidates(→ astropy.coordinates.SkyCoord)

Filters candidate objects based on angular distance and galactic latitude.

generate_random(→ pandas.DataFrame)

Generates random sky coordinates for candidates.

generate_random_based(→ dict[galaxy.util.DataPart, ...)

Generates random candidates based on existing datasets.

train_val_test_split(→ dict[galaxy.util.DataPart, ...)

Splits the dataset into training, validation, and test sets.

ddos(→ None)

Generates cutouts for all data splits and saves them.

download_sample(dataset)

Downloads the dataset requested by the user.

create_dataloaders(→ tuple[dict[galaxy.util.DataPart, ...)

Creates datasets and dataloaders for each data part.

Module Contents

galaxy.data.shift_and_mirror_pad(image: torch.Tensor, shift_x: int, shift_y: int) torch.Tensor

Shifts the image by (shift_x, shift_y) and applies mirror-padding.

Parameters:

image (Tensor): Input tensor of shape (C, H, W).
shift_x (int): Horizontal shift (positive: right, negative: left).
shift_y (int): Vertical shift (positive: down, negative: up).

Returns:

Tensor: Augmented image tensor of shape (C, H, W).
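A minimal pure-Python sketch of the shift-and-mirror-pad idea, assuming the image is given as nested lists of shape (C, H, W) instead of a torch.Tensor. Each output pixel is taken from (y - shift_y, x - shift_x), and indices that fall outside the image are mirrored back inside without repeating the edge pixel, as in reflect padding. It illustrates the technique and is not the package's implementation; the helper `_reflect_index` is hypothetical.

```python
from typing import List

# (C, H, W) image stored as nested lists; a stand-in for torch.Tensor.
Image = List[List[List[float]]]


def _reflect_index(i: int, n: int) -> int:
    """Hypothetical helper: fold index i into [0, n) by mirror reflection.

    The edge pixel is not repeated, so for n = 4 the indices -1, -2 map to 1, 2
    and the indices 4, 5 map to 2, 1.
    """
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i = abs(i) % period
    return period - i if i >= n else i


def shift_and_mirror_pad(image: Image, shift_x: int, shift_y: int) -> Image:
    """Shift every channel by (shift_x, shift_y) and fill exposed borders by mirroring.

    Positive shift_x moves content right, positive shift_y moves it down.
    """
    shifted: Image = []
    for channel in image:
        h, w = len(channel), len(channel[0])
        shifted.append([
            [channel[_reflect_index(y - shift_y, h)][_reflect_index(x - shift_x, w)]
             for x in range(w)]
            for y in range(h)
        ])
    return shifted
```

For example, shifting the single row [1, 2, 3, 4] one pixel to the right gives [2, 1, 2, 3]: the vacated left column is filled with the mirrored neighbour.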

class galaxy.data.ShiftAndMirrorPadTransform(max_shift_x: int = 20, max_shift_y: int = 20)

Applies random shifts to coordinates for data augmentation.

max_shift_x = 20
max_shift_y = 20
shift_x = 0
shift_y = 0
apply_shift(ra_deg: float, dec_deg: float) tuple[float, float]

Applies the generated shift to given coordinates.

Args:

ra_deg (float): Right ascension in degrees.
dec_deg (float): Declination in degrees.

Returns:

tuple[float, float]: Shifted coordinates (RA, Dec).

__call__(image)
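A sketch of how a transform like ShiftAndMirrorPadTransform could draw a random shift within ±max_shift_x and ±max_shift_y, apply it to an image, and translate the same shift to sky coordinates in apply_shift. The method name `generate_shift`, the `pixel_scale_deg` and `seed` parameters, and the sign convention (positive shifts increase RA and Dec) are assumptions for illustration. The RA offset is divided by cos(Dec), a small-angle approximation that breaks down near the poles.

```python
import math
import random
from typing import List, Optional, Tuple

Image = List[List[List[float]]]  # (C, H, W) as nested lists


def _mirror(i: int, n: int) -> int:
    """Fold index i into [0, n) by reflection without repeating the edge pixel."""
    if n == 1:
        return 0
    period = 2 * (n - 1)
    i = abs(i) % period
    return period - i if i >= n else i


class ShiftAndMirrorPadTransformSketch:
    """Hypothetical stand-in for ShiftAndMirrorPadTransform."""

    def __init__(self, max_shift_x: int = 20, max_shift_y: int = 20,
                 pixel_scale_deg: float = 1.0 / 3600, seed: Optional[int] = None) -> None:
        self.max_shift_x = max_shift_x
        self.max_shift_y = max_shift_y
        self.shift_x = 0
        self.shift_y = 0
        self.pixel_scale_deg = pixel_scale_deg  # assumed degrees per pixel (hypothetical)
        self._rng = random.Random(seed)

    def generate_shift(self) -> Tuple[int, int]:
        # Draw integer shifts uniformly from [-max, max] (inclusive) on each axis.
        self.shift_x = self._rng.randint(-self.max_shift_x, self.max_shift_x)
        self.shift_y = self._rng.randint(-self.max_shift_y, self.max_shift_y)
        return self.shift_x, self.shift_y

    def apply_shift(self, ra_deg: float, dec_deg: float) -> Tuple[float, float]:
        # One pixel along RA spans cos(Dec) times less sky than along Dec,
        # so the RA offset in degrees is scaled up by 1 / cos(Dec).
        d_ra = self.shift_x * self.pixel_scale_deg / math.cos(math.radians(dec_deg))
        d_dec = self.shift_y * self.pixel_scale_deg
        return (ra_deg + d_ra) % 360.0, dec_deg + d_dec

    def __call__(self, image: Image) -> Image:
        # Draw a fresh shift, then move every channel and mirror the exposed border.
        self.generate_shift()
        return [
            [[ch[_mirror(y - self.shift_y, len(ch))][_mirror(x - self.shift_x, len(ch[0]))]
              for x in range(len(ch[0]))]
             for y in range(len(ch))]
            for ch in image
        ]
```

Storing the last shift on the instance lets apply_shift move the catalogue coordinates consistently with the image that was just augmented.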
class galaxy.data.ClusterDataset(images_dir_path: str, description_csv_path: str, transform=None)

Bases: torch.utils.data.Dataset

Custom PyTorch Dataset for cluster data.

images_dir_path
description_df
transform = None
__len__() int

Returns the total number of samples.

__getitem__(idx: int) dict

Gets a single sample by index.

Args:

idx (int): Index of the sample.

Returns:

dict: Dictionary containing image, label, and metadata.

static _read_img(fits_path: pathlib.Path) torch.Tensor

Reads a FITS image from the given path.

Args:

fits_path (Path): Path to the FITS file.

Returns:

torch.Tensor: Tensor representation of the image.

show_img(idx: int)
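The sample dictionary returned by __getitem__ can be sketched with the standard library alone. This stand-in reads the description CSV with csv.DictReader instead of pandas and takes the image reader as a parameter so the example runs without astropy; in practice a FITS file could be read with astropy.io.fits.getdata. The column names file_name and label and the dictionary keys are assumptions, not the real schema.

```python
import csv
from pathlib import Path
from typing import Any, Callable, Dict, Optional


class ClusterDatasetSketch:
    """Hypothetical, dependency-free stand-in for ClusterDataset."""

    def __init__(self, images_dir_path: str, description_csv_path: str,
                 transform: Optional[Callable[[Any], Any]] = None,
                 read_img: Optional[Callable[[Path], Any]] = None) -> None:
        self.images_dir_path = Path(images_dir_path)
        # One dict per CSV row; plays the role of description_df.
        with open(description_csv_path, newline="") as f:
            self.description_rows = list(csv.DictReader(f))
        self.transform = transform
        # Placeholder reader: returns raw bytes instead of decoding FITS.
        self.read_img = read_img or (lambda path: path.read_bytes())

    def __len__(self) -> int:
        return len(self.description_rows)

    def __getitem__(self, idx: int) -> Dict[str, Any]:
        row = self.description_rows[idx]
        image = self.read_img(self.images_dir_path / row["file_name"])
        if self.transform is not None:
            image = self.transform(image)
        # Image, label, and the full metadata row, mirroring the documented return value.
        return {"image": image, "label": int(row["label"]), "description": row}
```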
galaxy.data.download_data() None

Downloads required data files.

galaxy.data.get_positive_class() pandas.DataFrame

Combines multiple datasets to create the positive class.

Returns:

pd.DataFrame: Combined positive class dataset.

galaxy.data.get_cluster_catalog() astropy.coordinates.SkyCoord

Creates a SkyCoord catalog of known clusters.

Returns:

coord.SkyCoord: Catalog of clusters as a SkyCoord object.

galaxy.data.get_negative_class() pandas.DataFrame

Combines datasets to create the negative class.

Returns:

pd.DataFrame: Combined dataset of negative class.

galaxy.data.get_non_cluster_catalog() astropy.coordinates.SkyCoord

Creates a SkyCoord catalog of non-cluster objects.

Returns:

coord.SkyCoord: Catalog of non-cluster objects as a SkyCoord object.

galaxy.data.get_stars() astropy.coordinates.SkyCoord

Creates a SkyCoord catalog of Gaia catalog objects.

Returns:

coord.SkyCoord: Catalog of stars as a SkyCoord object.

class galaxy.data.DatasetsInfo

Class to manage and cache datasets for training and evaluation.

_clusters = None
_expanded_clusters = None
_stars = None
_non_clusters = None
_test_sample = None
load_clusters() pandas.DataFrame

Loads the positive class dataset.

Returns:

pd.DataFrame: Positive class dataset.

load_test_sample() pandas.DataFrame

Loads the test sample dataset.

Returns:

pd.DataFrame: Test sample dataset.

load_stars() pandas.DataFrame

Loads the stars dataset for use in segmentation.

Returns:

pd.DataFrame: Bright stars dataset.

load_non_clusters() pandas.DataFrame

Loads the negative class dataset. Currently uses SGA and TYC2 catalogues.

Returns:

pd.DataFrame: Negative class dataset.
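DatasetsInfo keeps each table in a private attribute initialised to None and, presumably, fills it on the first load_* call. The sketch below shows that lazy-load-and-cache pattern for a single table. The builder callable is hypothetical, and the cache here lives on the instance; whether the real class caches per instance or on the class itself is not stated on this page.

```python
from typing import Callable, List, Optional


class DatasetsInfoSketch:
    """Hypothetical stand-in showing the lazy-load-and-cache pattern."""

    _clusters: Optional[List[dict]] = None  # nothing loaded yet

    def __init__(self, build_clusters: Callable[[], List[dict]]) -> None:
        self._build_clusters = build_clusters

    def load_clusters(self) -> List[dict]:
        # Build the table on first access; assigning via self caches it on this instance,
        # so later calls return the same object without rebuilding.
        if self._clusters is None:
            self._clusters = self._build_clusters()
        return self._clusters
```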

galaxy.data.datasets_collection

Generates objects that are not present in any of the currently included datasets.

galaxy.data.filter_candidates(candidates: astropy.coordinates.SkyCoord, max_len: int) astropy.coordinates.SkyCoord

Filters candidate objects based on angular distance and galactic latitude.

Args:

candidates (coord.SkyCoord): SkyCoord object containing candidate objects.
max_len (int): Maximum number of filtered candidates to return.

Returns:

coord.SkyCoord: Filtered SkyCoord object.
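A dependency-free sketch of this kind of filtering, with candidates given as (RA, Dec) pairs in degrees instead of a SkyCoord. A candidate is dropped if it lies too close to the Galactic plane or within a minimum angular distance of a known object. The `known` argument and the thresholds `min_sep_deg` and `min_abs_b_deg` are hypothetical; with astropy the same quantities are available as SkyCoord.separation and SkyCoord.galactic.b. Galactic latitude is computed from the J2000 position of the north Galactic pole (RA 192.85948°, Dec 27.12825°).

```python
import math
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]  # (RA, Dec) in degrees

# J2000 equatorial coordinates of the north Galactic pole.
RA_NGP_DEG = 192.85948
DEC_NGP_DEG = 27.12825


def galactic_latitude_deg(ra_deg: float, dec_deg: float) -> float:
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    ra0, dec0 = math.radians(RA_NGP_DEG), math.radians(DEC_NGP_DEG)
    sin_b = (math.sin(dec) * math.sin(dec0)
             + math.cos(dec) * math.cos(dec0) * math.cos(ra - ra0))
    # Clamp against rounding just outside [-1, 1] before asin.
    return math.degrees(math.asin(max(-1.0, min(1.0, sin_b))))


def angular_separation_deg(a: Coord, b: Coord) -> float:
    # Haversine formula, numerically stable for small separations.
    ra1, dec1, ra2, dec2 = map(math.radians, (*a, *b))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, h))))


def filter_candidates_sketch(candidates: Sequence[Coord], known: Sequence[Coord],
                             max_len: int, min_sep_deg: float = 0.1,
                             min_abs_b_deg: float = 20.0) -> List[Coord]:
    kept: List[Coord] = []
    for cand in candidates:
        if len(kept) >= max_len:
            break  # enough candidates collected
        if abs(galactic_latitude_deg(*cand)) < min_abs_b_deg:
            continue  # too close to the Galactic plane
        if any(angular_separation_deg(cand, obj) < min_sep_deg for obj in known):
            continue  # overlaps an object that is already catalogued
        kept.append(cand)
    return kept
```

The pairwise loop is O(N·M); for large catalogues a spatial index (for example astropy's match_coordinates_sky) would be the usual choice.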

galaxy.data.generate_random(n_sim: int = 10000, max_len=2000) pandas.DataFrame

Generates random sky coordinates for candidates.

Args:

n_sim (int, optional): Number of random sky positions to generate. Defaults to 10000.
max_len (int, optional): Maximum number of candidates to return. Defaults to 2000.

Returns:

pd.DataFrame: DataFrame of generated candidates.
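Drawing random sky positions uniformly over the sphere requires sampling sin(Dec), not Dec, uniformly; sampling Dec directly over-represents the polar regions per unit area. The sketch below returns (RA, Dec) tuples instead of a DataFrame, and its `seed` parameter is an assumption; it is not the package's implementation. Because Dec is obtained through arcsin, it always stays within [-90°, 90°].

```python
import math
import random
from typing import List, Optional, Tuple


def generate_random_sketch(n_sim: int = 10000,
                           seed: Optional[int] = None) -> List[Tuple[float, float]]:
    """Return n_sim (RA, Dec) pairs in degrees, uniform on the celestial sphere."""
    rng = random.Random(seed)
    points = []
    for _ in range(n_sim):
        ra = rng.uniform(0.0, 360.0)
        # Uniform sin(Dec) gives equal probability per unit solid angle.
        dec = math.degrees(math.asin(rng.uniform(-1.0, 1.0)))
        points.append((ra, dec))
    return points
```

The generated positions could then be passed through filter_candidates, presumably with max_len capping the number kept.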

galaxy.data.generate_random_based(dataset, required_num: int = 100) dict[galaxy.util.DataPart, pandas.DataFrame]

Generates random candidates based on existing datasets. This is mainly needed to normalize the data using random fields.

Args:

required_num (int, optional): Number of candidates to generate. Defaults to 100.

Returns:

dict[DataPart, pd.DataFrame]: Generated candidates for each data part.

Known issue: this function has been observed to fail with an error raised by astropy (astropy/coordinates/angles/core.py, in _validate_angles) when generated latitudes fall outside the valid range:

ValueError: Latitude angle(s) must be within -90 deg <= angle <= 90 deg, got -100.8107 deg <= angle <= 106.2399 deg
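The error above means some latitudes ended up beyond a pole. Assuming the candidates are produced by adding random offsets to existing coordinates (this reference does not say how generate_random_based works), one fix is to fold such points back onto the sphere before building a SkyCoord: a path that crosses a pole continues down the opposite meridian, so Dec is reflected and RA shifts by 180°. The function name fold_over_pole is hypothetical.

```python
from typing import Tuple


def fold_over_pole(ra_deg: float, dec_deg: float) -> Tuple[float, float]:
    """Map a position whose Dec may exceed ±90° back to a valid (RA, Dec)."""
    # Bring Dec into [-180, 180) first; a full 360° trip returns to the start.
    dec = (dec_deg + 180.0) % 360.0 - 180.0
    ra = ra_deg
    if dec > 90.0:
        dec = 180.0 - dec  # passed over the north pole
        ra += 180.0
    elif dec < -90.0:
        dec = -180.0 - dec  # passed over the south pole
        ra += 180.0
    return ra % 360.0, dec
```

With this mapping, the out-of-range values from the traceback, -100.8107° and 106.2399°, become -79.1893° and 73.7601° on the opposite meridian.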

galaxy.data.train_val_test_split() dict[galaxy.util.DataPart, pandas.DataFrame]

Splits the dataset into training, validation, and test sets.

Returns:

dict[DataPart, pd.DataFrame]: Dictionary mapping data parts to DataFrames.
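A minimal sketch of a seeded random split into the three parts. DataPart here is a hypothetical stand-in for galaxy.util.DataPart (its real member names are not shown on this page), rows are plain list items rather than DataFrame rows, and the 70/15/15 proportions and seed are illustrative defaults, not the package's settings.

```python
import random
from enum import Enum
from typing import Dict, List, Sequence, TypeVar

T = TypeVar("T")


class DataPart(str, Enum):  # hypothetical stand-in for galaxy.util.DataPart
    TRAIN = "train"
    VALIDATE = "validate"
    TEST = "test"


def train_val_test_split_sketch(rows: Sequence[T], val_frac: float = 0.15,
                                test_frac: float = 0.15,
                                seed: int = 42) -> Dict[DataPart, List[T]]:
    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_frac)
    n_val = int(len(shuffled) * val_frac)
    # Disjoint slices: test first, then validation, the remainder is training.
    return {
        DataPart.TEST: shuffled[:n_test],
        DataPart.VALIDATE: shuffled[n_test:n_test + n_val],
        DataPart.TRAIN: shuffled[n_test + n_val:],
    }
```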

galaxy.data.ddos(dataset, already_ddosed=False, parts=None) None

Generates cutouts for all data splits and saves them.

galaxy.data.download_sample(dataset)

Downloads the dataset requested by the user. If downloading it via the link fails, the script falls back to fetching ("ddosing") the WISE data directly, which requires a VPN.
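The fallback described above is a try-then-fallback pattern. In this sketch, download_by_link and ddos_wise are hypothetical callables standing in for the real download steps, and the string return values exist only to make the chosen path observable. Only OSError is caught, which covers network errors such as urllib.error.URLError.

```python
from typing import Callable


def download_sample_sketch(dataset: str,
                           download_by_link: Callable[[str], None],
                           ddos_wise: Callable[[str], None]) -> str:
    """Try the direct link first; on a network or file error, fetch WISE cutouts instead."""
    try:
        download_by_link(dataset)
        return "link"
    except OSError as exc:
        print(f"Download by link failed ({exc}); fetching WISE cutouts instead (VPN required).")
        ddos_wise(dataset)
        return "wise"
```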

galaxy.data.create_dataloaders(dataset) tuple[dict[galaxy.util.DataPart, torch.utils.data.Dataset], dict[galaxy.util.DataPart, torch.utils.data.DataLoader]]

Creates datasets and dataloaders for each data part.

Returns:

tuple[dict[DataPart, Dataset], dict[DataPart, DataLoader]]: Dictionaries of datasets and the corresponding dataloaders, keyed by data part.