galaxy.normalize

Classes

DataChannel

str(object='') -> str

Functions

get_channel_file(file_path, output_folder, ...)

separate_channels(dataset)

Separates the data by frequency so that the mean and std of each frequency channel can later be computed for normalization.

normalize_asym(i_data, p, n_bins, outlier_thr)

Normalize data with an asymmetrical distribution.

load_existing_normalization_values()

store_normalization_values(existing_data)

normalize(dataset)

TODO: look at data.py

Module Contents

class galaxy.normalize.DataChannel

Bases: str, enum.Enum

str(object='') -> str
str(bytes_or_buffer[, encoding[, errors]]) -> str

Create a new string object from the given object. If encoding or errors is specified, then the object must expose a data buffer that will be decoded using the given encoding and error handler. Otherwise, returns the result of object.__str__() (if defined) or repr(object). encoding defaults to 'utf-8'. errors defaults to 'strict'.

WISE_W1 = 'W1'
WISE_W2 = 'W2'
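
Because DataChannel mixes str into enum.Enum, its members double as plain channel-name strings. A minimal re-statement of the enum with an illustrative usage (the assert and print lines are examples, not part of the module):

```python
from enum import Enum

class DataChannel(str, Enum):
    WISE_W1 = "W1"
    WISE_W2 = "W2"

# The str mixin makes members compare equal to their plain string values,
# so they can be used directly wherever a channel-name string is expected.
assert DataChannel.WISE_W1 == "W1"
print([channel.value for channel in DataChannel])  # ['W1', 'W2']
```
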
galaxy.normalize.get_channel_file(file_path, output_folder, channel_idx, fits_idx)
galaxy.normalize.separate_channels(dataset)

Separates the data by frequency so that the mean and std of each frequency channel can later be computed for normalization.
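
A minimal sketch of what such a separation could look like, assuming each dataset element is a cutout whose first axis indexes the frequency channels; the data layout and the returned dict of per-channel statistics are assumptions, not the module's actual implementation:

```python
import numpy as np

def separate_channels_sketch(dataset):
    """Group pixel values by frequency channel so a per-channel mean/std
    can be computed for normalization (assumed data layout)."""
    per_channel = {}
    for cutout in dataset:
        for channel_idx, channel_data in enumerate(cutout):
            per_channel.setdefault(channel_idx, []).append(
                np.asarray(channel_data, dtype=float).ravel())
    # Per-channel statistics used later for normalization.
    stats = {}
    for channel_idx, chunks in per_channel.items():
        values = np.concatenate(chunks)
        stats[channel_idx] = (values.mean(), values.std())
    return stats
```
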

galaxy.normalize.normalize_asym(i_data: numpy.ndarray, p: Tuple[float] = (10**-3, 0.95), n_bins: int = 500, outlier_thr: float = 10**4) → numpy.ndarray

Normalize data with an asymmetrical distribution.

(By fitting a Gaussian curve to the left wing of the distribution.)

Parameters:
  • i_data (np.ndarray) – Data with asymmetrical distribution.

  • p (Tuple[float]) – Probability range for the quantiles.

  • n_bins (int) – Number of bins for histogram.

  • outlier_thr (float) – Threshold for finding the outliers.

Return type:

np.ndarray
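
A minimal sketch of this kind of asymmetric normalization, assuming p clips the data to a quantile range, outlier_thr removes extreme values before histogramming, and the fitted Gaussian's mean and width are used for the final scaling; these details and the (x - mu) / sigma step are assumptions, not the module's actual code:

```python
from typing import Tuple

import numpy as np
from scipy.optimize import curve_fit

def normalize_asym_sketch(i_data: np.ndarray,
                          p: Tuple[float, float] = (10**-3, 0.95),
                          n_bins: int = 500,
                          outlier_thr: float = 10**4) -> np.ndarray:
    # Drop non-finite values and extreme outliers before histogramming.
    data = i_data[np.isfinite(i_data)]
    data = data[np.abs(data) < outlier_thr]

    # Clip to the quantile range given by p.
    lo, hi = np.quantile(data, p)
    clipped = data[(data >= lo) & (data <= hi)]

    # Build a histogram and locate its mode.
    counts, edges = np.histogram(clipped, bins=n_bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    peak = int(np.argmax(counts))

    # Fit a Gaussian to the left wing only (bins up to the mode);
    # assumes the mode does not sit in the very first bins.
    def gauss(x, a, mu, sigma):
        return a * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

    x_left, y_left = centers[:peak + 1], counts[:peak + 1]
    p0 = (counts[peak], centers[peak], clipped.std())
    (_, mu, sigma), _ = curve_fit(gauss, x_left, y_left, p0=p0)

    # Scale the full input with the fitted background level and width.
    return (i_data - mu) / sigma
```
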

galaxy.normalize.load_existing_normalization_values()
galaxy.normalize.store_normalization_values(existing_data)
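
Neither of the two functions above is documented here. Purely as an illustration of a possible load/store round-trip for per-channel normalization values, a hypothetical JSON-based sketch (file name, schema, and return types are assumptions):

```python
import json
from pathlib import Path

# Hypothetical storage location; the real module may use a different file or format.
NORMALIZATION_FILE = Path("normalization_values.json")

def load_existing_normalization_values() -> dict:
    """Return previously stored per-channel mean/std, or an empty dict."""
    if NORMALIZATION_FILE.exists():
        return json.loads(NORMALIZATION_FILE.read_text())
    return {}

def store_normalization_values(existing_data: dict) -> None:
    """Persist the per-channel mean/std so later runs can reuse them."""
    NORMALIZATION_FILE.write_text(json.dumps(existing_data, indent=2))
```
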
galaxy.normalize.normalize(dataset)

TODO: look at data.py. def ddos() grabs cutouts in the following way: the i-th element of the description table (which is the table of train, val, or test elements) is put into the corresponding folder, e.g. train, as train/i.fits.

IDEA: create RANDOM_BASED = "rand_based" in DataSource; all random-based elements belong to this enum member (consider def generate_random_sample). After ddos() has run, everything is ready in create_dataloaders: go to the descriptions/part folder, find the elements whose source is random_based, send these pictures to normalization, calculate the mean and std, and return them to create_dataloaders.
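
A hypothetical sketch of the idea above: a DataSource enum with a RANDOM_BASED member and a helper that filters a description table for random-based elements and computes the mean and std to hand back to create_dataloaders (the table schema and helper names are assumptions):

```python
from enum import Enum

import numpy as np

class DataSource(str, Enum):
    # Proposed member from the idea above; other members omitted here.
    RANDOM_BASED = "rand_based"

def normalization_stats_for_random_based(description, load_cutout):
    """description: iterable of dicts with 'source' and 'path' keys (assumed schema).
    load_cutout: callable returning the pixel array of a FITS cutout."""
    pixels = [np.asarray(load_cutout(row["path"]), dtype=float).ravel()
              for row in description
              if row["source"] == DataSource.RANDOM_BASED]
    values = np.concatenate(pixels)
    # Mean and std to hand back to create_dataloaders for normalization.
    return values.mean(), values.std()
```
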