inference.scale_utils
def normalize_list(numbers)
Normalizes a list of numbers to the range [0, 1].
Args:
- numbers (list of numeric): List of numbers to be normalized.
Returns:
- list of float: Normalized list of numbers.
Formula:
- $\hat{x}_i = \frac{x_i - x_{min}}{x_{max} - x_{min}}$
- where $x_{min}$ is the smallest value of all $x_i$
- and $x_{max}$ is the largest value of all $x_i$, for all valid $i$.
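A worked example follows; the output values are implied by the min-max formula above, and the plain-list return type matches the Returns section.
Example:
>>> # min = 2, max = 10, so each value maps to (x - 2) / 8
>>> normalize_list([2, 4, 6, 10])
[0.0, 0.25, 0.5, 1.0]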
def estimate_optimal_ratios_from_models(model_configs, train_seq_len, x_train, y_train, max_epochs, batch_size)
Estimates the optimal ratio of model size to number of training tokens from FLOP counts.
Args:
- model_configs (list): List of tuples representing model configurations. Each tuple contains parameters for building the model.
- train_seq_len (list of int): Numbers of training sequences to use; each model configuration is trained once per value.
- x_train (numpy array): Input data for training.
- y_train (numpy array): Target data for training.
- max_epochs (int): Maximum number of epochs for training.
- batch_size (int): Batch size for training.
Returns:
- flops (numpy array): Array of FLOP counts for each experiment.
- loss_history (numpy array): Array of loss histories for each experiment.
- model_params (numpy array): Array of total model parameters for each experiment.
Example:
>>> # Define the model configs to be tested
>>> vocab_size = 454+1
>>> input_len = 256
>>> model_configs = [
...     (input_len, vocab_size, 100, 1, 0, 10, 10, 1),
...     (input_len, vocab_size, 200, 2, 0, 10, 20, 1),
...     (input_len, vocab_size, 300, 1, 0, 30, 10, 1),
...     (input_len, vocab_size, 400, 1, 0, 10, 40, 1),
... ]
>>> # Start testing the models by training them
>>> model_epochs = estimate_optimal_ratios_from_models(model_configs, [1000, 2500, 5000], X[:5000], Y[:5000], 30, 128)
>>> # Preprocessing numbers for plotting
>>> flops = model_epochs[0]
>>> loss_curve = model_epochs[1]
>>> params = model_epochs[2]
>>> flops_c = normalize_list(flops)
>>> # Plotting
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure(figsize=(10, 10), dpi=200)
>>> for i in range(12):
...     plt.plot(loss_curve[i], c=[flops_c[i], 0, 0],
...              label=f'Floating-point Operations/forward inference: {flops[i]}')
>>> plt.legend()
>>> plt.xlabel('Gradient Update Number')
>>> plt.ylabel('Sparse Categorical Crossentropy Loss (from logits)')
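To read a compute-optimal configuration off these results, one way is to compare the final loss of each run; the following lines are an illustrative sketch continuing the example above (selecting the minimum-loss run is an assumption, not part of the library):
>>> # Illustrative post-processing: pick the run with the lowest final loss
>>> import numpy as np
>>> final_loss = np.array([h[-1] for h in loss_curve])
>>> best = int(np.argmin(final_loss))
>>> print(f'Best run: {params[best]} parameters, {flops[best]} FLOPs/forward pass, '
...       f'final loss {final_loss[best]:.4f}')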
def estimate_optimal_ratios_from_flops(flop_list, input_len, num_heads, head_dims, num_decoders, fc_dim_factor, vocab_size, dropout_rate, x_train, y_train, trials_per_flop=2, batch_size=32)
Estimates optimal ratios of various model parameters based on FLOP count.
Args:
- flop_list (list): List of FLOP counts to estimate optimal ratios for.
- input_len (int): Length of the input sequence.
- num_heads (tuple): Tuple containing the minimum and maximum values for the number of attention heads.
- head_dims (tuple): Tuple containing the minimum and maximum values for the dimensionality of attention heads.
- num_decoders (int): Number of decoder layers.
- fc_dim_factor (int): Factor to determine the dimensionality of fully connected layers.
- vocab_size (int): Size of the vocabulary.
- dropout_rate (float): Dropout rate.
- x_train (numpy.ndarray): Training input data.
- y_train (numpy.ndarray): Training target data.
- trials_per_flop (int, optional): Number of trials per FLOP count. Defaults to 2.
- batch_size (int, optional): Batch size for training. Defaults to 32.
Warning:
- estimate_optimal_ratios_from_flops is currently experimental and has not been thoroughly tested; it may contain bugs.
Returns:
- tuple: Tuple containing loss history, FLOP history, and number of parameters for each trial.
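Example (illustrative only, given the warning above; the FLOP budgets and hyperparameter ranges below are assumptions, and X/Y are the training arrays from the earlier example):
>>> # Hypothetical call; values chosen for illustration, not as recommendations
>>> flop_list = [1e9, 5e9, 1e10]
>>> loss_hist, flop_hist, n_params = estimate_optimal_ratios_from_flops(
...     flop_list, input_len=256, num_heads=(2, 8), head_dims=(16, 64),
...     num_decoders=2, fc_dim_factor=4, vocab_size=454 + 1, dropout_rate=0.1,
...     x_train=X[:5000], y_train=Y[:5000], trials_per_flop=2, batch_size=32)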