inference.scale_utils

def normalize_list(numbers):

Normalizes a list of numbers to the range [0, 1].

Args:

  • numbers (list of numeric): List of numbers to be normalized.

Returns:

  • list of float: Normalized list of numbers.

Formula:

  • $x_i' = \frac{x_i - x_{min}}{x_{max} - x_{min}}$
  • where $x_{min}$ is the smallest and $x_{max}$ the largest value of $x_i$ over all valid $i$.
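
Example:

Applying the formula above to a small list (illustrative values; the output follows directly from the definition):

>>> normalize_list([2, 4, 6])
[0.0, 0.5, 1.0]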

def estimate_optimal_ratios_from_models(model_configs, train_seq_len, x_train, y_train, max_epochs, batch_size):

Estimates the optimal ratio of model size to number of training tokens from FLOP counts.

Args:

  • model_configs (list): List of tuples representing model configurations. Each tuple contains parameters for building the model.
  • train_seq_len (list): List of integers representing different numbers of training sequences.
  • x_train (numpy array): Input data for training.
  • y_train (numpy array): Target data for training.
  • max_epochs (int): Maximum number of epochs for training.
  • batch_size (int): Batch size for training.

Returns:

  • flops (numpy array): Array of FLOP counts for each experiment.
  • loss_history (numpy array): Array of loss histories for each experiment.
  • model_params (numpy array): Array of total model parameters for each experiment.

Example:

>>> # Define the model configs to be tested
>>> vocab_size = 454 + 1
>>> input_len = 256
>>> model_configs = [
...     (input_len, vocab_size, 100, 1, 0, 10, 10, 1),
...     (input_len, vocab_size, 200, 2, 0, 10, 20, 1),
...     (input_len, vocab_size, 300, 1, 0, 30, 10, 1),
...     (input_len, vocab_size, 400, 1, 0, 10, 40, 1),
... ]
>>> # Start testing the models by training them
>>> model_epochs = estimate_optimal_ratios_from_models(model_configs, [1000, 2500, 5000], X[:5000], Y[:5000], 30, 128)
>>> # Preprocess numbers for plotting
>>> flops = model_epochs[0]
>>> loss_curve = model_epochs[1]
>>> params = model_epochs[2]
>>> flops_c = normalize_list(flops)
>>> # Plotting
>>> import matplotlib.pyplot as plt
>>> fig = plt.figure(figsize=(10, 10), dpi=200)
>>> for i in range(12):
...     plt.plot(loss_curve[i], c=[flops_c[i], 0, 0], label=f'Floating-point Operations/forward inference: {flops[i]}')
>>> plt.legend()
>>> plt.xlabel('Gradient Update Number')
>>> plt.ylabel('Sparse Crossentropy Loss (with Logits)')
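
To read an optimal-ratio trend out of these results, one option is to compare final losses across experiments, sorted by compute. A minimal sketch, assuming the three returned arrays are index-aligned per experiment:

>>> # Sketch: list each experiment's compute, size, and final loss,
>>> # ordered by FLOP count (assumes index-aligned return arrays).
>>> import numpy as np
>>> final_losses = np.array([curve[-1] for curve in loss_curve])
>>> for i in np.argsort(flops):
...     print(f'FLOPs={flops[i]:.3g}  params={params[i]}  final loss={final_losses[i]:.4f}')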

def estimate_optimal_ratios_from_flops(flop_list, input_len, num_heads, head_dims, num_decoders, fc_dim_factor, vocab_size, dropout_rate, x_train, y_train, trials_per_flop=2, batch_size=32):

Estimates optimal ratios of various model parameters based on FLOP count.

Args:

  • flop_list (list): List of FLOP counts to estimate optimal ratios for.
  • input_len (int): Length of the input sequence.
  • num_heads (tuple): Tuple containing the minimum and maximum values for the number of attention heads.
  • head_dims (tuple): Tuple containing the minimum and maximum values for the dimensionality of attention heads.
  • num_decoders (int): Number of decoder layers.
  • fc_dim_factor (int): Factor to determine the dimensionality of fully connected layers.
  • vocab_size (int): Size of the vocabulary.
  • dropout_rate (float): Dropout rate.
  • x_train (numpy.ndarray): Training input data.
  • y_train (numpy.ndarray): Training target data.
  • trials_per_flop (int, optional): Number of trials per FLOP count. Defaults to 2.
  • batch_size (int, optional): Batch size for training. Defaults to 32.

Returns:

  • tuple: Tuple containing loss history, FLOP history, and number of parameters for each trial.
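
Example:

A hypothetical call, shown only to illustrate the argument shapes documented above; the FLOP budgets, (min, max) ranges, and X/Y slices are placeholders, and the 3-way unpacking assumes the returned tuple matches the Returns entry:

>>> results = estimate_optimal_ratios_from_flops(
...     [1e12, 1e13, 1e14],   # FLOP budgets to sweep (placeholder values)
...     input_len=256,
...     num_heads=(2, 8),     # (min, max) attention heads to sample from
...     head_dims=(16, 64),   # (min, max) head dimensionality
...     num_decoders=4,
...     fc_dim_factor=4,
...     vocab_size=455,
...     dropout_rate=0.1,
...     x_train=X[:5000],
...     y_train=Y[:5000],
...     trials_per_flop=2,
...     batch_size=32)
>>> loss_history, flop_history, n_params = results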