Batch size selection for neural networks

In my previous blog I looked at various learning rates and their effect on a neural network. Another important hyperparameter that can be tuned for a neural network is the batch size. In this blog I’ll dive into what batch size is and how you can select it.

What is batch size?

So, what exactly is batch size? The number of epochs specifies how many times all of the training samples make a forward and backward pass through the neural network. If the training set is very large, it is not feasible to train the network on all training samples at once. Most training algorithms therefore work with mini-batches, which contain a subset of the training samples, and train the network on one subset at a time. The batch size specifies the number of training samples (in the mini-batch) that make one forward and backward pass through the neural network. For example, if we have 10,000 training samples and we use a batch size of 200, the network needs 50 iterations to train on all the data once.
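As a quick sanity check, the number of iterations per epoch is simply the number of training samples divided by the batch size, rounded up when it does not divide evenly. A minimal sketch (the function name is mine, just for illustration):

import math

def iterations_per_epoch(num_samples, batch_size):
    # One iteration = one forward and backward pass on a single
    # mini-batch; the last batch may be smaller, hence the ceiling.
    return math.ceil(num_samples / batch_size)

print(iterations_per_epoch(10000, 200))   # 50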

Small or large batch size?

How should I select the batch size to use? Should I choose a small or large batch size? This depends on a number of things such as:

  • How much memory is available?
  • Are the samples small or large? For example, do we have pictures of 28 * 28 pixels or 1024 * 768 pixels?
  • The architecture of the neural network

You can use the following guidelines:

  • A larger batch size requires more memory, and depending on the GPU card it could run out of memory. A smaller batch size obviously requires less memory (see the sketch after this list for a rough estimate).
  • A smaller batch size updates the weights more often (after every mini-batch), so the network can make progress faster. The gradient estimate of a small mini-batch is less accurate, however, which shows up as ‘noisier’ training and validation curves.
  • A larger batch size means fewer propagations and weight updates per epoch, so the network trains more slowly. Each update is based on a more accurate gradient estimate, however, and this can make for a model with a better fit on the data.
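To get a feel for the memory point, here is a rough back-of-the-envelope estimate of the input memory for one mini-batch, assuming 32-bit floats. Activations and gradients add considerably more on top, so this is only a lower bound, and the helper is my own illustration:

def batch_input_megabytes(batch_size, height, width, channels=1):
    # Memory for the raw input tensor of one mini-batch in MB,
    # assuming 4 bytes per value (32-bit floats).
    return batch_size * height * width * channels * 4 / (1024 ** 2)

print(batch_input_megabytes(256, 28, 28))        # ~0.77 MB for 28 * 28 images
print(batch_input_megabytes(256, 1024, 768, 3))  # ~2304 MB for 1024 * 768 RGB images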

Ideally you should try a few batch sizes and use some charts to determine the optimal batch size for your model and data.

The code

So let’s take a look at the Python code for analyzing the batch size. The Python script file BatchSizeSelectionNeuralNetworks.py used in this blog can be found on GitHub.

The code is basically the same as in my previous blog. This time, however, the learning rate will remain fixed and the code will loop over different batch sizes. The model will be trained with each of those batch sizes. I’ll briefly mention the most important code changes.

I’ll create a dictionary object in which the metrics will be stored.

# Store Model metrics
history = {}

To use the default learning rate I’ll simply specify the Adam optimizer without passing an explicit learning rate.

# Compile model
model.compile(loss='categorical_crossentropy',
              optimizer=Adam(), metrics=['accuracy'])
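Since a fresh model is created (via create_model()) for every batch size in the loop below, the compile call presumably lives inside that helper. A minimal sketch of what such a helper could look like, assuming an MNIST-like setup with 28 * 28 pixel images and 10 classes (the layer sizes are illustrative, not necessarily those of the actual script):

from keras.models import Sequential
from keras.layers import Dense
from keras.optimizers import Adam

def create_model():
    # Simple fully connected network: 784 input features, 10 classes.
    # The exact architecture here is an assumption for illustration.
    model = Sequential()
    model.add(Dense(128, activation='relu', input_shape=(784,)))
    model.add(Dense(10, activation='softmax'))

    # Compile with the Adam optimizer and its default learning rate
    model.compile(loss='categorical_crossentropy',
                  optimizer=Adam(), metrics=['accuracy'])
    return model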

The code will loop over the specified batch sizes. Each model training will run for 50 epochs and the metrics will be stored in the history dictionary.

# Define all batch sizes
batch_sizes = [16, 64, 128, 192, 256]

# Loop through Batch Sizes
for batch_size in batch_sizes:

    # Create Model
    model = create_model()

    # Fit Model on new batch_size
    history[batch_size] = model.fit(x_train, y_train,
                                    validation_data=(x_val, y_val),
                                    epochs=50, batch_size=batch_size,
                                    verbose=2, shuffle=False)
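Each value stored in the history dictionary is a Keras History object; its .history attribute maps each metric name to a list with one value per epoch, which is what the plotting function below relies on. For example (hypothetical usage):

# The last element of a metric list is the value after the final epoch
final_val_acc = history[64].history['val_acc'][-1]
print('Final validation accuracy for batch size 64:', final_val_acc)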

After looping over all the batch sizes I’ll call a function to plot the desired charts and save each one to file.

# Plot Charts
plot_chart_to_file('loss', 'Train Loss', 'Train_Loss')
plot_chart_to_file('val_loss', 'Validation Loss', 'Validation_Loss')
plot_chart_to_file('acc', 'Train Accuracy', 'Train_Accuracy')
plot_chart_to_file('val_acc', 'Validation Accuracy', 'Validation_Accuracy')
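Note that the metric keys 'acc' and 'val_acc' match the Keras version used here; newer Keras/TensorFlow releases record the same metrics under 'accuracy' and 'val_accuracy', in which case the accuracy calls would become:

# Metric key names in newer Keras/TensorFlow versions
plot_chart_to_file('accuracy', 'Train Accuracy', 'Train_Accuracy')
plot_chart_to_file('val_accuracy', 'Validation Accuracy', 'Validation_Accuracy')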

For a given metric, the function combines the curves for all batch sizes in one chart and saves it to file.

def plot_chart_to_file(keyname, ylabel, filenamepart):
    # Create Figure and Subplots
    fig, ax = plt.subplots(dpi=300)

    # Loop through Batch Sizes and Plot
    for batch_size in batch_sizes:

        # Get information for batch_size
        hist = history[batch_size]

        # Plot Chart
        ax.plot(hist.history[keyname],
                label='Batch Size ' + str(batch_size))

    # Set Title, Labels and Legend
    plt.xlabel('Epoch')
    plt.ylabel(ylabel)
    plt.legend(loc='best', prop={'size': 8})
    plt.title('Model - ' + ylabel)

    # .. and save ..
    plt.savefig('Blog2_Model_' + filenamepart + '.png',
                bbox_inches="tight")

The results

After running the code on my laptop the following charts were generated. Let’s take a look at the results. In the chart showing the Train Accuracy we can see that the batch size of 256 initially gives the lowest accuracy. It shows a smooth progression, and after about 15 to 20 epochs the model trained with it has the highest accuracy on the training set. The same applies to the model trained with a batch size of 192. The models trained with batch sizes 16, 64 and 128 have a slightly higher accuracy in the first few epochs than the models trained with batch sizes 192 and 256, but in the long run their accuracy ends up a bit lower than that of the two largest batch sizes. Also note that the curves are ‘noisier’ for the models trained with the smaller batch sizes.
[Chart: Train Accuracy]
If we look at the Validation Accuracy chart, we can see that after 20 epochs the highest accuracy is achieved with batch sizes 192 and 256. After 35 epochs the same accuracy is reached with batch size 128. For batch sizes 16 and 64 the maximum accuracy stays lower than for the larger batch sizes. Again we can see that the curves for the two smallest batch sizes are ‘noisy’.
[Chart: Validation Accuracy]
If we look at the chart for the Train Loss, we see that the three largest batch sizes show a very low Train Loss after 15 to 20 epochs. The two smallest batch sizes show a slightly higher Train Loss.
[Chart: Train Loss]
And finally, the chart for the Validation Loss. Notice the very ‘noisy’ curves for batch sizes 16 and 64, and also their higher validation loss. The batch size of 128 performs slightly better than the two smallest ones.

The batch sizes of 192 and 256 achieve the best validation loss around 20 epochs. Around that point those two batch sizes also have the best metrics on all the other charts.

Based on these four charts, a model trained with a batch size of 256 for around 20 epochs gives the best fit on the training and validation data. A batch size of 192 would also be very usable, however, as its performance is almost identical to that of a batch size of 256.
[Chart: Validation Loss]

Summary

In this blog I described what batch size is and gave some guidelines on how you can choose one. With the Python script, models were trained with various batch sizes and charts were plotted based on the model metrics. I then analyzed the charts and showed how you can use them to select a batch size.
