Prune Neural Network with Simulated Annealing and Genetic Algorithm (Part 4: Results)
Applying Simulated Annealing and a Genetic Algorithm to Neural Network pruning without prior assumptions about weight importance
The experimental results from Simulated Annealing (SA) and the Genetic Algorithm (GA) were compared on several key measures: cost, accuracy (logged as metrics), loss, sparsity, and execution time. The results for each layer of the Neural Network were then analyzed and compared in detail.
import matplotlib.pyplot as plt

def visualize_pruning_results(ga_df, sa_df, ga_by_layers, sa_by_layers, figsize=(20, 15)):
    _, axs = plt.subplots(2, 2, figsize=figsize)
    markers = {
        'ga_name': 'Genetic Algorithm', 'sa_name': 'Simulated Annealing',
        'ga_cost': '#1f77b4', 'sa_cost': '#aec7e8',          # Blue and Light blue
        'ga_loss': '#d62728', 'sa_loss': '#ff9896',          # Red and Light red
        'ga_accuracy': '#2ca02c', 'sa_accuracy': '#98df8a',  # Green and Light green
    }

    # 1st subplot: Cost vs Sparsity, Accuracy vs Sparsity, Loss vs Sparsity (detailed line plot)
    ax1 = axs[0][0]
    ax2 = ax1.twinx()
    ax3 = ax1.twinx()
    ax3.spines['right'].set_position(('outward', 60))  # Offset the third axis

    # Cost vs Sparsity
    ax1.plot(ga_df['sparsity'], ga_df['cost'], label=f"{markers['ga_name']} Cost", color=markers['ga_cost'], marker='o', linestyle='--')
    ax1.plot(sa_df['sparsity'], sa_df['cost'], label=f"{markers['sa_name']} Cost", color=markers['sa_cost'])
    ax1.set_xlabel('Sparsity', fontsize=14)
    ax1.set_ylabel('Cost', color=markers['ga_cost'], fontsize=14)
    ax1.tick_params(axis='y', labelcolor=markers['ga_cost'], labelsize=14)
    ax1.tick_params(axis='x', labelsize=14)
    ax1.set_title('Cost, Accuracy, and Loss vs Sparsity', fontsize=18, fontweight='bold')
    ax1.grid(True, which='both', linestyle='--', linewidth=0.5)
    ax1.legend(loc='upper center', fontsize=14)
    ax1.annotate(
        'Initial Point', fontsize=14,  # Annotate the unpruned starting point
        xy=(ga_df['sparsity'].iloc[0], ga_df['cost'].iloc[0]),
        xytext=(ga_df['sparsity'].iloc[0] + 0.02, ga_df['cost'].iloc[0] + 5),
        arrowprops=dict(facecolor='black', arrowstyle='->')
    )

    # Metrics (Accuracy) vs Sparsity
    ax2.plot(ga_df['sparsity'], ga_df['metrics'], label=f"{markers['ga_name']} Accuracy", color=markers['ga_accuracy'], marker='o', linestyle='--')
    ax2.plot(sa_df['sparsity'], sa_df['metrics'], label=f"{markers['sa_name']} Accuracy", color=markers['sa_accuracy'])
    ax2.set_ylabel('Metrics (Accuracy)', color=markers['ga_accuracy'], fontsize=14)
    ax2.tick_params(axis='y', labelcolor=markers['ga_accuracy'], labelsize=14)

    # Loss vs Sparsity
    ax3.plot(ga_df['sparsity'], ga_df['loss'], label=f"{markers['ga_name']} Loss", color=markers['ga_loss'], marker='o', linestyle='--')
    ax3.plot(sa_df['sparsity'], sa_df['loss'], label=f"{markers['sa_name']} Loss", color=markers['sa_loss'])
    ax3.set_ylabel('Loss', color=markers['ga_loss'], fontsize=14)
    ax3.tick_params(axis='y', labelcolor=markers['ga_loss'], labelsize=14)

    # 2nd subplot: Final Cost per layer (grouped bar chart)
    layers = ga_by_layers['layer']
    axs[0][1].bar(layers - 0.15, ga_by_layers['cost'], label=f"{markers['ga_name']} Cost", color=markers['ga_cost'], width=0.3)
    axs[0][1].bar(layers + 0.15, sa_by_layers['cost'], label=f"{markers['sa_name']} Cost", color=markers['sa_cost'], width=0.3)
    axs[0][1].set_title('Final Cost per Layer', fontsize=18, fontweight='bold')
    axs[0][1].set_xlabel('Layer', fontsize=14)
    axs[0][1].set_ylabel('Cost', fontsize=14)
    axs[0][1].tick_params(axis='x', labelsize=14)
    axs[0][1].tick_params(axis='y', labelsize=14)
    axs[0][1].legend(loc='upper center', fontsize=14)
    axs[0][1].grid(True, which='both', linestyle='--', linewidth=0.5)

    # 3rd subplot: Final Accuracy per layer (grouped bar chart)
    axs[1][0].bar(layers - 0.15, ga_by_layers['metrics'], label=f"{markers['ga_name']} Accuracy", color=markers['ga_accuracy'], width=0.3)
    axs[1][0].bar(layers + 0.15, sa_by_layers['metrics'], label=f"{markers['sa_name']} Accuracy", color=markers['sa_accuracy'], width=0.3)
    axs[1][0].set_title('Final Accuracy per Layer', fontsize=18, fontweight='bold')
    axs[1][0].set_xlabel('Layer', fontsize=14)
    axs[1][0].set_ylabel('Metrics (Accuracy)', fontsize=14)
    axs[1][0].tick_params(axis='x', labelsize=14)
    axs[1][0].tick_params(axis='y', labelsize=14)
    axs[1][0].legend(loc='upper center', fontsize=14)
    axs[1][0].grid(True, which='both', linestyle='--', linewidth=0.5)

    # 4th subplot: Final Loss per layer (grouped bar chart)
    axs[1][1].bar(layers - 0.15, ga_by_layers['loss'], label=f"{markers['ga_name']} Loss", color=markers['ga_loss'], width=0.3)
    axs[1][1].bar(layers + 0.15, sa_by_layers['loss'], label=f"{markers['sa_name']} Loss", color=markers['sa_loss'], width=0.3)
    axs[1][1].set_title('Final Loss per Layer', fontsize=18, fontweight='bold')
    axs[1][1].set_xlabel('Layer', fontsize=14)
    axs[1][1].set_ylabel('Loss', fontsize=14)
    axs[1][1].tick_params(axis='x', labelsize=14)
    axs[1][1].tick_params(axis='y', labelsize=14)
    axs[1][1].legend(loc='upper center', fontsize=14)
    axs[1][1].grid(True, which='both', linestyle='--', linewidth=0.5)

    plt.tight_layout()
    plt.show()
import pandas as pd
from visualizer import visualize_pruning_results  # the plotting function above, saved in visualizer.py

sa_df = pd.read_csv('simulated_annealing.csv')
ga_df = pd.read_csv('genetic_algorithm.csv')

# Group the history by layer and keep the final result for each layer (its last logged entry)
sa_by_layers = sa_df.groupby('layer').tail(1)
ga_by_layers = ga_df.groupby('layer').tail(1)

visualize_pruning_results(ga_df, sa_df, ga_by_layers, sa_by_layers, figsize=(20, 15))
I. Layer Comparisons
# Combine the per-layer results into a single DataFrame
combined_layer_results = pd.concat([
    ga_by_layers.assign(Algorithm='Genetic Algorithm'),
    sa_by_layers.assign(Algorithm='Simulated Annealing')
])
# Reorder columns, putting 'Algorithm' first and dropping GA-only bookkeeping columns
columns_to_include = [col for col in ga_by_layers.columns if col not in ['generation', 'mutation_rate']]
combined_layer_results = combined_layer_results[['Algorithm'] + columns_to_include].rename(columns={
    'cost': 'Final Cost/Layer',
    'metrics': 'Final Accuracy/Layer',
    'loss': 'Final Loss/Layer',
    'sparsity': 'Final Sparsity/Layer',
    'layer': 'Layer'
})
combined_layer_results.sort_values(by=['Layer', 'Algorithm'])
II. Quality of Solutions
Both SA and GA achieved significant model compression while maintaining reasonable accuracy. SA reaches a final sparsity of 43.84% with a 9.79 percentage-point drop in accuracy, while GA reaches 41.33% sparsity with an 8.21 percentage-point drop. To quantify this trade-off, I use the following simple metric:
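Trade-off Score = Sparsity × (1 − Accuracy Drop)

with both sparsity and the accuracy drop expressed as fractions, so higher scores reward more pruning with less accuracy damage. (This form of the score is reconstructed here from the reported values: SA gives 0.4384 × 0.9021 ≈ 0.395 and GA gives 0.4133 × 0.9179 ≈ 0.379, matching the trade-off scores reported in the conclusions up to rounding.)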
SA achieves a slightly better trade-off score, indicating that it may be more effective at balancing sparsity and accuracy. However, the difference is marginal, and both techniques prove effective for the task, striking a reasonable balance between pruning aggressiveness (increased sparsity) and preserved accuracy/loss.
III. Metric Comparisons
1. Convergence Behavior and Cost Function Analysis
2. Loss on Test Set and Accuracy Retention
Accuracy retention is directly related to the model’s loss after pruning: a good pruning technique should minimize the increase in loss while achieving high sparsity.
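Both quantities can be read straight from the logged histories. A minimal sketch, assuming (consistent with the 'Initial Point' annotation in the plot above) that the first logged row of each CSV is the unpruned model and the last row is the final pruned state:

import pandas as pd

for name, path in [('GA', 'genetic_algorithm.csv'), ('SA', 'simulated_annealing.csv')]:
    df = pd.read_csv(path)
    retention = df['metrics'].iloc[-1] / df['metrics'].iloc[0]  # final accuracy / unpruned accuracy
    loss_increase = df['loss'].iloc[-1] - df['loss'].iloc[0]    # absolute rise in test loss
    print(f"{name}: accuracy retention = {retention:.2%}, loss increase = {loss_increase:+.4f}")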
3. Sparsity and Complexity Reduction
Sparsity is a crucial metric in model pruning, as it represents the fraction of weights that are successfully pruned. Higher sparsity means a more lightweight model, reducing computational requirements and memory consumption.
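Sparsity itself is cheap to compute: it is simply the fraction of weights that are exactly zero. A minimal NumPy sketch (the dummy arrays below stand in for a pruned model's per-layer weights):

import numpy as np

def compute_sparsity(weight_arrays):
    # Fraction of zero-valued weights across all given layers
    total = sum(w.size for w in weight_arrays)
    zeros = sum(int(np.count_nonzero(w == 0)) for w in weight_arrays)
    return zeros / total

dummy_layers = [np.array([[0.0, 0.5], [0.0, -1.2]]), np.zeros(3)]
print(f"Sparsity: {compute_sparsity(dummy_layers):.2%}")  # 5 of 7 weights are zero -> 71.43%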
4. Accuracy and Performance
5. Execution Time
The computational efficiency of each algorithm is an important factor when considering large-scale models or deployment in production environments.
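For reference, per-layer wall-clock time can be captured with a simple timer around each pruning call. This is only a sketch: prune_layer below is a hypothetical stand-in for the SA/GA pruning routine, not this project's actual API.

import time
from contextlib import contextmanager

@contextmanager
def timer(label):
    # Print the wall-clock time spent inside the with-block
    start = time.perf_counter()
    yield
    print(f"{label}: {time.perf_counter() - start:.2f}s")

with timer('Pruning conv1'):
    time.sleep(0.1)  # replace with e.g. prune_layer(model, 'conv1')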
IV. Conclusions
This study explored the efficacy of two local search algorithms, Simulated Annealing and Genetic Algorithm, in pruning a simple pre-trained LeNet model on the MNIST dataset, focusing on increasing model sparsity by pruning unnecessary weights while keeping the loss in check. The experiments also showed that pruning effectiveness is highly dependent on the network architecture: fully connected layers tend to tolerate more pruning than convolutional layers.
Both SA and GA demonstrated strong potential for reducing the number of weights in the LeNet-5 model while maintaining acceptable loss and accuracy. SA achieved a sparsity of 43.84% with 87.44% accuracy, while GA reached 41.33% sparsity with 89.02% accuracy. As pruning increased, accuracy tended to decrease slightly, but both methods maintained reasonable accuracy even at higher sparsity levels.
Despite the significant difference in execution time, both algorithms achieved comparable pruning results and exhibited the expected trade-off between sparsity and accuracy, with SA slightly edging out GA (trade-off scores of 0.3951 vs. 0.3784). The cost functions of both methods converged, indicating their effectiveness in minimizing pruning-related penalties while maximizing model efficiency. These results suggest that local search algorithms are indeed viable and effective for neural network pruning, addressing my primary research problem.
Both algorithms implemented adaptive strategies to balance exploration and exploitation throughout the pruning process. Combined with the layer-wise pruning approach, both methods proved effective, suggesting good potential for scaling to larger, more complex networks.
References
- GitHub for this project: https://github.com/18520339/unstructured-local-search-pruning
- Posts in this Series: Part 1 (Overview), Part 2 (Implementation), Part 3 (Experiments), Part 4 (This post).
- If you want even more details, check my source.ipynb and report.pdf.