Prune Neural Network with Simulated Annealing and Genetic Algorithm (Part 1: Overview)

Apply Simulated Annealing and Genetic Algorithm to solve the problem of Neural Network pruning without prior assumptions of weight importance

Quan Dang
Jan 14, 2025

This project focuses on the application and evaluation of two local search optimization techniques, Simulated Annealing and Genetic Algorithm, to solve the problem of Neural Network pruning. Its primary objective is to reduce model complexity and enhance computational efficiency without significantly compromising performance.

Pruning is a crucial technique in Deep Learning model optimization, as it reduces the number of weights, leading to smaller model sizes, faster inference times, and lower memory usage.

Traditional pruning techniques rely on predefined rules or assumptions about which weights to prune. This project bypasses these limitations by leveraging local search algorithms to dynamically learn the best pruning masks.

I. The problem and its context

Traditional rule-based pruning methods require prior knowledge of which weights to prune (e.g., removing weights below a magnitude threshold) (Li et al., 2020), making them less adaptive and harder to scale to more complex architectures. This limits the pruning process, especially in models where the importance of weights may not follow simple patterns. Reducing the model’s size and computational demands without relying on heuristic-driven techniques requires an approach that can dynamically adapt to the network’s structure and data distribution.
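
For contrast, here is a minimal sketch of such a rule-based approach: magnitude thresholding on a single layer’s weights. The layer shape and threshold value are arbitrary, illustrative assumptions, not settings from this project.

```python
import numpy as np

# Hypothetical weight matrix of one fully connected layer (values are illustrative)
weights = np.random.randn(120, 84)

# Rule-based pruning: zero out every weight whose magnitude falls below a fixed,
# manually chosen threshold -- exactly the kind of prior assumption this project avoids
threshold = 0.05
mask = (np.abs(weights) >= threshold).astype(np.float32)
pruned_weights = weights * mask

print(f"Sparsity achieved: {1 - mask.mean():.2%}")
```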

Simulated Annealing (SA) and Genetic Algorithms (GA) were chosen for their flexibility in exploring large search spaces and their ability to find optimal or near-optimal solutions in non-convex spaces without relying on assumptions about weight importance. Unlike traditional methods, SA and GA allow a dynamic search for the best pruning masks, learning from the model’s structure and performance during pruning, which makes the process more scalable and generalizable to different architectures:

  1. Simulated Annealing: A probabilistic local search method inspired by the physics of annealing in metallurgy. It explores the pruning-mask space by accepting suboptimal solutions at high temperatures to escape local minima; as the temperature is gradually lowered, the probability of accepting worse solutions shrinks and the algorithm refines its pruning masks.
  2. Genetic Algorithm: An evolutionary algorithm simulating natural selection, crossover, and mutation. It maintains a population of candidate pruning solutions (chromosomes) and evolves them over several generations to discover optimal pruning masks. Its population-based nature makes GA effective at exploring a larger solution space, allowing for more aggressive pruning strategies (a minimal sketch of both strategies follows this list).

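To make the two strategies concrete, here is a minimal sketch of how each might manipulate a binary pruning mask (1 = keep, 0 = prune). The cost function, temperature schedule, and mutation rate below are illustrative assumptions, not the project’s actual settings (those are covered in Part 2).

```python
import numpy as np

rng = np.random.default_rng(0)

def cost(mask):
    # Placeholder objective: in the real project this would evaluate the pruned
    # network and combine its loss with a sparsity reward
    return rng.random()

# --- Simulated Annealing: perturb one bit and accept/reject probabilistically ---
def sa_step(mask, temperature):
    neighbor = mask.copy()
    idx = rng.integers(mask.size)            # flip one randomly chosen bit
    neighbor[idx] ^= 1
    delta = cost(neighbor) - cost(mask)
    # Always accept improvements; accept worse masks with probability
    # exp(-delta / T), which shrinks as the temperature cools
    if delta < 0 or rng.random() < np.exp(-delta / temperature):
        return neighbor
    return mask

# --- Genetic Algorithm: crossover and mutation on a population of masks ---
def crossover(parent_a, parent_b):
    point = rng.integers(1, parent_a.size)   # single-point crossover
    return np.concatenate([parent_a[:point], parent_b[point:]])

def mutate(mask, rate=0.01):
    flips = rng.random(mask.size) < rate     # flip each bit with a small probability
    return mask ^ flips

mask_a, mask_b = rng.integers(0, 2, size=64), rng.integers(0, 2, size=64)
mask_a = sa_step(mask_a, temperature=1.0)    # one SA move
child = mutate(crossover(mask_a, mask_b))    # one GA offspring
```
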
LeNet-5 was used as the baseline architecture in this project: trained on the MNIST dataset, it achieved a baseline accuracy of 97.23% with 61,470 weights. With a layer-wise pruning strategy and adaptive parameters, the goal of the two algorithms above is to achieve maximum sparsity while keeping the increase in the model’s loss as small as possible, by optimizing an objective function that penalizes poor performance (roughly illustrated below).
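
The exact objective is described in Part 2; as a rough illustration, one plausible form combines a penalty on any increase in loss with a reward for sparsity. The weighting factor alpha and the functional form here are assumptions for this sketch, not the project’s actual objective.

```python
import numpy as np

def pruning_objective(loss_pruned, loss_baseline, mask, alpha=1.0):
    """Illustrative cost: penalize loss degradation, reward sparsity.

    alpha and the exact form are assumptions for this sketch, not the
    project's real objective (see Part 2 for that).
    """
    sparsity = 1.0 - np.mean(mask)                         # fraction of pruned weights
    loss_penalty = max(0.0, loss_pruned - loss_baseline)   # only penalize degradation
    return alpha * loss_penalty - sparsity                 # lower cost is better

# Example: a mask keeping 6 of 10 weights, with a small loss increase
mask = np.array([1, 1, 1, 0, 0, 1, 0, 1, 1, 0])
print(pruning_objective(loss_pruned=0.12, loss_baseline=0.09, mask=mask))
```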

The results of each method were then compared in terms of sparsity achieved, accuracy and loss retained, and execution time, providing insights into the pros & cons of each approach.

II. Major findings

Experimental results of SA and GA for each layer were compared based on cost, accuracy, loss, and sparsity.
  1. Pruning Effectiveness: SA achieved slightly higher sparsity, 43.84%, with 87.44% accuracy (a 9.8% drop from the baseline), while GA reached 41.33% sparsity with 89.02% accuracy, demonstrating that both methods can significantly reduce model size while maintaining reasonable performance.
  2. Efficiency: SA proved faster than GA, requiring less than half the execution time: it completed the pruning process in 11 minutes 43 seconds, whereas GA required 27 minutes 46 seconds. This makes SA more suitable for applications where time efficiency is critical.
  3. Convergence Behaviour: SA exhibited rapid initial improvement, particularly beneficial for quick results, while GA showed more gradual, steady improvements across generations.

References

  1. Li, M., Sattar, Y., Thrampoulidis, C., & Oymak, S. (2020). Exploring Weight Importance and Hessian Bias in Model Pruning. arXiv. https://doi.org/10.48550/arXiv.2006.10903
  2. GitHub for this project: https://github.com/18520339/unstructured-local-search-pruning
  3. Posts in this Series: Part 1 (This post), Part 2 (Implementation), Part 3 (Experiments), Part 4 (Results).
  4. If you want even more details, check my source.ipynb and report.pdf.
