How L1-Regularization Creates Sparsity: Mathematical and Visual Intuition

Bingi Nagesh
May 21, 2022


I have read in many sources (blogs and YouTube videos) that L1-Regularization creates sparsity, but none of them gave me a good intuition of how, until I saw this video. The video is behind a paywall, so I will summarize it in this blog post.

Let us build the intuition step by step.

Step 1: Let us write down the generalized L2 and L1 mathematical formulations
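Written out in standard notation, with λ controlling the regularization strength and n weights (L2-Regularization on the left, L1-Regularization on the right):

```
\min_{w}\; \text{Loss} + \lambda \sum_{i=1}^{n} w_i^2
\quad\Bigg|\quad
\min_{w}\; \text{Loss} + \lambda \sum_{i=1}^{n} |w_i|
```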

Note 1: The loss can be anything and will be the same for both the L2 and L1 formulations.

Note 2: The left side of the table is for L2-Regularization and the right side is for L1-Regularization.

Step 2: Let us ignore the loss, since it is common to both, and assume the input is one-dimensional to keep things simple (the same argument works even if the input is n-dimensional). The remaining penalty terms are shown below.
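With the loss dropped and a single weight w₁, the two objectives reduce to:

```
L_2\text{-Regularization}:\ \lambda w_1^2
\qquad\qquad
L_1\text{-Regularization}:\ \lambda |w_1|
```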

Step 3: Let us find the gradients of the above formulations
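Differentiating with respect to w₁ (for the L1 term this is a subgradient, valid for w₁ ≠ 0):

```
\frac{\partial}{\partial w_1}\,\lambda w_1^2 = 2\lambda w_1
\qquad\qquad
\frac{\partial}{\partial w_1}\,\lambda |w_1| = \lambda\,\mathrm{sign}(w_1)
```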

Step 4: Update w₁ using the gradient at the (j+1)ᵗʰ iteration
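Plugging the gradients into the standard gradient-descent update gives:

```
L_2:\ w_1^{(j+1)} = w_1^{(j)} - lr \cdot 2\lambda\, w_1^{(j)}
\qquad
L_1:\ w_1^{(j+1)} = w_1^{(j)} - lr \cdot \lambda\, \mathrm{sign}\!\left(w_1^{(j)}\right)
```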

lr is the learning rate.

Step 5: Intuition with an example. Let w₁ = 0.05 at the jᵗʰ iteration, lr = 0.01, and λ = 1.

Step 6: Plug the values into the equations
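With w₁ = 0.05, lr = 0.01, and λ = 1, one step of each update gives:

```
L_2:\ w_1^{(j+1)} = 0.05 - 0.01 \cdot 2 \cdot 1 \cdot 0.05 = 0.049
\qquad
L_1:\ w_1^{(j+1)} = 0.05 - 0.01 \cdot 1 \cdot \mathrm{sign}(0.05) = 0.04
```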

L1 pushed the weight towards zero significantly more than L2 did. In a few iterations, the weight under L1-Regularization becomes exactly zero, while the weight under L2-Regularization is still around 0.045. The L2-regularized weight may come very close to zero, but since each update only shrinks it by a fraction of itself, it never becomes exactly zero. That is why L1-Regularization creates sparsity and L2-Regularization does not.
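To see the "few iterations" claim play out, here is a minimal Python sketch of the two update rules from Step 4, run with the numbers from Step 5. The function names and the clamp-at-zero detail are my own additions, not from the original video:

```
# Simulate both update rules with w1 = 0.05, lr = 0.01, lambda = 1 (Step 5).

def l2_update(w, lr=0.01, lam=1.0):
    # gradient of lam * w^2 is 2 * lam * w
    return w - lr * 2 * lam * w

def l1_update(w, lr=0.01, lam=1.0):
    # (sub)gradient of lam * |w| is lam * sign(w)
    sign = 1.0 if w > 0 else -1.0 if w < 0 else 0.0
    new_w = w - lr * lam * sign
    # clamp to exactly zero once the step reaches or crosses zero,
    # so the weight stops there instead of oscillating around it
    # (my own detail, needed for a clean demo)
    if new_w * w < 0 or abs(new_w) < 1e-12:
        return 0.0
    return new_w

w_l2 = w_l1 = 0.05
for j in range(1, 8):
    w_l2, w_l1 = l2_update(w_l2), l1_update(w_l1)
    print(f"iteration {j}: L2 weight = {w_l2:.5f}, L1 weight = {w_l1:.5f}")
```

By iteration 5 the L1 weight is exactly zero, while the L2 weight has only decayed geometrically to about 0.045.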
