L1 regularization is used to induce sparsity. This can be beneficial, especially when you are dealing with high-dimensional data, because L1 tends to produce more compressed models than L2 regularization: as the regularization parameter increases, there is a greater chance that the optimum for a given weight lies exactly at 0.
L2 regularization penalizes large weights more heavily because of the squaring. It is also 'more elegant' in terms of smoothness, since the penalty is differentiable everywhere.
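The sparsity difference is easy to see empirically. Below is a minimal sketch (assuming scikit-learn is available, with synthetic data I made up for illustration) that fits Lasso (L1) and Ridge (L2) regression to the same data and counts how many coefficients each drives exactly to zero:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

# Synthetic data: 100 samples, 20 features, only 3 of which
# actually influence the target.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -3.0, 1.5]
y = X @ true_w + 0.1 * rng.normal(size=100)

lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty
ridge = Ridge(alpha=0.1).fit(X, y)   # L2 penalty

# L1 pushes many coefficients exactly to zero;
# L2 only shrinks them toward zero.
l1_zeros = int(np.sum(lasso.coef_ == 0))
l2_zeros = int(np.sum(ridge.coef_ == 0))
print("zero coefficients with L1:", l1_zeros)
print("zero coefficients with L2:", l2_zeros)
```

With a typical run you should see the L1 model zero out most of the irrelevant features, while the L2 model keeps small nonzero values for all of them.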
You should check this webpage.
A more mathematically comprehensive explanation may not be a good fit for this site; you could try other Stack Exchange sites instead.