{"id":11,"date":"2019-05-02T19:14:35","date_gmt":"2019-05-02T14:44:35","guid":{"rendered":"http:\/\/themes.tielabs.com\/sahifa5\/?p=11"},"modified":"2021-02-06T10:13:21","modified_gmt":"2021-02-06T06:43:21","slug":"weight-initialization-in-deep-learning","status":"publish","type":"post","link":"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/","title":{"rendered":"Weight Initialization in Deep Learning"},"content":{"rendered":"<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-2.gif\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1256\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-2.gif\" alt=\"\u0648\u0632\u0646 \u062f\u0647\u06cc \u0627\u0648\u0644\u06cc\u0647 \u0648\u0632\u0646\u0647\u0627\" width=\"480\" height=\"270\" title=\"\"><\/a><\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_76 counter-hierarchy ez-toc-counter-rtl ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">\u0622\u0646\u0686\u0647 \u062f\u0631 \u0627\u06cc\u0646 \u0645\u0637\u0644\u0628 \u062e\u0648\u0627\u0647\u06cc\u0645 \u062e\u0648\u0627\u0646\u062f :<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #0044bf;color:#0044bf\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #0044bf;color:#0044bf\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 eztoc-toggle-hide-by-default' ><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#Introduction\" >Introduction<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#Neural_network_process\" >Neural network process<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#%DB%B1%D9%AB_Initialize_weights_and_biases\" >\u06f1\u066b Initialize weights and biases.<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-4\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#%DB%B2%D9%AB_Forward_propagation\" >\u06f2\u066b&nbsp;Forward propagation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-5\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#%DB%B3%D9%AB_Compute_loss_function\" >\u06f3\u066b&nbsp;Compute loss function<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-6\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#%DB%B4%D9%AB_Back_propagation\" >\u06f4\u066b&nbsp;Back propagation<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-7\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#Uniform_distribution\" >Uniform distribution:<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-8\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#Normal_distribution\" >Normal distribution:<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-9\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#Overview_of_MNIST_dataset\" >Overview of MNIST dataset<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-10\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#Initializing_all_weights_to_zero\" >Initializing all weights to zero<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-11\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#Random_initialization_of_weights\" >Random initialization of weights<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-12\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#Xavier_Glorot_initialization_of_weights\" >Xavier Glorot initialization of weights<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-13\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#He_initialization_of_weights\" >He initialization of&nbsp;weights<\/a><\/li><li class='ez-toc-page-1 ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-14\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/weight-initialization-in-deep-learning\/#How_to_choose_right_weight_initialization\" >How to choose right weight initialization&nbsp;?<\/a><\/li><\/ul><\/li><\/ul><\/nav><\/div>\n<h2 id=\"cb84\" class=\"graf graf--p graf-after--h3\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Introduction\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Introduction<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p id=\"5ca5\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">Building a neural network is a tedious task and upon that tuning it to get better result is more challenging. The first challenging task that comes into consideration while building a neural network is initialization of weights, if the weights are initialized correctly, then optimization will be achieved in least time, Otherwise converging to minima is impossible.<\/p>\n<p id=\"56de\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Let us have an overview of whole neural network process and the reason why initialization of weights impact\u2019s our model<\/em><\/strong><\/p>\n<ul class=\"postList\" dir=\"ltr\">\n<li id=\"45b3\" class=\"graf graf--li graf-after--p\">\n<h2><span class=\"ez-toc-section\" id=\"Neural_network_process\"><\/span><strong class=\"markup--strong markup--li-strong\"><em class=\"markup--em markup--li-em\">Neural network process<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<\/li>\n<\/ul>\n<p id=\"3b88\" class=\"graf graf--p graf-after--li\" dir=\"ltr\">Whole neural network process can be explained in 4 steps&nbsp;:<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-3.gif\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1259\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-3.gif\" alt=\"\u0634\u0628\u06a9\u0647 \u0639\u0635\u0628\u06cc \u0686\u0646\u062f \u0644\u0627\u06cc\u0647\" width=\"800\" height=\"296\" title=\"\"><\/a><\/p>\n<h3 id=\"11fc\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"%DB%B1%D9%AB_Initialize_weights_and_biases\"><\/span>\u06f1\u066b Initialize weights and biases.<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h3 id=\"5c31\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"%DB%B2%D9%AB_Forward_propagation\"><\/span>\u06f2\u066b&nbsp;<em class=\"markup--em markup--p-em\">Forward propagation<\/em><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"graf graf--p graf-after--p\" dir=\"ltr\">With the weights,inputs and bias term, we multiply the weights with the input and we will add the bias term and then we will perform summation and then we pass this to activation function. This process continues to all the neurons and finally we will get predicted y_hat. This process is called forward propagation.<\/p>\n<h3 id=\"3239\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"%DB%B3%D9%AB_Compute_loss_function\"><\/span>\u06f3\u066b&nbsp;<em class=\"markup--em markup--p-em\">Compute loss function<\/em><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"graf graf--p graf-after--p\" dir=\"ltr\">Difference between the predicted y_hat and the actual y is called loss term. It captures how far our predictions are from the actual target. Our main objective is to minimize the loss function.<\/p>\n<h3 id=\"af99\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"%DB%B4%D9%AB_Back_propagation\"><\/span>\u06f4\u066b&nbsp;<em class=\"markup--em markup--p-em\">Back propagation<\/em><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here, we compute the gradients and update the weights with respect to loss function . We perform the updation of weights until we get minimum loss.<\/p>\n<p id=\"04d0\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Steps 2\u2013\u06f4 are repeated for n-iterations till we get minimized loss.<\/p>\n<p id=\"07b6\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">By seeing the above neural network process, we can easily say that, steps 2,3 and 4 functionality is same for any network i.e., we do same operations until we converge to minimum loss, only the big difference for faster convergence to minima in any neural network is right initialization of weights&nbsp;.<\/p>\n<p id=\"aeb1\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Now, let us see the different types of initialization of weights. Before going into the topic&nbsp;,let me introduce you to some terminologies<\/p>\n<p id=\"8784\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Fan-in :<\/em><\/strong><\/p>\n<p id=\"9377\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Fan-in is the number of inputs that are entering into the neuron.<\/p>\n<p id=\"49b2\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Fan-out :<\/em><\/strong><\/p>\n<p id=\"a2c0\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Fan-out is number of outputs that are going from the neuron.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-13.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1281\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-13.jpg\" alt=\"\u0633\u0627\u062e\u062a\u0627\u0631 \u0633\u0627\u062f\u0647 \u0646\u0631\u0648\u0646 \u0645\u062d\u0627\u0633\u0628\u0627\u062a\u06cc\" width=\"497\" height=\"195\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-13.jpg 497w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-13-300x118.jpg 300w\" sizes=\"(max-width: 497px) 100vw, 497px\" \/><\/a><\/p>\n<p id=\"06a9\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">There are two inputs that are entering into the neuron. Hence, fan-in=2.<\/p>\n<p id=\"a41d\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">One output is going away from neuron. Hence, fan-out=1 .<\/p>\n<h3 id=\"43af\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Uniform_distribution\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Uniform distribution:<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-4.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1262\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-4.jpg\" alt=\"\u062a\u0648\u0632\u06cc\u0639 \u06cc\u06a9\u0646\u0648\u0627\u062e\u062a\" width=\"600\" height=\"352\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-4.jpg 600w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-4-300x176.jpg 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p id=\"adc0\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">Uniform distribution is a type of probability distribution in which all outcomes are equally likely i.e., each variable has the same probability that it will be outcome.<\/p>\n<h3 id=\"0ed3\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Normal_distribution\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Normal distribution:<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-16.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1288\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-16.png\" alt=\"\u062a\u0648\u0632\u06cc\u0639 \u0646\u0631\u0645\u0627\u0644\" width=\"375\" height=\"231\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-16.png 375w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-16-300x185.png 300w\" sizes=\"(max-width: 375px) 100vw, 375px\" \/><\/a><\/p>\n<p id=\"119a\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">Normal distribution is a probability distribution that is symmetric about the mean, showing that data near the mean are more frequent in occurrence than the data far from the mean.<\/p>\n<p id=\"daf3\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Now, let us dive deep into the different initialization techniques. From here, we will go in a practical aspect i.e., Let us take MNIST dataset, and we will initialize the weights with different initialization techniques and let us see what\u2019s happening with output.<\/p>\n<h2 id=\"cf3d\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Overview_of_MNIST_dataset\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Overview of MNIST dataset<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p id=\"6934\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">MNIST dataset is one of the most common datasets used for image classification. This dataset contains hand written number images and we have to classify them into any one of the 10 classes(i.e., 0 &#8211; 9).<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/mnist-digit-example.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1296\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/mnist-digit-example.png\" alt=\"\u0646\u0645\u0648\u0646\u0647 \u062a\u0635\u0627\u0648\u06cc\u0631 mnist\" width=\"800\" height=\"194\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/mnist-digit-example.png 800w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/mnist-digit-example-300x73.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/mnist-digit-example-768x186.png 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/mnist-digit-example-600x146.png 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p id=\"bcd9\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">For simplicity, we will consider only a 2 layer neural network i.e., 1st hidden layer with 128 neurons&nbsp;, 2nd hidden layer with 64 neurons and we will a softmax classifier to classify the outputs. Here, we will use ReLU as an activation unit. Ok&nbsp;, Lets get started.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1273\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9.png\" alt=\"\u0634\u0628\u06a9\u0647 \u0639\u0645\u06cc\u0642 \u0634\u0646\u0627\u0633\u0627\u06cc\u06cc \u0627\u0631\u0642\u0627\u0645\" width=\"787\" height=\"397\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9.png 787w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9-300x151.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9-768x387.png 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9-600x303.png 600w\" sizes=\"(max-width: 787px) 100vw, 787px\" \/><\/a><\/p>\n<h3 dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Initializing_all_weights_to_zero\"><\/span><span style=\"color: #0000ff;\"><strong class=\"markup--strong markup--h3-strong\"><em class=\"markup--em markup--h3-em\">Initializing all weights to zero<\/em><\/strong><\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4 id=\"a255\" class=\"graf graf--p graf-after--h3\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Theory<\/em><\/strong><\/h4>\n<p id=\"6146\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Weights are initialized with zero. Then, all the neurons of all the layers performs same calculation, giving same output. The derivative with respect to loss function is same for every weight. The model won\u2019t learn anything. The weight\u2019s won\u2019t get update at all. Here, we are facing vanishing gradients problem.<\/p>\n<h4 id=\"6f88\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Code for initializing all weights to zero<\/em><\/strong><\/h4>\n<pre id=\"f052\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">model = Sequential()\nmodel.add(Dense(128, activation='relu', input_shape=(input_dim,), kernel_initializer='zeros'))\nmodel.add(Dense(64, activation='relu', kernel_initializer='zeros'))\nmodel.add(Dense(output_dim, activation='softmax'))<\/pre>\n<h4 id=\"31d8\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Output for initializing all weights to zero in MNIST dataset.<\/em><\/strong><\/h4>\n<pre id=\"7541\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">Epoch 1\/5 60000\/60000 [==============================] - 3s 55us\/step - loss: 2.3016 - acc: 0.1119 - val_loss: 2.3011 - val_acc: 0.1135\nEpoch 2\/5 60000\/60000 [==============================] - 3s 47us\/step - loss: 2.3013 - acc: 0.1124 - val_loss: 2.3010 - val_acc: 0.1135 \nEpoch 3\/5 60000\/60000 [==============================] - 3s 46us\/step - loss: 2.3013 - acc: 0.1124 - val_loss: 2.3010 - val_acc: 0.1135 \nEpoch 4\/5 60000\/60000 [==============================] - 3s 47us\/step - loss: 2.3013 - acc: 0.1124 - val_loss: 2.3010 - val_acc: 0.1135 \nEpoch 5\/5 60000\/60000 [==============================] - 3s 46us\/step - loss: 2.3013 - acc: 0.1124 - val_loss: 2.3010 - val_acc: 0.1135<\/pre>\n<h4 id=\"593d\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Plot for output values for initializing all weights to zero<\/em><\/strong><\/h4>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1280\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12.png\" alt=\"\u0645\u0646\u062d\u0646\u06cc \u062e\u0637\u0627 (\u0644\u0627\u0633) \u0634\u0628\u06a9\u0647 \u0639\u0645\u06cc\u0642\" width=\"512\" height=\"355\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12.png 512w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12-300x208.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12-110x75.png 110w\" sizes=\"(max-width: 512px) 100vw, 512px\" \/><\/a><\/p>\n<h4 id=\"1a7c\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Analysis of output<\/em><\/strong><\/h4>\n<p id=\"10bc\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here, the train loss and test loss are not changing&nbsp;. Hence, we can easily conclude that no change in weights of neuron. From this, we can conclude that, our model is effected with vanishing gradients problem.<\/p>\n<h3 id=\"3f95\" class=\"graf graf--h3 graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Random_initialization_of_weights\"><\/span><span style=\"color: #0000ff;\">Random initialization of weights<\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4 id=\"ceff\" class=\"graf graf--p graf-after--h3\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Theory<\/em><\/strong><\/h4>\n<p id=\"20d9\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Instead of initializing all the weights to zeros, here we are initializing all the values to random values. Random initialization is better than zero initialization of weights. But, in random initialization we have chance of facing two issues i.e., vanishing gradients and exploding gradients. If the weights are initialized very high, then we will be facing issue of exploding gradients. If the weights are initialized very low, then we will be facing issue of vanishing gradients.<\/p>\n<h4 id=\"aae0\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Code for random initialization of weights<\/em><\/strong><\/h4>\n<pre id=\"e9b0\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">model = Sequential()\nmodel.add(Dense(128, activation='relu', input_shape=(input_dim,), kernel_initializer='random_uniform'))\nmodel.add(Dense(64, activation='relu', kernel_initializer='random_uniform'))\nmodel.add(Dense(output_dim, activation='softmax'))<\/pre>\n<h4 id=\"2e95\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Output for initialization of all weights to random in MNIST dataset<\/em><\/strong><\/h4>\n<pre id=\"de94\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">Epoch 1\/5 60000\/60000 [==============================] - 3s 55us\/step - loss: 0.3929 - acc: 0.8887 - val_loss: 0.1889 - val_acc: 0.9432 \nEpoch 2\/5 60000\/60000 [==============================] - 3s 45us\/step - loss: 0.1570 - acc: 0.9534 - val_loss: 0.1247 - val_acc: 0.9622 \nEpoch 3\/5 60000\/60000 [==============================] - 3s 53us\/step - loss: 0.1069 - acc: 0.9685 - val_loss: 0.0994 - val_acc: 0.9705 \nEpoch 4\/5 60000\/60000 [==============================] - 3s 54us\/step - loss: 0.0810 - acc: 0.9761 - val_loss: 0.0986 - val_acc: 0.9710 \nEpoch 5\/5 60000\/60000 [==============================] - 3s 54us\/step - loss: 0.0629 - acc: 0.9804 - val_loss: 0.0877 - val_acc: 0.9755<\/pre>\n<h4 id=\"2f20\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Plot for outputs that their weights are randomly initialized<\/em><\/strong><\/h4>\n<figure id=\"c466\" class=\"graf graf--figure graf-after--p\" dir=\"ltr\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-8.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1271\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-8.png\" alt=\"\u0645\u0646\u062d\u0646\u06cc \u062e\u0637\u0627 (\u0644\u0627\u0633) \u0634\u0628\u06a9\u0647 \u0639\u0645\u06cc\u0642\" width=\"501\" height=\"344\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-8.png 501w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-8-300x206.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-8-110x75.png 110w\" sizes=\"(max-width: 501px) 100vw, 501px\" \/><\/a><\/div>\n<\/div>\n<\/figure>\n<h4 id=\"adba\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Analysis of output<\/em><\/strong><\/h4>\n<p id=\"fce7\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here, the train loss and the test loss are changing much i.e., they are converging to the minimum loss value. Hence, we can clearly say that random initialization is better than zero initialization of weights. But, when we rerun the model, we will be getting different results because of random initialization of weights.<\/p>\n<h3 id=\"49bb\" class=\"graf graf--h3 graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Xavier_Glorot_initialization_of_weights\"><\/span><span style=\"color: #0000ff;\"><em class=\"markup--em markup--h3-em\">Xavier Glorot initialization of weights<\/em><\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p id=\"f1e5\" class=\"graf graf--p graf-after--h3\" dir=\"ltr\">This is an advanced technique in initialization of weights. There are two types of initialization in this i.e., Xavier Glorot normal initialization and Xavier Glorot uniform initialization.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1283\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14.jpg\" alt=\"\u0645\u0642\u062f\u0627\u0631\u062f\u0647\u06cc \u0627\u0648\u0644\u06cc\u0647 \u0648\u0632\u0646\u0647\u0627\" width=\"800\" height=\"450\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14.jpg 800w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14-300x169.jpg 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14-768x432.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14-600x338.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<h4 id=\"e3b8\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">a. Xavier Glorot uniform initialization of weights<\/em><\/strong><\/h4>\n<p id=\"1ffa\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here the weights belong to a uniform distribution with in range of +x and -x, where x=(sqrt(6\/(fan-in+fan-out)))<\/p>\n<h4 id=\"dfa4\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">code for Xavier Glorot uniform initialization of weights<\/em><\/strong><\/h4>\n<pre id=\"0032\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">model = Sequential()\nmodel.add(Dense(128, activation='relu', input_shape=(input_dim,), kernel_initializer='glorot_uniform'))\nmodel.add(Dense(64, activation='relu', kernel_initializer='glorot_uniform'))\nmodel.add(Dense(output_dim, activation='softmax'))<\/pre>\n<h4 id=\"4f20\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Output for Xavier Glorot uniform initialization of weights<\/em><\/strong><\/h4>\n<pre id=\"734a\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">Epoch 1\/5 60000\/60000 [==============================] - 4s 68us\/step - loss: 0.3317 - acc: 0.9072 - val_loss: 0.1534 - val_acc: 0.9545 \nEpoch 2\/5 60000\/60000 [==============================] - 3s 55us\/step - loss: 0.1303 - acc: 0.9614 - val_loss: 0.1124 - val_acc: 0.9679 \nEpoch 3\/5 60000\/60000 [==============================] - 3s 54us\/step - loss: 0.0889 - acc: 0.9731 - val_loss: 0.0978 - val_acc: 0.9711 \nEpoch 4\/5 60000\/60000 [==============================] - 3s 54us\/step - loss: 0.0668 - acc: 0.9795 - val_loss: 0.0863 - val_acc: 0.9735 \nEpoch 5\/5 60000\/60000 [==============================] - 3s 55us\/step - loss: 0.0529 - acc: 0.9840 - val_loss: 0.0755 - val_acc: 0.9771<\/pre>\n<h4 id=\"e207\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Plot for outputs of Xavier Glorot uniform initialization of weights<\/em><\/strong><\/h4>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-7.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter wp-image-1269 size-full\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-7.png\" alt=\"\u0645\u0646\u062d\u0646\u06cc \u062e\u0637\u0627 (\u0644\u0627\u0633) \u0634\u0628\u06a9\u0647 \u0639\u0645\u06cc\u0642\" width=\"501\" height=\"344\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-7.png 501w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-7-300x206.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-7-110x75.png 110w\" sizes=\"(max-width: 501px) 100vw, 501px\" \/><\/a><\/p>\n<h4 id=\"64be\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Analysis of output<\/em><\/strong><\/h4>\n<p id=\"bb07\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here, with this Xavier Glorot uniform initialization, our model tends to perform very well. Although&nbsp;,we can run it multiple times, our output won\u2019t change.<\/p>\n<h4 id=\"b071\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">b<\/em><\/strong>.&nbsp;<strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Xavier Glorot normal initialization<\/em><\/strong><\/h4>\n<p id=\"df41\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here the weights belongs to a normal distribution with mean=0 and variance= sqrt(2\/(fan-in+fan-out)).<\/p>\n<h4 id=\"9578\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Code for Xavier Glorot normal initialization of weights<\/em><\/strong><\/h4>\n<pre id=\"e538\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">model = Sequential()\nmodel.add(Dense(128, activation='relu', input_shape=(input_dim,), kernel_initializer='glorot_normal'))\nmodel.add(Dense(64, activation='relu', kernel_initializer='glorot_normal'))\nmodel.add(Dense(output_dim, activation='softmax'))<\/pre>\n<h4 id=\"ef66\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Output for Xavier Glorot normal initialization of weights<\/em><\/strong><\/h4>\n<pre id=\"7df6\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">Epoch 1\/5 60000\/60000 [==============================] - 4s 66us\/step - loss: 0.3296 - acc: 0.9064 - val_loss: 0.1628 - val_acc: 0.9492 \nEpoch 2\/5 60000\/60000 [==============================] - 3s 50us\/step - loss: 0.1359 - acc: 0.9597 - val_loss: 0.1119 - val_acc: 0.9658 \nEpoch 3\/5 60000\/60000 [==============================] - 3s 51us\/step - loss: 0.0945 - acc: 0.9721 - val_loss: 0.0929 - val_acc: 0.9706 \nEpoch 4\/5 60000\/60000 [==============================] - 3s 52us\/step - loss: 0.0731 - acc: 0.9776 - val_loss: 0.0804 - val_acc: 0.9741 \nEpoch 5\/5 60000\/60000 [==============================] - 3s 51us\/step - loss: 0.0576 - acc: 0.9824 - val_loss: 0.0707 - val_acc: 0.9783<\/pre>\n<h4 id=\"d4b7\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Plot for outputs of Xavier Glorot normal initialization of weights<\/em><\/strong><\/h4>\n<figure id=\"641c\" class=\"graf graf--figure graf-after--p\" dir=\"ltr\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1290\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17.png\" alt=\"\u0645\u0646\u062d\u0646\u06cc \u062e\u0637\u0627 (\u0644\u0627\u0633) \u0634\u0628\u06a9\u0647 \u0639\u0645\u06cc\u0642\" width=\"501\" height=\"344\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17.png 501w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17-300x206.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17-110x75.png 110w\" sizes=\"(max-width: 501px) 100vw, 501px\" \/><\/a><canvas class=\"progressiveMedia-canvas js-progressiveMedia-canvas\" width=\"75\" height=\"50\"><\/canvas><\/div>\n<\/div>\n<\/figure>\n<h4 id=\"6110\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Analysis of output<\/em><\/strong><\/h4>\n<p id=\"6a7a\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here, with this Xavier Glorot normal initialization, our model also tends to perform very well. Although&nbsp;,we can run it multiple times, our output won\u2019t change&nbsp;.<\/p>\n<p id=\"7bd2\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><em class=\"markup--em markup--p-em\">The weights we set here are neither too big nor two small. Hence, we won\u2019t face the problem of vanishing gradients and exploding gradients. Also, Xavier Glorot initialization helps in faster convergence to minima.<\/em><\/p>\n<h3 id=\"9a7e\" class=\"graf graf--h3 graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"He_initialization_of_weights\"><\/span><span style=\"color: #0000ff;\"> <strong class=\"markup--strong markup--h3-strong\"><em class=\"markup--em markup--h3-em\">He initialization of&nbsp;weights<\/em><\/strong><\/span><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p id=\"41df\" class=\"graf graf--p graf-after--h3\" dir=\"ltr\">It is pronounced as&nbsp;<em class=\"markup--em markup--p-em\">hey&nbsp;<\/em>initialization&nbsp;. This is also an advanced technique in initialization of weights. ReLU activation unit performs very well with this initialization&nbsp;.We consider only&nbsp;,number of inputs in He- initialization&nbsp;.In He-initialization also, we have two types i.e., He-normal initialization and He-uniform initialization<\/p>\n<h4 id=\"df9b\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">a.<strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">&nbsp;He- uniform initialization of weights<\/em><\/strong><\/h4>\n<p id=\"c4ae\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here the weights belongs to a uniform distribution within the range of +x and -x, where x=(sqrt(6\/fan-in)).<\/p>\n<h4 id=\"c4a4\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Code for He- uniform initialization of weights<\/em><\/strong><\/h4>\n<pre id=\"045d\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">model = Sequential()\nmodel.add(Dense(128, activation='relu', input_shape=(input_dim,), kernel_initializer='he_uniform'))\nmodel.add(Dense(64, activation='relu', kernel_initializer='he_uniform'))\nmodel.add(Dense(output_dim, activation='softmax'))<\/pre>\n<h4 id=\"b005\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Output for He-uniform initialization of weights<\/em><\/strong><\/h4>\n<pre id=\"103e\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">Epoch 1\/5 60000\/60000 [==============================] - 4s 72us\/step - loss: 0.3252 - acc: 0.9050 - val_loss: 0.1524 - val_acc: 0.9546 \nEpoch 2\/5 60000\/60000 [==============================] - 3s 52us\/step - loss: 0.1314 - acc: 0.9611 - val_loss: 0.1104 - val_acc: 0.9671 \nEpoch 3\/5 60000\/60000 [==============================] - 3s 54us\/step - loss: 0.0928 - acc: 0.9718 - val_loss: 0.0978 - val_acc: 0.9697 \nEpoch 4\/5 60000\/60000 [==============================] - 3s 53us\/step - loss: 0.0703 - acc: 0.9786 - val_loss: 0.0890 - val_acc: 0.9740 \nEpoch 5\/5 60000\/60000 [==============================] - 3s 53us\/step - loss: 0.0546 - acc: 0.9828 - val_loss: 0.0860 - val_acc: 0.9740<\/pre>\n<h4 id=\"c10d\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Plot for He-uniform initialization of weights<\/em><\/strong><\/h4>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-4.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1263\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-4.png\" alt=\"\u0645\u0646\u062d\u0646\u06cc \u062e\u0637\u0627 (\u0644\u0627\u0633) \u0634\u0628\u06a9\u0647 \u0639\u0645\u06cc\u0642\" width=\"501\" height=\"344\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-4.png 501w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-4-300x206.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-4-110x75.png 110w\" sizes=\"(max-width: 501px) 100vw, 501px\" \/><\/a><\/p>\n<h4 id=\"e0d2\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Analysis of output<\/em><\/strong><\/h4>\n<p id=\"c0c1\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here, in He-uniform initialization of weights we are only using the number of inputs. But, only with number of inputs, our model is performing quite descent with the He-uniform initialization of weights.<\/p>\n<h4 id=\"1b7a\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">b.<\/em><\/strong>&nbsp;<strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">He- normal initialization of weights<\/em><\/strong><\/h4>\n<p id=\"290a\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here the weights belongs to a normal distribution with mean=0 and variance= sqrt(2\/(fan-in)).<\/p>\n<h4 id=\"2c48\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Code for He- normal initialization of weights<\/em><\/strong><\/h4>\n<pre id=\"3228\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">model = Sequential()\nmodel.add(Dense(128, activation='relu', input_shape=(input_dim,), kernel_initializer='he_normal'))\nmodel.add(Dense(64, activation='relu', kernel_initializer='he_normal'))\nmodel.add(Dense(output_dim, activation='softmax'))<\/pre>\n<h4 id=\"caa7\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Output for He-normal initialization of weights<\/em><\/strong><\/h4>\n<pre id=\"4c7a\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">Epoch 1\/5 60000\/60000 [==============================] - 4s 61us\/step - loss: 0.3163 - acc: 0.9087 - val_loss: 0.1596 - val_acc: 0.9508 \nEpoch 2\/5 60000\/60000 [==============================] - 3s 45us\/step - loss: 0.1319 - acc: 0.9610 - val_loss: 0.1163 - val_acc: 0.9625 \nEpoch 3\/5 60000\/60000 [==============================] - 3s 44us\/step - loss: 0.0915 - acc: 0.9725 - val_loss: 0.0897 - val_acc: 0.9727 \nEpoch 4\/5 60000\/60000 [==============================] - 3s 45us\/step - loss: 0.0693 - acc: 0.9795 - val_loss: 0.0878 - val_acc: 0.9735 \nEpoch 5\/5 60000\/60000 [==============================] - 3s 44us\/step - loss: 0.0537 - acc: 0.9836 - val_loss: 0.0764 - val_acc: 0.9769<\/pre>\n<h4 id=\"e3ff\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Plot for outputs of He-normal initialization of weights<\/em><\/strong><\/h4>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1275\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10.png\" alt=\"\u0645\u0646\u062d\u0646\u06cc \u062e\u0637\u0627 (\u0644\u0627\u0633) \u0634\u0628\u06a9\u0647 \u0639\u0645\u06cc\u0642\" width=\"501\" height=\"344\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10.png 501w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10-300x206.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10-110x75.png 110w\" sizes=\"(max-width: 501px) 100vw, 501px\" \/><\/a><\/p>\n<figure id=\"5335\" class=\"graf graf--figure graf-after--p\" dir=\"ltr\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*h5gTUsZkzXa84hn4ZObleQ.png\" data-width=\"501\" data-height=\"344\" data-scroll=\"native\"><\/div>\n<\/div>\n<\/figure>\n<h4 id=\"5a97\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Analysis of output<\/em><\/strong><\/h4>\n<p id=\"3e1e\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Here, in He-normal initialization of weights we are only using the number of inputs. But, only with number of inputs, our model is performing well.<\/p>\n<p id=\"a141\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><em class=\"markup--em markup--p-em\">In He- initialization also,we set weights neither too big nor two small. Hence, we won\u2019t face the problem of vanishing gradients and exploding gradients. Also, this initialization helps in faster convergence to minima.<\/em><\/p>\n<h3 id=\"ebbb\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"How_to_choose_right_weight_initialization\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">How to choose right weight initialization&nbsp;?<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p id=\"6d26\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">As their is no strong theory for choosing right weight initialization, we just have some rule of thumb methods i.e.,<\/p>\n<ul class=\"postList\" dir=\"ltr\">\n<li style=\"list-style-type: none;\">\n<ul class=\"postList\" dir=\"ltr\">\n<li id=\"18f7\" class=\"graf graf--li graf-after--p\">When we have sigmoid activation function, it is better to use Xavier Glorot initialization of weights.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<ul class=\"postList\" dir=\"ltr\">\n<li id=\"5584\" class=\"graf graf--li graf-after--li\">When we have ReLU activation function, it is better to use He-initialization of weights.<\/li>\n<\/ul>\n<p id=\"7874\" class=\"graf graf--p graf-after--li\" dir=\"ltr\">Mostly, Convolutional neural network will use ReLU activation function and it use\u2019s he-initialization.<\/p>\n<p dir=\"ltr\"><strong>Read More :<\/strong><\/p>\n<ul dir=\"ltr\">\n<li><a class=\"LinkSuggestion__Link-sc-1gewdgc-4 evyocv\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/activation-functions-in-deep-learning\/\" target=\"_blank\" rel=\"noopener noreferrer\">Activation Functions in Deep Learning<\/a><\/li>\n<li><a class=\"LinkSuggestion__Link-sc-1gewdgc-4 evyocv\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/computer-vision-tutorial-implementing-mask-r-cnn-for-image-segmentation-with-python-code\/\" target=\"_blank\" rel=\"noopener noreferrer\">Computer Vision Tutorial: Implementing Mask R-CNN for Image Segmentation + Python Code<\/a><\/li>\n<li class=\"yoast-link-suggestion__wrapper LinkSuggestion__LinkSuggestionWrapper-sc-1gewdgc-0 iQWMtS\">\n<div class=\"yoast-link-suggestion__container LinkSuggestion__LinkContainer-sc-1gewdgc-3 gazbcR\"><a class=\"LinkSuggestion__Link-sc-1gewdgc-4 evyocv\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/instance-segmentation-with-mask-r-cnn-and-tensorflow\/\" target=\"_blank\" rel=\"noopener noreferrer\">Splash of Color: Instance Segmentation with Mask R-CNN and TensorFlow<\/a><\/div>\n<\/li>\n<\/ul>\n\n\n<div class=\"kk-star-ratings kksr-auto kksr-align-right kksr-valign-bottom\"\n    data-payload='{&quot;align&quot;:&quot;right&quot;,&quot;id&quot;:&quot;11&quot;,&quot;slug&quot;:&quot;default&quot;,&quot;valign&quot;:&quot;bottom&quot;,&quot;ignore&quot;:&quot;&quot;,&quot;reference&quot;:&quot;auto&quot;,&quot;class&quot;:&quot;&quot;,&quot;count&quot;:&quot;1&quot;,&quot;legendonly&quot;:&quot;&quot;,&quot;readonly&quot;:&quot;&quot;,&quot;score&quot;:&quot;5&quot;,&quot;starsonly&quot;:&quot;&quot;,&quot;best&quot;:&quot;5&quot;,&quot;gap&quot;:&quot;5&quot;,&quot;greet&quot;:&quot;\u0627\u0645\u062a\u06cc\u0627\u0632 \u062f\u0647\u06cc\u062f!&quot;,&quot;legend&quot;:&quot;5\\\/5 - (1 \u0627\u0645\u062a\u06cc\u0627\u0632)&quot;,&quot;size&quot;:&quot;24&quot;,&quot;title&quot;:&quot;Weight Initialization in Deep Learning&quot;,&quot;width&quot;:&quot;142.5&quot;,&quot;_legend&quot;:&quot;{score}\\\/{best} - ({count} \u0627\u0645\u062a\u06cc\u0627\u0632)&quot;,&quot;font_factor&quot;:&quot;1.25&quot;}'>\n            \n<div class=\"kksr-stars\">\n    \n<div class=\"kksr-stars-inactive\">\n            <div class=\"kksr-star\" data-star=\"1\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" data-star=\"2\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" data-star=\"3\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" data-star=\"4\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" data-star=\"5\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n    <\/div>\n    \n<div class=\"kksr-stars-active\" style=\"width: 142.5px;\">\n            <div class=\"kksr-star\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n    <\/div>\n<\/div>\n                \n\n<div class=\"kksr-legend\" style=\"font-size: 19.2px;\">\n            5\/5 - (1 \u0627\u0645\u062a\u06cc\u0627\u0632)    <\/div>\n    <\/div>\n","protected":false},"excerpt":{"rendered":"<p>Building a neural network is a tedious task and upon that tuning it to get better result is more challenging. The first challenging task that comes into consideration while building a neural network is initialization of weights, if the weights are initialized correctly, then optimization will be achieved in least time, Otherwise converging to minima is impossible.<\/p>\n","protected":false},"author":6,"featured_media":1256,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"_editorskit_title_hidden":false,"_editorskit_reading_time":0,"_editorskit_is_block_options_detached":false,"_editorskit_block_options_position":"{}","footnotes":""},"categories":[18,16,19],"tags":[89,86],"class_list":["post-11","post","type-post","status-publish","format-standard","has-post-thumbnail","","category-edu","category-en-articles","category-deep-learning","tag-weight-initialization","tag-86"],"_links":{"self":[{"href":"https:\/\/shahaab-co.com\/mag\/wp-json\/wp\/v2\/posts\/11","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/shahaab-co.com\/mag\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/shahaab-co.com\/mag\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/shahaab-co.com\/mag\/wp-json\/wp\/v2\/users\/6"}],"replies":[{"embeddable":true,"href":"https:\/\/shahaab-co.com\/mag\/wp-json\/wp\/v2\/comments?post=11"}],"version-history":[{"count":0,"href":"https:\/\/shahaab-co.com\/mag\/wp-json\/wp\/v2\/posts\/11\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/shahaab-co.com\/mag\/wp-json\/wp\/v2\/media\/1256"}],"wp:attachment":[{"href":"https:\/\/shahaab-co.com\/mag\/wp-json\/wp\/v2\/media?parent=11"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/shahaab-co.com\/mag\/wp-json\/wp\/v2\/categories?post=11"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/shahaab-co.com\/mag\/wp-json\/wp\/v2\/tags?post=11"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}