{"id":17,"date":"2019-05-01T12:40:32","date_gmt":"2019-05-01T08:10:32","guid":{"rendered":"http:\/\/themes.tielabs.com\/sahifa5\/?p=17"},"modified":"2020-03-26T21:03:43","modified_gmt":"2020-03-26T16:33:43","slug":"activation-functions-in-deep-learning","status":"publish","type":"post","link":"https:\/\/shahaab-co.com\/mag\/en-articles\/activation-functions-in-deep-learning\/","title":{"rendered":"Activation Functions in Deep Learning"},"content":{"rendered":"<blockquote id=\"09a3\" class=\"graf graf--blockquote graf--startsWithDoubleQuote graf-after--h3\">\n<blockquote id=\"09a3\" class=\"graf graf--blockquote graf--startsWithDoubleQuote graf-after--h3\" dir=\"ltr\"><p>\u201cThe expert in anything was once a beginner\u201d -Helen Hayes<\/p><\/blockquote>\n<p id=\"c82e\" class=\"graf graf--p graf-after--blockquote\" dir=\"ltr\">Yes, let me begin the initial step of yours in deep learning by teaching you the two basic and important concepts in deep learning i.e.,Activation functions and <a href=\"https:\/\/shahaab-co.ir\/mag\/en-articles\/weight-initialization-in-deep-learning\/\" target=\"_blank\" rel=\"noopener\">weight initialization<\/a> in deep learning.<\/p>\n<div id=\"ez-toc-container\" class=\"ez-toc-v2_0_76 counter-hierarchy ez-toc-counter-rtl ez-toc-custom ez-toc-container-direction\">\n<div class=\"ez-toc-title-container\">\n<p class=\"ez-toc-title\" style=\"cursor:inherit\">\u0622\u0646\u0686\u0647 \u062f\u0631 \u0627\u06cc\u0646 \u0645\u0637\u0644\u0628 \u062e\u0648\u0627\u0647\u06cc\u0645 \u062e\u0648\u0627\u0646\u062f :<\/p>\n<span class=\"ez-toc-title-toggle\"><a href=\"#\" class=\"ez-toc-pull-right ez-toc-btn ez-toc-btn-xs ez-toc-btn-default ez-toc-toggle\" aria-label=\"Toggle Table of Content\"><span class=\"ez-toc-js-icon-con\"><span class=\"\"><span class=\"eztoc-hide\" style=\"display:none;\">Toggle<\/span><span class=\"ez-toc-icon-toggle-span\"><svg style=\"fill: #0044bf;color:#0044bf\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" class=\"list-377408\" width=\"20px\" height=\"20px\" viewBox=\"0 0 24 24\" fill=\"none\"><path d=\"M6 6H4v2h2V6zm14 0H8v2h12V6zM4 11h2v2H4v-2zm16 0H8v2h12v-2zM4 16h2v2H4v-2zm16 0H8v2h12v-2z\" fill=\"currentColor\"><\/path><\/svg><svg style=\"fill: #0044bf;color:#0044bf\" class=\"arrow-unsorted-368013\" xmlns=\"http:\/\/www.w3.org\/2000\/svg\" width=\"10px\" height=\"10px\" viewBox=\"0 0 24 24\" version=\"1.2\" baseProfile=\"tiny\"><path d=\"M18.2 9.3l-6.2-6.3-6.2 6.3c-.2.2-.3.4-.3.7s.1.5.3.7c.2.2.4.3.7.3h11c.3 0 .5-.1.7-.3.2-.2.3-.5.3-.7s-.1-.5-.3-.7zM5.8 14.7l6.2 6.3 6.2-6.3c.2-.2.3-.5.3-.7s-.1-.5-.3-.7c-.2-.2-.4-.3-.7-.3h-11c-.3 0-.5.1-.7.3-.2.2-.3.5-.3.7s.1.5.3.7z\"\/><\/svg><\/span><\/span><\/span><\/a><\/span><\/div>\n<nav><ul class='ez-toc-list ez-toc-list-level-1 eztoc-toggle-hide-by-default' ><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-1\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/activation-functions-in-deep-learning\/#Activation_functions\" >Activation functions<\/a><\/li><\/ul><\/li><li class='ez-toc-page-1 ez-toc-heading-level-2'><a class=\"ez-toc-link ez-toc-heading-2\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/activation-functions-in-deep-learning\/#Neural_networks\" >Neural networks<\/a><ul class='ez-toc-list-level-3' ><li class='ez-toc-heading-level-3'><a class=\"ez-toc-link ez-toc-heading-3\" href=\"https:\/\/shahaab-co.com\/mag\/en-articles\/activation-functions-in-deep-learning\/#What_if_the_output_generated_is_far_away_from_the_expected_value\" 
<h3 id=\"9f37\" class=\"graf graf--h3 graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Activation_functions\"><\/span><strong class=\"markup--strong markup--h3-strong\">Activation functions<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p id=\"5d0a\" class=\"graf graf--p graf-after--h3\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Introduction<\/em><\/strong><\/p>\n<figure id=\"b982\" class=\"graf graf--figure graf-after--p\" dir=\"ltr\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/brain.gif\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1251\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/brain.gif\" alt=\"Neuron activity in the brain - neural network\" width=\"320\" height=\"264\" title=\"\"><\/a><\/div>\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*D6G-sM5iu5Yne4Pg0RgUyw.gif\" data-width=\"320\" data-height=\"264\" data-scroll=\"native\"><\/div>\n<\/div>\n<\/figure>\n<p id=\"c349\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">For almost everything there is a biological inspiration, and activation functions and neural networks are among the beautiful ideas inspired by the human brain. 
When we feed lots of information to our brain, it works hard to understand it and to separate the useful information from the not-so-useful. A neural network needs a similar mechanism: only some part of the incoming information is really useful and the rest may just be noise, and the network should learn the useful part. That is what activation functions are for. In simpler words, the activation function tries to build the wall between useful and less useful information.<\/p>\n<p id=\"c734\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Let me first introduce some terminology, in order to simplify the understanding.<\/p>\n<h2 id=\"d2dd\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Neural_networks\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Neural networks<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<figure id=\"5731\" class=\"graf graf--figure graf-after--p\" dir=\"ltr\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><\/div>\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*VsMyNFunGmgk5CgoRK10Cg.jpeg\" data-width=\"365\" data-height=\"243\" data-scroll=\"native\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/baby.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1249\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/baby.jpg\" alt=\"A thinking child\" width=\"365\" height=\"243\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/baby.jpg 365w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/baby-300x200.jpg 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/baby-310x205.jpg 310w\" sizes=\"(max-width: 365px) 100vw, 365px\" \/><\/a><\/div>\n<\/div><figcaption class=\"imageCaption\"><\/figcaption><\/figure>\n<p id=\"251f\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">Let me give a simple example and later I will connect the dots with the theory. Suppose we are teaching an 8-year-old kid to add two numbers. First, he receives the information about how to perform addition from the instructor. He then tries to learn from the information given and, finally, he performs the addition. 
Here, the kid can be thought of as a neuron: it tries to learn from the input it is given, and from the neuron we finally get an output.<\/p>\n<figure id=\"30fb\" class=\"graf graf--figure graf-after--p\" dir=\"ltr\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"progressiveMedia js-progressiveMedia graf-image is-canvasLoaded is-imageLoaded\" data-image-id=\"1*zRvOCcifUIShTTLuXOl64A.gif\" data-width=\"480\" data-height=\"270\" data-scroll=\"native\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/biologic-neuron.gif\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1250\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/biologic-neuron.gif\" alt=\"A neuron - nerve cell\" width=\"480\" height=\"270\" title=\"\"><\/a><\/div>\n<\/div><figcaption class=\"imageCaption\"><\/figcaption><\/figure>\n<p id=\"9f4b\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">From a biological perspective, this idea is similar to the human brain. The brain receives a stimulus from the outside world, processes the input and then generates an output. As the task gets more complex, multiple neurons form a complex network, passing information among themselves.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/neural-network.gif\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1297\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/neural-network.gif\" alt=\"How a neural network works - recognizing dogs and cats\" width=\"600\" height=\"360\" title=\"\"><\/a><\/p>\n<p id=\"2dae\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">The blue circles are the neurons. Each neuron has a weight, a bias and an activation function. Input is fed to the input layer. Each neuron performs a linear transformation on the input using its weights and bias, and the non-linear transformation is done by the activation function. The information moves from the input layer to the hidden layer, which does the processing and gives the output. This mechanism is\u00a0<strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">forward propagation.<\/em><\/strong><\/p>\n
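<p class=\"graf graf--p graf-after--p\" dir=\"ltr\">To make the forward pass concrete, here is a minimal numpy sketch of a single layer. It is only an illustration: the helper name <em class=\"markup--em markup--p-em\">dense_forward<\/em>, the layer sizes, the random weights and the tanh activation used here are assumptions, not the exact network from the figure above.<\/p>\n<pre class=\"graf graf--pre graf-after--p\" dir=\"ltr\">import numpy as np\r\n\r\ndef dense_forward(x, W, b, activation=np.tanh):\r\n    # linear transformation: weights times input, plus bias\r\n    z = W @ x + b\r\n    # non-linear transformation applied by the activation function\r\n    return activation(z)\r\n\r\n# toy network: 3 inputs, 4 hidden units, 1 output\r\nrng = np.random.default_rng(0)\r\nx = rng.normal(size=3)\r\nh = dense_forward(x, rng.normal(size=(4, 3)), np.zeros(4))   # hidden layer\r\ny = dense_forward(h, rng.normal(size=(1, 4)), np.zeros(1))   # output layer<\/pre>\n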
<h3 id=\"fd0e\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"What_if_the_output_generated_is_far_away_from_the_expected_value\"><\/span><strong class=\"markup--strong markup--p-strong\">What if the output generated is far away from the expected value?<\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/deep-nn.gif\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1252\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/deep-nn.gif\" alt=\"Neural network\" width=\"480\" height=\"270\" title=\"\"><\/a><\/p>\n<p id=\"8e05\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">In a neural network, we update the weights and biases of the neurons on the basis of the error. This process is known as\u00a0<strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">back propagation<\/em><\/strong>. Once the entire data has gone through this process, the final weights and biases are used for predictions.<\/p>\n<h2 id=\"a01c\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Vanishing_gradients\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Vanishing gradients<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p id=\"8f51\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Generally, adding more hidden layers to the network allows it to learn more complex functions, and thus it performs better.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-8.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1270\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-8.jpg\" alt=\"Vanishing gradient in a deep network\" width=\"285\" height=\"177\" title=\"\"><\/a><\/p>\n<p id=\"a463\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">But here comes the problem: when we do back propagation, i.e., calculate and update the weights in the backward direction, the gradients tend to get smaller and smaller as we keep moving backwards through the network. This means the weights of the neurons in the earlier layers are updated very slowly, or sometimes not at all. The earlier layers matter a lot, because they are responsible for detecting the simple patterns; if they give inappropriate results, how can we expect the later layers, and the model as a whole, to perform well? This problem is called the vanishing gradient problem.<\/p>\n<h2 id=\"b05d\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Exploding_gradients\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Exploding gradients<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p id=\"82cf\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">We know that a model with more hidden layers tends to perform better. During back propagation, if the gradients become larger and larger, the weights of the neurons in the earlier layers change drastically. Since those earlier layers are so important, such overly large weights make them give inappropriate results. This problem is called the exploding gradient problem.<\/p>\n<p id=\"bdd5\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Now, let us dive deep into the core concept of activation functions.<\/p>\n<h2 id=\"ca59\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"What_is_an_activation_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">What is an activation function?<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p id=\"7770\" class=\"graf graf--p graf--startsWithDoubleQuote graf-after--p\" dir=\"ltr\">\u201c<em class=\"markup--em markup--p-em\">An activation function is a non-linear function applied by the neuron to introduce non-linear properties in the network.\u201d<\/em><\/p>\n<p id=\"ed4c\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Let me explain in detail. 
There are two types of functions, i.e., linear and non-linear functions.<\/p>\n<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-6.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1267\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-6.png\" alt=\"Linear and non-linear functions\" width=\"709\" height=\"266\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-6.png 709w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-6-300x113.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-6-600x225.png 600w\" sizes=\"(max-width: 709px) 100vw, 709px\" \/><\/a><\/p>\n<h3 id=\"c506\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Linear_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Linear function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p id=\"2938\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">If a change in the first variable corresponds to a constant change in the second variable, we call it a linear function.<\/p>\n<h3 id=\"42a4\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Non-linear_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Non-linear function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p id=\"0cf2\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">If a change in the first variable does not necessarily correspond to a constant change in the second variable, we call it a non-linear function.<\/p>\n<h3 id=\"6116\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Why_we_use_activation_functions\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Why do we use activation functions?<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p id=\"5155\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">In the simple case of any neural network, we multiply the weights with the input, add a bias, apply an activation function, pass the output to the next layer, and then do back propagation to update the weights.<\/p>\n<p id=\"58a1\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Neural networks are function approximators. The main goal of any neural network is to learn complex non-linear functions. If we do not apply any non-linearity in our neural network, we are just trying to separate the classes with a linear hyperplane, and as we know, nothing in the real world is truly linear.<\/p>\n
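<p class=\"graf graf--p graf-after--p\" dir=\"ltr\">A quick way to see why the non-linearity matters: if every layer were purely linear, the whole stack would collapse into one linear layer. Here is a minimal numpy sketch of that collapse; the matrix sizes and random values are arbitrary assumptions made only for the demonstration.<\/p>\n<pre class=\"graf graf--pre graf-after--p\" dir=\"ltr\">import numpy as np\r\n\r\nrng = np.random.default_rng(1)\r\nW1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(2, 4))\r\nx = rng.normal(size=3)\r\n\r\n# two stacked linear layers with no activation function ...\r\ntwo_linear_layers = W2 @ (W1 @ x)\r\n# ... are exactly one linear layer whose weight matrix is W2 @ W1\r\none_linear_layer = (W2 @ W1) @ x\r\n\r\nprint(np.allclose(two_linear_layers, one_linear_layer))   # True<\/pre>\n<p class=\"graf graf--p graf-after--pre\" dir=\"ltr\">However many such layers we stack, we still end up with a single linear hyperplane; the activation function is what prevents this collapse.<\/p>\n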
<p id=\"18bb\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Suppose we perform only the simple linear operation, i.e., multiply each input by its weight, add a bias term and sum across all the inputs arriving at the neuron. In some cases this output is very large, and when it is fed to further layers the values become even larger, making things computationally uncontrollable. This is where the activation function plays a major role: it squashes the real-valued input into a fixed interval such as (-1, 1) or (0, 1).<\/p>\n<figure id=\"7928\" class=\"graf graf--figure graf-after--p\" dir=\"ltr\">\n<div class=\"aspectRatioPlaceholder is-locked\">\n<div class=\"aspectRatioPlaceholder-fill\"><\/div>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1255\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1.png\" alt=\"Neural network formula\" width=\"406\" height=\"49\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1.png 406w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1-300x36.png 300w\" sizes=\"(max-width: 406px) 100vw, 406px\" \/><\/a><\/p>\n<\/div>\n<\/figure>\n<p id=\"03d0\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Let us discuss the different activation functions and their problems.<\/em><\/strong><\/p>\n<h3 id=\"14ee\" class=\"graf graf--h4 graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Sigmoid\"><\/span>Sigmoid<span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p id=\"93ac\" class=\"graf graf--p graf-after--h4\" dir=\"ltr\">Sigmoid is a smooth, continuously differentiable, non-linear function with an S-shape. The main reason to use the sigmoid function is that its value lies between 0 and 1, so it is especially useful in models where we have to predict a probability as the output; since any probability lies between 0 and 1, sigmoid is the right choice.<\/p>\n<p id=\"c5b7\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">As we know, the sigmoid function squashes the output values between 0 and 1: a large negative number passed through the sigmoid becomes approximately 0 and a large positive number becomes approximately 1.<\/p>\n<h3 id=\"7569\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Graph_of_sigmoid_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Graph of sigmoid function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1265\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5.png\" alt=\"Sigmoid\" width=\"485\" height=\"323\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5.png 485w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5-300x200.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5-310x205.png 310w\" sizes=\"(max-width: 485px) 100vw, 485px\" \/><\/a><\/p>\n<p id=\"aee0\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">
The sigmoid curve is steep between roughly -3 and 3, but it gets flatter in the other regions.<\/p>\n<h3 id=\"2ee6\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Graph_of_sigmoid_derivative\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Graph of sigmoid derivative<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-2.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter wp-image-1257\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-2.jpg\" alt=\"Derivative of sigmoid\" width=\"439\" height=\"244\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-2.jpg 600w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-2-300x167.jpg 300w\" sizes=\"(max-width: 439px) 100vw, 439px\" \/><\/a><\/p>\n<p id=\"5306\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">The sigmoid function is easily differentiable, and its derivative can be written directly in terms of the sigmoid value itself. This means that during back propagation we can easily use the sigmoid function to update the weights.<\/p>\n<h3 id=\"c7a1\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Gradient_values_of_sigmoid\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Gradient values of sigmoid<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p id=\"572b\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Gradient values of sigmoid range between 0 and 0.25.<\/p>\n<h3 id=\"9253\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Equation_of_sigmoid_function_and_its_derivatives\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Equation of sigmoid function and its derivatives<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-15.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1285\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-15.jpg\" alt=\"Sigmoid equation and its derivative\" width=\"800\" height=\"333\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-15.jpg 800w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-15-300x125.jpg 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-15-768x320.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-15-600x250.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<h3 id=\"b8cf\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Code_for_sigmoid_function_in_python\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Code for sigmoid function in python<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<pre id=\"d01a\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">import numpy as np\r\n\r\ndef sigmoid(z):\r\n    # squashes any real number into the interval (0, 1)\r\n    return 1 \/ (1 + np.exp(-z))<\/pre>\n<p id=\"af93\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\">When we write code for the sigmoid, we can use it both for forward propagation and for computing derivatives, because the derivative of the sigmoid can be expressed through the sigmoid value itself.<\/p>\n
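<p class=\"graf graf--p graf-after--p\" dir=\"ltr\">As a small sketch of that point, the derivative can reuse the sigmoid defined above. The helper name <em class=\"markup--em markup--p-em\">sigmoid_derivative<\/em>, the 10-layer chain and the factor 1.5 below are illustrative assumptions only, chosen to show how gradients vanish or explode when such factors are multiplied layer after layer.<\/p>\n<pre class=\"graf graf--pre graf-after--p\" dir=\"ltr\">def sigmoid_derivative(z):\r\n    # derivative of sigmoid expressed through sigmoid itself; its maximum is 0.25 at z = 0\r\n    s = sigmoid(z)\r\n    return s * (1 - s)\r\n\r\nprint(sigmoid_derivative(0))   # 0.25, the largest gradient sigmoid can produce\r\n\r\n# back propagation through 10 sigmoid layers multiplies such factors,\r\n# so the gradient reaching the earliest layers shrinks towards zero (vanishing gradients)\r\nprint(0.25 ** 10)              # roughly 9.5e-07\r\n\r\n# with large weights the per-layer factor can exceed 1, and the product blows up instead\r\nprint(1.5 ** 10)               # roughly 57.7 (exploding gradients)<\/pre>\n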
<h3 id=\"14cc\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Problems_with_sigmoid_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Problems with sigmoid function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<ul class=\"postList\" dir=\"ltr\">\n<li id=\"b47c\" class=\"graf graf--li graf-after--p\">Values obtained from the sigmoid function are not zero centered.<\/li>\n<li id=\"6c88\" class=\"graf graf--li graf-after--li\">We can easily face the issues of vanishing gradients and exploding gradients.<\/li>\n<\/ul>\n<p id=\"cda7\" class=\"graf graf--p graf-after--li\" dir=\"ltr\"><em class=\"markup--em markup--p-em\">Let me explain how the sigmoid function faces the problems of vanishing gradients and exploding gradients.<\/em><\/p>\n<h3 id=\"c753\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Vanishing_gradients_problem_for_sigmoid_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Vanishing gradients problem for sigmoid function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-18.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1291\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-18.jpg\" alt=\"Vanishing gradient problem of sigmoid in a deep network\" width=\"800\" height=\"1146\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-18.jpg 733w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-18-209x300.jpg 209w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-18-768x1100.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-18-715x1024.jpg 715w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-18-600x860.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1272\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9.jpg\" alt=\"Vanishing gradient problem of sigmoid in a deep network\" width=\"800\" height=\"1046\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9.jpg 800w, 
https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9-229x300.jpg 229w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9-768x1004.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9-783x1024.jpg 783w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-9-600x785.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1254\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1.jpg\" alt=\"\u0645\u0634\u06a9\u0644 \u06af\u0631\u0627\u062f\u06cc\u0627\u0646 \u0645\u062d\u0648\u0634\u0648\u0646\u062f\u0647 \u0633\u06cc\u06af\u0645\u0648\u06cc\u062f \u062f\u0631 \u0634\u0628\u06a9\u0647 \u0639\u0645\u06cc\u0642\" width=\"800\" height=\"733\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1.jpg 800w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1-300x275.jpg 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1-768x704.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1-600x550.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<h3 id=\"9c02\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Exploding_gradients_problem_for_sigmoid\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Exploding gradients problem for sigmoid<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1274\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10.jpg\" alt=\"\u0645\u0634\u06a9\u0644 \u06af\u0631\u0627\u062f\u06cc\u0627\u0646 \u0627\u0646\u0641\u062c\u0627\u0631\u06cc \u0633\u06cc\u06af\u0645\u0648\u06cc\u062f \u062f\u0631 \u0634\u0628\u06a9\u0647 \u0639\u0645\u06cc\u0642\" width=\"800\" height=\"1120\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10.jpg 750w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10-214x300.jpg 214w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10-768x1075.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10-731x1024.jpg 731w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-10-600x840.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-20.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1295\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-20.jpg\" alt=\"\u0645\u0634\u06a9\u0644 \u06af\u0631\u0627\u062f\u06cc\u0627\u0646 \u0627\u0646\u0641\u062c\u0627\u0631\u06cc \u0633\u06cc\u06af\u0645\u0648\u06cc\u062f 
\u062f\u0631 \u0634\u0628\u06a9\u0647 \u0639\u0645\u06cc\u0642\" width=\"800\" height=\"445\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-20.jpg 800w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-20-300x167.jpg 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-20-768x427.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-20-600x334.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<h2 id=\"333a\" class=\"graf graf--h4 graf-after--figure\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Tanh\"><\/span>Tanh<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p id=\"fbf9\" class=\"graf graf--p graf-after--h4\" dir=\"ltr\">The tanh function works much like the sigmoid function, but it is symmetric about the origin. It is continuous and differentiable at all points. It takes a real-valued number and squashes it to the range between -1 and 1, and like the sigmoid neuron it saturates at large positive and negative values. The output of tanh is always zero centered, which is why tanh is preferred over sigmoid in hidden layers.<\/p>\n<h3 id=\"6ab4\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Graph_of_Tanh_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Graph of Tanh function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1.gif\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1253\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-1.gif\" alt=\"tanh (hyperbolic tangent)\" width=\"360\" height=\"233\" title=\"\"><\/a><\/p>\n<p id=\"e83d\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">The tanh function takes a real-valued input and outputs a value between -1 and 1.<\/p>\n<h3 id=\"ede0\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Graph_of_derivative_of_tanh_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Graph of derivative of tanh function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-3.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"alignleft wp-image-1260\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-3.jpg\" alt=\"\" width=\"346\" height=\"192\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-3.jpg 600w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-3-300x167.jpg 300w\" sizes=\"(max-width: 346px) 100vw, 346px\" \/><\/a><\/p>\n<p id=\"b88e\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">The derivative of the tanh function is steeper than that of the sigmoid function.<\/p>\n<p id=\"1616\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Away from the origin, the graph of the tanh function is flat, so the gradients there are very low.<\/p>\n<h3 id=\"c5a9\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Equation_of_tanh_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Equation of tanh function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"alignleft wp-image-1294\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19.png\" alt=\"tanh (hyperbolic tangent)\" width=\"371\" height=\"159\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19.png 418w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19-300x128.png 300w\" sizes=\"(max-width: 371px) 100vw, 371px\" \/><\/a><\/p>\n<p id=\"fb46\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Code for tanh function in python<\/em><\/strong><\/p>\n<pre id=\"26c7\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">def tanh(z):\r\n    # numpy already provides tanh; the output lies in (-1, 1)\r\n    return np.tanh(z)<\/pre>\n<p id=\"8ea5\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Gradient values of tanh<\/em><\/strong><\/p>\n<p id=\"be97\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Gradient values of tanh range between 0 and 1.<\/p>\n
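<p class=\"graf graf--p graf-after--p\" dir=\"ltr\">As with the sigmoid, the gradient of tanh can be coded straight from its output. This is only a small sketch (the helper name <em class=\"markup--em markup--p-em\">tanh_derivative<\/em> is an assumption, and numpy is assumed to be imported as np, as above); it shows that the gradient peaks at 1 for z = 0 and decays towards 0 in the saturated regions.<\/p>\n<pre class=\"graf graf--pre graf-after--p\" dir=\"ltr\">def tanh_derivative(z):\r\n    # derivative of tanh: 1 - tanh(z)**2, which lies between 0 and 1\r\n    return 1 - np.tanh(z) ** 2\r\n\r\nprint(tanh_derivative(0))   # 1.0, the largest gradient tanh can produce\r\nprint(tanh_derivative(3))   # about 0.0099, already close to zero (saturation)<\/pre>\n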
<h3 id=\"001d\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Problems_with_tanh_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Problems with tanh function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<ul class=\"postList\" dir=\"ltr\">\n<li id=\"5e0f\" class=\"graf graf--li graf-after--p\">We can easily face the issue of vanishing gradients and exploding gradients with the tanh function as well.<\/li>\n<\/ul>\n<p id=\"1719\" class=\"graf graf--p graf-after--li\" dir=\"ltr\"><em class=\"markup--em markup--p-em\">Let me explain how the tanh function faces the problems of vanishing gradients and exploding gradients.<\/em><\/p>\n<h3 id=\"81cb\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Vanishing_gradients_problem_for_tanh_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Vanishing gradients problem for tanh function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1279\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12.jpg\" alt=\"Vanishing gradient problem with tanh\" width=\"800\" height=\"1057\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12.jpg 795w, 
https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12-227x300.jpg 227w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12-768x1015.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12-775x1024.jpg 775w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-12-600x793.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1293\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19.jpg\" alt=\"\u0645\u0634\u06a9\u0644 \u06af\u0631\u0627\u062f\u06cc\u0627\u0646 \u0645\u062d\u0648 \u0634\u0648\u0646\u062f \u0628\u0627 \u062a\u0627\u0646\u0698\u0627\u0646\u062a \u0647\u06cc\u067e\u0631\u0628\u0648\u0644\u06cc\u06a9\" width=\"800\" height=\"1121\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19.jpg 749w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19-214x300.jpg 214w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19-768x1076.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19-731x1024.jpg 731w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-19-600x841.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1289\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17.jpg\" alt=\"\u0645\u0634\u06a9\u0644 \u06af\u0631\u0627\u062f\u06cc\u0627\u0646 \u0645\u062d\u0648 \u0634\u0648\u0646\u062f \u0628\u0627 \u062a\u0627\u0646\u0698\u0627\u0646\u062a \u0647\u06cc\u067e\u0631\u0628\u0648\u0644\u06cc\u06a9\" width=\"800\" height=\"578\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17.jpg 800w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17-300x217.jpg 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17-768x555.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-17-600x434.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<h3 id=\"d403\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Exploding_gradients_problem_for_tanh_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Exploding gradients problem for tanh function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-11.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1277\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-11.jpg\" alt=\"\u0645\u0634\u06a9\u0644 \u06af\u0631\u0627\u062f\u06cc\u0627\u0646 \u0627\u0646\u0641\u062c\u0627\u0631\u06cc \u0628\u0627 
\u062a\u0627\u0646\u0698\u0627\u0646\u062a \u0647\u06cc\u067e\u0631\u0628\u0648\u0644\u06cc\u06a9\" width=\"800\" height=\"1036\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-11.jpg 800w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-11-232x300.jpg 232w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-11-768x995.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-11-791x1024.jpg 791w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-11-600x777.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1264\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5.jpg\" alt=\"Exploding gradient problem with tanh\" width=\"800\" height=\"409\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5.jpg 800w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5-300x153.jpg 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5-768x393.jpg 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-5-600x307.jpg 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<h2 id=\"ce6b\" class=\"graf graf--h4 graf-after--figure\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"ReLU\"><\/span>ReLU<span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p id=\"2e70\" class=\"graf graf--p graf-after--h4\" dir=\"ltr\">ReLU stands for Rectified Linear Unit. It is the most widely used activation function in deep learning. R(x)=max(0,x), i.e., if x&lt;0 then R(x)=0, and if x\u22650 then R(x)=x. It also accelerates the convergence of stochastic gradient descent compared to the sigmoid or tanh activation functions. The main advantage of the ReLU function is that it does not activate all the neurons at the same time: if the input is negative, it is converted to zero and the neuron does not get activated. This means only a few neurons are activated at a time, which makes the network computationally cheap. It also avoids and rectifies the vanishing gradient problem. Almost all deep learning models use the ReLU activation function nowadays.<\/p>\n<p id=\"884d\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">How can we say that ReLU is a non-linear function?<\/em><\/strong><\/p>\n<p id=\"f20a\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Linear functions are straight-line functions, but ReLU is not a straight line because it has a bend at zero. Hence, we can say that ReLU is a non-linear function. Please have a look at the graph of the ReLU function below.<\/p>\n
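<p class=\"graf graf--p graf-after--p\" dir=\"ltr\">Before the graph, here is a tiny numeric sketch of this sparsity effect. The input values are arbitrary assumptions, and np.maximum is used only so the snippet is self-contained; the relu implementation used in this article appears a little further below.<\/p>\n<pre class=\"graf graf--pre graf-after--p\" dir=\"ltr\">import numpy as np\r\n\r\nz = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])   # pre-activations of five neurons\r\nactivations = np.maximum(0, z)               # ReLU: negative inputs become exactly 0\r\n\r\nprint(activations)   # 0, 0, 0, 1.5, 3.0: only two of the five neurons are active<\/pre>\n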
<h3 id=\"f840\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Graph_for_ReLU_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Graph for ReLU function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-3.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1261\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-3.png\" alt=\"The ReLU function\" width=\"600\" height=\"324\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-3.png 600w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-3-300x162.png 300w\" sizes=\"(max-width: 600px) 100vw, 600px\" \/><\/a><\/p>\n<p id=\"d781\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">If the value of x is greater than or equal to zero, then we take ReLU(x)=x.<\/p>\n<p id=\"0bc1\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">If the value of x is less than zero, then we take ReLU(x)=0.<\/p>\n<h3 id=\"b2ea\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Graph_for_derivative_of_ReLU_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Graph for derivative of ReLU function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-6.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1266\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-6.jpg\" alt=\"Derivative of ReLU\" width=\"300\" height=\"211\" title=\"\"><\/a><\/p>\n<p id=\"1bb0\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">If the value of x is greater than zero, then the derivative of ReLU(x), i.e., ReLU\u2019(x)=1.<\/p>\n<p id=\"f7e4\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">If the value of x is less than zero, then the derivative of ReLU(x), i.e., ReLU\u2019(x)=0. (At x=0 the derivative is not defined; in practice it is simply set to 0 or 1.)<\/p>\n<h3 id=\"4303\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Problem_with_ReLU_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Problem with ReLU function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<h4 id=\"9448\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Dead neurons<\/em><\/strong><\/h4>\n<p id=\"a19d\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">If a unit is never activated, then during back propagation only zero gradients flow through it. Hence, a neuron that has died this way no longer responds to variations in the error, and its weights never get updated during back propagation. This problem is called the dead neuron problem.<\/p>\n
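<p class=\"graf graf--p graf-after--p\" dir=\"ltr\">The next image shows the equation of the derivative of ReLU. As a small illustrative sketch (the helper name <em class=\"markup--em markup--p-em\">relu_derivative<\/em> is an assumption), it also makes the dead-neuron issue visible: a neuron whose pre-activations stay negative receives a gradient of exactly zero, so gradient descent never moves its weights.<\/p>\n<pre class=\"graf graf--pre graf-after--p\" dir=\"ltr\">import numpy as np\r\n\r\ndef relu_derivative(z):\r\n    # 1 for positive inputs, 0 otherwise (the value at exactly 0 is a convention)\r\n    return (z &gt; 0).astype(float)\r\n\r\nz_dead = np.array([-3.0, -1.2, -0.4])   # a neuron that only ever sees negative inputs\r\nprint(relu_derivative(z_dead))           # [0. 0. 0.], so its weights never update<\/pre>\n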
<h4 id=\"0f7b\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Equation of derivative of ReLU function<\/em><\/strong><\/h4>
<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-18.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter wp-image-1292\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-18.png\" alt=\"equation of the derivative of ReLU\" width=\"213\" height=\"74\" \/><\/a><\/p>
<h4 id=\"6418\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Code for ReLU activation in Python<\/em><\/strong><\/h4>
<pre id=\"6162\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">def relu(z):
    # elementwise max(0, z): the boolean (z &gt; 0) acts as 0 or 1,
    # so negative inputs are mapped to zero
    return z * (z &gt; 0)<\/pre>
<h2 id=\"629a\" class=\"graf graf--h4 graf-after--pre\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Leaky_ReLU\"><\/span>Leaky ReLU<span class=\"ez-toc-section-end\"><\/span><\/h2>
<p id=\"f500\" class=\"graf graf--p graf-after--h4\" dir=\"ltr\">Leaky ReLU is an improved version of the ReLU function. In ReLU the gradient is 0 for x &lt; 0, which is what causes dead neurons. In Leaky ReLU, instead of defining the function to be 0 for x &lt; 0, we define it as a small linear multiple of x, typically 0.01x. In other words, the flat horizontal segment on the negative x-axis is replaced by a line with a small non-zero slope, so the gradient is no longer zero there and the dead neurons issue goes away.<\/p>
<h3 id=\"04a1\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Graph_of_Leaky_ReLU_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Graph of Leaky ReLU function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>
<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-11.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1278\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-11.png\" alt=\"Leaky ReLU\" width=\"600\" height=\"274\" \/><\/a><\/p>
<p id=\"f9ba\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">If x is greater than zero, then Leaky ReLU(x) = x.<\/p>
<p id=\"98af\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">If x is less than zero, then Leaky ReLU(x) = 0.01*x.<\/p>
<h3 id=\"3520\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Graph_of_derivative_of_Leaky_ReLU_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Graph of derivative of Leaky ReLU function<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>
<p><img decoding=\"async\" class=\"aligncenter\" src=\"https:\/\/cdn-images-1.medium.com\/max\/600\/1*9oFDqYyYAkslAtqbQ5ZTrw.jpeg\" alt=\"derivative of Leaky ReLU\" width=\"300\" height=\"211\" \/><\/p>
<p id=\"5e98\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">If x &gt; 0, then the derivative Leaky ReLU'(x) = 1.<\/p>
<p id=\"b81e\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">If x &lt; 0, then the derivative Leaky ReLU'(x) = 0.01.<\/p>
<h3 id=\"2675\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Equation_of_Leaky_ReLU_and_derivative_of_Leaky_ReLU\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Equation of Leaky ReLU and derivative of Leaky ReLU<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>
<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-7.jpg\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1268\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-7.jpg\" alt=\"equation of Leaky ReLU and its derivative\" width=\"543\" height=\"71\" \/><\/a><\/p>
<p id=\"a45a\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">Here alpha is the slope of the small linear component used for negative x; typically we take alpha = 0.01. A version with alpha exposed as a parameter is sketched after the code below.<\/p>
<h3 id=\"94d7\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Code_of_Leaky_ReLU_in_python\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Code of Leaky ReLU in Python<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>
<pre id=\"5511\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">import numpy as np

def leaky_relu(z):
    return np.maximum(0.01 * z, z)  # picks 0.01*z when z &lt; 0, z otherwise<\/pre>
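<p class=\"graf graf--p graf-after--pre\" dir=\"ltr\">For completeness, here is a small sketch (my own illustration; the names leaky_relu_alpha and leaky_relu_grad are not from the original post) that exposes alpha as a parameter and also implements the derivative from the equation above:<\/p>
<pre class=\"graf graf--pre graf-after--p\" dir=\"ltr\">import numpy as np

def leaky_relu_alpha(z, alpha=0.01):
    # f(z) = z for z &gt;= 0 and alpha * z for z &lt; 0
    return np.where(z &gt;= 0, z, alpha * z)

def leaky_relu_grad(z, alpha=0.01):
    # f'(z) = 1 for z &gt;= 0 and alpha for z &lt; 0
    return np.where(z &gt;= 0, 1.0, alpha)

z = np.array([-2.0, -0.5, 0.0, 1.5])
print(leaky_relu_alpha(z))   # values: -0.02, -0.005, 0.0, 1.5
print(leaky_relu_grad(z))    # values: 0.01, 0.01, 1.0, 1.0
<\/pre>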
title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-7.jpg 543w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-7-300x39.jpg 300w\" sizes=\"(max-width: 543px) 100vw, 543px\" \/><\/a><\/p>\n<p id=\"a45a\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">Here alpha is the small linear component of x\u00a0. Typically we take alpha value as 0.01\u00a0.<\/p>\n<h3 id=\"94d7\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Code_of_Leaky_ReLU_in_python\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Code of Leaky ReLU in python<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h3>\n<pre id=\"5511\" class=\"graf graf--pre graf-after--p\" dir=\"ltr\">def leaky_relu(z):\r\n return np.maximum(0.01 * z, z)<\/pre>\n<p id=\"a8de\" class=\"graf graf--p graf-after--pre\" dir=\"ltr\">Let me keep all the graphs at one place. So, that you can easily understand the difference between them.<\/p>\n<h2 id=\"3a5c\" class=\"graf graf--p graf-after--p\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Graphs_of_activation_functions\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Graphs of activation functions<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1284\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14.png\" alt=\"\u062a\u0648\u0627\u0628\u0639 \u0641\u0639\u0627\u0644\u0633\u0627\u0632\u06cc\" width=\"800\" height=\"402\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14.png 800w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14-300x151.png 300w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14-768x386.png 768w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-14-600x302.png 600w\" sizes=\"(max-width: 800px) 100vw, 800px\" \/><\/a><\/p>\n<figure id=\"34ac\" class=\"graf graf--figure graf-after--p\" dir=\"ltr\"><figcaption class=\"imageCaption\"><\/figcaption><\/figure>\n<h2 id=\"1cc4\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"Graphs_of_derivative_of_activation_functions\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">Graphs of derivative of activation functions<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>\n<p><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-15.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1286\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-15.png\" alt=\"\u0645\u0634\u062a\u0642 \u062a\u0648\u0627\u0628\u0639 \u0641\u0639\u0627\u0644\u0633\u0627\u0632\u06cc\" width=\"800\" height=\"600\" title=\"\" srcset=\"https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-15.png 800w, https:\/\/shahaab-co.com\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-15-300x225.png 300w, 
<p id=\"f1d9\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\">More advanced activation functions such as Maxout and ELU are not covered here.<\/p>
<p id=\"06b2\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Let me also keep all the activation function equations and their derivatives in one place, so that you can review them at a glance.<\/p>
<p dir=\"ltr\"><a href=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-13.png\" target=\"_blank\" rel=\"noopener\"><img decoding=\"async\" class=\"aligncenter size-full wp-image-1282\" src=\"https:\/\/shahaab-co.ir\/mag\/wp-content\/uploads\/2019\/05\/dnn-activation-function-13.png\" alt=\"activation functions and their derivatives\" width=\"612\" height=\"296\" \/><\/a><\/p>
<h2 id=\"6c3d\" class=\"graf graf--p graf-after--figure\" dir=\"ltr\"><span class=\"ez-toc-section\" id=\"How_to_choose_the_right_activation_function\"><\/span><strong class=\"markup--strong markup--p-strong\"><em class=\"markup--em markup--p-em\">How to choose the right activation function?<\/em><\/strong><span class=\"ez-toc-section-end\"><\/span><\/h2>
<p id=\"374f\" class=\"graf graf--p graf-after--p\" dir=\"ltr\">Depending on the properties of the given problem, we can often choose an activation function that makes the network converge faster:<\/p>
<ul class=\"postList\" dir=\"ltr\">
<li id=\"faae\" class=\"graf graf--li graf-after--p\">Sigmoid functions work better in the case of classifiers.<\/li>
<li id=\"86b5\" class=\"graf graf--li graf-after--li\">ReLU is a general-purpose activation function and can be used in most cases.<\/li>
<li id=\"8326\" class=\"graf graf--li graf-after--li\">If we encounter dead neurons in our network, Leaky ReLU is the best choice.<\/li>
<\/ul>
<p id=\"afbc\" class=\"graf graf--p graf-after--li\" dir=\"ltr\">As a rule of thumb, begin with the ReLU activation function and move to other activation functions only if ReLU does not perform well in your network. The small sketch below shows how easy it is to swap the activation in code.<\/p>
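<p class=\"graf graf--p graf-after--p\" dir=\"ltr\">To make that advice concrete, here is a minimal NumPy sketch (my own illustration; dense_forward is a hypothetical helper, not part of any library) of a single fully connected layer where the activation function is just a parameter you can swap:<\/p>
<pre class=\"graf graf--pre graf-after--p\" dir=\"ltr\">import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def relu(z):
    return np.maximum(0, z)

def leaky_relu(z):
    return np.maximum(0.01 * z, z)

def dense_forward(X, W, b, activation=relu):
    # one fully connected layer: activation(X W + b)
    return activation(X @ W + b)

X = np.array([[0.5, -1.2], [2.0, 0.3]])
W = np.array([[0.4, -0.7], [0.1, 0.9]])
b = np.array([0.0, 0.1])

# Start with ReLU, then try the alternatives if training stalls.
for act in (relu, leaky_relu, np.tanh, sigmoid):
    print(act.__name__, dense_forward(X, W, b, activation=act))
<\/pre>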
class=\"kk-star-ratings kksr-auto kksr-align-right kksr-valign-bottom\"\n    data-payload='{&quot;align&quot;:&quot;right&quot;,&quot;id&quot;:&quot;17&quot;,&quot;slug&quot;:&quot;default&quot;,&quot;valign&quot;:&quot;bottom&quot;,&quot;ignore&quot;:&quot;&quot;,&quot;reference&quot;:&quot;auto&quot;,&quot;class&quot;:&quot;&quot;,&quot;count&quot;:&quot;0&quot;,&quot;legendonly&quot;:&quot;&quot;,&quot;readonly&quot;:&quot;&quot;,&quot;score&quot;:&quot;0&quot;,&quot;starsonly&quot;:&quot;&quot;,&quot;best&quot;:&quot;5&quot;,&quot;gap&quot;:&quot;5&quot;,&quot;greet&quot;:&quot;\u0627\u0645\u062a\u06cc\u0627\u0632 \u062f\u0647\u06cc\u062f!&quot;,&quot;legend&quot;:&quot;0\\\/5 - (0 \u0627\u0645\u062a\u06cc\u0627\u0632)&quot;,&quot;size&quot;:&quot;24&quot;,&quot;title&quot;:&quot;Activation Functions in Deep Learning&quot;,&quot;width&quot;:&quot;0&quot;,&quot;_legend&quot;:&quot;{score}\\\/{best} - ({count} \u0627\u0645\u062a\u06cc\u0627\u0632)&quot;,&quot;font_factor&quot;:&quot;1.25&quot;}'>\n            \n<div class=\"kksr-stars\">\n    \n<div class=\"kksr-stars-inactive\">\n            <div class=\"kksr-star\" data-star=\"1\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" data-star=\"2\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" data-star=\"3\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" data-star=\"4\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" data-star=\"5\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n    <\/div>\n    \n<div class=\"kksr-stars-active\" style=\"width: 0px;\">\n            <div class=\"kksr-star\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n            <div class=\"kksr-star\" style=\"padding-left: 5px\">\n            \n\n<div class=\"kksr-icon\" style=\"width: 24px; height: 24px;\"><\/div>\n        <\/div>\n    <\/div>\n<\/div>\n                \n\n<div class=\"kksr-legend\" style=\"font-size: 19.2px;\">\n            <span class=\"kksr-muted\">\u0627\u0645\u062a\u06cc\u0627\u0632 \u062f\u0647\u06cc\u062f!<\/span>\n    <\/div>\n    <\/div>\n","protected":false},"excerpt":{"rendered":"<p>\u0627\u0646\u0648\u0627\u0639 \u062a\u0648\u0627\u0628\u0639 \u0641\u0639\u0627\u0644\u0633\u0627\u0632\u06cc \u0628\u0631\u0627\u06cc \u0634\u0628\u06a9\u0647 \u0647\u0627\u06cc \u0639\u0635\u0628\u06cc \u0639\u0645\u06cc\u0642 \u0645\u062b\u0644 ReLU, PReLu, Sigmoid, Tanh,&#8230; \u0648 