# HG changeset patch
# User Jeff Hammel
# Date 1506289376 25200
# Node ID 673a295fd09c32114debe7f179d3b34c3cc717a4
# Parent 0b3daccfc36c3500e221736d738b77ca6c4ffcb6
[documentation] cache coursera notes

diff -r 0b3daccfc36c -r 673a295fd09c docs/summary_of_gradient_descent.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/summary_of_gradient_descent.txt	Sun Sep 24 14:42:56 2017 -0700
@@ -0,0 +1,27 @@
+# Summary of Gradient Descent
+
+For a two-layer network, `[]` denotes the layer number,
+`'` denotes the derivative (prime), and `T` denotes transpose.
+
+## Scalar implementation
+
+```
+dz[2] = a[2] - y
+dW[2] = dz[2]a[1]T
+db[2] = dz[2]
+dz[1] = W[2]Tdz[2] * g[1]'(z[1])
+dW[1] = dz[1]xT
+db[1] = dz[1]
+```
+
+
+## Vectorized Implementation
+
+```
+dZ[2] = A[2] - Y
+dW[2] = (1/m)dZ[2]A[1]T
+db[2] = (1/m)*np.sum(dZ[2], axis=1, keepdims=True)
+dZ[1] = W[2]TdZ[2] * g[1]'(Z[1])
+dW[1] = (1/m)dZ[1]XT
+db[1] = (1/m)*np.sum(dZ[1], axis=1, keepdims=True)
+```
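
The vectorized equations in the notes above can be sketched in NumPy. This is a minimal sketch under assumptions the notes do not state: a tanh hidden activation (so `g[1]'(Z1) = 1 - A1**2`), a sigmoid output with cross-entropy loss (which is what makes `dZ[2] = A[2] - Y` hold), and examples stored as columns of `X`. The `forward`/`backward` function names are hypothetical.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Forward pass: tanh hidden layer, sigmoid output (assumed activations)."""
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = 1 / (1 + np.exp(-Z2))      # sigmoid
    return Z1, A1, A2

def backward(X, Y, W2, cache):
    """Vectorized backward pass for the two-layer network.

    Mirrors the equations in the notes: dZ[2] = A[2] - Y assumes a
    sigmoid output trained with cross-entropy, and 1 - A1**2 is
    g[1]'(Z[1]) for a tanh hidden layer.
    """
    m = X.shape[1]                  # number of examples (columns of X)
    Z1, A1, A2 = cache

    dZ2 = A2 - Y
    dW2 = (1 / m) * dZ2 @ A1.T
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = (1 / m) * dZ1 @ X.T
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2
```

Note that `keepdims=True` keeps the bias gradients as column vectors, so they broadcast correctly against `b1` and `b2` in a parameter update.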