# HG changeset patch
# User Jeff Hammel
# Date 1506289376 25200
# Node ID 673a295fd09c32114debe7f179d3b34c3cc717a4
# Parent 0b3daccfc36c3500e221736d738b77ca6c4ffcb6
[documentation] cache coursera notes

diff -r 0b3daccfc36c -r 673a295fd09c docs/summary_of_gradient_descent.txt
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/docs/summary_of_gradient_descent.txt	Sun Sep 24 14:42:56 2017 -0700
@@ -0,0 +1,27 @@
+# Summary of Gradient Descent
+
+For a two-layer network, `[]` denotes the layer number,
+`'` denotes the derivative (prime), and `T` denotes transpose.
+
+## Scalar implementation
+
+```
+dz[2] = a[2] - y
+dW[2] = dz[2]a[1]T
+db[2] = dz[2]
+dz[1] = W[2]Tdz[2] * g[1]'(z[1])
+dW[1] = dz[1]xT
+db[1] = dz[1]
+```
+
+
+## Vectorized Implementation
+
+```
+dZ[2] = A[2] - Y
+dW[2] = (1/m)dZ[2]A[1]T
+db[2] = (1/m)*np.sum(dZ[2], axis=1, keepdims=True)
+dZ[1] = W[2]TdZ[2] * g[1]'(Z[1])
+dW[1] = (1/m)dZ[1]XT
+db[1] = (1/m)*np.sum(dZ[1], axis=1, keepdims=True)
+```
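
The vectorized equations in the notes above can be sketched in NumPy. This is a minimal sketch under assumptions the notes do not state: a tanh hidden activation (so `g[1]'(Z1) = 1 - A1**2`), a sigmoid output with cross-entropy loss (which is what makes `dZ[2] = A[2] - Y` hold), and examples stored as columns of `X`. The `forward`/`backward` function names are hypothetical.

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Forward pass: tanh hidden layer, sigmoid output (assumed activations)."""
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = 1 / (1 + np.exp(-Z2))      # sigmoid
    return Z1, A1, A2

def backward(X, Y, W2, cache):
    """Vectorized backward pass for the two-layer network.

    Mirrors the equations in the notes: dZ[2] = A[2] - Y assumes a
    sigmoid output trained with cross-entropy, and 1 - A1**2 is
    g[1]'(Z[1]) for a tanh hidden layer.
    """
    m = X.shape[1]                  # number of examples (columns of X)
    Z1, A1, A2 = cache

    dZ2 = A2 - Y
    dW2 = (1 / m) * dZ2 @ A1.T
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = (1 / m) * dZ1 @ X.T
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2
```

Note that `keepdims=True` keeps the bias gradients as column vectors, so they broadcast correctly against `b1` and `b2` in a parameter update.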