comparison docs/summary_of_gradient_descent.txt @ 53:673a295fd09c

[documentation] cache coursera notes
author Jeff Hammel <k0scist@gmail.com>
date Sun, 24 Sep 2017 14:42:56 -0700
# Summary of Gradient Descent

Backpropagation updates for a two-layer network. The `[]`s denote the
layer number, `'` denotes the derivative (prime), and `T` denotes the
transpose. Lowercase quantities refer to a single training example; the
capitalized quantities in the vectorized version stack all `m` training
examples column-wise.

## Scalar Implementation

```
dz[2] = a[2] - y
dW[2] = dz[2]a[1]T
db[2] = dz[2]
dz[1] = W[2]Tdz[2] * g[1]'(z[1])
dW[1] = dz[1]xT
db[1] = dz[1]
```
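
As a rough illustration, here is a minimal NumPy-style sketch of the
single-example (scalar) pass above. The function and argument names are
placeholders rather than course code; the arguments are assumed to be
column-vector arrays cached from the forward pass, and `g1_prime` is
assumed to be the derivative of the layer-1 activation `g[1]`.

```
def backward_single_example(x, y, a1, a2, z1, W2, g1_prime):
    """Gradients for a single training example in a two-layer network."""
    dz2 = a2 - y                       # dz[2] = a[2] - y
    dW2 = dz2 @ a1.T                   # dW[2] = dz[2]a[1]T
    db2 = dz2                          # db[2] = dz[2]
    dz1 = (W2.T @ dz2) * g1_prime(z1)  # dz[1] = W[2]Tdz[2] * g[1]'(z[1])
    dW1 = dz1 @ x.T                    # dW[1] = dz[1]xT
    db1 = dz1                          # db[1] = dz[1]
    return dW1, db1, dW2, db2
```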


## Vectorized Implementation

```
dZ[2] = A[2] - Y
dW[2] = (1/m)dZ[2]A[1]T
db[2] = (1/m)*np.sum(dZ[2], axis=1, keepdims=True)
dZ[1] = W[2]TdZ[2] * g[1]'(Z[1])
dW[1] = (1/m)dZ[1]XT
db[1] = (1/m)*np.sum(dZ[1], axis=1, keepdims=True)
```
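
A minimal sketch of the vectorized pass in NumPy, assuming layer 1 uses
`tanh` (so `g[1]'(Z[1])` can be computed as `1 - A[1]**2`) and that the
forward-pass activations are cached; the function and variable names are
illustrative, not the course's code.

```
import numpy as np

def backward_vectorized(X, Y, A1, A2, W2):
    """Vectorized gradients for a two-layer network with tanh in layer 1.

    X: (n_x, m) inputs, Y: (1, m) labels, A1: (n_h, m) and A2: (1, m)
    activations from the forward pass, W2: (1, n_h) layer-2 weights.
    """
    m = X.shape[1]                                      # number of training examples
    dZ2 = A2 - Y                                        # dZ[2] = A[2] - Y
    dW2 = (1 / m) * (dZ2 @ A1.T)                        # dW[2] = (1/m)dZ[2]A[1]T
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)  # db[2]
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)                  # g[1]'(Z[1]) for tanh
    dW1 = (1 / m) * (dZ1 @ X.T)                         # dW[1] = (1/m)dZ[1]XT
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)  # db[1]
    return dW1, db1, dW2, db2
```

A gradient-descent step would then subtract each gradient scaled by a
learning rate, e.g. `W2 = W2 - alpha * dW2`.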