annotate docs/summary_of_gradient_descent.txt @ 55:0908b6cd3217

[regression] add better cost function for sigmoids
author Jeff Hammel <k0scist@gmail.com>
date Sun, 24 Sep 2017 15:30:15 -0700
parents 673a295fd09c
673a295fd09c [documentation] cache coursera notes
Jeff Hammel <k0scist@gmail.com>
# Summary of Gradient Descent

The gradients below are for a two-layer network. The `[]`s denote the layer number.
`'` denotes prime (the derivative). `T` denotes transpose. Between two terms, `*` denotes the element-wise product.

## Scalar Implementation

```
dz[2] = a[2] - y
dW[2] = dz[2] a[1]T
db[2] = dz[2]
dz[1] = W[2]T dz[2] * g[1]'(z[1])
dW[1] = dz[1] xT
db[1] = dz[1]
```
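
As a minimal sketch, the single-example equations above can be written in NumPy. This assumes a sigmoid output unit with cross-entropy loss (the setting in which `dz[2] = a[2] - y` holds) and a tanh hidden layer, so that `g[1]'(z[1]) = 1 - a[1]**2`. The function name `backward_single` and the array shapes are illustrative assumptions, not part of the original notes.

```python
import numpy as np


def backward_single(x, y, W1, b1, W2, b2):
    """Gradients of the loss for one training example.

    Assumed shapes (hypothetical, for illustration):
    x (n_x, 1), W1 (n_h, n_x), b1 (n_h, 1), W2 (1, n_h), b2 (1, 1);
    y is a scalar label in {0, 1}.
    """
    # Forward pass: tanh hidden layer, sigmoid output (both assumed).
    z1 = W1 @ x + b1
    a1 = np.tanh(z1)
    z2 = W2 @ a1 + b2
    a2 = 1.0 / (1.0 + np.exp(-z2))

    # Backward pass, following the equations line by line.
    dz2 = a2 - y
    dW2 = dz2 @ a1.T
    db2 = dz2
    dz1 = (W2.T @ dz2) * (1.0 - a1 ** 2)  # tanh'(z1) = 1 - a1^2
    dW1 = dz1 @ x.T
    db1 = dz1
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```

Each gradient has the same shape as the parameter it belongs to, which is a quick sanity check when adapting the sketch.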

## Vectorized Implementation

```
dZ[2] = A[2] - Y
dW[2] = (1/m) dZ[2] A[1]T
db[2] = (1/m)*np.sum(dZ[2], axis=1, keepdims=True)
dZ[1] = W[2]T dZ[2] * g[1]'(Z[1])
dW[1] = (1/m) dZ[1] XT
db[1] = (1/m)*np.sum(dZ[1], axis=1, keepdims=True)
```
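
The vectorized equations can be sketched in NumPy in the same way, with the `m` training examples stored as columns of `X`. As before, the sigmoid output, cross-entropy cost, and tanh hidden layer are assumptions, and `backward_vectorized` is a hypothetical name. `dW1` is computed as `(1/m) dZ1 @ X.T`, the batch analogue of the scalar `dW[1] = dz[1]xT`.

```python
import numpy as np


def backward_vectorized(X, Y, W1, b1, W2, b2):
    """Gradients of the cost averaged over m examples.

    Assumed shapes (hypothetical, for illustration):
    X (n_x, m), Y (1, m), W1 (n_h, n_x), b1 (n_h, 1),
    W2 (1, n_h), b2 (1, 1).
    """
    m = X.shape[1]

    # Forward pass: tanh hidden layer, sigmoid output (both assumed).
    Z1 = W1 @ X + b1  # b1 broadcasts across the m columns
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = 1.0 / (1.0 + np.exp(-Z2))

    # Backward pass: the matrix products sum over examples,
    # and the 1/m factor turns those sums into averages.
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)  # tanh'(Z1) = 1 - A1^2
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```

Passing `keepdims=True` keeps `db1` and `db2` as column vectors, so they have the same shapes as `b1` and `b2` and can be subtracted from them directly in the update step.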