annotate docs/summary_of_gradient_descent.txt @ 55:0908b6cd3217
[regression] add better cost function for sigmoids
author | Jeff Hammel <k0scist@gmail.com> |
---|---|
date | Sun, 24 Sep 2017 15:30:15 -0700 |
parents | 673a295fd09c |
53
673a295fd09c
[documentation] cache coursera notes
Jeff Hammel <k0scist@gmail.com>
# Summary of Gradient Descent

These equations are for a two-layer network. Square brackets (`[]`) denote the layer number.
`'` denotes prime (the derivative), `T` denotes transpose, and `*` is the element-wise product.

## Scalar implementation

```
dz[2] = a[2] - y
dW[2] = dz[2] a[1]T
db[2] = dz[2]
dz[1] = W[2]T dz[2] * g[1]'(z[1])
dW[1] = dz[1] xT
db[1] = dz[1]
```
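
The six equations above can be sketched in NumPy for a single training example. This is a minimal illustration, not the repository's implementation: the function name `backprop_single`, the `tanh` hidden activation, and the sigmoid output are assumptions. With a sigmoid output, `dz[2] = a[2] - y` corresponds to a cross-entropy loss, which is also assumed here.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def backprop_single(x, y, W1, b1, W2, b2):
    """Gradients for one example.

    x has shape (n_x, 1); y is a scalar label in {0, 1}.
    Assumed (hypothetical) architecture: tanh hidden layer, sigmoid output.
    """
    # Forward pass
    z1 = W1 @ x + b1
    a1 = np.tanh(z1)
    z2 = W2 @ a1 + b2
    a2 = sigmoid(z2)

    # Backward pass, one line per equation in the notes
    dz2 = a2 - y                           # dz[2] = a[2] - y
    dW2 = dz2 @ a1.T                       # dW[2] = dz[2] a[1]T
    db2 = dz2                              # db[2] = dz[2]
    dz1 = (W2.T @ dz2) * (1.0 - a1 ** 2)   # g[1]'(z[1]) = 1 - tanh(z[1])^2
    dW1 = dz1 @ x.T                        # dW[1] = dz[1] xT
    db1 = dz1                              # db[1] = dz[1]
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```

Each gradient has the same shape as the parameter it belongs to, so a gradient descent step would be `W1 -= learning_rate * grads["dW1"]` and so on, where `learning_rate` is a value you choose.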

## Vectorized implementation

```
dZ[2] = A[2] - Y
dW[2] = (1/m) dZ[2] A[1]T
db[2] = (1/m)*np.sum(dZ[2], axis=1, keepdims=True)
dZ[1] = W[2]T dZ[2] * g[1]'(Z[1])
dW[1] = (1/m) dZ[1] XT
db[1] = (1/m)*np.sum(dZ[1], axis=1, keepdims=True)
```
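
A batch version might look like the sketch below, where `m` is the number of training examples and each column of `X` is one example. As before, the name `backprop_batch`, the `tanh` hidden activation, and the sigmoid output with a mean cross-entropy cost are assumptions made for illustration. The sketch also computes `dW1 = (1/m) dZ1 Xᵀ`, the batch analogue of the scalar `dW[1] = dz[1]xT`.

```python
import numpy as np


def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))


def backprop_batch(X, Y, W1, b1, W2, b2):
    """Gradients averaged over a batch.

    X has shape (n_x, m) and Y has shape (1, m).
    Assumed (hypothetical) architecture: tanh hidden layer, sigmoid output.
    """
    m = X.shape[1]

    # Forward pass; b1 and b2 broadcast across the m columns
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)

    # Backward pass
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)   # derivative of tanh
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```

`keepdims=True` keeps `db1` and `db2` as column vectors of shape `(n, 1)` rather than flat arrays, so they have the same shape as the biases they update.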