annotate docs/matrix.txt @ 55:0908b6cd3217

[regression] add better cost function for sigmoids
author Jeff Hammel <k0scist@gmail.com>
date Sun, 24 Sep 2017 15:30:15 -0700
parents 857a606783e1
children
    [ |   |        |  ]
X = [ x1  x2  ...  xm ] = A0
    [ |   |        |  ]

Z1 = W1 X + b1

A1 = sigmoid(Z1)

Z2 = W2 A1 + b2

A2 = sigmoid(Z2)

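The forward pass can be sketched in NumPy. This is a minimal sketch, not the repository's implementation: it assumes each column of X is one training example, that the biases are column vectors broadcast across the m columns, and that the output layer also uses a sigmoid. The function name `forward` and the sizes n0, n1, n2, m are hypothetical, chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def forward(X, W1, b1, W2, b2):
    """Vectorized forward pass over all m examples at once.

    X is (n0, m); W1 is (n1, n0); b1 is (n1, 1);
    W2 is (n2, n1); b2 is (n2, 1).
    """
    Z1 = W1 @ X + b1      # (n1, m); b1 broadcasts across the columns
    A1 = sigmoid(Z1)      # (n1, m)
    Z2 = W2 @ A1 + b2     # (n2, m)
    A2 = sigmoid(Z2)      # (n2, m); sigmoid output is an assumption
    return Z1, A1, Z2, A2


# Hypothetical sizes, for illustration only.
n0, n1, n2, m = 3, 4, 1, 5
rng = np.random.default_rng(0)
X = rng.normal(size=(n0, m))
W1, b1 = rng.normal(size=(n1, n0)), np.zeros((n1, 1))
W2, b2 = rng.normal(size=(n2, n1)), np.zeros((n2, 1))

Z1, A1, Z2, A2 = forward(X, W1, b1, W2, b2)
print(A1.shape, A2.shape)
```

With these sizes A1 has shape (n1, m) and A2 has shape (n2, m): one column of activations per training example, with no explicit loop over examples.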
     [ --- ]
W1 = [ --- ]
     [ --- ]

`W1 x1` gives an (n1, 1) column vector, where `x1`
is the first training example.
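
The shape claim can be checked directly. A small sketch, assuming W1 has one row per hidden unit and X has one column per training example; the sizes n0, n1, m and the random data are hypothetical.

```python
import numpy as np

# Hypothetical sizes: n0 input features, n1 hidden units, m examples.
n0, n1, m = 3, 4, 5
rng = np.random.default_rng(1)
W1 = rng.normal(size=(n1, n0))   # each row holds one unit's weights
X = rng.normal(size=(n0, m))     # each column is one training example

x1 = X[:, [0]]                   # first example, kept as an (n0, 1) column
col = W1 @ x1                    # (n1, 1) column vector
all_cols = W1 @ X                # (n1, m): one such column per example

print(col.shape, all_cols.shape)
```

Indexing with `[0]` rather than `0` keeps `x1` two-dimensional, so the product stays a column vector instead of collapsing to a 1-D array.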

Y = [ y1 y2 ... ym ]

For a two-layer network:

dZ2 = A2 - Y

dW2 = (1/m) dZ2 A1'

db2 = (1./m)*np.sum(dZ2, axis=1, keepdims=True)

dZ1 = W2' dZ2 * g1'(Z1)
: W2' dZ2 : an (n1, m) matrix
: * : element-wise product
: g1' : derivative of the layer-1 activation
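
The two kinds of product in the dZ1 step can be told apart in NumPy as follows. A sketch under assumptions: the g1 term is read as the derivative of the layer-1 activation evaluated at Z1, and that activation is taken to be the sigmoid, whose derivative is A1 * (1 - A1). The sizes and the random values are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


n1, n2, m = 4, 1, 5               # hypothetical layer sizes and example count
rng = np.random.default_rng(2)
W2 = rng.normal(size=(n2, n1))
dZ2 = rng.normal(size=(n2, m))
Z1 = rng.normal(size=(n1, m))

A1 = sigmoid(Z1)
g1_prime = A1 * (1.0 - A1)        # sigmoid derivative, element-wise, (n1, m)

back = W2.T @ dZ2                 # matrix product: (n1, n2) @ (n2, m) -> (n1, m)
dZ1 = back * g1_prime             # element-wise product of two (n1, m) arrays

print(back.shape, dZ1.shape)
```

In NumPy `@` is the matrix product and `*` is the element-wise product, which matches the notation in the notes.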

dW1 = (1/m) dZ1 X'

db1 = (1./m)*np.sum(dZ1, axis=1, keepdims=True)
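
The gradient formulas for the two-layer network can be put together and sanity-checked against a finite difference. This is a sketch under stated assumptions, not the repository's code: both layers use a sigmoid, and the cost is the mean cross-entropy (the cost for which dZ2 = A2 - Y holds). The names `cost` and `gradients`, the sizes, and the random data are hypothetical.

```python
import numpy as np


def sigmoid(z):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-z))


def cost(X, Y, W1, b1, W2, b2):
    """Mean cross-entropy cost over the m examples (assumed cost)."""
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    per_example = np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2), axis=0)
    return -np.mean(per_example)


def gradients(X, Y, W1, b1, W2, b2):
    """Backward pass following the formulas in the notes."""
    m = X.shape[1]
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)
    dZ2 = A2 - Y
    dW2 = (1.0 / m) * dZ2 @ A1.T
    db2 = (1.0 / m) * np.sum(dZ2, axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * A1 * (1.0 - A1)   # sigmoid derivative assumed for g1
    dW1 = (1.0 / m) * dZ1 @ X.T
    db1 = (1.0 / m) * np.sum(dZ1, axis=1, keepdims=True)
    return dW1, db1, dW2, db2


# Hypothetical sizes and random data, for illustration only.
n0, n1, n2, m = 3, 4, 1, 6
rng = np.random.default_rng(3)
X = rng.normal(size=(n0, m))
Y = (rng.random(size=(n2, m)) > 0.5).astype(float)
W1, b1 = rng.normal(size=(n1, n0)), np.zeros((n1, 1))
W2, b2 = rng.normal(size=(n2, n1)), np.zeros((n2, 1))

dW1, db1, dW2, db2 = gradients(X, Y, W1, b1, W2, b2)

# Central finite difference on a single entry of W1.
eps = 1e-6
W1_plus, W1_minus = W1.copy(), W1.copy()
W1_plus[0, 0] += eps
W1_minus[0, 0] -= eps
numeric = (cost(X, Y, W1_plus, b1, W2, b2)
           - cost(X, Y, W1_minus, b1, W2, b2)) / (2 * eps)
print(abs(numeric - dW1[0, 0]))   # small if the formulas agree with the cost
```

Each gradient has the same shape as the parameter it belongs to; `keepdims=True` is what keeps db1 and db2 as (n, 1) columns rather than 1-D arrays.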