docs/matrix.txt
[documentation] notes + stubs on gradient descent
author: Jeff Hammel <k0scist@gmail.com>
date: Mon, 04 Sep 2017 15:06:38 -0700

    [|  |     | ]
X = [x1 x2 ...xm] = A0
    [|  |     | ]

Z1 = W1 X + b1

A1 = sigmoid(Z1)

Z2 = W2 A1 + b2

A2 = sigmoid(Z2)

     [---]
W1 = [---]
     [---]

`W1 x1` gives a column vector (the layer-1 pre-activation for one
example), where `x1` is the first training example, i.e. the first
column of `X`.
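This column-at-a-time view can be checked numerically; the concrete
shapes below (a 4x3 `W1`, 5 training examples) are illustrative
assumptions, not fixed by the notes:

```python
import numpy as np

np.random.seed(0)
W1 = np.random.randn(4, 3)   # hypothetical (n1, n0) weight matrix
X = np.random.randn(3, 5)    # m = 5 training examples stacked as columns
x1 = X[:, 0:1]               # first training example, kept as a column vector

# Column i of W1 X is exactly W1 applied to the i-th example,
# so one matrix product processes all m examples at once.
assert np.allclose((W1 @ X)[:, 0:1], W1 @ x1)
```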

Y = [ y1 y2 ... ym]
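The forward pass above can be sketched end to end; the layer sizes,
random data, and sigmoid output layer are assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(0)
n0, n1, n2, m = 3, 4, 1, 5            # assumed layer sizes / example count

X = np.random.randn(n0, m)            # A0: examples as columns
W1 = np.random.randn(n1, n0); b1 = np.zeros((n1, 1))
W2 = np.random.randn(n2, n1); b2 = np.zeros((n2, 1))

Z1 = W1 @ X + b1                      # (n1, m); b1 broadcasts across columns
A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2                     # (n2, m)
A2 = sigmoid(Z2)                      # predictions, one column per example
```

Because the examples sit in columns, the biases `b1`/`b2` are (n, 1)
column vectors and broadcast across all m columns.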

For a two-layer network, the backpropagation gradients are:

dZ2 = A2 - Y

dW2 = (1/m) dZ2 A1'

db2 = (1/m) np.sum(dZ2, axis=1, keepdims=True)

dZ1 = W2' dZ2 * g1'(Z1)
 : W2' dZ2 : an (n1, m) matrix
 : * : element-wise product
 : g1'(Z1) : derivative of the layer-1 activation; for sigmoid, g1'(Z1) = A1 * (1 - A1)

dW1 = (1/m) dZ1 X'

db1 = (1/m) np.sum(dZ1, axis=1, keepdims=True)
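These backward-pass formulas can be sketched as one pass and checked
against a finite difference; the layer sizes, random data, sigmoid
hidden activation (so the derivative is A1 * (1 - A1)), and
cross-entropy cost (which is what makes dZ2 = A2 - Y) are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

np.random.seed(2)
n0, n1, n2, m = 3, 4, 1, 5                    # assumed layer sizes

X = np.random.randn(n0, m)
Y = (np.random.rand(n2, m) > 0.5).astype(float)
W1 = np.random.randn(n1, n0); b1 = np.zeros((n1, 1))
W2 = np.random.randn(n2, n1); b2 = np.zeros((n2, 1))

# forward pass
Z1 = W1 @ X + b1; A1 = sigmoid(Z1)
Z2 = W2 @ A1 + b2; A2 = sigmoid(Z2)

# backward pass
dZ2 = A2 - Y                                  # cross-entropy + sigmoid output
dW2 = (1 / m) * dZ2 @ A1.T
db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)
dZ1 = (W2.T @ dZ2) * (A1 * (1 - A1))          # * is element-wise
dW1 = (1 / m) * dZ1 @ X.T
db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)

# sanity check: central finite difference on one entry of W2
def cost(W2_):
    A1_ = sigmoid(W1 @ X + b1)
    A2_ = sigmoid(W2_ @ A1_ + b2)
    return -(1 / m) * np.sum(Y * np.log(A2_) + (1 - Y) * np.log(1 - A2_))

eps = 1e-6
Wp = W2.copy(); Wp[0, 0] += eps
Wm = W2.copy(); Wm[0, 0] -= eps
numeric = (cost(Wp) - cost(Wm)) / (2 * eps)
assert abs(numeric - dW2[0, 0]) < 1e-5
```

Each gradient has the same shape as the parameter it updates, which is
what makes the update step `W := W - alpha * dW` well defined.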