Neural Networks

Model

4 | NN: 1. Hypothesis

  1. Model representation. (Hypothesis)

     A single model of a neuron:
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5958a6542d1f35671869bfa2/0938948179d5f4fb3c28d11a95a5cd5a/image.png

     Notation for the activation units: a_i^(j) is the activation of unit i in layer j, and Theta^(j) is the matrix of weights controlling the function mapping from layer j to layer j+1.
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b83e58f0b8d4d89dfee1a/8d35e020b9d129428d1b497aadea14cf/image.png
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b838896f99712ea80227c/1dbb0d31da9bc974f6751c89a6c20f18/image.png

     For the first activation unit, its value can be obtained by applying the sigmoid function to the weighted sum of its inputs:
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b881c4ec7525a531a11b4/1a4479189455de4f4d53e85c8f5344e5/image.png

     Note that for a single activation unit a, we can collect the thetas into a vector and the X features into a vector:
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b835d60ac146b8994d731/020c554bbef6908c239a0e8b542f94b9/image.png

     A simplistic representation of the above looks like this:
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b84b5eddcb976b3020444/73d277931976a388821001485a72bb1d/image.png

     Now if we had one hidden layer:
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/595d6793623af5c69313f4a1/7d437754d0e238049163ecb836333ac0/image.png

     Each layer gets its own matrix of weights: Theta^(j) controls the function mapping from layer j to layer j+1.
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b8404748a3847159e0d3b/7c37e7dac411066566eee9641c24868f/image.png

     Now we can express z generally for layer j as z^(j) = Theta^(j-1) a^(j-1):
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b8d39339297027965fecc/3c24c31be292c0122274f3a8af64c590/image.png

     All activation unit values can then be obtained as a^(j) = g(z^(j)), applying the sigmoid function g element-wise to the vector z^(j):
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b85a9e858293fd8e17e30/a3b8145119205699750a30d5f5ccc989/image.png
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/595d684928ba3bff7284cd77/ca63197d4c0bb374286ead117173886e/image.png
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b8cc3821acd0c7b10ba69/be725d051ed4e89101a3ccff8a756bf8/image.png
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b8cc3821acd0c7b10ba69/f3e671a1915f4d6060cef7f2f8db8db7/image.png

     Therefore we can also express z for the next layer as z^(j+1) = Theta^(j) a^(j):
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/595d6de7deeeecc6b9d5c927/6f50c0205b5a2f5a7d25e0077379b1af/image.png

     ...which gives us the hypothesis h_Theta(x) = a^(j+1) = g(z^(j+1)):
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/595d6fbd472a7a7d17ae3621/f6ca1fa755d5fd359d0df28eff8ee83c/image.png

     The dimensions of these matrices of weights: if the network has s_j units in layer j, then Theta^(j) has dimension s_(j+1) x (s_j + 1).
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/595d69630fad97ab1019bd28/7b5bd8aa753406f401f26fb6d356f81b/image.png

     Overview of the model:
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5bfe751a33f8f16599cc04c8/787e2467c4ea7577bc8d07d9d52fa4a7/image.png

     Forward propagation code:
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b865e773e4451412776512b/5d4177f469403c3cc14e67ce1ac7e265/image.png
     https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b865e773e4451412776512b/e9f956572b9df8bdbd3e87de7d775ef9/image.png

     See the Coursera summary for a step-by-step run-through of how to vectorise a neural net: https://www.coursera.org/learn/machine-learning/supplement/YlEVx/model-representation-ii
     Programming tutorial: https://www.coursera.org/learn/machine-learning/discussions/weeks/4/threads/miam5q2IEeWhLRIkesxXNw
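As a concrete illustration, the forward pass for a single example (z^(j) = Theta^(j-1) a^(j-1), a^(j) = g(z^(j))) can be sketched in NumPy. The weights and layer sizes below are made up purely for illustration; the course exercises themselves are written in Octave.

```python
import numpy as np

def sigmoid(z):
    """Logistic function g, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

# Illustrative (untrained) weights: 3 input features, 3 hidden units, 1 output.
# Theta^(j) has dimension s_(j+1) x (s_j + 1); the extra column is for the bias unit.
Theta1 = np.array([[ 0.1,  0.3, -0.2, 0.5],
                   [-0.4,  0.2,  0.1, 0.0],
                   [ 0.2, -0.1,  0.4, 0.3]])  # 3 x 4
Theta2 = np.array([[ 0.3, -0.2,  0.5, 0.1]])  # 1 x 4

x = np.array([2.0, 1.0, -1.0])     # a single training example (n = 3)

a1 = np.concatenate(([1.0], x))    # add bias unit x0 = 1
z2 = Theta1 @ a1                   # z^(2) = Theta^(1) a^(1)
a2 = sigmoid(z2)                   # a^(2) = g(z^(2))

a2 = np.concatenate(([1.0], a2))   # add bias unit a0^(2) = 1
z3 = Theta2 @ a2                   # z^(3) = Theta^(2) a^(2)
h = sigmoid(z3)                    # h_Theta(x) = a^(3), shape (1,)
```

Note how each layer repeats the same two moves: prepend the bias unit, then multiply by that layer's Theta and apply g element-wise.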
  2. X is a matrix of input features. Its rows correspond to the training examples and its columns to the features, so it has dimension m example rows x n feature columns. (Hypothesis - Code)
  3. Theta1 and Theta2 are pre-trained matrices of theta values for a single-hidden-layer neural network: Theta1 holds the weights applied to the feature input matrix X, and Theta2 holds the weights applied to obtain the output units. (Hypothesis - Code)
  4. We add a column of 1s to X as bias units. (Important: add the bias units first, before you transpose.) (Hypothesis - Code)
  5. By convention we make a1 = X, as X is the first layer. (Hypothesis - Code)
  6. Then we calculate z as a1 * Theta1' (transposed). The transpose is important so that the inner dimensions of the a1 matrix and the Theta1 matrix match, which allows the matrix multiplication. (Hypothesis - Code)
  7. To get a2, we apply the sigmoid function to z element-wise: a2 = sigmoid(z). (Hypothesis - Code)
  8. Now with a2, we repeat the process to get the final output layer: we add bias units to a2, then apply the sigmoid function to its z element-wise as well. (Hypothesis - Code)

  Notes on the Theta matrix (Hypothesis - Theta Matrix Dimensions):
  - The number of rows of a Theta matrix corresponds to the number of "target" activation units.
  - The number of columns corresponds to the number of "source" input units (including the bias unit). So Theta1 is (num "target" activation units) x (num "source" input units, including the bias unit).
  - The +1 comes from the addition in Theta^(j) of the "bias nodes", x0 and Theta0^(j).
  - Recall that X has m example rows x n feature columns. The feature columns of X form the inner dimension of the product, and this needs to match the inner dimension of Theta. To achieve this, you transpose Theta so that its columns, which correspond to the "source" inputs, become its rows. So now: X (m x n inputs) * Theta1' (n inputs x activation units).
  - Knowing the dimensions of the Theta matrix is important: when you use matrix multiplication with neural nets, it tells you the order in which to apply the Theta matrix.

  In terms of z (Hypothesis - In terms of Z):
  - Now we assign a new value z to the sum of the inputs and theta weights:
    https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b88f8464b3384e2420114/a09ecde809536660738a613e675a4936/image.png
  - In other words, for layer j=2 and node k, the variable z will be:
    https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b9b893090fc70358eb948f7/93b7ea3ae5ee483a5fdb8abd86d91d92/image.png
  - Turning x and z^(j) into vectors gives us this:
    https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/595d6c09069f5bdd865e9554/1fe0e489f0ba79eaf5c0dc9d805de0b4/image.png
  - Example: computing the a^(2) layer:
    https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/595d665f1827aadd719dd469/4bbf6c2d788c242027d34b2895bfe3e6/image.png

  Vectorising (Hypothesis - Vectorising):
  - Calculating training set accuracy:
    https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b86647c7a53f33b573376b4/8342530f9621c4c473dde86cef75c843/image.png
  - Predict by passing in a single training example:
    https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/5b86654360a30f1e3ff81fad/6a8d775532e65fa6c09db0a0bd3f8b35/image.png

  Neural nets learn their own features (Hypothesis - Additional Info): instead of being constrained to feed the raw features x1, x2, x3 into logistic regression, the network gets to learn its own features a1, a2, a3 to feed into logistic regression. Depending on the parameters it chooses for Theta1, it can learn some interesting and complex features, and can therefore end up with a better hypothesis than if it were constrained to use the raw features x1, x2, x3, or to choose among polynomial terms of them.
  https://trello-attachments.s3.amazonaws.com/5958a458e90c043059dd58a8/595d64baa1851ecc2099210a/957680e3e0d659e51e82eaf56e9ab78a/image.png
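The coding steps above (add bias column, multiply by the transposed Theta, apply sigmoid, repeat, then score accuracy) can be sketched as follows in NumPy. The Theta values here are random placeholders standing in for the pre-trained weights, and the tiny X and y are invented for illustration.

```python
import numpy as np

def sigmoid(z):
    """Logistic function, applied element-wise."""
    return 1.0 / (1.0 + np.exp(-z))

def predict(Theta1, Theta2, X):
    """Vectorised forward pass over all m examples at once."""
    m = X.shape[0]
    a1 = np.hstack([np.ones((m, 1)), X])   # add bias column first: m x (n+1)
    a2 = sigmoid(a1 @ Theta1.T)            # X * Theta1' -> m x (hidden units)
    a2 = np.hstack([np.ones((m, 1)), a2])  # bias units for the hidden layer
    h = sigmoid(a2 @ Theta2.T)             # m x (output units)
    return np.argmax(h, axis=1)            # predicted class per example

# Hypothetical "pre-trained" weights: 2 features, 3 hidden units, 2 output classes.
rng = np.random.default_rng(0)
Theta1 = rng.normal(size=(3, 3))  # 3 hidden units x (2 features + bias)
Theta2 = rng.normal(size=(2, 4))  # 2 outputs x (3 hidden units + bias)
X = rng.normal(size=(5, 2))       # m = 5 examples, n = 2 features
y = np.array([0, 1, 1, 0, 1])     # invented labels

pred = predict(Theta1, Theta2, X)
accuracy = np.mean(pred == y) * 100  # training set accuracy, as in the note above
```

The dimension bookkeeping matches the notes: a1 is m x (n+1), so multiplying by Theta1' (whose rows are the "source" inputs after the transpose) lines up the inner dimensions.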