What is Architecture in Deep Learning (AI)?
The architecture is the part of a deep learning model that is responsible for learning from data. It contains all the learnable components, including the weights, which are learned from your past data. Deep learning promises that even a basic, simple architecture can be taught almost anything, but such an architecture needs lots of data, more training time, and more GPU usage to reach high accuracy. In the real world we often cannot collect enough data to train our model efficiently, and even when we can, we have to spend more time training the basic architecture on GPUs. GPUs are expensive, so longer training directly increases our cost. To reduce that cost and let our model learn from less data, we need to create good architectures. A good architecture learns from less data, in less time, and with high accuracy.
An architecture is basically a stack of layers through which data is passed, one layer after another. Each layer changes the data and learns some weights (numbers) from it. The sequence and type of layers affect the architecture's efficiency and learning power. The terms used in deep learning to describe an architecture are layers, weights, activation functions, and activations. Weights and activations are both numbers, while layers and activation functions are mathematical functions through which the data is passed. We will explain these terms using a simple MNIST example whose architecture has two layers, both linear. Don't worry if you don't know what a linear layer is; we will explain it later in this post. For now, just assume that it contains weights and a bias, both of which are learnable.
As you can see in the figure above, we have designed an architecture with two linear layers for the MNIST dataset, with one activation function (ReLU) between them.
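The two-linear-layer design with ReLU in between can be sketched in plain Python. This is only a minimal illustration, not the exact model from the figure: the hidden size of 32, the initial weight range, the helper names (make_linear, linear, relu, forward), and the random stand-in image are all assumptions. Only the 784 inputs (28 x 28 pixels) and the 10 outputs (digits 0 to 9) follow from MNIST itself.

```python
import random

def make_linear(n_in, n_out, rng):
    """Create a linear layer as (weights, biases); weights start as small random numbers."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def linear(x, layer):
    """Apply y = W x + b to a single input vector x."""
    weights, biases = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    """Activation function: replace every negative number with zero."""
    return [max(0.0, v) for v in values]

def forward(x, layer1, layer2):
    """Pass the data through linear layer -> ReLU -> linear layer."""
    hidden = relu(linear(x, layer1))   # activations after the first layer
    return linear(hidden, layer2)      # activations after the second layer

rng = random.Random(0)
layer1 = make_linear(784, 32, rng)     # 784 pixels in; 32 is an assumed hidden size
layer2 = make_linear(32, 10, rng)      # 10 outputs, one score per digit
image = [rng.random() for _ in range(784)]  # stand-in for a flattened MNIST image
scores = forward(image, layer1, layer2)
print(len(scores))
```

In a real project a deep learning framework would usually replace these hand-written helpers, but the flow of data through the layers is the same.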
LAYERS:- A layer is a function that contains learnable weights. In the architecture above we have used linear functions of the form y = ax + b, where 'a' is the weight and 'b' is the bias, both of which are learnable.
WEIGHTS:- A weight is just a number that is learned from data. In the example above, the two linear layers contain weights, which are numbers. At the start they are random numbers, but as we train, the loss function guides how they are updated so that they improve.
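To see how a weight goes from a random number to a learned one, here is a small sketch that fits a single linear function y = ax + b. The post does not say which loss or update rule is used, so this sketch assumes a mean squared error loss and plain gradient descent. The toy data (generated from a = 2, b = 1), the learning rate, and the number of steps are also made up for illustration.

```python
import random

rng = random.Random(0)
a = rng.uniform(-1.0, 1.0)   # the weight starts as a random number
b = 0.0                      # the bias starts at zero (an assumption)

# Toy data, assumed for illustration: targets come from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

def mse(a, b):
    """Mean squared error between predictions a*x + b and the targets."""
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

initial_loss = mse(a, b)
learning_rate = 0.05
for _ in range(2000):
    # Gradients of the loss with respect to the weight and the bias.
    grad_a = sum(2 * (a * x + b - y) * x for x, y in zip(xs, ys)) / len(xs)
    grad_b = sum(2 * (a * x + b - y) for x, y in zip(xs, ys)) / len(xs)
    # Move each number a small step in the direction that lowers the loss.
    a -= learning_rate * grad_a
    b -= learning_rate * grad_b

final_loss = mse(a, b)
print(round(a, 3), round(b, 3), final_loss < initial_loss)
```

After training, 'a' and 'b' end up close to the values that generated the data, which is what "learned from data" means here.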
ACTIVATIONS:- An activation is also a number, one that is produced by passing the input through a linear layer. The outputs of linear layers and of activation functions are numbers, and these numbers are called activations.
ACTIVATION FUNCTION:- An activation function is a mathematical function that takes the activations coming out of a linear layer and transforms them a little. Different types of activation functions are ReLU, Leaky ReLU, tanh, softmax, and sigmoid.
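The activation functions named above can each be written in a few lines. The sketch below uses their usual textbook definitions. The Leaky ReLU slope of 0.01 is a commonly used value but is an assumption here, and the sample activations are made up.

```python
import math

def relu(x):
    """ReLU: keep positive numbers, turn negative numbers into zero."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: like ReLU, but negative numbers are scaled down instead of zeroed."""
    return x if x > 0 else slope * x

def sigmoid(x):
    """Sigmoid: squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def softmax(values):
    """Softmax: turn a list of numbers into positive numbers that sum to 1."""
    largest = max(values)                      # subtract the max for numerical stability
    exps = [math.exp(v - largest) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

activations = [-2.0, 0.0, 3.0]                 # made-up outputs of a linear layer
print([relu(a) for a in activations])          # [0.0, 0.0, 3.0]
print([leaky_relu(a) for a in activations])
print([math.tanh(a) for a in activations])     # tanh squashes into (-1, 1)
print([sigmoid(a) for a in activations])
print(softmax(activations))
```

ReLU, Leaky ReLU, tanh, and sigmoid act on one number at a time, while softmax looks at the whole list, which is why it is typically applied to the final scores.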
Different types of layers used in designing architectures are linear layers, Convolutional Neural Network (CNN) layers, Recurrent Neural Network (RNN) layers, LSTM (Long Short-Term Memory), GRU (Gated Recurrent Unit), Batch Norm layers, Dropout layers, and many more. Explaining them fully would require a separate blog post for each layer. We will discuss some of these layers in some detail to show you the purpose of each one.