Loss
- Cross entropy: see the ML part.
- Regression loss: MSE (mean squared error); a minimal sketch of both losses follows.
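A minimal numpy sketch of the two losses (the function names are mine, for illustration only):

```python
import numpy as np

def cross_entropy(probs, labels, eps=1e-12):
    # probs: (N, C) softmax outputs, labels: (N,) integer class ids
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + eps))

def mse(pred, target):
    # mean squared error for regression
    return np.mean((pred - target) ** 2)
```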
Forward & Backward Propagation
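A minimal numpy sketch of the forward pass and the matching backward pass for one fully-connected layer trained with MSE (all variable names are illustrative, not from any particular framework):

```python
import numpy as np

np.random.seed(0)
x = np.random.randn(4, 3)          # batch of 4 inputs, 3 features
y = np.random.randn(4, 2)          # regression targets
W = np.random.randn(3, 2) * 0.1    # weights
b = np.zeros(2)                    # bias

# forward
out = x @ W + b                    # (4, 2)
loss = np.mean((out - y) ** 2)     # MSE

# backward (chain rule)
dout = 2 * (out - y) / out.size    # dL/dout
dW = x.T @ dout                    # dL/dW
db = dout.sum(axis=0)              # dL/db
dx = dout @ W.T                    # dL/dx, passed to the previous layer
```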


More ref: BP!, cs231n, BP2, BP3, BP4
Pooling layer forward & backward
For max pooling, only the position that held the maximum receives the gradient; everywhere else it is 0. For average pooling, the gradient is spread evenly over the window.
Max pooling forward & backward:
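A minimal numpy sketch, assuming non-overlapping 2x2 windows and a single-channel input:

```python
import numpy as np

def maxpool2x2_forward(x):
    # x: (H, W) with even H, W; max over non-overlapping 2x2 windows
    H, W = x.shape
    patches = x.reshape(H // 2, 2, W // 2, 2)
    out = patches.max(axis=(1, 3))
    # mask marks the max position inside each window (ties get multiple 1s here)
    mask = (patches == out[:, None, :, None])
    return out, mask

def maxpool2x2_backward(dout, mask):
    # only the max positions receive the upstream gradient; everything else is 0
    H2, _, W2, _ = mask.shape
    return (mask * dout[:, None, :, None]).reshape(H2 * 2, W2 * 2)
```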

Average pooling forward & backward:
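A matching numpy sketch for 2x2 average pooling, where each input position gets an equal share of the upstream gradient:

```python
import numpy as np

def avgpool2x2_forward(x):
    # x: (H, W) with even H, W; average over non-overlapping 2x2 windows
    H, W = x.shape
    return x.reshape(H // 2, 2, W // 2, 2).mean(axis=(1, 3))

def avgpool2x2_backward(dout):
    # every input in a 2x2 window receives 1/4 of the upstream gradient
    return np.repeat(np.repeat(dout, 2, axis=0), 2, axis=1) / 4.0
```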

The derivative takes the value 0 or 1.
For other functions, the gradient can be obtained directly from the definition of the derivative.
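For instance, a gradient can be checked numerically from the definition of the derivative; a small numpy sketch (names are illustrative):

```python
import numpy as np

def numerical_grad(f, x, h=1e-5):
    # central-difference approximation of df/dx, one element at a time
    grad = np.zeros_like(x, dtype=float)
    for idx in np.ndindex(x.shape):
        old = x[idx]
        x[idx] = old + h
        fp = f(x)
        x[idx] = old - h
        fm = f(x)
        x[idx] = old
        grad[idx] = (fp - fm) / (2 * h)
    return grad

# check the sigmoid derivative: sigma'(x) = sigma(x) * (1 - sigma(x))
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
x = np.array([0.5, -1.0])
print(numerical_grad(lambda v: sigmoid(v).sum(), x))
print(sigmoid(x) * (1 - sigmoid(x)))
```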
More ref:
https://www.jianshu.com/p/e160983d601a
https://www.zdaiot.com/MachineLearning/%E5%8D%B7%E7%A7%AF%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C/%E5%8D%B7%E7%A7%AF%E7%A5%9E%E7%BB%8F%E7%BD%91%E7%BB%9C%E5%89%8D%E5%90%91%E4%B8%8E%E5%8F%8D%E5%90%91%E4%BC%A0%E6%92%AD/
https://zhuanlan.zhihu.com/p/343564175
https://www.zhihu.com/question/465918667
https://zhuanlan.zhihu.com/p/358220419
https://zhuanlan.zhihu.com/p/359524837
https://zhuanlan.zhihu.com/p/360441891
https://github.com/godweiyang/NN-CUDA-Example
Component
BatchNorm
Addresses the internal covariate shift problem and helps with exploding/vanishing gradients. It needs a reasonably large batch: the larger the better, and shrinking the batch from 16 to 8 degrades performance noticeably.
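A minimal numpy sketch of the batch-norm forward pass at training time (gamma and beta are the learnable scale and shift; names are illustrative):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    # x: (N, D); gamma, beta: (D,) learnable scale and shift
    # statistics are estimated per batch, which is why very small batches
    # (e.g. going from 16 down to 8) noticeably hurt performance
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```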
ReLU
Provides the non-linearity in the network.
Upsampling

Fractionally-strided convolution; also called deconvolution or transposed convolution.
Used on the decoder side to upsample learned feature maps and reconstruct spatial information.
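A short sketch, assuming PyTorch is available and using illustrative channel sizes, showing how a transposed (fractionally-strided) convolution upsamples a feature map:

```python
import torch
import torch.nn as nn

x = torch.randn(1, 64, 16, 16)                  # (N, C, H, W) feature map
up = nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1)
print(up(x).shape)                              # torch.Size([1, 32, 32, 32]) -- spatial size doubled
```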

Recent Best Network Architectures
- Convolution-based: EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
- Transformer-based: DETR: End-to-End Object Detection with Transformers
- MLP-based: MLP-Mixer: An all-MLP Architecture for Vision
Imbalanced Labels
- Re-sampling: over-sample the minority (small) classes.
- Synthetic sampling: add Gaussian noise to minority samples, or interpolate between them (e.g. SMOTE).
- Re-weighting: weight the loss by inverse class frequency, e.g. the focal loss (see the sketch after this list).
- Transfer learning: train a network on the large (head) classes, then use that model to teach a network for the small (tail) classes.
- Metric learning.
- Decoupling the representation from the classifier.
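A hedged numpy sketch of the binary focal loss mentioned under re-weighting (the gamma and alpha values are the commonly used defaults, not taken from this note):

```python
import numpy as np

def focal_loss(p, y, gamma=2.0, alpha=0.25, eps=1e-12):
    # p: predicted probability of the positive class, y: 0/1 labels
    p_t = np.where(y == 1, p, 1 - p)              # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class re-weighting factor
    return -np.mean(alpha_t * (1 - p_t) ** gamma * np.log(p_t + eps))
```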
Deep Learning Fine-tuning Skills
// Network depth determines the learning capacity; width determines how much information each layer can carry.
// The receptive field becomes larger toward the final layers.
// Pooling is used more frequently in the earlier layers.
// Initialization is very important; normal initialization can be much better than Xavier.
// https://zhuanlan.zhihu.com/p/69026430
// 1 x N convolutions can enlarge the receptive field.
// Get the model learning first -> then add BN / data augmentation -> then adjust layers / convolutions.
// Loss is a better signal to monitor than accuracy.
// Let the required receptive field guide design decisions.
// Split into training and validation sets; use early stopping; watch for over-fitting / under-fitting.
// ReLU/Leaky-ReLU/Sigmoid/Tanh.
// Overfitting -> L1 / L2 regularization.
// Optimizer choice: Adagrad, Adadelta, RMSprop, Adam -> Adam.
// Residual block & BN layer.
// For ReLU activations, He normal initialization is recommended; for tanh, Glorot normal (also called Xavier normal) is recommended (see the sketch at the end of this list). For data preprocessing, simple normalization is usually enough.
// Data Preprocessing
// New scenarios cause covariate shift (the training-set distribution no longer matches); mitigate with cross validation and by expanding the test set; https://www.zhihu.com/question/289025990
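A short sketch of the initialization advice above, assuming PyTorch and illustrative layer sizes:

```python
import torch.nn as nn

relu_layer = nn.Linear(128, 64)
nn.init.kaiming_normal_(relu_layer.weight, nonlinearity='relu')   # He normal, for ReLU

tanh_layer = nn.Linear(128, 64)
nn.init.xavier_normal_(tanh_layer.weight, gain=nn.init.calculate_gain('tanh'))  # Glorot/Xavier normal, for tanh
```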
