Parameter is Not All You Need: Starting from Non-Parametric Networks for 3D Point Cloud Analysis
Towards Understanding Ensemble, Knowledge Distillation and Self-Distillation in Deep Learning
Deep Transformers without Shortcuts: Modifying Self-attention for Faithful Signal Propagation