- Online optimization
Single pass over the training data; the metric for each example is calculated before the model trains on it (progressive validation).
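A minimal sketch of the evaluate-then-train loop, assuming a plain logistic regression model updated by SGD. The function name `progressive_validation`, the learning rate, and the toy stream are illustrative choices, not details from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def progressive_validation(stream, lr=0.1):
    """Single pass over an iterable of (features, label) pairs.

    Each example is scored before the model trains on it, so the
    running log loss is always measured on data not yet seen.
    """
    weights = None
    total_loss = 0.0
    count = 0
    for x, y in stream:
        if weights is None:
            weights = [0.0] * len(x)
        p = sigmoid(sum(w * xi for w, xi in zip(weights, x)))
        # 1) Evaluate first: accumulate log loss on the unseen example.
        p_safe = min(max(p, 1e-12), 1.0 - 1e-12)
        total_loss += -(y * math.log(p_safe) + (1 - y) * math.log(1.0 - p_safe))
        count += 1
        # 2) Then train: one SGD step on the same example.
        grad = p - y
        weights = [w - lr * grad * xi for w, xi in zip(weights, x)]
    return total_loss / max(count, 1), weights

# Toy stream (made up for illustration): feature 0 signals a click.
toy_stream = [([1.0, 0.0], 1), ([0.0, 1.0], 0)] * 50
avg_loss, learned = progressive_validation(toy_stream)
```

Because every example is used once for evaluation and once for training, no separate held-out set is needed in this setup.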
- ML efficiency
Bandwidth (number of models that can train concurrently); latency (end-to-end evaluation time for a new model); throughput (models that can be trained per unit time).
- Bottlenecks: wider is better, but reducing the embedding dimension is enough. Replace HW with HUV, where W ≈ UV is a low-rank matrix factorization.
- AutoML. Weight-sharing network, RL controller, constraints.
- Data sampling. Re-balancing/Loss-based sampling.
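The bottleneck idea from the efficiency notes (replacing HW with HUV) can be sketched with a parameter count and a tiny numeric check. The layer sizes and the rank below are hypothetical, chosen only to make the arithmetic visible.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def dense_params(d_in, d_out):
    """Parameters in a full weight matrix W of shape (d_in, d_out)."""
    return d_in * d_out

def bottleneck_params(d_in, d_out, rank):
    """Parameters in U (d_in, rank) plus V (rank, d_out)."""
    return d_in * rank + rank * d_out

# Hypothetical sizes: a 1024x1024 layer squeezed through rank 64.
full = dense_params(1024, 1024)           # 1,048,576 weights
low = bottleneck_params(1024, 1024, 64)   # 131,072 weights, 8x fewer

# Tiny check that computing (H U) V gives the same result as H (U V),
# so the full matrix W = U V never needs to be materialized.
H = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
U = [[1.0], [0.0], [-1.0]]
V = [[2.0, 0.0, 1.0]]
via_bottleneck = matmul(matmul(H, U), V)
via_full_matrix = matmul(H, matmul(U, V))
```

Multiplying by U first keeps the intermediate activation at width `rank`, which is where the compute saving comes from.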
- Loss Engineering
Distillation/Shampoo/DCN.
- Rank loss. Pairwise logic; combining rank loss with logistic loss.
- Distillation.
- Curriculums of losses. 2nd order optimization: Shampoo.
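One way to combine a pairwise rank loss with logistic loss is a weighted sum, sketched below. The pairwise logistic form, the mixing weight `alpha`, and all function names are assumptions for illustration; the notes do not specify the exact formulation.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))

def log_loss(logit, label):
    """Pointwise logistic loss for one example with a 0/1 label."""
    return softplus(-logit) if label == 1 else softplus(logit)

def pairwise_rank_loss(logits, labels):
    """Mean pairwise logistic loss over all (positive, negative) pairs.

    Each pair is penalized when the negative scores above the positive.
    """
    pos = [s for s, y in zip(logits, labels) if y == 1]
    neg = [s for s, y in zip(logits, labels) if y == 0]
    if not pos or not neg:
        return 0.0  # no comparable pairs in this batch
    total = sum(softplus(sn - sp) for sp in pos for sn in neg)
    return total / (len(pos) * len(neg))

def combined_loss(logits, labels, alpha=0.5):
    """alpha * rank loss + (1 - alpha) * mean logistic loss."""
    pointwise = sum(log_loss(s, y) for s, y in zip(logits, labels)) / len(logits)
    return alpha * pairwise_rank_loss(logits, labels) + (1.0 - alpha) * pointwise

# A correctly ordered batch should score lower than a reversed one.
good = combined_loss([2.0, -2.0], [1, 0])
bad = combined_loss([-2.0, 2.0], [1, 0])
```

The logistic term keeps predictions calibrated as probabilities, while the pairwise term only cares about relative order, which is why the two are mixed rather than swapped.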
- Irreproducibility: replace ReLU with SmeLU (Smooth reLU).
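A short sketch of SmeLU, assuming the usual piecewise definition with a half-width parameter `beta`: zero below `-beta`, the identity above `beta`, and a quadratic joining the two. The default `beta=1.0` is an arbitrary choice for illustration.

```python
def smelu(x, beta=1.0):
    """Smooth reLU: 0 for x <= -beta, x for x >= beta,
    and (x + beta)^2 / (4 * beta) in between."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if x <= -beta:
        return 0.0
    if x >= beta:
        return float(x)
    return (x + beta) ** 2 / (4.0 * beta)

def smelu_grad(x, beta=1.0):
    """Derivative of smelu: rises linearly from 0 to 1 across [-beta, beta]."""
    if beta <= 0:
        raise ValueError("beta must be positive")
    if x <= -beta:
        return 0.0
    if x >= beta:
        return 1.0
    return (x + beta) / (2.0 * beta)

values = [smelu(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
```

Unlike ReLU, whose gradient jumps from 0 to 1 at the origin, this gradient is continuous, so small differences between training runs are less likely to flip a unit between two very different update regimes.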
On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models
2022-01-02
