Systems Design Crash Course for MLE
Levels of system understanding:
- Foundational knowledge: basics such as clients/servers and network protocols. TCP (Transmission Control Protocol): sends arbitrary amounts of data reliably, resending lost packets; a connection starts with a handshake. HTTP (Hypertext Transfer Protocol): the request/response protocol used on the web. HTTPS: HTTP secured with TLS (the successor to SSL). Peer-to-peer network: a collection of machines that divide a workload among themselves to complete it faster.
- Characteristics: latency/availability/throughput/QPS/redundancy/consistency. Data storage: structured/unstructured, consistency, reliability, persistence, availability. Disk/memory/index. Counting nines: availability expressed as a percentage, e.g. 99.99% ("four nines"). Redundancy: passive redundancy/active redundancy. Polling (the client queries at a fixed interval)/Streaming (the client keeps a connection open and processes data as it arrives).
- Implementation: load balancer/rate limiting/leader election/P2P network/logging and monitoring/Publish & Subscribe (messages are added to topics)
- Tech: ZooKeeper (leader election), Redis, Kafka, gRPC
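The "counting nines" item above is plain arithmetic, so a small sketch may help make it concrete. The function name `allowed_downtime_minutes` and the 365-day year are assumptions chosen for illustration, not part of the original notes.

```python
def allowed_downtime_minutes(availability_percent: float,
                             period_days: float = 365.0) -> float:
    """Return the downtime budget, in minutes, implied by an availability target."""
    # Fraction of the period during which the system may be unavailable.
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * 24 * 60


if __name__ == "__main__":
    # Print the yearly downtime budget for two to five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        minutes = allowed_downtime_minutes(target)
        print(f"{target}% -> {minutes:.1f} minutes/year")
```

Under these assumptions, 99.99% availability leaves a budget of roughly 52.6 minutes of downtime per year, and each additional nine cuts the budget by a factor of ten.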
Additional info:
- Vizier for hyperparameter tuning. Google Vizier: A Service for Black-Box Optimization.
- Techniques for avoiding DoS (denial-of-service) attacks
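Rate limiting, listed under Implementation above, is one common building block for blunting DoS traffic. The notes do not name a specific algorithm, so the token bucket below is only one possible choice, shown as a minimal single-process sketch. The class name `TokenBucket`, its parameters, and the injectable `clock` are assumptions for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket limiter: each request spends one token; tokens refill at a fixed rate."""

    def __init__(self, rate_per_sec: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate_per_sec = rate_per_sec  # tokens added per second
        self.capacity = capacity          # maximum burst size
        self.clock = clock                # injectable so tests can control time
        self.tokens = capacity            # start with a full bucket
        self.last_refill = clock()

    def allow(self) -> bool:
        """Return True if the request may proceed, False if it should be rejected."""
        now = self.clock()
        elapsed = now - self.last_refill
        self.last_refill = now
        # Refill in proportion to elapsed time, never exceeding the capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate_per_sec)
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    bucket = TokenBucket(rate_per_sec=5.0, capacity=10.0)
    decisions = [bucket.allow() for _ in range(15)]
    print(f"allowed {sum(decisions)} of {len(decisions)} burst requests")
```

In a real deployment one bucket would typically be kept per client (for example per IP address or API key) in a shared store, so that every server behind the load balancer enforces the same limit; this sketch is not thread-safe and keeps its state in local memory only.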
What is an ML system design interview and how to prepare for it
Template:
- Problem clarification: understand the problem/ask clarifying questions, including business-related ones (QPS)/data size/latency/energy/memory constraints/corner cases.
- Data collection: user interactions from system logs/human labels/transformation.
- Exploratory data analysis: study of the features/feature importance analysis/dimension reduction. Data distribution (imbalance); calculate the feature distributions, covariance, correlation, PCA, etc. to select features.
- Model metric selection: model type, F1 score, accuracy/precision/recall, mAP, mIoU. Model type (classification, regression, etc.); prefer a single backbone; consider memory/computation usage. Task metrics: classification -> accuracy/precision/recall/F1 score/PR-AUC/ROC-AUC; regression -> MSE/MAE/R-squared/adjusted R-squared; object detection -> IoU/AP; RL -> reward/return/Q-value. System/Hardware: latency/energy/power. Business-related: user retention, daily/monthly active users (DAU/MAU), CTR, new users, engagement time.
- Model training: loss design, architecture, vanishing gradients, optimizer, backpropagation, etc. Loss function: cross entropy/MSE/MAE/Huber loss/hinge loss. Regularization: L1/L2/entropy regularization/K-fold CV/dropout. Optimizer: SGD/AdaGrad/Momentum/RMSProp. Vanishing gradient. Activation functions: Linear/ELU/ReLU/Tanh/Sigmoid. Other: imbalanced data/overfitting/normalization.
- Offline/Online evaluation: A/B testing, model monitoring for feature skew/model freshness.
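For the metric selection step in the template above, it can help to see how the classification metrics relate to each other. The sketch below computes precision, recall, and F1 for binary labels from raw counts. The function name `precision_recall_f1` and the convention of returning 0.0 when a denominator is zero are assumptions for illustration.

```python
from typing import Sequence, Tuple


def precision_recall_f1(y_true: Sequence[int],
                        y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall, and F1 for binary labels (1 = positive, 0 = negative)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Assumed convention: a metric with a zero denominator is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return precision, recall, f1


if __name__ == "__main__":
    y_true = [1, 1, 1, 0, 0]
    y_pred = [1, 1, 0, 1, 0]
    p, r, f1 = precision_recall_f1(y_true, y_pred)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

On imbalanced data, accuracy alone can look high while recall on the rare class is poor, which is one reason precision, recall, and F1 are listed alongside it.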
