Machine Learning System Design Blogs/Website

2022-12-25

Systems Design Crash Course for MLE

Level of system understanding:

  1. Foundational knowledge: basic knowledge like clients/servers, network protocols. TCP: transmission control protocl. Send arbitrary of data and resend, handshake; HTTP: hypertext transer protocol, internet; HTTPS: with secure HTTP, has TLS and SSL. Peer to peer network: a collection of machines which divide workload between themselves to complete the workload faster.
  2. Characteristics: latency/availability/throughput/qps/redundancy/consistency. Data storage: structured/unstructured, consistency, reliability, persistence, availabity. Disk/memoery/index. Counting nines: 99.99%. Redundancy: passive redundancy; avtive redundancy. Polling(fix time to query)/Streaming(wait the request and process)
  3. Implementation: load balancer/rate limiting/leader elections/p2p network/logging and monitoring/Publish&Subscribe(add message/topics)
  4. Tech: Zookeeper(leader election), Redis, Kafka, gRPC

From here

Additional info:

  • Vizier for the parameters tuning. Google Vizier: A service for Black-Box Optimization.
  • Avoid DOS attach tech

What’s ML system design interview and how to prepare for it

Template:

  1. Problem clarification: understand problem/clarify questions and business related(QPS)/data size/latency/energy/memory constraint/corner case.
  2. Data collection: user interactive in system log/human label/transform.
  3. Exploratory data analysis: study of the features/feature importance analysis/dimenstion reduction. Data distribution(imbalance); calculate the feature distribution, covariance, correlation, PCA etc to select features.
  4. Model metric selection: model type, F1 score, accuracy/precision/recall, mAP, mIoU. Model type(classification, regression, etc); prefer to use one backbone, memory/computation usage. Business metric: classification -> accuracy/precision/recall/F1 score/PR-AUC/AR-AUC. regression => MSE/MAE/R-squared/adjusted-squred. object detection: IoU/AP. RL: reward/return/Q-Value. System/Hardware: latency/energy/power. Business-related: user retention, daily/month active users(DAU/MAU), CTR, new users, engage time.
  5. Model training: loss design, architecture, vanish gradient, optimizer, backpropagation etc. Loss function: cross entropy/MSE/MAE/Huber loss/Hinge loss. Regularization: L1/L2/entropy regularization/K-fold CV/dropout. Optimizer: SGS/AdaGrad/Momentum/RMSProp. Vanishing gradient. Activation functions: Linear/ELU/ReLU/Tanh/Sigmoid. Other: imbalanced data/overfitting/normalization.
  6. Offline/Online evaluation: A/B test, model monitor for feature skew/model freshness.

From here