Development Pipeline

  1. Datastore
  2. Auto-scaling
  3. GPU clusters
  4. Spot machines
  5. User management
  6. GKE support
  7. VS Code support
  8. Image management service
  9. Monitoring (Grafana)
  10. Distributed runs / multi-node training (K8s, Horovod)
  11. Hyperparameter search (Optuna, Hyperopt)
  12. Dask clusters

Image management service

  1. Import Docker images
  2. Python version + framework version
  3. Save created images as base images