This is the checklist I use to assess whether a service is ready for production. Items are sorted by importance, from the most critical down to the “nice to have”.

  • CI/CD (routine tasks automated)
  • Basic logs with verbosity levels per subsystem
  • Application metrics
  • Dashboard
  • System metrics
  • Health checks
  • Alerting and runbooks
  • Documented dependencies
  • Backups
  • SLO
  • Horizontal autoscaling
  • Graceful degradation in case of a dependency failure
  • Resource constraints
  • Rate limiting
  • Feature-flag support for non-critical functions
  • Performance tests
  • On-call rotation
  • Multitenancy support (testing in production with different request contexts): https://eng.uber.com/multitenancy-microservice-architecture/
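
Two of the items above, graceful degradation and feature flags for non-critical functions, are easier to picture with a small example. The following is only a minimal sketch in Python, not a recommended implementation: `FLAGS`, `fetch_with_fallback`, and the two recommender functions are hypothetical names invented for illustration, and the in-memory flag store stands in for whatever configuration or flag service a real deployment would use.

```python
from typing import Callable, Dict, List

# Hypothetical in-memory flag store (an assumption for this sketch);
# a real service would load flags from configuration or a flag service.
FLAGS: Dict[str, bool] = {"recommendations": True}


def fetch_with_fallback(
    flag: str,
    call: Callable[[], List[str]],
    fallback: List[str],
) -> List[str]:
    """Return call() if the flag is on and the dependency answers, else the fallback."""
    # Feature flag: a non-critical function can be switched off without a deploy.
    if not FLAGS.get(flag, False):
        return fallback
    try:
        return call()
    except Exception:
        # Graceful degradation: a failing non-critical dependency
        # should not fail the whole request.
        return fallback


def healthy_recommender() -> List[str]:
    # Stand-in for a dependency that responds normally.
    return ["item-1", "item-2"]


def broken_recommender() -> List[str]:
    # Stand-in for a dependency that is currently unavailable.
    raise ConnectionError("recommendation service is down")
```

With the flag on, `fetch_with_fallback("recommendations", broken_recommender, ["popular"])` returns the fallback list instead of raising. Setting `FLAGS["recommendations"] = False` skips the dependency call entirely. A production version would likely also add timeouts and log or count each fallback so that degradation shows up in metrics and alerts.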