Senior SRE Engineer

Underneath the Mox platform is a large-scale, microservice-based cloud architecture that underpins the performance, reliability, and scalability of our products. The Mox SRE team is responsible for the observability, availability, and reliability of the Mox platform. We are looking for motivated individuals to support, automate, and improve Mox’s infrastructure using the latest technology.

  • Serve as a steward of the Mox production environment by providing on-call support, incident response, collaborative debugging, blameless post-mortems, and SRE best practices.
  • Provide monitoring of service availability, latency, and overall system health.
  • Implement systemic improvements to reliability and operational excellence.
  • Codify and roll out shared tooling and processes that enable development teams to stay agile while improving system availability, performance, and maintainability.
  • Collaborate with engineers in peer teams to develop reliability solutions that work effectively in the Mox ecosystem.
  • Bachelor’s degree or equivalent practical experience
  • 5 years of professional experience in reliability engineering or systems engineering
  • Demonstrated understanding of the fundamentals of operating systems, networking, distributed computing, and cloud computing
  • Recent experience with Terraform/Ansible for cloud provisioning
  • Strong scripting skills (Python, Bash)
  • Strong knowledge of Linux/UNIX environment
  • Experience participating in an on-call rotation
  • Experience working with cloud environments such as AWS
  • Understanding of the DevOps toolchain: GitHub, CircleCI, Artifactory
  • A mindset for automation and continuous improvement of the infrastructure
  • Strong communication and collaboration skills