- Build effective monitoring systems to maintain the stability and reliability of applications and services.
- Work closely with internal teams to support troubleshooting and investigation.
- Manage on-call shift/rotation for incident management.
- Lead incident response and conduct post-mortem analysis.
- Other duties as required.
- Degree in IT or Computer Science/Engineering, with at least 4 years of experience in a DevOps environment.
- Familiar with monitoring and supporting large-scale production systems.
- Familiar with cloud services (e.g. Azure Kubernetes) and DevOps/automation tools (e.g. Jenkins, Ansible, Selenium).