Site Reliability Engineer

Job description


  • Oversee the availability, reliability, resilience, performance, security, and monitoring of applications on Azure Cloud and various supporting platforms to ensure that business operational SLAs and SLOs are met.
  • Conduct incident management, cost management, and application health monitoring.
  • Bridge development and operations by applying a software engineering mindset and instilling an Agile approach.
  • Maintain and improve the resiliency of core applications and infrastructure platforms through a continuous improvement backlog.
  • Bring a modern approach aligned with practices such as Infrastructure as Code, Configuration as Code, and DevOps.


  • You are interested in service reliability, automation, monitoring, scalability, and high-availability systems.
  • Experience providing support for customer-facing products and services, including handling incidents under a service management framework and Agile methodologies.
  • Basic understanding of how Docker and Kubernetes work end to end.
  • Basic coding or scripting experience to support our systems when issues arise.

To apply, please click "APPLY NOW" or email Han at quoting reference number AGP272835
Data provided is for recruitment purposes only.

Due to the volume of applications received, we regret to inform you that only shortlisted candidates will be notified.
JTK Number: JTKSM 995 | Company Registration Number: 201301019088 (1048918-T)

If this job isn't quite right for you, but you know someone who would be great at this role, why not take advantage of our referral scheme? We offer MYR500 in shopping vouchers for every referred candidate who we place in a role. Terms & Conditions Apply.