The Senior Site Reliability Engineer will design and maintain Infrastructure as Code solutions, enhance cloud infrastructure, lead incident responses, and mentor junior engineers.
Joining Razer will place you on a global mission to revolutionize the way the world games. Razer is a place to do great work, offering you the opportunity to make a global impact while working with a team spread across 5 continents. Razer is also a great place to work, providing the unique, gamer-centric #LifeAtRazer experience that will accelerate your growth, both personally and professionally.
Job Responsibilities:
We are seeking a skilled and driven Senior Site Reliability Engineer (SRE) to join our growing infrastructure and platform engineering team. The ideal candidate will have hands-on experience in Amazon Web Services (AWS), strong troubleshooting capabilities, and a passion for building scalable, observable, and resilient systems using modern Infrastructure as Code (IaC) and automation tools.
REQUIREMENTS:
- Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related field.
- Minimum 3 years of experience in SRE, DevOps, cloud infrastructure, or system administration roles.
- Hands-on expertise with AWS Cloud Services, including:
  - Compute & Containerization: EC2, Lambda, ECS, EKS, Auto Scaling
  - Networking: Load Balancers, VPC, Route 53, Security Groups, Firewalls
  - Storage & Databases: RDS, ElastiCache, Athena, S3
  - Messaging: SQS, SES
- Deep understanding of Infrastructure as Code (IaC) tools such as Terraform and CloudFormation.
- Proficiency in at least one programming/scripting language: Python, Node.js, Bash, Ruby, or related.
- Experience operating and troubleshooting across Linux, Windows, and container-based environments.
- Strong understanding of distributed systems, cloud networking (routers, switches), firewalls, DNS, and HTTP/TLS.
- Experience implementing monitoring and alerting systems and working with incident management processes.
- Experience with zero-downtime deployment strategies, such as blue/green or canary deployments.
- Familiarity with cost optimization and right-sizing AWS resources.
- Exposure to multi-region, multi-account AWS architecture.
- Understanding of API gateways or edge networking (e.g., Akamai, CloudFront).
JOB DESCRIPTION:
- Design, implement, and maintain Infrastructure as Code (IaC) solutions using Terraform and/or CloudFormation across multi-account AWS environments.
- Collaborate with developers, architects, and DevOps teams to build scalable, secure, and observable cloud infrastructure.
- Lead and participate in architecture design sessions, focusing on system reliability, scalability, security, and performance.
- Implement and manage robust monitoring, alerting, and observability solutions (e.g., CloudWatch, Prometheus, ELK, Datadog).
- Set and monitor Key Performance Indicators (KPIs) for system uptime, latency, throughput, and overall reliability.
- Drive incident response processes, including coordination, triaging, resolution, documentation, and post-incident reviews (PIRs).
- Supervise and mentor junior SREs and infrastructure engineers, fostering knowledge-sharing and team growth.
- Collaborate across development, operations, and security teams to ensure secure and compliant deployments.
- Automate manual tasks and workflows through scripting and tooling (Python, Node.js, Bash, Ruby, JSON/YAML).
- Troubleshoot complex infrastructure issues across Linux, Windows, Docker, and cloud-native environments.
- Provide IaC and CI/CD best practices to ensure repeatability, scalability, and compliance across all environments.
- Provide on-call support, participate in incident rotations, and lead technical investigations during outages or degradations.
- Demonstrate a strong understanding of, and experience with, Disaster Recovery (DR).
- Provide support and resolution for assigned incidents and tickets.
Are you game?
Top Skills
Amazon Web Services (AWS)
Bash
CloudFormation
CloudWatch
Datadog
Docker
ELK
Linux
Node.js
Python
Ruby
Terraform
Windows