The SRE role spans software development and operations engineering. It will be instrumental in building stable, scalable, and reliable systems for a growing suite of native SaaS products, and in reducing the friction the engineering team faces when developing and deploying them. You will help build a best-in-class cloud platform and services. You will also take part in the incident management process, both to maintain the predefined SLAs and to communicate with internal and external stakeholders.
- Proactively monitor the availability and performance of the cloud offerings using key tools. Take a holistic view of system health.
- Take part in, and take responsibility for, creating and maintaining well-defined runbooks and playbooks that keep MTTR low.
- Respond quickly and effectively to monitoring alerts and incident tickets coming into the Site Reliability team
- Build software and systems to automate provisioning and management of foundation infrastructure and services
- Improve reliability, quality, and time-to-market of our suite of software solutions
- Gather and analyze metrics from systems and services to assist in performance tuning and fault finding with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
- Partner with development teams to reduce day-to-day toil and create sustainable systems and services through automation at all stages of the engineering and deployment process
- Participate in system design consulting, platform management, and capacity planning
- Balance feature development speed and reliability with well-defined SLIs and SLOs. Define, measure, and optimize SLIs and SLOs.
- Excellent communication skills; collaborative and personable
- Cloud computing fundamentals (SaaS, PaaS, IaaS) and experience designing and implementing SaaS solutions
- Hands-on experience with IaC (Infrastructure as Code), CI/CD, and modern tooling such as Terraform, CDK, CloudFormation, and GitLab CI.
- Cloud certification and experience with AWS, Azure, or GCP.
- Experience in Windows and Unix environments
- Experience managing container-based services in production using tools such as Docker and Kubernetes
- Ability to script (or, ideally, code) in Shell, Python, or Go