Senior Site Reliability Engineer

Remote, USA
$135k - $194k

Your passion for uptime was forged deep within the Fires of Production and tempered on the Great Anvil of Incident Response. You're an Expel Site Reliability Engineer - a protector and champion of Expel's reputation for service reliability.

You understand that operational reliability is a shared mission across all of engineering, and that your role is to make it as easy as possible for Expel to achieve that mission. And the only thing that excites you more than successfully diagnosing that cryptic kubelet error is gleefully watching as feature teams leverage a well-defined error budget to make effective tactical bets and trade-offs.

You use your passion for reliability and your connection with the broader SRE community to ensure Expel sticks to reliability best practices within the cloud native ecosystem.

Innovation comes naturally to you, but you're also eager to help others. You thrive when operating in a support or consultative capacity.

We're extending the Expel Core Platform by building features that enable users to self-serve their own service reliability, monitoring, SLO, and incident response needs. Does that idea whet your appetite? If so, we should chat!

What Expel Can Do For You

  • Provide experience growing and maintaining reliability-focused platform features within a cloud native engineering platform using cutting-edge infrastructure and tooling (Kubernetes/GKE runtime, HashiCorp toolset, etc.).
  • Provide exposure to the information security space.
  • Work as part of a geographically distributed team in a highly collaborative culture where team members learn from and support each other.

What You Can Do For Expel

  • Play your part through project work to build and maintain platform features that cut across the Expel product's reliability, networking, and cloud infrastructure.
  • Contribute by pushing IaC commits daily, with occasional opportunities to write and test application code in Python, Golang, and JavaScript.
  • Mentor service owners on how to use the platform to deploy, measure, monitor, and operate their own services at scale.
  • Participate in a weekly support rotation that includes taking the on-call pager and providing nearly on-demand working-hours support to platform users.
  • Provide incident response, triage, and root cause analysis support.
  • Poke fun at our leadership team in creative ways.

What You Should Bring With You

  • A passion for learning and improving your work product
  • Significant experience operating Kubernetes within highly distributed environments
  • Experience running systems in GCP or AWS
  • Exposure to monitoring and observability infrastructure and standard methodologies
  • An understanding of infrastructure-as-code practices, tools, and patterns
  • Some experience developing software in Linux environments, preferably with Python and/or Golang
  • A customer-minded approach that enables the success of platform users and builds trust across the organization
  • A collaborative disposition that allows you to work well within and across teams
  • Four years of systems experience in either operations or development

Missing some items on the list? That's ok! We still want to talk to you!

How Our Team Works Together

We build and run teams where everyone is pulling in the same direction and is learning from each other:
  • We work out of a shared backlog
  • We pair program weekly, when it makes sense
  • We peer-review everything
  • We do weekly blame-free retros to reinforce what's going well, so we do more of it, and surface what's not going well, so we can do something about it. We do the same for projects and significant operational problems.

Our hiring process

We respect your time. You'll hear from us by the end of the next business day after completing an interview.

We also have a goal that all Expletives have a great manager and have a voice in how their team is run and who runs it. It's not the shortest process in the industry, but you'll get to meet nearly everyone you'll work with day-to-day and your Engineering leadership. New Expletives consistently say our interview process gave them an accurate picture of what it's like to work here.

Here's our 4-stage process for this position (5.5 hours total interviewing time):
  • Chat with a recruiter (30 minutes)
  • Video interview with hiring manager (Engineering Manager) (60 minutes)
  • Pair programming interview (with two engineers) (60 minutes)
  • "Virtual onsite interview" (can be scheduled contiguously or broken up, 60 minutes each):
      • Engineering leadership (Engineering Director and Manager of Delivery Experience)
      • System design interview (with two engineers)
      • Technology and skills interview (with two engineers)

The Details

The base salary range for this role is between $135,000 USD and $194,400 USD, plus bonus eligibility and equity.

We believe in paying transparently and equitably. Your salary will ultimately be based on factors such as your experience, skills, team equity, and market data. You'll also be eligible for unlimited PTO (which we model and encourage), work location flexibility, up to 24 weeks of parental leave, and really excellent health benefits.

This role will be based out of our offices in Herndon, Virginia. We will consider remote work for this position.

At Expel, we ask our crew to provide their COVID-19 vaccination status because it's helpful to understand this data at a company level, and we expect that customers, partners, and conferences will start asking us to attest to the vaccination status of our people.

We're only hiring those authorized to work in the United States. We do not currently sponsor immigration visas.

We're an Equal Opportunity Employer: You'll receive consideration for employment without regard to race, sex, color, religion, sexual orientation, gender identity, national origin, protected veteran status, or on the basis of disability.

We'll ensure that individuals with disabilities are provided reasonable accommodation to participate in the job application or interview process, to perform essential job functions, and to receive other benefits and privileges of employment. Please let us know if you need accommodation of any kind.


