Site Reliability Engineer

Remote, anywhere

Sound is hiring a Site Reliability Engineer to help shape the future of a new music economy that values artists and their music while connecting fans more closely to the music they love.

Sound is a suite of web3-native music and economic tools powering the next generation of artists and their communities. We’re passionate about helping artists capture more value from their art, and connecting fans more closely to the music they love. Since launch, we’ve onboarded over 200 artists (including Snoop Dogg, Pussy Riot, Salem Ilese, RAC, Soulection, and more) and generated over $3.5 million in proceeds that have gone directly to those artists.

As a Site Reliability Engineer, you’ll be responsible for providing a world-class experience to our users and artists by detecting and resolving incidents, minimizing outages, and collaborating with product owners to understand user engagement on Sound. We’re looking for a software engineer with a curious mind and the passion to build the next generation of SRE tools and processes within Sound.

What you'll be doing:

  • Expand customer-facing services and keep them available and performing well by maintaining the health of supporting systems
  • Work closely with our engineering team to define best practices and goals around availability and resiliency
  • Act in key response roles during major incidents and participate in the technical review of each incident
  • Contribute to technical design and architecture discussions and decisions as well as technical troubleshooting across our stack
  • Design, build and operate core infrastructure that enables scaling to support hundreds of thousands of concurrent users
  • Set up and perform regular load and stress testing, interpret the results, and lead the implementation of improvements that address bottlenecks, increase resiliency, and improve scalability
  • Develop, manage and operate real-time production monitoring, instrumentation and telemetry
  • Operate in a fast-paced environment, troubleshooting complex issues quickly while juggling multiple priorities

Who we're looking for:

  • 4+ years of experience in a senior hands-on site/system reliability role
  • Proficiency in TypeScript and GraphQL
  • Experience building and scaling services using technologies from AWS and Cloudflare
  • Experience deploying and operating EKS services and databases such as Postgres and Redis
  • Ability to participate in an on-call rotation and to work independently with minimal supervision
  • Excellent problem-solving skills with a systematic and thorough approach and a bias for action
  • Track record of diagnosing problems within complex systems


  • Experience with message queues and caching infrastructure at scale
  • Experience designing and implementing microservices and event-driven architectures
  • Experience with modern frameworks (React, Relay, Next.js)
  • Understanding of Ethereum, Arweave, and IPFS architectures
  • History of open source contributions

Benefits at Sound:

  • Top-of-the-line benefits, including health, mental health, dental, and vision insurance
  • Remote-first teamwork with team and community members around the world
  • Work-from-home/remote office stipend
  • Team offsites for periodic collaborative strategy sessions in person
  • Passionate, supportive team dedicated to learning and growing together in web3

Sound is an equal opportunity employer. We do not discriminate based on gender, ethnicity, sexual orientation, religion, age, civil or family status, disability or race.

