AI Infrastructure Engineer

Remote, USA

As a member of the AI Infrastructure team at Scale, you will be responsible for building systems that accelerate the development and deployment of machine learning models built by our Research team. Our models span computer vision, deep learning, and natural language processing, and are trained on massive datasets to deliver improvements to our customers.

We are building a large hybrid human-machine system in service of ML pipelines for dozens of industry-leading customers. We currently complete millions of tasks a month, and will grow to complete billions of tasks monthly.

You will:

  • Build elastic data pipelines that process billions of events per day.
  • Build highly available and observable model inference services.
  • Work with our ML Research team to automate aspects of our pipeline and deploy research models in production.
  • Work with our Infrastructure team to build core abstractions and create standards and best practices for building systems.
  • Be a self-starter who can own projects end-to-end, from requirements and scoping through design and implementation.
  • Have good taste in building systems and tools, know when to make build-vs.-buy tradeoffs, and keep an eye on cost efficiency.
  • Pay attention to detail and have a good sense for automation, debugging, and troubleshooting.

Ideally you'd have:

  • Solid background in algorithms, data structures, and object-oriented programming.
  • Experience in building scalable and fault-tolerant distributed systems that process large volumes of data.
  • Degree in computer science or related field.
  • Please note: this role is not open to new grads or interns at this time.

Nice to have:

  • Experience working with a cloud technology stack (e.g., AWS or GCP).
  • Experience building machine learning training pipelines or inference services in a production setting.
  • Experience building, deploying, and monitoring complex microservice architectures.
  • Experience with machine learning frameworks and libraries (PyTorch, TensorFlow, Kubeflow, Seldon).
  • Experience with big data tools (Spark, Flink, Hadoop) and building ETL and streaming pipelines.
  • Experience with Python, Docker, Kubernetes, and infrastructure as code (e.g., Terraform).

About Us:

At Scale, we believe that the transition from traditional software to AI is one of the most important shifts of our time. Our mission is to make that happen faster across every industry, and our team is transforming how innovative products are built with machine learning. Our products provide access to human-powered data for hundreds of use cases and are used by industry leaders such as OpenAI, Lyft, GM, Samsung, Airbnb, NVIDIA, and many more. We recently raised $325 million in Series E funding at a valuation of $7B+ and are expanding our team to accelerate the development of AI applications.

We believe that everyone should be able to bring their whole selves to work, which is why we are proud to be an inclusive and equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability status, gender identity, or veteran status.

We are committed to working with and providing reasonable accommodations to applicants with physical and mental disabilities. If you need assistance and/or a reasonable accommodation in the application or recruiting process due to a disability, please contact us. Please see the United States Department of Labor's EEO poster and EEO poster supplement for additional information.


