DevOps and Site Reliability Engineer (SRE)
About GameAnalytics
From indie developers and game studios to established publishers, GameAnalytics is currently the #1 analytics tool for anyone building a mobile game. Our network is approaching 100,000 games, which are played by more than 1 billion people each month.
What's our mission? To help game developers make the right decisions based on data. By joining our team, you'll work on new and innovative products that help tens of thousands of people in the industry do just that.
All of GameAnalytics' infrastructure is hosted in AWS and managed using Terraform. We operate numerous Erlang and Elixir clusters handling over 4.5 million requests per minute. Our backend services are built and deployed automatically and reliably using a combination of Terraform, Ansible, Packer, Docker and CI tools. Our data pipeline feeds various sources using real-time streams and batch ingestion jobs. We use Graphite and CloudWatch to aggregate application and host metrics, and PagerDuty to raise alerts.
At GameAnalytics we value high-quality code and take reliability very seriously. Despite rapid business growth over the past two years, we have achieved 100% uptime of our core systems and plan to keep it that way. We love open source and welcome contributions. See our GitHub page.
What you’ll do
As a DevOps and Site Reliability Engineer you will be the key person ensuring that our cloud infrastructure continues to be secure, stable and cost-efficient by building continuous integration and deployment pipelines, setting up monitoring tools and promoting best practices while working closely with engineering teams on both new and existing projects.
What we’re looking for
- Experience deploying and operating distributed systems at scale
- Experience working with cloud providers (AWS, GCP, Azure)
- Experience with popular infrastructure provisioning and management tools, such as Terraform, Ansible, Packer
- Good understanding of continuous integration/deployment tools and practices
- Security-oriented mindset and an understanding of security best practices
- In-depth knowledge of Linux internals, networks and filesystems
- Experience with monitoring and alerting systems (we use Graphite, CloudWatch and PagerDuty)
- Experience maintaining production databases (we use Apache Druid, Redis and various RDBs)
- Knowledge of streaming and big data technologies such as Hadoop and Kinesis/Kafka
- Willingness to take part in the on-call rota along with other engineers
What we offer
- Food, snacks and drinks
- A fun and supportive working environment
- Office in Central London (with bike racks) and an entertainment area including the latest consoles and popular games
- Opportunity to work with the biggest and most innovative gaming companies in the world
- 25 days of paid holiday, plus bank holidays
- Working-from-home policy
- Work laptop (Mac or Linux), depending on your preference