Site Reliability Program Manager

Salary: Competitive

At Flexera, Site Reliability Engineering (SRE) is responsible for the reliability of our SaaS offerings. The team works with product development to define our Service Level Objectives (SLOs) and performs the work required to ensure we meet them. SRE teams employ agile and lean principles in a culture of continuous learning and improvement.


As a Program Manager, you will be responsible for ensuring that SRE works well with Product Development teams: syncing with Product Owners to help plan the roadmap and identify risk early, and syncing with Product Development leads to help identify pain points in our existing systems. You will also be responsible for ensuring that our current product services meet SRE standards: that the appropriate metrics are in place for each product service, that our deployment pipelines are optimal, and that we adhere to any contractual and regulatory obligations we may have.

Responsibilities

- Own end-to-end availability for a product service

- Work with product service teams to establish SLIs and error budgets, and nurture an environment that appreciates the value they add

- Identify opportunities for increased monitoring capabilities (white-box and black-box)

- Identify long-term trends for product services (How is traffic growing over time? How big is the database getting? What do our resource usage patterns look like over time?)

- Ensure that short-term hacks are replaced with long-term solutions

- Coordinate incident response as part of an on-call rotation, ensure that SREs aren't overloaded by on-call, and continually refine the processes and tools that enable successful incident response

- Ensure that RCAs are carried out effectively and in a blame-free manner

- Attend the portfolio management team meetings to flag reliability considerations for upcoming work, and to reason about any reliability concerns from other stakeholders

- Populate the SRE backlog

- Identify requirements surrounding load testing, security testing, availability and disaster recovery

- Help mature the delivery process for teams: defining Jenkins pipelines, designing canary release deploys, building in automated fallbacks, optimizing the build chain, etc.

- Optimise product service code to ensure that it's secure, scalable and performant

- Optimise release engineering code to ensure that it's stable, repeatable and fast

- Improve the fault detection for our services

- Create dashboards which help communicate the metrics for a given product service

- Work with product owners and product engineering teams to perform capacity planning

- Work with product engineering teams to understand performance and behavior patterns

- Help carry out root cause analysis for incidents, and design solutions (both software and human processes) that will help to ensure the same problem doesn't happen in the same way again

Minimum Qualifications

Computer Science degree, or at least 2 years of related industry experience managing a mission-critical production team

Critical Skills / Competencies

- Comfortable writing code with one or more of the following languages: Python / Go / Java / C# / C / C++

- Experience working with product owners and product development to prioritise work, flag risk and identify potential production engineering issues (e.g. scalability, resiliency, performance)

- A positive attitude and willingness to learn

- Experience with IaaS and Serverless services from a cloud provider

- An understanding of TCP/IP and DNS, and experience designing networks

- Linux system administration experience

- Strong conflict resolution competence

- Excellent written and verbal communication skills

- Experience implementing fault detection, and automating fixes

- An understanding of a range of data storage technologies, including SQL databases

- Experience designing scalable services

- Experience designing distributed, fault-tolerant systems

- Experience managing services in AWS

- Detail-oriented: the ideal candidate naturally digs as deep as needed to understand the why

Bonus Skills

The following items are not prerequisites for the role, but they may give you a better idea of what you can expect to come across in your SRE – Program Manager role at Flexera:

- Python / Golang / Java / C# / C / C++ / Bash experience

- Jenkins pipelines

- MSSQL, Informix, Elasticsearch

- Terraform, Packer & Docker

- Zabbix, New Relic, ELK, Prometheus, Datadog

- Security background

Perks and benefits

This job comes with several perks and benefits

Free coffee / tea

Get your caffeine fix to get you started and keep you going.

Maternity / paternity leave

Kids are the future, go spend time with them.

Pension plan

We take care of you, even when you are old and wrinkly.

Social gatherings

Social gatherings and games; hang out with your colleagues.

Flexible working hours

Time is precious. Make it count. Morning person or night owl, this job is for you.

Near public transit

Easy access and a treehugger-friendly workplace.

Working at Flexera

Flexera and BDNA have built the largest and most comprehensive repository of market intelligence on technology assets on the planet. We connect decision makers to the systems and information they need by enabling a common data language and view across their business. The world’s largest repository of software and hardware asset, vulnerability, and open source data platform will unite the software industry and strengthen the supply chain everyone depends upon. We’re the best place ever for people looking for great camaraderie, high energy and impactful work. Talent. Experience. A desire to upend the software business. And give back in ways that matter. You in? We want to reimagine how software is bought, sold, managed, and secured. We’re the best place ever for people who want to have a say in how this gets done.

