reliability engineer
20 in the last 253 days, most recently 2023-07-22
Responsibilities
- you will have proven experience in managing cloud-based production systems at scale, including design, capacity planning, deployment, performance monitoring and troubleshooting
- accountability for executing the technical vision and ensuring it is aligned with business goals
- tech stack
- you will be accountable for designing, implementing and maintaining systems that ensure the reliability, scalability and availability of our software products
- about the Team
- gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding
- starting from scratch, we have built a deep tech-stack with AWS services at its core
- create and execute a strategy for monitoring, alerting & automation tools that improve system reliability, scalability & stability
- we’re a team of around 45 engineers and digital specialists of mixed disciplines and experience distributed across the UK and US
- identifying platform bottlenecks and turning them into deliverables
- partner with development teams to improve services through rigorous testing and release procedures
- taking an active role in our world class Incident Response team
- proactively identify, monitor and flag possible improvements
- about the role
- lead incident management response in production systems
- run our SRE team, including management of all critical infrastructure, reliability, and performance controls
- monitoring tools and instrumentation: Datadog or similar observability platforms
- participate in system design consulting, platform management, and capacity planning
- problem solving and resolution of incidents caused by application or infrastructure issues for clients
- as a Site reliability engineer at Citizens Advice you will develop and advocate for best practices, engineering standards and improved operational processes
- BPP Education is entering a new phase of its growth and evolution, attracting thousands more students each year and expanding into new verticals and new markets globally
- Novata is a public benefit corporation created and funded by a unique consortium of foundations and private sector companies including the Ford Foundation, the Omidyar Network, S&P Global, and Hamilton Lane
- you will contribute to building reliable, performant, auto-scalable and highly available systems
- expand, extend and maintain our existing automation and operations infrastructure
- AWS expertise; familiarity with core services
- picking up support escalation issues related to platform performance
- analyse system metrics and logs to identify opportunities for improvement and prevent future or recurring incidents
- create sustainable systems and services through automation and uplifts
- designing, developing, testing and deploying infrastructure using IaC
- you will be building and continuously improving core automation tools and frameworks to help our digital teams release code safer and faster
- the BPP Product & Technology organisation is evolving rapidly, and driving transformation of its platforms, digital products and experiences, in order to help BPP Education scale and meet the growth of the business in the coming years
- we are a for-profit, mission-driven company with the goal of empowering the private markets to build a more inclusive and sustainable form of capitalism
- improve on-call processes
- collaborate with your peers in architecture, product, design, data and security to identify & mitigate risks to the system reliability
- you will have support of the existing Platform Infrastructure team
- software development or strong scripting experience including but not limited to Golang, Python, or Bash
- balance feature development speed and reliability with well-defined service level objectives
- actively maintaining customer environments to best-practice standards, e.g. patching and upgrades
- we value passion, creativity, quality, and simplicity. You’ll be given the opportunity to make an impact and directly contribute to our success, bring new ideas, and have full ownership of your area
- you will have gained a solid programming background and exceptional DevOps literacy
- we’re looking for a talented principal software reliability engineer to help us build best-in-class products and deliver scalable, secure and performant experiences that delight and engage learners during their time studying with BPP and beyond, throughout their working lives
- by bringing together the brightest minds in financial data, private equity, social justice and inclusive capitalism, we aim to build the best technology and tools for the private markets to better collect, analyze, benchmark and report ESG data
- the application of our engineering principles through an SRE lens
- PagerDuty, Slack, and related tooling integrations
- building software, mostly in Java, to implement the solutions identified during investigation phases
- we are looking for someone who has a track record of operating complex systems guided by industry best practice and experience
- contribute and evolve the internal software engineering practices and standards as the team scales
- proactively document solutions and team knowledge
- DevOps-oriented client work such as building CI/CD pipelines
- full onboarding support and continued development opportunities
- our global team, of more than 35 nationalities, have fun together, whether it is during a typical Swedish Fika, a workshop, going for a run with a colleague, or spending time together during the annual company Goal trip, team kickoffs, or after-work socials
- as the Principal Software Reliability Engineer, you will report to the Engineering Manager, bringing your technical expertise to our growing product engineering teams, leveraging modern software development practices that will deliver business value at pace
- we have an experienced and diverse team who are as ambitious about growing a successful company as we are about making an impact
- raising medium and large identified deliverables into internal projects
- be up-to-date with industry best practices, new technologies, and emerging trends
Requirements
- site reliability engineering principles
- performance, scalability
- working with Linux platforms
- experienced in implementing cloud solutions
- ability to contribute to a purpose-driven, people-focused, collaborative and transparent organisation and team culture
- HTTP, web services, REST
- experience working on technical projects within an agile team
- proven ability to keep up to date with technology and take individual responsibility for learning new technologies
- proven experience in a similar software engineering or SRE role working in an agile environment
- containers, cloud
- infrastructure as code
- familiarity with Bash, SQL and NoSQL, IoT
- involvement in deploying, managing and operating medium- to large-scale production systems
- core Java and Java EE technologies: EJB, JPA, JTA, JMS
- experience using Ansible to automate configuration and patching
- familiarity with at least one scripting or programming language - Bash, Golang, Python, SQL and NoSQL, IoT
- familiarity with Kubernetes clusters on premises or in the cloud
- Terraform
- deep knowledge of cloud networking, security and native functionality in AWS
- configuration management
- great communication skills/team player/teacher
- understanding of Linux as a runtime environment
- Apache Ignite/GridGain
- working with the Atlassian suite of products
- willingness to occasionally travel within the UK for meetings
- Kusto
- testing, reliability, monitoring
- must-have
- commercial experience with Software Engineering, Software Development, or system operations/administration
- experience of coding in Ruby or Python
- expertise in Infrastructure as Code using frameworks such as Terraform
- cloud experience - AWS infrastructure
- measure what you make - We live and breathe data
- familiarity with cloud-native concepts and virtualisation
- Java profiling skills (heap dump analysis, jstack investigations, GC log processing, etc.)
- interest in developing no-code tools
- SQL Server and SQL scripting
Salaries at other companies for the position of reliability engineer
HARTLEYCO | £16,666 | £10,833 to £16,666
MORGAN MCKINLEY GROUP | £14,400 | £10,833 to £16,666
CONCEPT RESOURCING | £14,000 | £10,833 to £16,666
HEAT RECRUITMENT | £14,000 | £10,833 to £16,666
SERVICE CARE SOLUTIONS | £13,000 | £10,833 to £16,666
EXPERIS | £12,800 | £10,833 to £16,666
SQUARE ONE PHARMA RESOURCES | £12,725 | £10,833 to £16,666
OXFORD KNIGHT | £12,500 | £10,833 to £16,666
ZENITH PEOPLE | £11,000 | £10,833 to £16,666
TRIA RECRUITMENT | £10,833 | £10,833 to £16,666