
Site Reliability Engineering Foundation (SRE)





Introduces a range of practices for improving service reliability through a mixture of automation, working methods, and organizational re-alignment. Tailored for those focused on large-scale service availability.

The SRE (Site Reliability Engineering) Foundation℠ course is an introduction to the principles and practices that enable an organization to scale critical services reliably and economically. Introducing a site-reliability dimension requires organizational re-alignment, a new focus on engineering and automation, and the adoption of a variety of new work paradigms.

The course highlights the evolution of SRE and its future direction, and equips participants with the practices, methods, and tools to engage people across the organization who are involved in reliability and stability, illustrated through real-life cases and scenarios. On completing the course, students will have tangible practices they can apply in their daily work, such as establishing and tracking Service Level Objectives (SLOs).

The course was developed by leveraging key SRE sources, interacting with thought leaders in the SRE space, and working with organizations adopting SRE to extract real-life best practices. It is designed to teach the key principles and practices needed to initiate adoption of SRE.

Attending the class prepares individuals to take the exam and earn the Site Reliability Engineering (SRE) Foundation℠ certification.


At the end of this course students will have a practical understanding of:

  • The history of SRE and its emergence at Google
  • The interrelation of SRE with DevOps and other well-known frameworks
  • The principles behind SRE
  • Service Level Objectives (SLOs) and their focus on the user
  • Service Level Indicators (SLIs) and the current monitoring landscape
  • Error budgets and associated policies
  • Toil and its effect on the productivity of the organization
  • Some practical steps to help eliminate toil
  • SRE tools, automation techniques and the importance of security
  • Anti-fragility, the approach to failure, and failure testing
  • The organizational impact that SRE introduces

Student Profile

This course is designed for professionals including:

  • Anyone who is just starting to work on reliability
  • Anyone interested in IT leadership and organizational change approaches
  • Business managers
  • Change agents
  • Consultants
  • Engineers
  • IT Directors
  • IT managers
  • Product owners
  • Scrum Masters
  • Software engineers
  • Site Reliability Engineers (SRE)
  • Systems integrators
  • Tool Providers


There are no formal prerequisites for this course, but students should understand common DevOps terminology and concepts and have related work experience.

Course Materials

Each student will receive a copy of the course documentation prepared by DevOps Institute.


This is an engaging, interactive course. Our instructors teach all course materials using the demonstrative method; participants learn new concepts through exercises and practical application.


At the end of the training session, students will be able to obtain the DevOps Institute SRE Foundation certification by successfully passing the Site Reliability Engineering Foundation exam.

Additionally, students will earn 16 credit hours for their attendance.


A certificate of attendance will be issued to students who attend the course for at least 75% of the duration.

Course Outline

  1. SRE principles and practices
    • What is Site Reliability Engineering?
    • SRE and DevOps: what is the difference?
    • SRE principles and practices
  2. Service Level Objectives (SLO) and error budgets
    • Service Level Objectives (SLO)
    • Error budgets
    • Error budget policies
  3. Reducing toil
    • What is toil?
    • Why is toil bad?
    • Taking action against toil
  4. Monitoring and Service Level Indicators (SLI)
    • Service Level Indicators (SLI)
    • Monitoring
    • Observability
  5. SRE tools and automation
    • Automation defined
    • Focus on automation
    • Hierarchy of automation types
    • Secure automation
    • Automation tools
  6. Anti-fragility and learning from failure
    • Why learn from failure
    • Benefits of anti-fragility
    • Shifting the organizational balance
  7. Organizational impact of SRE
    • Why Organizations Choose SRE
    • Patterns for SRE Adoption
    • On-call needs
    • Blameless postmortems
    • SRE and scaling

Public Classes

Currently, we don't have any public sessions of this course scheduled. Please let us know if you are interested in adding a session.


Course Details


JST 355


2 days

Delivery Mode

Virtual, Face-to-Face


Contact Us

Netmind US
3372 Peachtree Rd NE, Ste 115
Atlanta, GA 30326
T. +1 (678) 366.1363

Office Hours:
Monday – Friday, 8:30 – 5:00 EST
