The Art and Science of Site Reliability Engineering with Liz Fong-Jones
2024/10/09
Duration: 33 min
Podcast

Summary
In this exciting episode of Cloud Dialogues, we are joined by Liz Fong-Jones, Field CTO at Honeycomb and former Google SRE, to explore the fascinating world of Site Reliability Engineering (SRE)—a game-changer for scaling and automating large systems.

What We Covered:

1. Meet Liz Fong-Jones: Liz brings over a decade of SRE experience from her time at Google and Honeycomb, helping companies revolutionize how they manage reliability and automation.

2. The Origin Story: SRE actually predates the cloud! Born at Google in the early 2000s, SRE started as a way to automate manual system administration tasks and has since evolved into its own discipline, running parallel to DevOps.

3. SRE at Its Core:
   - Minimize repetitive work (aka "toil") by automating everything you can.
   - Use Service Level Objectives (SLOs) and Service Level Indicators (SLIs) to measure and maintain reliability.

4. Different SRE Models: There are different ways to implement SRE:
   - Tools-based within platform teams
   - Consultative SREs parachuting in to help teams
   - Embedded SREs integrated within every team

5. The SRE Mindset: Curiosity and empathy are essential for SREs. Teams need a culture of psychological safety where concerns can be raised without fear.

6. The Magic of SLOs and SLIs: SLOs set reliability targets (like aiming for 99.5% uptime), while SLIs measure performance against those targets. Together, they ensure your systems are running smoothly.
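
To make the 99.5% figure concrete, here is a minimal sketch of how an SLO target turns into an error budget and how an SLI is compared against it. The function names, the 30-day window, and the request counts are illustrative assumptions and do not come from the episode.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Downtime (in minutes) a time-based availability SLO allows per window."""
    total_minutes = window_days * 24 * 60
    return (1.0 - slo_target) * total_minutes


def availability_sli(good_events: int, total_events: int) -> float:
    """SLI as the fraction of events (e.g. requests) that were good."""
    if total_events == 0:
        return 1.0  # no traffic: treat as fully available (an assumption)
    return good_events / total_events


def budget_remaining(slo_target: float, good_events: int, total_events: int) -> float:
    """Fraction of the error budget still unspent; negative means the SLO is missed."""
    allowed_bad = (1.0 - slo_target) * total_events
    actual_bad = total_events - good_events
    if allowed_bad == 0:
        return 0.0 if actual_bad == 0 else float("-inf")
    return 1.0 - actual_bad / allowed_bad


if __name__ == "__main__":
    # Hypothetical numbers: a 99.5% SLO and 30 failed requests out of 10,000.
    print(f"budget per 30 days: {error_budget_minutes(0.995):.0f} minutes")
    print(f"SLI: {availability_sli(9_970, 10_000):.4f}")
    print(f"budget remaining: {budget_remaining(0.995, 9_970, 10_000):.0%}")
```

Under these assumptions, a 99.5% target over 30 days leaves roughly 216 minutes of allowable downtime, which is the kind of number teams can use when trading reliability against cost.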

7. FinOps Meets SRE: Liz explains how SREs can help balance reliability, performance, and costs using SLOs to allocate resources more efficiently.

8. Disaster Testing: Want proof SREs are ready for anything? Honeycomb regularly tests its disaster recovery by taking down an entire availability zone—on purpose!

9. Pro Tips for Executives: Thinking about implementing SRE at your company? Liz suggests starting with your biggest challenges, offering executive support, and setting clear, achievable SLOs.

10. Why Observability Matters: Observability is the backbone of SRE. Having real-time, actionable data is key for setting and managing effective SLOs.

Plus, Liz talks about her favorite ARM processors (for cost and environmental savings) and shares insights from her book Observability Engineering.

This episode is a deep dive into SRE, filled with actionable insights and strategies for leaders looking to supercharge their reliability game. You won’t want to miss it!
