CDN Site Reliability Engineer L4/L5 - Live Streaming, Open Connect CDN - Worldwide
25 days ago
Netflix
Experienced
Full Time
Europe
About the role
In this role, you will support CDN delivery and day-to-day live-streaming operations for Netflix. As a Live CDN SRE, you will participate in the preparation, validation, and execution of live-streaming initiatives in collaboration with related production and engineering teams. You will impact multiple areas of the live event lifecycle, from the planning phase through testing and event launch days. You will lead innovation initiatives, implement new features, and drive enhancements in streaming service delivery.
Responsibilities
- Drive continual improvement in resilience, observability, monitoring, instrumentation, and automation, with the primary goal of maintaining highly scalable and reliable CDN services worldwide with excellent quality of experience (QoE)
- Implement, automate, and execute a broad range of functional, performance, resilience, and fault-injection tests focused on streaming CDN delivery, and analyze the results
- Coordinate, collaborate, and partner with multiple stakeholders to ensure the smooth execution of live-streaming events
- Aggregate, analyze, and correlate large amounts of server and application performance data. Use the innovative Netflix Big Data platform as a highly flexible, specialized and efficient toolset for service delivery optimization and system reliability improvements
- Participate in an on-call rotation and be able to work with flexible hours based on the live events schedule
Qualifications
- 3+ years of service reliability/operational experience running large-scale, high-performance internet systems and services, with a focus on live-streaming and video-on-demand (VOD) delivery
- Knowledge of and proven experience with CDNs and HTTP cache/proxy technologies. Experience supporting live-streaming CDN delivery on a large scale is a plus
- Expert-level knowledge of Unix or Linux system engineering fundamentals (networking, storage, operating systems) at scale. We happen to use FreeBSD
- Proficient understanding of networking principles, transport, and application protocols, especially TCP/IP, BGP, DNS, TLS, and HTTP/S
- Experience using distributed analytic processing technologies (Hive, Presto/Trino, Spark SQL, etc.)
- Proficient in a programming language such as Python or Go
- Ability to work in a highly collaborative environment and to communicate effectively with internal and external partners
- Preferred - B.S. in Computer Science, Electrical or Computer Engineering (or equivalent professional experience)
Things that show how we think
- Resiliency Practices in Managing CDN
- Measuring Real-Life Latency of the Internet: A Netflix Story
- Mastering Near-Real Time Telemetry and Big Data
- FreeBSD optimization used by Netflix to serve 800 Gb/s from a single server
Originally posted on Himalayas