INDUSTRY: Media & entertainment
SOLUTION: Personalized experience
TECHNICAL USE CASE: Data ingest and ETL, machine learning, deep learning
As a global technology and media company connecting millions of customers to personalized experiences, Comcast struggled with massive data volumes, fragile data pipelines, and poor data science collaboration. With Databricks, including Delta Lake and MLflow, they can build performant data pipelines for petabytes of data and easily manage the lifecycle of hundreds of models to create a highly innovative, unique, and award-winning viewer experience using voice recognition and machine learning.
Infrastructure unable to support data and ML needs
Instantly answering a customer’s voice request for a particular program while turning billions of individual interactions into actionable insights strained Comcast’s IT infrastructure and its data analytics and data science teams. To complicate matters, Comcast needed to deploy models to a disjointed range of environments: cloud, on-prem, and in some instances directly to devices.
- Massive data: billions of events generated by Comcast’s entertainment system and 20+ million voice remotes, resulting in petabytes of data that need to be sessionized for analysis.
- Fragile pipelines: complicated data pipelines that frequently failed and were hard to recover from. Small files were difficult to manage, slowing data ingestion for downstream machine learning.
- Poor collaboration: globally dispersed data scientists working in different scripting languages struggled to share and reuse code.
- Management of ML models: developing, training, and deploying hundreds of models was highly manual, slow, and hard to replicate, making it difficult to scale.
- Friction between dev and deployment: dev teams wanted to use the latest tools and models, while ops wanted to deploy on proven infrastructure.
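To illustrate what "sessionized" means in the first challenge above, here is a minimal Python sketch that groups raw events into per-device sessions by splitting on periods of inactivity. It is not Comcast's pipeline: the function name `sessionize`, the `(device_id, timestamp)` event shape, and the 30-minute inactivity gap are all assumptions made for illustration, and a petabyte-scale version would run on a distributed engine rather than in a single process.

```python
from datetime import datetime, timedelta
from itertools import groupby
from operator import itemgetter

# Assumed inactivity threshold; the case study does not state one.
SESSION_GAP = timedelta(minutes=30)


def sessionize(events, gap=SESSION_GAP):
    """Group (device_id, timestamp) events into per-device sessions.

    A new session starts whenever the time since the device's previous
    event exceeds `gap`. Returns a list of (device_id, [timestamps]).
    """
    sessions = []
    # Sort by device, then by time, so each device's events are contiguous.
    ordered = sorted(events, key=itemgetter(0, 1))
    for device_id, group in groupby(ordered, key=itemgetter(0)):
        current = []
        for _, ts in group:
            # Close the running session if the device was idle too long.
            if current and ts - current[-1] > gap:
                sessions.append((device_id, current))
                current = []
            current.append(ts)
        if current:
            sessions.append((device_id, current))
    return sessions
```

With this sketch, two events from one remote five minutes apart fall into one session, while an event from the same remote two hours later starts a new one.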
With Databricks, we can now be more informed about the decisions we make, and we can make them faster.