Building databases that track real patients’ stories over time is essential for medical research, drug development, clinical quality improvement, population health, and chronic disease management. Doing this well has traditionally presented three key challenges. First, much of the relevant information, such as patient demographics, comorbidities, history, and social determinants of health, is available only in free-text documents and notes. Second, the data points recorded about each patient contain gaps and conflicts that must be resolved. Third, most analyses are useful only with a large number of both patients and variables, which often makes building these databases manually impractical. This session describes these challenges in the context of real-world projects and use cases. We’ll then cover how recent advances in natural language processing (NLP) and transfer learning have raised the accuracy and scale that can be achieved. We’ll share results and benchmarks from applying these techniques with Spark NLP for Healthcare, along with best practices and lessons learned from early adopters of the technology.