Today we’re diving deeper into the US Consumer Financial Protection Bureau’s Financial Services Consumer Complaint database to look at the text of the complaints filed against companies. The question: which words in those complaints are distinctively Equifax-y? Along the way we’ll cover text cleaning, tokenization, and lemmatization with Spark NLP, counting with PySpark, and tf-idf (term frequency-inverse document frequency) analysis.
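Before bringing in Spark, here is a rough idea of what tf-idf measures, as a minimal plain-Python sketch. It assumes each company's complaints have already been reduced to one list of tokens. The `tf_idf` helper, the company names `BankA` and `BankB`, and the toy tokens are all made up for illustration and do not come from the complaint database. It also uses one common tf-idf variant (relative term frequency times the natural log of inverse document frequency); other weightings exist.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Score terms per document.

    docs: dict mapping a document name (here, a company) to a list of tokens.
    Returns a dict mapping each name to {term: tf-idf score}.
    """
    n_docs = len(docs)
    counts = {name: Counter(tokens) for name, tokens in docs.items()}

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for term_counts in counts.values():
        doc_freq.update(term_counts.keys())

    scores = {}
    for name, term_counts in counts.items():
        total = sum(term_counts.values())
        scores[name] = {
            # tf = share of this document's tokens; idf = log(N / document frequency)
            term: (n / total) * math.log(n_docs / doc_freq[term])
            for term, n in term_counts.items()
        }
    return scores


# Toy, made-up data: one token list per (hypothetical) company.
complaints = {
    "Equifax": ["credit", "report", "breach", "credit"],
    "BankA": ["loan", "payment", "credit"],
    "BankB": ["loan", "fee", "credit"],
}

scores = tf_idf(complaints)
for term, score in sorted(scores["Equifax"].items(), key=lambda kv: -kv[1]):
    print(f"{term}: {score:.3f}")
```

In this toy example, "credit" appears in every company's complaints, so its score is zero even though it is Equifax's most frequent word, while words that only Equifax's complaints contain score highest. That is the sense of "distinctively Equifax-y" we are after.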