    Applying Context Aware Spell Checking in Spark NLP

    Senior data scientist on the Spark NLP team

    Today we are exploring spell checking, an essential task in any serious NLP pipeline that must deal with noisy, incorrect data generated in the wild.

    Take, for example, tweets, instant messages, blog posts, OCR output, or any other user-generated text. Being able to rely on correctly spelled data reduces vocabulary sizes at different stages of the pipeline and improves the performance of every model in it.
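To see why correcting misspellings shrinks the vocabulary a downstream model has to handle, here is a minimal toy sketch (this is not Spark NLP's context-aware implementation, just a plain edit-distance corrector against a known vocabulary, with an illustrative distance threshold of 2):

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def correct(token: str, vocab: set) -> str:
    """Map a token to its closest vocabulary word, if within distance 2."""
    if token in vocab:
        return token
    best = min(vocab, key=lambda w: edit_distance(token, w))
    return best if edit_distance(token, best) <= 2 else token

vocab = {"the", "quick", "brown", "fox", "jumps"}
noisy = ["teh", "the", "quikc", "quick", "foks"]
cleaned = [correct(t, vocab) for t in noisy]
print(cleaned)                                  # ['the', 'the', 'quick', 'quick', 'fox']
print(len(set(noisy)), "->", len(set(cleaned))) # vocabulary shrinks: 5 -> 3
```

Each misspelled variant collapses onto its canonical form, so the unique-token count drops from 5 to 3; a context-aware checker like Spark NLP's goes further by using the surrounding words to pick among candidates, rather than edit distance alone.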

    Our additional expert:
    Alberto Andreotti is a senior data scientist on the Spark NLP team at John Snow Labs, where he implements state-of-the-art NLP algorithms on top of Spark. He has a decade of experience working for companies and as a consultant, specializing in the field of machine learning. Alberto has written lots of low-level code in C/C++ and was an early Scala enthusiast and developer. A lifelong learner, he holds degrees in engineering and computer science and is working on a third in AI. Alberto was born in Argentina. He enjoys the outdoors, particularly hiking and camping in the mountains of Argentina.
