Hallucinations pose a significant challenge when operating Large Language Models (LLMs) such as GPT-4, Llama-2, or Falcon, as they can seriously undermine an application's trustworthiness. Grounding an LLM in external knowledge sources helps reduce hallucinations and lets the model work with your own data. One way to achieve this is Retrieval Augmented QA, a technique that retrieves relevant information from your own dataset and feeds it to the LLM so it can produce more tailored responses.
Retrieval Augmented QA also addresses data freshness and the use of custom datasets. This is crucial, as even the world's most powerful LLMs, such as GPT-4 or Llama-2, are unaware of recent events or of private data stored in your databases. The broader pattern, Retrieval Augmented Generation (RAG), enables LLMs to generate responses grounded in your own unique data.
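The retrieval step behind Retrieval Augmented QA can be sketched in a few lines: embed the documents, find the ones closest to the query, and prepend them to the prompt. The corpus, the toy 3-dimensional embeddings, and the helper names below are all illustrative assumptions; in a real pipeline the vectors would come from a model such as E5 or INSTRUCTOR and live in a vector database like Elasticsearch.

```python
import math

# Hypothetical toy corpus of (text, embedding) pairs. Real embeddings
# would be produced by a model such as E5 or INSTRUCTOR.
CORPUS = [
    ("Spark NLP 5.0 added ONNX support for faster CPU inference.", [0.9, 0.1, 0.0]),
    ("Llama-2 is available in 13B and 70B chat-tuned variants.", [0.1, 0.9, 0.1]),
    ("Elasticsearch can serve as a vector database for RAG.", [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_embedding, corpus, k=1):
    """Return the k corpus texts most similar to the query embedding."""
    ranked = sorted(corpus, key=lambda item: cosine(query_embedding, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(question, query_embedding):
    """Prepend retrieved context so the LLM answers from our own data."""
    context = "\n".join(retrieve(query_embedding, CORPUS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"

# A query whose (toy) embedding is closest to the Elasticsearch document.
prompt = build_prompt("Which database can store my vectors?", [0.0, 0.1, 1.0])
print(prompt)
```

The retrieved passage ends up in the prompt, which is all RAG needs at inference time: the LLM itself is unchanged, only its context is augmented.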
However, it's important to highlight that vectorizing large volumes of text to populate vector databases can become a bottleneck in NLP pipelines. This challenge arises because many, if not all, NLP libraries are not built to process millions of documents efficiently, particularly with state-of-the-art embedding models such as BERT, RoBERTa, DeBERTa, or the Large Language Models used to generate text embeddings.
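One common way to ease this bottleneck is to embed documents in batches so that per-call overhead (tokenization setup, model dispatch) is amortized across many documents. The sketch below is a minimal illustration with a stand-in `embed_batch` function; a real pipeline would call a BERT-family encoder, and frameworks like Spark NLP push the same idea further by distributing batches across a cluster.

```python
def batched(items, batch_size):
    """Yield fixed-size batches so an embedding model pays its per-call
    overhead once per batch instead of once per document."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

def embed_batch(texts):
    # Stand-in for a real model call (e.g. a BERT-family encoder).
    # Here we return a trivial length-based "embedding" per text.
    return [[float(len(t))] for t in texts]

docs = [f"document {i}" for i in range(10)]
vectors = []
calls = 0
for batch in batched(docs, batch_size=4):
    vectors.extend(embed_batch(batch))
    calls += 1

print(len(vectors), calls)  # 10 vectors produced in 3 model calls
```

Ten documents are embedded in three model calls rather than ten; at the scale of millions of documents, batching plus distribution is what keeps vectorization from dominating the pipeline.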
Join us to explore the rapidly advancing field of text embeddings and vector databases. Take a deep dive into the recently released Spark NLP 5.0, featuring advanced embedding models such as INSTRUCTOR and E5. Learn how to speed up CPU inference with ONNX and take advantage of native cloud support to substantially accelerate your text vectorization process, and how to extend the knowledge sources of your Large Language Models (LLMs) to overcome common limitations, such as a scope restricted to their training data and an inability to incorporate new or domain-specific datasets, including internal company documents.
In this webinar, Maziyar Panahi will provide an in-depth understanding of how to use Spark NLP 5.0 to improve your LLM application's efficiency, reliability, and scalability. It's an opportunity to learn practical strategies for boosting retrieval in enterprise search, empowering businesses to take full advantage of technologies like OpenAI's GPT-4 and Meta's Llama-2 models.
Tech stack used in this Webinar:
– Cloud and managed services:
– AWS (Glue 3.0/4.0 & EMR)
– Generative AI (commercial and open-source LLM models):
– GPT-3.5 and GPT-4 by OpenAI
– Llama-2 13B & 70B fine-tuned for chat
– Falcon 40B fine-tuned on instructions
– Vector Database for RAG
– Elasticsearch (Elastic Stack)
– On-prem infrastructure
– HPE bare-metal server with Nvidia A100 & AMD EPYC
About the speaker
Maziyar Panahi is a Principal AI / ML engineer and a senior Team Lead with over a decade-long experience in public research. He leads a team behind Spark NLP at John Snow Labs, one of the most widely used NLP libraries in the enterprise.
He develops scalable NLP components using the latest techniques in deep learning and machine learning, including classic ML, language models, speech recognition, and computer vision. He is an expert in designing, deploying, and maintaining ML and DL models in the JVM ecosystem and on the Apache Spark distributed computing engine at production scale.
He has extensive experience in computer networks and DevOps, and has been designing and implementing scalable solutions on cloud platforms such as AWS, Azure, and OpenStack for the last 15 years. Earlier in his career, he worked as a network engineer in high-profile environments after completing his Microsoft and Cisco certifications (MCSE, MCSA, and CCNA).
He is a lecturer at The National School of Geographical Sciences, teaching Big Data Platforms and Data Analytics. He is currently employed by the French National Centre for Scientific Research (CNRS) as an IT Project Manager, working at the Institute of Complex Systems of Paris (ISCPIF).