Everybody loves vector search, and enterprises now see its value thanks to the popularity of LLMs and RAG. The problem is that production-level deployment of vector search requires boatloads of compute: CPU for search and GPU for inference. The bottom line is that, deployed incorrectly, vector search can be prohibitively expensive compared to classical alternatives. The solution: quantizing vectors, leveraging hardware-accelerated optimizations, and performing adaptive retrieval. These techniques let you scale applications into production by balancing and tuning memory cost, latency, and retrieval accuracy reliably.

This session shows how to use the open-source Weaviate vector database to perform real-time billion-scale vector search – on your laptop! It covers quantization techniques – product, binary, scalar, and matryoshka quantization – that compress vectors, trading memory requirements for accuracy. I’ll also introduce the concept of adaptive retrieval, where you first perform a cheap, hardware-optimized, low-accuracy search over compressed vectors to identify retrieval candidates, followed by a slower, higher-accuracy search to rescore and correct the results. Combined with well-thought-out adaptive retrieval, these quantization techniques can deliver a 32x reduction in memory requirements at the cost of roughly 5% loss in retrieval recall in your RAG stack.
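To make the idea concrete, here is a minimal NumPy sketch of the pattern the abstract describes – binary quantization plus a two-stage adaptive retrieval. This is an illustrative toy, not Weaviate's implementation: the corpus is random, and the shortlist size of 100 is an arbitrary choice. Keeping only the sign bit of each float32 dimension packs vectors 32x smaller; a cheap Hamming-distance pass over the packed codes picks candidates, and the full-precision vectors then rescore that shortlist.

```python
import numpy as np

rng = np.random.default_rng(0)
dim, n_docs = 128, 10_000

# Toy corpus and query (stand-ins for real embeddings).
docs = rng.standard_normal((n_docs, dim)).astype(np.float32)
query = rng.standard_normal(dim).astype(np.float32)

# Binary quantization: keep only the sign of each dimension,
# packed into bits (32x smaller than float32).
doc_codes = np.packbits(docs > 0, axis=1)
query_code = np.packbits(query > 0)

# Stage 1: cheap low-accuracy search -- Hamming distance via XOR + popcount.
hamming = np.unpackbits(doc_codes ^ query_code, axis=1).sum(axis=1)
candidates = np.argsort(hamming)[:100]  # shortlist of 100 candidates

# Stage 2: rescore the shortlist with full-precision cosine similarity.
cand = docs[candidates]
sims = cand @ query / (np.linalg.norm(cand, axis=1) * np.linalg.norm(query))
top10 = candidates[np.argsort(-sims)[:10]]
print(top10)
```

Only the shortlist ever touches the uncompressed vectors, which is why the memory savings hold: the 1-bit codes serve the bulk of the scan, and the expensive float32 rescoring runs on a tiny fraction of the corpus.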
Current US legislation prohibits discrimination and bias in AI applications used in recruiting, healthcare, and advertising. This requires organizations that deploy such systems to test and prove that their solutions are...
Leveraging Generative AI, LLMs, and Google Search Wrappers for competitor analysis empowers the procurement team with real-time, data-driven insights. Generative AI and LLMs can process vast amounts of unstructured data,...
A report on the most current research into using Large Language Models (LLMs) for Sentiment Analysis. This task involves extracting the author’s opinion from the...
In this speech, I will provide an overview of the current challenges faced by healthcare professionals in accessing and interpreting vast amounts of patient data. I will discuss how our...