GenAI is the new solution that has everybody looking at their problems to see where it can be applied. In the process, organizations are uncovering use cases previously considered impractical or impossible.
As a consultant, I discuss use cases with prospective customers all the time. GenAI is where the conversation starts... but the end solution often involves a combination of GenAI, ML, and Advanced Analytics.
First... let's clarify the meaning of each of these terms.
Machine Learning encompasses a broad set of techniques that enable systems to learn patterns from data and make predictions or decisions without being explicitly programmed. ML models are often used for fraud detection, customer segmentation, recommendation engines, and forecasting, as well as visual processing such as object detection and facial recognition.
Generative AI is a subset of Machine Learning, leveraging approaches such as deep learning and neural networks. In this context, we focus on Large Language Models (LLMs), which are trained on vast amounts of text data, learning the patterns and relationships between tokens that represent words and phrases. In operation, an LLM accepts text as input and predicts the most statistically likely text to follow — generating a response one token at a time.
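To make "one token at a time" concrete, here is a deliberately tiny sketch. It is not an LLM: it swaps the neural network for a bigram frequency table and treats whitespace-separated words as tokens, both of which are simplifying assumptions. The corpus and function names are made up for illustration. The generation loop has the same shape, though: look at what has been produced so far, pick the most likely next token, append it, repeat.

```python
from collections import Counter, defaultdict

def train_bigrams(corpus: str) -> dict:
    """Count, for each token, which tokens follow it in the corpus."""
    tokens = corpus.split()  # whitespace split stands in for a real tokenizer
    follows = defaultdict(Counter)
    for current, nxt in zip(tokens, tokens[1:]):
        follows[current][nxt] += 1
    return follows

def generate(follows: dict, prompt: str, max_new_tokens: int = 5) -> str:
    """Greedily append the most frequent follower, one token at a time."""
    output = prompt.split()
    for _ in range(max_new_tokens):
        candidates = follows.get(output[-1])
        if not candidates:
            break  # nothing ever followed this token in the training text
        next_token, _count = candidates.most_common(1)[0]
        output.append(next_token)
    return " ".join(output)

# Hypothetical training text, for illustration only
corpus = "the pump failed . the pump failed again . the valve held ."
model = train_bigrams(corpus)
print(generate(model, "the", max_new_tokens=2))
```

A real LLM conditions on the entire preceding context rather than only the last token, and usually samples from a probability distribution instead of always taking the top choice.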
LLMs are trained with massive data sets, but runtime inferences must fit within a context window, which limits the number of tokens that can be processed at once. While some models can process up to one million tokens in a single inference, this is still a relatively small amount of data compared with the volumes that analytics systems routinely handle.
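A quick back-of-the-envelope check shows how fast a context window fills up with raw rows. The sketch below assumes roughly four characters per token, which is only a rule of thumb (real tokenizers vary by model and content), and the row format, function names, and reserved-token figure are hypothetical.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate; an assumption, not a real tokenizer."""
    return int(len(text) / chars_per_token) + 1

def rows_that_fit(rows: list[str], context_window: int, reserved: int = 0) -> int:
    """Return how many rows, taken in order, fit in the window after
    reserving tokens for instructions and the model's response."""
    budget = context_window - reserved
    used = 0
    for count, row in enumerate(rows):
        used += estimate_tokens(row)
        if used > budget:
            return count  # this row would overflow the budget
    return len(rows)

# Hypothetical transaction rows, about 40 characters each
rows = [f"2024-01-01,store-{i:06d},SKU-123,19.99,USD" for i in range(200_000)]
fit = rows_that_fit(rows, context_window=1_000_000, reserved=10_000)
print(f"{fit} of {len(rows)} rows fit")
```

Under these assumptions, even a one-million-token window holds well under half of the 200,000 short rows, which is a small slice of a table with millions or billions of rows.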
Advanced Analytics goes beyond pivot reports and visualization, using complex SQL, statistical modeling, non-SQL code, multi-step processes, and other approaches. Advanced Analytics tends to be descriptive, providing insights based on observed data, as opposed to predictive, which is where ML and AI come in.
Advanced Analytics operates on data volumes ranging from small, curated datasets to petabyte-scale data warehouses. Analytical queries may scan millions to billions of rows, making data organization strategies like partitioning, indexing, and aggregation critical. The emphasis is on how efficiently meaningful insights can be extracted from large, structured historical datasets.
In practice, a modern customer segmentation application might involve the following steps:
A predictive maintenance application would look somewhat different:
SQL and ML are not just making up for weak points of GenAI. By offloading some of the processing, GenAI can focus on one of its biggest strengths, which is analyzing and summarizing data provided within the context window.
Working with Structured Data at Scale
The use cases we've put forward that branch into ML and Advanced Analytics involve structured data, such as transactions, event logs, and IoT (or other) streams. GenAI can play an important role in answering questions about these data sets, but keep in mind that LLMs:
Can be inefficient when processing an inference for millions of single data points
Are only capable of processing data that fits within the context window
SQL and ML classification provide the scalability, reducing large data sets to a compact, higher-quality input that fits within the LLM context window.
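The division of labor described above might be sketched as follows. SQL does the heavy lifting over every row, and only a few summary lines are placed in the prompt. Everything named here is hypothetical: the `transactions` table, its columns, the `segment` labels (imagined as the output of an upstream ML classifier), and the `build_prompt` helper. The sketch uses an in-memory SQLite database as a stand-in for a data warehouse and stops short of calling any LLM API.

```python
import sqlite3

def build_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Aggregate raw rows in SQL, then pass only the summary to the LLM."""
    summary = conn.execute(
        """
        SELECT segment, COUNT(*) AS orders, ROUND(SUM(amount), 2) AS revenue
        FROM transactions
        GROUP BY segment
        ORDER BY revenue DESC
        """
    ).fetchall()
    lines = [f"{segment}: {orders} orders, {revenue} revenue"
             for segment, orders, revenue in summary]
    return question + "\n\nSummary by segment:\n" + "\n".join(lines)

# In-memory stand-in for a warehouse table; the data is synthetic
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (customer_id INTEGER, segment TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?, ?)",
    [(i, "loyal" if i % 3 == 0 else "occasional", 10.0 + i % 7)
     for i in range(10_000)],
)

prompt = build_prompt(conn, "Which segment should we prioritize, and why?")
print(prompt)  # a handful of lines instead of 10,000 rows
```

The prompt stays the same size whether the table holds ten thousand rows or ten billion, so the LLM can spend its context window on analysis and summarization.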
When approaching your use cases, consider that the best solution may involve a combination of GenAI, ML, and Advanced Analytics. GenAI enables these previously impractical solutions, but it may not be the only technology in play for the solution.
The resulting solution will typically cost less and be more efficient than a GenAI-only approach, with much more room for future scaling.
From an organizational perspective, this allows for leveraging existing SQL talent, while highlighting the need for some data science upskilling.