How Generative AI Transforms Enterprise Data Insights with Google Gemini and Teradata

An estimated 80% of the world’s data is unstructured: think text, emails, customer reviews, voice transcripts, and more. For decades, organizations have struggled to turn unstructured data into actionable insights, relying on manual, labor-intensive techniques that take months to deliver results. But what if these processes could be accelerated to just a few hours?
GenAI tools like Google Gemini and Teradata Vantage are transforming the way businesses analyze and operationalize vast amounts of unstructured data. Google Gemini’s Large Language Models (LLMs) can quickly interpret unstructured data and generate insights from it, while Teradata Vantage operationalizes those insights across massive enterprise datasets in various formats, including open table formats (OTFs), providing scalability, reliability, and seamless integration with mission-critical systems.
This article explains and demonstrates how generative AI via LLMs is transforming enterprise data workflows to extract actionable insights.
Table of Contents:
- Traditional challenges of unstructured data analysis
- How generative AI accelerates insights
- Solution Details / Demo: Analyzing Customer Complaints with Gemini
- Outcomes and Benefits
- Additional generative AI solutions
- Teradata and Google Cloud: Ushering in a data-driven future for enterprises
Traditional challenges of unstructured data analysis
Traditionally, data scientists have relied on a variety of Natural Language Processing (NLP) techniques to analyze unstructured data such as:
- Tokenization to split text into smaller units such as words or phrases for easier analysis.
- Named Entity Recognition (NER) to identify and classify entities like names, dates, and locations within the text.
- Part-of-Speech (POS) Tagging to label grammatical roles of words (noun, verb, adjective) and help the machine understand sentence structure and syntax.
- Text Similarity to measure how similar a body of text is to another.
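To make two of these techniques concrete, here is a minimal sketch of tokenization and text similarity using only the Python standard library. The regular expression, the bag-of-words cosine measure, and the sample complaints are illustrative assumptions; production pipelines typically rely on dedicated NLP libraries.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between the bag-of-words vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    dot = sum(counts_a[token] * counts_b[token] for token in counts_a)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical complaints, invented for illustration only.
complaint_1 = "My mortgage payment was charged twice this month."
complaint_2 = "I was charged twice for my mortgage payment."
complaint_3 = "The branch manager was friendly and helpful."

print(tokenize(complaint_1))
print(round(cosine_similarity(complaint_1, complaint_2), 2))
print(round(cosine_similarity(complaint_1, complaint_3), 2))
```

The two payment complaints score much higher against each other than against the unrelated review, but only because they share exact words. This word-level matching is the kind of context blindness that semantic approaches aim to overcome.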
These NLP techniques are often paired with Machine Learning (ML) algorithms for deeper analysis. For example:
- Naïve Bayes can classify sentiment based on labeled data.
- Clustering algorithms like K-means can group similar text to identify common themes across text such as customer reviews without relying on predefined labels.
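As a rough illustration of the first item, the sketch below implements a tiny multinomial Naive Bayes sentiment classifier with add-one smoothing in plain Python. The class name, the whitespace tokenization, and the four training sentences are assumptions made for this example; a real project would use a far larger labeled dataset and an established ML library.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocabulary = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.label_counts.values())
        scores = {}
        for label, doc_count in self.label_counts.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(doc_count / total_docs)  # log prior
            for word in text.lower().split():
                if word not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocabulary)))
            scores[label] = score
        return max(scores, key=scores.get)


# Hypothetical labeled examples, invented for illustration only.
train_texts = [
    "great service and friendly staff",
    "quick and helpful support",
    "terrible delay and rude staff",
    "awful fees and slow response",
]
train_labels = ["Positive", "Positive", "Negative", "Negative"]

classifier = NaiveBayesSentiment().fit(train_texts, train_labels)
print(classifier.predict("friendly and helpful"))
print(classifier.predict("rude and slow"))
```

Even this toy version shows why the traditional route is labor-intensive: the model knows nothing beyond the words in its labeled training set, so quality depends directly on how much data is collected and prepared.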
While these methods are effective, they are often time-consuming and require extensive manual data preparation and specialized expertise to implement. They may also miss important context, which reduces the effectiveness of the analytics.
How Generative AI Accelerates Insights
Generative AI, powered by LLMs, introduces an efficient way for businesses to handle unstructured data and extract actionable insights with semantic comprehension. LLMs can automate tasks such as sentiment analysis, semantic search, document classification, and summarization, thus eliminating the need for lengthy data preparation and significantly reducing development timelines. In fact, Andrew Ng, a leading voice in AI, Founder of DeepLearning.AI and Managing General Partner at AI Fund, highlights this efficiency in his recent talk titled “Opportunities in AI”. He explains that the time required to build certain AI applications has been significantly shortened with just prompt engineering and LLMs.
To reinforce this, Andrew Ng provides an example of developing a restaurant customer communication review system with traditional Machine Learning techniques, a process that might take 6–12 months.
- 1 Month: Collecting and preparing labeled data.
- 3 Months: Fine-tuning and training the model for optimal performance.
- 3 Months: Identifying and integrating with a cloud service provider for deployment.
In contrast, according to Ng, generative AI can achieve similar, if not better, results in a fraction of the time: a few days to a few weeks.
Solution Details / Demo: Analyzing Customer Complaints with Gemini
To demonstrate how generative AI accelerates extracting insights from unstructured data, let’s examine a bank’s customer communications using Google Gemini. In under ten minutes, we classify the sentiment of customer communications, identify key topics, summarize the issues, and recommend strategies for resolution, all using the gemini-1.5-flash LLM and prompt engineering.
Imagine working for a large bank that uses a Customer360 Data Manager to unify customer data under a single ID and profile for each customer. By integrating Teradata Vantage, Google Gemini, and a 360-degree customer view, the bank can analyze complaints in real time. When a customer submits a complaint online, the system can analyze and categorize the issue and recommend the best course of action immediately. Tasks that traditionally took weeks can now be completed in minutes, improving both customer satisfaction and operational efficiency.
Let’s get started!
Prerequisites
- ClearScape Analytics Experience Account: Teradata’s free online learning site provides a fully interactive environment to analyze data, build AI/ML models, and develop GenAI applications. You can create your free account here and sign in.
- Gemini API Key: To use the Gemini API, you need an API key.
- Independent Developers and Small Teams: Quickly generate an API key in Google AI Studio with just a few clicks. Get started here.
- Enterprise Teams: Use Gemini models to build production-ready applications seamlessly within Vertex AI. Learn more here.
1. Create an account and log in to ClearScape Analytics Experience
Run this demo on ClearScape Analytics Experience using the integrated Jupyterlab environment.
Follow the steps below:
a. Create an environment.
b. Select Run Demos.
c. Filter by Cloud Provider: Google
d. Select Customer Complaints Analysis with Customer360 with Google Gemini
2. Install Dependencies and Libraries
Begin by importing the required libraries: pandas for working with data frames; teradataml, Teradata’s Python library that lets you connect to Vantage systems and gives you access to in-database ML and SQL functions; and the `google-generativeai` package, which enables calls to Gemini via an API.
import pandas as pd
from tqdm import tqdm
from teradataml import *
# GenAI libs
import google.generativeai as genai
display.max_rows = 5
pd.set_option('display.max_colwidth', 50)
pd.set_option('display.max_rows', 5)
3. Connecting to Vantage
Establish a connection to your Vantage environment using the host URI and username found in your dashboard, along with the password you set for your environment.
%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)
execute_sql('''SET query_band='DEMO=Complaint_Analysis_Customer360.ipynb;' UPDATE FOR SESSION;''')
4. Getting Data for the Demo
We have demo data that resides in Google Cloud for this demonstration. We have the option to run this demo using foreign tables without using any additional storage on our environment or downloading the data to our Vantage environment for faster execution. Let’s work on the data where it resides.
%run -i ../run_procedure.py "call get_data('DEMO_ComplaintAnalysis_cloud');" # Takes 1 minute
# %run -i ../run_procedure.py "call get_data('DEMO_ComplaintAnalysis_local');" # Takes 2 minutes
5. Configure API Key and define the Gemini model
Configure your Google API Key and define the type of Gemini model you want to use. For this demo, we will work with gemini-1.5-flash because it has a one-million-token context window and allows up to 15 requests per minute (RPM) and 1,500 requests per day (RPD).
import getpass  # needed for the prompt below unless startup.ipynb already imported it
GOOGLE_API_KEY = getpass.getpass(prompt = 'Please enter GOOGLE_API_KEY: ')
genai.configure(api_key = GOOGLE_API_KEY)
from google.generativeai.types import HarmCategory, HarmBlockThreshold
model = genai.GenerativeModel(
    model_name = "models/gemini-1.5-flash"
)
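Because the demo calls the model once per row, it can be worth spacing requests out to stay within a requests-per-minute limit. The helper below is a minimal sketch of that idea. `RequestThrottle` is a hypothetical class written for this article, not part of the Gemini SDK, and the limit of 15 is simply the figure quoted above; your actual quota may differ.

```python
import time


class RequestThrottle:
    """Space out calls so that at most `max_per_minute` start each minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self.clock = clock    # injectable so the helper can be tested
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


throttle = RequestThrottle(max_per_minute=15)
print(throttle.min_interval)  # seconds to leave between consecutive requests
```

To use it, call `throttle.wait()` immediately before each `model.generate_content(...)` call in the loops that follow. A fixed interval is the simplest policy; retrying with backoff on quota errors is a common alternative.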
6. Using LLM for Sentiment Analysis, Topic Modeling and Complaint Summarization
Now we are ready to perform sentiment analysis, topic modeling, and customer complaint summarization using a generative AI process powered by LLMs.
As a first step, let’s analyze the sample data. This comprises two sample datasets: one of customer details and another of customer complaints. This data is similar to what a Customer360 data management platform, used by a bank, would contain.
Using the teradataml library, we have access to Teradata DataFrames that are highly performant for analysis tasks. The first dataset includes details such as the customer identification, name, city, state, and other personal information.
customer_data = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Customer_360_Details'))
customer_data
The second dataset contains customer communications, which we will process using generative AI.
complaints_data = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Customer_360_Complaints'))
complaints_data
We’ll convert our DataFrame to pandas and quickly append four columns that will house our sentiment, topic, summary and strategy produced by our LLM.
pd_df = complaints_data.to_pandas()
pd_df['Sentiment'] = ""
pd_df['Topic'] = ""
pd_df['Summary'] = ""
pd_df['Strategy'] = ""
pd_df
Next comes prompt engineering: the careful crafting of instructions that guide our large language model. Effective prompt engineering is important because it determines how well the model understands the task and delivers relevant outputs. For instance, a clear and concise prompt helps produce accurate sentiment categorization and precise summarization, reducing the need for post-processing or manual corrections. All four tasks follow a similar format: we provide context, iterate through every row of the “consumer_complaint_narrative” column, and give the model instructions.
# Sentiment
for i in tqdm(range(len(pd_df))):
    try:
        prompt = f'''
        User prompt:
        The following is text from a review:
        “{pd_df['consumer_complaint_narrative'][i]}”
        Categorize the review as one of the following:
        Positive
        Negative
        Neutral
        - Important: Do not add any formatting into the output.
        - Return just one of the above options, Do not return explanation
        '''
        output = model.generate_content([prompt])
        sentiment = output.candidates[0].content.parts[0].text
        # .at writes directly into pd_df and avoids pandas chained assignment
        pd_df.at[i, 'Sentiment'] = sentiment
    except Exception as err:
        # Leave the cell empty, but report the failure instead of hiding it
        tqdm.write(f'Row {i}: sentiment call failed: {err}')
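Since each task repeats the same ingredients (context, the text to analyze, a list of allowed answers, and output instructions), the prompt can also be assembled by a small helper. The function below is a sketch of that idea rather than the code used in the demo; `build_classification_prompt` and its parameter names are hypothetical, and the sample complaint is invented.

```python
def build_classification_prompt(source_label, text, options):
    """Assemble a single-label classification prompt from reusable parts."""
    option_lines = "\n".join(f"- {option}" for option in options)
    return (
        f"The following is text from a {source_label}:\n"
        f'"{text}"\n'
        "Categorize it as exactly one of the following options:\n"
        f"{option_lines}\n"
        "- Important: Do not add any formatting to the output.\n"
        "- Return just one of the above options and no explanation.\n"
    )


SENTIMENT_OPTIONS = ["Positive", "Negative", "Neutral"]

prompt = build_classification_prompt(
    "review",
    "I was charged a late fee even though my payment was on time.",
    SENTIMENT_OPTIONS,
)
print(prompt)
```

Keeping the option list in one place makes it easier to reuse the same labels later, for example when checking that the model's answer is one of the allowed values.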
We follow a similar pattern for topic categorization, summarization, and strategy.
# Topic
for i in tqdm(range(len(pd_df))):
    try:
        prompt = f'''
        User prompt:
        The following is text from a complaint:
        “{pd_df['consumer_complaint_narrative'][i]}”
        Identify the topic of the complaint and categorize into one of the following topics. Only output one of the following options:
        - Mortgage Application
        - Payment Trouble
        - Mortgage Closing
        - Report Inaccuracy
        - Payment Struggle
        - Important: Do not add any formatting into the output. For example **Mortgage Application** or **Report Inaccuracy** refrain from such formatting in the response.
        - Return just one of the above options
        '''
        output = model.generate_content([prompt])
        topic = output.candidates[0].content.parts[0].text
        # .at writes directly into pd_df and avoids pandas chained assignment
        pd_df.at[i, 'Topic'] = topic
    except Exception as err:
        # Leave the cell empty, but report the failure instead of hiding it
        tqdm.write(f'Row {i}: topic call failed: {err}')
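# Summary
for i in tqdm(range(len(pd_df))):
    try:
        prompt = f'''
        The following is text from a Bank Review:
        “{pd_df['consumer_complaint_narrative'][i]}”
        Summarize the Bank Review in one sentence
        '''
        output = model.generate_content([prompt])
        summary = output.candidates[0].content.parts[0].text
        # .at writes directly into pd_df and avoids pandas chained assignment
        pd_df.at[i, 'Summary'] = summary
    except Exception as err:
        # Leave the cell empty, but report the failure instead of hiding it
        tqdm.write(f'Row {i}: summary call failed: {err}')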
# Strategy
for i in tqdm(range(len(pd_df))):
    try:
        prompt = f'''
        User prompt:
        The following is text from a complaint:
        “{pd_df['consumer_complaint_narrative'][i]}”
        Suggest the best course of action for the bank from the following:
        - Wealth Manager to contact customer immediately
        - Send Policy Letter from Mortgage Servicing
        - Send Policy Letter from Executive Office
        - Mortgage Banker to follow-up with Title Company for documentation and contact customer
        - Branch Manager to contact customer immediately
        - Important: Do not add any formatting into the output.
        - Return just one of the above options
        '''
        output = model.generate_content([prompt])
        strategy = output.candidates[0].content.parts[0].text
        # .at writes directly into pd_df and avoids pandas chained assignment
        pd_df.at[i, 'Strategy'] = strategy
    except Exception as err:
        # Leave the cell empty, but report the failure instead of hiding it
        tqdm.write(f'Row {i}: strategy call failed: {err}')
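Even with explicit instructions, a model may occasionally return Markdown emphasis, trailing punctuation, or an answer outside the allowed list. One way to guard against this is to normalize each response before storing it. The sketch below is an optional addition, not part of the original notebook; `normalize_label` and the `"Unknown"` fallback are assumptions made for this example.

```python
import re


def normalize_label(raw_output, allowed_options, fallback="Unknown"):
    """Map raw model text onto one of the allowed options, or the fallback."""
    # Drop Markdown emphasis characters, then surrounding whitespace and periods.
    cleaned = re.sub(r"[*`#]", "", raw_output).strip().strip(".").strip()
    for option in allowed_options:
        if cleaned.lower() == option.lower():
            return option
    # Otherwise accept a response that mentions exactly one allowed option.
    matches = [o for o in allowed_options if o.lower() in cleaned.lower()]
    return matches[0] if len(matches) == 1 else fallback


TOPICS = [
    "Mortgage Application",
    "Payment Trouble",
    "Mortgage Closing",
    "Report Inaccuracy",
    "Payment Struggle",
]

print(normalize_label("**Mortgage Application**\n", TOPICS))
print(normalize_label("payment trouble.", TOPICS))
print(normalize_label("I am not sure", TOPICS))
```

Rows that come back as the fallback value can then be reviewed manually or sent to the model again, rather than silently flowing into downstream reports.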
We then take our customer complaint DataFrame, prepare it for joining by stripping any extra whitespace, and combine it with our customer information DataFrame.
# Strip leading and trailing whitespace from each LLM-generated column
for col in ['Sentiment', 'Topic', 'Summary', 'Strategy']:
    pd_df[col] = pd_df[col].str.strip()
combined_df = customer_data.to_pandas().join(pd_df)
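Note that `DataFrame.join` aligns rows on the index by default, so the line above pairs customers and complaints by index value. If the two tables share a key column, merging on that key is a more explicit alternative. The sketch below assumes both tables contain a `Customer_ID` column, which the article does not confirm, and uses small made-up frames in place of the demo data.

```python
import pandas as pd

# Hypothetical miniature versions of the two tables used in the demo.
customers = pd.DataFrame(
    {"Customer_ID": [101, 102, 103], "City": ["Austin", "Denver", "Miami"]}
)
complaints = pd.DataFrame(
    {
        "complaint_id": [1, 2],
        "Customer_ID": [103, 101],
        "Sentiment": ["Negative", "Neutral"],
    }
)

# Match rows on the shared key rather than on index position.
combined = customers.merge(complaints, on="Customer_ID", how="inner")
print(combined)
```

With an inner merge, customers without complaints are dropped; `how="left"` would keep them with empty complaint fields instead.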
We now have our Customer360 data integrated with our LLM analysis! We have the sentiment, topic, summary of each customer communication as well as the appropriate strategy for handling the customer communication.
7. Integrated data with customer 360
pd.set_option('display.max_colwidth', None)
combined_df[["complaint_id","Customer_ID","Sentiment","Topic","Summary", "Strategy"]]
Outcomes and Benefits
With Teradata Vantage and Google Gemini, we’ve demonstrated a seamless way to streamline customer communication analysis and accelerate the resolution process. The demo shows how generative AI can turn unstructured data into insights much faster than traditional machine learning methods. With this LLM-powered analysis, developers can quickly unlock insights and automatically assign the next best course of action, helping to increase customer satisfaction and retention.
Additional generative AI solutions
The table below highlights additional GenAI use cases by technique and industry, along with links to Jupyter notebook demos available for free on the ClearScape Analytics Experience site. The first row groups examples of generative AI applications that work with unstructured data using only prompt engineering and Gemini. The other rows showcase advanced applications that combine prompt engineering with Retrieval-Augmented Generation (RAG).
Teradata and Google Cloud: Ushering in a data-driven future for enterprises
There is significant value in unstructured data stored in formats such as text and audio, and enterprises can leverage it to become more data-driven.
Advanced Large Language Models (LLMs), like Google’s Gemini, can simplify the process of introducing structure into unstructured data, enabling individuals and organizations to derive insights that better serve their customers.
Teradata, the Trusted AI company, helps enterprises create value by providing the most complete cloud analytics and data platform for AI.
Together, Teradata and Google Cloud offer the expertise, scale, and technology for enterprises to accelerate time-to-value and increase ROI — all while delivering trusted data and Trusted AI.
Key benefits of this collaboration for accelerating your AI innovation from development to production include:
- Get faster results from your AI/ML initiatives by quickly building and training ML models with Vertex AI and the powerful in-database analytics functions of ClearScape Analytics
- Easily build and deploy powerful gen AI solutions with Teradata VantageCloud Lake, Vertex AI, and Gemini
- Transform customer complaint management through advanced generative AI for precise and automated classification.
Ready to start?
ClearScape Analytics, a powerful analytics engine in Teradata Vantage, delivers unmatched performance, value, and scalability. It empowers enterprises and developers with the most powerful, open and connected AI/ML capabilities available today.
Experience ClearScape Analytics and Teradata Vantage, in a non-production setting, through ClearScape Analytics Experience.
Gemini’s ecosystem of products and models can help developers and businesses get the most out of Google AI, from building with Gemini models to using Gemini as your AI assistant.
Try Gemini 2.0 models — the latest and most advanced multimodal models from Google. See what you can build with up to a 2M token context window.
BUILD WITH GEMINI MODELS
- Google AI Studio: Experiment, prototype, and deploy. Google AI Studio is the fast path for developers, students, and researchers who want to try Gemini models and get started building with the Gemini Developer API.
- Vertex AI: Build AI agents and integrate generative AI into your applications. Google Cloud offers Vertex AI, a single, fully managed, unified development platform for using Gemini models and other third-party models at scale.
Special thanks to Daniel Herrera, Rosalie Bartlett, Merlin Yamssi, Matt Mazzarell, Chetan Hirapara, and Pratik Somwanshi for their valuable contributions to this piece.