How Generative AI Transforms Enterprise Data Insights with Google Gemini and Teradata

Published in

Google Cloud - Community

10 min read · Jan 29, 2025

GOOGLE GEMINI AND TERADATA VANTAGE

80% of all the world’s data is unstructured — think about text, emails, customer reviews, voice transcripts and more. For decades organizations have struggled to turn unstructured data into actionable insights, relying on manual, labor-intensive techniques that take months to deliver insights. But what if these processes could be accelerated to just a few hours?

GenAI tools like Google Gemini and Teradata Vantage are transforming the way businesses analyze and operationalize vast amounts of unstructured data. Google Gemini’s Large Language Models (LLMs) can quickly understand and generate insights from unstructured data, while Teradata Vantage operationalizes these insights across massive enterprise datasets in various formats, including open table formats (OTF), providing scalability, reliability, and seamless integration with mission-critical systems.

This article explains and demonstrates how generative AI via LLMs is transforming enterprise data workflows to extract actionable insights.

Traditional challenges of unstructured data analysis

Traditionally, data scientists have relied on a variety of Natural Language Processing (NLP) techniques to analyze unstructured data such as:

Tokenization to split text into smaller units such as words or phrases for easier analysis.
Named Entity Recognition (NER) to identify and classify entities like names, dates, and locations within the text.
Part-of-Speech (POS) Tagging to label grammatical roles of words (noun, verb, adjective) and help the machine understand sentence structure and syntax.
Text Similarity to measure how similar a body of text is to another.
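
To make two of the techniques above concrete, here is a minimal sketch of tokenization and text similarity using only the Python standard library. The regular expression and the bag-of-words cosine measure are illustrative assumptions; production NLP pipelines typically rely on dedicated libraries and richer representations.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the bag-of-words vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    # Only tokens present in both texts contribute to the dot product
    dot = sum(counts_a[t] * counts_b[t] for t in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


tokens = tokenize("My mortgage payment was late.")
score = cosine_similarity("late mortgage payment", "mortgage payment was late")
print(tokens)
print(round(score, 3))
```

Word order is ignored here, which is one reason such measures can miss context that matters in a complaint.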

These NLP techniques are often paired with Machine Learning (ML) algorithms for deeper analysis. For example:

Naïve Bayes can classify sentiment based on labeled data.
Clustering algorithms like K-means can group similar text to identify common themes across text such as customer reviews without relying on predefined labels.
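
As an illustration of the first item, the sketch below implements a tiny multinomial Naïve Bayes classifier from scratch with add-one smoothing. The class name `NaiveBayesText` and the four training sentences are invented for this example; in practice a library implementation and far more labeled data would be used, which is exactly the preparation cost discussed in this section.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesText:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.label_counts = Counter(labels)      # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            words = text.lower().split()
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, text):
        words = text.lower().split()
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, float("-inf")
        for label, doc_count in self.label_counts.items():
            # Log prior plus the sum of smoothed log likelihoods
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in words:
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data, for illustration only
train_texts = [
    "great service and friendly staff",
    "fast helpful support",
    "terrible delay and rude staff",
    "late fee charged unfairly",
]
train_labels = ["Positive", "Positive", "Negative", "Negative"]

clf = NaiveBayesText().fit(train_texts, train_labels)
print(clf.predict("rude staff and terrible delay"))
print(clf.predict("friendly helpful service"))
```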

While these methods are effective, they are often time-consuming and require extensive manual data preparation and expertise to implement. They may also miss important context, which reduces the effectiveness of the analytics.

How Generative AI Accelerates Insights

Generative AI, powered by LLMs, introduces an efficient way for businesses to handle unstructured data and extract actionable insights with semantic comprehension. LLMs can automate tasks such as sentiment analysis, semantic search, document classification, and summarization, thus eliminating the need for lengthy data preparation and significantly reducing development timelines. In fact, Andrew Ng, a leading voice in AI, Founder of DeepLearning.AI and Managing General Partner at AI Fund, highlights this efficiency in his recent talk titled “Opportunities in AI”. He explains that the time required to build certain AI applications has been significantly shortened with just prompt engineering and LLMs.

To reinforce this, Andrew Ng provides an example of developing a restaurant customer communication review system with traditional Machine Learning techniques, a process that might take 6–12 months.

1 Month: Collecting and preparing labeled data.
3 Months: Fine-tuning and training the model for optimal performance.
3 Months: Identifying and integrating with a cloud service provider for deployment.

In contrast, according to Ng, generative AI can achieve similar, if not better, results in a fraction of the time: a few days to a few weeks, rather than the months required by traditional machine learning methods.

Solution Details / Demo: Analyzing Customer Complaints with Gemini

To demonstrate how generative AI accelerates extracting insights from unstructured data, let’s examine a bank’s customer communications using Google Gemini. In under ten minutes, we classify the sentiment of customer communications, identify key topics, summarize the issues, and recommend strategies for resolution, all using the gemini-1.5-flash model and prompt engineering.

Imagine working for a large bank that uses a Customer360 Data Manager to unify customer data under a single ID and profile for each customer. By integrating Teradata Vantage, Google Gemini, and a 360-degree customer view, the bank can enhance customer satisfaction by analyzing complaints in real time. When a customer submits a complaint online, the system can analyze and categorize the issue and recommend the best course of action immediately. Tasks that traditionally took weeks can now be completed in minutes, improving both customer satisfaction and operational efficiency.

Let’s get started!

Prerequisites

ClearScape Analytics Experience Account: Teradata’s free online learning site provides a fully interactive environment to analyze data, build AI/ML models, and develop GenAI applications. You can create your free account here and sign in.
Gemini API Key: To use the Gemini API, you need an API key.

Independent Developers and Small Teams: Quickly generate an API key in Google AI Studio with just a few clicks. Get started here.
Enterprise Teams: Use Gemini models to build production-ready applications seamlessly within Vertex AI. Learn more here.

Watch the Demo Walkthrough:

Click the link below to watch a step-by-step video walkthrough of this demo:

Customer Complaints Analysis with Customer360 with Google Gemini

1. Create an account and log in to ClearScape Analytics Experience

Run this demo on ClearScape Analytics Experience using the integrated JupyterLab environment.

Follow the steps below:

a. Create an environment.

b. Select Run Demos.

Figure: ClearScape Analytics Experience dashboard, showing an active environment named ‘demo’, the option to run demos, and the connection details for the Teradata Vantage database (host, username, and password).

c. Filter by Cloud Provider: Google

d. Select Customer Complaints Analysis with Customer360 with Google Gemini

Figure: the demo.index page in the ClearScape Analytics Experience JupyterLab environment, showing use cases in a grid, each with a ‘Read Only Version’ or ‘Python Version’, and a sidebar for filtering by analytic function, third-party tools, or CSP.

2. Install Dependencies and Libraries

Begin by importing the required libraries: pandas for working with data frames; tqdm for progress bars; teradataml, Teradata’s Python library that connects to Vantage systems and gives you access to in-database ML and SQL functions; and the `google-generativeai` package, which enables calls to Gemini via an API.

import pandas as pd
from tqdm import tqdm
from teradataml import *

# GenAI libs
import google.generativeai as genai

display.max_rows = 5
pd.set_option('display.max_colwidth', 50)
pd.set_option('display.max_rows', 5)

3. Connecting to Vantage

Establish a connection to your Vantage environment using the host URI and username found in your dashboard, and the password you set for your environment.

%run -i ../startup.ipynb
eng = create_context(host = 'host.docker.internal', username='demo_user', password = password)
print(eng)
execute_sql('''SET query_band='DEMO=Complaint_Analysis_Customer360.ipynb;' UPDATE FOR SESSION;''')

4. Getting Data for the Demo

The demo data for this demonstration resides in Google Cloud. We can either run the demo against foreign tables, which uses no additional storage in our environment, or download the data to our Vantage environment for faster execution. Let’s work on the data where it resides.

%run -i ../run_procedure.py "call get_data('DEMO_ComplaintAnalysis_cloud');"        # Takes 1 minute
# %run -i ../run_procedure.py "call get_data('DEMO_ComplaintAnalysis_local');"        # Takes 2 minutes

5. Configure API Key and define the Gemini model

Configure your Google API key and choose the Gemini model you want to use. For this demo, we will work with gemini-1.5-flash, which has a one-million-token context window and allows up to 15 requests per minute (RPM) and 1,500 requests per day (RPD).

import getpass

from google.generativeai.types import HarmCategory, HarmBlockThreshold

# Prompt for the key so it is never stored in the notebook
GOOGLE_API_KEY = getpass.getpass(prompt = 'Please enter GOOGLE_API_KEY: ')
genai.configure(api_key = GOOGLE_API_KEY)

model = genai.GenerativeModel(
    model_name = "models/gemini-1.5-flash"
)
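
Because the model is limited to 15 requests per minute, a loop that calls it once per row can run into rate-limit errors. The sketch below is one simple way to stay under such a limit by spacing calls evenly. The `RateLimiter` class is a hypothetical helper, not part of the Gemini SDK, and it does not track the daily quota.

```python
import time


class RateLimiter:
    """Spaces calls so that at most `max_per_minute` start in any minute."""

    def __init__(self, max_per_minute, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self.clock = clock
        self.sleep = sleep
        self.last_call = None

    def wait(self):
        """Block until enough time has passed since the previous call."""
        now = self.clock()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_call = now


# Assumed usage inside a per-row loop:
#   limiter = RateLimiter(max_per_minute=15)
#   limiter.wait()
#   output = model.generate_content([prompt])
```

Even spacing is stricter than the limit requires, since it rules out short bursts, but it is easy to reason about.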

6. Using LLM for Sentiment Analysis, Topic Modeling and Complaint Summarization

Now we are ready to perform sentiment analysis, topic modeling, and customer complaint summarization using a generative AI process powered by LLMs.

As a first step, let’s analyze the sample data. This comprises two sample datasets: one of customer details and another of customer complaints. This data is similar to what a Customer360 data management platform, used by a bank, would contain.

Using the teradataml library, we have access to Teradata DataFrames, which are highly performant for analysis tasks. The first dataset includes details such as the customer identifier, name, city, state, and other personal information.

customer_data = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Customer_360_Details'))
customer_data

Figure: Customer 360 Details DataFrame, with columns including ‘Customer Identifier,’ ‘Name,’ ‘City,’ ‘State,’ ‘Customer Type,’ ‘Product Holdings,’ ‘Total Deposit Balance,’ ‘Total Credit Balance,’ ‘Total Investments AUM,’ ‘Customer Profitability,’ ‘Customer Lifetime Value,’ ‘Bank Tenure,’ ‘Affluence Segment,’ ‘Digital Banking Segment,’ and ‘Branch Banking Segment.’ Each row represents one customer.

The second dataset contains customer communications. We will process this data using generative AI.

complaints_data = DataFrame(in_schema('DEMO_ComplaintAnalysis', 'Customer_360_Complaints'))
complaints_data

Figure: Customer 360 Complaints DataFrame, with columns such as ‘Date Received,’ ‘Product,’ ‘Sub-Product,’ ‘Issue,’ ‘Sub-Issue,’ and ‘Consumer Complaint Narrative.’ Each row describes one complaint, including the product (e.g., mortgage), the specific issue, and the full narrative written by the customer.

We’ll convert our DataFrame to pandas and quickly append four columns that will house our sentiment, topic, summary and strategy produced by our LLM.

pd_df = complaints_data.to_pandas()
pd_df['Sentiment'] = ""
pd_df['Topic'] = ""
pd_df['Summary'] = ""
pd_df['Strategy'] = ""


pd_df

Figure: Customer 360 Complaints DataFrame with the four empty columns appended.

Next comes prompt engineering: the careful crafting of instructions that guide our large language model. Effective prompt engineering is important because it determines how well the model understands the task and delivers relevant outputs. For instance, a clear and concise prompt can ensure accurate sentiment categorization and precise summarization, reducing the need for post-processing or manual corrections. All four tasks follow a similar format: we iterate through every row of the “consumer_complaint_narrative” column and, for each narrative, build a prompt that provides context and instructions.

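# Sentiment
sentiment_col = pd_df.columns.get_loc('Sentiment')
for i in tqdm(range(len(pd_df))):
    try:
        # Positional access works even if the index is not 0..n-1
        narrative = pd_df['consumer_complaint_narrative'].iloc[i]
        prompt = f'''
        User prompt:
        The following is text from a review:

        "{narrative}"

        Categorize the review as one of the following:

        Positive
        Negative
        Neutral

        - Important: Do not add any formatting into the output.
        - Return just one of the above options. Do not return an explanation.
        '''
        output = model.generate_content([prompt])
        sentiment = output.candidates[0].content.parts[0].text

        # Write to the DataFrame directly to avoid chained assignment
        pd_df.iloc[i, sentiment_col] = sentiment
    except Exception as e:
        # Leave the cell empty, but report the failure instead of hiding it
        print(f"Row {i}: sentiment failed: {e}")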
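
Even when the prompt asks for exactly one option, a model may occasionally wrap the answer in formatting or add a few extra words. The sketch below shows one possible way to map a raw reply back onto the allowed labels before storing it. The `normalize_label` function and the `"Unknown"` fallback are assumptions made for illustration and are not part of the original notebook.

```python
SENTIMENTS = ["Positive", "Negative", "Neutral"]


def normalize_label(raw_text, allowed, default="Unknown"):
    """Map a raw model reply onto one of the allowed labels, or return default."""
    # Drop surrounding whitespace, markdown markers, quotes, and periods
    cleaned = raw_text.strip().strip("*`\"'.").strip().lower()
    for label in allowed:
        if cleaned == label.lower():
            return label
    # Otherwise accept the reply only if it mentions exactly one allowed label
    matches = [label for label in allowed if label.lower() in cleaned]
    return matches[0] if len(matches) == 1 else default


print(normalize_label("**Negative**\n", SENTIMENTS))
print(normalize_label("Positive or Negative", SENTIMENTS))
```

Rows that come back as the fallback value can then be reviewed manually or sent to the model again.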

We follow a similar pattern for topic categorization, summarization, and strategy.

# Topic
topic_col = pd_df.columns.get_loc('Topic')
for i in tqdm(range(len(pd_df))):
    try:
        # Positional access works even if the index is not 0..n-1
        narrative = pd_df['consumer_complaint_narrative'].iloc[i]
        prompt = f'''
        User prompt:
        The following is text from a complaint:

        "{narrative}"

        Identify the topic of the complaint and categorize into one of the following topics. Only output one of the following options:

        - Mortgage Application
        - Payment Trouble
        - Mortgage Closing
        - Report Inaccuracy
        - Payment Struggle

        - Important: Do not add any formatting into the output. For example, refrain from formatting such as **Mortgage Application** or **Report Inaccuracy** in the response.
        - Return just one of the above options
        '''

        output = model.generate_content([prompt])
        topic = output.candidates[0].content.parts[0].text

        # Write to the DataFrame directly to avoid chained assignment
        pd_df.iloc[i, topic_col] = topic
    except Exception as e:
        # Leave the cell empty, but report the failure instead of hiding it
        print(f"Row {i}: topic failed: {e}")

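# Summary
summary_col = pd_df.columns.get_loc('Summary')
for i in tqdm(range(len(pd_df))):
    try:
        # Positional access works even if the index is not 0..n-1
        narrative = pd_df['consumer_complaint_narrative'].iloc[i]
        prompt = f'''
            The following is text from a Bank Review:
            "{narrative}"
            Summarize the Bank Review in one sentence
        '''

        output = model.generate_content([prompt])
        summary = output.candidates[0].content.parts[0].text

        # Write to the DataFrame directly to avoid chained assignment
        pd_df.iloc[i, summary_col] = summary
    except Exception as e:
        # Leave the cell empty, but report the failure instead of hiding it
        print(f"Row {i}: summary failed: {e}")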

# Strategy
strategy_col = pd_df.columns.get_loc('Strategy')
for i in tqdm(range(len(pd_df))):
    try:
        # Positional access works even if the index is not 0..n-1
        narrative = pd_df['consumer_complaint_narrative'].iloc[i]
        prompt = f'''
        User prompt:
        The following is text from a complaint:

        "{narrative}"

        Suggest the best course of action for the bank from the following:

        - Wealth Manager to contact customer immediately
        - Send Policy Letter from Mortgage Servicing
        - Send Policy Letter from Executive Office
        - Mortgage Banker to follow-up with Title Company for documentation and contact customer
        - Branch Manager to contact customer immediately

        - Important: Do not add any formatting into the output.
        - Return just one of the above options
        '''

        output = model.generate_content([prompt])
        strategy = output.candidates[0].content.parts[0].text

        # Write to the DataFrame directly to avoid chained assignment
        pd_df.iloc[i, strategy_col] = strategy
    except Exception as e:
        # Leave the cell empty, but report the failure instead of hiding it
        print(f"Row {i}: strategy failed: {e}")
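
The four loops above share the same shape: build a prompt from a narrative, call the model, and store the reply. As a rough sketch, that shape can be factored into a single helper that also retries failed calls. The `annotate` function, its parameters, and the stand-in `fake_generate` are hypothetical; with the real model, `generate` could be a small function that calls `model.generate_content([prompt])` and returns the text of the first candidate.

```python
import time


def annotate(narratives, template, generate, retries=2, backoff_seconds=1.0, sleep=time.sleep):
    """Return one model reply per narrative; None marks rows that kept failing."""
    results = []
    for narrative in narratives:
        # The template must contain a {narrative} placeholder and no other braces
        prompt = template.format(narrative=narrative)
        reply = None
        for attempt in range(retries + 1):
            try:
                reply = generate(prompt).strip()
                break
            except Exception as exc:
                print(f"attempt {attempt + 1} failed: {exc}")
                if attempt < retries:
                    # Wait longer after each failure before trying again
                    sleep(backoff_seconds * 2 ** attempt)
        results.append(reply)
    return results


SUMMARY_TEMPLATE = (
    "The following is text from a complaint:\n"
    '"{narrative}"\n'
    "Summarize the complaint in one sentence."
)


def fake_generate(prompt):
    # Stand-in for the real model call, so the sketch runs offline
    return "  A one-sentence summary.\n"


summaries = annotate(["My payment was posted late."], SUMMARY_TEMPLATE, fake_generate)
print(summaries)
```

Returning `None` for rows that never succeed keeps failures visible, so they can be retried later rather than silently left blank.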

We then strip any extra whitespace from the LLM outputs in our customer complaint DataFrame and combine it with our customer information DataFrame.

# Remove leading and trailing whitespace (such as newlines) from the LLM outputs
for col in ['Sentiment', 'Topic', 'Summary', 'Strategy']:
    pd_df[col] = pd_df[col].str.strip()

# DataFrame.join aligns rows on the pandas index
combined_df = customer_data.to_pandas().join(pd_df)
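
Note that `DataFrame.join` matches rows by index, so it gives the intended result only when both DataFrames are indexed consistently. If the two tables share a key column, an explicit merge on that key is a more defensive alternative. The sketch below uses invented toy data and assumes a shared `Customer_ID` column, the name that appears in the column selection in the next step; whether both demo tables carry it under that name is an assumption.

```python
import pandas as pd

# Invented toy data, for illustration only
customers = pd.DataFrame({
    "Customer_ID": [101, 102, 103],
    "Name": ["Ana", "Ben", "Cy"],
})
complaints = pd.DataFrame({
    "complaint_id": [1, 2, 3],
    "Customer_ID": [103, 101, 103],
    "Sentiment": ["Negative", "Neutral", "Negative"],
})

# Match each complaint to its customer by key, not by row position.
# validate raises an error if a customer appears more than once.
merged = complaints.merge(customers, on="Customer_ID", how="left", validate="many_to_one")
print(merged)
```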

We now have our Customer360 data integrated with our LLM analysis! We have the sentiment, topic, and summary of each customer communication, as well as the appropriate strategy for handling it.

7. Integrated data with customer 360

pd.set_option('display.max_colwidth', None)
combined_df[["complaint_id","Customer_ID","Sentiment","Topic","Summary", "Strategy"]]
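
Once every complaint carries a topic and a sentiment, a quick tally shows where problems concentrate. The sketch below uses a few invented records in place of the real output; with the DataFrame above, something like `combined_df.to_dict("records")` could supply the rows.

```python
from collections import Counter

# Invented records standing in for the rows of the combined DataFrame
records = [
    {"Topic": "Payment Trouble", "Sentiment": "Negative"},
    {"Topic": "Mortgage Closing", "Sentiment": "Negative"},
    {"Topic": "Payment Trouble", "Sentiment": "Neutral"},
]

topic_counts = Counter(r["Topic"] for r in records)
negative_by_topic = Counter(r["Topic"] for r in records if r["Sentiment"] == "Negative")

# Most frequent topics first
for topic, count in topic_counts.most_common():
    print(f"{topic}: {count} complaints, {negative_by_topic[topic]} negative")
```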

Outcomes and Benefits

With Teradata Vantage and Google Gemini, we’ve demonstrated a seamless way to streamline customer communication analysis and accelerate the resolution process. This shows how generative AI can transform unstructured data into insights much faster than traditional Machine Learning methods. With this LLM-powered customer communication analysis, developers can quickly unlock insights, automatically assign the next best course of action, and increase customer satisfaction and retention.

Additional generative AI solutions

The table below highlights additional GenAI use cases by technique and industry, along with links to Jupyter notebook demos available for free on the ClearScape Analytics Experience site. The first row groups examples of generative AI applications that work on unstructured data with just prompt engineering and Gemini. The other rows showcase advanced applications combining prompt engineering with RAG (Retrieval-Augmented Generation).

Teradata and Google Cloud: Ushering in a data-driven future for enterprises

There is significant value in unstructured data stored in formats such as text and audio, and you can leverage it to move toward a data-driven future.

Advanced Large Language Models (LLMs), like Google’s Gemini, can simplify the process of introducing structure into unstructured data, enabling individuals and organizations to derive insights that better serve their customers.

Teradata, the Trusted AI company, helps enterprises create value by providing the most complete cloud analytics and data platform for AI.

Together, Teradata and Google Cloud offer the expertise, scale, and technology for enterprises to accelerate time-to-value and increase ROI — all while delivering trusted data and Trusted AI.

Key benefits of this collaboration for accelerating your AI innovation from development to production include:

Get faster results from your AI/ML initiatives by quickly building and training ML models with Vertex AI and the powerful in-database analytics functions of ClearScape Analytics.
Easily build and deploy powerful gen AI solutions with Teradata VantageCloud Lake, Vertex AI, and Gemini.
Transform customer complaint management through advanced generative AI for precise and automated classification.

Ready to start?

ClearScape Analytics, a powerful analytics engine in Teradata Vantage, delivers unmatched performance, value, and scalability. It empowers enterprises and developers with the most powerful, open and connected AI/ML capabilities available today.

Experience ClearScape Analytics and Teradata Vantage, in a non-production setting, through ClearScape Analytics Experience.

Gemini’s ecosystem of products and models can help developers and businesses get the most out of Google AI, from building with Gemini models to using Gemini as your AI assistant.

Try Gemini 2.0 models — the latest and most advanced multimodal models from Google. See what you can build with up to a 2M token context window.

BUILD WITH GEMINI MODELS

Google AI Studio: Experiment, prototype, and deploy. Google AI Studio is the fast path for developers, students, and researchers who want to try Gemini models and get started building with the Gemini Developer API.
Vertex AI: Build AI agents and integrate generative AI into your applications. Google Cloud offers Vertex AI, a single, fully managed, unified development platform for using Gemini models and other third-party models at scale.

Special thanks to Daniel Herrera, Rosalie Bartlett, Merlin Yamssi, Matt Mazzarell, Chetan Hirapara, and Pratik Somwanshi for their valuable contributions to this piece.