Problem Statement¶

In this notebook, we explore the potential of OpenAI's GPT-4 Large Language Model (LLM) for intent classification in conversational AI.

As businesses increasingly turn to chatbots and virtual assistants, robust Natural Language Understanding (NLU) systems are more critical than ever.

Traditional NLU systems require extensive resources and large labeled datasets, making them costly to build and maintain.

The introduction of LLMs like GPT-4 offers an innovative way to streamline this process. Despite concerns about inaccurate generated responses, or "hallucinations," constraining the LLM to intent classification, where each detected intent maps to a predefined, compliant response, can significantly enhance customer interactions without exposing users to free-form model output.
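To make the approach concrete, here is a minimal sketch of prompt-based intent classification with the OpenAI chat API. The model name, intent list, and prompt wording are illustrative assumptions, not the exact configuration used to produce the dataset analyzed below.

In [ ]:
# Illustrative sketch only: the model name, intent list, and prompt wording
# are assumptions, not the configuration that generated this dataset.
from openai import OpenAI  # assumes the openai>=1.0 Python client

client = OpenAI()  # reads OPENAI_API_KEY from the environment

INTENTS = ['confirmation', 'reschedule', 'cancellation', 'out of scope']  # hypothetical subset

def classify_intent(user_response: str) -> str:
    """Ask the model to pick exactly one intent from a closed list."""
    prompt = (
        "Classify the customer message into exactly one of these intents: "
        + ", ".join(INTENTS)
        + ". Reply with the intent label only.\n\nMessage: " + user_response
    )
    completion = client.chat.completions.create(
        model="gpt-4",  # assumed model name
        messages=[{"role": "user", "content": prompt}],
        temperature=0,  # deterministic labels for classification
    )
    return completion.choices[0].message.content.strip().lower()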

Importing Necessary Libraries and Dependencies¶

In [1]:
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score, confusion_matrix, ConfusionMatrixDisplay

# import libraries for data manipulation
import numpy as np
import pandas as pd

# import libraries for data visualization
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

pd.set_option('display.float_format', lambda x: '%.2f' % x) # Display floats to 2 decimal places, suppressing scientific notation

# Library to suppress warnings
import warnings
warnings.filterwarnings('ignore')

Loading the Data¶

In [2]:
from google.colab import drive
drive.mount('/content/drive')

path_to_file = '/content/drive/My Drive/ce/telco/'
Mounted at /content/drive
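Running this notebook outside Colab will fail at the Drive mount; a small fallback (with a hypothetical local directory) keeps the rest of the notebook portable:

In [ ]:
# Optional portability fallback -- the local path is hypothetical.
try:
    from google.colab import drive
    drive.mount('/content/drive')
    path_to_file = '/content/drive/My Drive/ce/telco/'
except ModuleNotFoundError:
    path_to_file = './data/'  # hypothetical local directory holding the Excel file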
In [3]:
df = pd.read_excel(path_to_file + 'intent_classification_results.xlsx')

df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300 entries, 0 to 299
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   assistant                 300 non-null    object 
 1   user_response             300 non-null    object 
 2   actual_intent             300 non-null    object 
 3   openai_intent             300 non-null    object 
 4   classification            300 non-null    object 
 5   openai_completion_tokens  300 non-null    int64  
 6   openai_completion_cost    300 non-null    float64
 7   openai_prompt_tokens      300 non-null    int64  
 8   openai_prompt_cost        300 non-null    float64
 9   openai_total_tokens       300 non-null    int64  
 10  openai_total_cost         300 non-null    float64
dtypes: float64(3), int64(3), object(5)
memory usage: 25.9+ KB

Data Overview¶

View the first and last 5 rows of the dataset¶

In [ ]:
pd.set_option('display.float_format', lambda x: '%.5f' % x) # Display floats to 5 decimal places, suppressing scientific notation
df.head()
In [ ]:
df.tail()

View the shape of the dataset¶

In [6]:
df.shape
Out[6]:
(300, 11)

Data types of the columns¶

In [7]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 300 entries, 0 to 299
Data columns (total 11 columns):
 #   Column                    Non-Null Count  Dtype  
---  ------                    --------------  -----  
 0   assistant                 300 non-null    object 
 1   user_response             300 non-null    object 
 2   actual_intent             300 non-null    object 
 3   openai_intent             300 non-null    object 
 4   classification            300 non-null    object 
 5   openai_completion_tokens  300 non-null    int64  
 6   openai_completion_cost    300 non-null    float64
 7   openai_prompt_tokens      300 non-null    int64  
 8   openai_prompt_cost        300 non-null    float64
 9   openai_total_tokens       300 non-null    int64  
 10  openai_total_cost         300 non-null    float64
dtypes: float64(3), int64(3), object(5)
memory usage: 25.9+ KB

Statistical Analysis¶

In [8]:
df.describe().T
Out[8]:
                           count      mean      std       min       25%       50%       75%       max
openai_completion_tokens  300.00000   6.43000  0.86463   5.00000   6.00000   6.00000   7.00000   8.00000
openai_completion_cost    300.00000   0.00019  0.00003   0.00015   0.00018   0.00018   0.00021   0.00024
openai_prompt_tokens      300.00000 684.86333 18.48505 644.00000 672.00000 684.00000 696.00000 775.00000
openai_prompt_cost        300.00000   0.00685  0.00018   0.00644   0.00672   0.00684   0.00696   0.00775
openai_total_tokens       300.00000 691.29333 18.56167 650.00000 678.00000 690.00000 702.25000 782.00000
openai_total_cost         300.00000   0.00704  0.00019   0.00662   0.00691   0.00703   0.00717   0.00796
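The summary shows that completions are tiny (5 to 8 tokens, i.e. just the intent label) while prompts average roughly 685 tokens, so prompt tokens drive almost all of the spend (about 97%, from the mean costs above). A quick sanity check:

In [ ]:
# Share of total spend attributable to prompt tokens (expected to be close to 1.0)
prompt_share = df['openai_prompt_cost'].sum() / df['openai_total_cost'].sum()
print(f'Prompt tokens account for {prompt_share:.1%} of total cost')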

Unique Values¶

In [9]:
df.nunique()
Out[9]:
assistant                   200
user_response               297
actual_intent                15
openai_intent                15
classification                2
openai_completion_tokens      4
openai_completion_cost        4
openai_prompt_tokens         78
openai_prompt_cost           78
openai_total_tokens          77
openai_total_cost           114
dtype: int64
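Only 297 of the 300 user_response values are unique, so a handful of responses repeat. The duplicated rows can be pulled out for inspection:

In [ ]:
# Inspect the few repeated user responses (297 unique values across 300 rows)
dupes = df[df.duplicated('user_response', keep=False)]
dupes.sort_values('user_response')[['user_response', 'actual_intent', 'openai_intent']]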

Missing Values¶

In [10]:
df.isnull().sum()
Out[10]:
assistant                   0
user_response               0
actual_intent               0
openai_intent               0
classification              0
openai_completion_tokens    0
openai_completion_cost      0
openai_prompt_tokens        0
openai_prompt_cost          0
openai_total_tokens         0
openai_total_cost           0
dtype: int64
In [11]:
# isna() is an alias of isnull(); both count missing values
df.isna().sum()
Out[11]:
assistant                   0
user_response               0
actual_intent               0
openai_intent               0
classification              0
openai_completion_tokens    0
openai_completion_cost      0
openai_prompt_tokens        0
openai_prompt_cost          0
openai_total_tokens         0
openai_total_cost           0
dtype: int64
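Both checks agree, as expected. A one-line assertion makes the no-missing-data assumption explicit if the export is ever refreshed:

In [ ]:
# Fail fast if missing values ever appear in a refreshed export
assert df.isna().sum().sum() == 0, 'unexpected missing values in the dataset'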

Exploratory Data Analysis¶

(1) What is the total cost of the intent classification?¶

In [12]:
print('Total cost for',df['openai_total_cost'].count(),'API calls is $',df['openai_total_cost'].sum())
Total cost for 300 API calls is $ 2.11246
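That works out to roughly $0.007 per classification. A simple linear projection to higher call volumes follows directly (illustrative arithmetic only; it assumes the observed per-call cost generalizes and pricing stays constant):

In [ ]:
# Average cost per call and a linear projection to 10,000 calls (assumption:
# the observed per-call cost generalizes and pricing does not change)
cost_per_call = df['openai_total_cost'].mean()
print(f'Average cost per call: ${cost_per_call:.5f}')
print(f'Projected cost for 10,000 calls: ${cost_per_call * 10_000:.2f}')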

(2) Plot Function¶

In [13]:
def labeled_barplot(data, feature, perc=False, n=None, order=True):
    """
    Barplot with labels at the top, with an option to sort by frequency or category

    data: dataframe
    feature: dataframe column
    perc: whether to display percentages instead of count (default is False)
    n: displays the top n category levels (default is None, i.e., display all levels)
    order: if True, sort based on frequency (y); if False, sort based on category (x)
    """

    total = len(data[feature])  # length of the column
    count = data[feature].nunique()
    if n is None:
        plt.figure(figsize=(count + 1, 5))
    else:
        plt.figure(figsize=(n + 1, 5))

    # Determine the order of categories
    if order:
        order = data[feature].value_counts().index[:n]  # Sort by frequency
    else:
        order = sorted(data[feature].unique())[:n]  # Sort by category

    plt.xticks(rotation=90, fontsize=15)
    ax = sns.countplot(
        data=data,
        x=feature,
        palette="Paired",
        order=order,
    )

    for p in ax.patches:
        if perc:
            label = "{:.1f}%".format(
                100 * p.get_height() / total
            )  # percentage share of each category level
        else:
            label = p.get_height()  # count of each category level

        x = p.get_x() + p.get_width() / 2  # x-coordinate of the bar center
        y = p.get_height()  # height of the bar

        ax.annotate(
            label,
            (x, y),
            ha="center",
            va="center",
            size=12,
            xytext=(0, 5),
            textcoords="offset points",
        )  # place the label just above the bar

    plt.show()  # show the plot

(3) Distribution of actual intents¶

In [14]:
labeled_barplot(df, 'actual_intent', perc=False)
[Bar plot: counts per actual intent]
In [15]:
# Actual Intent Unique Values
df['actual_intent'].value_counts()
Out[15]:
actual_intent
confirmation                32
reschedule                  28
tech call before arrival    22
wrong person                22
wrong time                  21
out of scope                20
call request                19
already rescheduled         19
where is the tech           19
issue fixed                 18
issue not fixed             18
cancellation                17
contact details provided    16
stop communications         15
different time slot         14
Name: count, dtype: int64

(4) List of distinct (unique) intents¶

In [16]:
df['actual_intent'].unique()
Out[16]:
array(['call request', 'out of scope', 'wrong time',
       'tech call before arrival', 'confirmation',
       'contact details provided', 'cancellation', 'issue fixed',
       'different time slot', 'reschedule', 'issue not fixed',
       'already rescheduled', 'stop communications', 'wrong person',
       'where is the tech'], dtype=object)
In [17]:
labels_all = ['call request', 'out of scope', 'wrong time',
       'tech call before arrival', 'confirmation',
       'contact details provided', 'cancellation', 'issue fixed',
       'different time slot', 'reschedule', 'issue not fixed',
       'already rescheduled', 'stop communications', 'wrong person',
       'where is the tech']
In [18]:
sorted_labels_all = sorted(labels_all)
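Given the hallucination concern raised in the problem statement, it is worth verifying that every predicted label falls inside the closed intent list; a quick guard (not part of the original analysis):

In [ ]:
# Guard against hallucinated labels: every prediction should be a known intent
unexpected = set(df['openai_intent'].unique()) - set(labels_all)
assert not unexpected, f'model produced labels outside the intent list: {unexpected}'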

(5) Distribution of predicted intents¶

In [19]:
labeled_barplot(df, 'openai_intent', perc=False)
[Bar plot: counts per predicted (openai) intent]

Overall Performance of AI Model¶

In [20]:
def model_performance_classification_sklearn(pred, target):
    acc = accuracy_score(target, pred)  # to compute Accuracy
    recall = recall_score(target, pred, average='macro')  # to compute Recall
    precision = precision_score(target, pred, average='macro')  # to compute Precision
    f1 = f1_score(target, pred, average='macro')  # to compute F1-score

    # creating a dataframe of metrics
    df_perf = pd.DataFrame(
        {
            "Accuracy": acc,
            "Recall": recall,
            "Precision": precision,
            "F1": f1
        },
        index=[0],
    )

    return df_perf
In [21]:
pd.set_option('display.float_format', lambda x: '%.2f' % x) # Display floats to 2 decimal places, suppressing scientific notation

# Calculate Confusion Matrix
conf_mat = confusion_matrix(df['actual_intent'], df['openai_intent'], labels=sorted_labels_all)

# Initialize ConfusionMatrixDisplay
disp = ConfusionMatrixDisplay(confusion_matrix=conf_mat, display_labels=sorted_labels_all)

# Plotting
fig, ax = plt.subplots(figsize=(16, 10))  # Set figure size
disp.plot(cmap='YlGnBu', values_format='d', ax=ax)

# Enhancements
ax.set_title('Confusion Matrix', fontsize=16)
ax.set_xlabel('Predicted Label', fontsize=12)
ax.set_ylabel('True Label', fontsize=12)
plt.xticks(fontsize=8, rotation=90)  # Rotate the long intent labels for readability
plt.yticks(fontsize=8)
plt.show()
[Heatmap: confusion matrix of actual vs. predicted intents]
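Rather than reading the off-diagonal cells from the heatmap, the most frequent confusions can be tabulated directly from conf_mat (a convenience sketch using the objects computed above):

In [ ]:
# Tabulate the off-diagonal (misclassified) cells of the confusion matrix
confusions = [
    (sorted_labels_all[i], sorted_labels_all[j], conf_mat[i, j])
    for i in range(len(sorted_labels_all))
    for j in range(len(sorted_labels_all))
    if i != j and conf_mat[i, j] > 0
]
pd.DataFrame(confusions, columns=['actual', 'predicted', 'count']).sort_values(
    'count', ascending=False
)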
In [22]:
pd.set_option('display.float_format', lambda x: '%.2f' % x) # Display floats to 2 decimal places, suppressing scientific notation
model_performance_classification_sklearn(df['openai_intent'],df['actual_intent'])
Out[22]:
   Accuracy  Recall  Precision    F1
0      0.94    0.93       0.95  0.94

AI Performance per Intent¶

In [23]:
# Function to calculate per-intent metrics
def calculate_metrics(group):
    accuracy = accuracy_score(group['actual_intent'], group['openai_intent'])
    recall = recall_score(group['actual_intent'], group['openai_intent'], average='weighted', zero_division=0)
    precision = precision_score(group['actual_intent'], group['openai_intent'], average='weighted', zero_division=0)
    f1 = f1_score(group['actual_intent'], group['openai_intent'], average='weighted', zero_division=0)
    return pd.Series({'Accuracy': accuracy, 'Recall': recall, 'Precision': precision, 'F1 Score': f1})

# Group by actual intent and calculate metrics for each group
metrics = df.groupby('actual_intent').apply(calculate_metrics)
metrics
Out[23]:
Accuracy Recall Precision F1 Score
actual_intent
already rescheduled 1.00 1.00 1.00 1.00
call request 0.95 0.95 1.00 0.97
cancellation 0.94 0.94 1.00 0.97
confirmation 1.00 1.00 1.00 1.00
contact details provided 0.75 0.75 1.00 0.86
different time slot 0.93 0.93 1.00 0.96
issue fixed 1.00 1.00 1.00 1.00
issue not fixed 0.89 0.89 1.00 0.94
out of scope 0.90 0.90 1.00 0.95
reschedule 0.96 0.96 1.00 0.98
stop communications 0.93 0.93 1.00 0.97
tech call before arrival 0.91 0.91 1.00 0.95
where is the tech 0.89 0.89 1.00 0.94
wrong person 0.95 0.95 1.00 0.98
wrong time 1.00 1.00 1.00 1.00
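Note that because each group contains a single true intent, every prediction of that intent within the group is correct by construction, which is why weighted precision is 1.00 in every row; accuracy and recall carry the signal here. As a cross-check, sklearn's classification_report computes per-intent precision against the full dataset in a single call:

In [ ]:
from sklearn.metrics import classification_report

# Per-intent precision/recall/F1 as a cross-check on the grouped metrics above
print(classification_report(df['actual_intent'], df['openai_intent'],
                            labels=sorted_labels_all, zero_division=0))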