Enhancing BERTopic Topic Classification with ChatGPT (feat. Circle Packing)


Before We Begin

This is an ongoing project to enhance BERTopic Topic Classification by leveraging ChatGPT.

ChatGPT and BERTopic

As I wrote in the previous article, recent versions of BERTopic support using ChatGPT to describe the topics they extract. Given ChatGPT's strength at explaining the information it is handed, using it to generate a readable topic sentence was a smart move.

The caveat? ChatGPT's infamous tendency to generate a plausible explanation even when it is not true at all.
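To make the idea concrete, here is a minimal sketch of how a labeling prompt could be assembled from one topic's keywords and a few representative documents. The `[KEYWORDS]` and `[DOCUMENTS]` placeholders mirror the tags that BERTopic prompt templates use; the template wording, the `build_topic_prompt` helper, and the sample data are assumptions made up for illustration, not the library's internals.

```python
# Hypothetical prompt template; the wording is an assumption for illustration.
PROMPT_TEMPLATE = (
    "I have a topic described by the following keywords: [KEYWORDS]\n"
    "The topic contains these sample documents:\n[DOCUMENTS]\n"
    "Based on the above, give a short label for this topic."
)


def build_topic_prompt(keywords, documents, template=PROMPT_TEMPLATE, max_docs=3):
    """Fill the placeholders with one topic's keywords and sample documents."""
    keyword_text = ", ".join(keywords)
    # Keep the prompt small: only the first `max_docs` documents are included.
    document_text = "\n".join(f"- {doc}" for doc in documents[:max_docs])
    return template.replace("[KEYWORDS]", keyword_text).replace("[DOCUMENTS]", document_text)


# Made-up sample topic, not taken from the actual dataset.
prompt = build_topic_prompt(
    keywords=["inflation", "rates", "fed"],
    documents=["The Fed raised rates again.", "Inflation cooled in May."],
)
print(prompt)
```

The resulting string is what would be sent to the chat model. Since the model only sees these keywords and documents, anything in its answer that goes beyond them deserves a second look.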

Why ChatGPT?

Arguably, ChatGPT is the most hyped generative AI, and it has unlocked, and continues to unlock, many opportunities in the field.

However, it is not the only tool on the market, and without free OpenAI API credits there is no way to use ChatGPT programmatically for free.

Other companies provide similar functionality to ChatGPT, subject to rate limits. I tried Cohere because it offers a free, rate-limited pricing option.

Unfortunately, the results were not very impressive, especially on Korean-language text.

Visualization Methodologies

As before, I drew the Topic Classification result using visx's Circle Packing chart to depict the hierarchy among topics.
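Circle Packing layouts such as the one in visx build on d3-hierarchy, which expects nested data where inner nodes hold children and leaves carry a numeric value. Below is a rough sketch of shaping pairwise topic merges into that form. The `merges` and `leaf_sizes` structures, the `name`/`children`/`value` keys, and the sample labels are assumptions for illustration; real topic model output would need to be mapped into this shape first.

```python
import json


def build_tree(merges, leaf_sizes, root):
    """Turn binary merge records into a nested dict for a pack layout.

    merges: dict mapping a parent id to its (left, right) child ids.
    leaf_sizes: dict mapping a leaf topic id to its document count.
    """
    if root in merges:
        left, right = merges[root]
        return {
            "name": root,
            "children": [
                build_tree(merges, leaf_sizes, left),
                build_tree(merges, leaf_sizes, right),
            ],
        }
    # A node with no recorded merge is a leaf topic sized by document count.
    return {"name": root, "value": leaf_sizes[root]}


# Hypothetical sample: two topics merge into "economy", which merges with "sports".
merges = {"all": ("economy", "sports"), "economy": ("inflation", "jobs")}
leaf_sizes = {"inflation": 40, "jobs": 25, "sports": 30}

tree = build_tree(merges, leaf_sizes, "all")
print(json.dumps(tree, indent=2))
```

The printed JSON can be handed to the front end, where the layout sums leaf values to size each enclosing circle.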

However, unlike before, I replaced the Wordcloud chart with a hierarchy-less Circle Packing chart to show each individual topic.

I made this change because I believe ChatGPT-generated topics are descriptive enough on their own, while a Wordcloud is not very useful, especially with unigram tokenization.
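A hierarchy-less Circle Packing chart can reuse the same kind of layout by placing every topic directly under a single root, so circles are sized per topic but never nested. A small sketch, with the field names and the sample labels and counts assumed for illustration:

```python
def flat_pack_data(topic_sizes, root_name="topics"):
    """Place every topic directly under one root so no circles are nested."""
    # Largest topics first.
    ordered = sorted(topic_sizes.items(), key=lambda kv: kv[1], reverse=True)
    children = [{"name": label, "value": size} for label, size in ordered]
    return {"name": root_name, "children": children}


# Hypothetical labels and document counts.
data = flat_pack_data({"Interest rate hikes": 40, "Job market": 25, "Playoffs": 30})
print([child["name"] for child in data["children"]])
```

Because each leaf's `name` can hold a full generated label rather than a single token, the circle itself carries the description that a word cloud would otherwise have to approximate.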

Hierarchical Topic Representations

US Topics

[Figure: Circle Packing chart of hierarchical US topics]

KR Topics

[Figure: Circle Packing chart of hierarchical KR topics]

Individual Topic Representations

US Topics

[Figure: Circle Packing chart of individual US topics]

KR Topics

[Figure: Circle Packing chart of individual KR topics]

Closing

Obviously, ChatGPT alone might not be perfect for text analytics, but its powerful explainability is a game changer.

As with any other ML task, finding the best hyperparameter settings for topic clustering is still a big issue.
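One common starting point for that search is a plain grid search. The sketch below is generic and not the setup used in this project: `grid_search` and `toy_score` are hypothetical helpers, and `min_topic_size` and `n_neighbors` are just example parameter names. The scorer is a stand-in; in practice it would fit a topic model with the given parameters and return some quality metric.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in `param_grid` and return the best one with its score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Placeholder scorer for demonstration only; a real one would fit a topic
# model with `params` and return a quality metric such as topic coherence.
def toy_score(params):
    return -abs(params["min_topic_size"] - 20) - abs(params["n_neighbors"] - 15)


grid = {"min_topic_size": [10, 20, 50], "n_neighbors": [5, 15, 30]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The number of combinations grows multiplicatively with each added parameter, which is part of why tuning remains expensive.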

CC BY-NC 4.0 © min park.