Enhancing BERTopic Topic Classification with ChatGPT (feat. Circle Packing)
Before begin
This is an ongoing project to enhance BERTopic
Topic Classification leveraging ChatGPT
.
ChatGPT
and BERTopic
As I wrote in the previous article, the recent version of BERTopic
started to support ChatGPT
for its extracted Topic Classification results. With ChatGPT
's excellent capacity to explain given available information, generating a well-understood topic sentence was the smart move.
Caveat? ChatGPT
's infamous ability to generate a plausible explanation, even when it's not true at all.
Why ChatGPT
?
Arguably, ChatGPT
is the most hyped generative AI which unlocked and still does a lot of opportunities in the field.
However, it is not the only available tool in the market and without a free credit of OpenAPI, there's no way to use ChatGPT
in a programmatic way for free.
With rate limit, there are other companies which provide similar functions to ChatGPT
. I tried Cohere (opens in a new tab) as it offers a free and rate-limiting pricing option.
Unfortunately, the result was not so impressive especially in Korean language texts.
Visualization Methodologies
As before, I drew the Topic Classification result using visx
's Circle Packing chart to depict the hierarchy among topics.
However, unlike before, I replaced the Wordcloud chart for hierarchy-less Circle Packing to show each individual topic.
I changed in this way as I believe that ChatGPT
-generate topics have enough to say and Wordcloud is not very useful more or less especially even unigram word tokenizations.
Hierarchical Topic Representations
US Topics
KR Topics
Individual Topic Representations
US Topics
KR Topics
Closing
Obviously, ChatGPT
alone might not be perfect for the text analytics, but its powerful explainability is the game changer.
As with any other ML tasks, finding the best (hyperparameter) setting for the topic clustering is still the big issue.
CC BY-NC 4.0 © min park.RSS