Visualization of BERTopic with Circle packing and Wordcloud (feat. visx)
Motivation
As a text analytics practioner with a focus on financial news, Topic Classification is one of the core jobs for me. For that purpose, BERTopic
changed my job much easier compared to when I had to rely on Word2Vec
. However, it's not a panacea and I've been tinkering with it to get a better insight.
With visualize_hierarchy()
method, BERTopic
provides a way to view hierarchy in the form of dendrogram. However, as dendrogram depicts nodes with varying depths, it's rather difficult to grasp the hierarchy.
Methodologies
This leads me to find more plausible ways to capture the hierarchy in not so complicated way.
Initial try: Collapsible Tree
Source: visx Gallery - Link Types (opens in a new tab)
Initially, what I wanted to employ was Collapsible Tree. As the name suggests, in Collapsible Tree, nodes which have children can be collapsed simply by clicking and expanded by doing it again.
As this tree seemed to save the space by collapsing nodes of no interests, I initially thought that would be a good candidate to use instead of Dendrogram.
However, with deep levels and plenty of nodes, the layout is messy and the linear structure (left to right, for instance) didn't make it a decent choice for my purpose.
Circle packing
Next, as I was browsing D3 Observable Gallery (opens in a new tab), I came across another brilliant hierarchical chart, namely Circle Packing.
This seemed more plausible since multiple levels are compacted into a circle with several children in it, thereby saving spaces than Trees.
Moreover, with added interactive elements (Tooltips, Highlights when MouseOver) provided by visx
, I settled with this type of chart.
Wordcloud
In general, I don't like Wordcloud.
While it can be effective when used for the purpose, it has been used too many times without real contexts (remember WordArt ?).
I might have been too harsh with Wordcloud, as I tried to avoid it at all costs. However, for the Topic Classification and Brainstorming, I came to think that this might be it.
For getting an idea of what the news outlet has talked about, Wordcloud can shed a light even without precise preprocessing steps.
CC BY-NC 4.0 © min park.RSS