Entity Relations Annotation Dashboard

It has been a long time since I first looked for a way to simplify text for easier sentiment analysis. What I had in mind turned out to be too complex to automate, so I had to do it by hand, and the work stalled. Recently, however, I read some articles about Relation Extraction, which is very close to what I had hoped to design. Building on that idea, I made an Entity Relations Annotation tool in Streamlit.

Why Annotation?

Relation Extraction is the process of extracting how different entities (words or spans) are related. Most work in this domain has been done with Machine Learning or Deep Learning (mostly with BERT).

However, a machine cannot learn by itself without guidance. It needs definitions to follow, so we need annotations as input to the ML / DL process.

Motivation

This initial stage of Relation Extraction was motivated by Prodigy, an advanced annotation tool by Explosion.ai (the maintainer of spaCy). Within selected domains, Prodigy can automate the annotation process without much human involvement.

However, finance is one of the fields where conventional models do not work well on text. I also defined my own categories for classifying the relation between entities. Therefore, the annotation has to be done manually.

Still, I chose a semi-automated approach: the Tokenizer and EntityRecognizer from spaCy handle tokenization and entity detection, and only the entity relations step is done by hand.

Like Prodigy, I present texts to analyze and give predefined categories to choose from. Big thanks to Streamlit (as always)!
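
As a rough illustration of the semi-automated step, here is a minimal sketch of how detected entities could be turned into candidate pairs for an annotator to label. The function name `candidate_pairs` and the sample data are hypothetical, not taken from the dashboard's code, and the pairing strategy (every unordered pair of distinct entities) is an assumption. The input mimics what one could collect from a spaCy `Doc` with `[(ent.text, ent.label_) for ent in doc.ents]`.

```python
from itertools import combinations


def candidate_pairs(entities):
    """Return every unordered pair of distinct entity texts.

    `entities` is a list of (text, label) tuples, e.g. collected from a
    spaCy Doc via [(ent.text, ent.label_) for ent in doc.ents].
    Repeated mentions of the same entity text are collapsed, and the
    order of first mention is preserved.
    """
    seen = []
    for text, _label in entities:
        if text not in seen:
            seen.append(text)
    return list(combinations(seen, 2))


# Hypothetical entities for one sentence (assumed, not real model output).
entities = [("Microsoft", "ORG"), ("Activision", "ORG"), ("Microsoft", "ORG")]
pairs = candidate_pairs(entities)
print(pairs)
```

Each resulting pair could then be shown to the annotator together with the predefined categories and directions to choose from.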

Predefined categories / directions

For the purpose of sentiment analysis, I defined both categories and directions. A category describes the action between entities, and a direction identifies the relative sentiment between them.

Category Definitions

The following is a proposed categorization of relations between entities. (The example sentences below are not always based on facts.)

Directions encoding

For sentiment analysis, the direction between two entities is assigned empirically. For this purpose, it seems reasonable to simplify to only three labels for the direction (i.e., over, under, [pos/neg]-inline).

Relations format

Combining directions and categories, relations are written in the following format.

Direction-Category

For instance, in the first sentence (Microsoft agreed to buy stakes in Activision.), the relation between Microsoft and Activision is Under-Buy.

This is because Microsoft is given the inferior sentiment in the act of buying Activision (inferior sentiment in a financial context).
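
To make the format concrete, here is a minimal sketch of how a relation could be stored and rendered as a Direction-Category label. The `Relation` class, its field names, and `parse_label` are hypothetical. The capitalized spellings and the expansion of [pos/neg]-inline into two separate labels are assumptions. Categories are left as free text, since only Buy appears in the example above.

```python
from dataclasses import dataclass

# Direction labels from the post. Spelling them with capitals and splitting
# "[pos/neg]-inline" into two labels are assumptions made for this sketch.
DIRECTIONS = ("Over", "Under", "Pos-Inline", "Neg-Inline")


@dataclass(frozen=True)
class Relation:
    head: str       # first entity, e.g. "Microsoft"
    tail: str       # second entity, e.g. "Activision"
    direction: str  # one of DIRECTIONS
    category: str   # action between the entities, e.g. "Buy"

    def __post_init__(self):
        # Reject directions outside the predefined set.
        if self.direction not in DIRECTIONS:
            raise ValueError(f"unknown direction: {self.direction!r}")

    @property
    def label(self):
        """Render the relation in the Direction-Category format."""
        return f"{self.direction}-{self.category}"


def parse_label(label):
    """Split a Direction-Category label back into (direction, category)."""
    # Match on the known direction prefixes, because some directions
    # contain a hyphen themselves (e.g. "Pos-Inline").
    for direction in DIRECTIONS:
        prefix = direction + "-"
        if label.startswith(prefix) and len(label) > len(prefix):
            return direction, label[len(prefix):]
    raise ValueError(f"not a Direction-Category label: {label!r}")


relation = Relation("Microsoft", "Activision", "Under", "Buy")
print(relation.label)
print(parse_label("Pos-Inline-Buy"))
```

Matching on known direction prefixes rather than splitting on the first hyphen keeps hyphenated directions intact when a label is parsed back.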

Sample Snapshot

CC BY-NC 4.0 © min park.