NLP, Theory and Reality


Looking back on the past four months

I was lucky to start a new job in one of the hottest job functions, NLP, along with data analysis. While I had been learning about this fast-growing field, I had no real experience, as I had only been enrolled in MOOCs. There has been a gulf of difference between the fancy theoretical models and the messier struggles of practice.

What you expect in NLP

If you are like me before I kickstarted my new job four months ago, you might have heard of or learned a bit about NLP models powered by deep neural networks, notably LSTM (Long Short-Term Memory), RNN (Recurrent Neural Network), or CNN (Convolutional Neural Network) architectures, and, more recently, word-embedding and Transformer models such as Word2Vec and BERT.

The tool I started with in practice

My first duty was to conduct a Bag-of-Words analysis rather than any fancier neural-network-powered, or even machine-learning-aided, work. I initially resisted the idea of working in such an archaic way given that we have better and fancier tools. My mentor advised that what matters is the result, not the tools. While non-ML methods may look antiquated, they can spare you the burden of more advanced analyses (remember: DNNs are data-hungry).
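To illustrate just how simple this approach is, here is a minimal Bag-of-Words sketch using scikit-learn's CountVectorizer; the tiny corpus is made up for the example and is not the data I actually worked with.

```python
from sklearn.feature_extraction.text import CountVectorizer

# A tiny made-up corpus, purely for illustration.
corpus = [
    "the market rallied on strong earnings",
    "earnings disappointed and the market fell",
    "strong earnings lifted the market",
]

# Bag-of-Words: each document becomes a vector of word counts,
# ignoring word order entirely.
vectorizer = CountVectorizer()
counts = vectorizer.fit_transform(corpus)

print(vectorizer.get_feature_names_out())  # the learned vocabulary
print(counts.toarray())                    # one count vector per document
```

Despite its simplicity, a representation like this feeds straight into classical classifiers and is often enough to answer the question at hand.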

Striking results

Theoretically, more advanced methods should give you better outcomes, e.g., more precise predictions. However, the fundamental axiom of Data Science (I made this up) tells us: garbage in, garbage out! In other words, no matter how good a method you use, you can't improve the results unless you feed it cleaner input data. This is why most practitioners preach that the most important duty in Data Science is data preprocessing.
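To make "garbage in, garbage out" concrete, here is a hedged sketch of the kind of cleaning that tends to matter more than the choice of model; the rules are illustrative, not the exact ones I used.

```python
import re

def clean_text(text: str) -> str:
    """Illustrative preprocessing: lowercase, drop URLs and
    punctuation, collapse whitespace. Real pipelines will differ."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)   # strip URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text)    # strip punctuation/symbols
    text = re.sub(r"\s+", " ", text).strip()    # collapse runs of whitespace
    return text

print(clean_text("Earnings UP 5%!! Read more: https://example.com"))
# -> "earnings up 5 read more"
```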

Are Neural Networks useless, then?

One might hastily conclude that we don't need all the fancy cutting-edge Data Science models. They require laborious data cleaning and an immense amount of data to train, so what's all the fuss about?

Well, as it turns out, that's not true either. While gathering enough data to train fancy models is hard and impractical for resource-strapped practitioners, well-made models are still worthwhile.

The drawback of simple models is evident: they can't figure out natural-language patterns such as word order and context, which only advanced models such as Google's BERT can capture. That said, such models tend to predict best on texts from the same domain they were trained on, so take their promise with a pinch of salt.
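For contrast, here is a minimal sketch of putting a pretrained BERT-family model to work via Hugging Face's transformers pipeline; the model name is a public example from the hub, not necessarily the one you would deploy.

```python
from transformers import pipeline

# Hypothetical example: a public pretrained sentiment model from the
# Hugging Face hub stands in for "a well-made advanced model".
classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
)

print(classifier("The market rallied on strong earnings."))
# e.g. [{'label': 'POSITIVE', 'score': 0.99...}]
```

Note that this works well on text resembling the model's training domain (here, English movie reviews); on texts from a very different domain, its edge over simpler methods can shrink.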

Concluding remarks

No single model is superior to all others, however it is trained. Sometimes the results I got were downright counterintuitive, as the simpler model delivered better ones. Rather than getting stuck on the idea that you need to devise a better model, it is often better to tweak the objects of your analysis. For instance, try analyzing titles rather than full texts, as a title often shouts the whole idea with a clear stance.

CC BY-NC 4.0 © min park