Using spaCy for Korean PoS tagging


spaCy has been around for a while and its available pipelines do a good job, but for a long time it was not an option for Korean language work.

However, with the official release of version 3.3, the wait is over! This is the first version shipped with Korean language pipelines. In other words, spaCy now natively supports a Korean language model, so there is no need to use an external module (mecab, for instance) for that purpose.
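To make this concrete, here is a minimal sketch of PoS tagging with a Korean pipeline. It assumes spaCy 3.3 or later is installed and that the small Korean pipeline has been downloaded with `python -m spacy download ko_core_news_sm`. The helper name `pos_tags` and the sample sentence are only for illustration.

```python
def pos_tags(nlp, text):
    """Return (text, coarse PoS, fine-grained tag) for every token in `text`.

    `nlp` is any loaded spaCy pipeline, e.g. spacy.load("ko_core_news_sm").
    `pos_tags` is a hypothetical helper name, not part of spaCy itself.
    """
    doc = nlp(text)
    return [(token.text, token.pos_, token.tag_) for token in doc]


if __name__ == "__main__":
    import spacy

    # Assumes the pipeline was downloaded beforehand (see the note above).
    nlp = spacy.load("ko_core_news_sm")

    # Illustrative sentence: "I live in Seoul."
    for text, pos, tag in pos_tags(nlp, "저는 서울에 살고 있습니다."):
        print(f"{text}\t{pos}\t{tag}")
```

`token.pos_` holds the coarse universal PoS tag, while `token.tag_` holds the fine-grained tag defined by the pipeline's training data.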

While it is not perfect and does not provide an integrated analysis method for all languages, I believe it is a big step toward taking advantage of the spaCy toolkit.

After a short stint of experimenting, I found the following pros and cons of this new release.

Pros

Cons

Unless the Explosion team standardizes the different label names used for NER, manual conversion is still required (plus manual separation of core words such as VERBs or NOUNs). However, with a little extra work, this will open the door to more convenient analyses.
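The manual conversion and separation mentioned above could look roughly like the sketch below. The label table is my assumption: it supposes the Korean pipeline emits short codes such as PS, LC and OG, and maps them to the longer names used by the English pipelines. Check `nlp.get_pipe("ner").labels` on your own install before relying on it. The helper names are hypothetical as well.

```python
# Assumed mapping from the Korean pipeline's short NER codes to the longer
# label names used by the English pipelines. This is an illustration, not an
# official spaCy table; verify the real labels on your installed pipeline.
KO_TO_EN_LABELS = {
    "PS": "PERSON",
    "LC": "LOC",
    "OG": "ORG",
    "DT": "DATE",
    "TI": "TIME",
    "QT": "QUANTITY",
}

# Coarse PoS tags treated as "core words" here; adjust to your analysis.
CORE_POS = {"NOUN", "PROPN", "VERB"}


def normalize_label(label):
    """Map a short label to its longer name; unknown labels pass through."""
    return KO_TO_EN_LABELS.get(label, label)


def normalize_entities(doc):
    """Return (entity text, normalized label) pairs for a processed Doc."""
    return [(ent.text, normalize_label(ent.label_)) for ent in doc.ents]


def core_words(doc, keep=CORE_POS):
    """Return the text of tokens whose coarse PoS tag is in `keep`."""
    return [token.text for token in doc if token.pos_ in keep]
```

With a loaded pipeline, `doc = nlp(text)` followed by `normalize_entities(doc)` and `core_words(doc)` gives output that is easier to compare against results from the English pipelines.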

CC BY-NC 4.0 © min park.