spaCy performance enhancement on Apple Silicon

Wed Feb 15 2023•3 min read

These days, I have been working with spaCy's spacy-experimental package. It includes a transformer-based pipeline that was trained on a GPU, so speed is a pretty critical issue for non-CUDA users (or at least many perceive it that way). Ever since Apple Silicon Macs were released, many people, myself included, have been wondering how their built-in GPU can be used to speed up ML workloads.

According to spaCy's GPU Support Troubleshooting FAQ on GitHub, Apple Silicon users can take advantage of their GPUs with PyTorch 1.13+ together with spacy-transformers 1.1.8+.
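Before going further, it's worth checking whether the installed PyTorch meets that version requirement and can actually see the GPU. Below is a minimal sketch. torch.backends.mps.is_built() and torch.backends.mps.is_available() are PyTorch's own checks. The version parsing and the helper names (version_tuple, torch_version_ok, mps_status) are my own, and the 1.13 threshold is simply the one quoted from the FAQ.

```python
import re

MIN_TORCH = (1, 13)  # minimum PyTorch version quoted from spaCy's FAQ


def version_tuple(version: str) -> tuple:
    """Leading major.minor.patch numbers, e.g. "1.13.1+cpu" -> (1, 13, 1)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    # A missing patch number is treated as 0
    return tuple(int(part) for part in match.groups(default="0"))


def torch_version_ok(version: str) -> bool:
    """True if major.minor is at least MIN_TORCH."""
    return version_tuple(version)[:2] >= MIN_TORCH


def mps_status() -> str:
    """Describe whether this Python environment can use PyTorch's MPS backend."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch_version_ok(torch.__version__):
        return f"PyTorch {torch.__version__} is older than 1.13"
    if not torch.backends.mps.is_built():
        return "this PyTorch build was compiled without MPS support"
    if not torch.backends.mps.is_available():
        return "MPS is built in but unavailable (macOS older than 12.3, or no Apple GPU)"
    return "MPS is available"


if __name__ == "__main__":
    print(mps_status())
```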

No special setup necessary?

Not really. It isn't complicated, but GPU support isn't automatic either. You need to meet some system requirements, install a few extra dependencies, and tell spaCy to initialize the GPU.

Not Available on Big Sur

PyTorch supports Apple Silicon GPUs through Metal Performance Shaders (MPS), which uses Apple's Metal acceleration. However, according to Apple's documentation, using MPS with PyTorch requires the following:

macOS 12.3+ (macOS Monterey)
Python 3.7+
Xcode 13.3.1+
Xcode command-line tool: xcode-select --install

This requirement is critical. Short of upgrading the OS, nothing I tried got spaCy to enable the GPU. My guess is that the macOS 12.3 requirement is tied to the Xcode version, since nearly every major Xcode release has required a newer version of macOS.
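If you're not sure your machine qualifies, the OS side of the list above can be checked from Python. This is a minimal sketch that uses only the standard library. The thresholds (macOS 12.3, Python 3.7) come from the list, the helper names are my own, and it does not check the Xcode requirement. It looks at platform.machine() rather than the chip model, so a Python interpreter running under Rosetta reports x86_64 and fails the check even on an M1.

```python
import platform
import sys

# Thresholds taken from Apple's MPS requirements listed above
MIN_MACOS = (12, 3)
MIN_PYTHON = (3, 7)


def macos_version_tuple(release: str) -> tuple:
    """Turn a release string like "12.3.1" into (12, 3, 1); "" (not macOS) gives ()."""
    return tuple(int(part) for part in release.split(".") if part.isdigit())


def meets_mps_os_requirements(release: str, machine: str) -> bool:
    """True if the macOS release is 12.3+ and Python runs natively on Apple Silicon."""
    version = macos_version_tuple(release)
    return bool(version) and version >= MIN_MACOS and machine == "arm64"


if __name__ == "__main__":
    release = platform.mac_ver()[0]  # "" on non-macOS systems
    machine = platform.machine()     # "x86_64" if Python itself runs under Rosetta
    ok = meets_mps_os_requirements(release, machine)
    print(f"macOS {release or 'n/a'} on {machine}: {'OK' if ok else 'not OK'} for MPS")
    py_ok = sys.version_info >= MIN_PYTHON
    print(f"Python {sys.version.split()[0]}: {'OK' if py_ok else 'too old'}")
```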

Install spaCy with extras

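A plain installation of spaCy is not enough to get these speedups. Install it with the apple extra, which adds the thinc-apple-ops package (or install such dependencies manually), and make sure PyTorch 1.13+ is installed if you want MPS. In zsh, the default shell on recent macOS, quote the package name so the brackets aren't treated as a glob pattern.

pip install 'spacy[apple]'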
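To confirm the extras actually landed, you can ask Python whether the relevant packages can be found without importing them. The sketch below assumes these import names: thinc_apple_ops is what the apple extra pulls in, while torch and spacy_transformers are listed because MPS and transformer pipelines depend on them. The missing_modules helper is hypothetical.

```python
from importlib.util import find_spec

# Import names (not pip names) of the optional pieces discussed in this post
EXTRAS = {
    "thinc_apple_ops": "AMX acceleration, installed by spacy[apple]",
    "torch": "PyTorch 1.13+, needed for MPS",
    "spacy_transformers": "transformer-based pipelines",
}


def missing_modules(names) -> list:
    """Return the top-level module names that can't be found (nothing is imported)."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    for name in missing_modules(EXTRAS):
        print(f"missing {name}: {EXTRAS[name]}")
```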

Enabling GPU on spaCy

Note: In my experience on macOS Monterey (the minimum OS requirement) with PyTorch 1.13.1, MPS support is still experimental and not yet stable. Be aware of this, and consider waiting until a smoothly working version is released.

As with NVIDIA GPUs, spaCy needs to be told which processing unit to use. Namely, call either spacy.prefer_gpu() or spacy.require_gpu().

Note that spacy.prefer_gpu() returns a boolean indicating whether the GPU was activated and never raises an error, whereas spacy.require_gpu() raises an error when no GPU is available on the machine. Also be aware that when using Jupyter, spacy.load() and the GPU initialization should be in the same cell.

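import spacy

# Returns True if the GPU (MPS on Apple Silicon) was activated, False otherwise
spacy.prefer_gpu()
# Load the pipeline after activating the GPU (in the same Jupyter cell)
nlp = spacy.load('en_coreference_web_trf')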
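The difference between the two functions is easiest to see side by side. The sketch below wraps both behind one hypothetical helper, load_pipeline. spacy.prefer_gpu, spacy.require_gpu, spacy.load and thinc.api.get_current_ops are real APIs. The backend class names in the lookup table (MPSOps, AppleOps, CupyOps, NumpyOps) reflect recent Thinc 8.1 releases and may differ in other versions.

```python
# Human-readable labels for Thinc backend class names (as of Thinc 8.1; may change)
BACKEND_LABELS = {
    "MPSOps": "Apple GPU via Metal Performance Shaders",
    "AppleOps": "CPU with AMX via thinc-apple-ops",
    "CupyOps": "NVIDIA GPU via CuPy",
    "NumpyOps": "plain CPU",
}


def describe_backend(ops_name: str) -> str:
    """Map a Thinc backend class name to a short description (unknown names pass through)."""
    return BACKEND_LABELS.get(ops_name, ops_name)


def load_pipeline(name: str, strict: bool = False):
    """Load a spaCy pipeline after trying to activate the GPU.

    strict=False: spacy.prefer_gpu() returns False and spaCy stays on the CPU.
    strict=True: spacy.require_gpu() raises (a ValueError in recent Thinc) if no GPU is usable.
    """
    # Imported lazily so the helpers above can be used without spaCy installed
    import spacy
    from thinc.api import get_current_ops

    if strict:
        spacy.require_gpu()
        on_gpu = True
    else:
        on_gpu = spacy.prefer_gpu()
    # Load only after the device is chosen, so the weights are placed on it
    nlp = spacy.load(name)
    backend = type(get_current_ops()).__name__
    return nlp, on_gpu, backend


# Usage (needs spaCy and the coreference model installed):
# nlp, on_gpu, backend = load_pipeline("en_coreference_web_trf")
# print(on_gpu, describe_backend(backend))
```

Because GPU selection and spacy.load() happen inside one call, using a helper like this in a notebook keeps them in the same cell automatically.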

Semi-optimal optimization without MPS

For those who won't (as I didn't until recently) or can't upgrade macOS, or for whom MPS support is too unstable, MPS is not an option. However, they can still make use of the AMX matrix-multiplication coprocessor built into Apple Silicon. Offloading matrix multiplication to it can reduce processing time compared with relying solely on the general-purpose CPU cores.

Fortunately, the only requirement is to install spaCy with the apple extra (pip install 'spacy[apple]'), which adds the thinc-apple-ops package and its AppleOps backend. Nothing else needs to be configured.
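To check that the AppleOps backend is actually picked up, you can ask Thinc (the ML library underneath spaCy) for its best CPU backend. Below is a minimal sketch. thinc.api.get_ops("cpu") is Thinc's call for this, and with thinc-apple-ops installed it should return AppleOps rather than NumpyOps. The helper name cpu_backend_name is my own.

```python
def cpu_backend_name() -> str:
    """Class name of the CPU backend Thinc would use, e.g. "AppleOps" or "NumpyOps"."""
    try:
        from thinc.api import get_ops  # Thinc is installed as a spaCy dependency
    except ImportError:
        return "Thinc is not installed"
    # "cpu" asks Thinc for the best available CPU backend rather than a specific one
    return type(get_ops("cpu")).__name__


if __name__ == "__main__":
    name = cpu_backend_name()
    print(name, "(AMX in use)" if name == "AppleOps" else "(AppleOps not active)")
```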

Machine	CPU	Acceleration	Time taken
Desktop	Intel i5-9500 3GHz	N/A	30 mins
M1 MBA	Apple M1 w/ 8 cores	AMX	15 mins
Colab (Free)	Unknown	GPU	3 mins
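The post doesn't record exactly which workload produced these numbers. If you want to compare your own machines, wall-clock timing of nlp.pipe() over the same batch of texts is enough. The sketch below uses a hypothetical helper, time_batch. The commented usage line assumes an nlp object loaded as shown earlier and a list of strings called texts.

```python
import time
from typing import Callable, Iterable, List


def time_batch(process: Callable[[List[str]], object], texts: Iterable[str]) -> float:
    """Run `process` once over all texts and return the elapsed wall-clock seconds."""
    batch = list(texts)  # materialise first so reading the input isn't timed
    start = time.perf_counter()
    process(batch)
    return time.perf_counter() - start


# Usage with spaCy (nlp.pipe is lazy, so list() forces every document to be processed):
# seconds = time_batch(lambda batch: list(nlp.pipe(batch, batch_size=32)), texts)
# print(f"{seconds / 60:.1f} mins")
```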

Unsurprisingly, my M1 didn't beat Colab with the GPU enabled. From what I've heard, MPS can be up to about 2x faster than AMX, so the same job might have taken around 7.5 mins, which is still more than twice as slow as Colab.

Still, that's not bad, considering that this machine is the cheapest option in the Apple Silicon lineup (apart from my memory upgrade to 16GB).
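The back-of-the-envelope estimate in the last paragraphs can be spelled out with the table's numbers. The 2x MPS-over-AMX factor is only the hearsay figure mentioned above, not a measurement, so treat the result as a rough guess. The function name estimate_mps is hypothetical.

```python
def estimate_mps(timings_min: dict, mps_over_amx: float = 2.0) -> dict:
    """Derive speedups from measured times (minutes) plus an assumed MPS/AMX factor."""
    intel = timings_min["Intel i5-9500"]
    amx = timings_min["M1 + AMX"]
    colab = timings_min["Colab GPU"]
    mps = amx / mps_over_amx  # hypothetical MPS time, not measured
    return {
        "amx_vs_intel": intel / amx,   # how many times faster AMX was than the desktop
        "estimated_mps_min": mps,      # guessed MPS runtime in minutes
        "mps_vs_colab": mps / colab,   # how many times slower than Colab that would be
    }


# Times from the table above, in minutes
print(estimate_mps({"Intel i5-9500": 30, "M1 + AMX": 15, "Colab GPU": 3}))
# → {'amx_vs_intel': 2.0, 'estimated_mps_min': 7.5, 'mps_vs_colab': 2.5}
```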