Using ollama for AI Agents

2 min read

ollama - A companion to local LLMs

Ollama is not the only tool available for running LLMs locally, but it is arguably one of the key players.

Why?

As an Apple Silicon user, the lack of CUDA, and the fact that many tools require or assume CUDA for GPU acceleration, had been a deal breaker for me.

Admittedly, PyTorch has started to support Apple Silicon via MPS (Metal Performance Shaders).

However, it is not straightforward to use and has been subpar compared to other solutions such as MLX or Ollama.

Why not MLX, then?

As explained in the previous post, MLX is not well supported in Hugging Face's smolagents framework, which is my current default choice for AI agent work.

In contrast, Ollama, accessed through LiteLLM via LiteLLMModel, works smoothly with most frameworks, including smolagents and LlamaIndex.
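As a minimal sketch, wiring a local Ollama model into smolagents through LiteLLMModel might look like the following; the model name and endpoint are assumptions (a locally pulled qwen3 and Ollama's default port).

```python
from smolagents import CodeAgent, LiteLLMModel

# Assumptions: qwen3 has been pulled locally (`ollama pull qwen3`)
# and the Ollama server is running on its default port.
model = LiteLLMModel(
    model_id="ollama_chat/qwen3",       # LiteLLM routes this to the local Ollama server
    api_base="http://localhost:11434",  # default Ollama endpoint
)

# A simple code agent with the built-in base tools enabled.
agent = CodeAgent(tools=[], model=model, add_base_tools=True)
agent.run("How many seconds are there in a leap year?")
```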

Most importantly, as with MLX, Ollama correctly uses Apple Silicon's GPU without any extra configuration.

Take note of tags

While you can use most of the models available in Ollama for AI agent work, there are a few considerations.

In the age of tool-based AI agents, it is highly likely that you will build or experiment with tool-calling agents, which in turn requires a tool-capable LLM.

Ollama tags tool-capable models as tools; a model without that tag cannot call tools in an AI agent project.
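As a quick sanity check, here is a hedged sketch using the ollama Python client, assuming a recent version that accepts plain Python functions as tools and a locally pulled qwen3 (which carries the tools tag):

```python
import ollama

def add(a: int, b: int) -> int:
    """Add two integers."""
    return a + b

# Assumption: qwen3 carries the `tools` tag and has been pulled locally.
response = ollama.chat(
    model="qwen3",
    messages=[{"role": "user", "content": "What is 17 + 25? Use the add tool."}],
    tools=[add],  # recent ollama-python versions accept Python functions as tool schemas
)

# A tools-capable model should return structured tool calls here;
# a model without the `tools` tag will typically error out or answer in plain text.
print(response.message.tool_calls)
```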

Additionally, vision- and thinking-capable models are tagged as vision and thinking, respectively.

For instance, gemma3, which handles vision inputs, is tagged as vision, and qwen3 carries the thinking tag.
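As an illustration, here is a rough sketch of exercising those two tags with the ollama Python client; the image path and prompts are made up, and the think flag assumes a recent ollama-python release that exposes it.

```python
import ollama

# Vision: gemma3 is tagged `vision`, so it can take image inputs.
vision_response = ollama.chat(
    model="gemma3",
    messages=[{
        "role": "user",
        "content": "Describe this image in one sentence.",
        "images": ["./example.png"],  # hypothetical local image path
    }],
)
print(vision_response.message.content)

# Thinking: qwen3 is tagged `thinking`; recent ollama-python versions
# expose a `think` flag that returns the reasoning trace separately.
thinking_response = ollama.chat(
    model="qwen3",
    messages=[{"role": "user", "content": "Why is the sky blue?"}],
    think=True,
)
print(thinking_response.message.thinking)
print(thinking_response.message.content)
```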

CC BY-NC 4.0 © min park