Choosing a Model
The Gen AI Lab course standardizes on the IBM Granite 4.0 Tiny Preview family because it pairs low latency on commodity CPUs with the structured tool-calling support our agentic AI exercises depend on.
Why Granite 4.0 Tiny Preview?
- CPU-friendly – The model fits in memory on a fast laptop or a single devcontainer node, so every participant can reproduce the labs without a GPU.
- Reliable tool use – Granite exposes structured tool calls out of the box, which lets us demonstrate deterministic agents without bolting on brittle prompt hacks (a parsing sketch follows this list).
- Enterprise pedigree – IBM ships reviewable training data documentation and consistent versioning that matches how regulated customers vet foundation models.
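That tool-use point is worth making concrete: instead of regex-scraping a completion, your code deserializes JSON the runtime already structured. A minimal Rust sketch, assuming the common OpenAI-style `function`/`arguments` shape that runtimes such as Ollama emit (your runtime's field names may differ):

```rust
// Deserializing a structured tool call with serde.
use serde::Deserialize;

#[derive(Debug, Deserialize)]
struct ToolCall {
    function: FunctionCall,
}

#[derive(Debug, Deserialize)]
struct FunctionCall {
    name: String,
    // Arguments arrive as structured JSON, not free text to be parsed.
    arguments: serde_json::Value,
}

fn main() -> Result<(), serde_json::Error> {
    // Illustrative payload; in the labs this comes from the runtime's response.
    let raw = r#"{"function":{"name":"get_weather","arguments":{"city":"Austin"}}}"#;
    let call: ToolCall = serde_json::from_str(raw)?;
    println!("dispatch {} with args {}", call.function.name, call.function.arguments);
    Ok(())
}
```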
Typical Lab Setup
- Download the Tiny Preview weights and accompanying tokenizer files from the IBM announcement page.
- Register the artifact with your inference runtime (Ollama, LM Studio, or a custom Rust runner) under the same name the exercises reference; the first sketch after this list covers both of these steps against a local Ollama daemon.
- Enable tool routing in the runtime so JSON arguments are emitted directly into your Axum handlers (second sketch below).
- Capture a baseline latency trace so you can measure the effect of later optimizations in the course (third sketch below).
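For the download and registration steps, here is a minimal Rust sketch that drives a local Ollama daemon over its HTTP API. It assumes reqwest with the `blocking` and `json` features plus serde_json, and the `granite4:tiny` tag is a placeholder: substitute the exact name from the IBM announcement page that your exercises reference.

```rust
// Pull the model through a local Ollama daemon and confirm the registered name.
use serde_json::json;
use std::time::Duration;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // A pull can take minutes, so lift reqwest's default timeout.
    let client = reqwest::blocking::Client::builder()
        .timeout(Duration::from_secs(1800))
        .build()?;

    // "granite4:tiny" is a placeholder tag -- use the name your exercises reference.
    // With "stream": false, Ollama replies once the pull has finished.
    let pull = client
        .post("http://localhost:11434/api/pull")
        .json(&json!({ "name": "granite4:tiny", "stream": false }))
        .send()?;
    println!("pull status: {}", pull.status());

    // List local models to verify the artifact is registered under that name.
    let tags: serde_json::Value = client
        .get("http://localhost:11434/api/tags")
        .send()?
        .json()?;
    println!("local models: {tags}");
    Ok(())
}
```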
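For the tool-routing step, a sketch of the receiving side: an Axum handler whose typed `Json` extractor hands the tool's arguments straight to your Rust code. The `/tools/get_weather` route, `WeatherArgs` payload, and stub response are hypothetical, for illustration only.

```rust
// A stub Axum handler that receives a tool call's JSON arguments as typed data.
use axum::{routing::post, Json, Router};
use serde::Deserialize;

#[derive(Deserialize)]
struct WeatherArgs {
    // Field emitted by the model's structured tool call.
    city: String,
}

async fn get_weather(Json(args): Json<WeatherArgs>) -> Json<serde_json::Value> {
    // Stub implementation; a real lab tool would call an actual service here.
    Json(serde_json::json!({ "city": args.city, "forecast": "sunny" }))
}

#[tokio::main]
async fn main() {
    // The runtime (or a thin relay) POSTs tool calls to this route.
    let app = Router::new().route("/tools/get_weather", post(get_weather));
    let listener = tokio::net::TcpListener::bind("0.0.0.0:3000").await.unwrap();
    axum::serve(listener, app).await.unwrap();
}
```

Either the runtime calls this endpoint directly or a small relay forwards the model's tool calls to it; in both cases the handler never touches raw prompt text.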
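For the latency baseline, a sketch that times one non-streaming `/api/generate` round trip against Ollama. The non-streaming response also carries server-side timings such as `total_duration` (in nanoseconds), which are worth logging alongside the wall-clock number.

```rust
// Time one non-streaming generation round trip as a latency baseline.
use serde_json::json;
use std::time::{Duration, Instant};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let client = reqwest::blocking::Client::builder()
        .timeout(Duration::from_secs(300))
        .build()?;

    // Use a fixed prompt so later runs are comparable to this baseline.
    let started = Instant::now();
    let resp: serde_json::Value = client
        .post("http://localhost:11434/api/generate")
        .json(&json!({
            "model": "granite4:tiny", // same placeholder tag as the pull sketch
            "prompt": "Summarize what a tool call is in one sentence.",
            "stream": false
        }))
        .send()?
        .json()?;
    println!("wall-clock latency: {:?}", started.elapsed());

    // Ollama also reports server-side timings (nanoseconds) in the response.
    println!("total_duration: {:?}", resp.get("total_duration"));
    Ok(())
}
```

Keep the numbers from this first run; later exercises rerun the same fixed prompt and compare against them.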
Next Steps
Once Granite is installed locally, move on to the runtime-specific docs (such as Ollama or Docker Compose) to attach the Bionic UI and start sending agentic prompts through the toolchain.