How autocomplete suggestions work

Last updated March 18, 2026

Typeahead watches as you type and predicts what you are likely to write next. The prediction appears as ghost text inline, right where your cursor is.

The model runs on your Mac

Suggestions are generated entirely on-device using llama.cpp, an open-source runtime built for efficient local inference. The model file lives in your Application Support folder. When Typeahead starts, it loads the model into RAM.

Nothing you type is sent to a server. There are no cloud API calls during normal operation.

What context does it use?

Typeahead reads the text in the active text field, up to the limit of the model's context window (the maximum amount of text the model can consider at once). It does not see other open apps, your clipboard, or anything else on screen. The context is scoped to what you are currently typing.
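
For readers curious how a context limit like this can work, here is a minimal sketch. It is not Typeahead's actual code. The function name build_context, the choice to keep only the text before the cursor, and the use of whitespace-separated words as a stand-in for model tokens are all assumptions made for illustration.

```python
def build_context(field_text: str, cursor: int, max_tokens: int) -> str:
    """Return the text before the cursor, trimmed to the most recent words.

    Assumption: real tokenizers split text differently; counting
    whitespace-separated words is only a rough stand-in for tokens.
    """
    if max_tokens <= 0:
        return ""
    before_cursor = field_text[:cursor]
    words = before_cursor.split()
    if len(words) <= max_tokens:
        # Everything fits, so the text is passed through unchanged.
        return before_cursor
    # Drop the oldest words so the text nearest the cursor is kept.
    return " ".join(words[-max_tokens:])
```

In this sketch, a long document loses its earliest text first, so the words closest to the cursor always reach the model.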

Why a brief pause before suggestions?

Typeahead waits for a natural pause in your typing before generating. This avoids interrupting you mid-sentence and avoids spending computation on a suggestion while you are still deciding what to write.

The pause is typically less than a second. Suggestion generation itself takes one to two seconds on Apple Silicon with the recommended Gemma 3 4B model.
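
Waiting for a pause like this is commonly called debouncing. The sketch below shows the general idea under stated assumptions; it is not Typeahead's implementation. The class name PauseDetector and the 0.5-second threshold in the usage lines are hypothetical, and timestamps are passed in explicitly so the logic is easy to follow.

```python
from typing import Optional


class PauseDetector:
    """Reports when enough quiet time has passed since the last keystroke."""

    def __init__(self, pause_seconds: float) -> None:
        self.pause_seconds = pause_seconds
        self.last_keystroke: Optional[float] = None

    def on_keystroke(self, now: float) -> None:
        # Each keystroke restarts the waiting period.
        self.last_keystroke = now

    def should_generate(self, now: float) -> bool:
        # Nothing has been typed yet, so there is nothing to complete.
        if self.last_keystroke is None:
            return False
        return now - self.last_keystroke >= self.pause_seconds


# Usage with a hypothetical 0.5-second pause (times are in seconds).
detector = PauseDetector(pause_seconds=0.5)
detector.on_keystroke(now=1.0)
detector.on_keystroke(now=1.5)          # still typing, the wait restarts
print(detector.should_generate(now=1.75))  # False: only 0.25 s of quiet
print(detector.should_generate(now=2.0))   # True: 0.5 s of quiet
```

Because every keystroke resets the timer, generation only begins once typing has actually stopped for the full pause.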

What if the suggestion is wrong?

Keep typing. Typeahead updates the suggestion based on what you add. If no good completion exists, the ghost text clears. You can also press Esc to dismiss explicitly.

Typeahead does not learn from what you accept or reject. Suggestions come entirely from the model, not from your history.

Does it get better over time?

Not through usage. The quality of suggestions comes from the model you choose, not from adapting to you. You can switch to a different model from Settings > Model if the default does not match your writing style.

Still have questions?

Contact Support