
Commit d9d48f9

revise local models to include llama cpp and lmstudio
1 parent 230e162 commit d9d48f9


units/en/unit2/continue-client.mdx

Lines changed: 85 additions & 17 deletions
@@ -9,7 +9,9 @@ like Ollama.
You can install Continue from the VS Code marketplace.
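
If you prefer the terminal, you can also install the extension with the VS Code CLI. A minimal sketch; the extension ID `Continue.continue` is an assumption here, so confirm it on the marketplace listing:

```bash
# Install the Continue extension using the VS Code CLI
# (extension ID assumed to be Continue.continue; verify it on the marketplace page)
code --install-extension Continue.continue
```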

<Tip>

*Continue also has an extension for [JetBrains](https://plugins.jetbrains.com/plugin/22707-continue).*

</Tip>

### VS Code extension
@@ -22,46 +24,112 @@ You can install Continue from the VS Code marketplace.

With Continue configured, we'll move on to setting up a local model provider to pull and run models locally.

### Local Models

There are many ways to run local models that are compatible with Continue. Three popular options are Ollama, Llama.cpp, and LM Studio. Ollama is an open-source tool that allows users to easily run large language models (LLMs) locally. Llama.cpp is a high-performance C++ library for running LLMs that also includes an OpenAI-compatible server. LM Studio provides a graphical interface for running local models.

You can access local models from the Hugging Face Hub and get commands and quick links for all major local inference apps.

![hugging face hub](https://cdn-uploads.huggingface.co/production/uploads/64445e5f1bc692d87b27e183/d6XMR5q9DwVpdEKFeLW9t.png)
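
If you want to fetch the weights yourself, you can also download a GGUF file from the Hub with the Hugging Face CLI. A minimal sketch, assuming the `huggingface_hub` CLI is installed and reusing the Devstral GGUF repository from the examples below:

```bash
# Download only the Q4_K_M quantization from the Devstral GGUF repository
# (repository name taken from the examples on this page; adjust the pattern for other quants)
huggingface-cli download unsloth/Devstral-Small-2505-GGUF \
  --include "*Q4_K_M*" \
  --local-dir ./models/devstral-small
```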
35+
36+
<hfoptions id="local-models">
37+
<hfoption id="llamacpp">
38+
39+
Llama.cpp provides `llama-server`, a lightweight, OpenAI API compatible, HTTP server for serving LLMs. You can either build it from source by following the instructions in the [Llama.cpp repository](https://github.com/ggml-org/llama.cpp), or use a pre-built binary if available for your system. Check out the [Llama.cpp documentation](https://github.com/ggerganov/llama.cpp) for more information.
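
If you choose to build from source, the typical CMake flow looks roughly like this. This is only a sketch; check the repository's build documentation for platform-specific options such as GPU backends:

```bash
# Clone and build llama.cpp; the build produces the llama-server binary
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# The binaries land under build/bin/
./build/bin/llama-server --help
```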

Once you have `llama-server`, you can run a model from Hugging Face with a command like this:

```bash
llama-server -hf unsloth/Devstral-Small-2505-GGUF:Q4_K_M
```
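
Once the server is running, you can sanity-check it with a request to its OpenAI-compatible endpoint. A minimal sketch, assuming `llama-server`'s default port `8080`:

```bash
# Send a small chat request to the local llama-server
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "Say hello in one word."}]}'
```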

</hfoption>
<hfoption id="lmstudio">

LM Studio is an application for Mac, Windows, and Linux that makes it easy to run open-source models locally with a graphical interface. To get started:

1. [Click here to open the model in LM Studio](lmstudio://open_from_hf?model=unsloth/Devstral-Small-2505-GGUF).
2. Once the model is downloaded, go to the "Local Server" tab and click "Start Server" (you can verify it with the check below).
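
Once the server is started, you can check that it responds. A minimal sketch, assuming LM Studio's default port `1234`:

```bash
# List the models exposed by LM Studio's OpenAI-compatible local server
curl http://localhost:1234/v1/models
```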

</hfoption>
<hfoption id="ollama">

To use Ollama, you can [install](https://ollama.com/download) it and download the model you want to run with the `ollama run` command.

For example, you can download and run the [Devstral-Small](https://huggingface.co/unsloth/Devstral-Small-2505-GGUF?local-app=ollama) model with:

```bash
ollama run unsloth/devstral-small-2505-gguf:Q4_K_M
```
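
After the download finishes, you can confirm that the model is available locally. A quick check; the Ollama API listens on port `11434` by default:

```bash
# List the models Ollama has downloaded locally
ollama list

# Optionally, confirm the local API is reachable
curl http://localhost:11434/api/tags
```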

</hfoption>
</hfoptions>

<Tip>

Continue supports various local model providers. Besides Ollama, Llama.cpp, and LM Studio, you can also use other providers. For a complete list of supported providers and detailed configuration options, please refer to the [Continue documentation](https://docs.continue.dev/customize/model-providers).

</Tip>

It is important to use models that have tool calling as a built-in feature, e.g. Codestral, Qwen, and Llama 3.1.

1. Create a folder called `.continue/models` at the top level of your workspace
2. Add a file to this folder to configure your model provider. For example, `local-models.yaml`.
3. Add the following configuration, depending on whether you are using Ollama, Llama.cpp, or LM Studio.
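
Steps 1 and 2 can be done from the terminal if you prefer. A minimal sketch, using the example file name `local-models.yaml`; afterwards, paste one of the configurations below into that file:

```bash
# Create the Continue models folder and an empty config file at the workspace root
mkdir -p .continue/models
touch .continue/models/local-models.yaml
```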

<hfoptions id="local-models">
<hfoption id="llamacpp">

This configuration is for a `llama.cpp` model served with `llama-server`. Note that the `model` field should match the model you are serving.

```yaml
name: Llama.cpp model
version: 0.0.1
schema: v1
models:
  - provider: llama.cpp
    model: unsloth/Devstral-Small-2505-GGUF
    apiBase: http://localhost:8080
    defaultCompletionOptions:
      contextLength: 8192 # Adjust based on the model
    name: Llama.cpp Devstral-Small
    roles:
      - chat
      - edit
```

</hfoption>
<hfoption id="lmstudio">

This configuration is for a model served via LM Studio. The model identifier should match what is loaded in LM Studio.

```yaml
name: LM Studio Model
version: 0.0.1
schema: v1
models:
  - provider: lmstudio
    model: unsloth/Devstral-Small-2505-GGUF
    name: LM Studio Devstral-Small
    apiBase: http://localhost:1234/v1
    roles:
      - chat
      - edit
```

</hfoption>
<hfoption id="ollama">

This configuration is for an Ollama model.

```yaml
name: Ollama Devstral model
version: 0.0.1
schema: v1
models:
  - provider: ollama
    model: unsloth/devstral-small-2505-gguf:Q4_K_M
    defaultCompletionOptions:
      contextLength: 8192
    name: Ollama Devstral-Small
    roles:
      - chat
      - edit
```

</hfoption>
</hfoptions>

By default, each model has a maximum context length; in these configurations it is set to `8192` tokens. MCP workflows can involve multiple requests and a large share of the context window, so increase `contextLength` if your model and hardware can handle more tokens.
