units/en/unit2/continue-client.mdx
You can install Continue from the VS Code marketplace.
<Tip>
*Continue also has an extension for [JetBrains](https://plugins.jetbrains.com/plugin/22707-continue).*
</Tip>
### VS Code extension
With Continue configured, we'll move on to setting up local models.
### Local Models
There are many ways to run local models that are compatible with Continue. Three popular options are Ollama, Llama.cpp, and LM Studio. Ollama is an open-source tool that allows users to easily run large language models (LLMs) locally. Llama.cpp is a high-performance C++ library for running LLMs that also includes an OpenAI-compatible server. LM Studio provides a graphical interface for running local models.
You can access local models from the Hugging Face Hub and get commands and quick links for all major local inference apps.

<hfoptions id="local-models">
<hfoption id="llamacpp">
Llama.cpp provides `llama-server`, a lightweight, OpenAI API-compatible HTTP server for serving LLMs. You can either build it from source by following the instructions in the [Llama.cpp repository](https://github.com/ggml-org/llama.cpp), or use a pre-built binary if one is available for your system. Check out the [Llama.cpp documentation](https://github.com/ggerganov/llama.cpp) for more information.
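
If you build from source, a minimal CMake sketch looks like the following (the flags are illustrative and CPU-only; the repository README is the authoritative reference, especially for GPU backends):

```bash
# Minimal build of llama.cpp from source (CPU-only sketch)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release
# The server binary should end up under build/bin/
./build/bin/llama-server --help
```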
Once you have `llama-server`, you can run a model from Hugging Face with a command like this:
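
The sketch below assumes a recent build where `llama-server` can pull GGUF files directly from the Hugging Face Hub via the `-hf` flag; the quant tag is just one choice:

```bash
# Download the Devstral GGUF from the Hugging Face Hub and serve it on the default port (8080)
llama-server -hf unsloth/Devstral-Small-2505-GGUF:Q4_K_M
```

The server then exposes an OpenAI-compatible API at `http://localhost:8080`, which matches the `apiBase` used in the configuration further down.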
</hfoption>

<hfoption id="lmstudio">

LM Studio is an application for Mac, Windows, and Linux that makes it easy to run open-source models locally with a graphical interface. To get started:
1. [Click here to open the model in LM Studio](lmstudio://open_from_hf?model=unsloth/Devstral-Small-2505-GGUF).
2. Once the model is downloaded, go to the "Local Server" tab and click "Start Server".
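
Once the server is running, you can sanity-check it from a terminal. This assumes LM Studio's default OpenAI-compatible endpoint at `http://localhost:1234/v1`, the same `apiBase` used in the configuration further down:

```bash
# List the models currently available from LM Studio's local server
curl http://localhost:1234/v1/models
```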
</hfoption>
<hfoption id="ollama">
To use Ollama, you can [install](https://ollama.com/download) it and download the model you want to run with the `ollama run` command.
For example, you can download and run the [Devstral-Small](https://huggingface.co/unsloth/Devstral-Small-2505-GGUF?local-app=ollama) model with:
```bash
ollama run unsloth/devstral-small-2505-gguf:Q4_K_M
```
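
If you want to confirm the download succeeded, you can list the models Ollama has available locally:

```bash
# Show locally available Ollama models, including the one just pulled
ollama list
```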
</hfoption>
</hfoptions>
<Tip>
Continue supports various local model providers. Besides Ollama, Llama.cpp, and LM Studio, you can also use other providers. For a complete list of supported providers and detailed configuration options, please refer to the [Continue documentation](https://docs.continue.dev/customize/model-providers).
</Tip>
It is important to use models that have tool calling as a built-in feature, e.g. Codestral, Qwen, and Llama 3.1.
1. Create a folder called `.continue/models` at the top level of your workspace
2. Add a file to this folder to configure your model provider, for example `local-models.yaml`.
3. Add the following configuration, depending on whether you are using Ollama, Llama.cpp, or LM Studio.
<hfoptions id="local-models">
<hfoption id="llamacpp">
This configuration is for a `llama.cpp` model served with `llama-server`. Note that the `model` field should match the model you are serving.
```yaml
name: Llama.cpp model
version: 0.0.1
schema: v1
models:
  - provider: llama.cpp
    model: unsloth/Devstral-Small-2505-GGUF
    apiBase: http://localhost:8080
    defaultCompletionOptions:
      contextLength: 8192 # Adjust based on the model
    name: Llama.cpp Devstral-Small
    roles:
      - chat
      - edit
```
</hfoption>
<hfoption id="lmstudio">
This configuration is for a model served via LM Studio. The model identifier should match what is loaded in LM Studio.
```yaml
name: LM Studio Model
version: 0.0.1
schema: v1
models:
  - provider: lmstudio
    model: unsloth/Devstral-Small-2505-GGUF
    name: LM Studio Devstral-Small
    apiBase: http://localhost:1234/v1
    roles:
      - chat
      - edit
```
</hfoption>
<hfoption id="ollama">
This configuration is for an Ollama model.
```yaml
name: Ollama Devstral model
version: 0.0.1
schema: v1
models:
  - provider: ollama
    model: unsloth/devstral-small-2505-gguf:Q4_K_M
    defaultCompletionOptions:
      contextLength: 8192
    name: Ollama Devstral-Small
    roles:
      - chat
      - edit
```
</hfoption>
</hfoptions>
Each model configuration sets a maximum context length, in this case `8192` tokens. This setup makes heavier use of that context window to perform multiple MCP requests, so it needs to be able to handle more tokens; increase `contextLength` if your model and hardware support it.