@@ -117,34 +117,9 @@ <h2 style="color: #f0f0f0;" align="center">Manage VRAM</h2>
and re-enable the graphics adapters on your motherboard. The name of the specific setting can vary, so check your
motherboard's documentation.</p>

-<h2 style="color: #f0f0f0;" align="center">Select an Appropriate Embedding Model</h2>
+<h2 style="color: #f0f0f0;" align="center">See The User Guide Embedding Models Button</h2>

-<p>Previously, the <code>instructor</code> models performed best IMHO. However, I have since corrected a mistake in
-my code and now recommend using the <code>BGE v1.5</code> models for 90% of use cases. They perform just as well and
-use less memory. Also, <code>all-mpnet-base-v2</code> is good and low-memory. Here are some resources to read:</p>
-
-<p><b>https://www.sbert.net/docs/pretrained_models.html</b></p>
-<p><b>https://instructor-embedding.github.io/</b></p>
-<p><b>https://github.com/FlagOpen/FlagEmbedding</b></p>
-<p><b>https://huggingface.co/thenlper/gte-large</b></p>
-<p><b>https://huggingface.co/jinaai/jina-embedding-l-en-v1</b></p>
-
-<h2 style="color: #f0f0f0;" align="center">Select the Appropriate Model Within LM Studio</h2>
-
-<p>My program uses the embedding model to create the database and subsequently obtain "context" from it, which is
-then forwarded to the LLM within LM Studio along with your question, for an answer.
-The embedding model (not the LLM) is responsible for the quality of the context, and it is overwhelmingly this quality
-that determines the quality of the answer you get from LM Studio. Therefore, if VRAM is short, prioritize a higher
-quality embedding model over a larger LLM. Even a 7B model quantized to 8-bit can be overkill.</p>
-
-<p>This is the smallest model that still works decently IMHO, but my current overall favorite is Mistral:</p>
-
-<p><b>https://huggingface.co/TheBloke/TinyLlama-1.1B-Chat-v0.3-GGUF</b></p>
-
-<p>There's one caveat: if the documents in your vector database are highly technical (e.g. medical or legal documents),
-a larger LLM might provide some benefit because of its increased vocabulary. Just experiment.</p>
-
-<p>Also, remember that my program only supports llama-based models that follow the llama "prompt format."</p>
+<p>See the new Embedding Models portion of the User Guide.</p>

<h2 style="color: #f0f0f0;" align="center">Select the Appropriate Transcription Model and Quantization</h2>

@@ -163,22 +138,6 @@ <h2 style="color: #f0f0f0;" align="center">Load LM Studio After Creating Databas
querying it; therefore, don't load a model into LM Studio until after creating the database.
This will reduce the chance that you run out of VRAM when creating the database.</p>

-<h2 style="color: #f0f0f0;" align="center">Ask the Right Questions</h2>
-
-<p>Modify your question if you don't get a good answer. Sometimes there's a big difference between
-"What is the statute of limitations for defamation?" and "What is the statute of limitations for a defamation
-action if the allegedly defamatory statement is in writing as opposed to verbal?" Experiment with how specific you are.</p>
-
-<p>My previous advice was not to ask multiple questions, but now that I've added an option to increase the number of
-"contexts" sent from the database to the LLM, this is less stringent. I now encourage you to ask longer-winded questions and even
-general descriptions of the types of information you're looking for (not strictly a question, you see). For reference, here
-are my prior instructions:</p>
-
-<p><i>Don't use multiple questions. For example, the results will be poor if you ask "What is the statute of limitations for a
-defamation action?" AND "Can the statute of limitations be tolled under certain circumstances?" at the same time. Instead,
-reformulate your question into something like: "What is the statute of limitations for a defamation action, and can it be tolled
-under certain circumstances?" Again, just experiment and DO NOT assume that you must use a larger LLM or embedding model.</i></p>
-

<h2 style="color: #f0f0f0;" align="center">Ensure Sufficient Context Length for the LLM</h2>

<p>Rarely, the server log within LM Studio might give you an error stating that the context is too long. Increase the maximum