Commit ace34a8

v2.6.4
1 parent aa77643 commit ace34a8

File tree

1 file changed

+50
-53
lines changed


src/User_Manual/embedding_models.html

Lines changed: 50 additions & 53 deletions
Original file line number | Diff line number | Diff line change
@@ -164,97 +164,94 @@ <h1>Embedding Models</h1>
164164

165165
<h2 id="overview">Overview</h2>
166166

167-
<p><b>To get the most out of this program, it's crucial to choose the right embedding model based on the type of task.
168-
Choosing the wrong model can lead to poor or incoherent results. Remember, the LLM's response is only as good as the
169-
context you provide via the embedding model.</b></p>
170-
171-
<p>An embedding model is loaded into memory (and then immediately unloaded when not needed) for the following tasks:
167+
<p>My program loads an embedding model into memory for:
172168
<ol>
173-
<li>Creating the vector database; and</li>
169+
<li>Creating the vector database</li>
174170
<li>Querying the vector database before your question and the "context" it obtains are sent to the LLM for an answer.</li>
175171
</ol>
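The two tasks above can be sketched roughly as follows. This is a toy illustration only, with a bag-of-words stand-in for a real embedding model (the actual program uses sentence-transformers models and a real vector database):

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector.
    A real model returns a dense float vector of fixed dimension."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b.get(w, 0) for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Task 1: create the "vector database" by embedding each chunk of text.
chunks = ["defamation requires a false statement of fact",
          "the character eats bread and soup"]
vector_db = [(chunk, embed(chunk)) for chunk in chunks]

# Task 2: embed the query, then retrieve the most similar chunk as "context"
# to send to the LLM along with the question.
query = "what are the elements of a defamation claim"
q_vec = embed(query)
best_chunk, _ = max(vector_db, key=lambda pair: cosine(q_vec, pair[1]))
print(best_chunk)  # the defamation chunk is the closer match
```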
176-
177172

173+
<p><b>To get the most out of this program, it's crucial to choose the right embedding model. Remember, the LLM's
174+
response is only as good as the context you provide via the embedding model.</b></p>
175+
178176
<h2 id="model">Choosing the Correct Model</h2>
179177

180-
<p>The first rule of embedding models is experiment. Make sure and review the "Download Model" and the characteristics
181-
of the various embedding models. It directly relates to these instructions. With that being said, there are too many
182-
different types of tasks to prescribe a one-size-fits-all . However, based my research and testing the following rules apply:</p>
178+
<p>The first rule of embedding models is to experiment. Make sure to review the list of models displayed when you click
179+
"Download Model". These instructions directly relate to the information displayed there. With that being said, there
180+
is no one-size-fits-all model. However, based on my research and testing, the following rules apply:</p>
183181

184182
<h2 id="semantic">Semantic Search</h2>
185183

186-
<p>Semantic search is a general term that refers to obtain results based off questions. This includes RAG and other tasks
187-
(this program is primarily geared towards RAG). All of the embedding models with the "Semantic search" description
188-
will perform reasonably well for question answering, in fact, they're trained for this purpose.</p>
184+
<p>Semantic search generally refers to obtaining results that are similar to your query. This includes RAG and other
185+
tasks, but this program is geared towards RAG. All of the embedding models with the "Semantic search" description
186+
will perform reasonably well for this with a few nuances.</p>
189187

190-
<p>Models with <code>msmarco</code> in their name are especially good at providing long results to short questions - e.g.</p>
188+
<p>Models with <code>msmarco</code> in their name are specifically trained to provide longer results when
189+
given a short question:</p>
191190

192-
<p><b>What does this legal treatise say about the elements of a defamation claim?</b></p>
193-
<p><b>What kind of troubles does this character encounter throughout this book?</b></p>
191+
<p><b>Tell me everything that this legal treatise says about the elements of a defamation claim.</b></p>
192+
<p><b>What are all the kinds of troubles that the main character from the book experiences?</b></p>
194193

195-
<p><code>msmarco</code> models are specifically trained for "asymmetric semantic search," which means a search where your
196-
question is short but the result you want is expected to be long and comprehensive. Other models with the "Semantic search"
197-
description are GENERALLY "symmetric semantic search" focused, which means that they tend to produce results of the same
198-
length of your question.</p>
194+
<p>This is called "asymmetric semantic search." Other models with the "Semantic search" description are GENERALLY
195+
"symmetric semantic search," which means that they tend to produce results of the same length as your question.</p>
199196

200-
<p>Models with <code>multi-qa</code> in their name have specifically been trained on question answering and are similar in
201-
function to <code>msmarco</code> models. Experiment.</p>
197+
<p>In addition, models with <code>multi-qa</code> in their name have specifically been trained on question answering
198+
and are similar in function to <code>msmarco</code> models. You'll just have to experiment with the different
199+
models as well as how you structure your query.</p>
202200

203-
<p>More recent (and much larger) models like ones beginning with <code>gtr</code> (to give one example) are arguably
204-
just as efficient in providing long answers to a short question. The most in recent history has been towards models good
205-
at everything, but just experiment!</p>
201+
<p>With that being said, more recent larger models like ones beginning with <code>gtr</code> (to give one example)
202+
are claimed to be just as effective as "specialist" models. The recent trend has been toward larger models that are good at everything.</p>
206203

207204
<h2 id="clustering">Clustering or Semantic Search</h2>
208205

209-
<p>The phrase "clustering or semantic search" is taken from the "Sentence Transformers website, the organization that created
210-
a majority of the models used in this program. What it basically means is "well rounded." They are good at question answering
211-
as well as other typical embedding models tasks, but for purposes of this program you should think of them as "well rounded."</p>
206+
<p>Models with the "clustering or semantic search" description are essentially "well rounded." They are good at
207+
question answering as well as other typical embedding tasks, and for purposes of this program you should treat them as generalists.</p>
212208

213-
<p>The <code>all-mpnet-base-v2</code> model is widely considered the best of its size in this category, but again, experiment.
214-
You might find that it performs better than a larger model geared towards "semantic search" or vice versa.</p>
209+
<p>The <code>all-mpnet-base-v2</code> model is widely considered the best of its size in this category, and some claim
210+
that it performs better than models in the "Semantic Search" category, for example. And as previously stated, the larger
211+
models might outperform it, or vice versa. Just experiment.</p>
215212

216213
<h2 id="sentence_similarity">Sentence Similarity</h2>
217214

218-
<p>There are three models with this description and they are extremely good at it. They focus on providing sentences that are
219-
as similar to your question/sentence as possible; for example:</p>
215+
<p>There are only three models with this description. They focus on providing sentences that are as similar to
216+
your question/sentence as possible; for example:</p>
220217

221218
<p><b>Quote for me all sentences that discuss the main character in this book eating food</b></p>
222219
<p><b>Provide me all sentences verbatim of a court discussing the elements of a defamation claim.</b></p>
223220

224-
<p>The search results will be a slew of chunks containing relevant sentences (as opposed to answering a question), and
225-
LM Studio should provide a succinct verbatim outline of the sentences.</p>
221+
<p>The search results should be multiple chunks containing highly relevant sentences (as opposed to an answer to a question).</p>
226222

227223
<h2 id="rounded">Well Rounded</h2>
228224

229-
<p>Models with this description provide reasonable quality, and the larger ones are arguably as good as the "specialist" models
230-
described above.</p>
225+
<p>Self-explanatory. But again, the larger models sometimes perform just as well as the specialist models.</p>
231226

232-
<p>The "customizable" description means that the model comes with a way to modify its "instructions" within my scripts to fit
233-
your specific task. Currently, the <code>instructor</code> and <code>bge</code> models have this characteristic. However, I've
234-
commented out settings to save space and, in my experience, they're good enough you don't need to change the defaults. Feel
235-
free to modify <code>gui_tabs_settings_models.py</code> to make the settings visible again.</p>
227+
<p>The "customizable" description means that you can modify an "instruction" parameter when running the model. However, I
228+
removed this setting from the Settings tab to conserve space since I never saw the need based on my searches. If you
229+
want to add it back, feel free to modify <code>gui_tabs_settings_models.py</code> to make the setting visible again.
230+
Only the <code>instructor</code> and <code>bge</code> models use this parameter.</p>
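As a rough sketch of what the "instruction" parameter does: the text is given a task-specific prefix before it is embedded. The exact prefix string and how it is passed are model-specific; the prefix below is only illustrative, in the style used by <code>bge</code>-type retrieval models, not necessarily the default in my scripts:

```python
def apply_instruction(text, instruction):
    """Prepend a task instruction so the model embeds the text with a hint
    about how it will be used. The instruction string is model-specific."""
    return f"{instruction}{text}"

# Illustrative prefix only (bge-style query instruction for retrieval):
query_instruction = "Represent this sentence for searching relevant passages: "
prepared = apply_instruction("elements of a defamation claim", query_instruction)
# 'prepared' is what would actually be handed to the embedding model.
print(prepared)
```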
236231

237232
<h2 id="dimensions">Max Sequence and Dimensions</h2>
238233

239-
<p>"Max sequence" of an embedding models refers to the maximum number of tokens (not characters) that it can process to create
240-
an embedding. "Dimensions" refers to how nuanced meanings it can discern. The higher "dimensions" means that the model can
241-
discern nuanced meanings from very similar text and provide more accurate results.</p>
234+
<p>The "max sequence" of an embedding model refers to the maximum number of tokens (not characters) that it can process at once.
235+
A model's "dimensions" refers to how nuanced a meaning it can extract from text. A higher dimension count means that
236+
the model can discern nuanced meanings from very similar text and provide more accurate results.</p>
242237

243-
<p>The "chunk size" setting within the Settings tab refers to the number of maximum characters a "chunk" that is fed to the
244-
embedding model can have. IMPORTANT, this refers to the number of characters, not tokens (like with the max sequence length).
245-
Therefore, make sure you are chunking your text in a way that falls under the threshold of an embedding model's max sequence
246-
token length. As a general rule of thumb, a token" contains four (4) characters. Therefore, if you set the chunk size to
247-
1200, for example, make sure the embedding model you're using has a max sequence length of 300+.</p>
238+
<p>The "max sequence" of a model relates to the "chunk size" setting within the Settings tab. "Chunk size" means
239+
that the program strives to chunk text to the specified number of characters (not tokens), and then
240+
these chunks are sent to the embedding model. Therefore, it's important to make sure that chunk size
241+
(in characters) doesn't exceed a model's "max sequence" (in tokens). To compare the two, a general rule of thumb
242+
is that a token consists of approximately four (4) characters. If you set the chunk size to 1200, for example,
243+
you would need to make sure the embedding model you're using has a max sequence length of 300+.</p>
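The rule of thumb above can be expressed as a quick sanity check. Note that four characters per token is only an approximation; real tokenizers vary by model:

```python
CHARS_PER_TOKEN = 4  # rough approximation; real tokenizers vary by model

def approx_tokens(chunk_size_chars):
    """Approximate number of tokens in a chunk of the given character length."""
    return chunk_size_chars // CHARS_PER_TOKEN

def chunk_fits(chunk_size_chars, max_sequence_tokens):
    """True if a chunk of this size should fit within the model's max sequence."""
    return approx_tokens(chunk_size_chars) <= max_sequence_tokens

print(approx_tokens(1200))    # 300 tokens for a 1200-character chunk
print(chunk_fits(1200, 512))  # True: 300 <= 512
print(chunk_fits(1200, 256))  # False: the model would silently truncate
```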
248244

249-
<p>If the chunks are too large, the embedding model will simply truncate them, making your search less efficient. You won't
250-
notice this happening because the program does not give an error when this occurs.</p>
245+
<p>If the chunks are too large, the embedding model will simply truncate them, but this will make your search
246+
less accurate.</p>
251247

252248
<h2 id="total_context">Total Program Context</h2>
253249

254250
<p>Finally, remember to abide by the total context (in tokens) available from the LLM within LM Studio (typically 4096).
255-
However many tokens your question consists of is added to the total tokens of the multiple pieces of "context" you
256-
receive from the vector database. If anything is left from the 4096, that is how many tokens LM Studio has to respond to
257-
your question.</p>
251+
The program prints to the command prompt the total number of tokens received from the embedding model and sent to
252+
the LLM within LM Studio. To calculate how many tokens the LLM can respond with, you would subtract the total number
253+
of tokens sent to LM Studio from 4096. For example, if 3096 tokens are sent to LM Studio, the LLM within LM
254+
Studio would have 1000 tokens to respond (approximately 4,000 characters).</p>
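The arithmetic in the example above looks like this. The 4096 figure is typical for LM Studio but depends on the context length configured for the loaded model:

```python
TOTAL_CONTEXT = 4096   # typical LM Studio context window, in tokens
CHARS_PER_TOKEN = 4    # rough approximation

def response_budget(tokens_sent, total_context=TOTAL_CONTEXT):
    """Tokens left for the LLM's answer after the question + context are sent."""
    return total_context - tokens_sent

tokens_left = response_budget(3096)
print(tokens_left)                    # 1000 tokens left to respond
print(tokens_left * CHARS_PER_TOKEN)  # roughly 4000 characters
```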
258255

259256
</main>
260257
