@@ -164,97 +164,94 @@ <h1>Embedding Models</h1>
< h2 id ="overview "> Overview</ h2 >
- <p><b>To get the most out of this program, it's crucial to choose the right embedding model based on the type of task.
- Choosing the wrong model can lead to poor or incoherent results. Remember, the LLM's response is only as good as the
- context you provide via the embedding model.</b></p>
-
- <p>An embedding model is loaded into memory (and then immediately unloaded when not needed) for the following tasks:
+ <p>My program loads an embedding model into memory for:
<ol>
- <li>Creating the vector database; and</li>
+ <li>Creating the vector database</li>
<li>Querying the vector database before your question and the "context" it obtains are sent to the LLM for an answer.</li>
</ol>
-
+ <p><b>To get the most out of this program, it's crucial to choose the right embedding model. Remember, the LLM's
+ response is only as good as the context you provide via the embedding model.</b></p>
+
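+ <p>To make those two steps concrete, here is a minimal sketch of the embed-then-query flow. This is an
+ illustration only, not the program's actual code; it assumes the sentence-transformers library, and the
+ model name and example texts are placeholders:</p>
+
+ <pre><code>
+ # Minimal sketch of embed-then-query, not the program's actual code.
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("all-mpnet-base-v2")  # example model
+
+ # Step 1: embed your document chunks (the vector database).
+ chunks = ["Defamation requires a false statement of fact...",
+           "The statute of limitations for contract claims is..."]
+ chunk_vectors = model.encode(chunks, convert_to_tensor=True)
+
+ # Step 2: embed the question and retrieve the most similar chunk;
+ # that "context" is what gets sent to the LLM.
+ question = model.encode("What are the elements of defamation?", convert_to_tensor=True)
+ scores = util.cos_sim(question, chunk_vectors)[0]
+ print(chunks[scores.argmax().item()])
+ </code></pre>
+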
< h2 id ="model "> Choosing the Correct Model</ h2 >
- <p>The first rule of embedding models is experiment. Make sure and review the "Download Model" and the characteristics
- of the various embedding models. It directly relates to these instructions. With that being said, there are too many
- different types of tasks to prescribe a one-size-fits-all. However, based my research and testing the following rules apply:</p>
+ <p>The first rule of embedding models is to experiment. Make sure to review the list of models displayed when you click
+ "Download Model". These instructions directly relate to the information displayed there. With that being said, there
+ is no one-size-fits-all model. However, based on my research and testing, the following rules apply:</p>
< h2 id ="semantic "> Semantic Search</ h2 >
- <p>Semantic search is a general term that refers to obtain results based off questions. This includes RAG and other tasks
- (this program is primarily geared towards RAG). All of the embedding models with the "Semantic search" description
- will perform reasonably well for question answering, in fact, they're trained for this purpose.</p>
+ <p>Semantic search generally refers to obtaining results that are similar to your query. This includes RAG and other
+ tasks, but this program is geared towards RAG. All of the embedding models with the "Semantic search" description
+ will perform reasonably well for this, with a few nuances.</p>
- <p>Models with <code>msmarco</code> in their name are especially good at providing long results to short questions - e.g.</p>
+ <p>Models with <code>msmarco</code> in their name are specifically trained to provide longer results when
+ given a short question:</p>
- <p><b>What does this legal treatise say about the elements of a defamation claim?</b></p>
- <p><b>What kind of troubles does this character encounter throughout this book?</b></p>
+ <p><b>Tell me everything that this legal treatise says about the elements of a defamation claim.</b></p>
+ <p><b>What are all the kinds of troubles that the main character from the book experiences?</b></p>
- <p><code>msmarco</code> models are specifically trained for "asymmetric semantic search," which means a search where your
- question is short but the result you want is expected to be long and comprehensive. Other models with the "Semantic search"
- description are GENERALLY "symmetric semantic search" focused, which means that they tend to produce results of the same
- length of your question.</p>
+ <p>This is called "asymmetric semantic search." Other models with the "Semantic search" description are GENERALLY
+ "symmetric semantic search," which means that they tend to produce results of the same length as your question.</p>
- <p>Models with <code>multi-qa</code> in their name have specifically been trained on question answering and are similar in
- function to <code>msmarco</code> models. Experiment.</p>
+ <p>In addition, models with <code>multi-qa</code> in their name have specifically been trained on question answering
+ and are similar in function to <code>msmarco</code> models. You'll just have to experiment with the different
+ models as well as how you structure your query.</p>
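+
+ <p>As a rough illustration of asymmetric search, the sketch below pairs a short query with longer passages,
+ using the sentence-transformers <code>semantic_search</code> helper. The model name is just one example
+ checkpoint, not a recommendation:</p>
+
+ <pre><code>
+ # Illustrative sketch: short question, long comprehensive passages.
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("msmarco-distilbert-base-v4")  # example checkpoint
+
+ query = "elements of a defamation claim"  # short, question-like input
+ passages = [
+     "A defamation claim generally requires a false statement of fact, "
+     "publication to a third party, fault, and damages to reputation...",
+     "An easement is a right to use another person's land for a limited purpose...",
+ ]
+ hits = util.semantic_search(model.encode(query, convert_to_tensor=True),
+                             model.encode(passages, convert_to_tensor=True),
+                             top_k=2)[0]
+ for hit in hits:  # ranked long passages for the short question
+     print(hit["score"], passages[hit["corpus_id"]])
+ </code></pre>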
- <p>More recent (and much larger) models like ones beginning with <code>gtr</code> (to give one example) are arguably
- just as efficient in providing long answers to a short question. The most in recent history has been towards models good
- at everything, but just experiment!</p>
+ <p>With that being said, more recent, larger models like ones beginning with <code>gtr</code> (to give one example)
+ claim to be just as efficient as "specialist" models. The recent trend has been toward larger models that are good at everything.</p>
< h2 id ="clustering "> Clustering or Semantic Search</ h2 >
- < p > The phrase "clustering or semantic search" is taken from the "Sentence Transformers website, the organization that created
210
- a majority of the models used in this program. What it basically means is "well rounded." They are good at question answering
211
- as well as other typical embedding models tasks, but for purposes of this program you should think of them as "well rounded."</ p >
206
+ < p > Models with the description of "clustering or semantic search" basically means "well rounded." They are good at
207
+ question answering as well as other tasks, but for purposes of this program you should think of them as "well rounded."</ p >
212
208
213
- <p>The <code>all-mpnet-base-v2</code> model is widely considered the best of its size in this category, but again, experiment.
- You might find that it performs better than a larger model geared towards "semantic search" or vice versa.</p>
+ <p>The <code>all-mpnet-base-v2</code> model is widely considered the best of its size in this category, and some claim
+ that it performs better than models in the "Semantic Search" category, for example. And as previously stated, the larger
+ models might outperform it, or vice versa. Just experiment.</p>
< h2 id ="sentence_similarity "> Sentence Similarity</ h2 >
- <p>There are three models with this description and they are extremely good at it. They focus on providing sentences that are
- as similar to your question/sentence as possible; for example:</p>
+ <p>There are only three models with this description. They focus on providing sentences that are as similar to
+ your question/sentence as possible; for example:</p>
<p><b>Quote for me all sentences that discuss the main character in this book eating food</b></p>
<p><b>Provide me all sentences verbatim of a court discussing the elements of a defamation claim.</b></p>
- <p>The search results will be a slew of chunks containing relevant sentences (as opposed to answering a question), and
- LM Studio should provide a succinct verbatim outline of the sentences.</p>
+ <p>The search results should be multiple chunks with highly relevant sentences (as opposed to answering a question).</p>
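+
+ <p>As a sketch of how this differs from top-1 question answering, a sentence-similarity search can keep every
+ sentence scoring above some cutoff. The model name and the threshold below are illustrative assumptions,
+ not the program's actual settings:</p>
+
+ <pre><code>
+ # Illustrative sketch: return all similar sentences, not a single answer.
+ from sentence_transformers import SentenceTransformer, util
+
+ model = SentenceTransformer("all-MiniLM-L6-v2")  # example model
+ request = "sentences about the main character eating food"
+ sentences = ["He ate breakfast at dawn.",
+              "The war began in spring.",
+              "She shared her bread with him."]
+
+ scores = util.cos_sim(model.encode(request, convert_to_tensor=True),
+                       model.encode(sentences, convert_to_tensor=True))[0]
+ matches = [s for s, score in zip(sentences, scores) if score.item() > 0.3]
+ print(matches)  # every sufficiently similar sentence, verbatim
+ </code></pre>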
< h2 id ="rounded "> Well Rounded</ h2 >
- <p>Models with this description provide reasonable quality, and the larger ones are arguably as good as the "specialist" models
- described above.</p>
+ <p>Self-explanatory. But again, the larger models sometimes perform just as well as the specialist models.</p>
- < p > The "customizable" description means that the model comes with a way to modify its "instructions" within my scripts to fit
233
- your specific task. Currently, the < code > instructor </ code > and < code > bge </ code > models have this characteristic . However, I've
234
- commented out settings to save space and, in my experience, they're good enough you don't need to change the defaults. Feel
235
- free to modify < code > gui_tabs_settings_models.py </ code > to make the settings visible again .</ p >
227
+ < p > The "customizable" description means that you can modify an "instruction" parameter when running the model. However, I
228
+ removed this setting from the Settings tab to conserve space since I never saw the need based off of my searched . If you
229
+ want to add it back, feel free to modify < code > gui_tabs_settings_models.py </ code > to make the setting visible again.
230
+ Only the < code > instructor </ code > and < code > bge </ code > models use this parameter .</ p >
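+
+ <p>For reference, here is roughly what that "instruction" parameter looks like with the InstructorEmbedding
+ library; the instruction string is just an example, not a setting from my scripts:</p>
+
+ <pre><code>
+ # Rough sketch of instructor-style embeddings; the instruction text is an example.
+ from InstructorEmbedding import INSTRUCTOR
+
+ model = INSTRUCTOR("hkunlp/instructor-large")
+ # Each input is paired with an instruction describing the task and domain.
+ embeddings = model.encode([["Represent the legal document for retrieval:",
+                             "The elements of defamation are..."]])
+ print(embeddings.shape)
+ </code></pre>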
< h2 id ="dimensions "> Max Sequence and Dimensions</ h2 >
- < p > "Max sequence" of an embedding models refers to the maximum number of tokens (not characters) that it can process to create
240
- an embedding. "Dimensions " refers to how nuanced meanings it can discern . The higher "dimensions" means that the model can
241
- discern nuanced meanings from very similar text and provide more accurate results.</ p >
234
+ < p > The "max sequence" of an refers to the maximum number of tokens (not characters) that it can process at once.
235
+ A model's "dimensions " refers to how nuanced a meaning it can extract from text . A higher "dimensions" means that
236
+ the model can discern nuanced meanings from very similar text and provide more accurate results.</ p >
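+
+ <p>If you're curious, both numbers can be inspected with sentence-transformers. The model name is just an
+ example, and the values in the comments are what I'd expect for it, but verify on your end:</p>
+
+ <pre><code>
+ # Sketch: inspecting a model's limits with sentence-transformers.
+ from sentence_transformers import SentenceTransformer
+
+ model = SentenceTransformer("all-mpnet-base-v2")  # example model
+ print(model.max_seq_length)                      # max tokens per chunk, e.g. 384
+ print(model.get_sentence_embedding_dimension())  # vector size, e.g. 768
+ </code></pre>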
- < p > The "chunk size" setting within the Settings tab refers to the number of maximum characters a "chunk" that is fed to the
244
- embedding model can have. IMPORTANT, this refers to the number of characters, not tokens (like with the max sequence length).
245
- Therefore, make sure you are chunking your text in a way that falls under the threshold of an embedding model's max sequence
246
- token length. As a general rule of thumb, a token" contains four (4) characters. Therefore, if you set the chunk size to
247
- 1200, for example, make sure the embedding model you're using has a max sequence length of 300+.</ p >
238
+ < p > The "max sequence" of a model relates to the "chunk size" setting within the Settings tab. "Chunk size" means
239
+ that the program strives to chunk text to the specified number of characters (not tokens), and then
240
+ these chunks are sent to the embedding model. Therefore, it's important to make sure that chunk size
241
+ (in characters) doesn't exceed a model's "max sequence" (in tokens). To analyze this, a general rule of thumb
242
+ is that a token approximately consists of four (4) characters. If you set the chunk size to 1200, for example,
243
+ you would need to make sure the embedding model you're using has a max sequence length of 300+.</ p >
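+
+ <p>The arithmetic can be sanity-checked with a few lines. The values below are just the example numbers
+ from above, and the four-characters-per-token ratio is only an approximation:</p>
+
+ <pre><code>
+ # Rule-of-thumb check: roughly 4 characters per token (an approximation).
+ chunk_size_chars = 1200   # the "chunk size" setting, in characters
+ max_seq_tokens = 384      # the model's max sequence, in tokens
+
+ approx_tokens = chunk_size_chars / 4   # 1200 / 4 = 300 tokens
+ if approx_tokens > max_seq_tokens:
+     print("Warning: chunks may be truncated by the embedding model")
+ else:
+     print("Chunks should fit")  # 300 fits within 384
+ </code></pre>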
- <p>If the chunks are too large, the embedding model will simply truncate them, making your search less efficient. You won't
- notice this happening because the program does not give an error when this occurs.</p>
+ <p>If the chunks are too large, the embedding model will simply truncate them, but this will make your search
+ less accurate.</p>
< h2 id ="total_context "> Total Program Context</ h2 >
<p>Finally, remember to abide by the total context (in tokens) available from the LLM within LM Studio (typically 4096).
- However many tokens your question consists of is added to the total tokens of the multiple pieces of "context" you
- receive from the vector database. If anything is left from the 4096, that is how many tokens LM Studio has to respond to
- your question.</p>
+ The program prints in the command prompt the total number of tokens received from the embedding model and sent to
+ the LLM within LM Studio. To calculate how many tokens the LLM can respond with, you would subtract the total number
+ of tokens sent to LM Studio from 4096. For example, if 3096 tokens are sent to LM Studio, the LLM within LM
+ Studio would have 1000 tokens to respond (approximately 4,000 characters).</p>
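+
+ <p>That calculation, as a quick sketch (the numbers are taken from the example above):</p>
+
+ <pre><code>
+ # Token budget arithmetic from the example above.
+ total_context = 4096   # LM Studio's typical context window
+ tokens_sent = 3096     # question + retrieved context, as printed by the program
+
+ tokens_for_answer = total_context - tokens_sent   # 1000 tokens
+ approx_chars = tokens_for_answer * 4              # about 4,000 characters
+ print(tokens_for_answer, approx_chars)
+ </code></pre>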
</main>