Commit 174e6d8

v2.7
1 parent 8e59f90 commit 174e6d8

2 files changed: +62 -79 lines changed


src/User_Manual/config.yaml

Lines changed: 29 additions & 40 deletions
@@ -1,41 +1,30 @@
-AVAILABLE_MODELS:
-- BAAI/bge-large-en-v1.5
-- BAAI/bge-base-en-v1.5
-- BAAI/bge-small-en-v1.5
-- thenlper/gte-large
-- thenlper/gte-base
-- thenlper/gte-small
-- hkunlp/instructor-xl
-- hkunlp/instructor-large
-- hkunlp/instructor-base
-- sentence-transformers/all-mpnet-base-v2
-- sentence-transformers/all-MiniLM-L6-v2
-- sentence-transformers/all-MiniLM-L12-v2
-- sentence-transformers/sentence-t5-xxl
-- sentence-transformers/sentence-t5-xl
-- sentence-transformers/sentence-t5-large
-- sentence-transformers/sentence-t5-base
-- sentence-transformers/gtr-t5-xxl
-- sentence-transformers/gtr-t5-xl
-- sentence-transformers/gtr-t5-large
-- sentence-transformers/gtr-t5-base
-- jinaai/jina-embedding-l-en-v1
-- jinaai/jina-embedding-b-en-v1
-- jinaai/jina-embedding-s-en-v1
-- jinaai/jina-embedding-t-en-v1
-COMPUTE_DEVICE: cuda
 Compute_Device:
   available:
-  - cuda
   - cpu
-  database_creation: cuda
+  - cuda
+  database_creation: cpu
   database_query: cpu
-EMBEDDING_MODEL_NAME: C:/PATH/Scripts/ChromaDB-Plugin-for-LM-Studio/v2_6 - working/Embedding_Models/sentence-transformers--gtr-t5-base
+  gpu_brand: NVIDIA
+EMBEDDING_MODEL_NAME:
+Platform_Info:
+  os: windows
+Supported_CTranslate2_Quantizations:
+  CPU:
+  - float32
+  - int8_float32
+  - int8
+  GPU:
+  - float32
+  - float16
+  - bfloat16
+  - int8_float32
+  - int8_float16
+  - int8_bfloat16
+  - int8
 database:
-  chunk_overlap: 150
-  chunk_size: 500
-  contexts: 10
-  device: null
+  chunk_overlap: 300
+  chunk_size: 600
+  contexts: 25
   similarity: 0.9
 embedding-models:
   bge:
@@ -49,22 +38,22 @@ server:
   model_max_tokens: -1
   model_temperature: 0.1
   prefix: '[INST]'
+  prompt_format_disabled: false
   suffix: '[/INST]'
 styles:
   button: 'background-color: #323842; color: light gray; font: 10pt "Segoe UI Historic";
     width: 29;'
   frame: 'background-color: #161b22;'
   input: 'background-color: #2e333b; color: light gray; font: 13pt "Segoe UI Historic";'
   text: 'background-color: #092327; color: light gray; font: 12pt "Segoe UI Historic";'
+test_embeddings: false
 transcribe_file:
   device: cpu
-  file: C:/PATH/Scripts/ChromaDB-Plugin-for-LM-Studio/v2_6 - working/test.mp3
-  language: Option 1
-  model: base.en
-  quant: int8
+  file:
+  model: small.en
+  quant: float32
   timestamps: true
-  translate: false
 transcriber:
-  device: cuda
-  model: base.en
+  device: cpu
+  model: small.en
   quant: float32
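
For illustration only (not part of this commit), here is a minimal Python sketch of how the revised config.yaml keys could be read with PyYAML. The file path, the device-based choice of quantization list, and the printed values are assumptions for demonstration, not the plugin's own code.

# Minimal sketch: reading the revised config.yaml with PyYAML (illustrative only).
import yaml

with open("config.yaml", "r", encoding="utf-8") as f:
    config = yaml.safe_load(f)

# v2.7 records separate devices for creating and querying the database.
creation_device = config["Compute_Device"]["database_creation"]   # e.g. "cpu"
query_device = config["Compute_Device"]["database_query"]         # e.g. "cpu"

# Pick the CTranslate2 quantizations that match the chosen device
# (an assumed convention, shown here only to exercise the new keys).
quant_key = "GPU" if creation_device == "cuda" else "CPU"
allowed_quants = config["Supported_CTranslate2_Quantizations"][quant_key]

print(creation_device, query_device, allowed_quants)
print(config["database"]["chunk_size"], config["database"]["chunk_overlap"])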

src/User_Manual/settings.html

Lines changed: 33 additions & 39 deletions
@@ -93,62 +93,56 @@ <h2>Server/LLM Settings</h2>
 <p>The <code>port</code> number in these settings must match the one you've set in LM Studio. If you update it in LM
 Studio, make sure to update it here as well.</p>
 
-<p>The <code>max-tokens</code> setting is <code>-1</code> by default, allows the LLM to provide a response that is
-unlimited in length. 99% it will cut itself off after sufficiently answering your question, however, so there's
-little risk in using <code>-1</code>. However, you can change it to experiment. Remember, any number here besides
-<code>-1</code> is in tokens (not characters).</p>
+<p>The <code>max-tokens</code> setting is <code>-1</code> by default, which allows the LLM to provide a response
+that is unlimited in length. Most of the time the LLM will cut itself off after sufficiently answering your
+question; however, on rare occasions it will repeat itself or ramble, so you can change this setting if need be.
+Remember, any number here besides <code>-1</code> is in tokens (not characters).</p>
 
 <h3>Temperature Setting</h3>
-<p>The <code>temperature</code> setting can be between 0 and 1, and it determines the creativity of the LLM's response.
+<p>The <code>temperature</code> setting can be between 0 and 1, and determines the creativity of the LLM's response.
 Zero means don't be creative.</p>
 
 <h3>Prefix and Suffix</h3>
 <p>The <code>prefix</code> and <code>suffix</code> settings are tailored for LLAMA 2-based models by default, and
-this also works with <code>Mistral</code> models. Do not change this setting unless you're 100% sure about the
-prompt format that a model needs to function efficiently. Since you just need a basic LLM to answer questions from
-context you provide, stick with basic models like Llama-2 itself of Mistral, but make sure that the model uses the
-Llama-2 prompt format.</p>
+they also work pretty well with <code>Mistral</code> models. Do not change these settings unless you know what you're
+doing. Since you just need a basic LLM to answer questions based on the context from the vector database, I
+recommend using basic models like Llama-2 itself or Mistral.</p>
+
+<p>Within LM Studio, you need to turn OFF the Automatic Prompt Formatting setting within the server tab in order
+for the program to work best. However, you can disable the prefix/suffix setting within this program by clicking
+the "disable" checkbox; just make sure to re-enable the setting in LM Studio and know what you're doing.</p>
 
 <h2>Database Settings</h2>
 <p>The <code>chunk size</code> and <code>chunk overlap</code> settings apply to Langchain's
 "RecursiveCharacterTextSplitter," which is responsible for splitting the text before it's entered into the
-vector database. In short, this program extracts text, chunks it, and then sends the chunks to the embedding
-model, which then puts it into the vector database. Feel free to experiment with different chunk sizes to see if
+vector database. These settings are in CHARACTERS, not TOKENS.</p>
+
+<p>How large the chunks are and whether there is an overlap has a direct impact on the quality of
+the results received from the vector database. Feel free to experiment with different settings to see if
 it improves the search results. However, make sure that the chunk size falls under the "token" limit of the embedding
-model you use. Different embedding models have different token limits (like different LLM's do).</p>
+model you use. Different embedding models have different token limits (like different LLM's do). </p>
 
-<p>The "chunk" size setting is in the number of characters (not tokens), and one token is approximately four characters.
-Therefore, if you set the chunk size to 1,200, for example, make sure the embedding modle you choose has a maximum
-token limit of at least 300.</p>
-
-<h3>Chunk Size</h3>
-<p>The RecursiveCharacterTextSplitter tries to create chunks of the specified size, but it adheres to certain criteria
-as to when it can split chunks. the specified chunk size as possible. However, it adheres to certain cutoff points
-such as the end of a paragraph. As such, your text might be split in the middle of two ideas/concepts that are
-related. That's where the "overlap" setting comes in.</p>
+<p>A token is approximately four (4) characters. For example, if you set the chunk size to 1,200, make sure the
+embedding model you choose has a maximum token limit of at least 300.</p>
 
-<p>The "chunk overlap" setting (also in characters, not tokens) starts the next chunk to include the specified number
-of characters of the former chunk so no meaning is lost (ideally). Feel free to experiment with this setting as well to
-improve the search results that are fed to the LLM for an answer. The most important thing to remember, however, is to
-keep the chunk size within the embedding model's token limit, and make sure to leave enough overall context for the LLM
-to provide a sufficient response.</p>
+<p>Ultimately, you must leave enough "context" (in tokens) for the LLM to provide a response. You can calculate it
+like this: <code>all chunks + your question + LLM's response</code> should fall within the LLM's token context limit
+(usually 4096). If what you send the LLM exceeds 4096 you will get an error message, and even if you don't, the
+LLM may cut itself off if it doesn't have enough context to provide a sufficient answer (no error message for this).</p>
 
-<p>You can calculate it like this: <code>all chunks + your question + LLM's response</code> should fall within the LLM's token
-context limit (usually 4096). If what you send the LLM exceeds 4096 you will get an error message, and even if you don't,
-the LLM may cut itself off if it doesn't have enough context to provide a sufficient answer (no error message for this).</p>
+<h2>Whisper Settings</h2>
 
-<p>On average, there are four characters per "token" Therefore, if you set the chunk size to 1,200 characters that equals
-approximately 300 tokens...and if you requst 12 "contexts" from the database, that equals 3,600 tokens, whihc leaves the
-LLM approximatelyk 496 tokens to provide a response. This is usually sufficient, but it might not be...just experiment.</p>
+<p>Whisper models are used throughout this program to transcribe your question for the LLM as well as transcribe an
+audio file to put it into the database. See the User Guide section on this for more details. Generally, you should
+transcribe your question using CPU and only use GPU acceleration to transcribe an audio file. If VRAM is especially
+a concern, unload the model from LM Studio and load it back after the transcription is completed. Both uses of
+Whisper models remove the model immediately after they're done being used in order to conserve VRAM.</p>
 
-<h2>Whisper Settings</h2>
+<h2>Test Embeddings</h2>
 
-<p> Whisper models are used throughout this program to transcribe your question for the LLM as well as the new feature to
-transcribe an audio file to put it into the database. See the User Guide section on this for more details. Generally,
-however, you should transcribe your question using CPU and only use GPU acceleration to transcribe an audio file. If
-VRAM is short when transcribing an audio file, unload the model from LLM Studio and load it back after the transcription
-is completed. Both utilizations of Whisper models remove the model immediately after their done being used in order to
-conserve valuable VRAM.</p>
+<p>This setting is useful for actually seeing the "contexts" provided by the vector database. Checking this box will
+obtain and display the contexts but no longer connect to LM Studio. This is useful for fine-tuning your chunk size,
+overlap, and other settings before connecting to LM Studio.</p>
 
 <h2>Break in Case of Emergency</h2>
 <p>All settings for this progrma are keps in <code>config.yaml</code>. If you accidentally change a setting you don't
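
To make the "Prefix and Suffix" discussion above concrete, here is a small Python sketch of how a prefix/suffix pair such as '[INST]' and '[/INST]' can wrap the retrieved contexts and your question for a Llama-2-style model. The build_prompt helper and the exact assembly are hypothetical illustrations; only the prefix and suffix values come from config.yaml.

# Illustrative only: wrapping retrieved contexts and a question in the
# Llama-2-style prefix/suffix stored in config.yaml.
PREFIX = "[INST]"
SUFFIX = "[/INST]"

def build_prompt(contexts, question):
    # Join the retrieved chunks, then wrap everything in the prefix/suffix pair.
    context_block = "\n\n".join(contexts)
    return f"{PREFIX} {context_block}\n\nQuestion: {question} {SUFFIX}"

print(build_prompt(["First retrieved chunk...", "Second retrieved chunk..."],
                   "What does the document say about warranty coverage?"))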

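The "Database Settings" text above describes chunking with Langchain's RecursiveCharacterTextSplitter, measured in characters. Below is a minimal sketch using the new defaults from config.yaml (chunk_size 600, chunk_overlap 300); the input file name is a placeholder, and this is not the plugin's own splitting code.

# Minimal sketch of the chunking step with the new config.yaml defaults.
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=600, chunk_overlap=300)

with open("example_document.txt", encoding="utf-8") as f:  # placeholder input file
    text = f.read()

chunks = splitter.split_text(text)

# chunk_size and chunk_overlap are counted in characters, so a 600-character
# chunk is roughly 150 tokens at about 4 characters per token.
print(len(chunks), max(len(c) for c in chunks))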

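The context-budget rule above (all chunks + your question + the LLM's response must fit within the model's context limit) can be checked with simple arithmetic. The question size below is an assumed figure for illustration; the other numbers come from the new config.yaml defaults and the ~4 characters-per-token rule of thumb.

# Back-of-the-envelope token budget for the new defaults (illustrative).
CONTEXT_LIMIT = 4096      # typical LLM context window mentioned above
chunk_size_chars = 600    # database.chunk_size
contexts = 25             # database.contexts
question_tokens = 50      # assumed size of a typical question

chunk_tokens = chunk_size_chars / 4                  # ~150 tokens per chunk
tokens_sent = contexts * chunk_tokens + question_tokens
tokens_left = CONTEXT_LIMIT - tokens_sent

print(f"~{tokens_sent:.0f} tokens sent, ~{tokens_left:.0f} left for the response")
# ~3800 sent and ~296 left: workable but tight, so reduce chunk size, overlap,
# or the number of contexts if responses are getting cut off.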
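Finally, the "Whisper Settings" section above pairs models such as small.en with CTranslate2 quantizations like float32. The sketch below uses the faster-whisper library to show what a CPU transcription with those values could look like; the library choice and the audio file name are assumptions for illustration, not something this commit specifies.

# Illustrative CPU transcription roughly matching transcriber: device cpu,
# model small.en, quant float32 (assumes the faster-whisper package).
from faster_whisper import WhisperModel

model = WhisperModel("small.en", device="cpu", compute_type="float32")
segments, info = model.transcribe("recording.mp3")  # placeholder audio file

for segment in segments:
    print(f"[{segment.start:.2f} -> {segment.end:.2f}] {segment.text}")

# Dropping the model afterwards mirrors the documented behavior of removing
# Whisper models immediately after use to conserve memory.
del model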