4 files changed, +37 −7 lines changed

@@ -25,10 +25,13 @@ AVAILABLE_MODELS:
 - jinaai/jina-embedding-t-en-v1
 - jinaai/jina-embeddings-v2-base-en
 - jinaai/jina-embeddings-v2-small-en
-COMPUTE_DEVICE: cuda
-EMBEDDING_MODEL_NAME: null
-chunk_overlap: 250
-chunk_size: 1500
+COMPUTE_DEVICE: cpu
+EMBEDDING_MODEL_NAME:
+chunk_overlap: 200
+chunk_size: 600
+database:
+  contexts: 15
+  similarity: 0.9
 embedding-models:
   bge:
     query_instruction: 'Represent this sentence for searching relevant passages:'
@@ -38,7 +41,7 @@ embedding-models:
 server:
   api_key: ''
   connection_str: http://localhost:1234/v1
-  model_max_tokens: -1
+  model_max_tokens: 512
   model_temperature: 0.1
   prefix: ' [INST]'
   suffix: ' [/INST]'
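The change of `model_max_tokens` from `-1` (no limit) to `512` caps how many tokens the LLM may generate per reply. A rough budget check like the one below illustrates why that cap matters once several retrieved chunks are attached to a question; the ~4-characters-per-token heuristic, the window size, and all names here are assumptions for illustration, not the project's actual code:

```python
def fits_context_window(question: str, contexts: list[str],
                        window_tokens: int = 4096,
                        max_new_tokens: int = 512) -> bool:
    """Rough check that prompt + reply fit the model's window,
    estimating ~4 characters per token."""
    prompt_chars = len(question) + sum(len(c) for c in contexts)
    est_prompt_tokens = prompt_chars // 4
    return est_prompt_tokens + max_new_tokens <= window_tokens

# Five 600-character chunks plus a short question fit comfortably:
print(fits_context_window("short question", ["x" * 600] * 5))  # True
```

With thirty or more such chunks the same check fails, which is exactly the "exceeds its maximum context limit" error the user guide warns about below.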
@@ -48,3 +51,7 @@ styles:
   frame: 'background-color: #161b22;'
   input: 'background-color: #2e333b; color: light gray; font: 13pt "Segoe UI Historic";'
   text: 'background-color: #092327; color: light gray; font: 12pt "Segoe UI Historic";'
+transcriber:
+  device: cpu
+  model: base.en
+  quant: float32
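The new defaults lower `chunk_size` to 600 characters and `chunk_overlap` to 200. A minimal sketch of what those two numbers mean for character-based chunking (a hypothetical helper, not the project's actual splitter):

```python
def chunk_text(text: str, chunk_size: int = 600, chunk_overlap: int = 200) -> list[str]:
    """Split text into fixed-size character chunks, where each chunk
    starts chunk_size - chunk_overlap characters after the previous one,
    so consecutive chunks share chunk_overlap characters."""
    step = chunk_size - chunk_overlap
    return [text[i:i + chunk_size]
            for i in range(0, max(len(text) - chunk_overlap, 1), step)]

chunks = chunk_text("x" * 1400)
print(len(chunks))     # 3
print(len(chunks[0]))  # 600
```

Each 600-character chunk repeats the last 200 characters of its predecessor, so a sentence cut at a chunk boundary still appears whole in the next chunk.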
@@ -156,6 +156,16 @@ <h3>Chunk Overlap</h3>
 it will automatically include, for example, the last 250 characters of the prior chunk. Feel free to experiment
 with this setting as well to get the best results!</p>

+<h2>Database Settings</h2>
+<p>The <code>Similarity</code> setting determines how similar to your question the results from the database must be in
+order to be sent to the LLM as "context." The closer the value is to <code>1</code>, the more similar a result must be,
+with a value of <code>1</code> meaning a verbatim match to your query. It's generally advised to leave this setting alone
+unless you notice that you're not getting a sufficient number of contexts.</p>
+
+<p>The <code>Contexts</code> setting is more fun to play with. Here you can control the number of chunks that will be
+forwarded to the LLM along with your question, for a response. HOWEVER, make sure to read my instructions above about how
+to ensure that the LLM does not exceed its maximum context limit; otherwise, it'll give an error.</p>
+
 <h2>Break in Case of Emergency</h2>
 <p>All of the settings are kept in a <code>config.yaml</code> file. If you accidentally change a setting you don't like or
 it's deleted or corrupted somehow, inside the "User Guide" folder I put a backup of the original file.</p>
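The two new database settings work together: results scoring below the `similarity` threshold are dropped, and at most `contexts` of the survivors are forwarded to the LLM. A minimal sketch of that filtering logic (the scoring function and all names are illustrative assumptions, not the project's actual retrieval code):

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def select_contexts(query_vec, chunks, similarity=0.9, contexts=15):
    """Keep chunks at or above the threshold, best first, capped at `contexts`."""
    scored = sorted(((cosine_similarity(query_vec, vec), text) for vec, text in chunks),
                    reverse=True)
    return [text for score, text in scored if score >= similarity][:contexts]
```

This is why lowering `similarity` yields more (but looser) contexts, while raising `contexts` only helps if enough chunks clear the threshold in the first place.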
@@ -164,10 +164,15 @@ <h2 style="color: #f0f0f0;" align="center">Ask the Right Questions</h2>
 "What is the statute of limitations for defamation?" versus "What is the statute of limitations for a defamation
 action if the allegedly defamatory statement is in writing as opposed to verbal?" Experiment with how specific you are.</p>

-<p>Don't use multiple questions. For example, the results will be poor if you ask "What is the statute of limitations for a
+<p>My previous advice was not to ask multiple questions, but now that I've added an option to increase the number of
+"contexts" sent from the database to the LLM, that advice is less stringent. I now encourage you to ask longer-winded
+questions and even general descriptions of the kinds of information you're looking for (not strictly a question, you see).
+For reference, here are my prior instructions:</p>
+
+<p><i>Don't use multiple questions. For example, the results will be poor if you ask "What is the statute of limitations for a
 defamation action?" AND "Can the statute of limitations be tolled under certain circumstances?" at the same time. Instead,
 reformulate your question into something like: "What is the statute of limitations for defamation, and can it be tolled
-under certain circumstances?" Again, just experiment and DO NOT assume that you must use a larger LLM or embedding model.</p>
+under certain circumstances?" Again, just experiment and DO NOT assume that you must use a larger LLM or embedding model.</i></p>

 <h2 style="color: #f0f0f0;" align="center">Ensure Sufficient Context Length for the LLM</h2>
@@ -103,6 +103,14 @@ <h1>Whisper Quants</h1>
 <main>

+<h2>As of Version 2.5</h2>
+
+<p>ALL transcriber settings have been moved to the GUI so they can be changed dynamically and easily. Therefore, the
+instructions below about modifying scripts to change them no longer apply. ALSO, there is no need to worry about which
+quants are available on your CPU/GPU, because the program automatically detects compatible quants and only displays
+those! I'm leaving the instructions below unchanged, however, in order to get this release out; you can still reference
+them to understand what the different settings represent.</p>
+
 <h2>Changing Model Size and Quantization</h2>

 <p>The <code>base.en</code> model in <code>float32</code> format is used by default. To use a different model size,
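The "only display compatible quants" behavior described above can be pictured as filtering a fixed dropdown list against whatever the hardware reports as supported. In practice a library query such as CTranslate2's `get_supported_compute_types` can provide the supported set; the quant list and sets below are illustrative assumptions, not the program's actual detection code:

```python
# Display order for a hypothetical transcriber quant dropdown.
ALL_QUANTS = ["float32", "float16", "bfloat16", "int8_float16", "int8"]

def compatible_quants(supported):
    """Return only the quants the device reports as supported,
    preserving the dropdown's display order."""
    return [q for q in ALL_QUANTS if q in supported]

# e.g. a CPU that supports only full precision and 8-bit:
print(compatible_quants({"float32", "int8"}))  # ['float32', 'int8']
```

A GPU reporting `float16` support would additionally surface the half-precision options, which is why the visible choices differ per machine.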