@@ -113,7 +113,7 @@ <h2 style="color: #f0f0f0;" align="left">Which Vision Models Are Available?</h2>
<p><code>llava</code> models were trailblazers in what they did, and this program uses both the 7b and 13b sizes.
<code>llava</code> models are based on the <code>llama2</code> architecture. <code>bakllava</code> is similar to
<code>llava</code> except that its architecture is based on <code>mistral</code> and it only comes in the 7b variety.
- <code>cogvlm</code> has <u>18b parameters</u> but is my personal favorite because it produces the bset results by far. Its
+ <code>cogvlm</code> has <u>18b parameters</u> but is my personal favorite because it produces the best results by far. Its
accuracy is over 90% in the statements its summaries make, I've found, whereas <code>bakllava</code> is only about 70% and
<code>llava</code> is slightly lower than that (regardless of whether you use the 7b or 13b sizes).</p>
@@ -149,7 +149,7 @@ <h2 style="color: #f0f0f0;" align="center">How do I use the Vision Model?</h2>
< p > The "loading" process takes very little time for documents but a relatively long time for images. "Loading" images involves
151
151
creating the summaries for each image using the selected vision model. Make sure and test your vision model settings within
152
- the Tools Tab before committing to processing, for example, 100 images .</ p >
152
+ the Tools Tab before committing to processing 1000 images, for example .</ p >
153
153
154
154
< p > After both documents and images are "loaded" they are added to the vectorstore just the same as prior release of this
155
155
program.</ p >
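<p>As a rough, hypothetical sketch of what "loading" images amounts to (the helper names below are illustrative, not the
program's actual functions): each image gets one vision-model call to produce a summary, and only the summary text plus a
pointer back to the image file is what ends up in the vectorstore.</p>

<pre><code>
# Hypothetical sketch of the image "loading" step -- summarize each image with
# the selected vision model, then keep the summary text for the vectorstore.
from pathlib import Path

def summarize_image(image_path: str) -> str:
    """Placeholder for a call to the selected vision model (llava, bakllava, cogvlm)."""
    raise NotImplementedError("wire this up to your chosen vision model")

def load_images(image_dir: str) -> list[dict]:
    records = []
    for image_path in sorted(Path(image_dir).glob("*.png")):
        summary = summarize_image(str(image_path))    # the slow step: one model call per image
        records.append({
            "text": summary,                          # this text is what gets embedded and searched
            "metadata": {"source": str(image_path)},  # lets results point back to the image
        })
    return records
</code></pre>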
@@ -160,7 +160,7 @@ <h2 style="color: #f0f0f0;" align="center">How do I use the Vision Model?</h2>
model settings.</p>

<p>PRO TIP: Make sure to set your chunking settings larger than the summaries that are provided by the vision model.
- Doing this prevents the summary for a particular image from EVER being split. In short, each and every chunk consist of the
+ Doing this prevents the summary for a particular image from EVER being split. In short, each and every chunk consists of the
<u>entire summary</u> provided by the vision model! This tends to be a chunk size of 400-800 depending on the vision model
settings.</p>
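<p>A minimal sketch of that tip, assuming you have the summary strings in hand (the variable names are made up for
illustration): find the longest summary and keep the chunk size above it so no summary can be split.</p>

<pre><code>
# Hypothetical check for the PRO TIP above: the chunk size must exceed the longest
# image summary so a summary is never split across chunks. Whether the unit is
# characters or tokens depends on your chunking settings.
summaries = [
    "A photo of a red barn beside a gravel road at sunset...",
    "A close-up of a circuit board with two large capacitors...",
]

longest = max(len(s) for s in summaries)   # length of the longest summary
chunk_size = max(800, longest + 50)        # stay comfortably above it

assert chunk_size > longest, "a summary would be split across chunks"
print(f"longest summary: {longest} -> chunk size: {chunk_size}")
</code></pre>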
@@ -176,7 +176,7 @@ <h2 style="color: #f0f0f0;" align="center">Can I Change What the Vision Model Do
</ol>

<p>You can go into these scripts and modify the question sent to the vision model, but make sure the prompt format remains
- the same. In future releases I will likely add the functionality to experiement with different questions within the
+ the same. In future releases I will likely add the functionality to experiment with different questions within the
graphical user interface to achieve better results.</p>
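<p>For illustration only (the template below is hypothetical and the real format string differs per vision model and per
script), the idea is to change just the question text and leave the surrounding prompt format untouched:</p>

<pre><code>
# Hypothetical example of editing the question while preserving the prompt format.
# The actual format string lives in the vision-model scripts and varies by model.
PROMPT_FORMAT = "USER: <image>\n{question}\nASSISTANT:"   # keep this structure as-is

# question the script might currently send:
question = "Describe this image in as much detail as possible."

# a modified question -- only this line should change:
question = "List every object visible in this image and describe its color."

prompt = PROMPT_FORMAT.format(question=question)
print(prompt)
</code></pre>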
</main>