This application uses OpenAI's GPT-4 and other LLMs to generate contextually relevant responses to user input. It searches a database of over 1,000 websites to provide accurate information, with special handling for disambiguating entities that share the same name. It can also be used for semantic search and context-aware question answering over any text dataset.
You can pick from the available LLMs: `GPT-4o, Claude-3.5, Llama3.2, Mixtral, Gemini2`\
To use other models, modify the model name in the LLM client class for the model provider, for example `app/src/service/openai/GeneratedTextFromGPTProvider.php:13`
```php
final class GeneratedTextFromGPTProvider extends AbstractGPTAPIClient
```
To change the text encoder, modify `app/src/loadDocuments.php:13` and `app/src/process.php:20`. \
Use one of the classes that implement `TextEncoderInterface`, or create your own class that satisfies the interface.\
Embedding size can affect text matching precision.
- Modify the system prompt. \
Edit the system prompt text in `\service\PromptResolver::getSystemPrompt()`. \
There you can add additional instructions, example solutions (one-shot/few-shot), or reasoning patterns (chain of thought).
```php
private function getSystemPrompt(): string
{
    return 'You are a helpful assistant that answers questions based on source documents.' . PHP_EOL;
}
```
- Use a different number of retrieved documents. \
Change `$limit` in `DocumentProvider::getSimilarDocuments()`.
```php
public function getSimilarDocuments(
    string $prompt,
    string $embeddingPrompt,
    bool $useReranking = false,
    int $limit = 10,
    string $distanceFunction = 'l2'
) {
```
- Use reranking. \
If too many documents are passed to the LLM, it may focus on the wrong information. If too few are passed, on the other hand, it may miss the most important sources.\
Set `Payload::$useReranking` to `true` in `app/src/process.php:25`.
- Use a different text matching algorithm. \
Change `$distanceFunction` in `DocumentProvider::getSimilarDocuments()`. \
Pick one of `l2`, `cosine`, or `innerProduct`, or add support for another one (see https://github.com/pgvector/pgvector, section "Querying").
```php
public function getSimilarDocuments(
    string $prompt,
    string $embeddingPrompt,
    bool $useReranking = false,
    int $limit = 10,
    string $distanceFunction = 'l2'
) {
```
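
As a rough illustration of how a `$distanceFunction` value could map onto a pgvector query, the sketch below translates the three names into pgvector's distance operators (`<->` for L2, `<=>` for cosine distance, `<#>` for negative inner product). The helper names `distanceOperator()` and `buildSimilarityQuery()`, and the `document` table and `embedding` column, are assumptions made for illustration and are not taken from this repository.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: maps a distance function name to a pgvector operator.
function distanceOperator(string $distanceFunction): string
{
    return match ($distanceFunction) {
        'l2' => '<->',           // Euclidean (L2) distance
        'cosine' => '<=>',       // cosine distance
        'innerProduct' => '<#>', // negative inner product
        default => throw new InvalidArgumentException(
            "Unsupported distance function: {$distanceFunction}"
        ),
    };
}

// Hypothetical query builder; the table and column names are assumed.
function buildSimilarityQuery(string $distanceFunction, int $limit = 10): string
{
    $operator = distanceOperator($distanceFunction);

    return sprintf(
        'SELECT * FROM document ORDER BY embedding %s :embedding LIMIT %d',
        $operator,
        $limit
    );
}

// Prints: SELECT * FROM document ORDER BY embedding <=> :embedding LIMIT 5
echo buildSimilarityQuery('cosine', 5), PHP_EOL;
```

Supporting another metric would then only require one more `match` arm, provided pgvector offers an operator for it.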
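
To make the system prompt option more concrete, here is a minimal sketch of a prompt that combines an extra instruction, a chain-of-thought hint, and a one-shot example. It is written as a standalone function, `buildSystemPrompt()`, rather than as the actual `PromptResolver` method, and the added instruction wording and the example question are invented for illustration.

```php
<?php

declare(strict_types=1);

// Hypothetical standalone variant of getSystemPrompt(); the wording is illustrative.
function buildSystemPrompt(): string
{
    $parts = [
        // Base instruction, as in the original prompt.
        'You are a helpful assistant that answers questions based on source documents.',
        // Additional instruction.
        'If the documents do not contain the answer, say that you do not know.',
        // Chain-of-thought hint.
        'First list the relevant documents, then reason step by step before answering.',
        // One-shot example.
        'Example question: What does example.com sell?',
        'Example answer: According to document 1, example.com sells books.',
    ];

    // One instruction per line, with a trailing newline like the original prompt.
    return implode(PHP_EOL, $parts) . PHP_EOL;
}

echo buildSystemPrompt();
```

Keeping each instruction as a separate array entry makes it easy to add or remove pieces while experimenting with prompt variants.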
## 📚 Resources
- Dataset: "Website Classification" by Hetul Mehta on [Kaggle](https://www.kaggle.com/datasets/hetulmehta/website-classification)