
Commit e349940

doc: updated /docs and added hello-world.md
1 parent daa30c9 commit e349940

File tree: 7 files changed (+104 −6 lines)


README.md

Lines changed: 2 additions & 0 deletions
@@ -12,6 +12,8 @@ Paper: https://arxiv.org/abs/2308.11189
 
 Video: https://www.youtube.com/watch?v=BekDOLm6qBI&t=10s&ab_channel=NeuroSymbolic
 
+Check out [LangDiversity Hello World](https://github.com/lab-v2/langdiversity/blob/main/docs/hello-world.md) if you're new.
+
 ## Table of Contents
 
 - [Introduction](#introduction)

docs/hello-world.md

Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
# LangDiversity Hello World 📝

Welcome! In this document, we outline how LangDiversity works and walk through a simple program that demonstrates its capabilities.

The image below is a visual representation of how LangDiversity works.

<img src="../media/LangDiversityExample.png"/>

1. In the example above, a user has prepared 3 prompts. For each prompt, they want to obtain 4 separate responses from the language model.
2. Using those 4 responses, LangDiversity performs a calculation based on the chosen measure (entropy, Gini impurity, etc.). This gives a numerical value representing the diversity of the language model's responses to each prompt (a rough sketch of this calculation follows the list).
3. Each prompt is now associated with its corresponding diversity value.
4. Then, based on the user's selection method, the prompt with the desired diversity value is returned, along with that value.
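To make step 2 concrete, here is a minimal, self-contained sketch of the kind of calculation involved. It is illustrative only, written in plain Python rather than calling LangDiversity, and it assumes the 4 responses have already been parsed down to short answers:

```python
from collections import Counter
from math import log2

def shannon_entropy(answers):
    # Shannon entropy over the distinct answers: sum of p * log2(1/p).
    # 0.0 means every response agreed; higher values mean more disagreement.
    counts = Counter(answers)
    total = len(answers)
    return sum((count / total) * log2(total / count) for count in counts.values())

print(shannon_entropy(["3", "3", "3", "3"]))  # 0.0 -> the model was fully consistent
print(shannon_entropy(["3", "5", "3", "5"]))  # 1.0 -> the model split between two answers
```

With `selection='min'`, the prompt whose responses look like the first case would be the one returned.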
## Code Implementation

### Installation

Before you start using LangDiversity, you'll need to install it. You can do so with pip:

```bash
pip install langdiversity
```

### Importing Diversity Measure

Import the diversity measure you want to use. In this example, we're using [Shannon's entropy](https://github.com/lab-v2/langdiversity/blob/main/langdiversity/measures/shannon_entropy.py).

```python
from langdiversity.measures import ShannonEntropyMeasure
diversity_measure = ShannonEntropyMeasure()
```

### Configuring the Language Model

In this step, we configure our language model. For this example, we're using [OpenAI's GPT model](https://github.com/lab-v2/langdiversity/blob/main/langdiversity/models/openai.py). We also import a parser to sanitize and format the responses generated by the language model.

Note: It's recommended to use a custom parser tailored to the specific type of questions you'll be working with. LangDiversity offers a variety of [built-in parsers](https://github.com/lab-v2/diversity_package/tree/main/langdiversity/parser/answer_extractor.py) that you can use as a reference or starting point; a rough sketch of a custom parser follows the code block below.

Since our example involves math-related questions, we opt for the `extract_math_answer` parser to handle the responses.

```python
from langdiversity.models import OpenAIModel
from langdiversity.parser import extract_math_answer
model = OpenAIModel(openai_api_key="[API KEY]", extractor=extract_math_answer)
```

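As a rough illustration of what such a custom parser could look like, here is a minimal sketch. It is illustrative only, not one of LangDiversity's built-in parsers, and it assumes the responses end with a phrase like "the answer is 3":

```python
import re

def extract_trailing_number(response: str):
    # Illustrative custom parser: return the last number found in the response,
    # e.g. "... the answer is 3" -> "3". Not part of LangDiversity itself.
    matches = re.findall(r"-?\d+(?:\.\d+)?", response)
    return matches[-1] if matches else None
```

It would be passed to the model the same way as a built-in parser, e.g. `OpenAIModel(openai_api_key="[API KEY]", extractor=extract_trailing_number)`.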
### Prompt Selection

Now, we initialize the `PromptSelection` object. This is where we specify how many responses we want from the language model for each prompt, the diversity measure to use, and the selection method.

```python
from langdiversity.utils import PromptSelection
prompt_selection = PromptSelection(model=model, num_responses=4, diversity_measure=diversity_measure, selection='min')
```

### Generate and Select Prompts

Finally, we pass a list of prompts to the `PromptSelection` object. It sends these prompts to the language model, calculates the diversity measure for each prompt's set of responses, and then selects the prompt with the minimum (or maximum) diversity measure.

The selected prompt and its corresponding diversity measure are stored in `selected_prompt` and `selected_measure`, respectively.

Note: The prompts are structured to guide the language model toward a specific type of response. This makes it easier for the parser to extract clean answers.

```python
selected_prompt, selected_measure = prompt_selection.generate([
    "At the end, say 'the answer is [put your numbers here separated by commas]'.\nQuestion: What is the speed of the current if Junior's boat can cover 12 miles downstream in the same time it takes to travel 9 miles upstream, given that his boat's speed in still water is 15 miles per hour?",
    "At the end, say 'the answer is [put your numbers here separated by commas]'.\nQuestion: What is the speed of the current if Junior's boat travels at a constant speed of 15 miles per hour in still water and he spends the same amount of time traveling 12 miles downstream as he does traveling 9 miles upstream?",
    "At the end, say 'the answer is [put your numbers here separated by commas]'.\nQuestion: Junior's boat will go 15 miles per hour in still water. If he can go 12 miles downstream in the same amount of time as it takes to go 9 miles upstream, then what is the speed of the current?",
])

print("Selected Prompt:", selected_prompt)
print("Selected Measure:", selected_measure)
```

docs/langdiversity_library.md

Lines changed: 27 additions & 3 deletions
@@ -5,7 +5,7 @@ pypi project: https://pypi.org/project/langdiversity/
 ## Install
 
 ```bash
-pip install langchain
+pip install langdiversity
 ```
 
 ## Usage
@@ -16,21 +16,45 @@ Example:
 from langdiversity.models import OpenAIModel
 from langdiversity.measures import ShannonEntropyMeasure
 from langdiversity.utils import PromptSelection
+from langdiversity.parser import extract_math_answer  # select a parser that suits your question set
 
 # Initialize the OpenAI model and diversity measure
-model = OpenAIModel(openai_api_key="YOUR_OPENAI_API_KEY")
+model = OpenAIModel(openai_api_key="[YOUR API KEY]", extractor=extract_math_answer)  # extractor is optional
 diversity_measure = ShannonEntropyMeasure()
 
 # Use the PromptSelection utility
 prompt_selection = PromptSelection(model=model, num_responses=10, diversity_measure=diversity_measure)
 
-# Selects the prompt with the configured diversity measure criteria from the LLM's 10 responses
+# Pass the question set to the LLM and select the prompt that meets the configured diversity criterion across its 10 responses
 selected_prompt, selected_measure = prompt_selection.generate(["Your list of prompts here..."])
 
 print("Selected Prompt:", selected_prompt)
 print("Selected Measure:", selected_measure)
 ```
 
+### Modules:
+
+LangDiversity offers a variety of modules for different use cases. Below are the essential modules you can either directly import or use as a foundation for creating your own custom solutions:
+
+- [Language Models](https://github.com/lab-v2/langdiversity/tree/main/langdiversity/models) (`langdiversity.models`)
+
+  - `OpenAIModel`: Interfaces with OpenAI's GPT models.
+
+- [Diversity Measures](https://github.com/lab-v2/langdiversity/tree/main/langdiversity/measures) (`langdiversity.measures`)
+
+  - `ShannonEntropyMeasure`: Implements Shannon's entropy as a diversity measure.
+  - `GiniImpurityMeasure`: Implements Gini impurity as a diversity measure.
+
+- [Utility Classes](https://github.com/lab-v2/langdiversity/tree/main/langdiversity/utils) (`langdiversity.utils`)
+
+  - `PromptSelection`: Handles the selection of prompts based on diversity measures.
+  - `DiversityCalculator`: Calculates various diversity measures for a given set of values. Supports Shannon's entropy and Gini impurity by default.
+
+- [Parsers](https://github.com/lab-v2/langdiversity/tree/main/langdiversity/parser) (`langdiversity.parser`)
+  - `extract_last_letters(response: str)`: Extracts the last letters of each word in the response.
+  - `extract_math_answer(response: str)`: Extracts numerical answers from a mathematical question in the response.
+  - `extract_multi_choice_answer(response: str)`: Extracts the selected choice (A, B, C, D, E) from a multiple-choice question in the response.
+
 ### PromptSelection Parameters:
 
 - `model`: The language model you want to use. In this example, we're using OpenAI's model.
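The `DiversityCalculator` listed above can also be used on its own. Below is a minimal usage sketch, not taken from the repo's docs: it assumes the `calculate(self, values, measures=None)` signature shown in the `langdiversity/utils/calculate_measures.py` diff further down, that the class can be constructed without arguments, and that the returned dict is keyed by measure name (e.g. `"entropy"`).

```python
from langdiversity.utils import DiversityCalculator

# Sketch only: score a set of already-parsed answers directly,
# without going through PromptSelection.
calculator = DiversityCalculator()
results = calculator.calculate(["3", "5", "3", "3"], measures=["entropy"])
print(results)  # expected shape: {"entropy": <float>}
```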

examples/prompt_selection.py

Lines changed: 1 addition & 1 deletion
@@ -12,7 +12,7 @@
 diversity_measure = ShannonEntropyMeasure()
 model = OpenAIModel(openai_api_key=openai_api_key, extractor=extract_last_letters)
 prompt_selection = PromptSelection(
-    model=model, num_responses=10, diversity_measure=diversity_measure, selection="min"
+    model=model, num_responses=4, diversity_measure=diversity_measure, selection="min"
 )
 selected_prompt, selected_diversity = prompt_selection.generate(
     [

langdiversity/utils/calculate_measures.py

Lines changed: 0 additions & 1 deletion
@@ -12,7 +12,6 @@ def calculate(self, values, measures=None):
 
         results = {}
 
-        # TODO: maybe include more? Or allow users to insert more?
         if "entropy" in measures:
             entropy_measure = ShannonEntropyMeasure()
             results["entropy"] = entropy_measure.generate(values)

media/LangDiversityExample.png

105 KB

setup.py

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@
 setup(
     name='langdiversity',
     packages=find_packages(exclude=['tests']),
-    version='0.0.3',
+    version='1.0.0',
     description='A tool to elevate your language models with insightful diversity metrics.',
     long_description=long_description,
     long_description_content_type="text/markdown",
