* Update new README with fresh sections for features + contributors.
* Add asset screenshots for README update
* Adjust feature section components to have a 'Useful for' section. Improve other sections with feedback from Aaryan.
* Attempted fix for Table of Contents hyperlinks
* Attempt fix for README TOC with github-markdown-toc repo
* Change dataset clustering image to dark-mode and edit wording for some sections.
* add darkmode ss
---------
Co-authored-by: Aaryan Divate <44125685+adivate2021@users.noreply.github.com>
README.md: 5 additions & 5 deletions
@@ -43,11 +43,11 @@ Judgeval is created and maintained by [Judgment Labs](https://judgmentlabs.ai/).
 |||
 |:---|:---:|
-| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic): **tracking inputs/outputs, latency, and cost** at every step.<br><br>Online evals can be applied to traces to measure quality on production data in real-time.<br><br>Export trace data to the Judgment Platform or your own S3 buckets, {Parquet, JSON, YAML} files, or data warehouse.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 👤 Tracking user activity <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/trace_screenshot.png" alt="Tracing visualization" width="800"/></p> |
-| <h3>📡 Monitoring</h3>Real-time performance tracking of your agents in production environments. **Track all your metrics in one place.**<br><br>Set up **Slack/email alerts** for critical metrics and receive notifications when thresholds are exceeded.<br><br> **Useful for:** <br>•📉 Identifying degradation early <br>•📈 Visualizing performance trends across versions and time | <p align="center"><img src="assets/monitoring_screenshot.png" alt="Monitoring Dashboard" width="400"/></p> |
-| <h3>📊 Datasets</h3>Export trace data or import external testcases to datasets hosted on Judgment's Platform. Move datasets to/from Parquet, S3, etc. <br><br>Run evals on datasets as unit tests or to A/B test different agent configurations. <br><br> **Useful for:**<br>• 🔄 Scaled analysis for A/B tests <br>• 🗃️ Filtered collections of agent runtime data| <p align="center"><img src="assets/datasets_preview_screenshot.png" alt="Dataset management" width="800"/></p> |
-| <h3>💡 Insights</h3>Error clustering groups agent failures to uncover patterns and speed up root cause analysis.<br><br>Trace failures to their exact source with Judgment's Osiris agent, which localizes errors to specific components for precise fixes.<br><br> **Useful for:**<br>•🤖 Investigating agent/user behavior for optimization <br>•🔮 Surfacing common inputs that lead to error| <p align="center"><img src="assets/dataset_clustering_screenshot.png" alt="Insights dashboard" width="2400"/></p> |
+| <h3>🔍 Tracing</h3>Automatic agent tracing integrated with common frameworks (LangGraph, OpenAI, Anthropic): **tracking inputs/outputs, latency, and cost** at every step.<br><br>Online evals can be applied to traces to measure quality on production data in real-time.<br><br>Export trace data to the Judgment Platform or your own S3 buckets, {Parquet, JSON, YAML} files, or data warehouse.<br><br>**Useful for:**<br>• 🐛 Debugging agent runs <br>• 👤 Tracking user activity <br>• 🔬 Pinpointing performance bottlenecks| <p align="center"><img src="assets/trace_screenshot.png" alt="Tracing visualization" width="1200"/></p> |
+| <h3>📡 Monitoring</h3>Real-time performance tracking of your agents in production environments. **Track all your metrics in one place.**<br><br>Set up **Slack/email alerts** for critical metrics and receive notifications when thresholds are exceeded.<br><br> **Useful for:** <br>•📉 Identifying degradation early <br>•📈 Visualizing performance trends across versions and time | <p align="center"><img src="assets/monitoring_screenshot.png" alt="Monitoring Dashboard" width="1200"/></p> |
+| <h3>📊 Datasets</h3>Export trace data or import external testcases to datasets hosted on Judgment's Platform. Move datasets to/from Parquet, S3, etc. <br><br>Run evals on datasets as unit tests or to A/B test different agent configurations. <br><br> **Useful for:**<br>• 🔄 Scaled analysis for A/B tests <br>• 🗃️ Filtered collections of agent runtime data| <p align="center"><img src="assets/datasets_preview_screenshot.png" alt="Dataset management" width="1200"/></p> |
+| <h3>💡 Insights</h3>Cluster on your data to reveal common use cases and failure modes.<br><br>Trace failures to their exact source with Judgment's Osiris agent, which localizes errors to specific components for precise fixes.<br><br> **Useful for:**<br>•🔮 Surfacing common inputs that lead to error<br>•🤖 Investigating agent/user behavior for optimization <br>| <p align="center"><img src="assets/dataset_clustering_screenshot_dm.png" alt="Insights dashboard" width="1200"/></p> |