Commit 672b7ad

justinsheu, alanzhang25, Mandolaro, adivate2021, and gemini-code-assist[bot] authored
Staging -> Main (#292)
* Staging and auto package release (#275)
  * add auto package release
  * add staging stuff
  * add test trigger
  * remove test trigger
  * timeout + minor error handling
  * gemini suggestions
* Trace Usage Edits (#266)
* update litellm pyproject and fix e2etest (#276)
  * update litellm pyproject
  * add e2etest fix
* Agent Names added (#270)
  * Agent Names added
  * Fixed agent names for deep tracing
  * Minor fixes
  * Dummy commit to trigger staging checks
* Fix assert test e2e test (#277)
  * fix assert test e2e test
  * Update src/judgeval/judgment_client.py
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
  ---------
  Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
* add evaluation link (#279)
  * add evaluation link
  * fix run eval
  * add e2etest for yaml
* Image file updates and social media badges (#274)
  * updated logos and socials badges
  * Update README.md
  * Delete package-lock.json
  ---------
  Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com>
* Testing updates (#269)
  * updates and s3 e2e tests
  * clear singleton before tests
  * coverage, fix http request blocker
  * test trigger
  * if always block
  * change working directory
  * fix
  * remove test trigger
  * gemini suggestions
* add new trace column (#281)
* change timeout to use variable (#282)
* add code coverage for staging e2es (#285)
* Add ToolDependency Scorer and Parameter Checking (#253)
  * Sequence to Trace Conversion
  * Add agent names with a new decorator
  * Sequence to Trace Conversion
  * trace save
  * comment out;
  * Changes to yaml and agent file
  * pydantic
  * Added tool dependency metric
  * Changed yaml
  * Add support for multiple assert test calls
  * Added parameter checking to ToolDependency metric
  * Agent Names added
  * Fixed tool dependency scorer with pydantic change
  * Added internal scorer for parameter checking
  * Support for action dependencies added
  * Changed multi agent test to better show case where param checking helps
  * Added error check for providing no tools when using param checking
  * Remove unused parameter
  ---------
  Co-authored-by: Alan <alanzhang2021@gmail.com>
  Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com>
  Co-authored-by: JCamyre <jwcamry03@gmail.com>
* modified deep tracing to chain with existing tracefuncs (#286)
* Refactor agent names (#287)
  * Refactor agent names
  * Minor fix
  * Minor Gemini fix
  * Minor fixes to e2etest and TraceSpan datatype
* Handle un-serializable types in Traces. Filter 'self' arguments. (#278)
* Example Created At (#284)
* fix logging for assert test trace (#290)
  * fix logging for assert test trace
  * fix agent
* E2E additions based on assigned judgeval files (#283)
  * classifierscorer refactor, other e2es + uts additions
  * removed e2e
  * small updates
  * fix uts
  * remove print
---------
Co-authored-by: Alan Zhang <97066812+alanzhang25@users.noreply.github.com>
Co-authored-by: Minh Pham <62208322+Mandolaro@users.noreply.github.com>
Co-authored-by: Aaryan Divate <44125685+adivate2021@users.noreply.github.com>
Co-authored-by: gemini-code-assist[bot] <176961590+gemini-code-assist[bot]@users.noreply.github.com>
Co-authored-by: shunuen0 <138656153+shunuen0@users.noreply.github.com>
Co-authored-by: Alan <alanzhang2021@gmail.com>
Co-authored-by: JCamyre <jwcamry03@gmail.com>
Co-authored-by: Joseph S Camyre <68767176+JCamyre@users.noreply.github.com>
1 parent e2e30e1 commit 672b7ad

54 files changed: +2142 -736 lines changed

.github/workflows/ci-staging.yaml

Lines changed: 103 additions & 0 deletions
@@ -0,0 +1,103 @@
+
+name: Staging CI Tests
+
+on:
+  pull_request:
+    types: [opened, synchronize, reopened]
+    branches:
+      - staging
+
+permissions: read-all
+
+jobs:
+  run-tests:
+    strategy:
+      fail-fast: false
+      matrix:
+        os: [ubuntu-latest, macos-latest]
+        python-version:
+          - "3.11"
+    name: Test
+    runs-on: ${{ matrix.os }}
+    env:
+      PYTHONPATH: "."
+      OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
+      TOGETHER_API_KEY: ${{ secrets.TOGETHER_API_KEY }}
+      JUDGMENT_DEV: true
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: ${{ matrix.python-version }}
+
+      - name: Install dependencies
+        run: |
+          pip install pipenv
+          pipenv install --dev
+
+
+      - name: Run tests
+        run: |
+          cd src
+          pipenv run pytest tests
+
+  run-e2e-tests-staging:
+    if: "!contains(github.actor, '[bot]')" # Exclude if the actor is a bot
+    name: Staging E2E Tests
+    runs-on: ubuntu-latest
+    env:
+      TEST_TIMEOUT_SECONDS: ${{ secrets.TEST_TIMEOUT_SECONDS }}
+    steps:
+      - name: Wait for turn
+        uses: softprops/turnstyle@v2
+        with:
+          poll-interval-seconds: 10
+          same-branch-only: false
+          job-to-wait-for: "Staging E2E Tests"
+
+      - name: Configure AWS Credentials
+        uses: aws-actions/configure-aws-credentials@v4
+        with:
+          aws-access-key-id: ${{ secrets.AWS_ACCESS_KEY_ID }}
+          aws-secret-access-key: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+          aws-region: us-west-1
+
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Set up Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: "3.11"
+
+      - name: Install judgeval dependencies
+        run: |
+          pip install pipenv
+          pipenv install --dev
+
+      - name: Check if server is running
+        run: |
+          if ! curl -s https://staging.api.judgmentlabs.ai/health > /dev/null; then
+            echo "Staging Judgment server is not running properly. Check logs on AWS CloudWatch for more details."
+            exit 1
+          else
+            echo "Staging server is running."
+          fi
+
+      - name: Run E2E tests
+        working-directory: src
+        run: |
+          SECRET_VARS=$(aws secretsmanager get-secret-value --secret-id gh-actions-stg-judgeval/api-keys/judgeval --query SecretString --output text)
+          export $(echo "$SECRET_VARS" | jq -r 'to_entries | .[] | "\(.key)=\(.value)"')
+          timeout ${TEST_TIMEOUT_SECONDS}s pipenv run pytest --durations=0 --cov=. --cov-config=.coveragerc --cov-report=html ./e2etests
+
+      - name: Upload coverage HTML report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-html
+          path: src/htmlcov
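
Reviewer note (illustrative, not part of the commit): the "Run E2E tests" step converts the JSON object returned by Secrets Manager into `KEY=VALUE` pairs with `jq` and exports them. The sketch below shows the same transformation in Python, assuming the secret is a flat JSON object. The function name `secret_json_to_env_lines` and the sample keys are hypothetical.

```python
import json


def secret_json_to_env_lines(secret_string: str) -> list[str]:
    """Turn a flat JSON object into KEY=VALUE lines, one per entry."""
    data = json.loads(secret_string)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object of key/value pairs")
    lines = []
    for key, value in data.items():
        # jq string interpolation inserts strings verbatim and renders
        # other values (numbers, booleans, null) as their JSON text.
        text = value if isinstance(value, str) else json.dumps(value)
        lines.append(f"{key}={text}")
    return lines


if __name__ == "__main__":
    # Hypothetical sample secret; real key names are not shown in this diff.
    sample = '{"EXAMPLE_API_KEY": "abc123", "EXAMPLE_ORG_ID": "org-1"}'
    for line in secret_json_to_env_lines(sample):
        print(line)
```

Because the workflow passes these pairs to an unquoted `export $(...)`, a secret value containing whitespace would be word-split by the shell, so this approach appears to assume values without spaces.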

.github/workflows/ci.yaml

Lines changed: 13 additions & 3 deletions
@@ -1,3 +1,4 @@
+
 name: CI Tests
 
 on:
@@ -48,6 +49,8 @@ jobs:
     if: "!contains(github.actor, '[bot]')" # Exclude if the actor is a bot
     name: E2E Tests
     runs-on: ubuntu-latest
+    env:
+      TEST_TIMEOUT_SECONDS: ${{ secrets.TEST_TIMEOUT_SECONDS }}
     steps:
       - name: Wait for turn
         uses: softprops/turnstyle@v2
@@ -78,7 +81,7 @@ jobs:
 
       - name: Check if server is running
         run: |
-          if ! curl -s http://api.judgmentlabs.ai/health > /dev/null; then
+          if ! curl -s https://api.judgmentlabs.ai/health > /dev/null; then
             echo "Production Judgment server is not running properly. Check logs on AWS CloudWatch for more details."
             exit 1
           else
@@ -88,6 +91,13 @@ jobs:
       - name: Run E2E tests
         working-directory: src
         run: |
-          SECRET_VARS=$(aws secretsmanager get-secret-value --secret-id gh-actions/api-keys/judgeval --query SecretString --output text)
+          SECRET_VARS=$(aws secretsmanager get-secret-value --secret-id gh-actions-judgeval/api-keys/judgeval --query SecretString --output text)
           export $(echo "$SECRET_VARS" | jq -r 'to_entries | .[] | "\(.key)=\(.value)"')
-          pipenv run pytest --durations=0 ./e2etests
+          timeout ${TEST_TIMEOUT_SECONDS}s pipenv run pytest --durations=0 --cov=. --cov-config=.coveragerc --cov-report=html ./e2etests
+
+      - name: Upload coverage HTML report
+        if: always()
+        uses: actions/upload-artifact@v4
+        with:
+          name: coverage-html
+          path: src/htmlcov
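
Reviewer note (illustrative, not part of the commit): the E2E run is now wrapped in `timeout ${TEST_TIMEOUT_SECONDS}s`, so a hung test suite fails the job instead of blocking the queue. GNU `timeout` exits with status 124 when the limit is reached. The sketch below mirrors that behavior in Python for local experimentation. The helper name `run_with_timeout` is hypothetical.

```python
import subprocess
import sys

# Exit status GNU `timeout` reports when the time limit is hit.
TIMEOUT_EXIT_CODE = 124


def run_with_timeout(cmd: list[str], seconds: float) -> int:
    """Run cmd and return its exit code, or 124 if it exceeds the limit."""
    try:
        completed = subprocess.run(cmd, timeout=seconds)
    except subprocess.TimeoutExpired:
        # subprocess.run kills and reaps the child before re-raising.
        return TIMEOUT_EXIT_CODE
    return completed.returncode


if __name__ == "__main__":
    # A deliberately slow child process standing in for a hung test run.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, seconds=0.5))
```

In the workflow, an unset `TEST_TIMEOUT_SECONDS` secret would expand to `timeout s ...`, which `timeout` rejects as an invalid time interval, so the secret needs to be defined for the step to run at all.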

.github/workflows/merge-to-main.yaml

Lines changed: 32 additions & 0 deletions
@@ -0,0 +1,32 @@
+name: Enforce Main Branch Protection
+
+on:
+  pull_request:
+    types: [opened, synchronize, reopened, edited]
+
+jobs:
+  validate-branch:
+    runs-on: ubuntu-latest
+    steps:
+      - name: Check branch name
+        run: |
+          # Get the base and source branch names
+          BASE_BRANCH="${{ github.base_ref }}"
+          SOURCE_BRANCH="${{ github.head_ref }}"
+
+          echo "BASE_BRANCH: $BASE_BRANCH"
+          echo "SOURCE_BRANCH: $SOURCE_BRANCH"
+
+          # Only run validation if the base branch is main
+          if [[ "$BASE_BRANCH" != "main" ]]; then
+            echo "Skipping branch validation - not targeting main branch"
+            exit 0
+          fi
+
+          # Check if the source branch is staging or starts with hotfix/
+          if [[ "$SOURCE_BRANCH" != "staging" && ! "$SOURCE_BRANCH" =~ ^hotfix/ ]]; then
+            echo "::error::Pull requests to main can only be created from 'staging' or 'hotfix/*' branches. Current branch: $SOURCE_BRANCH"
+            exit 1
+          fi
+
+          echo "Branch validation passed. Source branch: $SOURCE_BRANCH"
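
Reviewer note (illustrative, not part of the commit): the branch rule above is small enough to state as a pure function, which makes the accepted and rejected cases easy to check. The sketch below restates the shell logic in Python. The function name `is_allowed_pr` is hypothetical.

```python
import re


def is_allowed_pr(base_branch: str, source_branch: str) -> bool:
    """Return True when a pull request would pass the branch check."""
    if base_branch != "main":
        # The workflow skips validation for any other target branch.
        return True
    # Only 'staging' or branches starting with 'hotfix/' may target main.
    return source_branch == "staging" or re.match(r"^hotfix/", source_branch) is not None


if __name__ == "__main__":
    for base, source in [
        ("main", "staging"),
        ("main", "hotfix/fix-login"),
        ("main", "feature/new-scorer"),
        ("staging", "feature/new-scorer"),
    ]:
        print(f"{source} -> {base}: {is_allowed_pr(base, source)}")
```

Note that a branch named exactly `hotfix` (without the trailing slash) is rejected, matching the `^hotfix/` pattern in the workflow.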

.github/workflows/release.yaml

Lines changed: 92 additions & 0 deletions
@@ -0,0 +1,92 @@
+name: Release on Main Merge
+
+on:
+  push:
+    branches:
+      - main
+
+jobs:
+  release:
+    runs-on: ubuntu-latest
+    outputs:
+      new_version: ${{ steps.bump_tag.outputs.new_version }}
+
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+        with:
+          fetch-depth: 0
+
+      - name: Install Python
+        uses: actions/setup-python@v4
+        with:
+          python-version: 3.11
+
+      - name: Get latest version
+        id: get_version
+        run: |
+          version=$(curl -s https://pypi.org/pypi/judgeval/json | jq -r .info.version)
+          echo "latest_version=$version" >> $GITHUB_OUTPUT
+
+      - name: Bump version and create new tag
+        id: bump_tag
+        run: |
+          latest_version=${{ steps.get_version.outputs.latest_version }}
+          echo "Latest version: $latest_version"
+
+          # Extract version numbers
+          IFS='.' read -r major minor patch <<< "$latest_version"
+
+          # Bump patch version
+          patch=$((patch + 1))
+          new_version="$major.$minor.$patch"
+
+          echo "New version: $new_version"
+          echo "new_version=$new_version" >> $GITHUB_OUTPUT
+
+          git config user.name "github-actions"
+          git config user.email "github-actions@github.com"
+          git tag v$new_version
+          git push origin v$new_version
+
+      - name: Create GitHub release
+        uses: softprops/action-gh-release@v2
+        with:
+          tag_name: v${{ steps.bump_tag.outputs.new_version }}
+          generate_release_notes: true
+          body: |
+            You can find this package release on PyPI: https://pypi.org/project/judgeval/${{ steps.bump_tag.outputs.new_version }}/
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+
+      - name: Bump pyproject.toml version
+        run: |
+          python update_version.py ${{ steps.bump_tag.outputs.new_version }}
+
+      - name: Build PyPI package
+        run: |
+          python -m pip install --upgrade build
+          python -m build
+
+      - name: Create PyPI release
+        run: |
+          python -m pip install --upgrade twine
+          python -m twine upload --repository pypi -u ${{ secrets.PYPI_USERNAME }} -p ${{ secrets.PYPI_PASSWORD }} dist/*
+
+  cleanup:
+    needs: release
+    if: failure()
+    runs-on: ubuntu-latest
+    steps:
+      - name: Checkout code
+        uses: actions/checkout@v4
+
+      - name: Authenticate GitHub CLI
+        run: echo "${{ secrets.GITHUB_TOKEN }}" | gh auth login --with-token
+
+      - name: Delete tag and release
+        run: |
+          gh release delete v${{ needs.release.outputs.new_version }} --yes
+          git push --delete origin v${{ needs.release.outputs.new_version }}
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
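
Reviewer note (illustrative, not part of the commit): the "Bump version and create new tag" step splits the latest PyPI version on dots and increments the patch component. The sketch below restates that arithmetic in Python, assuming a plain `major.minor.patch` version. The function name `bump_patch` is hypothetical and is not the `update_version.py` script referenced by the workflow, whose contents are not shown in this diff.

```python
import re


def bump_patch(version: str) -> str:
    """Return version with its patch component incremented by one."""
    # Accept only three dot-separated runs of ASCII digits.
    if not re.fullmatch(r"[0-9]+\.[0-9]+\.[0-9]+", version):
        raise ValueError(f"expected 'major.minor.patch', got {version!r}")
    major, minor, patch = version.split(".")
    # Major and minor are passed through unchanged, as in the shell step.
    return f"{major}.{minor}.{int(patch) + 1}"


if __name__ == "__main__":
    print(bump_patch("0.0.37"))
```

Unlike this sketch, the shell step does not validate its input, so a pre-release or four-part version on PyPI would probably break the `$((patch + 1))` arithmetic. Rejecting such input explicitly is one possible hardening.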

Pipfile

Lines changed: 1 addition & 0 deletions
@@ -27,6 +27,7 @@ pytest-mock = "*"
 tavily-python = "*"
 chromadb = "*"
 langchain-community = "*"
+pytest-cov = "*"
 
 [requires]
 python_version = "3.11"

Pipfile.lock

Lines changed: 85 additions & 4 deletions
Generated file; diff not rendered by default.
