Commit f080a84

[Integration][Github] Add Support to Only Ingest Closed Pull Request Within a Specified Date Range (#1948)
### **User description**

# Description

What -
- Adds an optional `maxResults` config option to include closed PRs during export

Why -
- Users need visibility into recently closed PRs for tracking and analysis, but ingesting all historical closed PRs would be impractical and slow.

How -
- Changed the `state` selector on Pull Request to `states` so that it accepts both open and closed states
- Added the `maxResults` config option to include closed PRs during export
- Added batch limiting (max 100 closed PRs) to prevent performance issues
- Modified the webhook processor to update (not delete) closed PRs when the `maxResults` flag is enabled
- Pull request ingestion to the GitHub integration with time-based filtering

## Type of change

Please leave one option from the following and delete the rest:

- [ ] Bug fix (non-breaking change which fixes an issue)
- [ ] New feature (non-breaking change which adds functionality)
- [ ] New Integration (non-breaking change which adds a new integration)
- [ ] Breaking change (fix or feature that would cause existing functionality to not work as expected)
- [ ] Non-breaking change (fix of existing functionality that will not change current behavior)
- [ ] Documentation (added/updated documentation)

<h4> All tests should be run against the port production environment (using a testing org). </h4>

### Core testing checklist

- [ ] Integration able to create all default resources from scratch
- [ ] Resync finishes successfully
- [ ] Resync able to create entities
- [ ] Resync able to update entities
- [ ] Resync able to detect and delete entities
- [ ] Scheduled resync able to abort existing resync and start a new one
- [ ] Tested with at least 2 integrations from scratch
- [ ] Tested with Kafka and Polling event listeners
- [ ] Tested deletion of entities that don't pass the selector

### Integration testing checklist

- [ ] Integration able to create all default resources from scratch
- [ ] Resync able to create entities
- [ ] Resync able to update entities
- [ ] Resync able to detect and delete entities
- [ ] Resync finishes successfully
- [ ] If new resource kind is added or updated in the integration, add example raw data, mapping and expected result to the `examples` folder in the integration directory.
- [ ] If resource kind is updated, run the integration with the example data and check if the expected result is achieved
- [ ] If new resource kind is added or updated, validate that live-events for that resource are working as expected
- [ ] Docs PR link [here](#)

### Preflight checklist

- [ ] Handled rate limiting
- [ ] Handled pagination
- [ ] Implemented the code in async
- [ ] Support Multi account

## Screenshots

Include screenshots from your environment showing how the resources of the integration will look.

## API Documentation

Provide links to the API documentation used for this integration.
___

### **PR Type**

Enhancement

___

### **Description**

- Add optional closed pull request ingestion with time filtering
- Limit closed PRs to 60-day window and 100 per repository
- Update webhook processor to preserve closed PRs when enabled
- Maintain backward compatibility with existing configurations

___

### Diagram Walkthrough

```mermaid
flowchart LR
  A["Config Flag"] --> B["Fetch Open PRs"]
  A --> C["Fetch Closed PRs"]
  C --> D["Filter by 60 days"]
  D --> E["Limit to 100 PRs"]
  F["Webhook Event"] --> G["Check Config"]
  G --> H["Update vs Delete"]
```

<details> <summary><h3> File Walkthrough</h3></summary>

| Category | File | Change | Diff |
| --- | --- | --- | --- |
| Enhancement | pull_request_exporter.py | Add closed PR fetching with time filtering | [+86/-3](https://github.com/port-labs/ocean/pull/1948/files#diff-4fa157a9e3d755d837f15ff03fc0c67a7b822e04188ffd5efba92ea6a2aca7c9) |
| Enhancement | options.py | Add include_closed option to ListPullRequestOptions | [+1/-0](https://github.com/port-labs/ocean/pull/1948/files#diff-c47b92fb4fc063ecebd6e211ac6c6b269ad22b8e2464fe37ccd838cec4e55e79) |
| Enhancement | pull_request_webhook_processor.py | Update webhook to preserve closed PRs conditionally | [+5/-1](https://github.com/port-labs/ocean/pull/1948/files#diff-b0dd226e62e15b9cad433cc962bfab31cbc8074ce147708cefa51629753e4b8b) |
| Enhancement | main.py | Pass include_closed flag to exporter | [+1/-0](https://github.com/port-labs/ocean/pull/1948/files#diff-87e9377eb3998df79e7b005e93973b0b888f8431d81b49504bae4264fc148a0a) |
| Configuration changes | integration.py | Add closedPullRequests configuration field | [+5/-0](https://github.com/port-labs/ocean/pull/1948/files#diff-352618668ce736511a5de564f4e299286cc93f0445a9f0300368ba6c93452eef) |
| Tests | test_pull_request_exporter.py | Add comprehensive tests for closed PR functionality | [+214/-5](https://github.com/port-labs/ocean/pull/1948/files#diff-8d5e404f1fd0f2ee953732fc2064032bd99c768058685fb7ea669146160d3705) |
| Tests | test_pull_request_webhook_processor.py | Add tests for webhook closed PR handling | [+184/-1](https://github.com/port-labs/ocean/pull/1948/files#diff-3857aef3d467cc3605e6de42bbe2016de5acc83a5fbb297291e77d86ea7d04b0) |
| Documentation | CHANGELOG.md | Document new closed PR features | [+10/-0](https://github.com/port-labs/ocean/pull/1948/files#diff-092e2c8d25f45c5b95e709fe6ab651954f49369b5616cafe9dfbdb2ec0819147) |
| Miscellaneous | pyproject.toml | Bump version to 1.2.1-beta | [+1/-1](https://github.com/port-labs/ocean/pull/1948/files#diff-d1a301fa45068b9aa15404141d1303141fb46e768af1e5a05513f414faca0299) |

</details>

___

---------

Co-authored-by: Michael Kofi Armah <mikeyarmah@gmail.com>
1 parent 6e4e984 commit f080a84

10 files changed: +341 -67 lines changed

integrations/github/CHANGELOG.md

Lines changed: 10 additions & 0 deletions
@@ -7,6 +7,16 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 <!-- towncrier release notes start -->

+## 1.3.0-beta (2025-08-18)
+
+
+### Improvements
+
+- Added maxResults and since config options to include closed PRs during export
+- Added Batch limiting (max 100 closed PRs) to prevent performance issues
+- Modified Webhook processor to update (not delete) closed PRs when maxResults flag is enabled
+
+
 ## 1.2.11-beta (2025-08-18)


integrations/github/github/core/exporters/pull_request_exporter.py

Lines changed: 84 additions & 3 deletions
@@ -1,3 +1,5 @@
+from datetime import UTC, datetime, timedelta
+from typing import Any
 from github.helpers.utils import enrich_with_repository, extract_repo_params
 from port_ocean.core.ocean_types import ASYNC_GENERATOR_RESYNC_TYPE, RAW_ITEM
 from loguru import logger
@@ -26,15 +28,94 @@ async def get_paginated_resources[
     ](self, options: ExporterOptionsT) -> ASYNC_GENERATOR_RESYNC_TYPE:
         """Get all pull requests in the organization's repositories with pagination."""

-        repo_name, params = extract_repo_params(dict(options))
+        repo_name, extras = extract_repo_params(dict(options))
+        states = extras["states"]
+        max_results = extras["max_results"]
+        since = extras["since"]
+
+        logger.info(f"Starting pull request export for repository {repo_name}")
+
+        if "open" in states:
+            async for open_batch in self._fetch_open_pull_requests(
+                repo_name, {"state": "open"}
+            ):
+                yield open_batch

-        endpoint = (
+        if "closed" in states:
+            async for closed_batch in self._fetch_closed_pull_requests(
+                repo_name, max_results, since
+            ):
+                yield closed_batch
+
+    def _build_pull_request_paginated_endpoint(self, repo_name: str) -> str:
+        return (
             f"{self.client.base_url}/repos/{self.client.organization}/{repo_name}/pulls"
         )

+    async def _fetch_open_pull_requests(
+        self, repo_name: str, params: dict[str, Any]
+    ) -> ASYNC_GENERATOR_RESYNC_TYPE:
+        endpoint = self._build_pull_request_paginated_endpoint(repo_name)
+
         async for pull_requests in self.client.send_paginated_request(endpoint, params):
             logger.info(
-                f"Fetched batch of {len(pull_requests)} pull requests from repository {repo_name}"
+                f"Fetched batch of {len(pull_requests)} open pull requests from repository {repo_name}"
             )
             batch = [enrich_with_repository(pr, repo_name) for pr in pull_requests]
             yield batch
+
+    async def _fetch_closed_pull_requests(
+        self, repo_name: str, max_results: int, since: int
+    ) -> ASYNC_GENERATOR_RESYNC_TYPE:
+        endpoint = self._build_pull_request_paginated_endpoint(repo_name)
+        params = {
+            "state": "closed",
+            "sort": "updated",
+            "direction": "desc",
+        }
+
+        total_count = 0
+        logger.info(
+            f"[Closed PRs] Starting fetch for closed pull requests of repository {repo_name} "
+            f"with max_results={max_results}"
+        )
+
+        async for pull_requests in self.client.send_paginated_request(endpoint, params):
+            if not pull_requests:
+                logger.info(
+                    f"[Closed PRs] No more closed pull requests returned for repository {repo_name}; stopping."
+                )
+                break
+
+            remaining = max_results - total_count
+            if remaining <= 0:
+                break
+
+            # Trim batch if it would exceed max_results
+            limited_batch = pull_requests[:remaining]
+            batch_count = len(limited_batch)
+
+            logger.info(
+                f"[Closed PRs] Fetched closed pull requests batch of {batch_count} from {repo_name} "
+                f"(total so far: {total_count + batch_count}/{max_results})"
+            )
+
+            yield [
+                enrich_with_repository(pr, repo_name)
+                for pr in self._filter_prs_by_updated_at(limited_batch, since)
+            ]
+            total_count += batch_count
+
+    def _filter_prs_by_updated_at(
+        self, prs: list[dict[str, Any]], since: int
+    ) -> list[dict[str, Any]]:
+        cutoff = datetime.now(UTC) - timedelta(days=since)
+
+        return [
+            pr
+            for pr in prs
+            if datetime.strptime(pr["updated_at"], "%Y-%m-%dT%H:%M:%SZ").replace(
+                tzinfo=UTC
+            )
+            >= cutoff
+        ]
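
The closed-PR path in this diff combines two limits: a cap on how many PRs are examined (`max_results`) and a cutoff on `updated_at` (`since`, in days). The standalone sketch below mimics that interaction so it can be run without the integration. `fake_pages`, `fetch_closed`, `filter_by_updated_at` and `collect` are hypothetical names, the paginated client is replaced by an in-memory generator, `now` is passed in so the result is deterministic, and `timezone.utc` is used instead of `datetime.UTC` so it also runs on Python versions before 3.11. It is not the integration's code.

```python
import asyncio
from datetime import datetime, timedelta, timezone
from typing import Any, AsyncIterator

GITHUB_TS_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def filter_by_updated_at(
    prs: list[dict[str, Any]], since_days: int, now: datetime
) -> list[dict[str, Any]]:
    """Keep PRs whose updated_at falls within the last `since_days` days."""
    cutoff = now - timedelta(days=since_days)
    return [
        pr
        for pr in prs
        if datetime.strptime(pr["updated_at"], GITHUB_TS_FORMAT).replace(
            tzinfo=timezone.utc
        )
        >= cutoff
    ]


async def fake_pages(
    prs: list[dict[str, Any]], page_size: int
) -> AsyncIterator[list[dict[str, Any]]]:
    """Hypothetical stand-in for the client's paginated request."""
    for start in range(0, len(prs), page_size):
        yield prs[start : start + page_size]


async def fetch_closed(
    pages: AsyncIterator[list[dict[str, Any]]],
    max_results: int,
    since_days: int,
    now: datetime,
) -> AsyncIterator[list[dict[str, Any]]]:
    """Cap the number of PRs examined, then drop the stale ones."""
    total = 0
    async for page in pages:
        remaining = max_results - total
        if not page or remaining <= 0:
            break
        limited = page[:remaining]
        yield filter_by_updated_at(limited, since_days, now)
        # As in the diff, the cap counts PRs before the date filter is applied.
        total += len(limited)


async def collect(max_results: int, since_days: int) -> list[int]:
    """Run the sketch over six fake PRs and return the surviving PR numbers."""
    now = datetime(2025, 8, 18, tzinfo=timezone.utc)
    # PR n was last updated n * 20 days before `now`, newest first.
    prs = [
        {
            "number": n,
            "updated_at": (now - timedelta(days=n * 20)).strftime(GITHUB_TS_FORMAT),
        }
        for n in range(6)
    ]
    numbers: list[int] = []
    async for batch in fetch_closed(fake_pages(prs, 2), max_results, since_days, now):
        numbers.extend(pr["number"] for pr in batch)
    return numbers
```

Under these assumptions, `asyncio.run(collect(5, 60))` examines five PRs and returns the four updated within 60 days. Because the counter advances by the trimmed batch size rather than by the filtered size, a repository can yield fewer than `max_results` closed PRs even when more recent ones exist further down the listing.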

integrations/github/github/core/options.py

Lines changed: 3 additions & 1 deletion
@@ -37,7 +37,9 @@ class SinglePullRequestOptions(RepositoryIdentifier):
 class ListPullRequestOptions(RepositoryIdentifier):
     """Options for listing pull requests."""

-    state: Required[str]
+    states: Required[list[str]]
+    max_results: Required[int]
+    since: Required[int]


 class SingleIssueOptions(RepositoryIdentifier):

integrations/github/github/webhook/webhook_processors/pull_request_webhook_processor.py

Lines changed: 1 addition & 1 deletion
@@ -42,7 +42,7 @@ async def handle_event(
         logger.info(f"Processing pull request event: {action} for {repo_name}/{number}")

         config = cast(GithubPullRequestConfig, resource_config)
-        if action == "closed" and config.selector.state == "open":
+        if action == "closed" and "closed" not in config.selector.states:
             logger.info(
                 f"Pull request {repo_name}/{number} was closed and will be deleted"
             )
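
The changed condition means a `closed` webhook event now leads to deletion only when the selector does not list `"closed"` among its states. A minimal sketch of that decision follows. `decide` and the `"upsert"`/`"delete"` labels are hypothetical names used for illustration, and since the hunk shows only the deletion branch, the `"upsert"` label is an assumption taken from the commit description ("update (not delete)").

```python
from typing import Literal

Decision = Literal["upsert", "delete"]


def decide(action: str, states: list[str]) -> Decision:
    """Return what to do with the PR entity for a given webhook action."""
    # Mirrors the condition in the diff: a closed PR is removed only when
    # the selector does not ask for closed PRs.
    if action == "closed" and "closed" not in states:
        return "delete"
    return "upsert"
```

With the default selector (`states=["open"]`) this sketch gives `"delete"` for a `closed` action, which is the same outcome the old `state == "open"` check produced for the default configuration.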

integrations/github/integration.py

Lines changed: 16 additions & 3 deletions
@@ -70,9 +70,22 @@ class GithubFolderResourceConfig(ResourceConfig):


 class GithubPullRequestSelector(Selector):
-    state: Literal["open", "closed", "all"] = Field(
-        default="open",
-        description="Filter by pull request state (e.g., open, closed, all)",
+    states: list[Literal["open", "closed"]] = Field(
+        default=["open"],
+        description="Filter by pull request state (e.g., open, closed)",
+    )
+    max_results: int = Field(
+        alias="maxResults",
+        default=100,
+        ge=1,
+        le=300,
+        description="Limit the number of pull requests returned",
+    )
+    since: int = Field(
+        default=60,
+        ge=1,
+        le=90,
+        description="Only fetch pull requests created within the last N days (1-90 days)",
     )

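
To make the selector's bounds concrete, here is a standard-library stand-in that enforces the same defaults and ranges (`states` limited to open/closed, `maxResults` 1 to 300, `since` 1 to 90). `PullRequestSelectorSketch` and `from_config` are hypothetical. The real selector in this diff is a Pydantic model, so this is an illustration of the constraints only, not a replacement for it.

```python
from dataclasses import dataclass, field
from typing import Any

ALLOWED_STATES = {"open", "closed"}


@dataclass(frozen=True)
class PullRequestSelectorSketch:
    """Stdlib illustration of the selector's defaults and bounds."""

    states: list[str] = field(default_factory=lambda: ["open"])
    max_results: int = 100
    since: int = 60

    def __post_init__(self) -> None:
        unknown = set(self.states) - ALLOWED_STATES
        if unknown:
            raise ValueError(f"unsupported states: {sorted(unknown)}")
        if not 1 <= self.max_results <= 300:
            raise ValueError("max_results must be between 1 and 300")
        if not 1 <= self.since <= 90:
            raise ValueError("since must be between 1 and 90 days")

    @classmethod
    def from_config(cls, raw: dict[str, Any]) -> "PullRequestSelectorSketch":
        """Build from a mapping that uses the camelCase alias `maxResults`."""
        return cls(
            states=list(raw.get("states", ["open"])),
            max_results=raw.get("maxResults", 100),
            since=raw.get("since", 60),
        )
```

Two details of the diff are worth noting when reading this sketch: the `"all"` value is no longer accepted for state, and the `since` description speaks of PRs "created" within the window while the exporter hunk filters on `updated_at`.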

integrations/github/main.py

Lines changed: 3 additions & 1 deletion
@@ -235,7 +235,9 @@ async def resync_pull_requests(kind: str) -> ASYNC_GENERATOR_RESYNC_TYPE:
             pull_request_exporter.get_paginated_resources(
                 ListPullRequestOptions(
                     repo_name=repo["name"],
-                    state=config.selector.state,
+                    states=list(config.selector.states),
+                    max_results=config.selector.max_results,
+                    since=config.selector.since,
                 )
             )
             for repo in repos

integrations/github/pyproject.toml

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 [tool.poetry]
 name = "github-ocean"
-version = "1.2.11-beta"
+version = "1.3.0-beta"
 description = "This integration ingest data from github"
 authors = ["Chukwuemeka Nwaoma <joelchukks@gmail.com>", "Melody Anyaegbulam <melodyogonna@gmail.com>", "Michael Armah <mikeyarmah@gmail.com>"]
