
Critical Memory Leak and Infinite Retry Loop in Semantic Splitter Node #6

@garyluky

Description

The Semantic Splitter with Context node has a critical bug that causes infinite retry loops and memory exhaustion when processing large documents that trigger a PayloadTooLargeError. This leads to Node.js heap exhaustion and n8n instance crashes.


Environment

  • n8n version: 1.98.2
  • Node version: v24.2.0
  • OS: Ubuntu 24.04.2 LTS (Linux 6.8.0-62-generic x86_64)
  • Installation method: Docker

Steps to Reproduce

  1. Configure the Semantic Splitter with Context node with large documents (>50KB of content); a Code node that generates a suitable test document is sketched after these steps.
  2. Process documents that are large enough to exceed API payload limits.
  3. The node triggers a PayloadTooLargeError from the underlying API call.
  4. Observe the infinite retry loop in the n8n execution logs.
  5. Monitor memory consumption, which will grow continuously until the Node.js heap is exhausted.
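
A Code node placed in front of the splitter can generate a synthetic document above the threshold to make the reproduction deterministic. A minimal sketch, assuming a `text` field that matches whatever your document loader expects:

    // n8n Code node sketch: emit one item with >50KB of text to trigger the bug.
    const paragraph = 'Lorem ipsum dolor sit amet, consectetur adipiscing elit. ';
    const targetSize = 60 * 1024; // ~60KB, comfortably above the ~50KB threshold
    let text = '';
    while (text.length < targetSize) {
        text += paragraph;
    }
    // The `text` field name is an assumption; adjust it to your loader's input.
    return [{ json: { text } }];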

Expected Behavior

  • The node should handle the PayloadTooLargeError gracefully.
  • Failed operations should not be retried infinitely.
  • Memory usage should remain stable without leaking on retries.
  • Large documents should either be truncated with a warning, or processing should fail cleanly with a descriptive error.

Actual Behavior

  • Infinite Loop: The splitDocuments method is called 511+ times with the same document data.
  • Memory Leak: New SemanticDoublePassMergingSplitterWithContext instances are created on each retry attempt without cleaning up old ones.
  • Heap Exhaustion: Memory usage grows from its normal base to over 4GB, triggering a FATAL ERROR: Reached heap limit.
  • System Crash: The n8n process dies, and the Docker container becomes unresponsive.
  • Execution Failures: Multiple workflow executions are marked as "crashed" and "canceled" in the n8n UI.

Log Evidence

Repetitive loop:

LogWrapper: splitDocuments intercepted, docs: 14
LogWrapper: splitDocuments completed, chunks: 14
[Repeated 511+ times over 6 hours]

Memory leak source:

LogWrapper: Wrapping instance of type: SemanticDoublePassMergingSplitterWithContext
[This log appears on each retry, indicating a new instance is created]

Error trigger:

PayloadTooLargeError: request entity too large
    at readStream (/usr/local/lib/node_modules/n8n/node_modules/.pnpm/raw-body@3.0.0/node_modules/raw-body/index.js:163:17)

Heap exhaustion crash:

<--- Last few GCs --->
[7:0x7718a13d4000] 97030230 ms: Scavenge 4075.5 (4127.0) -> 4072.0 (4128.7) MB
[7:0x7718a13d4000] 97030348 ms: Scavenge 4077.0 (4128.7) -> 4073.4 (4147.7) MB

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory

Root Cause Analysis

  1. Instance Creation Loop (Memory Leak): The supplyData() method creates a new splitter instance on every call. The n8n retry mechanism invokes it repeatedly, creating 511+ instances that are never garbage collected because they remain reachable from the active (retrying) execution (a reconstruction of this pattern is sketched after this list).
  2. Payload Size Explosion (Trigger): The internal _generateContextualContent() method embeds the entire document content within API request prompts, causing large documents to exceed API payload limits (e.g., ~1MB).
  3. Missing Error Handling (Infinite Retry Enabler): There is no specific error handling for PayloadTooLargeError. The generic error catch allows n8n's built-in retry mechanism to take over without a circuit breaker or custom retry limit.
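
For clarity, the leak described in point 1 corresponds to a pattern roughly like the following. This is a hypothetical reconstruction for illustration, not the node's actual source:

    // Every supplyData() call builds a brand-new splitter; n8n's retry loop keeps
    // calling it for the same item while the failing execution stays alive, so no
    // instance ever becomes eligible for garbage collection.
    async supplyData(this: ISupplyDataFunctions, itemIndex: number): Promise<SupplyData> {
        const splitter = new SemanticDoublePassMergingSplitterWithContext(embeddings, chatModel, options);
        return { response: splitter }; // 511+ retries -> 511+ live instances
    }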

Proposed Fix

  1. Implement Instance Caching: Prevent re-creating the splitter instance on every retry.

    // Sketch only: `embeddings`, `chatModel`, and `options` are assumed to be
    // resolved from the node parameters exactly as the current implementation does.
    private static splitterCache = new Map<string, SemanticDoublePassMergingSplitterWithContext>();

    async supplyData(this: ISupplyDataFunctions, itemIndex: number): Promise<SupplyData> {
        const cacheKey = `${this.getNodeParameter('minChunkSize', itemIndex)}_${this.getNodeParameter('maxChunkSize', itemIndex)}`;

        if (!SemanticSplitterWithContext.splitterCache.has(cacheKey)) {
            const splitter = new SemanticDoublePassMergingSplitterWithContext(embeddings, chatModel, options);
            SemanticSplitterWithContext.splitterCache.set(cacheKey, splitter);
        }

        // Reuse the cached instance so n8n retries do not allocate a new splitter
        // each time (n8n's SupplyData exposes the supplied instance via `response`).
        return { response: SemanticSplitterWithContext.splitterCache.get(cacheKey) };
    }
  2. Add Payload Size Validation: Fail fast before making the API call.

    // Reject oversized documents before any call to the model API is attempted.
    private validatePayloadSize(content: string): void {
        const maxSize = 100000; // ~100KB of text (characters, not bytes)
        if (content.length > maxSize) {
            // NodeOperationError (from 'n8n-workflow') fits a local validation
            // failure better than NodeApiError, which expects an API error response.
            throw new NodeOperationError(this.getNode(), `Document too large for processing: ${content.length} characters (max: ${maxSize})`, {
                description: 'Consider splitting the document before semantic processing',
            });
        }
    }
  3. Implement a Circuit Breaker: Prevent infinite retries for the same item (a cleanup caveat for this static state follows the list).

    private static retryCount = new Map<string, number>();

    // In the main processing method (executionId can be taken from the execution
    // context, e.g. via this.getExecutionId()):
    const retryKey = `${executionId}_${itemIndex}`;
    const currentRetries = SemanticSplitterWithContext.retryCount.get(retryKey) || 0;

    if (currentRetries > 3) {
        // Give up after a bounded number of attempts and clear the counter.
        SemanticSplitterWithContext.retryCount.delete(retryKey);
        throw new NodeOperationError(this.getNode(), 'Maximum retries exceeded for semantic splitting', {
            description: 'Document may be too large or contain problematic content',
        });
    }
    // ... increment retry count on failure
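
One caveat for fixes 1 and 3 as sketched above: both static maps persist across executions, so without some form of eviction the fix trades the retry leak for a slowly growing cache. A possible cleanup hook (the method name and key scheme are my assumptions, not part of the node):

    // Hypothetical helper: drop per-execution retry counters once an execution
    // finishes so the static map from fix 3 cannot grow without bound. The
    // splitter cache from fix 1 could be bounded similarly (or made an LRU).
    private static clearExecutionState(executionId: string): void {
        for (const key of SemanticSplitterWithContext.retryCount.keys()) {
            if (key.startsWith(`${executionId}_`)) {
                SemanticSplitterWithContext.retryCount.delete(key);
            }
        }
    }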

Severity

Critical - This bug causes complete system failure, data loss (canceled executions), and requires manual intervention to recover. It affects production stability and can crash n8n instances processing even moderately-sized documents (>50KB).

Impact Assessment

  • Memory Exhaustion: Process heap usage grows to 4GB+.
  • Service Outage: The n8n instance becomes completely unresponsive, requiring a full restart.
  • Data Loss: In-progress executions are marked as "crashed" or "canceled".
  • Resource Waste: The server exhausts its resources (observed at 65% RAM + 38% swap usage).

Workaround

Until a fix is available, users can mitigate this issue by:

  1. Adding a size-validation step (for example a Code node) before the semantic splitter node; a sketch is shown after this list.
  2. Increasing n8n payload limits via environment variables: N8N_PAYLOAD_DEFAULT_MAX_SIZE=10485760 (10MB).
  3. Setting a global workflow timeout: N8N_EXECUTIONS_TIMEOUT=300 (5 minutes).
  4. Proactively monitoring memory usage and restarting containers.
  5. Pre-processing large documents by splitting them into smaller chunks before the semantic splitter node.
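
As referenced in workaround 1, the size-validation step can be a Code node in front of the splitter. A minimal sketch, assuming a `text` field and a 50,000-character limit, both of which should be adjusted per workflow:

    // n8n Code node sketch: pass small documents through unchanged and pre-split
    // oversized ones into fixed-size pieces before the semantic splitter node.
    const MAX_CHARS = 50000;
    const out = [];
    for (const item of $input.all()) {
        const text = item.json.text ?? '';
        if (text.length <= MAX_CHARS) {
            out.push(item);
            continue;
        }
        for (let i = 0; i < text.length; i += MAX_CHARS) {
            out.push({ json: { ...item.json, text: text.slice(i, i + MAX_CHARS) } });
        }
    }
    return out;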

Additional Context

This issue was identified through production crash analysis showing 511 repeated operations consuming >4GB of memory over 6 hours before heap exhaustion. The bug affects any workflow processing documents that exceed API payload limits, making it a critical stability issue for production deployments.
