n8n-nodes-cheerio-html-parser

This is a custom n8n node that uses Cheerio to parse HTML content.

Features

Parse HTML using multiple CSS selectors
Convert selected output to array or string
Remove unwanted elements (scripts, styles, navigation, etc.) before extracting content
Extract specific attributes from elements

Installation

1. Clone this repository.
2. Install dependencies:

npm install

3. Build the node:

npm run build

4. Create a global npm link for the package (run this in the repository directory):

npm link

5. In your n8n installation directory, link the package:

npm link n8n-nodes-cheerio-html-parser

Usage

1. Add the "Cheerio HTML Parser" node to your workflow.
2. Input the HTML content you want to parse.
3. Add one or more selectors, each with:
   - Name: A unique identifier for this selector's result; it becomes the key in the output's results object
   - CSS Selector: The CSS selector used to find elements (e.g., "div.content", "p.title", "#main")
   - Attribute (optional): Extract a specific attribute instead of the text content
   - Return Single Item: When enabled, only the first match is returned, as a string; when disabled, all matches are returned as an array
4. Optionally, specify elements to remove before the selectors are applied (e.g., "script, style, nav, footer").
5. Connect the node to the rest of your workflow.
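The selector fields above correspond to the keys used in the JSON configuration examples in this README ("name", "selector", "attribute", "singleItem"). As a rough sketch, a selector list could be typed and checked for the requirements listed above (a unique name and a CSS selector) as shown here. The SelectorConfig interface and the validateSelectors function are hypothetical illustrations, not taken from the node's source code.

```typescript
// Hypothetical shape of one selector entry, mirroring the JSON keys used in
// this README's configuration examples. Not taken from the node's source.
interface SelectorConfig {
  name: string;        // key used in the "results" object
  selector: string;    // CSS selector used to find elements
  attribute?: string;  // optional: read this attribute instead of the text
  singleItem: boolean; // true: first match only; false: all matches
}

// Hypothetical validator: returns a list of problems (empty if valid).
// Checks that every name is non-empty and unique, and that every entry
// has a CSS selector.
function validateSelectors(configs: SelectorConfig[]): string[] {
  const problems: string[] = [];
  const seen = new Set<string>();
  for (const cfg of configs) {
    if (cfg.name.trim() === "") {
      problems.push("selector entry with an empty name");
    } else if (seen.has(cfg.name)) {
      problems.push(`duplicate name: ${cfg.name}`);
    }
    seen.add(cfg.name);
    if (cfg.selector.trim() === "") {
      problems.push(`missing CSS selector for: ${cfg.name}`);
    }
  }
  return problems;
}

// The selectors from the Complete Example in this README
const configs: SelectorConfig[] = [
  { name: "title", selector: "h1.title", singleItem: true },
  { name: "paragraphs", selector: "div.content p", singleItem: false },
];
console.log(validateSelectors(configs)); // []
```

Two entries both named "title" would produce the single problem "duplicate name: title", because a second identical key could not coexist in the results object.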

Example

Input HTML:

<div class="content">
  <h1>Title</h1>
  <p>Some text</p>
</div>

With one selector named title, using the CSS selector .content h1 with Return Single Item enabled, the node returns:

{
  "results": {
    "title": "Title"
  },
  "totalElements": 1,
  "selectors": 1
}

Complete Example

Input HTML:

<div class="article">
  <h1 class="title">Welcome to my blog</h1>
  <div class="content">
    <p>First paragraph of content</p>
    <p>Second paragraph of content</p>
  </div>
</div>

Node Configuration:

{
  "selectors": [
    {
      "name": "title",
      "selector": "h1.title",
      "singleItem": true
    },
    {
      "name": "paragraphs",
      "selector": "div.content p",
      "singleItem": false
    }
  ]
}

Output:

{
  "results": {
    "title": "Welcome to my blog",
    "paragraphs": [
      "First paragraph of content",
      "Second paragraph of content"
    ]
  },
  "totalElements": 3,
  "selectors": 2
}
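In this output, Return Single Item determines the shape of each value in results: title comes back as a string and paragraphs as an array. A minimal sketch of that rule follows, assuming the matches have already been extracted as strings in document order. shapeMatches is a hypothetical helper, and returning null when a single-item selector matches nothing is an assumption, since this README does not document that case.

```typescript
// Hypothetical helper: shape the matched values for one selector.
//   singleItem = true  -> first match as a string (like "title")
//   singleItem = false -> every match as an array (like "paragraphs")
// Assumption: an unmatched single-item selector yields null; an unmatched
// array selector yields []. The real node may behave differently.
function shapeMatches(
  matches: string[],
  singleItem: boolean,
): string | string[] | null {
  if (singleItem) {
    return matches[0] ?? null;
  }
  return matches;
}

const paragraphs = ["First paragraph of content", "Second paragraph of content"];
console.log(shapeMatches(["Welcome to my blog"], true)); // Welcome to my blog
console.log(shapeMatches(paragraphs, false));
```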

Advanced Example with Element Removal

Input HTML:

<html>
  <head>
    <script>console.log('analytics');</script>
    <style>.hidden { display: none; }</style>
  </head>
  <body>
    <nav>Navigation Menu</nav>
    <main>
      <h1 class="title">Article Title</h1>
      <div class="content">
        <p>Main content here</p>
      </div>
    </main>
    <footer>Footer content</footer>
  </body>
</html>

Node Configuration:

{
  "removeElements": "script, style, nav, footer",
  "selectors": [
    {
      "name": "title",
      "selector": "h1.title",
      "singleItem": true
    },
    {
      "name": "content",
      "selector": "div.content p",
      "singleItem": true
    },
    {
      "name": "titleClass",
      "selector": "h1.title",
      "attribute": "class",
      "singleItem": true
    }
  ]
}

Output:

{
  "results": {
    "title": "Article Title",
    "content": "Main content here",
    "titleClass": "title"
  },
  "totalElements": 3,
  "selectors": 3
}

Note: The script, style, nav, and footer elements are removed before the selectors are applied, so they cannot interfere with content extraction.
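The titleClass selector in this example reads an attribute instead of text. The sketch below models that choice with plain data so it runs without Cheerio. MatchedElement and extractValue are hypothetical names; the text and attributes fields stand in for what Cheerio's .text() and .attr() would return for an element. Trimming whitespace and returning undefined for a missing attribute are assumptions, not documented node behavior.

```typescript
// Hypothetical model of a matched element: its text content plus its
// attributes, as plain data so the sketch is self-contained.
interface MatchedElement {
  text: string;
  attributes: Record<string, string>;
}

// Read the configured attribute if there is one; otherwise return the
// element's text with surrounding whitespace trimmed (an assumption).
// A missing attribute yields undefined.
function extractValue(el: MatchedElement, attribute?: string): string | undefined {
  if (attribute) {
    return el.attributes[attribute];
  }
  return el.text.trim();
}

// The <h1 class="title"> element from the example above
const h1: MatchedElement = { text: "Article Title", attributes: { class: "title" } };
console.log(extractValue(h1));          // Article Title
console.log(extractValue(h1, "class")); // title
```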

Node Structure

The node outputs an object with the following structure:

results: An object containing the extracted data, with keys matching the selector names
totalElements: The total number of elements found across all selectors
selectors: The number of selectors that were processed
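To make this structure concrete, here is a hypothetical TypeScript description of the output together with a sketch of how the two counters could be derived from per-selector matches. NodeOutput, SelectorMatches, and buildOutput are illustrative names, not the node's actual types. Counting every match toward totalElements, even when Return Single Item keeps only the first, is an assumption; the examples in this README do not distinguish the two cases.

```typescript
// Hypothetical TypeScript view of the node's output object.
interface NodeOutput {
  results: Record<string, string | string[] | null>;
  totalElements: number; // elements found across all selectors
  selectors: number;     // number of selectors processed
}

// Hypothetical per-selector input: the values already extracted
// (text or attribute), in document order.
interface SelectorMatches {
  name: string;
  values: string[];
  singleItem: boolean;
}

// Assemble the output. Assumption: every match counts toward
// totalElements, even if only the first one is returned.
function buildOutput(matches: SelectorMatches[]): NodeOutput {
  const results: NodeOutput["results"] = {};
  let totalElements = 0;
  for (const m of matches) {
    results[m.name] = m.singleItem ? (m.values[0] ?? null) : m.values;
    totalElements += m.values.length;
  }
  return { results, totalElements, selectors: matches.length };
}

// The matches from the Advanced Example with Element Removal
const output = buildOutput([
  { name: "title", values: ["Article Title"], singleItem: true },
  { name: "content", values: ["Main content here"], singleItem: true },
  { name: "titleClass", values: ["title"], singleItem: true },
]);
console.log(JSON.stringify(output, null, 2));
```

Fed the matches from the Advanced Example, this prints the same object shown in that example's output.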

Development

To run tests:

npm test

License

MIT

Repository: zzzarius/n8n-nodes-cheerio-html-parser
