Fast, lightweight, and comprehensive Open Graph extractor for Node.js with advanced features
Extract Open Graph tags, Twitter Cards, structured data, and 60+ meta tag types with built-in caching, validation, and bulk processing. Optimized for performance and security.
- Lightning Fast: Built-in caching with tiny-lru and optimized parsing
- Production Ready: Comprehensive error handling, validation, and security features
- Most Complete: Extracts Open Graph, Twitter Cards, JSON-LD, Schema.org, and 60+ meta tags
- Smart Analytics: Built-in validation, social scoring, and performance metrics
- Security First: HTML sanitization, URL validation, and PII protection (Node.js only)
- Developer Friendly: Full TypeScript support, modern async/await API
- 60+ Meta Tags: Open Graph, Twitter Cards, Dublin Core, App Links
- JSON-LD Extraction: Complete structured data parsing
- Schema.org Support: Microdata and RDFa extraction
- Smart Fallbacks: Intelligent content detection when tags are missing
- Smart Media: Automatic format detection and best image selection
- Rich Metadata: Video, audio, and responsive image support
- Smart Caching: Built-in memory cache with tiny-lru
- Bulk Processing: Concurrent extraction for multiple URLs
- Data Validation: Comprehensive Open Graph and Twitter Card validation
- Social Scoring: 0-100 score for social media optimization
- SEO Insights: Performance metrics and recommendations
- Performance Tracking: Detailed timing and statistics
- HTML Sanitization: XSS protection using Cheerio (Node.js only)
- PII Protection: Automatic detection and masking of sensitive data
- URL Security: Domain filtering and validation
- Content Safety: Malicious content detection
# Using yarn (recommended)
yarn add @devmehq/open-graph-extractor
# Using npm
npm install @devmehq/open-graph-extractor
import axios from 'axios';
import { extractOpenGraph } from '@devmehq/open-graph-extractor';
// Fetch HTML and extract Open Graph data
const { data: html } = await axios.get('https://example.com');
const ogData = extractOpenGraph(html);
console.log(ogData);
// {
// ogTitle: 'Example Title',
// ogDescription: 'Example Description',
// ogImage: 'https://example.com/image.jpg',
// twitterCard: 'summary_large_image',
// favicon: 'https://example.com/favicon.ico'
// // ... 60+ more fields
// }
import { extractOpenGraphAsync } from '@devmehq/open-graph-extractor';
// Extract with validation, caching, and structured data
const result = await extractOpenGraphAsync(html, {
extractStructuredData: true,
validateData: true,
generateScore: true,
cache: {
enabled: true,
ttl: 3600, // 1 hour
storage: 'memory'
},
security: {
sanitizeHtml: true,
validateUrls: true
}
});
console.log(result);
// {
// data: { /* Complete Open Graph data */ },
// structuredData: { /* JSON-LD, Schema.org, etc */ },
// confidence: 95,
// errors: [],
// warnings: [],
// metrics: { /* Performance data */ }
// }
const result = await extractOpenGraphAsync(html, {
extractStructuredData: true
});
console.log(result.structuredData);
// {
// jsonLD: [...], // All JSON-LD scripts
// schemaOrg: {...}, // Schema.org microdata
// dublinCore: {...}, // Dublin Core metadata
// microdata: {...}, // Microdata
// rdfa: {...} // RDFa data
// }
import { extractOpenGraphBulk } from '@devmehq/open-graph-extractor';
const urls = ['url1', 'url2', 'url3' /* ... */];
const results = await extractOpenGraphBulk({
urls,
concurrency: 5,
rateLimit: {
requests: 100,
window: 60000 // 1 minute
},
onProgress: (completed, total, url) => {
console.log(`Processing ${completed}/${total}: ${url}`);
}
});
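The concurrency option above caps how many URLs are processed at the same time. The following is a minimal sketch of that idea only, not the library's implementation; the helper name mapWithConcurrency and the shape of its return value are hypothetical.

```javascript
// Hypothetical helper, not part of @devmehq/open-graph-extractor.
// Runs `worker` over `items` with at most `limit` calls in flight.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item to claim

  async function runner() {
    while (next < items.length) {
      const index = next++; // claimed synchronously, before any await
      try {
        results[index] = { ok: true, value: await worker(items[index], index) };
      } catch (error) {
        results[index] = { ok: false, error }; // keep going on individual failures
      }
    }
  }

  const runnerCount = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: runnerCount }, () => runner()));
  return results; // same order as `items`
}
```

Each runner claims the next index before it awaits, so no item is processed twice and results keep the input order.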
import { validateOpenGraph, generateSocialScore } from '@devmehq/open-graph-extractor';
// Validate Open Graph data
const validation = validateOpenGraph(ogData);
console.log(validation);
// {
// valid: false,
// errors: [...],
// warnings: [...],
// score: 75,
// recommendations: [...]
// }
// Get social media score
const score = generateSocialScore(ogData);
console.log(score);
// {
// overall: 82,
// openGraph: { score: 90, ... },
// twitter: { score: 75, ... },
// recommendations: [...]
// }
const result = await extractOpenGraphAsync(html, {
security: {
sanitizeHtml: true, // XSS protection using Cheerio
detectPII: true, // PII detection
maskPII: true, // Mask sensitive data
validateUrls: true, // URL validation
allowedDomains: ['example.com'],
blockedDomains: ['malicious.com']
}
});
// With built-in memory cache (tiny-lru)
const result = await extractOpenGraphAsync(html, {
cache: {
enabled: true,
ttl: 3600, // 1 hour
storage: 'memory',
maxSize: 1000
}
});
// With custom cache (Redis example)
import Redis from 'ioredis';
const redis = new Redis();
const redisResult = await extractOpenGraphAsync(html, {
cache: {
enabled: true,
ttl: 3600,
storage: 'custom',
customStorage: {
async get(key) {
const value = await redis.get(key);
return value ? JSON.parse(value) : null;
},
async set(key, value, ttl) {
await redis.setex(key, ttl, JSON.stringify(value));
},
async delete(key) {
await redis.del(key);
},
async clear() {
await redis.flushdb(); // Caution: clears the whole Redis database, not only cache entries
},
async has(key) {
return (await redis.exists(key)) === 1;
}
}
}
});
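The customStorage object above exposes five async methods: get, set, delete, clear, and has. For tests or local development, the same shape could be backed by a Map instead of Redis. This is a sketch under that assumption; createMemoryStorage and its now parameter are hypothetical names, and the TTL is treated as seconds to match the ttl option.

```javascript
// Hypothetical Map-backed storage using the same method names as the Redis example.
function createMemoryStorage(now = () => Date.now()) {
  const entries = new Map(); // key -> { value, expiresAt }

  const isLive = (key) => {
    const entry = entries.get(key);
    if (!entry) return false;
    if (entry.expiresAt <= now()) {
      entries.delete(key); // lazily evict expired entries
      return false;
    }
    return true;
  };

  return {
    async get(key) {
      return isLive(key) ? entries.get(key).value : null;
    },
    async set(key, value, ttl) {
      // ttl is assumed to be in seconds, as in the cache options
      entries.set(key, { value, expiresAt: now() + ttl * 1000 });
    },
    async delete(key) {
      entries.delete(key);
    },
    async clear() {
      entries.clear();
    },
    async has(key) {
      return isLive(key);
    },
  };
}
```

Passing a custom now function makes expiry easy to test without waiting for real time to pass.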
const result = await extractOpenGraphAsync(html);
// Automatically detects and prioritizes best images
console.log(result.data.ogImage);
// {
// url: 'https://example.com/image.jpg',
// type: 'jpg',
// width: '1200',
// height: '630',
// alt: 'Description'
// }
// For multiple images, set allMedia: true
const allMediaResult = extractOpenGraph(html, { allMedia: true });
console.log(allMediaResult.ogImage);
// [
// { url: '...', width: '1200', height: '630', type: 'jpg' },
// { url: '...', width: '800', height: '600', type: 'png' }
// ]
Synchronous extraction - Fast and lightweight for basic use cases.
import { extractOpenGraph } from '@devmehq/open-graph-extractor';
const data = extractOpenGraph(html, {
customMetaTags: [
{ multiple: false, property: 'article:author', fieldName: 'author' }
],
allMedia: true, // Extract all images/videos
ogImageFallback: true, // Fallback to page images
onlyGetOpenGraphInfo: false // Include fallback content
});
Asynchronous extraction - Full feature set with advanced capabilities.
import { extractOpenGraphAsync } from '@devmehq/open-graph-extractor';
const result = await extractOpenGraphAsync(html, {
// Core options
extractStructuredData: true, // JSON-LD, Schema.org, Microdata
validateData: true, // Data validation
generateScore: true, // SEO/social scoring
extractArticleContent: true, // Article text extraction
detectLanguage: true, // Language detection
normalizeUrls: true, // URL normalization
// Advanced features
cache: { enabled: true, ttl: 3600 },
security: { sanitizeHtml: true, validateUrls: true }
});
Option | Type | Default | Description |
---|---|---|---|
customMetaTags | Array | [] | Custom meta tags to extract |
allMedia | boolean | false | Extract all images/videos instead of just the first |
onlyGetOpenGraphInfo | boolean | false | Skip fallback content extraction |
ogImageFallback | boolean | false | Enable image fallback from page content |
Option | Type | Default | Description |
---|---|---|---|
extractStructuredData | boolean | false | Extract JSON-LD, Schema.org, Microdata |
validateData | boolean | false | Validate extracted Open Graph data |
generateScore | boolean | false | Generate SEO/social media score (0-100) |
extractArticleContent | boolean | false | Extract main article text content |
detectLanguage | boolean | false | Detect content language and text direction |
normalizeUrls | boolean | false | Normalize and clean all URLs |
cache | ICacheOptions | undefined | Caching configuration |
security | ISecurityOptions | undefined | Security and validation settings |
Option | Type | Default | Description |
---|---|---|---|
enabled | boolean | false | Enable caching |
ttl | number | 3600 | Time-to-live in seconds |
storage | string | 'memory' | Storage type: 'memory', 'redis', 'custom' |
maxSize | number | 1000 | Maximum cache entries (memory only) |
keyGenerator | Function | - | Custom cache key generator |
customStorage | ICacheStorage | - | Custom storage implementation |
Option | Type | Default | Description |
---|---|---|---|
sanitizeHtml | boolean | false | Sanitize HTML content (XSS protection) |
detectPII | boolean | false | Detect personally identifiable information |
maskPII | boolean | false | Mask detected PII in results |
validateUrls | boolean | false | Validate and filter URLs |
maxRedirects | number | 5 | Maximum URL redirects to follow |
timeout | number | 10000 | Request timeout in milliseconds |
allowedDomains | string[] | [] | Allowed domains whitelist |
blockedDomains | string[] | [] | Blocked domains blacklist |
Basic extraction result with 60+ fields:
{
ogTitle?: string;
ogDescription?: string;
ogImage?: string | string[] | IOgImage | IOgImage[];
ogUrl?: string;
ogType?: OGType;
twitterCard?: TwitterCardType;
favicon?: string;
// ... 50+ more fields including:
// Twitter Cards, App Links, Article metadata,
// Product info, Music data, Dublin Core, etc.
}
Enhanced result with validation and metrics:
{
data: IOGResult; // Extracted Open Graph data
structuredData: { // Structured data extraction
jsonLD: any[];
schemaOrg: any;
microdata: any;
rdfa: any;
dublinCore: any;
};
errors: IError[]; // Validation errors
warnings: IWarning[]; // Validation warnings
confidence: number; // Confidence score (0-100)
confidenceLevel: 'high' | 'medium' | 'low';
fallbacksUsed: string[]; // Which fallbacks were used
metrics: IMetrics; // Performance metrics
validation?: IValidationResult; // Validation details (if enabled)
socialScore?: ISocialScore; // Social media scoring (if enabled)
}
Validates Open Graph data against specifications.
import { validateOpenGraph } from '@devmehq/open-graph-extractor';
const validation = validateOpenGraph(ogData);
console.log(validation);
// {
// valid: boolean,
// errors: IError[],
// warnings: IWarning[],
// score: number,
// recommendations: string[]
// }
Generates social media optimization score (0-100).
import { generateSocialScore } from '@devmehq/open-graph-extractor';
const score = generateSocialScore(ogData);
console.log(score);
// {
// overall: number,
// openGraph: { score, present, missing, issues },
// twitter: { score, present, missing, issues },
// schema: { score, present, missing, issues },
// seo: { score, present, missing, issues },
// recommendations: string[]
// }
Process multiple URLs concurrently with rate limiting.
import { extractOpenGraphBulk } from '@devmehq/open-graph-extractor';
const results = await extractOpenGraphBulk({
urls: ['url1', 'url2', 'url3'],
concurrency: 5, // Process 5 URLs simultaneously
rateLimit: { // Rate limiting
requests: 100, // Max 100 requests
window: 60000 // Per 60 seconds
},
continueOnError: true, // Don't stop on individual failures
onProgress: (completed, total, url) => {
console.log(`Progress: ${completed}/${total} - ${url}`);
},
onError: (url, error) => {
console.error(`Failed to process ${url}:`, error);
}
});
console.log(results.summary);
// {
// total: number,
// successful: number,
// failed: number,
// totalDuration: number,
// averageDuration: number
// }
// Extract custom meta tags
const result = extractOpenGraph(html, {
customMetaTags: [
{
multiple: false,
property: 'article:author',
fieldName: 'articleAuthor'
},
{
multiple: true,
property: 'article:tag',
fieldName: 'articleTags'
}
]
});
console.log(result.articleAuthor); // Custom field
console.log(result.articleTags); // Array of tags
- Open Graph: Complete og:* tag support with type validation
- Twitter Cards: All twitter:* tags including player and app cards
- Dublin Core: dc:* metadata extraction
- App Links: al:* tags for mobile app deep linking
- Article Metadata: Publishing dates, authors, sections, tags
- Product Info: Prices, availability, condition, retailer data
- Music Metadata: Albums, artists, songs, duration
- Place/Location: GPS coordinates and location data
// Automatically extracts all supported meta types
const data = extractOpenGraph(html);
console.log(data.ogTitle, data.twitterCard, data.articleAuthor);
When meta tags are missing, the library intelligently falls back to:
- <title> tags for ogTitle
- Meta descriptions for ogDescription
- Page images for ogImage
- Canonical URLs for ogUrl
- Page content analysis for missing data
// Fallbacks work automatically
const data = extractOpenGraph(html, { ogImageFallback: true });
// Will find images even if og:image is missing
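The fallback order described above (the explicit Open Graph tag first, then a page-level equivalent) can be illustrated for the title alone. This is a deliberately simplified sketch that runs regular expressions over the raw HTML, whereas the library parses with Cheerio; resolveTitle is a hypothetical name, and the pattern assumes the property attribute appears before content.

```javascript
// Hypothetical illustration of "og:title first, then <title>".
function resolveTitle(html) {
  // Assumes `property` comes before `content` inside the meta tag.
  const og = html.match(
    /<meta[^>]*property=["']og:title["'][^>]*content=(["'])(.*?)\1/i
  );
  if (og) return { value: og[2], source: 'og:title' };

  const title = html.match(/<title[^>]*>([\s\S]*?)<\/title>/i);
  if (title) return { value: title[1].trim(), source: 'title' };

  return { value: undefined, source: 'none' };
}
```

Returning the source alongside the value is one way to report which fallback was used, similar in spirit to the fallbacksUsed field of the async result.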
- JSON-LD: Parses all <script type="application/ld+json"> blocks
- Schema.org: Extracts microdata with itemscope/itemprop
- RDFa: Resource Description Framework attributes
- Microdata: HTML5 microdata extraction
const result = await extractOpenGraphAsync(html, {
extractStructuredData: true
});
console.log(result.structuredData);
// {
// jsonLD: [{ "@type": "Article", "headline": "..." }],
// schemaOrg: { "Product": { "name": "...", "price": "..." }},
// microdata: { "Review": { "rating": "5" }},
// rdfa: { "Person": { "name": "John Doe" }}
// }
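To make the JSON-LD part concrete, here is a rough sketch of collecting every application/ld+json script from an HTML string. It is an illustration under simplifying assumptions (regex matching rather than a real HTML parser), not the library's code, and extractJsonLd is a hypothetical name.

```javascript
// Hypothetical helper: collects all JSON-LD objects found in an HTML string.
function extractJsonLd(html) {
  const pattern =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(pattern)) {
    try {
      const parsed = JSON.parse(match[1]);
      // A single script may hold one object or an array of objects.
      blocks.push(...(Array.isArray(parsed) ? parsed : [parsed]));
    } catch {
      // Skip blocks that are not valid JSON instead of failing the whole page.
    }
  }
  return blocks;
}
```

Skipping invalid blocks keeps one malformed script from hiding the valid structured data on the same page.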
- Article Extraction: Finds and extracts main article content
- Reading Time: Calculates estimated reading time
- Word Count: Counts words in extracted content
- Language Detection: Auto-detects content language and text direction
const result = await extractOpenGraphAsync(html, {
extractArticleContent: true,
detectLanguage: true
});
console.log(result.data.articleContent); // Main article text
console.log(result.data.readingTime); // 5 (minutes)
console.log(result.data.language); // "en-US"
console.log(result.data.textDirection); // "ltr"
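Word count and reading time are simple to reason about once the article text is available. The sketch below assumes a reading speed of 200 words per minute; the rate the library actually uses is not stated here, and estimateReadingTime is a hypothetical name.

```javascript
// Hypothetical helper: word count and estimated reading time in whole minutes.
function estimateReadingTime(text, wordsPerMinute = 200) {
  const words = text.trim().split(/\s+/).filter(Boolean);
  const wordCount = words.length;
  // Round up so that any non-empty text takes at least one minute.
  const minutes =
    wordCount === 0 ? 0 : Math.max(1, Math.ceil(wordCount / wordsPerMinute));
  return { wordCount, minutes };
}
```

Under that assumed rate, a 1,000-word article comes out at 5 minutes, in line with the readingTime value shown above.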
- Open Graph Validation: Checks required fields and formats
- Twitter Card Validation: Ensures proper card types and content
- URL Validation: Verifies image and video URLs
- Content Validation: Checks for reasonable field lengths
const result = await extractOpenGraphAsync(html, {
validateData: true
});
if (!result.validation.valid) {
console.log("Issues found:");
result.validation.errors.forEach(error => {
console.log(`- ${error.field}: ${error.message}`);
});
console.log("Recommendations:");
result.validation.recommendations.forEach(rec => {
console.log(`- ${rec}`);
});
}
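The "required fields" check can be pictured as follows. The Open Graph protocol lists og:title, og:type, og:image, and og:url as required, which map to the ogTitle, ogType, ogImage, and ogUrl result fields. The function below is a sketch of that single rule, not the library's validator; checkRequiredFields is a hypothetical name, and the error shape mirrors the field/message pairs printed above.

```javascript
// The four properties the Open Graph protocol marks as required.
const REQUIRED_FIELDS = ['ogTitle', 'ogType', 'ogImage', 'ogUrl'];

// Hypothetical helper: reports which required fields are missing or empty.
function checkRequiredFields(ogData) {
  const errors = REQUIRED_FIELDS
    .filter((field) => {
      const value = ogData[field];
      return (
        value === undefined ||
        value === null ||
        value === '' ||
        (Array.isArray(value) && value.length === 0)
      );
    })
    .map((field) => ({
      field,
      message: `${field} is required by the Open Graph protocol`,
    }));
  return { valid: errors.length === 0, errors };
}
```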
Generates SEO and social media optimization scores (0-100):
const result = await extractOpenGraphAsync(html, {
generateScore: true
});
console.log(`Overall Score: ${result.socialScore.overall}/100`);
console.log(`Open Graph: ${result.socialScore.openGraph.score}/100`);
console.log(`Twitter: ${result.socialScore.twitter.score}/100`);
// Get actionable recommendations
result.socialScore.recommendations.forEach(rec => {
console.log(`💡 ${rec}`);
});
// 💡 Add og:image for better social sharing
// 💡 Include twitter:card for Twitter optimization
- Memory Cache: Built-in LRU cache with tiny-lru
- Redis Support: Enterprise-ready Redis caching
- Custom Storage: Implement your own cache backend
- TTL Control: Configurable expiration times
// Memory caching
const result = await extractOpenGraphAsync(html, {
cache: {
enabled: true,
ttl: 3600, // 1 hour
maxSize: 1000, // Max entries
storage: 'memory'
}
});
// Redis caching
const redisResult = await extractOpenGraphAsync(html, {
cache: {
enabled: true,
ttl: 7200, // 2 hours
storage: 'redis' // Requires Redis setup
}
});
Process multiple URLs efficiently with concurrency control:
const results = await extractOpenGraphBulk({
urls: siteUrls,
concurrency: 10, // 10 simultaneous requests
rateLimit: {
requests: 100, // Max 100 requests
window: 60000 // Per minute
},
onProgress: (done, total, url) => {
updateProgressBar(done / total);
}
});
console.log(`Processed ${results.summary.successful}/${results.summary.total} URLs`);
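The rateLimit option pairs a request count with a time window in milliseconds. How the library enforces it is not described here; one common approach is a sliding window, sketched below with a hypothetical createWindowLimiter helper. The clock is passed in as an argument so the behavior is easy to test.

```javascript
// Hypothetical sliding-window limiter: at most `requests` per `windowMs`.
function createWindowLimiter(requests, windowMs) {
  const timestamps = []; // start times of requests inside the current window

  return {
    // Returns true and records the request if it fits in the window.
    tryAcquire(now = Date.now()) {
      while (timestamps.length > 0 && now - timestamps[0] >= windowMs) {
        timestamps.shift(); // drop requests that have left the window
      }
      if (timestamps.length >= requests) return false;
      timestamps.push(now);
      return true;
    },
  };
}
```

A caller that receives false would typically wait and retry rather than drop the URL.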
Detailed metrics for optimization:
const result = await extractOpenGraphAsync(html);
console.log("Performance Metrics:");
console.log(`- Total time: ${result.metrics.performance.totalTime}ms`);
console.log(`- HTML parsing: ${result.metrics.performance.htmlParseTime}ms`);
console.log(`- Meta extraction: ${result.metrics.performance.metaExtractionTime}ms`);
console.log(`- Found ${result.metrics.metaTagsFound} meta tags`);
console.log(`- Used fallbacks: ${result.fallbacksUsed.join(', ')}`);
- XSS Protection: Sanitizes HTML content using Cheerio
- URL Validation: Prevents SSRF attacks
- Domain Control: Allow/block specific domains
- Content Filtering: Remove malicious content
const result = await extractOpenGraphAsync(html, {
security: {
sanitizeHtml: true, // Clean HTML content
validateUrls: true, // Verify all URLs
allowedDomains: [ // Only allow these domains
'example.com',
'cdn.example.com'
],
blockedDomains: [ // Block these domains
'malicious.com'
],
maxRedirects: 3, // Limit URL redirects
timeout: 5000 // 5 second timeout
}
});
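As an illustration of what domain filtering involves, the sketch below checks a URL against allow and block lists using the standard URL class. It makes several assumptions that may differ from the library: hostnames are matched exactly (no subdomain wildcards), an empty allow list means "allow everything not blocked", and only http and https are accepted. The name isUrlAllowed is hypothetical.

```javascript
// Hypothetical helper: exact-match domain filtering for http(s) URLs.
function isUrlAllowed(rawUrl, { allowedDomains = [], blockedDomains = [] } = {}) {
  let hostname;
  try {
    const url = new URL(rawUrl);
    if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
    hostname = url.hostname.toLowerCase();
  } catch {
    return false; // not a parseable absolute URL
  }

  // Exact hostname match only; subdomains must be listed explicitly.
  if (blockedDomains.includes(hostname)) return false;
  if (allowedDomains.length > 0) return allowedDomains.includes(hostname);
  return true;
}
```

The block list is checked first, so a domain that appears in both lists is rejected.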
- PII Detection: Automatically detects personal information
- Data Masking: Optional masking of sensitive content
- Safe Extraction: Removes potentially harmful data
const result = await extractOpenGraphAsync(html, {
security: {
detectPII: true, // Detect emails, phones, addresses
maskPII: true // Mask detected PII in results
}
});
// PII will be masked in the output
// "Contact: j***@example.com" instead of "Contact: john@example.com"
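The masked output shown above keeps the first character of the local part and the full domain. A minimal sketch that produces the same format for email addresses follows; it covers emails only, uses a deliberately simple pattern, and maskEmails is a hypothetical name rather than a library export.

```javascript
// Hypothetical helper: masks the local part of email addresses in free text.
function maskEmails(text) {
  const emailPattern =
    /\b([A-Za-z0-9])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b/g;
  return text.replace(
    emailPattern,
    (_match, first, domain) => `${first}***@${domain}`
  );
}
```

Phone numbers and postal addresses would need their own patterns and are much harder to detect reliably.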
- Format Detection: Supports JPG, PNG, GIF, WebP, AVIF, SVG
- Size Optimization: Automatically selects best image sizes
- Responsive Images: Handles srcset and multiple formats
- Fallback Images: Finds images when og:image is missing
// Enhanced image extraction
const result = await extractOpenGraphAsync(html, {
allMedia: true // Extract all images, not just the first
});
console.log(result.data.ogImage);
// [
// { url: 'image1.jpg', width: 1200, height: 630, type: 'jpg' },
// { url: 'image2.png', width: 800, height: 600, type: 'png' }
// ]
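When allMedia returns several images, an application may still want a single "best" one. The library's own selection rule is not documented here, so the sketch below uses one plausible heuristic: the largest pixel area, with missing dimensions counted as zero. It accepts widths and heights as either strings or numbers, and pickLargestImage is a hypothetical name.

```javascript
// Hypothetical helper: picks the image with the largest width x height.
function pickLargestImage(images) {
  // Missing or non-numeric dimensions count as 0.
  const area = (image) =>
    (Number(image.width) || 0) * (Number(image.height) || 0);
  return images.reduce(
    (best, image) =>
      best === undefined || area(image) > area(best) ? image : best,
    undefined
  );
}
```

Ties go to the earlier image, which preserves the page's own ordering as a tiebreaker.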
- Video Information: Duration, thumbnails, captions, chapters
- Audio Metadata: Track info, artists, albums, duration
- Streaming Support: Handles video players and streaming URLs
const result = await extractOpenGraphAsync(videoPageHtml);
console.log(result.data.ogVideo);
// {
// url: 'video.mp4',
// duration: 300,
// thumbnails: [{ url: 'thumb.jpg', width: 1280, height: 720 }],
// captions: [{ language: 'en', url: 'captions.vtt' }]
// }
const result = await extractOpenGraphAsync(html);
console.log(result.metrics);
// {
// extractionTime: 125, // ms
// htmlSize: 54321, // bytes
// metaTagsFound: 15,
// structuredDataFound: 3,
// imagesFound: 8,
// videosFound: 1,
// fallbacksUsed: ['title', 'description'],
// performance: {
// htmlParseTime: 20,
// metaExtractionTime: 10,
// structuredDataExtractionTime: 15,
// validationTime: 5,
// totalTime: 125
// }
// }
# Run tests
yarn test
# Run with coverage
yarn test --coverage
# Install dependencies
yarn install
# Build
yarn build
# Lint and format with Biome
yarn lint
yarn format
# Type check
yarn typecheck
We offer this as a managed Cloud API Service. Try it here: URL Scraping & Metadata Service
The library is fully typed with comprehensive TypeScript definitions:
- IOGResult - Main result interface with 60+ fields
- IExtractionResult - Async extraction result with metrics
- IExtractOpenGraphOptions - Configuration options
- IStructuredData - JSON-LD and structured data types
- IValidationResult - Data validation results
- ISocialScore - Social media scoring details
- IMetrics - Performance tracking metrics
All types are exported for your use in TypeScript projects.
Feature | This Library | Others |
---|---|---|
Open Graph | Complete (60+ fields) | Basic |
Twitter Cards | Complete | |
JSON-LD | Full Extraction | No |
Schema.org | Microdata/RDFa | No |
Caching | Built-in (tiny-lru) | No |
Bulk Processing | Concurrent | No |
Validation | Comprehensive | No |
Security | Node.js optimized | No |
TypeScript | Full Types | |
Performance | Optimized | |
Maintenance | Active | |
- HTML Sanitization: Uses Cheerio for safe HTML parsing (Node.js only)
- PII Detection: Automatic detection and masking of sensitive data
- URL Validation: Prevents SSRF attacks with domain filtering
- Content Security: Malicious content detection and filtering
- Fast Extraction: Sub-100ms for average pages
- Smart Caching: Built-in tiny-lru cache reduces repeated processing
- Concurrent Processing: Configurable concurrency for bulk operations
- Optimized Parsing: Cheerio-based parsing for Node.js performance
We welcome contributions! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
Built with:
- Cheerio - Fast, flexible & lean implementation of jQuery for Node.js
- tiny-lru - Tiny LRU cache for high-performance caching
- Biome - Fast formatter and linter for JavaScript and TypeScript
Made with ❤️ by DEV.ME
Need help or custom features? Contact us