📰 Scraper for Brazilian newspapers Estadão, Folha de S.Paulo, g1 and VEJA

igorantun/news-scraper

📰 News Scraper

Description

This is an automated script that scrapes the homepages of 5 major Brazilian newspapers (Estadão, Folha, g1, UOL and VEJA) and extracts news headlines, links, summaries and more. It then exports the report data to HTML, JSON, PDF and/or image files.

Getting started

Prerequisites

  • Docker
  • Docker Compose

Cloning and copying .env example

$ git clone git@github.com:igorantun/news-scraper.git
$ cd news-scraper
$ cp .env.example .env

Other requirements

You should also copy your Firebase serviceAccountKey.json file to the src/config folder.
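
Before running any make command, it can help to confirm that both files from the steps above are in place. The following is a minimal sketch, not part of the repository: the function name check_setup is hypothetical, and it only checks that .env and src/config/serviceAccountKey.json exist relative to the current directory (it does not validate their contents).

```shell
# Hypothetical helper (not shipped with news-scraper): reports which of the
# required setup files are missing. Run it from the repository root.
check_setup() {
  missing=0
  for f in .env src/config/serviceAccountKey.json; do
    if [ ! -f "$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  # Exit status 0 when both files exist, 1 otherwise.
  return "$missing"
}
```

After defining the function in your shell, running check_setup from the repository root prints one line per missing file and returns a non-zero status if anything is absent, so it can be chained as check_setup && make news-scraper.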

Make commands

$ make news-scraper # Starts production news scraper worker, with Logflare and Firebase integration enabled
$ make news-scraper-dev # Starts development news scraper worker, with nodemon
$ make clean # Deletes all generated files under ./reports
$ make stop # Stops all services

License

Released under the MIT License. See the LICENSE file for details.
