📰 News Scraper

Description

This is an automated script that scrapes the homepages of five major Brazilian news outlets (Estadão, Folha, g1, UOL and VEJA). For each story it finds, it extracts the headline, link, summary and other metadata, then exports the resulting report to HTML, JSON, PDF and/or image files.
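To give a rough idea of the extract-then-export flow described above, here is a minimal sketch in plain Node.js. It is not the project's actual code: the sample markup, the `extractHeadlines` and `buildReport` helpers, and the report shape are all assumptions made for illustration, and a real scraper would most likely use a proper HTML parser or headless browser rather than a regular expression.

```javascript
// Hypothetical homepage markup; real newspaper pages are structured differently.
const sampleHtml = `
  <article><a href="https://example.com/a"><h2>First headline</h2></a></article>
  <article><a href="https://example.com/b"><h2>Second &amp; last headline</h2></a></article>
`;

// Decode the one HTML entity used in the sample (illustrative, not exhaustive).
function decodeEntities(text) {
  return text.replace(/&amp;/g, "&");
}

// Extract { headline, link } pairs from anchors that wrap an <h2>.
// A regex is enough for this fixed sample; it is not a general HTML parser.
function extractHeadlines(html) {
  const pattern = /<a\s+href="([^"]+)"[^>]*>\s*<h2[^>]*>([^<]+)<\/h2>/g;
  const items = [];
  for (const match of html.matchAll(pattern)) {
    items.push({ headline: decodeEntities(match[2].trim()), link: match[1] });
  }
  return items;
}

// Build the report object that could then be serialized to a JSON file.
// The field names here are assumptions, not the project's real schema.
function buildReport(source, html, scrapedAt = new Date()) {
  return {
    source,
    scrapedAt: scrapedAt.toISOString(),
    items: extractHeadlines(html),
  };
}

const report = buildReport("example", sampleHtml);
console.log(JSON.stringify(report, null, 2));
```

Keeping extraction (`extractHeadlines`) separate from report assembly (`buildReport`) means the same item list could feed several exporters, which is one plausible way to support multiple output formats.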

Getting started

Prerequisites

Docker
Docker Compose

Cloning and copying .env example

$ git clone git@github.com:igorantun/news-scraper.git
$ cd news-scraper
$ cp .env.example .env

Other requirements

You should also copy your Firebase serviceAccountKey.json file to the src/config folder.
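Before starting the worker, it can help to confirm that both setup files are in place. The following is a small, hypothetical preflight check (it is not part of the repository); the `findMissingFiles` helper and `REQUIRED_FILES` list are names invented here, and the only paths assumed are the `.env` file and `src/config/serviceAccountKey.json` mentioned in the steps above.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Files the setup steps ask you to create, relative to the repository root.
const REQUIRED_FILES = [
  ".env",
  path.join("src", "config", "serviceAccountKey.json"),
];

// Return the required files that do not exist under rootDir.
function findMissingFiles(rootDir) {
  return REQUIRED_FILES.filter(
    (file) => !fs.existsSync(path.join(rootDir, file))
  );
}

// Check the current working directory and report the result.
const missing = findMissingFiles(process.cwd());
if (missing.length > 0) {
  console.error(`Missing setup files: ${missing.join(", ")}`);
} else {
  console.log("All setup files are in place.");
}
```

Run it from the repository root (for example, after saving it under a name of your choosing) so that `process.cwd()` points at the cloned `news-scraper` folder.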

Make commands

$ make news-scraper # Starts production news scraper worker, with Logflare and Firebase integration enabled
$ make news-scraper-dev # Starts development news scraper worker, with nodemon
$ make clean # Deletes all generated files under ./reports
$ make stop # Stops all services

License

Released under the MIT License. See the LICENSE file for details.

