LB-Adapt Scrapy application

A web crawler that collects structured data from the Adapt website (https://www.adapt.io). It fetches HTML pages, parses them into the desired format, and saves the results to JSON files.

Architecture of my application:

1. The Spider sends the initial request to the Engine to start crawling.
2. The Engine schedules the request in the Scheduler and asks for the next request to crawl.
3. The Scheduler returns the next request to the Engine.
4. The Engine sends the request to the Downloader. On the way, the request passes through the Downloader Middleware.
5. When the page finishes downloading, the Downloader generates a response and sends it back to the Engine through the Downloader Middleware.
6. The Engine passes the response to the Spider through the Spider Middleware.
7. The Spider processes the response and returns the scraped items, plus any new requests to crawl, to the Engine, again through the Spider Middleware.
8. The Engine sends the scraped items to the Item Pipeline, sends the new requests to the Scheduler, and asks for the next request to crawl.

The cycle (steps 3 to 8) repeats until no requests are left in the Scheduler.
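To make the data flow above concrete, here is a toy, standard-library-only model of the loop. It is not Scrapy's implementation: `Request`, `Response`, `run_engine` and the `example.com` pages are hypothetical stand-ins, middleware and concurrency are left out, and the scheduler is a plain FIFO queue with a simple "already seen" URL check (Scrapy's real scheduler also filters duplicate requests by default, but its queueing is configurable).

```python
from collections import deque
from typing import Callable, Iterable, Iterator, List


class Response:
    """Hypothetical stand-in for a downloaded page (URL plus body text)."""

    def __init__(self, url: str, body: str):
        self.url = url
        self.body = body


class Request:
    """Hypothetical stand-in for a request: a URL plus the spider callback to parse it."""

    def __init__(self, url: str, callback: Callable[[Response], Iterable]):
        self.url = url
        self.callback = callback


def run_engine(start_requests: Iterable[Request],
               download: Callable[[Request], Response],
               pipeline: Callable[[dict], None]) -> None:
    """Toy engine loop: sequential, FIFO scheduling, no middleware."""
    scheduler = deque(start_requests)   # steps 1-2: initial requests enter the Scheduler
    seen = set()                        # skip URLs that were already crawled
    while scheduler:                    # stop once no requests are left
        request = scheduler.popleft()   # step 3: Scheduler hands back the next request
        if request.url in seen:
            continue
        seen.add(request.url)
        response = download(request)    # steps 4-5: Downloader fetches the page
        for result in request.callback(response):  # steps 6-7: Spider parses the response
            if isinstance(result, Request):
                scheduler.append(result)  # step 8: new requests go to the Scheduler
            else:
                pipeline(result)          # step 8: items go to the Item Pipeline


# Usage with a fake in-memory "website"; the URLs are placeholders.
PAGES = {
    "https://example.com/directory": "a,b",
    "https://example.com/company/a": "A Corp",
    "https://example.com/company/b": "B Corp",
}


def fake_download(request: Request) -> Response:
    return Response(request.url, PAGES[request.url])


def parse_company(response: Response) -> Iterator[dict]:
    yield {"company_name": response.body, "source_url": response.url}


def parse_index(response: Response) -> Iterator[Request]:
    for slug in response.body.split(","):
        yield Request(f"https://example.com/company/{slug}", parse_company)


items: List[dict] = []
run_engine([Request("https://example.com/directory", parse_index)], fake_download, items.append)
```

The index callback only yields new requests and the company callback only yields items, which mirrors how the two spiders in this project split the work between a directory crawl and a profile crawl.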

Project Skeleton:

.
├── adapt
│   ├── adapt
│   │   ├── __init__.py
│   │   ├── items.py
│   │   ├── middlewares.py
│   │   ├── pipelines.py
│   │   ├── settings.py
│   │   └── spiders
│   │       ├── adapt_apider_company_profiles.py
│   │       ├── adapt_spider_company_index.py
│   │       └── __init__.py
│   ├── data
│   │   ├── architecture.png
│   │   ├── company_index.json
│   │   ├── company_profile.json
│   │   └── lbadaptDB.sql
│   └── scrapy.cfg
├── README.md
└── requirements.txt

I wrote two spiders:

- adapt_spider_company_index.py crawls every company's name and URL and saves the data to company_index.json.
- adapt_apider_company_profiles.py crawls each company's profile, including contact details, and saves the data to company_profile.json.
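One way to get each spider's items into a single JSON array file is an item pipeline in pipelines.py. The sketch below is an assumption, not the project's actual pipeline: the class name `JsonArrayWriterPipeline` and the `output_file` spider attribute are hypothetical, while `open_spider`, `process_item` and `close_spider` are the standard Scrapy pipeline hooks. It needs no Scrapy import to define.

```python
import json


class JsonArrayWriterPipeline:
    """Hypothetical pipeline: buffer items, then write one JSON array per spider run."""

    def open_spider(self, spider):
        self.items = []

    def process_item(self, item, spider):
        # dict() works for plain dicts and for scrapy.Item objects alike
        self.items.append(dict(item))
        return item  # pass the item on to any later pipelines

    def close_spider(self, spider):
        # Assumed convention: a spider may set `output_file`, else fall back to its name
        path = getattr(spider, "output_file", f"{spider.name}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(self.items, f, indent=4, ensure_ascii=False)
```

It would be enabled in settings.py with something like `ITEM_PIPELINES = {"adapt.pipelines.JsonArrayWriterPipeline": 300}`. Scrapy's built-in feed exports (for example `scrapy crawl <spider> -o company_index.json`) produce a similar file without any custom code; a pipeline is only worth it when the output needs extra processing.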

All scraped data is stored in the adapt/data directory.

Company index data format:

[
    {
        "company_name":"A&A Technology Group",
        "source_url":"https://www.adapt.io/company/a-a-technology-group",
        "tag":"https://www.adapt.io/directory/industry/telecommunications/A-1"
    },
    {
        "company_name":"A Better Answer",
        "source_url":"https://www.adapt.io/company/a-better-answer-4",
        "tag":"https://www.adapt.io/directory/industry/telecommunications/A-1"
    },
    {
        "company_name":"A Cheerful Giver",
        "source_url":"https://www.adapt.io/company/a-cheerful-giver-inc-1",
        "tag":"https://www.adapt.io/directory/industry/telecommunications/A-1"
    },
    {
        "company_name":"A-CTI",
        "source_url":"https://www.adapt.io/company/a-cti-1",
        "tag":"https://www.adapt.io/directory/industry/telecommunications/A-1"
    }
]
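If company_index.json feeds the profile crawl, a small typed loader makes the format explicit. This is a hypothetical helper, not code from the repository: `CompanyIndexEntry` and `load_company_index` are illustrative names. Building the dataclass with `**record` rejects records with missing or unexpected keys, and duplicates by `source_url` are dropped in case the same company is listed more than once.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CompanyIndexEntry:
    company_name: str
    source_url: str  # the company's profile page on adapt.io
    tag: str         # appears to be the directory listing page the company was found on


def load_company_index(text: str) -> List[CompanyIndexEntry]:
    """Parse company_index.json content, keeping the first entry per source_url."""
    entries: List[CompanyIndexEntry] = []
    seen = set()
    for record in json.loads(text):
        entry = CompanyIndexEntry(**record)  # TypeError if keys are missing or extra
        if entry.source_url not in seen:
            seen.add(entry.source_url)
            entries.append(entry)
    return entries
```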

Company profiles data format:

[
    {
        "Company_name":"Argosy Communication Products Ltd.",
        "Company_location":"Victoria, BC, Canada",
        "Company_website":"http://www.acpltd.ca",
        "Company_webdomain":"acpltd.ca",
        "Company_industry":"Telecommunications",
        "Company_employee_size":"0 - 25",
        "Company_revenue":null,
        "contact_details":[
            {
                "contact_name":"Gerry Wight",
                "contact_jobtitle":"Director of Sales",
                "contact_email_domain":"@acpltd.ca",
                "contact_department":"Director of Sales"
            }
        ]
    },
    {
        "Company_name":"Access Point, Inc.",
        "Company_location":"Cary,  North Carolina, United States",
        "Company_website":"http://www.gtt.net",
        "Company_webdomain":"gtt.net",
        "Company_industry":"Telecommunications",
        "Company_employee_size":"100 - 250",
        "Company_revenue":"$10 - 50M",
        "contact_details":[
            {
                "contact_name":"Candice Lane",
                "contact_jobtitle":"Order Assurance Specialist",
                "contact_email_domain":"@accesspointinc.com",
                "contact_department":"Order Assurance Specialist"
            },
            {
                "contact_name":"Jack Erdman",
                "contact_jobtitle":"Credit and Collections Manager",
                "contact_email_domain":"@accesspointinc.com",
                "contact_department":"Credit and Collections Manager"
            }
        ]
    }
]
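The sample records suggest that `Company_webdomain` is the host of `Company_website` without a leading `www.`, and that `Company_employee_size` is a "low - high" string. The helpers below are a hypothetical sketch of how such fields could be derived or normalized; they are not taken from the spider's source, and `employee_size_range` deliberately returns `None` for anything that is not a plain numeric range.

```python
import re
from typing import Optional, Tuple
from urllib.parse import urlparse


def web_domain(website: Optional[str]) -> Optional[str]:
    """Derive a bare domain such as 'acpltd.ca' from 'http://www.acpltd.ca'."""
    if not website:
        return None
    if "://" not in website:  # urlparse only fills in the host when a scheme is present
        website = "http://" + website
    host = urlparse(website).hostname or ""  # lowercased, without port or path
    if host.startswith("www."):
        host = host[len("www."):]
    return host or None


_RANGE = re.compile(r"\s*([\d,]+)\s*-\s*([\d,]+)\s*")


def employee_size_range(size: Optional[str]) -> Optional[Tuple[int, int]]:
    """Turn '100 - 250' into (100, 250); return None for missing or other formats."""
    if not size:
        return None
    match = _RANGE.fullmatch(size)
    if match is None:
        return None
    low, high = (int(group.replace(",", "")) for group in match.groups())
    return low, high
```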

Which database engine did I choose, and why?

I chose MySQL.
Why?
- Scalability and flexibility: MySQL scales from small embedded deployments up to large, high-volume databases.
- High availability and performance: MySQL supports replication for high availability, and its pluggable storage-engine architecture (for example InnoDB or MyISAM) lets the server be configured for a particular application's workload.
- Web and data warehouse strengths: MySQL is widely used for high-traffic websites because of its fast query engine, fast data inserts, and built-in full-text search.
- Comprehensive application development support: MySQL is one of the most popular open-source databases, with connectors for most programming languages, including Python.
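The nested profile format maps naturally onto two related tables: one row per company and one row per contact. The sketch below is an assumption: the real schema lives in adapt/data/lbadaptDB.sql and is not shown here, so the table and column names are hypothetical. It uses Python's built-in sqlite3 module purely as a stand-in so it runs without a MySQL server; with a MySQL driver such as PyMySQL the placeholders become `%s` and the id columns would typically use `AUTO_INCREMENT`.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS company (
    id INTEGER PRIMARY KEY,          -- SQLite assigns ids automatically
    name TEXT NOT NULL,
    location TEXT,
    website TEXT,
    webdomain TEXT,
    industry TEXT,
    employee_size TEXT,
    revenue TEXT
);
CREATE TABLE IF NOT EXISTS contact (
    id INTEGER PRIMARY KEY,
    company_id INTEGER NOT NULL REFERENCES company(id),
    name TEXT,
    jobtitle TEXT,
    email_domain TEXT,
    department TEXT
);
"""


def store_profiles(conn: sqlite3.Connection, profiles: list) -> None:
    """Insert company_profile.json records: one company row, then its contact rows."""
    conn.executescript(SCHEMA)
    for p in profiles:
        cur = conn.execute(
            "INSERT INTO company (name, location, website, webdomain, industry,"
            " employee_size, revenue) VALUES (?, ?, ?, ?, ?, ?, ?)",
            (p["Company_name"], p.get("Company_location"), p.get("Company_website"),
             p.get("Company_webdomain"), p.get("Company_industry"),
             p.get("Company_employee_size"), p.get("Company_revenue")),
        )
        company_id = cur.lastrowid  # link each contact to the company just inserted
        conn.executemany(
            "INSERT INTO contact (company_id, name, jobtitle, email_domain, department)"
            " VALUES (?, ?, ?, ?, ?)",
            [(company_id, c.get("contact_name"), c.get("contact_jobtitle"),
              c.get("contact_email_domain"), c.get("contact_department"))
             for c in p.get("contact_details", [])],
        )
    conn.commit()
```

Usage: `store_profiles(sqlite3.connect("adapt.db"), json.load(open("company_profile.json")))`. A `null` revenue becomes SQL `NULL`, and a company with many contacts needs no repeated company columns.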

Repository: rakibulislam01/lb-adapt
