Scrape 299 pages to spreadsheet: Web Development needed

Contact person: Scrape 299 pages to spreadsheet

Location: Amsterdam, Netherlands

Budget: Recommended by industry experts

Time to start: As soon as possible

Project description:
"I have 299 simple webpages that all share the same underlying layout. What I need is the content pulled from each page: primarily the body content, which is mainly headers and paragraphs, plus a handful of smaller fields that sit in the same section of every page.

You can view the pages here:
[login to view URL] (Dutch list)
[login to view URL] (English list)

For each page I need the contents of these fields to be scraped and exported to a spreadsheet:

• Title <h1 class="pageheader__title">
• <meta name="language">
• <meta property="og:image">
• Lead <div class="c-lead lead">
• The contents of <div class="c-detailsummary__title visually-hidden">, split into four separate fields (the fields are named differently on the Dutch and English pages):
• <dt>Thema's</dt> or <dt>Themes</dt>
• <dt>Faculteit</dt> or <dt>Faculty</dt>
• <dt>Doelgroep</dt> or <dt>Target group</dt>
• <dt>Werkvorm</dt> or <dt>Type</dt>
• All other visible body content, with markup and hrefs
• All other visible body content, without markup

I expect the output to look like the Excel file I added as an example.

I am the owner of the site, but I currently have nobody who can do the data export from the database, so we'll just scrape it from the front end.

You’re free to script this in Python (BeautifulSoup, Scrapy, Selenium, etc.) or any other tooling you’re comfortable with, as long as the final deliverable is a UTF-8 CSV or Excel file that I can open directly. I’ll consider the work complete once a manual spot-check of 10 random pages shows the extracted text and HTML align perfectly with what appears on the live site." (client-provided description)
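The field list in the description can be sketched roughly as follows. This is a minimal illustration, not the client's or any bidder's implementation. It uses only Python's standard-library html.parser so it runs without extra installs; BeautifulSoup, which the description mentions, would likely make it shorter. The column names, the sample HTML, and the helper names (PageFields, extract_fields, rows_to_csv) are assumptions made for this example. It also assumes each <dt> label is followed by a <dd> holding its value, which the description does not confirm, and it does not cover fetching the pages or the two body-content columns.

```python
import csv
import io
from html.parser import HTMLParser

# Dutch and English <dt> labels mapped to one shared column (column names are assumptions).
DT_LABELS = {
    "Thema's": "themes", "Themes": "themes",
    "Faculteit": "faculty", "Faculty": "faculty",
    "Doelgroep": "target_group", "Target group": "target_group",
    "Werkvorm": "type", "Type": "type",
}
COLUMNS = ["title", "language", "og_image", "lead",
           "themes", "faculty", "target_group", "type"]
# Void elements never get a closing tag, so they must not change the nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "source", "track", "wbr"}


class PageFields(HTMLParser):
    """Collects the small fields named in the description from one page's HTML."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.row = dict.fromkeys(COLUMNS, "")
        self._target = None    # column currently receiving text
        self._depth = 0        # nesting depth inside the captured element
        self._in_dt = False
        self._dt_text = ""
        self._pending = None   # column that the next <dd> belongs to

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        classes = (a.get("class") or "").split()
        if tag == "meta":
            if a.get("name") == "language":
                self.row["language"] = a.get("content") or ""
            elif a.get("property") == "og:image":
                self.row["og_image"] = a.get("content") or ""
        if tag in VOID:
            return
        if self._target:
            self._depth += 1
        elif tag == "h1" and "pageheader__title" in classes:
            self._target, self._depth = "title", 1
        elif tag == "div" and "c-lead" in classes:
            self._target, self._depth = "lead", 1
        elif tag == "dt":
            self._in_dt, self._dt_text = True, ""
        elif tag == "dd" and self._pending:
            self._target, self._depth, self._pending = self._pending, 1, None

    def handle_endtag(self, tag):
        if tag in VOID:
            return
        if tag == "dt" and self._in_dt:
            self._in_dt = False
            self._pending = DT_LABELS.get(self._dt_text.strip())
        elif self._target:
            self._depth -= 1
            if self._depth == 0:
                self._target = None

    def handle_data(self, data):
        if self._in_dt:
            self._dt_text += data
        elif self._target:
            self.row[self._target] += data


def extract_fields(html):
    """Return one spreadsheet row (dict) with whitespace collapsed."""
    parser = PageFields()
    parser.feed(html)
    parser.close()
    return {key: " ".join(value.split()) for key, value in parser.row.items()}


def rows_to_csv(rows):
    """Serialize rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Made-up sample page that only mimics the structure named in the description.
SAMPLE = """<html><head><meta name="language" content="nl">
<meta property="og:image" content="https://example.org/img.jpg"></head>
<body><h1 class="pageheader__title">Voorbeeld</h1>
<div class="c-lead lead"><p>Korte <b>inleiding</b>.</p></div>
<dl><dt>Thema's</dt><dd>Duurzaamheid</dd><dt>Faculteit</dt><dd>Rechten</dd></dl>
</body></html>"""

if __name__ == "__main__":
    print(rows_to_csv([extract_fields(SAMPLE)]))
```

To meet the UTF-8 requirement, the CSV text could be written with open(path, "w", encoding="utf-8", newline=""). Mapping both language variants of each label onto one column keeps the Dutch and English lists in a single sheet; whether the client wants that or two separate sheets is not stated.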


Matched companies (2)

eShop Genius

We have 12+ years of industry experience, have created more than 1,200 stores, and have built brands! At eShop Genius, we are an ISO certi…

Haven Futures

We build any kind of software and provide a wide range of tech solutions.