Business Client needs Web Development

Contact person: Business Client

Location: Mun. Braşov, Romania

Budget: Recommended by industry experts

Time to start: As soon as possible

Project description:
"Large-Scale PDF Scraping & Translation WordPress Website Development (or custom PHP/MySQL solution)

MOST IMPORTANT:
Before placing any bid, you must contact me privately to receive the link to the website from which the PDF documents will be scraped and downloaded. Do not place a bid before contacting me.



We’re looking for an experienced developer to build a scalable system that can automatically scrape, translate, and publish a very large volume of PDF documents as SEO-friendly web pages.

1. Data Extraction & Processing
• Automatically scrape and download all PDF files from a publicly available website.
• Extract text content from each PDF.
1a. Pay special attention to extracting the title of each document (this becomes the article title; see step 2, Content Publishing).
1b. Remove any personal data (especially from the first pages of the documents) so that it is neither extracted nor published.
• Translate the extracted text using an open-source translator (or any other free, reliable translator).
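
For illustration only, the link discovery, title extraction, and personal-data removal in this step could be sketched roughly as follows. This is a minimal Python sketch, not a required implementation. All names here (`find_pdf_links`, `extract_title`, `redact_personal_data`) are hypothetical, the text is assumed to have already been extracted from the PDF by whichever library the developer chooses, and the two regular expressions are crude placeholders: they only catch e-mail addresses and phone-like digit runs, can also hit dates or reference numbers, and do not detect names or ID numbers.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collects absolute URLs of <a href> links that point to .pdf files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # Ignore any query string when checking the file extension.
        if href and href.lower().split("?")[0].endswith(".pdf"):
            self.links.append(urljoin(self.base_url, href))


def find_pdf_links(html, base_url):
    """Return all PDF links found in one HTML page, resolved against base_url."""
    parser = PdfLinkParser(base_url)
    parser.feed(html)
    return parser.links


def extract_title(text, max_len=200):
    """Heuristic (assumption): the title is the first line containing letters."""
    for line in text.splitlines():
        line = " ".join(line.split())
        if any(ch.isalpha() for ch in line):
            return line[:max_len]
    return "Untitled document"


# Placeholder patterns: e-mail addresses and phone-like digit sequences only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_personal_data(text):
    """Replace matches of the placeholder patterns with a fixed marker."""
    text = EMAIL_RE.sub("[REDACTED]", text)
    return PHONE_RE.sub("[REDACTED]", text)
```

A real pipeline would likely need document-specific rules for the first pages, since that is where the personal data is said to appear.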
________________________________________
2. Content Publishing
• For each translated file, create a new article/page on the website: each PDF file => AI translation => one SEO-friendly page.
• Technology: WordPress (or custom PHP/MySQL solution).
• Text must be stored in the database (not as iFrames) for full SEO rendering. The complete text must be visible as standard HTML text.
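
As a rough illustration of the "standard HTML text" requirement, the translated plain text could be converted into escaped paragraphs before it is saved to the database. The sketch below is in Python and rests on assumptions: `text_to_html` is a hypothetical helper, and paragraphs are assumed to be separated by blank lines in the extracted text.

```python
import html
import re


def text_to_html(text):
    """Turn plain text into escaped <p> paragraphs, split on blank lines."""
    paragraphs = re.split(r"\n\s*\n", text.strip())
    parts = []
    for para in paragraphs:
        # Collapse line breaks and repeated spaces left over from the PDF layout.
        collapsed = " ".join(para.split())
        if collapsed:
            # html.escape prevents document text from being read as markup.
            parts.append("<p>" + html.escape(collapsed) + "</p>")
    return "\n".join(parts)
```

The resulting string is ordinary HTML that can be stored as the post body, so the full text is rendered in the page itself rather than inside an embedded viewer.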
________________________________________
3. SEO & Indexing
• Auto-generate unique meta titles and meta descriptions for every page (fully crawlable, indexable).
• Use clean, descriptive URLs (e.g. /category/document-title-keywords).
• Each page should include: title, tags, meta description, and full HTML/text content.
• Implement an XML sitemap.
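
The SEO items above could be approached along these lines. This is a hedged Python sketch with hypothetical helper names (`slugify`, `meta_description`, `build_sitemap`, `sitemap_file_count`) and assumed defaults (8 slug words, 155-character descriptions). It does not guarantee uniqueness on its own; duplicate slugs or descriptions would still need to be checked against the database. The sitemap protocol caps a single sitemap file at 50,000 URLs, so a site with 220,000 pages needs several files tied together by a sitemap index.

```python
import math
import re
import unicodedata
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemap protocol


def slugify(title, max_words=8):
    """Lowercase ASCII slug built from the first few words of the title."""
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words[:max_words])


def meta_description(text, limit=155):
    """Flatten whitespace and cut at a word boundary near the limit."""
    flat = " ".join(text.split())
    if len(flat) <= limit:
        return flat
    return flat[:limit].rsplit(" ", 1)[0] + "..."


def build_sitemap(urls):
    """Serialize one <urlset> document for up to MAX_URLS_PER_SITEMAP URLs."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def sitemap_file_count(total_pages):
    """Number of sitemap files needed for the given number of pages."""
    return math.ceil(total_pages / MAX_URLS_PER_SITEMAP)
```

For the initial 220,000 documents, `sitemap_file_count(220_000)` gives 5 files.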
________________________________________
4. Security & Reliability
• Anti-scraping & anti-DDoS protection.
• DMCA/copyright system: include a DMCA / Copyright Notice & Takedown Contact section on the website, where users can submit requests to remove copyrighted material that they believe has been published without authorization.
________________________________________
5. Performance Targets
• Fast page load times and mobile-first responsive design. Page load time: under 2.5 seconds (desktop & mobile).
• Core Web Vitals score: 90+ (Google PageSpeed Insights).
• TTFB: under 500 ms.
________________________________________
6. Search & Navigation
• Search bar with filters (categories, tags, keywords).
• Fast search results with filtering options.
• Browsing by category.
• Support for multiple category levels (category, subcategory, sub-subcategory).
• All pages must be free to read and browse for all visitors.
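
To make the multi-level category and filter requirements concrete, here is a small Python sketch. It only illustrates the intended semantics on in-memory data; at the stated volume the real filtering would presumably run as indexed database or full-text queries. The `Category` class, the helper names, and the document fields (`category`, `tags`, `title`) are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Category:
    """One node in a category tree; children are keyed by name."""
    name: str
    children: dict = field(default_factory=dict)


def add_path(root, path):
    """Insert a path such as ("Law", "Civil", "Contracts") into the tree."""
    node = root
    for name in path:
        node = node.children.setdefault(name, Category(name))
    return node


def filter_documents(docs, category_prefix=(), tag=None, keyword=None):
    """Keep documents that match every filter that was supplied."""
    results = []
    for doc in docs:
        # A document matches a category if its path starts with the prefix,
        # so filtering by a parent category also returns its subcategories.
        if tuple(doc["category"][: len(category_prefix)]) != tuple(category_prefix):
            continue
        if tag is not None and tag not in doc["tags"]:
            continue
        if keyword is not None and keyword.lower() not in doc["title"].lower():
            continue
        results.append(doc)
    return results
```

Storing each document's full category path lets one filter serve all three levels: category, subcategory, and sub-subcategory.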
________________________________________
7. Scalability
• Implement a scalable architecture to handle a large volume of content efficiently.
• The system/script must be able to automatically scrape/download, translate, and publish the initial 220,000+ PDF files as indexable web pages at launch.
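
One possible way to keep a run of this size restartable is to track every document's stage in a small job table, so an interrupted run resumes where it stopped instead of starting over. The Python sketch below is an assumption-laden illustration: SQLite stands in for the MySQL database named in the brief, and the table layout, stage names, and function names are all hypothetical.

```python
import sqlite3

# Hypothetical pipeline stages, in processing order.
STAGES = ("pending", "downloaded", "translated", "published")


def open_queue(path=":memory:"):
    """Open (or create) the job table that records each document's stage."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS jobs ("
        "url TEXT PRIMARY KEY, stage TEXT NOT NULL DEFAULT 'pending')"
    )
    return conn


def enqueue(conn, urls):
    """Add new URLs; INSERT OR IGNORE leaves already-known URLs untouched."""
    conn.executemany("INSERT OR IGNORE INTO jobs (url) VALUES (?)", [(u,) for u in urls])
    conn.commit()


def next_batch(conn, stage, size=100):
    """Return up to `size` URLs currently waiting in the given stage."""
    rows = conn.execute(
        "SELECT url FROM jobs WHERE stage = ? ORDER BY url LIMIT ?", (stage, size)
    )
    return [row[0] for row in rows]


def advance(conn, url):
    """Move one URL to the next stage; the final stage is left unchanged."""
    (stage,) = conn.execute("SELECT stage FROM jobs WHERE url = ?", (url,)).fetchone()
    index = STAGES.index(stage)
    if index < len(STAGES) - 1:
        conn.execute("UPDATE jobs SET stage = ? WHERE url = ?", (STAGES[index + 1], url))
        conn.commit()


def progress(conn):
    """Count of documents per stage, e.g. for a progress report."""
    return dict(conn.execute("SELECT stage, COUNT(*) FROM jobs GROUP BY stage"))
```

Because the stage is committed after each step, re-running the script after a crash only picks up documents that have not yet reached "published".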
________________________________________
Payment Terms:
- 100% of the payment will be placed in Escrow on Freelancer.com.
- Payment will be released only after the project is fully functional on the live server and all requirements are met.
- Proof required: a production-ready website, hosted and running on the client’s live domain and server, with all 220,000 initial documents uploaded and accessible, and achieving a Google PageSpeed Insights Core Web Vitals score of 90+ on the Document Page CPT (Custom Post Type).
________________________________________
Deliverable:
A complete, production-ready website/system meeting all the above requirements.
The system must be:
- Fully installed, configured, and functional on the client’s own domain and hosting/server;
- Delivered with all the initial 220,000 documents uploaded, indexed, and publicly accessible;
- Optimized for performance and stability according to the agreed technical specifications;
- Structured and coded in a way that allows easy customization and duplication for future websites with similar functionality.
________________________________________


!!!!! PLEASE READ BEFORE BIDDING !!!!!
Do not bid if you do not have the skills to complete this project. YOU NEED SKILLS & EXPERIENCE with Large-Scale PDF Scraping & Translation WordPress Website Development.
!!!!!!Do not bid if you have never done this before. This should be a simple project for someone who knows what they are doing.!!!!!!!

TO APPLY:
- Place your real bid amount, not a placeholder or a random sum, and do not ask for more money later. I do not want to waste time renegotiating. Time-wasters, please do not bid. No generic bids. Bid what you actually want me to pay you. I will choose based on the content of your bid.
- Please DO NOT bid if you haven’t read the full job description. Start your proposal with the phrase "The sun was pink today" on the first line; otherwise, it will not be considered. This is to confirm that you have read the full description. My time is just as important as yours, and I don’t want us to waste each other’s time.
- Please DO NOT send copy-paste automated messages or automated bids.

Questions & Clarifications
Ask any questions or request clarifications before placing your bid. Do NOT ask questions or request clarifications after bidding.
!!!!Everything above is required for your bid to be considered!!!!!" (client-provided description)


Matched companies (5)

April Innovations

April Innovations is one of the leading Enterprise Software Development companies in Mumbai, with clients being serviced in the USA, UK, and India. T…

Mobiweb Global Solutions

Mobiweb Global Solutions is a full-service IT company specializing in web development, mobile app development, blockchain, AI, IoT, and game developm…

JanakiBhuvi Tech Labs Private Limited

Delivering Future-Ready Digital Solutions in Web Development, E-commerce, Logo Design, and Digital Marketing. We believe innovation is key to navigat…

SJ Solutions & Infotech

SJ Solutions & Infotech is a team of highly experienced and dynamic professionals who have an enormous passion for technology. In this fast changing …

Appsdiary Technologies

AppsDiary is a software house that designs and develops mobile applications, websites, and custom software solutions. They work with businesses to c…