Business Client needs Web Development
Contact person: Business Client
Location: Mun. Braşov, Romania
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
"Large-Scale PDF Scraping & Translation WordPress Website Development (((or custom PHP/MySQL solution)))
MOST IMPORTANT:
Before placing any bid, you must contact me privately to receive the link to the website from which the PDF documents will be scraped and downloaded. Do not place a bid before contacting me.
We’re looking for an experienced developer to build a scalable system that can automatically scrape, translate, and publish a very large volume of PDF documents as SEO-friendly web pages.
1. Data Extraction & Processing
• Automatically scrape and download all PDF files from a publicly available website.
• Extract text content from each PDF.
1a. Pay special attention to extracting the title of each document (this will become the article title; see section 2, Content Publishing).
1b. Remove any personal data (especially from the first pages of the documents) so that such information is not extracted or published.
• Translate the extracted text using an open-source translator (or any other free, reliable translator).
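The title extraction (1a) and personal-data removal (1b) steps could be sketched roughly as follows. This is a minimal illustration, not a finished solution: it assumes the PDF text has already been extracted to a plain string, it treats the first non-empty line as the title, and the function names and regular expressions are hypothetical. Patterns like these catch only e-mail addresses and phone-like digit runs, may also hit dates or document numbers, and do not detect personal names at all.

```python
import re

# Hypothetical patterns; real documents would need rules tuned to their layout.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_title(page_text: str, max_len: int = 120) -> str:
    """Return the first non-empty line of the text, whitespace-normalized."""
    for line in page_text.splitlines():
        line = " ".join(line.split())
        if line:
            return line[:max_len]
    return "Untitled document"


def redact_personal_data(text: str) -> str:
    """Replace e-mail addresses and phone-like digit runs with a marker."""
    text = EMAIL_RE.sub("[redacted]", text)
    return PHONE_RE.sub("[redacted]", text)
```

In practice the redaction would run before translation, so that personal data never reaches the translator or the database.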
________________________________________
2. Content Publishing
• For each translated file, create a new article/page on the website: each PDF file => AI translation => one SEO-friendly page.
• Technology: WordPress (or a custom PHP/MySQL solution).
• Text must be stored in the database (not embedded via iframes) so that it renders fully for SEO. The complete text must be visible as standard HTML text.
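One way to turn translated plain text into standard HTML before storing it is sketched below. It assumes paragraphs are separated by blank lines, which may not hold for every PDF, and the function name is hypothetical.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Wrap blank-line-separated paragraphs in <p> tags, escaping markup."""
    paragraphs = re.split(r"\n\s*\n", text)
    rendered = []
    for paragraph in paragraphs:
        # Collapse line breaks and repeated spaces inside a paragraph.
        normalized = " ".join(paragraph.split())
        if normalized:
            rendered.append(f"<p>{html.escape(normalized)}</p>")
    return "\n".join(rendered)
```

Escaping the text keeps stray `<` or `&` characters from the source documents from breaking the page markup.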
________________________________________
3. SEO & Indexing
• Auto-generate unique meta titles and meta descriptions for every page (fully crawlable, indexable).
• Use clean, descriptive URLs (e.g. /category/document-title-keywords).
• Each page should include: title, tags, meta description, and full HTML/text content.
• Implement an XML sitemap.
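The clean URLs and auto-generated meta descriptions above could be produced along these lines. This is a sketch under stated assumptions: the 8-word slug limit and the 155-character description limit are illustrative choices rather than requirements from this brief, and all function names are hypothetical. Uniqueness is not guaranteed by this code alone; two documents with the same title would need a suffix such as a document ID.

```python
import re
import unicodedata


def slugify(title: str, max_words: int = 8) -> str:
    """Build a lowercase ASCII slug from a title, keeping at most max_words words."""
    # Decompose accented letters (e.g. "ş" -> "s" + combining mark), then drop non-ASCII.
    ascii_title = (
        unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    )
    words = re.findall(r"[a-z0-9]+", ascii_title.lower())
    return "-".join(words[:max_words])


def meta_description(body: str, limit: int = 155) -> str:
    """Return the start of the body, cut at a word boundary near `limit` characters."""
    text = " ".join(body.split())
    if len(text) <= limit:
        return text
    cut = text[:limit].rsplit(" ", 1)[0]
    return cut.rstrip(",;:.") + "..."


def page_url(category: str, title: str) -> str:
    """Compose a /category/document-title path from the two slugs."""
    return f"/{slugify(category)}/{slugify(title)}"
```

For example, `page_url("Court Rulings", "Decizia nr. 12/2020 Braşov")` gives `/court-rulings/decizia-nr-12-2020-brasov`.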
________________________________________
4. Security & Reliability
• Anti-scraping & anti-DDoS protection.
• DMCA/copyright system: include a DMCA / Copyright Notice & Takedown Contact section on the website, where users can submit requests to remove copyrighted material that they believe has been published without authorization.
________________________________________
5. Performance Targets
• Fast page load times and mobile-first responsive design. Page load time: under 2.5 seconds (desktop & mobile).
• Core Web Vitals score: 90+ (Google PageSpeed Insights).
• TTFB: under 500 ms.
________________________________________
6. Search & Navigation
• Search bar with filters (categories, tags, keywords).
• Fast search results with filtering options.
• Browsing by category.
• Support for multiple category levels (category, subcategory, sub-subcategory).
• All pages must be free to read and browse for all visitors.
________________________________________
7. Scalability
• Implement a scalable architecture to handle a large volume of content efficiently.
• The system/script must be capable of automatically scraping and downloading the PDFs, translating them, and publishing the initial 220,000+ documents as indexable web pages upon launch.
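At this volume the XML sitemap from section 3 cannot be a single file, because the sitemaps.org protocol caps each sitemap at 50,000 URLs. A rough sketch of splitting the URLs into several files plus a sitemap index is shown below. The `sitemap-N.xml` naming and the function names are hypothetical, and a WordPress build might well delegate this to an SEO plugin instead.

```python
from xml.sax.saxutils import escape

MAX_URLS_PER_SITEMAP = 50_000  # per-file limit in the sitemaps.org protocol
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap(urls):
    """Return one <urlset> document for a chunk of URLs."""
    entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in urls)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<urlset xmlns="{SITEMAP_NS}">{entries}</urlset>'
    )


def build_sitemap_index(base_url, count):
    """Return a <sitemapindex> pointing at sitemap-1.xml .. sitemap-<count>.xml."""
    entries = "".join(
        f"<sitemap><loc>{escape(base_url)}/sitemap-{i}.xml</loc></sitemap>"
        for i in range(1, count + 1)
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        f'<sitemapindex xmlns="{SITEMAP_NS}">{entries}</sitemapindex>'
    )
```

With 220,000 page URLs, `chunk_urls` yields five chunks (four of 50,000 and one of 20,000), so the index would list five sitemap files.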
________________________________________
Payment Terms:
- 100% of the payment will be placed in Escrow on Freelancer.com.
- Payment will be released only after the project is fully functional on the live server and all requirements are met.
- Proof required: a production-ready website, hosted and running on the client’s live domain and server, with all 220,000 initial documents uploaded and accessible, and achieving a Google PageSpeed Insights Core Web Vitals score of 90+ on the Document Page CPT (Custom Post Type).
________________________________________
Deliverable:
A complete, production-ready website/system meeting all the above requirements.
The system must be:
- Fully installed, configured, and functional on the client’s own domain and hosting/server;
- Delivered with all the initial 220,000 documents uploaded, indexed, and publicly accessible;
- Optimized for performance and stability according to the agreed technical specifications;
- Structured and coded in a way that allows easy customization and duplication for future websites with similar functionality.
________________________________________
!!!!! PLEASE READ BEFORE BIDDING !!!!!
Do not bid if you do not have the skills to complete this project. YOU NEED SKILLS & EXPERIENCE with Large-Scale PDF Scraping & Translation WordPress Website Development.
!!!!!!Do not bid if you have never done this before. This should be a simple project for someone who knows what they are doing.!!!!!!!
TO APPLY:
- Place your real bid amount, not a placeholder or a random sum, and do not ask for more money later. I do not want to waste time renegotiating. Time-wasters, please do not bid. No generic bids. I will choose based on the content of your bid.
- Please DO NOT bid if you haven’t read the full job description. Please start your proposal with the phrase -"The sun was pink today"- in the first line of your proposal; otherwise, it will not be considered. This is to confirm that you have read the full description. My time is just as important as yours, and I don’t want us to waste each other’s time.
- Please DO NOT send copy-paste automated messages or automated bids.
Questions & Clarifications
Ask any questions or request clarifications before placing your bid. Do NOT ask questions or request clarifications after bidding.
!!!!Everything above is required for your bid to be considered!!!!!" (client-provided description)
Matched companies (5)

April Innovations

Mobiweb Global Solutions

JanakiBhuvi Tech Labs Private Limited

SJ Solutions & Infotech
