Business Client needs Web Development

Contact person: Business Client

Location: Mun. Braşov, Romania

Budget: Recommended by industry experts

Time to start: As soon as possible

Project description:
"Large-Scale Document Sharing Website Development

Project Description
We are looking for an experienced web developer or team to build a LARGE-SCALE, SEO-driven document-sharing website similar in concept to Scribd or Academia.edu.
The platform must be robust, scalable, and optimized for performance, capable of handling 200,000+ PDF/DOC/DOCX documents as separate SEO-friendly pages.
All pages must be fully crawlable, indexable, and fast-loading, with the text extracted directly from the documents and rendered as native HTML.

Core Requirements
Architecture & Technology
• Preferred: WordPress or custom PHP/MySQL; the developer must justify the choice.
• Hybrid storage model:
o Extracted text → stored in database for SEO visibility.
o Processed files (PDF/DOC) → stored securely in cloud (e.g., AWS S3, Google Cloud) for download.
• Fully responsive, mobile-first design.
• Core Web Vitals score 90+, TTFB <500ms, page load <2.5s (desktop & mobile).

The total document archive is approximately 500 GB, containing hundreds of thousands of files. An automated method must be implemented to detect and remove duplicate or near-duplicate documents before or during the import process.
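For the duplicate-detection requirement, one common approach combines an exact hash of the normalized text (for identical copies) with word-shingle Jaccard similarity (for near-duplicates). The Python sketch below is illustrative only: the function names, the 5-word shingle size, and the 0.9 threshold are assumptions to be tuned on a sample of the real archive. Its pairwise comparison is also far too slow for hundreds of thousands of files; at that scale a MinHash plus locality-sensitive hashing index (for example the open-source datasketch library) would normally replace the inner loop.

```python
import hashlib
import re


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not hide duplicates."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def content_hash(text: str) -> str:
    """SHA-256 of the normalized text; equal hashes mean exact duplicates."""
    return hashlib.sha256(normalize(text).encode("utf-8")).hexdigest()


def shingles(text: str, k: int = 5) -> set:
    """Set of overlapping k-word sequences ("shingles") of the normalized text."""
    words = normalize(text).split()
    if len(words) <= k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Share of shingles two documents have in common (1.0 = identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(docs: dict, threshold: float = 0.9, k: int = 5) -> list:
    """Return the IDs of documents to keep, in input order.

    A document is dropped if its normalized text equals an earlier kept one,
    or if its shingle Jaccard similarity to an earlier kept one is >= threshold.
    The pairwise loop is O(n^2): acceptable for a sketch, not for 200k+ files.
    """
    seen_hashes = set()
    kept = []  # list of (doc_id, shingle set) for documents we keep
    for doc_id, text in docs.items():
        h = content_hash(text)
        if h in seen_hashes:
            continue  # exact duplicate of a kept document
        sh = shingles(text, k)
        if any(jaccard(sh, other) >= threshold for _, other in kept):
            continue  # near-duplicate of a kept document
        seen_hashes.add(h)
        kept.append((doc_id, sh))
    return [doc_id for doc_id, _ in kept]
```

Running the exact-hash check first keeps the cheap test in front of the expensive one, so identical copies never reach the similarity step.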


Data Extraction & Processing
• Extract full text content from every PDF/DOC/DOCX (no partial or broken extraction).
• Store extracted text as native HTML in the database: no iframes, images, or embedded viewers.
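A minimal sketch of the "native HTML" step, assuming an open-source extractor has already produced plain text in which paragraphs are separated by blank lines. The function name is hypothetical; it escapes the text and wraps each paragraph in `<p>` tags, so the page contains crawlable HTML rather than an embedded viewer. Real PDFs also need handling for hyphenated line ends, running headers/footers, and tables, which this sketch does not attempt.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Turn extracted plain text into escaped <p> paragraphs.

    Assumes paragraphs are separated by blank lines and that single line
    breaks inside a paragraph are hard wraps from the PDF layout.
    """
    paragraphs = [p for p in re.split(r"\n\s*\n", text) if p.strip()]
    html_parts = []
    for paragraph in paragraphs:
        # Re-join hard-wrapped lines into a single line of text.
        joined = " ".join(line.strip() for line in paragraph.splitlines() if line.strip())
        # Escape &, <, > and quotes so document text can never inject markup.
        html_parts.append(f"<p>{html.escape(joined)}</p>")
    return "\n".join(html_parts)
```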

Content Publishing
• Each document → one unique, SEO-friendly page with title, tags, meta description, and HTML text.
• Clean, descriptive URLs (e.g., /category/document-title-keywords).
• Auto-generated XML sitemap and schema markup.
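Two details matter at this scale: URL slugs should be ASCII-safe even for Romanian titles, and the sitemaps.org protocol caps each sitemap file at 50,000 URLs (and 50 MB uncompressed), so 200,000+ pages need several sitemap files tied together by a sitemap index. The Python sketch below illustrates both; the function names, the file names `sitemap-N.xml` / `sitemap-index.xml`, and the example domain are assumptions, and `lastmod` dates are omitted for brevity.

```python
import re
import unicodedata
from xml.sax.saxutils import escape

# The sitemaps.org protocol allows at most 50,000 URLs per sitemap file.
MAX_URLS_PER_SITEMAP = 50_000
XMLNS = 'xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"'
XML_DECL = '<?xml version="1.0" encoding="UTF-8"?>'


def slugify(title: str) -> str:
    """ASCII, lowercase, hyphen-separated slug, e.g. for /category/<slug>."""
    # NFKD splits letters like "ș" or "î" into base letter + accent; drop the accents.
    ascii_title = unicodedata.normalize("NFKD", title).encode("ascii", "ignore").decode("ascii")
    return re.sub(r"[^a-z0-9]+", "-", ascii_title.lower()).strip("-")


def build_sitemaps(urls: list, base_url: str) -> dict:
    """Split URLs into sitemap files plus one sitemap index; returns {filename: xml}."""
    files = {}
    for n, start in enumerate(range(0, len(urls), MAX_URLS_PER_SITEMAP), start=1):
        chunk = urls[start:start + MAX_URLS_PER_SITEMAP]
        entries = "".join(f"<url><loc>{escape(u)}</loc></url>" for u in chunk)
        files[f"sitemap-{n}.xml"] = f"{XML_DECL}<urlset {XMLNS}>{entries}</urlset>"
    # The index lists every chunk file; it is the file submitted to search engines.
    index_entries = "".join(
        f"<sitemap><loc>{escape(base_url)}/{name}</loc></sitemap>" for name in files
    )
    files["sitemap-index.xml"] = f"{XML_DECL}<sitemapindex {XMLNS}>{index_entries}</sitemapindex>"
    return files
```

For 120,000 document URLs this yields three chunk files (50,000 + 50,000 + 20,000 URLs) and one index.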

Bulk Upload System (Critical)
• Must support automatic processing and publishing of at least 200,000 documents at launch.
• Batch processing: detect and remove duplicates → extract text → [login to view URL] page → index, and store the processed files (PDF/DOC) securely in the cloud.
• Smart categorization (categories/subcategories).
• The developer must confirm:
o Batch size and upload method.
o Estimated total time for the first 200k+ documents to be live and indexed on the site, with the processed files (PDF/DOC) stored securely in the cloud.
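The time estimate requested above is simple arithmetic once throughput has been measured on a sample batch. In this Python sketch the function names and all inputs (batch size, documents per minute per worker, worker count, upload bandwidth) are hypothetical placeholders, not measured figures. With the example values (200,000 documents, batches of 1,000, 10 documents/minute on each of 8 workers, 500 GB over 100 Mbit/s) it gives 200 batches, about 41.7 hours of processing, and about 11.1 hours of upload. The time search engines take to actually index the pages is not something this arithmetic can predict.

```python
import math


def batch_count(total_docs: int, batch_size: int) -> int:
    """Number of import batches needed (the last one may be partial)."""
    return math.ceil(total_docs / batch_size)


def processing_hours(total_docs: int, docs_per_minute_per_worker: float, workers: int) -> float:
    """Wall-clock hours for dedupe + extraction + page creation, assuming workers run in parallel."""
    return total_docs / (docs_per_minute_per_worker * workers) / 60


def upload_hours(archive_gb: float, bandwidth_mbps: float) -> float:
    """Hours to transfer the archive, using decimal units (1 GB = 8,000 megabits)."""
    return archive_gb * 8_000 / bandwidth_mbps / 3_600


if __name__ == "__main__":
    # Hypothetical inputs; replace with figures measured on a real sample batch.
    print(batch_count(200_000, 1_000), "batches")
    print(f"{processing_hours(200_000, 10, 8):.1f} h processing")
    print(f"{upload_hours(500, 100):.1f} h upload")
```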

Search & Navigation
• Search with filters (categories, tags, keywords).
• Must support fast, scalable search (e.g., Elasticsearch, Algolia, or optimized MySQL).
• Browsing by category, subcategory, sub-subcategory.

User Access & Downloads
• Reading: All document pages (HTML text) are free and accessible to all visitors — no login required.
• Accounts: User signup/login is required only for file downloads (stored in cloud).
• Limit: Each registered user (or IP address) can download up to “x” documents per day.
• The system must automatically track downloads and restrict users once their daily limit is reached.
• All user management, authentication, and download tracking must be implemented using free/open-source plugins or code, with no paid extensions.
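The per-day download limit can be sketched as a counter keyed by user (or IP address) and calendar date. This in-memory Python version is a hypothetical illustration: `daily_limit` plays the role of "x", the day boundary is the server's local date, and counters for past days are never purged. A production site would keep the same (key, day) → count mapping in a MySQL table so it survives restarts and is shared across server processes.

```python
from collections import defaultdict
from datetime import date
from typing import Optional


def limit_key(user_id: Optional[int], ip: str) -> str:
    """Count per registered user when logged in, otherwise per IP address."""
    return f"user:{user_id}" if user_id is not None else f"ip:{ip}"


class DailyDownloadLimiter:
    """In-memory per-day download counter (a real site would persist this)."""

    def __init__(self, daily_limit: int):
        self.daily_limit = daily_limit
        self._counts = defaultdict(int)  # (key, day) -> downloads so far

    def try_download(self, key: str, day: Optional[date] = None) -> bool:
        """Record a download and return True, or return False once the limit is reached."""
        day = day or date.today()
        if self._counts[(key, day)] >= self.daily_limit:
            return False
        self._counts[(key, day)] += 1
        return True

    def remaining(self, key: str, day: Optional[date] = None) -> int:
        """Downloads still allowed for this key on the given day."""
        day = day or date.today()
        return max(self.daily_limit - self._counts.get((key, day), 0), 0)
```

Because the date is part of the key, the limit resets automatically on a new day without a scheduled cleanup job.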

Security & Legal
• Anti-scraping and anti-DDoS protection.
• Secure uploads and backups.
• Include a Copyright Takedown Contact section/ page for content removal requests.

Analytics & Monitoring
• Google Analytics (GA4) integration.

Apart from hosting and cloud costs, there will be absolutely no other costs: no paid plugins, APIs, or subscriptions. Everything must be built using free or open-source tools and must run independently without extra charges.

Deliverable:
- A complete, production-ready website/system meeting all the above requirements. The system must be:
o Fully installed, configured, and functional on the client’s own domain and hosting/server;
o Delivered with the first 200k+ documents uploaded, indexed, and publicly accessible;
o Optimized for performance and stability according to the agreed technical specifications (Large-Scale Documents);
o Structured and coded in a way that allows easy customization and duplication for future websites with similar functionality.
- Full technical documentation (setup, maintenance, scalability).
- A Google PageSpeed Insights score of 90+ on the Document Page CPT.

Payment Terms
• 100% payment in Escrow on Freelancer.com.
• Payment is released only after the website is fully live on my domain/server, the first 200k+ documents are uploaded and indexed, and all performance and SEO requirements are met.
Proof required: a fully live and functional website hosted on my domain/server, achieving a verified Core Web Vitals score of 90+, and strictly meeting every single requirement listed in the job post — without exceptions.



The total archive contains approximately 500 GB of documents (PDF, DOC, and DOCX formats), representing hundreds of thousands of files that must be carefully processed, sorted, and uploaded online.

The system must automatically detect and remove duplicate or near-duplicate files during import to ensure clean and unique content.

It must also include an automated process to detect and remove any personal or sensitive data (especially from the first pages of the documents) before extraction or publication.
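As a first pass at the personal-data requirement, structured identifiers such as email addresses and phone numbers can be caught with regular expressions on the extracted text before it is published. The patterns, the function name, and the `[... REMOVED]` placeholders in this Python sketch are assumptions; the phone pattern is deliberately crude and will also match some number ranges, so its results should be reviewed. Names, addresses, and signatures cannot be found reliably this way and would need an open-source named-entity model (spaCy, for example), and the downloadable PDF/DOC files would need separate redaction.

```python
import re

# Hypothetical patterns; tune and extend them on real sample documents.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # Optional "+", then a digit, 7+ digits/spaces/separators, and a final digit.
    "PHONE": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}


def redact(text: str) -> tuple:
    """Replace matches with placeholders; return (redacted_text, {label: count})."""
    counts = {}
    for label, pattern in PATTERNS.items():
        # subn returns the new string and the number of replacements made.
        text, n = pattern.subn(f"[{label} REMOVED]", text)
        counts[label] = n
    return text, counts
```

Emails are redacted before phone numbers so that digits inside an address are never half-replaced by the phone pattern.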

The developer must guarantee that the entire system is built exclusively with free and open-source tools, libraries, and plugins, and that there will be no additional or hidden costs whatsoever — apart from the web hosting plan and AWS cloud storage.

The project involves a very large number of documents, and therefore requires a developer with proven experience in large-scale data processing, automation, full-text extraction, and high-performance web systems. Only serious professionals with verifiable experience should apply. The developer must carefully plan in advance how the entire system will function, including how documents will be uploaded, processed, and published.

Before starting development, the developer must clearly explain the proposed workflow and automation process: how files are uploaded (individually or in bulk), how text extraction and translation are handled, and how the final pages are generated and indexed. This step is essential to ensure that the project is implemented correctly from the beginning and that the platform can handle the large number of documents without issues later.

PLEASE READ BEFORE BIDDING!
Do not bid if you do not have the skills to complete this project.
YOU NEED SKILLS & EXPERIENCE in large-scale data processing, automation, and full-text extraction.
Do not bid if you have never done this before. This should be a simple project for someone who knows what they are doing!
TO APPLY:
- Place your real bid amount, not a placeholder or a random sum, and do not ask for more money later. I do not want to waste time renegotiating; time-wasters, please do not bid. No generic bids. Bid what you actually want me to pay you; I will choose based on the content of your bid.
- Please DO NOT bid if you haven’t read the full job description.
- Please DO NOT send copy-paste automated messages or automated bids.


Questions & Clarifications
Please ask any questions or request clarifications before placing your bid. Do NOT ask questions or request clarifications after bidding.
Everything above is required!!!!!" (client-provided description)


Matched companies (7)

Conchakra Technologies Pvt Ltd

At Conchakra, our mission is to empower organizations through innovative software solutions that leverage the transformative potential of artificial …

HJP Media

I am founder and CEO of HJP Media. The fastest growing AI digital solutions company in the world, offering innovative, AI powered digital marketing a…

B2Bcert ISO consultants in Bangalore

B2Bcert is a globally recognized certification and consulting firm dedicated to helping businesses achieve international quality and compliance stand…

Junkies Coder

Junkies Coder is a leading technology solution provider across 15 countries and 50+ Rockstar Developers is our strength, We're specializing in web de…

Omninos Technologies International pvt ltd

Omninos Technologies offers full-stack mobile and web development services with a specialty in ready-made app clones to accelerate launch timelines a…

Crystal Infoway

Crystal Infoway is a well-known IT Service Provider who works to Bring Ideas to Reality. We work to shape the dreams victoriously using Design, Techn…

TG Coders

We create custom apps for businesses and startups TG Coders is a technology partner specializing in creating custom mobile and web applications for …