WordPress Sitemap-Based Cache-Warming System needs Web Development
Contact person: WordPress Sitemap-Based Cache-Warming System
Location: Frankfurt (Oder), Germany
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
"I need a sitemap-based cache-warming system for a high-scale WordPress site. I’m currently using OpenLiteSpeed (with its cache), but the built-in warmup is too limited. With ~50,000 pages, I want a custom solution using WP-CLI + Cron that warms pages quickly while respecting my server’s resources.
Goals
Fast warmup: Crawl and warm ~50k URLs efficiently.
Resource-aware: Throttle/concurrency adapted to server load (no 500s, no cache stampedes).
Sitemap-driven: Read XML sitemap(s) (incl. paginated sitemaps) and queue all URLs.
Smart scheduling: Run at the best time via Cron (off-peak, Europe/Berlin), with manual overrides.
Observable: Progress logs, metrics, and error reporting.
Resumable: If interrupted, the job can resume where it left off.
Configurable: Concurrency, delay, user-agent, include/exclude patterns, rate limits per host.
Cache hit verified: Optionally re-request to confirm cache status headers (or LiteSpeed cache vary key).
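The "resource-aware" goal above could be approached with a load-based back-off. The following is a minimal sketch in Python, not part of the client's specification; the real deliverable would presumably be a WP-CLI command in PHP. The class name `LoadBackoff` and its parameters are hypothetical, and the doubling/halving policy is only one possible choice. `os.getloadavg()` is available on Unix-like systems only.

```python
import os


class LoadBackoff:
    """Doubles the inter-batch delay while load is high, halves it when load recovers."""

    def __init__(self, load_avg_max, base_delay_s=0.2, max_delay_s=30.0, read_load=None):
        self.load_avg_max = load_avg_max
        self.base_delay_s = base_delay_s
        self.max_delay_s = max_delay_s
        # Assumption: the 1-minute load average is a good enough pressure signal.
        self.read_load = read_load or (lambda: os.getloadavg()[0])
        self.current = base_delay_s

    def next_delay(self):
        """Return the number of seconds to sleep before the next batch."""
        if self.read_load() > self.load_avg_max:
            self.current = min(self.max_delay_s, self.current * 2)
        else:
            self.current = max(self.base_delay_s, self.current / 2)
        return self.current
```

A worker loop would call `time.sleep(backoff.next_delay())` between batches, so the crawl slows down while real traffic pushes the load average above the threshold and speeds up again once it drops.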
Environment
Web server: OpenLiteSpeed
WordPress: production, ~50k published URLs
Time zone: Europe/Berlin
Access: SSH, WP-CLI available
Deliverables
WP-CLI command(s) (as a small mu-plugin or custom plugin) to:
  Parse all sitemap indexes and sitemaps (handle gzip).
  Enqueue URLs to a persistent store (custom table or transient-backed queue).
  Warm URLs using concurrent workers (curl or WP HTTP API).
  Back-off on high load (e.g., load average threshold).
  Retry transient failures with capped attempts.
  Optional “verify cache” pass (check cache headers).
Cron integration:
  Nightly schedule (default 02:00–06:00 local time), plus manual trigger and ad-hoc partial runs.
  Staggered batches (e.g., 500–2,000 URLs per slice) to avoid spikes.
Config file (e.g., [login to view URL]) with:
  sitemap_urls, concurrency, rate_limit_rps, batch_size, delay_ms, load_avg_max, user_agent, include, exclude, verify_cache, retries, timeout_sec.
Logging + metrics:
  File logs in wp-content/cache-warmup/logs/ (rotate daily).
  Summary stats: processed, warmed, failed, avg response time, cache-hit ratio.
  Exit codes suitable for monitoring; optional Slack/webhook on completion.
Documentation:
  Install, configure, and operate.
  Safe defaults and how to tune for my server.
  Troubleshooting guide.
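To illustrate the sitemap-parsing deliverable listed above, here is a minimal Python sketch (not part of the client's specification, and not the PHP/WP-CLI code the project asks for). It assumes sitemaps follow the sitemaps.org 0.9 schema and that gzip payloads can be recognised by their two-byte magic number. The function names and the injected `fetch(url) -> bytes` callable are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"
GZIP_MAGIC = b"\x1f\x8b"


def decode_sitemap(body):
    """Return raw XML bytes, decompressing gzip payloads transparently."""
    return gzip.decompress(body) if body[:2] == GZIP_MAGIC else body


def parse_sitemap(body):
    """Return (kind, locs), where kind is 'index' or 'urlset'."""
    root = ET.fromstring(decode_sitemap(body))
    kind = "index" if root.tag == SITEMAP_NS + "sitemapindex" else "urlset"
    # Only <loc> elements in the sitemap namespace; image/video extensions are skipped.
    locs = [el.text.strip() for el in root.iter(SITEMAP_NS + "loc") if el.text]
    return kind, locs


def collect_urls(start_url, fetch):
    """Walk sitemap indexes breadth-first and return de-duplicated page URLs in order."""
    pending, seen, pages = [start_url], set(), []
    while pending:
        url = pending.pop(0)
        if url in seen:
            continue  # guard against indexes that reference each other
        seen.add(url)
        kind, locs = parse_sitemap(fetch(url))
        if kind == "index":
            pending.extend(locs)
        else:
            pages.extend(locs)
    return list(dict.fromkeys(pages))
```

Injecting `fetch` keeps the parser independent of the HTTP client, so the same logic can be exercised in a dry run against saved sitemap files before touching the production server.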
Acceptance criteria
Full crawl completes under the configured window without noticeably impacting TTFB for real users.
Can pause/resume without losing progress.
Handles 50k+ URLs reliably (tested with dry-run + real run).
Honors include/exclude patterns (e.g., skip search, admin, feeds).
Produces a summary report after each run (success/failed counts, duration, top error codes)." (client-provided description)
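The last two acceptance criteria (include/exclude patterns and a per-run summary) can be pictured with a short Python sketch. This is an illustration under stated assumptions rather than the requested WP-CLI implementation: patterns are treated as regular expressions, exclude rules take precedence over include rules, and a result is a hypothetical `(url, status_code, elapsed_seconds)` tuple where any status outside 200–399 counts as failed.

```python
import re
from collections import Counter


def make_url_filter(include, exclude):
    """Build a predicate: excluded URLs are dropped; an empty include list allows everything else."""
    inc = [re.compile(p) for p in include]
    exc = [re.compile(p) for p in exclude]

    def allowed(url):
        if any(r.search(url) for r in exc):
            return False
        return not inc or any(r.search(url) for r in inc)

    return allowed


def summarize(results, duration_s, top_n=3):
    """Aggregate (url, status_code, elapsed_s) tuples into a run summary dict."""
    results = list(results)
    failed = [r for r in results if not 200 <= r[1] < 400]
    avg = sum(r[2] for r in results) / len(results) if results else 0.0
    return {
        "processed": len(results),
        "warmed": len(results) - len(failed),
        "failed": len(failed),
        "avg_response_s": avg,
        "duration_s": duration_s,
        "top_error_codes": Counter(r[1] for r in failed).most_common(top_n),
    }
```

For example, `make_url_filter([], [r"/\?s=", r"/wp-admin/", r"/feed/?$"])` would skip search, admin, and feed URLs while letting every other sitemap entry through.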
Matched companies (3)

Crystal Infoway

TG Coders
