Business Client needs Software Development
Contact person: Business Client
Location: Ramat Gan, Israel
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
"We are seeking a seasoned Entity Resolution (ER) Specialist to collaborate on building a high-performance matching pipeline for our music metadata. As the lead Python and JS developer who built the initial system, I will provide full support and infrastructure context.
Project Scope & Core Challenge
The system must link records between two existing databases (our internal MongoDB Atlas cluster and an external source such as Genius).
Crucially, in both databases, the hierarchical relationships (Artist → Albums → Tracks) are already established, and every entity has a unique ID.
The task is to build a process that leverages this existing structure:
Phase 1 (Core ER): Accurately link Artist IDs between the two databases.
Phase 2 (Cascade): Use the confirmed Artist links to efficiently and accurately cascade the matching process to the corresponding Album IDs and Track IDs.
Scale: The pipeline must efficiently handle ≈1.5 million tracks and identify/create new internal entities for millions of external records not yet in our database.
Database: Read and write operations must be optimized for MongoDB Atlas.
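The two-phase cascade described above could be sketched roughly as follows. This is a minimal illustration, not the client's implementation: the record fields (`id`, `parent_id`, `title`), the function names, and the 0.85 threshold are all assumptions, and the standard-library `difflib` scorer stands in for whatever similarity function the real pipeline uses. The same function can serve both steps of Phase 2: albums are matched only within already-linked artists, then tracks only within already-linked albums.

```python
from difflib import SequenceMatcher


def normalize(title: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(title.lower().split())


def similarity(a: str, b: str) -> float:
    """Stand-in scorer in [0, 1]; a production pipeline might use RapidFuzz."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_children(internal, external, parent_links, threshold=0.85):
    """Link child records (albums or tracks) whose parents are already linked.

    internal / external: lists of dicts with hypothetical keys
    'id', 'parent_id', 'title'.
    parent_links: dict mapping internal parent id -> external parent id.
    Returns a dict mapping internal child id -> (external child id, score).
    """
    # Group external children by parent so each comparison set stays small.
    by_parent = {}
    for rec in external:
        by_parent.setdefault(rec["parent_id"], []).append(rec)

    links = {}
    for rec in internal:
        ext_parent = parent_links.get(rec["parent_id"])
        if ext_parent is None:
            continue  # parent not linked, so there is nothing to cascade
        best_id, best_score = None, 0.0
        for cand in by_parent.get(ext_parent, []):
            score = similarity(rec["title"], cand["title"])
            if score > best_score:
                best_id, best_score = cand["id"], score
        if best_id is not None and best_score >= threshold:
            links[rec["id"]] = (best_id, best_score)
    return links
```

To cascade one level further, the album result can be reduced to a plain id map, for example `{k: v[0] for k, v in album_links.items()}`, and passed back in as `parent_links` for the track records.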
Technical Methodology & Collaboration
The matching core must be robust and high-speed:
ER Algorithms: The core should leverage the Record Linkage framework, supplemented by powerful string similarity techniques like RapidFuzz or equivalent methods for optimal precision.
Performance: I expect highly efficient blocking/indexing to ensure speed.
Modularity: The logic must be clean and modular, allowing us (as developers) to easily tune weights, thresholds, and candidate generators in the future.
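As a rough illustration of the blocking and tunable-weights requirements above, the sketch below indexes external artists by a cheap blocking key and scores only candidates that share that key. Everything here is an assumption made for the example: the `name` and `country` fields, the three-character prefix key, the weights, and the threshold. The standard-library `difflib` scorer is used only to keep the example dependency-free; RapidFuzz or a record linkage toolkit would take its place in practice.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class MatchConfig:
    # Illustrative defaults only, not tuned recommendations.
    name_weight: float = 0.8
    country_weight: float = 0.2
    threshold: float = 0.85


def block_key(name: str) -> str:
    """Hypothetical blocking key: first 3 alphanumeric characters, lowercased."""
    cleaned = "".join(ch for ch in name.lower() if ch.isalnum())
    return cleaned[:3]


def build_index(records):
    """Group records by blocking key so only same-block pairs get compared."""
    index = {}
    for rec in records:
        index.setdefault(block_key(rec["name"]), []).append(rec)
    return index


def score(a, b, cfg):
    """Weighted sum of name similarity and an exact country match."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    same_country = bool(a.get("country")) and a.get("country") == b.get("country")
    return cfg.name_weight * name_sim + cfg.country_weight * float(same_country)


def link_artists(internal, external, cfg=None):
    """Return internal id -> (external id, score) for the best candidate per block."""
    cfg = cfg or MatchConfig()
    index = build_index(external)
    links = {}
    for rec in internal:
        candidates = index.get(block_key(rec["name"]), [])
        scored = [(score(rec, c, cfg), c["id"]) for c in candidates]
        if scored:
            best_score, best_id = max(scored)
            if best_score >= cfg.threshold:
                links[rec["id"]] = (best_id, round(best_score, 3))
    return links
```

A prefix key is fast but brittle: "The Beatles" and "Beatles" would land in different blocks. Keeping `block_key` as a separate function means alternative candidate generators (token-based or phonetic keys, for instance) can be swapped in without touching the scoring code.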
Infrastructure & Acceptance Criteria
The solution must be production-ready for our existing environment.
Deployment Stack: Deliverables must be encapsulated in a Docker container ready for deployment to our AWS environment and integrated neatly into our existing CI/CD flow.
Acceptance Criteria: The job is complete when the container processes a provided sample of 100k records in under 60 minutes on an [login to view URL] and returns ≥95% precision and ≥90% recall at the track level.
Deliverables
Production-ready Python project (PEP 8 compliant).
Dockerfile and compose file that build and run locally and in AWS.
Link Map collection written back to MongoDB with confidence scores.
Brief validation report summarising precision/recall on a held-out set.
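The precision/recall figures for the validation report could be computed along these lines. This is a sketch under stated assumptions: both the predicted links and the held-out ground truth are taken to be plain dicts mapping an internal track ID to an external track ID, and the function names are hypothetical. The default targets mirror the ≥95% precision and ≥90% recall stated in the acceptance criteria.

```python
def precision_recall(predicted, truth):
    """Compare predicted links with a labelled held-out set.

    predicted, truth: dicts mapping internal id -> external id.
    Precision = correct links / links produced.
    Recall    = correct links / links that should exist.
    """
    true_positives = sum(1 for k, v in predicted.items() if truth.get(k) == v)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(truth) if truth else 0.0
    return precision, recall


def meets_targets(precision, recall, min_precision=0.95, min_recall=0.90):
    """Check the result against the stated acceptance thresholds."""
    return precision >= min_precision and recall >= min_recall
```

Keeping the held-out set separate from any data used to tune weights or thresholds is what makes these numbers meaningful as an acceptance check.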
Timeline: We are ready to move immediately. Please respond only if you can begin right away and meet the ASAP delivery timeline." (client-provided description)
Matched companies (4)

Kiantechwise Pvt. Ltd.

TechGigs LLP

Omninos Technologies International pvt ltd
