Active Learning Text Labeling Pipeline -- 2 needs Software Development
Contact person: Active Learning Text Labeling Pipeline -- 2
Location: 6 of October, Egypt
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
"I have thousands of raw text records coming in every week and I want a hands-off system that can label them continuously while getting smarter on its own. The pipeline must be fully automated: new text is ingested, pre-processed, routed through a Query-by-Committee active-learning loop, and the agreed-upon label is written back to a database or flat file.
Here is the core flow I have in mind: an ensemble of three to five lightweight text-classification models (they can be variations of fine-tuned transformers, fastText, or scikit-learn models) forms the committee. When the committee disagrees beyond a configurable threshold, the sample is earmarked for retraining—no human in the loop for now, just self-training on the growing pseudo-labeled set. I also need a small service or scheduled job that periodically re-trains the individual models on the newly labeled corpus and updates the ensemble’s voting weights.
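A minimal sketch of the disagreement check in this flow. It assumes vote entropy as the disagreement measure, since the description does not name one. The function names, the default threshold of 0.5, and the weighted-majority rule for picking the agreed-upon label are illustrative assumptions, not part of the brief.

```python
import math
from collections import Counter


def vote_entropy(votes):
    """Vote entropy of one sample's committee votes, scaled to [0, 1].

    Assumption: scaled by log(len(votes)), the entropy reached when
    every committee member predicts a different label.
    """
    n = len(votes)
    counts = Counter(votes)
    if len(counts) <= 1:
        return 0.0  # unanimous (or single-member) committee: no disagreement
    entropy = -sum((c / n) * math.log(c / n) for c in counts.values())
    return entropy / math.log(n)


def route_sample(votes, weights=None, threshold=0.5):
    """Return (label, earmarked_for_retraining) for one sample.

    votes     -- one predicted label per committee member
    weights   -- hypothetical per-model voting weights (default: equal)
    threshold -- hypothetical configurable disagreement threshold
    """
    weights = weights or [1.0] * len(votes)
    tally = {}
    for label, weight in zip(votes, weights):
        tally[label] = tally.get(label, 0.0) + weight
    agreed_label = max(tally, key=tally.get)  # weighted majority vote
    return agreed_label, vote_entropy(votes) > threshold
```

With three members, a 2-to-1 split gives a scaled entropy of about 0.58, so `route_sample(["spam", "spam", "ham"])` returns the label `"spam"` and earmarks the sample at the default threshold, while a unanimous vote is written back without being earmarked.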
Discrete deliverables
• Python source code (cleanly modular, ideally with Poetry or pip-tools)
• Dockerfile or docker-compose for one-command spin-up
• Configuration guide: how to adjust disagreement thresholds, add/remove models, and switch storage back-ends (PostgreSQL or Parquet)
• README explaining end-to-end workflow and a quick demo on a sample dataset
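The configuration points listed in the deliverables above could be gathered in one validated object. This is only a sketch: the class name, field names, default values, and model identifiers are hypothetical, and the brief does not prescribe a configuration format.

```python
from dataclasses import dataclass, field

# Storage back-ends named in the brief.
SUPPORTED_BACKENDS = {"postgresql", "parquet"}


@dataclass
class PipelineConfig:
    """Hypothetical settings object for the labeling pipeline."""

    # Disagreement level above which a sample is earmarked for retraining.
    disagreement_threshold: float = 0.5
    # Committee members; the identifiers are placeholders, not real model names.
    models: list = field(
        default_factory=lambda: ["tfidf_logreg", "fasttext", "distilbert"]
    )
    # Where agreed-upon labels are written.
    storage_backend: str = "parquet"

    def __post_init__(self):
        if not 0.0 <= self.disagreement_threshold <= 1.0:
            raise ValueError("disagreement_threshold must be between 0 and 1")
        if not 3 <= len(self.models) <= 5:
            raise ValueError("the committee needs three to five models")
        if self.storage_backend not in SUPPORTED_BACKENDS:
            raise ValueError(f"unsupported storage back-end: {self.storage_backend}")
```

Validating at construction time means a bad threshold or an unsupported back-end fails at start-up instead of partway through a labeling run.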
Acceptance criteria
• On a provided benchmark set the pipeline achieves at least the same macro-F1 as a single baseline BERT model within three active-learning iterations
• New data dropped into a watch folder is labeled automatically within one minute and the result appears in the output store
• All steps reproducible on a vanilla Ubuntu 22.04 instance with only Docker installed
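The macro-F1 criterion above can be checked with a few lines of plain Python. The sketch below assumes predictions are available as lists of labels, one list per active-learning iteration. The function names and the shape of the inputs are illustrative assumptions.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # F1 = 2TP / (2TP + FP + FN); the denominator is never zero here
        # because each label occurs at least once in y_true or y_pred.
        scores.append(2 * tp / (2 * tp + fp + fn))
    return sum(scores) / len(scores)


def meets_f1_criterion(y_true, preds_by_iteration, baseline_pred, max_iterations=3):
    """True if any of the first `max_iterations` iterations matches or
    beats the baseline model's macro-F1 on the same benchmark labels."""
    baseline = macro_f1(y_true, baseline_pred)
    return any(
        macro_f1(y_true, preds) >= baseline
        for preds in preds_by_iteration[:max_iterations]
    )
```

Because macro-F1 averages classes without weighting by frequency, a rare class counts as much as a common one, so the comparison is stricter than accuracy on imbalanced benchmark sets.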
If you have prior experience wiring active-learning loops for text — especially using Query-by-Committee — let me know what tools you prefer and drop a short outline of how you would structure the ensemble." (client-provided description)
Matched companies (7)

Codetreasure Co

JanakiBhuvi Tech Labs Private Limited

Chirag Solutions

WhizzAct Private Limited

eShop Genius

TG Coders
