Comic Translation GUI Program: Software Development Needed
Location: Minya, Egypt
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
"# Build an AI Comic Translator (GUI app + optional Discord bot integration)
## Goal
Create a desktop application (Windows) that **detects comic text**, performs **high-accuracy OCR**, and does **glossary-aware translation** for **Korean manhwa, Chinese manhua, Japanese manga, and Mangatoon** images.
The app should feel like a blend of **Ballon-translator-portable** and **PanelCleaner** (UI + workflow), focused on translation only (no cleaning). Optionally expose a minimal **Discord bot** interface that triggers the same core pipeline on a shared Drive folder.
---
## What I need (Scope)
### 1) Input & Project Handling
* I paste a **Google Drive link** (or select a local folder) containing page images (JPG/PNG/WebP).
* The app loads pages in a **left side panel** (thumbnail list) with page name/number.
* Supported reading directions:
* **Manga (JP):** Right-to-Left, Top-to-Bottom.
* **Manhwa/Manhua (KR/CN):** Left-to-Right, Top-to-Bottom.
* The app should **infer a correct reading order** for detected text boxes/bubbles based on the work’s direction and visual layout. Use an AI/heuristic ordering model (e.g., graph ordering, attention over positions).
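A minimal sketch of the heuristic ordering described above, assuming detected boxes arrive as `(x, y, w, h)` pixel tuples: boxes are clustered into rows by vertical proximity, then each row is sorted left-to-right or right-to-left depending on the work's direction. A learned ordering model could replace this later.

```python
def reading_order(boxes, direction="ltr", row_tol=0.5):
    """Sort text boxes into reading order.

    boxes: list of (x, y, w, h) tuples.
    direction: "ltr" for manhwa/manhua, "rtl" for manga.
    Boxes whose vertical centers are close are treated as one row;
    rows run top-to-bottom, boxes within a row follow `direction`.
    """
    if not boxes:
        return []
    ordered = sorted(boxes, key=lambda b: b[1])  # top-to-bottom first
    rows, current = [], [ordered[0]]
    for box in ordered[1:]:
        prev = current[-1]
        # Same row if vertical centers are within row_tol * box height.
        if abs((box[1] + box[3] / 2) - (prev[1] + prev[3] / 2)) < row_tol * max(box[3], prev[3]):
            current.append(box)
        else:
            rows.append(current)
            current = [box]
    rows.append(current)
    result = []
    for row in rows:
        result.extend(sorted(row, key=lambda b: b[0], reverse=(direction == "rtl")))
    return result
```

The `row_tol` threshold is a tunable assumption; real pages with staggered bubbles would need the graph/attention ordering mentioned above.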
### 2) Text Detection (No cleaning)
* High-recall **text region detection** on comic pages (speech bubbles, narration boxes, side text).
* Acceptable approaches:
* **YOLO-family** detector (v8/v9/RT-DETR) trained/fine-tuned for comic text & bubble masks, or
* **CTD/CRAFT/EAST** style detector with solid performance on thin fonts/vertical text.
* Respect **reading direction** to assign an **ordered list of text segments** per page.
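To keep the detector swappable between the YOLO-family and CRAFT/EAST options above, the core library could define a small structural interface; the names below are illustrative, not a real package API:

```python
from dataclasses import dataclass
from typing import List, Protocol


@dataclass
class TextRegion:
    """One detected text box in pixel coordinates, with detector confidence."""
    x: int
    y: int
    w: int
    h: int
    confidence: float


class TextDetector(Protocol):
    """Contract a YOLO-based or CRAFT-based backend would both satisfy."""
    def detect(self, image_path: str) -> List[TextRegion]: ...


class DummyDetector:
    """Stand-in backend, here only to show the contract."""
    def detect(self, image_path: str) -> List[TextRegion]:
        return [TextRegion(10, 20, 100, 50, 0.98)]
```

Because `Protocol` is structural, backends need no shared base class; anything with a matching `detect` method plugs in.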
### 3) Bubble Type Classification
* For each detected region, classify into one of:
* **Speech**, **Shout**, **Narration**, **Thoughts**, **SideText**.
* Output must **prepend a marker** to each raw & translated line:
* Shout → `S:`
* Speech → `"":`
* Thoughts → `():`
* Narration → `[]:`
* SideText → `ST:`
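The marker scheme above reduces to a single lookup table; a sketch:

```python
# Bubble type -> output marker, exactly as specified.
BUBBLE_MARKERS = {
    "Shout": "S:",
    "Speech": '"":',
    "Thoughts": "():",
    "Narration": "[]:",
    "SideText": "ST:",
}


def mark(bubble_type: str, text: str) -> str:
    """Prepend the classification marker to a raw or translated line."""
    return f"{BUBBLE_MARKERS[bubble_type]} {text}"
```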
### 4) OCR (Offline-first)
* **Offline OCR is required** for privacy and speed.
* Use **MangaOCR** (JP) and best-in-class offline OCR for **KR/CN** (e.g., **PaddleOCR** multilingual, custom KR/CN models, or comparable).
* Handle vertical Japanese text and mixed punctuation.
* Provide a **pluggable OCR layer** so models can be swapped/upgraded later.
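One way to realize the pluggable OCR layer is a per-language registry; adapters wrapping MangaOCR (JP) or PaddleOCR (KR/CN) would register themselves. The function names here are assumptions for illustration, not the real APIs of those libraries:

```python
from typing import Callable, Dict

# Language code -> OCR callable (image bytes -> recognized text).
_OCR_BACKENDS: Dict[str, Callable[[bytes], str]] = {}


def register_ocr(lang: str):
    """Decorator: plug an OCR backend in for one language."""
    def wrap(fn: Callable[[bytes], str]):
        _OCR_BACKENDS[lang] = fn
        return fn
    return wrap


def run_ocr(lang: str, image: bytes) -> str:
    """Dispatch to the registered backend for this language."""
    try:
        return _OCR_BACKENDS[lang](image)
    except KeyError:
        raise ValueError(f"No OCR backend registered for {lang!r}")
```

Swapping or upgrading a model then means registering a new callable under the same language code, with no changes elsewhere in the pipeline.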
### 5) Translation (Glossary-aware)
* Core problem: APIs often ignore series glossaries and produce inconsistent character/term names.
* Requirements:
* A **Series Glossary** (CSV/JSON) loaded per project with **term → preferred translation** pairs, compound names, and forbidden variants.
* **Glossary enforcement** during translation:
* At minimum: post-processing **terminology mapping** with **token-boundary-aware** replacements and case handling.
* Preferably: **constrained decoding** or **prompt-level forcing** if using an LLM API.
* Allow multiple backends:
* **Offline NMT** (if practical) or
* **API** (OpenAI/Google/etc.) with **strict glossary injection** and deterministic settings (temperature, style guide).
* The UI must let me **preview raw + translated** text pairs for each segment, with quick edit boxes.
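A sketch of the minimum-bar enforcement step: token-boundary-aware, case-insensitive terminology mapping applied to the translated text. Building one alternation with longest terms first means compound names win over their parts and each span is replaced exactly once. (Regex `\b` boundaries suit Latin-script output; CJK targets would need a different boundary rule.)

```python
import re


def apply_glossary(text: str, glossary: dict) -> str:
    """Post-process translated text so glossary terms appear consistently.

    Longest terms first, so a compound name is matched before its parts;
    \b boundaries stop partial-word hits; matching is case-insensitive
    with lookup back into the original glossary.
    """
    if not glossary:
        return text
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b",
        re.IGNORECASE,
    )
    lookup = {t.lower(): v for t, v in glossary.items()}
    return pattern.sub(lambda m: lookup[m.group(1).lower()], text)
```

Constrained decoding or prompt-level forcing, as preferred above, would sit upstream of this; the mapper remains as a deterministic safety net.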
### 6) UI/UX (Blend of Ballon-translator + PanelCleaner)
* **Left sidebar**: page thumbnails + status (Not processed / OCR done / Translation done).
* **Main viewer**: page image with overlay boxes; clicking a box jumps to its text pair.
* **Right panel**:
* Ordered list of segments for the current page.
* For each segment: bubble type label, **RAW** text, **TRANSLATED** text (editable).
* **Run** buttons:
* Per page and **Run All** (batch).
* Progress bar + GPU/CPU indicator.
* **Glossary Manager**: import/export CSV/JSON, live apply.
* **Settings**: reading direction, OCR backend per language, translation backend, glossary rules (strict/lenient), output format options.
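The settings listed above map naturally onto a small persisted dataclass; field names and defaults below are illustrative assumptions:

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class AppSettings:
    """Per-project settings persisted as JSON (names are illustrative)."""
    reading_direction: str = "ltr"   # "ltr" (KR/CN) or "rtl" (JP)
    ocr_backend: dict = field(
        default_factory=lambda: {"ja": "manga-ocr", "ko": "paddleocr", "zh": "paddleocr"}
    )
    translation_backend: str = "api"  # "api" or "offline"
    glossary_mode: str = "strict"     # "strict" or "lenient"
    export_format: str = "docx"       # "docx" or "txt"

    def save(self, path: str) -> None:
        with open(path, "w", encoding="utf-8") as fh:
            json.dump(asdict(self), fh, indent=2)

    @classmethod
    def load(cls, path: str) -> "AppSettings":
        with open(path, encoding="utf-8") as fh:
            return cls(**json.load(fh))
```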
### 7) Output
* Final export as a **clean .docx** or **.txt** with this exact structure (nothing else):
* **Page line**: `PAGE: [login to view URL]` (or page number)
* Then **for each ordered segment**:
* **RAW line** with marker, e.g. `[]: ばっはっは!元気があってよろしいこったあ!`
* **TR line** with the **same marker**, e.g. `[]: Bwahaha! Full of energy, that’s the way I like it!`
* No headers/footers/logos/metadata—**only** page names and marked lines.
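The exact structure above is simple enough to pin down in one renderer (sketch; `.docx` writing via a library such as python-docx would wrap the same line sequence):

```python
def export_pages(pages) -> str:
    """Render the required export: a PAGE line, then per ordered segment
    a marked RAW line followed by a marked TR line. Nothing else.

    pages: list of (page_name, segments); each segment is a
    (marker, raw, translated) triple.
    """
    lines = []
    for page_name, segments in pages:
        lines.append(f"PAGE: {page_name}")
        for marker, raw, translated in segments:
            lines.append(f"{marker} {raw}")
            lines.append(f"{marker} {translated}")
    return "\n".join(lines)
```

Keeping the format in one function also makes the "exact output format" guarantee testable in CI.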
### 8) Optional: Discord Bot Wrapper
* Minimal bot that:
* Takes a **Drive link** command, queues the job, and returns the exported **docx/txt** when done.
* Same core library as the GUI (no duplicated logic).
* Admin-only commands & simple status messages.
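The "same core library, no duplicated logic" requirement suggests a shared job queue that both the GUI's Run All button and the bot command push into; a minimal sketch with hypothetical names (the real `process_job` would run detection → OCR → translation → export):

```python
import queue
import threading

# One queue shared by the GUI and the Discord command handler.
job_queue: "queue.Queue" = queue.Queue()
results: dict = {}


def process_job(source: str) -> str:
    """Placeholder for the real detect -> OCR -> translate -> export chain."""
    return f"{source}.docx"


def worker() -> None:
    """Drain jobs until a None sentinel arrives; store export paths."""
    while True:
        source = job_queue.get()
        if source is None:
            break
        results[source] = process_job(source)
        job_queue.task_done()
```

The bot handler would then only enqueue a Drive link and later post `results[link]` back to the channel.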
---
## Deliverables
1. **Windows desktop app** (prefer **Python + PySide6/Qt** or **Electron + Python backend**) with installer.
2. **Core inference library** (separate module) for detection → OCR → translation → export.
3. **Model configs & weights** loading code; easy to swap models.
4. **Glossary system** with import/export and deterministic enforcement.
5. **Export module** that guarantees the exact output format.
6. **Optional Discord bot** using the same library.
7. **Documentation**: setup, model downloads, how to add new OCR/translation backends, glossary usage.
8. **Test data & test plan** covering KR/JP/CN pages, vertical JP, dense side text, multi-bubble pages.
9. **Source code** in a clean repo (readme, comments, type hints), plus a short handover video.
---
## Tech Preferences & Environment
* **OS:** Windows 10/11.
* **GPU:** NVIDIA RTX 4060 Ti (CUDA acceleration expected for detection/OCR where supported).
* **Language:** Python 3.11+ preferred for fast iteration (PyTorch/ONNXRuntime).
* **Models:** YOLO-family (or CRAFT/EAST) for detection; MangaOCR + PaddleOCR (KR/CN) or equivalent.
* **Translation:** pluggable—offline if feasible, or API with strong glossary forcing.
* **No image cleaning** features required.
---
## Quality & Performance Targets
* **Detection recall** on speech/narration bubbles: ≥ 95% on provided samples.
* **OCR accuracy** (character-level) on clean scans:
* JP ≥ 95%, KR/CN ≥ 92% (on test set).
* **Ordering correctness**: ≥ 95% segments exported in human-expected reading sequence.
* **Latency**: ≤ 3s per 1500×2200 image on my GPU for full pipeline (avg across chapter).
* **Glossary enforcement**: 100% for exact glossary keys; fuzzy tolerance for minor punctuation/spacing.
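The "fuzzy tolerance for minor punctuation/spacing" target could be met by normalizing terms before glossary lookup, so trivially different spellings hit the same key; a sketch:

```python
import re


def normalize_key(term: str) -> str:
    """Collapse spacing and minor punctuation so 'Kim-Dokja',
    'kim dokja' and 'Kim  Dokja.' all map to one glossary key."""
    term = term.lower()
    term = re.sub(r"[.,!?'\"\-]", " ", term)  # punctuation -> space
    term = re.sub(r"\s+", " ", term).strip()  # collapse whitespace
    return term
```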
---
## Nice-to-Haves (not required, quote separately)
* Auto-language detection per page/segment.
* Confidence scores and “review first” filters for low-confidence OCR or translation.
* Batch merge: single export for entire folder with per-page headers.
* Hotkeys for speedy manual fixes.
* Autosave and crash-safe resume.
---
## What I will provide
* Sample chapters (KR/JP/CN) with ground-truth expectations.
* Initial glossary files for a few series.
* Feedback and rapid testing during development.
---
## Acceptance Criteria (must pass to mark complete)
* I can load a Drive link/folder, click **Run All**, and get a **.docx**/**.txt** that contains only:
* `PAGE: <name>` lines
* For each segment: **RAW** + **TRANSLATED** lines prefixed with the correct bubble **marker**.
* Segments are in correct **reading order** for the chosen language family.
* Glossary terms appear **consistently** across the output.
* App runs offline for detection+OCR; translation can be offline or API (with glossary forcing).
* Installer + documentation provided; I can run on my RTX 4060 Ti machine.
---
## Milestones (suggested)
1. **Design + Prototype**: model choices, UI skeleton, sample page through full pipeline.
2. **Detection + Ordering** solid on test pages; UI overlays + sidebar complete.
3. **OCR + Translation** with glossary enforcement; inline editing UX.
4. **Export module** (exact format), batch processing, project save/load.
5. **Polish & Handover**: installer, docs, optional Discord wrapper, test pass.
---
## Please include in your bid
* Your proposed **model stack** for detection/OCR and how you’ll ensure ordering accuracy.
* How you’ll implement **glossary enforcement** (exact method).
* Any prior work on OCR/comics/NLP.
* A quick plan for reaching the performance targets." (client-provided description)