Business Client needs Software Development
Contact person: Business Client
Location: Rohtak, India
Budget: Recommended by industry experts
Time to start: As soon as possible
Project description:
 "We have a Python-based processing pipeline on Google Cloud.
When a user uploads a large file, our backend triggers a Cloud Run Job.
This job splits the file into multiple smaller parts, uploads them to GCS, and then sends each part to the Vertex AI Gemini API for processing.
Current Setup
A semaphore limits each job to at most 12 concurrent requests.
Example:
A 100-page file is split into 100 parts.
At most 12 requests are sent concurrently to Gemini.
As each response returns, the next queued request is sent.
The Problem
Even with a single user and a single Cloud Run job, we often hit 429 Resource Exhausted errors.
The Gemini APIs reportedly use dynamic shared quota, so the available capacity is not predictable.
This raises the concern that the system could break at scale (many users/jobs).
What We’re Looking For
Someone who has faced and solved similar quota/concurrency issues with Gemini API or Vertex AI APIs.
We don’t want suggestions like:
Switching to different Gemini models
Paying for provisioned throughput
Our stack is fixed on Gemini 2.5 Pro / flash-latest. (Flash-latest is only available with region="global", so switching regions won’t help here.) We are a new startup: we don’t want to fail right at launch, and we also don’t want to pay a very high price for provisioned throughput.
Payment
If your solution works in practice, we’ll pay you.
Please only apply if you have hands-on experience fixing this type of issue." (client-provided description)
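The "Current Setup" described above (split a file into parts, never more than 12 Gemini requests in flight per job) can be sketched with an `asyncio.Semaphore`. This is a minimal sketch, not the client's code: `process_part` is a hypothetical stand-in for the real Vertex AI Gemini call and only sleeps, and the peak-concurrency counter exists purely to show the limit holds.

```python
import asyncio

MAX_CONCURRENCY = 12  # per-job limit from the description


async def process_part(part_id: int) -> str:
    """Hypothetical stand-in for one Gemini request on one file part."""
    await asyncio.sleep(0.01)  # simulate network latency
    return f"result-{part_id}"


async def process_all(part_ids, max_concurrency: int = MAX_CONCURRENCY):
    """Process every part, keeping at most `max_concurrency` requests in flight."""
    sem = asyncio.Semaphore(max_concurrency)
    in_flight = 0
    peak = 0

    async def worker(pid: int) -> str:
        nonlocal in_flight, peak
        async with sem:  # waits here once max_concurrency requests are running
            in_flight += 1
            peak = max(peak, in_flight)
            try:
                return await process_part(pid)
            finally:
                in_flight -= 1

    # gather() returns results in the same order as part_ids
    results = await asyncio.gather(*(worker(p) for p in part_ids))
    return results, peak


if __name__ == "__main__":
    results, peak = asyncio.run(process_all(range(100)))  # a 100-part file
    print(len(results), peak)
```

Note that a fixed per-job semaphore caps only one job; with many concurrent Cloud Run jobs the total request rate against the shared quota still grows with the number of jobs.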
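A common mitigation for 429 errors under a shared, fluctuating quota is to retry each failed request with exponential backoff and random jitter, so that many workers do not retry in lockstep. The sketch below assumes a hypothetical `ResourceExhaustedError`; the actual exception raised on HTTP 429 depends on the client library in use (for example, `google.api_core.exceptions.ResourceExhausted` in google-api-core). The retry limits are illustrative, not tuned values.

```python
import asyncio
import random


class ResourceExhaustedError(Exception):
    """Hypothetical stand-in for the SDK's HTTP 429 / RESOURCE_EXHAUSTED error."""


async def call_with_backoff(call, *, max_attempts: int = 6, base_delay: float = 1.0,
                            max_delay: float = 60.0, sleep=asyncio.sleep):
    """Await `call()`, retrying on 429s with exponential backoff and full jitter."""
    for attempt in range(max_attempts):
        try:
            return await call()
        except ResourceExhaustedError:
            if attempt == max_attempts - 1:
                raise  # out of retries: surface the 429 to the caller
            # Full jitter: sleep a random time in [0, min(max_delay, base * 2**attempt)]
            cap = min(max_delay, base_delay * (2 ** attempt))
            await sleep(random.uniform(0, cap))
```

Inside a semaphore-guarded worker this would be used as `await call_with_backoff(lambda: process_part(pid))`, where `process_part` is again a hypothetical Gemini call. Keeping the semaphore held during the backoff sleep is a deliberate choice here: it lowers the job's request rate while the quota is exhausted instead of letting other parts fill the freed slot.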