After deploying your Flash app with flash deploy, you can call your endpoints directly via HTTP. The request format depends on whether you’re using queue-based or load-balanced configurations.
Queue-based endpoints (defined with the @Endpoint(name=..., gpu=...) decorator) provide two routes for job submission: /run (asynchronous) and /runsync (synchronous).
{
  "id": "job-abc-123",
  "status": "COMPLETED",
  "output": {
    "generated_text": "Hello world from GPU!"
  }
}
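As a rough sketch of what submitting a job over HTTP could look like, the snippet below builds (but does not send) a request for either route using only the Python standard library. The base URL pattern `https://api.runpod.ai/v2/<endpoint_id>`, the `{"input": ...}` request envelope, the Bearer authorization header, and the helper name `build_job_request` are assumptions for illustration and are not confirmed by this page, so check your endpoint's details before relying on them.

```python
import json
import urllib.request

# Assumed base URL pattern for queue-based endpoints (not confirmed by this page).
ASSUMED_BASE = "https://api.runpod.ai/v2"


def build_job_request(endpoint_id: str, api_key: str, payload: dict,
                      sync: bool = False) -> urllib.request.Request:
    """Build a POST request for /run (async) or /runsync (sync) without sending it."""
    route = "runsync" if sync else "run"
    url = f"{ASSUMED_BASE}/{endpoint_id}/{route}"
    body = json.dumps({"input": payload}).encode("utf-8")  # assumed request envelope
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
    )


# Placeholder ID and key; replace with real values before sending.
req = build_job_request("YOUR_ENDPOINT_ID", "YOUR_API_KEY", {"prompt": "Hello"}, sync=True)
print(req.get_method(), req.full_url)
```

To actually send the request, pass `req` to `urllib.request.urlopen` and decode the JSON body of the response.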
The /runsync endpoint has a 60-second client-side timeout by default. If you’ve configured execution_timeout_ms on your endpoint, the client timeout uses that value instead. For jobs that take longer than 60 seconds, set execution_timeout_ms to prevent /runsync requests from timing out.
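The timeout rule above can be restated as a small function. This is only a sketch of the described behavior, not the client's actual implementation; the function name `runsync_client_timeout` is hypothetical, and it assumes `execution_timeout_ms` is expressed in milliseconds, as its name suggests.

```python
from typing import Optional

DEFAULT_RUNSYNC_TIMEOUT_S = 60.0  # default client-side timeout described above


def runsync_client_timeout(execution_timeout_ms: Optional[int] = None) -> float:
    """Return the client-side timeout in seconds for a /runsync call."""
    if execution_timeout_ms is None:
        # No execution timeout configured: fall back to the 60-second default.
        return DEFAULT_RUNSYNC_TIMEOUT_S
    # Configured value replaces the default (converted from milliseconds).
    return execution_timeout_ms / 1000.0


print(runsync_client_timeout())         # no execution_timeout_ms configured
print(runsync_client_timeout(300_000))  # endpoint configured for five minutes
```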
Use /run for long-running jobs that you’ll check later. Use /runsync for quick jobs where you want immediate results (with timeout protection).
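For the /run case, "checking later" usually means polling until the job finishes. This page does not describe the status lookup itself, so the sketch below takes it as a caller-supplied function and only shows the polling loop. The helper name `wait_for_job` and every status value other than `COMPLETED` are assumptions for illustration.

```python
import time
from typing import Callable

# Assumed terminal status values; only "COMPLETED" appears on this page.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED", "TIMED_OUT"}


def wait_for_job(fetch_status: Callable[[], dict], interval_s: float = 2.0,
                 max_polls: int = 30) -> dict:
    """Call fetch_status until the job reaches a terminal status or polls run out."""
    for attempt in range(max_polls):
        job = fetch_status()
        if job.get("status") in TERMINAL_STATUSES:
            return job
        if attempt < max_polls - 1:
            time.sleep(interval_s)  # wait before the next poll
    raise TimeoutError(f"job not finished after {max_polls} polls")


# Stand-in for a real HTTP status lookup, so the sketch runs offline.
_responses = iter([
    {"id": "job-abc-123", "status": "IN_PROGRESS"},  # assumed intermediate status
    {"id": "job-abc-123", "status": "COMPLETED",
     "output": {"generated_text": "Hello world from GPU!"}},
])
result = wait_for_job(lambda: next(_responses), interval_s=0.0)
print(result["output"]["generated_text"])
```

In real use, `fetch_status` would wrap whatever HTTP call returns the job's current state for the ID that /run gave you.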
A single load-balanced endpoint can serve multiple routes:
from runpod_flash import Endpoint

api = Endpoint(name="api-server", cpu="cpu5c-4-8", workers=(1, 5))

# All these routes share one endpoint URL
@api.post("/generate")
async def generate_text(prompt: str): ...

@api.post("/translate")
async def translate_text(text: str): ...

@api.get("/health")
async def health_check(): ...
# All use the same base URL with different paths
curl -X POST https://abc123xyz.api.runpod.ai/generate -H "..." -d '{...}'
curl -X POST https://abc123xyz.api.runpod.ai/translate -H "..." -d '{...}'
curl -X GET https://abc123xyz.api.runpod.ai/health -H "..."
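The same calls can be sketched in Python. The snippet below builds (but does not send) one request per route against the example base URL from the curl commands. The helper name `build_route_request`, the Bearer authorization header, and the JSON bodies (keyed by the handler parameter names `prompt` and `text`) are assumptions for illustration, since the headers and payloads are elided above.

```python
import json
import urllib.request
from typing import Optional

BASE_URL = "https://abc123xyz.api.runpod.ai"  # example base URL from the curl commands


def build_route_request(method: str, path: str, headers: dict,
                        body: Optional[dict] = None) -> urllib.request.Request:
    """Build a request for one route of a load-balanced endpoint without sending it."""
    url = BASE_URL.rstrip("/") + "/" + path.lstrip("/")
    all_headers = dict(headers)
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        all_headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, method=method, headers=all_headers)


# Hypothetical header; the page elides the real headers.
auth_headers = {"Authorization": "Bearer YOUR_API_KEY"}

requests_to_send = [
    build_route_request("POST", "/generate", auth_headers, {"prompt": "Hello"}),
    build_route_request("POST", "/translate", auth_headers, {"text": "Bonjour"}),
    build_route_request("GET", "/health", auth_headers),
]
for r in requests_to_send:
    print(r.get_method(), r.full_url)
```

Only the path and method change between calls, which is the point of sharing one endpoint URL across routes.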