Documentation Index
Fetch the complete documentation index at: https://docs.urltodata.ai/llms.txt
Use this file to discover all available pages before exploring further.
Several urltodata operations are asynchronous: they return a `jobId` immediately and process the work in the background. You then poll for results using that job ID.
Which endpoints are async?
| Endpoint | Why async |
|---|---|
| POST /v1/youtube/video/batch | Processes multiple videos |
| POST /v1/youtube/transcript/batch | Processes multiple transcripts |
| GET /v1/transcript (when mode=generate) | AI speech-to-text takes time |
| POST /v1/extract | LLM processing |
| POST /v1/web/crawl | Crawls multiple pages |
| POST /v1/video/describe | Vision AI frame analysis |
| POST /v1/video/ocr | OCR across video frames |
The polling pattern
Step 1: Start the job. You get back a `jobId`:

```bash
curl -X POST "https://api.urltodata.ai/v1/youtube/video/batch" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"videoIds": ["dQw4w9WgXcQ", "9bZkp7q19f0", "kJQP7kiw5Fk"]}'
```
Step 2: Poll for status using the corresponding status endpoint:

```bash
curl -H "Authorization: Bearer your-api-key" \
  "https://api.urltodata.ai/v1/youtube/batch/abc123"
```
Step 3: Check the `status` field in the response:

```json
{
  "status": "active",
  "total": 3,
  "completed": 1,
  "failed": 0,
  "results": [...]
}
```
Job statuses
| Status | Meaning |
|---|---|
| queued | Job is waiting to be processed |
| active | Job is currently being processed |
| completed | Job finished successfully — results are in the response |
| failed | Job encountered an error — check the error field |
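Only `completed` and `failed` are terminal; `queued` and `active` mean keep polling. A tiny helper (the names here are illustrative, not part of the API) makes that explicit in client code:

```python
# Terminal statuses: polling can stop once a job reaches one of these.
TERMINAL_STATUSES = {"completed", "failed"}

def is_terminal(status: str) -> bool:
    """Return True when a job has finished, successfully or not."""
    return status in TERMINAL_STATUSES
```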
Polling endpoints
Each async endpoint has a corresponding status endpoint:
| Start job | Poll status |
|---|---|
| POST /v1/youtube/video/batch | GET /v1/youtube/batch/{jobId} |
| POST /v1/youtube/transcript/batch | GET /v1/youtube/batch/{jobId} |
| GET /v1/transcript (202 response) | GET /v1/transcript/{jobId} |
| POST /v1/extract | GET /v1/extract/{jobId} |
| POST /v1/web/crawl | GET /v1/web/crawl/{jobId} |
| POST /v1/video/describe | GET /v1/video/describe/{jobId} |
| POST /v1/video/ocr | GET /v1/video/ocr/{jobId} |
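Because the status path differs per endpoint, a small lookup table can keep client code tidy. A sketch assuming only the mappings in the table above (the constant and helper names are illustrative):

```python
# Status-endpoint templates from the table above; format each one with the
# jobId returned by the corresponding start request.
STATUS_ENDPOINTS = {
    "/v1/youtube/video/batch": "/v1/youtube/batch/{jobId}",
    "/v1/youtube/transcript/batch": "/v1/youtube/batch/{jobId}",
    "/v1/transcript": "/v1/transcript/{jobId}",
    "/v1/extract": "/v1/extract/{jobId}",
    "/v1/web/crawl": "/v1/web/crawl/{jobId}",
    "/v1/video/describe": "/v1/video/describe/{jobId}",
    "/v1/video/ocr": "/v1/video/ocr/{jobId}",
}

BASE_URL = "https://api.urltodata.ai"

def status_url(start_path: str, job_id: str) -> str:
    """Build the full polling URL for a job started at start_path."""
    return BASE_URL + STATUS_ENDPOINTS[start_path].format(jobId=job_id)
```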
Recommended polling strategy
Poll every 1-2 seconds for short jobs (single transcripts), every 5-10 seconds for batch jobs and crawls:
```python
import time

import requests

headers = {"Authorization": "Bearer your-api-key"}

# Start a batch job
resp = requests.post(
    "https://api.urltodata.ai/v1/youtube/video/batch",
    headers=headers,
    json={"videoIds": ["dQw4w9WgXcQ", "9bZkp7q19f0"]},
)
job_id = resp.json()["jobId"]

# Poll until the job reaches a terminal state
while True:
    status = requests.get(
        f"https://api.urltodata.ai/v1/youtube/batch/{job_id}",
        headers=headers,
    ).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)

if status["status"] == "failed":
    raise RuntimeError(status.get("error"))
print(status["results"])
```
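A bare loop like this polls forever if a job stalls; in practice you may want an upper bound on waiting. A minimal sketch of a bounded polling helper (the injected `fetch_status` callable and the `TimeoutError` behaviour are illustrative choices, not part of the API):

```python
import time
from typing import Callable

def poll_job(fetch_status: Callable[[], dict],
             interval: float = 5.0,
             timeout: float = 600.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll fetch_status() until the job reaches a terminal state.

    fetch_status should return the parsed JSON status response.
    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] in ("completed", "failed"):
            return status
        if time.monotonic() > deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval)
```

Injecting the fetch and sleep functions keeps the loop independent of any HTTP client and easy to unit-test.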
Web crawl results can be large. The crawl status endpoint supports pagination via the `skip` parameter:

```bash
# First page of results
curl -H "Authorization: Bearer your-api-key" \
  "https://api.urltodata.ai/v1/web/crawl/abc123"

# Next page (if response includes "next" field)
curl -H "Authorization: Bearer your-api-key" \
  "https://api.urltodata.ai/v1/web/crawl/abc123?skip=10"
```
The `next` field in the response indicates the offset for the next page. When `next` is `null`, you've retrieved all results.
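Under those rules (a `results` list and a `next` offset in each page), collecting every page is a small loop. A sketch with the page-fetching callable injected, so the pagination logic stays independent of any HTTP client:

```python
from typing import Callable, Optional

def fetch_all_pages(get_page: Callable[[int], dict]) -> list:
    """Collect crawl results across all pages.

    get_page(skip) should return one parsed status response containing
    "results" (a list) and "next" (the next offset, or None when done).
    """
    results: list = []
    skip: Optional[int] = 0
    while skip is not None:
        page = get_page(skip)
        results.extend(page.get("results", []))
        skip = page.get("next")  # None (JSON null) ends the loop
    return results
```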