Several urltodata operations are asynchronous — they return a jobId immediately and process work in the background. You poll for results using the job ID.

Which endpoints are async?

Endpoint | Why async
POST /v1/youtube/video/batch | Processes multiple videos
POST /v1/youtube/transcript/batch | Processes multiple transcripts
GET /v1/transcript (when mode=generate) | AI speech-to-text takes time
POST /v1/extract | LLM processing
POST /v1/web/crawl | Crawls multiple pages
POST /v1/video/describe | Vision AI frame analysis
POST /v1/video/ocr | OCR across video frames

The polling pattern

Step 1: Start the job. You get back a jobId:
curl -X POST "https://api.urltodata.ai/v1/youtube/video/batch" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"videoIds": ["dQw4w9WgXcQ", "9bZkp7q19f0", "kJQP7kiw5Fk"]}'
{
  "jobId": "abc123"
}
Step 2: Poll for status using the corresponding status endpoint:
curl -H "Authorization: Bearer your-api-key" \
  "https://api.urltodata.ai/v1/youtube/batch/abc123"
Step 3: Check the status field in the response:
{
  "status": "active",
  "total": 3,
  "completed": 1,
  "failed": 0,
  "results": [...]
}
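
The total, completed, and failed counters let you report progress while the job is still active. A minimal sketch (the report_progress helper is just for illustration):

def report_progress(status: dict) -> None:
    # total, completed, and failed come straight from the status response above
    done = status["completed"] + status["failed"]
    print(f"{done}/{status['total']} items processed "
          f"({status['failed']} failed), state={status['status']}")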

Job statuses

Status | Meaning
queued | Job is waiting to be processed
active | Job is currently being processed
completed | Job finished successfully; results are in the response
failed | Job encountered an error; check the error field
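
Only completed and failed are terminal; queued and active mean you should keep polling. A sketch of how client code might branch on the field (error handling here is illustrative, not prescribed by the API):

def handle_status(status: dict):
    state = status["status"]
    if state in ("queued", "active"):
        return None                      # not done yet, poll again later
    if state == "completed":
        return status["results"]         # results are included in the response
    if state == "failed":
        raise RuntimeError(status.get("error", "job failed"))  # check the error field
    raise ValueError(f"unexpected status: {state}")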

Polling endpoints

Each async endpoint has a corresponding status endpoint:
Start job | Poll status
POST /v1/youtube/video/batch | GET /v1/youtube/batch/{jobId}
POST /v1/youtube/transcript/batch | GET /v1/youtube/batch/{jobId}
GET /v1/transcript (202 response) | GET /v1/transcript/{jobId}
POST /v1/extract | GET /v1/extract/{jobId}
POST /v1/web/crawl | GET /v1/web/crawl/{jobId}
POST /v1/video/describe | GET /v1/video/describe/{jobId}
POST /v1/video/ocr | GET /v1/video/ocr/{jobId}
Poll every 1-2 seconds for short jobs (single transcripts), every 5-10 seconds for batch jobs and crawls:
import time
import requests

headers = {"Authorization": "Bearer your-api-key"}

# Start a batch job
resp = requests.post(
    "https://api.urltodata.ai/v1/youtube/video/batch",
    headers=headers,
    json={"videoIds": ["dQw4w9WgXcQ", "9bZkp7q19f0"]}
)
job_id = resp.json()["jobId"]

# Poll until complete
while True:
    status = requests.get(
        f"https://api.urltodata.ai/v1/youtube/batch/{job_id}",
        headers=headers
    ).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)

print(status["results"])
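
The loop above polls forever. In practice you may want an overall timeout and an interval tuned to the job type (shorter for single transcripts, longer for batches and crawls). A sketch of the same loop with those guards (the wait_for_job helper is an assumption, not part of the SDK):

import time
import requests

def wait_for_job(status_url: str, headers: dict,
                 interval: float = 5.0, timeout: float = 600.0) -> dict:
    """Poll a status endpoint until the job reaches a terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        status = requests.get(status_url, headers=headers).json()
        if status["status"] in ("completed", "failed"):
            return status
        if time.monotonic() > deadline:
            raise TimeoutError(f"job did not finish within {timeout} seconds")
        time.sleep(interval)

# Example: poll a batch job every 5 seconds, a single transcript every 2 seconds
# status = wait_for_job(f"https://api.urltodata.ai/v1/youtube/batch/{job_id}", headers, interval=5)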

Crawl pagination

Web crawl results can be large. The crawl status endpoint supports pagination via the skip parameter:
# First page of results
curl -H "Authorization: Bearer your-api-key" \
  "https://api.urltodata.ai/v1/web/crawl/abc123"

# Next page (if response includes "next" field)
curl -H "Authorization: Bearer your-api-key" \
  "https://api.urltodata.ai/v1/web/crawl/abc123?skip=10"
The next field in the response indicates the offset for the next page. When next is null, you’ve retrieved all results.
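
Putting that together in Python, a sketch that pages through all crawl output by following next until it is null (this assumes the per-page items live in a results field, as in the batch responses above):

import requests

headers = {"Authorization": "Bearer your-api-key"}

def fetch_all_crawl_results(job_id: str) -> list:
    """Collect every page of crawl results by following the 'next' offset."""
    url = f"https://api.urltodata.ai/v1/web/crawl/{job_id}"
    results, skip = [], 0
    while True:
        page = requests.get(url, headers=headers, params={"skip": skip}).json()
        results.extend(page.get("results", []))
        if page.get("next") is None:     # null next means all results retrieved
            return results
        skip = page["next"]              # next is the offset of the following page

all_results = fetch_all_crawl_results("abc123")
print(len(all_results), "crawled pages")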