Documentation Index
Fetch the complete documentation index at: https://docs.urltodata.ai/llms.txt
Use this file to discover all available pages before exploring further.
Several urltodata operations are asynchronous: they return a `jobId` immediately and process the work in the background. You then poll for results using that job ID.
Which endpoints are async?
| Endpoint | Why async |
|---|---|
| POST /v1/youtube/video/batch | Processes multiple videos |
| POST /v1/youtube/transcript/batch | Processes multiple transcripts |
| GET /v1/transcript (when mode=generate) | AI speech-to-text takes time |
| POST /v1/extract | LLM processing |
| POST /v1/web/crawl | Crawls multiple pages |
| POST /v1/video/describe | Vision AI frame analysis |
| POST /v1/video/ocr | OCR across video frames |
The polling pattern
Step 1: Start the job. You get back a `jobId`:

```bash
curl -X POST "https://api.urltodata.ai/v1/youtube/video/batch" \
  -H "Authorization: Bearer your-api-key" \
  -H "Content-Type: application/json" \
  -d '{"videoIds": ["dQw4w9WgXcQ", "9bZkp7q19f0", "kJQP7kiw5Fk"]}'
```
Step 2: Poll for status using the corresponding status endpoint:

```bash
curl -H "Authorization: Bearer your-api-key" \
  "https://api.urltodata.ai/v1/youtube/batch/abc123"
```
Step 3: Check the `status` field in the response:

```json
{
  "status": "active",
  "total": 3,
  "completed": 1,
  "failed": 0,
  "results": [...]
}
```
Job statuses
| Status | Meaning |
|---|---|
| queued | Job is waiting to be processed |
| active | Job is currently being processed |
| completed | Job finished successfully — results are in the response |
| failed | Job encountered an error — check the error field |
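Only `completed` and `failed` are terminal; `queued` and `active` mean keep polling. A tiny helper (the names here are illustrative, not part of the API) makes that explicit in client code:

```python
# Terminal statuses: polling can stop once a job reaches one of these.
TERMINAL_STATUSES = {"completed", "failed"}

def is_terminal(status: str) -> bool:
    """Return True when a job has finished, successfully or not."""
    return status in TERMINAL_STATUSES
```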
Polling endpoints
Each async endpoint has a corresponding status endpoint:
| Start job | Poll status |
|---|---|
| POST /v1/youtube/video/batch | GET /v1/youtube/batch/{jobId} |
| POST /v1/youtube/transcript/batch | GET /v1/youtube/batch/{jobId} |
| GET /v1/transcript (202 response) | GET /v1/transcript/{jobId} |
| POST /v1/extract | GET /v1/extract/{jobId} |
| POST /v1/web/crawl | GET /v1/web/crawl/{jobId} |
| POST /v1/video/describe | GET /v1/video/describe/{jobId} |
| POST /v1/video/ocr | GET /v1/video/ocr/{jobId} |
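Because the status path differs per endpoint, a small lookup table can keep client code tidy. A sketch assuming only the mappings in the table above (the constant and helper names are illustrative):

```python
# Status-endpoint templates from the table above; format each one with the
# jobId returned by the corresponding start request.
STATUS_ENDPOINTS = {
    "/v1/youtube/video/batch": "/v1/youtube/batch/{jobId}",
    "/v1/youtube/transcript/batch": "/v1/youtube/batch/{jobId}",
    "/v1/transcript": "/v1/transcript/{jobId}",
    "/v1/extract": "/v1/extract/{jobId}",
    "/v1/web/crawl": "/v1/web/crawl/{jobId}",
    "/v1/video/describe": "/v1/video/describe/{jobId}",
    "/v1/video/ocr": "/v1/video/ocr/{jobId}",
}

BASE_URL = "https://api.urltodata.ai"

def status_url(start_path: str, job_id: str) -> str:
    """Build the full polling URL for a job started at start_path."""
    return BASE_URL + STATUS_ENDPOINTS[start_path].format(jobId=job_id)
```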
Recommended polling strategy
Poll every 1-2 seconds for short jobs (single transcripts), every 5-10 seconds for batch jobs and crawls:
```python
import time

import requests

headers = {"Authorization": "Bearer your-api-key"}

# Start a batch job
resp = requests.post(
    "https://api.urltodata.ai/v1/youtube/video/batch",
    headers=headers,
    json={"videoIds": ["dQw4w9WgXcQ", "9bZkp7q19f0"]},
)
job_id = resp.json()["jobId"]

# Poll until the job reaches a terminal state
while True:
    status = requests.get(
        f"https://api.urltodata.ai/v1/youtube/batch/{job_id}",
        headers=headers,
    ).json()
    if status["status"] in ("completed", "failed"):
        break
    time.sleep(5)

if status["status"] == "failed":
    raise RuntimeError(status.get("error"))
print(status["results"])
```
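A bare loop like this polls forever if a job stalls; in practice you may want an upper bound on waiting. A minimal sketch of a bounded polling helper (the injected `fetch_status` callable and the `TimeoutError` behaviour are illustrative choices, not part of the API):

```python
import time
from typing import Callable

def poll_job(fetch_status: Callable[[], dict],
             interval: float = 5.0,
             timeout: float = 600.0,
             sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll fetch_status() until the job reaches a terminal state.

    fetch_status should return the parsed JSON status response.
    Raises TimeoutError if the job is still running after `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status["status"] in ("completed", "failed"):
            return status
        if time.monotonic() > deadline:
            raise TimeoutError("job did not finish in time")
        sleep(interval)
```

Injecting the fetch and sleep functions keeps the loop independent of any HTTP client and easy to unit-test.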
Web crawl results can be large. The crawl status endpoint supports pagination via the `skip` parameter:

```bash
# First page of results
curl -H "Authorization: Bearer your-api-key" \
  "https://api.urltodata.ai/v1/web/crawl/abc123"

# Next page (if response includes "next" field)
curl -H "Authorization: Bearer your-api-key" \
  "https://api.urltodata.ai/v1/web/crawl/abc123?skip=10"
```
The `next` field in the response indicates the offset for the next page. When `next` is `null`, you've retrieved all results.
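Under those rules (a `results` list and a `next` offset in each page), collecting every page is a small loop. A sketch with the page-fetching callable injected, so the pagination logic stays independent of any HTTP client:

```python
from typing import Callable, Optional

def fetch_all_pages(get_page: Callable[[int], dict]) -> list:
    """Collect crawl results across all pages.

    get_page(skip) should return one parsed status response containing
    "results" (a list) and "next" (the next offset, or None when done).
    """
    results: list = []
    skip: Optional[int] = 0
    while skip is not None:
        page = get_page(skip)
        results.extend(page.get("results", []))
        skip = page.get("next")  # None (JSON null) ends the loop
    return results
```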