This guide explains how to upload documents to Raydocs for extraction using the API. The process has three steps: get a signed URL, upload the file to temporary storage, then create or reuse a workspace-scoped document and attach it to an extraction session.
Upload Flow Overview
Step 1: Get a Signed Upload URL
First, request a signed URL from the Vapor storage endpoint. This URL allows you to upload directly to S3 without routing the file through the API server.
```http
POST /vapor/signed-storage-url HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "visibility": "private"
}
```
Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| visibility | string | No | Set to `private` (the default) |

The `content_type` parameter is optional and defaults to `application/octet-stream`; you don't need to detect or specify file types.
Response
```json
{
  "uuid": "abc123-def456-ghi789",
  "key": "tmp/abc123-def456-ghi789",
  "url": "https://s3.amazonaws.com/bucket/tmp/abc123...?X-Amz-Signature=...",
  "headers": {
    "Content-Type": "application/octet-stream"
  }
}
```
The signed URL is valid for a limited time (typically 5 minutes). Upload
your file promptly after receiving it.
Step 2: Upload to S3
Use the signed URL to upload your file directly to S3. Include the headers returned in the previous step.
```bash
curl -X PUT "${SIGNED_URL}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @document.pdf
```
```javascript
const response = await fetch(signedUrl, {
  method: 'PUT',
  headers: headers, // Use headers from Step 1 response
  body: fileBuffer
});

if (response.ok) {
  console.log('Upload successful');
}
```
```python
import requests

with open('document.pdf', 'rb') as f:
    response = requests.put(
        signed_url,
        headers=upload_data.get('headers', {}),  # Use headers from Step 1
        data=f
    )

if response.status_code == 200:
    print('Upload successful')
```
Step 3: Create or Reuse Document and Attach to Session
After the file is uploaded to S3, create (or reuse) workspace-scoped document(s) and attach them to your extraction session. Use the uploaded key(s) from Step 1.
```http
POST /extractions/sessions/{sessionId}/documents/upload HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "keys": ["tmp/abc123-def456-ghi789"]
}
```
Request Parameters

| Parameter | Type | Required | Description |
|---|---|---|---|
| keys | array[string] | Yes | List of `key` values returned by signed URL requests in Step 1 |
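Because `keys` is an array, several uploaded files can be attached in one call. The sketch below just assembles and sanity-checks that request body offline; the `tmp/` prefix check mirrors the example keys shown in this guide and is an assumption about the key format, not a documented contract.

```python
def attach_request_body(keys):
    """Build the JSON body for the attach call, validating the key shape."""
    if not keys:
        raise ValueError("keys must be a non-empty list")
    for k in keys:
        # Assumed shape: signed-URL keys in this guide all look like "tmp/<uuid>".
        if not k.startswith("tmp/"):
            raise ValueError(f"unexpected key (should come from Step 1): {k}")
    return {"keys": list(keys)}

body = attach_request_body(["tmp/abc123-def456-ghi789", "tmp/zzz999-yyy888-xxx777"])
```

Send `body` as the JSON payload of the `POST .../documents/upload` request above.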
Response
```json
{
  "data": [
    {
      "id": "880e8400-e29b-41d4-a716-446655440000",
      "workspace_id": "660e8400-e29b-41d4-a716-446655440000",
      "filename": "invoice_001.pdf",
      "sha256": "6de7f6f5894c9f3fd1f6f8a4d1b3115d0d9b4b19d7a8a661f9fe90f9c2d80c3b",
      "status": "uploaded",
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}
```
Upload/import is storage-only. Parsing is requested explicitly (reparse endpoint) or at extraction run time when required.
Deduplication is content-based at workspace scope. If two uploads have the same bytes, Raydocs reuses the same document record even when filenames differ.
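The dedup behavior can be reasoned about locally: the `sha256` field in the response above is a hash of the uploaded bytes, so identical content yields identical fingerprints regardless of filename. A standalone sketch (no API calls; the byte strings are illustrative):

```python
import hashlib

def content_fingerprint(data: bytes) -> str:
    """Return the hex SHA-256 digest of the file bytes."""
    return hashlib.sha256(data).hexdigest()

# Two "files" with the same bytes but different names share a fingerprint,
# so content-based dedup would reuse one document record for both uploads.
a = content_fingerprint(b"%PDF-1.7 example bytes")
b = content_fingerprint(b"%PDF-1.7 example bytes")
c = content_fingerprint(b"%PDF-1.7 other bytes")
print(a == b, a == c)  # True False
```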
Complete Example
Here’s a complete example in JavaScript:
```javascript
async function uploadDocument(sessionId, file, apiToken) {
  // Step 1: Get signed URL
  const signedUrlResponse = await fetch(
    "https://api.raydocs.com/vapor/signed-storage-url",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ visibility: "private" }),
    }
  );
  if (!signedUrlResponse.ok) {
    throw new Error(`Signed URL request failed: ${signedUrlResponse.status}`);
  }
  const { url, key, headers } = await signedUrlResponse.json();

  // Step 2: Upload to S3
  const uploadResponse = await fetch(url, {
    method: "PUT",
    headers: headers,
    body: file,
  });
  if (!uploadResponse.ok) {
    throw new Error(`S3 upload failed: ${uploadResponse.status}`);
  }

  // Step 3: Associate with session
  const documentResponse = await fetch(
    `https://api.raydocs.com/extractions/sessions/${sessionId}/documents/upload`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ keys: [key] }),
    }
  );
  return documentResponse.json();
}
```
PDF, images (PNG, JPEG, TIFF), and Office documents (DOCX, PPTX) are supported.
Alternative: Workspace-first Flow
To create documents in your workspace first (and optionally attach to sessions later), use the workspace document endpoint:
```http
POST /workspaces/{workspaceId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "key": "tmp/abc123-def456-ghi789",
  "filename": "invoice.pdf"
}
```
See Create Workspace Document for full details. You can also import from URL without using signed URLs.
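As an illustration of the workspace-first call shape, here is a minimal Python sketch. The `post` transport is a hypothetical injected helper (any thin wrapper over an HTTP client that adds the Authorization header); the usage below stubs it out so the example runs offline.

```python
def create_workspace_document(post, workspace_id, key, filename):
    """Build and send the workspace-first request.
    `post(path, body)` is a hypothetical transport callable."""
    path = f"/workspaces/{workspace_id}/documents"
    return post(path, {"key": key, "filename": filename})

# Offline usage with a stub transport that just echoes the request:
def echo_post(path, body):
    return {"path": path, "body": body}

resp = create_workspace_document(
    echo_post, "ws-1", "tmp/abc123-def456-ghi789", "invoice.pdf"
)
print(resp["path"])  # /workspaces/ws-1/documents
```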
Monitoring Processing Status
After uploading, poll the document endpoint to check processing status:
```http
GET /workspaces/{workspaceId}/documents/{documentId} HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
```
Or list all documents in the workspace:
```http
GET /workspaces/{workspaceId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
```
For session-scoped listing:
```http
GET /extractions/sessions/{sessionId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
```
Wait for all documents to reach `processed` status before running an extraction.
If you trigger a run early, the API may return `409` with `status: parsing_pending`; the manual run will auto-resume once the required parsing artifacts are ready.
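The wait-then-run step above can be sketched as a small polling helper. `fetch_status` is a hypothetical callable (wire it to the GET document endpoint and read the `status` field); the usage feeds it a canned status sequence so the sketch runs without a network.

```python
import time

def wait_for_documents(fetch_status, doc_ids, poll_interval=2.0, timeout=60.0):
    """Poll until every document reaches a terminal status.
    `fetch_status(doc_id)` is a hypothetical callable returning the
    document's current status string. Returns a {doc_id: status} map;
    raises TimeoutError if the deadline passes first."""
    deadline = time.monotonic() + timeout
    while True:
        statuses = {d: fetch_status(d) for d in doc_ids}
        if all(s in ("processed", "failed") for s in statuses.values()):
            return statuses
        if time.monotonic() > deadline:
            raise TimeoutError(f"documents still pending: {statuses}")
        time.sleep(poll_interval)

# Offline usage: a canned sequence that goes uploaded -> processed.
seq = iter(["uploaded", "processed"])
result = wait_for_documents(lambda d: next(seq), ["doc-1"], poll_interval=0)
```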
Error Handling
Common Upload Errors
| Error | Cause | Solution |
|---|---|---|
| 403 Forbidden on S3 | Signed URL expired | Request a new signed URL |
| 413 Payload Too Large | File exceeds size limit | Compress or split the document |
| 422 Unprocessable Entity | Invalid key | Ensure you're using the `key` from Step 1 |
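The first row of the table (expired signed URL) lends itself to a simple retry: request a fresh URL and try the upload once more. A sketch with hypothetical `get_signed_url` / `put_file` callables standing in for Steps 1 and 2, exercised with stubs so it runs offline:

```python
def upload_with_refresh(get_signed_url, put_file, max_attempts=2):
    """Retry an S3 upload with a fresh signed URL whenever it returns 403.
    `get_signed_url()` returns the Step 1 response dict; `put_file(url)`
    performs the PUT and returns the HTTP status code (both hypothetical)."""
    last_status = None
    for _ in range(max_attempts):
        grant = get_signed_url()
        last_status = put_file(grant["url"])
        if last_status != 403:
            return grant, last_status
    raise RuntimeError(f"upload kept failing (last status {last_status})")

# Offline usage: the first URL is "expired" (403), the refreshed one succeeds.
urls = iter([{"url": "expired"}, {"url": "fresh"}])
grant, status = upload_with_refresh(
    lambda: next(urls),
    lambda u: 403 if u == "expired" else 200,
)
```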
Processing Failures
If a document's status becomes `failed`:
- Check the document’s error message via the GET endpoint
- Verify the file is a valid, non-corrupted document
- Re-upload if the file was damaged during transfer