This guide explains how to upload documents to Raydocs for extraction using the API. The process has three steps: get a signed URL, upload the file to temporary storage, then create or reuse a workspace-scoped document and attach it to an extraction session.
Upload Flow Overview
Step 1: Get a Signed Upload URL
First, request a signed URL from the Vapor storage endpoint. This URL allows you to upload directly to S3 without routing the file through the API server.
```http
POST /vapor/signed-storage-url HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "visibility": "private"
}
```
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `visibility` | string | No | Set to `private` (the default) |
The `content_type` parameter is optional and defaults to `application/octet-stream`. You don’t need to detect or specify file types.
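For clients that don't use JavaScript, here is a minimal Python sketch of the Step 1 call using only the standard library. The endpoint path, headers, and JSON body are taken from the request above; `build_signed_url_request` and `request_signed_url` are hypothetical helper names. Building the request separately from sending it makes it easy to inspect or test without network access.

```python
import json
import urllib.request

API_BASE = "https://api.raydocs.com"


def build_signed_url_request(api_token: str, visibility: str = "private") -> urllib.request.Request:
    """Build (but do not send) the POST /vapor/signed-storage-url request from Step 1."""
    body = json.dumps({"visibility": visibility}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/vapor/signed-storage-url",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )


def request_signed_url(api_token: str) -> dict:
    """Send the request and return the parsed JSON (uuid, key, url, headers)."""
    with urllib.request.urlopen(build_signed_url_request(api_token)) as resp:
        return json.load(resp)
```

`content_type` is deliberately omitted, so the documented default of `application/octet-stream` applies.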
Response
```json
{
  "uuid": "abc123-def456-ghi789",
  "key": "tmp/abc123-def456-ghi789",
  "url": "https://s3.amazonaws.com/bucket/tmp/abc123...?X-Amz-Signature=...",
  "headers": {
    "Content-Type": "application/octet-stream"
  }
}
```
The signed URL is valid for a limited time (typically 5 minutes). Upload your file promptly after receiving it.
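One way to act on this limit is to store the Step 1 response together with the time it was received, and request a fresh URL instead of uploading with one that has probably expired. This sketch assumes the typical 5-minute lifetime stated above and subtracts a safety margin. The real expiry is encoded in the signature, so `SignedUpload` and `is_probably_expired` (both hypothetical names) provide only a client-side heuristic.

```python
import time
from dataclasses import dataclass, field
from typing import Optional

# Assumption: signed URLs are typically valid for about 5 minutes (per the note above).
ASSUMED_LIFETIME_S = 5 * 60
SAFETY_MARGIN_S = 30


@dataclass
class SignedUpload:
    uuid: str
    key: str
    url: str
    headers: dict
    issued_at: float = field(default_factory=time.monotonic)

    @classmethod
    def from_response(cls, payload: dict, issued_at: Optional[float] = None) -> "SignedUpload":
        """Build from the Step 1 JSON; a missing 'headers' object becomes an empty dict."""
        kwargs = dict(
            uuid=payload["uuid"],
            key=payload["key"],
            url=payload["url"],
            headers=dict(payload.get("headers") or {}),
        )
        if issued_at is not None:
            kwargs["issued_at"] = issued_at
        return cls(**kwargs)

    def is_probably_expired(self, now: Optional[float] = None) -> bool:
        """True once the assumed lifetime, minus the safety margin, has elapsed."""
        now = time.monotonic() if now is None else now
        return now - self.issued_at >= ASSUMED_LIFETIME_S - SAFETY_MARGIN_S
```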
Step 2: Upload to S3
Use the signed URL to upload your file directly to S3. Include the headers returned in the previous step.
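```bash
# Send the headers from the Step 1 response (here, the default Content-Type)
curl -X PUT "${SIGNED_URL}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @document.pdf
```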
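```javascript
// signedUrl and headers come from the Step 1 response; fileBuffer holds the file bytes
const response = await fetch(signedUrl, {
  method: 'PUT',
  headers: headers, // Use headers from Step 1 response
  body: fileBuffer,
});
if (!response.ok) {
  throw new Error(`Upload failed with HTTP ${response.status}`);
}
console.log('Upload successful');
```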
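```python
import requests

# upload_data is the parsed JSON response from Step 1
signed_url = upload_data['url']
with open('document.pdf', 'rb') as f:
    response = requests.put(
        signed_url,
        headers=upload_data.get('headers', {}),  # Use headers from Step 1
        data=f,
    )
response.raise_for_status()  # Raises on 4xx/5xx, e.g. 403 if the URL expired
print('Upload successful')
```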
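If you'd rather avoid the third-party `requests` package, the same PUT can be built with the standard library. This sketch forwards the Step 1 headers unchanged and falls back to the documented default `Content-Type` when none is returned. `build_s3_put_request` and `upload_file` are hypothetical names, and `upload_file` reads the whole file into memory, which may not suit very large documents.

```python
import urllib.request

DEFAULT_CONTENT_TYPE = "application/octet-stream"  # default noted in Step 1


def build_s3_put_request(signed_url: str, signed_headers: dict, body: bytes) -> urllib.request.Request:
    """Build the Step 2 PUT, forwarding the headers returned by Step 1."""
    headers = dict(signed_headers or {})
    # Fall back to the documented default if Step 1 returned no Content-Type.
    if not any(name.lower() == "content-type" for name in headers):
        headers["Content-Type"] = DEFAULT_CONTENT_TYPE
    return urllib.request.Request(signed_url, data=body, method="PUT", headers=headers)


def upload_file(signed_url: str, signed_headers: dict, path: str) -> int:
    """Upload a local file and return the HTTP status; urlopen raises HTTPError on 4xx/5xx."""
    with open(path, "rb") as f:
        body = f.read()
    req = build_s3_put_request(signed_url, signed_headers, body)
    with urllib.request.urlopen(req) as resp:
        return resp.status
```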
Step 3: Create or Reuse Document and Attach to Session
After the file is uploaded to S3, create (or reuse) workspace-scoped document(s) and attach them to your extraction session. Use the uploaded key(s) from Step 1.
```http
POST /extractions/sessions/{sessionId}/documents/upload HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "keys": ["tmp/abc123-def456-ghi789"]
}
```
Request Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `keys` | array[string] | Yes | List of upload keys returned by the signed URL request |
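Because `keys` is an array, several files uploaded in Step 2 can be attached in a single call. A minimal standard-library sketch of the request follows. `build_attach_request` is a hypothetical name, and rejecting an empty list on the client is this sketch's own choice, not documented API behavior.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.raydocs.com"


def build_attach_request(api_token: str, session_id: str, keys: list) -> urllib.request.Request:
    """Build POST /extractions/sessions/{sessionId}/documents/upload for one or more keys."""
    if not keys:
        raise ValueError("keys must contain at least one upload key from Step 1")
    # Percent-encode the session ID so it is always a single path segment.
    path = f"/extractions/sessions/{urllib.parse.quote(session_id, safe='')}/documents/upload"
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps({"keys": list(keys)}).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```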
Response
```json
{
  "data": [
    {
      "id": "880e8400-e29b-41d4-a716-446655440000",
      "workspace_id": "660e8400-e29b-41d4-a716-446655440000",
      "filename": "invoice_001.pdf",
      "sha256": "6de7f6f5894c9f3fd1f6f8a4d1b3115d0d9b4b19d7a8a661f9fe90f9c2d80c3b",
      "status": "uploaded",
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}
```
Uploading or importing a document only stores it. Parsing happens either when you request it explicitly (through the reparse endpoint) or, if required, when an extraction run starts.
Deduplication is content-based at workspace scope. If two uploads have the same bytes, Raydocs reuses the same document record even when filenames differ.
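Since deduplication depends only on the bytes, you can predict locally whether two files will map to the same document by hashing them. This sketch assumes the `sha256` field in the response is the lowercase hex SHA-256 digest of the uploaded bytes. Its 64-hex-character form suggests this, but the page does not state it. `sha256_hex`, `will_deduplicate`, and `matches_document` are hypothetical helper names.

```python
import hashlib


def sha256_hex(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def will_deduplicate(path_a: str, path_b: str) -> bool:
    """Same bytes give the same hash, so (per the docs) the same workspace document, whatever the filenames."""
    return sha256_hex(path_a) == sha256_hex(path_b)


def matches_document(path: str, document: dict) -> bool:
    """Compare a local file with a document record (assumes 'sha256' is a hex digest of the bytes)."""
    return sha256_hex(path) == str(document.get("sha256", "")).lower()
```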
Complete Example
Here’s a complete example in JavaScript:
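```javascript
async function uploadDocument(sessionId, file, apiToken) {
  // Step 1: Get signed URL
  const signedUrlResponse = await fetch(
    "https://api.raydocs.com/vapor/signed-storage-url",
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        visibility: "private",
      }),
    }
  );
  if (!signedUrlResponse.ok) {
    throw new Error(`Signed URL request failed: HTTP ${signedUrlResponse.status}`);
  }
  const { url, key, headers } = await signedUrlResponse.json();

  // Step 2: Upload the raw file bytes (Blob, Buffer, ...) directly to S3
  const uploadResponse = await fetch(url, {
    method: "PUT",
    headers: headers,
    body: file,
  });
  if (!uploadResponse.ok) {
    // A 403 here typically means the signed URL expired
    throw new Error(`S3 upload failed: HTTP ${uploadResponse.status}`);
  }

  // Step 3: Create or reuse the document and attach it to the session
  const documentResponse = await fetch(
    `https://api.raydocs.com/extractions/sessions/${sessionId}/documents/upload`,
    {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({
        keys: [key],
      }),
    }
  );
  if (!documentResponse.ok) {
    throw new Error(`Attaching document failed: HTTP ${documentResponse.status}`);
  }
  return documentResponse.json();
}
```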
PDF, images (PNG, JPEG, TIFF), and Office documents (DOCX, PPTX) are supported.
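A client can reject clearly unsupported files before spending a signed URL on them. This sketch checks extensions against the formats listed above. The mapping of extensions to formats (for example, treating `.jpg` and `.jpeg` as JPEG and `.tif` and `.tiff` as TIFF) is an assumption, and the server decides what it actually accepts. `looks_supported` and `partition_uploads` are hypothetical names.

```python
from pathlib import PurePath

# Extensions assumed to correspond to the supported formats listed above.
SUPPORTED_EXTENSIONS = {
    ".pdf",
    ".png",
    ".jpg", ".jpeg",
    ".tif", ".tiff",
    ".docx",
    ".pptx",
}


def looks_supported(filename: str) -> bool:
    """Case-insensitive extension check; a heuristic, not a content sniff."""
    return PurePath(filename).suffix.lower() in SUPPORTED_EXTENSIONS


def partition_uploads(filenames):
    """Split candidate filenames into (likely supported, unsupported)."""
    ok, rejected = [], []
    for name in filenames:
        (ok if looks_supported(name) else rejected).append(name)
    return ok, rejected
```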
Alternative: Workspace-first Flow
To create documents in your workspace first (and optionally attach to sessions later), use the workspace document endpoint:
```http
POST /workspaces/{workspaceId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "key": "tmp/abc123-def456-ghi789",
  "filename": "invoice.pdf"
}
```
See Create Workspace Document for full details. You can also import from URL without using signed URLs.
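The upload key from Step 1 (`tmp/...`) carries no filename, so this request is where the original name is supplied. This minimal sketch builds the request and takes `filename` from the local path. Deriving it that way is this sketch's choice, not a documented requirement, and `build_workspace_document_request` is a hypothetical name.

```python
import json
import urllib.parse
import urllib.request
from pathlib import PurePath

API_BASE = "https://api.raydocs.com"


def build_workspace_document_request(
    api_token: str, workspace_id: str, key: str, local_path: str
) -> urllib.request.Request:
    """Build POST /workspaces/{workspaceId}/documents, naming the document after the local file."""
    body = {"key": key, "filename": PurePath(local_path).name}
    path = f"/workspaces/{urllib.parse.quote(workspace_id, safe='')}/documents"
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
    )
```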
Monitoring Processing Status
After uploading, poll the document endpoint to check processing status:
```http
GET /workspaces/{workspaceId}/documents/{documentId} HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
```
Or list all documents in the workspace:
```http
GET /workspaces/{workspaceId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
```
For session-scoped listing:
```http
GET /extractions/sessions/{sessionId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
```
Wait for all documents to reach `processed` status before running an extraction.
If you start a run before parsing is complete, the API may respond with 409 and `status: parsing_pending`. In that case the manual run resumes automatically once the required parsing artifacts are ready.
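A polling loop for these endpoints might look like the sketch below. `fetch_status` is a hypothetical callable you supply (for example, one that GETs `/workspaces/{workspaceId}/documents/{documentId}` and returns its `status` field), which lets the loop be tested without network access. The terminal statuses `processed` and `failed` come from this page. The initial delay, backoff cap, and timeout are arbitrary choices.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"processed", "failed"}


def wait_for_document(
    fetch_status: Callable[[], str],
    timeout_s: float = 300.0,
    initial_delay_s: float = 1.0,
    max_delay_s: float = 15.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the document reaches a terminal status; raise TimeoutError otherwise."""
    deadline = clock() + timeout_s
    delay = initial_delay_s
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        if clock() + delay > deadline:
            raise TimeoutError(f"document still {status!r} after {timeout_s}s")
        sleep(delay)
        delay = min(delay * 2, max_delay_s)  # exponential backoff, capped
```

The caller still needs to handle a `failed` result, which the loop returns instead of raising.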
Error Handling
Common Upload Errors
| Error | Cause | Solution |
|---|---|---|
| 403 Forbidden (from S3) | Signed URL expired | Request a new signed URL |
| 413 Payload Too Large | File exceeds size limit | Compress or split the document |
| 422 Unprocessable Entity | Invalid key | Ensure you’re using the key from Step 1 |
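The first row of the table suggests a simple recovery: if S3 rejects the PUT with 403, request a fresh signed URL and try again. This sketch receives the two network steps as hypothetical callables, `get_signed_url` for Step 1 (returning its JSON) and `put_file` for Step 2 (returning the HTTP status), and limits the number of attempts. It treats 403 as expiry, as the table does, and reports any other failure immediately.

```python
from typing import Callable, Dict


class UploadError(RuntimeError):
    """Raised when the S3 upload cannot be completed."""


def upload_with_refresh(
    get_signed_url: Callable[[], Dict],
    put_file: Callable[[str, Dict], int],
    max_attempts: int = 2,
) -> str:
    """Return the upload key on success, fetching a new signed URL after each 403."""
    for attempt in range(1, max_attempts + 1):
        signed = get_signed_url()
        status = put_file(signed["url"], signed.get("headers") or {})
        if 200 <= status < 300:
            return signed["key"]
        if status == 403 and attempt < max_attempts:
            continue  # likely an expired signature: get a new URL and retry
        raise UploadError(f"S3 upload failed with HTTP {status} (attempt {attempt})")
    raise UploadError("no upload attempts were made")
```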
Processing Failures
If a document’s status becomes `failed`:
- Check the document’s error message via the GET endpoint
- Verify the file is a valid, non-corrupted document
- Re-upload if the file was damaged during transfer
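When a listing returns many documents, a small filter can pick out the ones that need attention. This sketch assumes the list endpoints use the same `{"data": [...]}` envelope as the Step 3 response. The name of the error-message field is not documented on this page, so `error` below is a placeholder to adjust. `failed_documents` is a hypothetical name.

```python
from typing import Dict, List, Tuple

# Placeholder: the actual error-message field name is not given on this page.
ERROR_FIELD = "error"


def failed_documents(listing: Dict) -> List[Tuple[str, str, str]]:
    """Return (id, filename, error message) for every document whose status is 'failed'."""
    failures = []
    for doc in listing.get("data", []):
        if doc.get("status") == "failed":
            message = str(doc.get(ERROR_FIELD) or "")
            failures.append((doc.get("id", ""), doc.get("filename", ""), message))
    return failures
```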