This guide explains how to upload documents to Raydocs for extraction using the API. The process has three steps: get a signed URL, upload the file to temporary storage, then create or reuse a workspace-scoped document and attach it to an extraction session.
API reference: Session upload (attach to existing session) or Create Workspace Document (workspace-first, optional session attach).

Upload Flow Overview

Step 1: Get a Signed Upload URL

First, request a signed URL from the Vapor storage endpoint. This URL allows you to upload directly to S3 without routing the file through the API server.
POST /vapor/signed-storage-url HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "visibility": "private"
}

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| visibility | string | No | Set to private (default) |
The content_type parameter is optional and defaults to application/octet-stream. You don’t need to detect or specify file types.

Response

{
    "uuid": "abc123-def456-ghi789",
    "key": "tmp/abc123-def456-ghi789",
    "url": "https://s3.amazonaws.com/bucket/tmp/abc123...?X-Amz-Signature=...",
    "headers": {
        "Content-Type": "application/octet-stream"
    }
}
The signed URL is valid for a limited time (typically 5 minutes). Upload your file promptly after receiving it.
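To avoid starting an upload with a URL that is about to expire, you can check its remaining lifetime first. The sketch below assumes the signed URL is a standard AWS Signature Version 4 presigned URL carrying `X-Amz-Date` and `X-Amz-Expires` query parameters; the guide does not confirm this for Raydocs, and the helper names `signedUrlExpiry` and `isSignedUrlUsable` are hypothetical.

```javascript
// Returns the expiry time of a SigV4 presigned URL, or null when the
// expected query parameters are missing (assumption: SigV4 format).
function signedUrlExpiry(signedUrl) {
    const params = new URL(signedUrl).searchParams;
    const amzDate = params.get("X-Amz-Date"); // e.g. 20240115T103000Z
    const expires = params.get("X-Amz-Expires"); // lifetime in seconds
    if (!amzDate || !expires) return null;

    const match = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(amzDate);
    if (!match) return null;

    const [, year, month, day, hour, minute, second] = match.map(Number);
    const signedAtMs = Date.UTC(year, month - 1, day, hour, minute, second);
    return new Date(signedAtMs + Number(expires) * 1000);
}

// True when the URL still has more than `safetyMarginMs` left.
// When the expiry cannot be determined, the URL is treated as usable
// and S3 makes the final decision.
function isSignedUrlUsable(signedUrl, now = new Date(), safetyMarginMs = 30000) {
    const expiry = signedUrlExpiry(signedUrl);
    if (expiry === null) return true;
    return now.getTime() + safetyMarginMs < expiry.getTime();
}
```

If `isSignedUrlUsable` returns false, request a new signed URL before uploading rather than risking a rejected PUT.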

Step 2: Upload to S3

Use the signed URL to upload your file directly to S3. Include the headers returned in the previous step.
curl -X PUT "${SIGNED_URL}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @document.pdf

Step 3: Create or Reuse Document and Attach to Session

After the file is uploaded to S3, create (or reuse) workspace-scoped document(s) and attach them to your extraction session. Use the uploaded key(s) from Step 1.
POST /extractions/sessions/{sessionId}/documents/upload HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "keys": ["tmp/abc123-def456-ghi789"]
}

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| keys | array[string] | Yes | Upload key list returned from the signed URL request |

Response

{
  "data": [
    {
      "id": "880e8400-e29b-41d4-a716-446655440000",
      "workspace_id": "660e8400-e29b-41d4-a716-446655440000",
      "filename": "invoice_001.pdf",
      "sha256": "6de7f6f5894c9f3fd1f6f8a4d1b3115d0d9b4b19d7a8a661f9fe90f9c2d80c3b",
      "status": "uploaded",
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}
Upload/import is storage-only. Parsing is requested explicitly (reparse endpoint) or at extraction run time when required.
Deduplication is content-based at workspace scope. If two uploads have the same bytes, Raydocs reuses the same document record even when filenames differ.
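Because deduplication depends on file bytes rather than filenames, you can predict which local files will map to the same document by hashing them before upload. This is a minimal Node.js sketch; it assumes the `sha256` field in the response is the hex-encoded SHA-256 digest of the raw file bytes, which the guide implies but does not state. The helper names are hypothetical.

```javascript
const { createHash } = require("node:crypto");

// Hex-encoded SHA-256 digest of a Buffer or string.
function sha256Hex(bytes) {
    return createHash("sha256").update(bytes).digest("hex");
}

// Groups files by content hash. Files that land in the same group have
// identical bytes, so they would be expected to share one document record.
// `files` is an array of { filename, bytes } objects.
function groupByContent(files) {
    const groups = new Map();
    for (const { filename, bytes } of files) {
        const digest = sha256Hex(bytes);
        if (!groups.has(digest)) groups.set(digest, []);
        groups.get(digest).push(filename);
    }
    return groups;
}
```

Comparing a local digest with the `sha256` value returned in Step 3 is also a simple way to confirm that the file was not altered in transit.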

Complete Example

Here’s a complete example in JavaScript:
async function uploadDocument(sessionId, file, apiToken) {
    // Step 1: Get signed URL
    const signedUrlResponse = await fetch(
        "https://api.raydocs.com/vapor/signed-storage-url",
        {
            method: "POST",
            headers: {
                Authorization: `Bearer ${apiToken}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                visibility: "private",
            }),
        }
    );

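    if (!signedUrlResponse.ok) {
        throw new Error(`Signed URL request failed: ${signedUrlResponse.status}`);
    }
    const { url, key, headers } = await signedUrlResponse.json();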

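    // Step 2: Upload to S3, sending the headers returned with the signed URL
    const uploadResponse = await fetch(url, {
        method: "PUT",
        headers: headers,
        body: file,
    });
    if (!uploadResponse.ok) {
        throw new Error(`S3 upload failed: ${uploadResponse.status}`);
    }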

    // Step 3: Associate with session
    const documentResponse = await fetch(
        `https://api.raydocs.com/extractions/sessions/${sessionId}/documents/upload`,
        {
            method: "POST",
            headers: {
                Authorization: `Bearer ${apiToken}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                keys: [key],
            }),
        }
    );

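    if (!documentResponse.ok) {
        throw new Error(`Attaching document failed: ${documentResponse.status}`);
    }
    return documentResponse.json();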
}
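Since the Step 3 endpoint takes an array of keys, several files can be uploaded individually and then attached in one request. The following is a sketch of that variant, not an official client: the function name `uploadDocuments` and the injectable `fetchImpl` parameter (included so the function can be exercised without network access) are assumptions.

```javascript
// Uploads each file to S3 via its own signed URL, then attaches all of
// them to the session with a single Step 3 request.
async function uploadDocuments(sessionId, files, apiToken, fetchImpl = fetch) {
    const api = "https://api.raydocs.com";
    const jsonHeaders = {
        Authorization: `Bearer ${apiToken}`,
        "Content-Type": "application/json",
    };

    const keys = [];
    for (const file of files) {
        // Step 1: one signed URL per file
        const signed = await fetchImpl(`${api}/vapor/signed-storage-url`, {
            method: "POST",
            headers: jsonHeaders,
            body: JSON.stringify({ visibility: "private" }),
        });
        if (!signed.ok) throw new Error(`Signed URL request failed: ${signed.status}`);
        const { url, key, headers } = await signed.json();

        // Step 2: upload the bytes directly to S3
        const put = await fetchImpl(url, { method: "PUT", headers, body: file });
        if (!put.ok) throw new Error(`S3 upload failed: ${put.status}`);
        keys.push(key);
    }

    // Step 3: attach every uploaded key in one call
    const attach = await fetchImpl(
        `${api}/extractions/sessions/${sessionId}/documents/upload`,
        { method: "POST", headers: jsonHeaders, body: JSON.stringify({ keys }) }
    );
    if (!attach.ok) throw new Error(`Attaching documents failed: ${attach.status}`);
    return attach.json();
}
```

Uploads run sequentially here to keep the sketch simple; whether parallel uploads are appropriate depends on rate limits that this guide does not describe.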

Supported File Formats

PDF, images (PNG, JPEG, TIFF), and Office documents (DOCX, PPTX) are supported.

Alternative: Workspace-first Flow

To create documents in your workspace first (and optionally attach to sessions later), use the workspace document endpoint:
POST /workspaces/{workspaceId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "key": "tmp/abc123-def456-ghi789",
  "filename": "invoice.pdf"
}
See Create Workspace Document for full details. You can also import from URL without using signed URLs.
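In JavaScript, the workspace-first request could look like the sketch below. It assumes the file was already uploaded with Steps 1 and 2; the function name `createWorkspaceDocument` and the injectable `fetchImpl` parameter are hypothetical, and the response is returned as parsed JSON without assuming its shape.

```javascript
// Creates (or reuses) a workspace document from an uploaded key.
async function createWorkspaceDocument(workspaceId, key, filename, apiToken, fetchImpl = fetch) {
    const response = await fetchImpl(
        `https://api.raydocs.com/workspaces/${workspaceId}/documents`,
        {
            method: "POST",
            headers: {
                Authorization: `Bearer ${apiToken}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({ key, filename }),
        }
    );
    if (!response.ok) {
        throw new Error(`Creating workspace document failed: ${response.status}`);
    }
    return response.json();
}
```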

Monitoring Processing Status

After uploading, poll the document endpoint to check processing status:
GET /workspaces/{workspaceId}/documents/{documentId} HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Or list all documents in the workspace:
GET /workspaces/{workspaceId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
For session-scoped listing:
GET /extractions/sessions/{sessionId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Wait for all documents to reach processed status before running an extraction. If you trigger a run early, the API can return 409 with status: parsing_pending, and the manual run will auto-resume once required parsing artifacts are ready.
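A polling loop for a single document might be sketched as follows. Several details are assumptions rather than documented behavior: the response may wrap the document in a `data` field (handled either way), the default interval and attempt limit are arbitrary, and the names `waitForProcessed`, `fetchImpl`, and `sleep` are hypothetical. Because upload is storage-only, the loop is only useful once parsing has been requested.

```javascript
// Polls the document endpoint until the status is "processed".
// Throws if the status becomes "failed" or the attempt limit is reached.
async function waitForProcessed(workspaceId, documentId, apiToken, options = {}) {
    const {
        intervalMs = 2000, // assumed polling interval
        maxAttempts = 30, // assumed upper bound on status checks
        fetchImpl = fetch,
        sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
    } = options;

    const url = `https://api.raydocs.com/workspaces/${workspaceId}/documents/${documentId}`;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const response = await fetchImpl(url, {
            headers: { Authorization: `Bearer ${apiToken}` },
        });
        if (!response.ok) throw new Error(`Status check failed: ${response.status}`);

        const body = await response.json();
        const doc = body.data ?? body; // assumption: document may be wrapped in "data"

        if (doc.status === "processed") return doc;
        if (doc.status === "failed") {
            throw new Error(`Document ${documentId} failed processing`);
        }
        if (attempt < maxAttempts) await sleep(intervalMs);
    }
    throw new Error(`Document ${documentId} not processed after ${maxAttempts} checks`);
}
```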

Error Handling

Common Upload Errors

| Error | Cause | Solution |
| --- | --- | --- |
| 403 Forbidden on S3 | Signed URL expired | Request a new signed URL |
| 413 Payload Too Large | File exceeds size limit | Compress or split the document |
| 422 Unprocessable Entity | Invalid key | Ensure you’re using the key from Step 1 |
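The 403 case can be handled automatically by requesting a fresh signed URL and retrying the upload. The sketch below illustrates one way to do that. The function name `putWithUrlRefresh`, the `requestSignedUrl` callback (expected to perform Step 1 and resolve to `{ url, key, headers }`), and the default of two attempts are all assumptions.

```javascript
// Uploads a file to S3, requesting a new signed URL and retrying when S3
// answers 403 (treated here as an expired URL). Resolves to the upload key.
async function putWithUrlRefresh(file, requestSignedUrl, fetchImpl = fetch, maxAttempts = 2) {
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const { url, key, headers } = await requestSignedUrl();
        const response = await fetchImpl(url, { method: "PUT", headers, body: file });

        if (response.ok) return key;
        if (response.status === 413) {
            // Retrying cannot help: the file itself is too large.
            throw new Error("File exceeds the size limit; compress or split the document");
        }
        if (response.status !== 403 || attempt === maxAttempts) {
            throw new Error(`Upload failed with status ${response.status}`);
        }
        // 403 with attempts remaining: loop to fetch a fresh signed URL.
    }
    throw new Error("Upload was not attempted; maxAttempts must be at least 1");
}
```

A 422 from the attach request is not retried here, since it points to a wrong key rather than a transient condition.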

Processing Failures

If a document’s status becomes failed:
  1. Check the document’s error message via the GET endpoint
  2. Verify the file is a valid, non-corrupted document
  3. Re-upload if the file was damaged during transfer