This guide explains how to upload documents to Raydocs for extraction using the API. The process has three steps: get a signed URL, upload the file to temporary storage, then create or reuse a workspace-scoped document and attach it to an extraction session.
API reference: Session upload (attach to existing session) or Create Workspace Document (workspace-first, optional session attach).

Upload Flow Overview

Step 1: Get a Signed Upload URL

First, request a signed URL from the Vapor storage endpoint. This URL allows you to upload directly to S3 without routing the file through the API server.
POST /vapor/signed-storage-url HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "visibility": "private"
}

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| visibility | string | No | Set to private (default) |
The content_type parameter is optional and defaults to application/octet-stream. You don’t need to detect or specify file types.

Response

{
    "uuid": "abc123-def456-ghi789",
    "key": "tmp/abc123-def456-ghi789",
    "url": "https://s3.amazonaws.com/bucket/tmp/abc123...?X-Amz-Signature=...",
    "headers": {
        "Content-Type": "application/octet-stream"
    }
}
The signed URL is valid for a limited time (typically 5 minutes). Upload your file promptly after receiving it.
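Because the URL expires, it can help to check how long it remains valid before starting a large upload. The sketch below assumes the signed URL is a standard AWS Signature Version 4 presigned URL that carries `X-Amz-Date` and `X-Amz-Expires` query parameters; the example response above elides its query string, so treat that as an assumption. `signedUrlExpiry` is a hypothetical helper name, not part of the Raydocs API.

```javascript
// Hypothetical helper: derive the expiry time of a SigV4 presigned URL.
// Assumes the URL carries X-Amz-Date (YYYYMMDDTHHMMSSZ) and X-Amz-Expires (seconds).
// Returns a Date, or null when either parameter is missing or malformed.
function signedUrlExpiry(signedUrl) {
    const params = new URL(signedUrl).searchParams;
    const match = /^(\d{4})(\d{2})(\d{2})T(\d{2})(\d{2})(\d{2})Z$/.exec(
        params.get("X-Amz-Date") || ""
    );
    const expires = params.get("X-Amz-Expires") || "";
    if (!match || !/^\d+$/.test(expires)) {
        return null;
    }
    // Skip match[0] (the full string) and convert the six captured groups.
    const [, year, month, day, hour, minute, second] = match.map(Number);
    const signedAtMs = Date.UTC(year, month - 1, day, hour, minute, second);
    return new Date(signedAtMs + Number(expires) * 1000);
}
```

If the function returns null, fall back to uploading immediately and requesting a new URL on failure.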

Step 2: Upload to S3

Use the signed URL to upload your file directly to S3. Include the headers returned in the previous step.
curl -X PUT "${SIGNED_URL}" \
  -H "Content-Type: application/octet-stream" \
  --data-binary @document.pdf
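The same upload can be sketched in JavaScript. `uploadToSignedUrl` is a hypothetical helper, and its `fetchImpl` parameter exists only so the function can be exercised without a network; the `signed` argument is assumed to be the parsed Step 1 response.

```javascript
// Hypothetical helper: PUT a file body to the signed URL from Step 1.
// `signed` is the parsed Step 1 response ({ url, key, headers, ... }).
// `fetchImpl` defaults to the global fetch and can be swapped for testing.
async function uploadToSignedUrl(signed, body, fetchImpl = fetch) {
    const response = await fetchImpl(signed.url, {
        method: "PUT",
        // Forward the headers returned with the signed URL unchanged.
        headers: signed.headers || {},
        body,
    });
    if (!response.ok) {
        throw new Error(`S3 upload failed with HTTP ${response.status}`);
    }
    // The key is what Step 3 needs to reference the uploaded file.
    return signed.key;
}
```

Returning the key keeps the caller's next step simple: pass it straight into the `keys` array of the session upload request.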

Step 3: Create or Reuse Document and Attach to Session

After the file is uploaded to S3, create (or reuse) workspace-scoped document(s) and attach them to your extraction session. Use the uploaded key(s) from Step 1.
POST /extractions/sessions/{sessionId}/documents/upload HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "keys": ["tmp/abc123-def456-ghi789"]
}

Request Parameters

| Parameter | Type | Required | Description |
| --- | --- | --- | --- |
| keys | array[string] | Yes | Upload key list returned from the signed URL request |

Response

{
  "data": [
    {
      "id": "880e8400-e29b-41d4-a716-446655440000",
      "workspace_id": "660e8400-e29b-41d4-a716-446655440000",
      "filename": "invoice_001.pdf",
      "sha256": "6de7f6f5894c9f3fd1f6f8a4d1b3115d0d9b4b19d7a8a661f9fe90f9c2d80c3b",
      "status": "uploaded",
      "created_at": "2024-01-15T10:30:00Z"
    }
  ]
}
Upload/import is storage-only. Parsing is requested explicitly (reparse endpoint) or at extraction run time when required.
Deduplication is content-based at workspace scope. If two uploads have the same bytes, Raydocs reuses the same document record even when filenames differ.
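To see whether a returned record corresponds to a particular local file, you can compare hashes. This is a minimal Node.js sketch that assumes the `sha256` field is the lowercase hex SHA-256 digest of the unmodified file bytes, which the example response suggests but this guide does not state. `sha256Hex` and `matchesLocalFile` are hypothetical helper names.

```javascript
const { createHash } = require("node:crypto");

// Hex SHA-256 of raw file content (Buffer, Uint8Array, or string).
function sha256Hex(bytes) {
    return createHash("sha256").update(bytes).digest("hex");
}

// True when a document record's hash matches the local bytes.
// Assumption: `document.sha256` is a hex digest of the original file content.
function matchesLocalFile(document, bytes) {
    return (
        typeof document.sha256 === "string" &&
        document.sha256.toLowerCase() === sha256Hex(bytes)
    );
}
```

A match on a record whose filename differs from yours would be consistent with the reuse behavior described above.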

Complete Example

Here’s a complete example in JavaScript:
async function uploadDocument(sessionId, file, apiToken) {
    // Step 1: Get signed URL
    const signedUrlResponse = await fetch(
        "https://api.raydocs.com/vapor/signed-storage-url",
        {
            method: "POST",
            headers: {
                Authorization: `Bearer ${apiToken}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                visibility: "private",
            }),
        }
    );

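    if (!signedUrlResponse.ok) {
        throw new Error(`Signed URL request failed: ${signedUrlResponse.status}`);
    }
    const { url, key, headers } = await signedUrlResponse.json();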

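    // Step 2: Upload to S3 with the headers returned in Step 1
    const uploadResponse = await fetch(url, {
        method: "PUT",
        headers: headers,
        body: file,
    });
    if (!uploadResponse.ok) {
        throw new Error(`S3 upload failed: ${uploadResponse.status}`);
    }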

    // Step 3: Associate with session
    const documentResponse = await fetch(
        `https://api.raydocs.com/extractions/sessions/${sessionId}/documents/upload`,
        {
            method: "POST",
            headers: {
                Authorization: `Bearer ${apiToken}`,
                "Content-Type": "application/json",
            },
            body: JSON.stringify({
                keys: [key],
            }),
        }
    );

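    if (!documentResponse.ok) {
        throw new Error(`Document attach failed: ${documentResponse.status}`);
    }
    return documentResponse.json();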
}

Supported File Formats

PDF, images (PNG, JPEG, TIFF), and Office documents (DOCX, PPTX) are supported.
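A client-side check can reject obviously unsupported files before requesting a signed URL. The extension list below is an assumption derived from the formats named above (including the common `.jpg` and `.tif` variants); the API remains the authority on what it accepts. `hasSupportedExtension` is a hypothetical helper.

```javascript
// Extensions assumed for the formats listed in this guide (illustrative only).
const SUPPORTED_EXTENSIONS = new Set([
    "pdf", "png", "jpg", "jpeg", "tif", "tiff", "docx", "pptx",
]);

// True when the filename ends in one of the assumed extensions.
function hasSupportedExtension(filename) {
    const dot = filename.lastIndexOf(".");
    // No dot, a leading dot only, or a trailing dot means no usable extension.
    if (dot <= 0 || dot === filename.length - 1) {
        return false;
    }
    return SUPPORTED_EXTENSIONS.has(filename.slice(dot + 1).toLowerCase());
}
```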

Alternative: Workspace-first Flow

To create documents in your workspace first (and optionally attach to sessions later), use the workspace document endpoint:
POST /workspaces/{workspaceId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Content-Type: application/json

{
  "key": "tmp/abc123-def456-ghi789",
  "filename": "invoice.pdf"
}
See Create Workspace Document for full details. You can also import from URL without using signed URLs.
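As a sketch of the workspace-first request in JavaScript, the helper below only builds the URL and `fetch` options so they can be inspected before sending. `buildWorkspaceDocumentRequest` is a hypothetical name; the path and body fields mirror the request shown above, and no additional fields are assumed.

```javascript
// Hypothetical helper: build the workspace-first document request.
// Returns { url, options } suitable for fetch(url, options).
function buildWorkspaceDocumentRequest(workspaceId, key, filename, apiToken) {
    return {
        url: `https://api.raydocs.com/workspaces/${encodeURIComponent(workspaceId)}/documents`,
        options: {
            method: "POST",
            headers: {
                Authorization: `Bearer ${apiToken}`,
                "Content-Type": "application/json",
            },
            // `key` is the upload key returned by the signed URL request.
            body: JSON.stringify({ key, filename }),
        },
    };
}
```

Usage would look like `const { url, options } = buildWorkspaceDocumentRequest(id, key, "invoice.pdf", token); const response = await fetch(url, options);`.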

Monitoring Processing Status

After uploading, poll the document endpoint to check processing status:
GET /workspaces/{workspaceId}/documents/{documentId} HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Or list all documents in the workspace:
GET /workspaces/{workspaceId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
For session-scoped listing:
GET /extractions/sessions/{sessionId}/documents HTTP/1.1
Host: api.raydocs.com
Authorization: Bearer <token>
Wait until every document reaches the processed status before running an extraction. If you trigger a run earlier, the API can return 409 with status: parsing_pending; the manual run then resumes automatically once the required parsing artifacts are ready.
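A polling loop for this could look like the sketch below. Only the `processed` and `failed` statuses are taken from this guide; every other status is treated as still pending. `listDocuments` is a caller-supplied async function that returns an array of document objects (for example, by calling one of the GET endpoints above and unwrapping the response), and the attempt count and delay are arbitrary illustrative defaults.

```javascript
// Group document ids by readiness.
// Anything that is neither "processed" nor "failed" counts as pending.
function summarizeStatuses(documents) {
    const summary = { processed: [], failed: [], pending: [] };
    for (const doc of documents) {
        if (doc.status === "processed") summary.processed.push(doc.id);
        else if (doc.status === "failed") summary.failed.push(doc.id);
        else summary.pending.push(doc.id);
    }
    return summary;
}

// Poll until no document is pending, or give up after `attempts` polls.
// Resolves with the final summary; failed documents are reported, not thrown.
async function waitForDocuments(listDocuments, { attempts = 30, delayMs = 2000 } = {}) {
    for (let i = 0; i < attempts; i++) {
        const summary = summarizeStatuses(await listDocuments());
        if (summary.pending.length === 0) {
            return summary;
        }
        if (i < attempts - 1) {
            await new Promise((resolve) => setTimeout(resolve, delayMs));
        }
    }
    throw new Error("Timed out waiting for documents to finish processing");
}
```

Callers should inspect `summary.failed` before starting an extraction run, since the loop stops as soon as nothing is pending.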

Error Handling

Common Upload Errors

| Error | Cause | Solution |
| --- | --- | --- |
| 403 Forbidden on S3 | Signed URL expired | Request a new signed URL |
| 413 Payload Too Large | File exceeds size limit | Compress or split the document |
| 422 Unprocessable Entity | Invalid key | Ensure you’re using the key from Step 1 |
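The 403 case lends itself to an automatic retry, since the fix is simply to request a new signed URL. The sketch below assumes two caller-supplied async functions, `requestSignedUrl` (returns the Step 1 response object) and `putFile` (performs the S3 PUT and returns a fetch-style response); both names, and `uploadWithFreshUrl` itself, are hypothetical. Other status codes are not retried because a fresh URL would not resolve them.

```javascript
// Retry the S3 PUT with a fresh signed URL when S3 answers 403.
//   requestSignedUrl() -> Step 1 response object ({ url, key, headers })
//   putFile(signed)    -> fetch-style response with `ok` and `status`
async function uploadWithFreshUrl(requestSignedUrl, putFile, maxAttempts = 2) {
    let lastStatus = null;
    for (let attempt = 1; attempt <= maxAttempts; attempt++) {
        const signed = await requestSignedUrl();
        const response = await putFile(signed);
        if (response.ok) {
            return signed.key;
        }
        lastStatus = response.status;
        if (response.status !== 403) {
            break; // e.g. 413: a new URL will not help
        }
    }
    throw new Error(`Upload failed with HTTP ${lastStatus}`);
}
```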

Processing Failures

If a document’s status becomes failed:
  1. Check the document’s error message via the GET endpoint
  2. Verify the file is a valid, non-corrupted document
  3. Re-upload if the file was damaged during transfer