This guide shows the fastest way to extract data from documents using an existing template. Perfect for getting started quickly.
This example assumes you already have a workspace and template set up. For creating templates from scratch, see the Full Workflow guide.
Building with an AI assistant? Use the Integrate Raydocs with AI ✨ guide to provide full documentation context.

Prerequisites

  • A Raydocs API token with sessions-write ability
  • An existing extraction template ID
  • Documents to process (PDF, PNG, JPG)
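Reading the API token from the environment keeps it out of scripts and version control. A minimal sketch (`RAYDOCS_API_TOKEN` is an illustrative variable name, not one the SDK requires):

```python
import os

def load_token(env_var="RAYDOCS_API_TOKEN"):
    """Read the Raydocs API token from the environment, failing loudly if unset."""
    token = os.environ.get(env_var)
    if not token:
        raise RuntimeError(f"Set {env_var} before running the examples below")
    return token
```

With this in place, `RaydocsClient(load_token())` replaces any hardcoded token string.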

The 3-Step Process

1. Upload Files: Upload your documents to temporary storage using signed URLs.
2. Create Sessions: Create extraction sessions with auto_extract: true to start processing automatically.
3. Get Results: Poll for completion and retrieve your structured data.

Complete Example

import time
from raydocs_client import RaydocsClient

# Initialize client
client = RaydocsClient("your_api_token")

# Your template ID (from Raydocs dashboard or API)
TEMPLATE_ID = "550e8400-e29b-41d4-a716-446655440000"

# ─────────────────────────────────────────────────────────────
# Step 1: Upload your documents
# ─────────────────────────────────────────────────────────────
documents = ["invoice1.pdf", "invoice2.pdf", "receipt.pdf"]

file_keys = []
for doc in documents:
    print(f"Uploading {doc}...")
    key = client.upload_file(doc)
    file_keys.append(key)

print(f"✓ Uploaded {len(file_keys)} documents")

# ─────────────────────────────────────────────────────────────
# Step 2: Create sessions with auto-extract enabled
# ─────────────────────────────────────────────────────────────
sessions = client.batch_create_sessions(
    template_id=TEMPLATE_ID,
    file_keys=file_keys,
    auto_extract=True  # Extraction starts automatically!
)

print(f"✓ Created {len(sessions)} extraction sessions")

# ─────────────────────────────────────────────────────────────
# Step 3: Poll for results and retrieve extracted data
# ─────────────────────────────────────────────────────────────
for i, session in enumerate(sessions):
    print(f"\nProcessing {documents[i]}...")
    
    # Poll until extraction completes
    while True:
        results = client.get_results(session['id'])
        
        if results:
            result = results[0]
            if result['status'] == 'completed':
                # Get full result with extracted data
                full_result = client.get_result(result['id'])
                print("✅ Extraction complete!")
                print(f"   Extracted data: {full_result['data']}")
                break
            elif result['status'] == 'failed':
                print(f"❌ Extraction failed for {documents[i]}")
                break
        
        time.sleep(5)  # Wait 5 seconds before polling again
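Note that the loop above polls forever if a session never reaches a terminal state. A hedged variant with a timeout; it takes any zero-argument callable rather than a specific client method, so you would pass e.g. `lambda: client.get_results(session['id'])`:

```python
import time

def wait_for_result(fetch_results, timeout=300, interval=5):
    """Poll until the first result reaches a terminal status or the timeout expires.

    fetch_results: zero-argument callable returning the same list shape as
    client.get_results(session_id).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        results = fetch_results()
        if results and results[0]["status"] in ("completed", "failed"):
            return results[0]
        time.sleep(interval)
    raise TimeoutError(f"extraction did not finish within {timeout}s")
```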

Understanding the Response

When extraction completes, you get structured data matching your template schema:
{
  "id": "result-uuid-here",
  "status": "completed",
  "data": {
    "invoice_header": {
      "invoice_number": "INV-2024-001",
      "invoice_date": "2024-01-15",
      "total_amount": 1250.00,
      "currency": "USD"
    },
    "vendor_info": {
      "vendor_name": "Acme Corp",
      "vendor_address": "123 Business St, City, ST 12345"
    },
    "line_items": {
      "items": [
        {
          "description": "Consulting Services",
          "quantity": 10,
          "unit_price": 100.00,
          "total": 1000.00
        },
        {
          "description": "Travel Expenses",
          "quantity": 1,
          "unit_price": 250.00,
          "total": 250.00
        }
      ]
    }
  },
  "reasoning": {
    "invoice_header": {
      "invoice_number": {
        "reasoning": "Found 'Invoice #: INV-2024-001' in header section on page 1",
        "confidence": 0.95
      }
    }
  }
}
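Because the data block mirrors your template's sections, it can be traversed like any nested dict. As a quick sanity check, the line-item totals in the sample above can be reconciled against the header total (the field names come from that sample, not from a fixed schema):

```python
# Trimmed copy of the sample response above
result = {
    "data": {
        "invoice_header": {"total_amount": 1250.00, "currency": "USD"},
        "line_items": {
            "items": [
                {"description": "Consulting Services", "total": 1000.00},
                {"description": "Travel Expenses", "total": 250.00},
            ]
        },
    }
}

items = result["data"]["line_items"]["items"]
line_total = sum(item["total"] for item in items)
header_total = result["data"]["invoice_header"]["total_amount"]

if abs(line_total - header_total) > 0.01:
    print(f"Totals disagree: items sum to {line_total}, header says {header_total}")
```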

Error Handling

If you hit rate limits, implement exponential backoff:
import time

import requests

def upload_with_retry(client, file_path, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.upload_file(file_path)
        except requests.HTTPError as e:
            if e.response.status_code == 429:
                wait = 2 ** attempt * 10  # 10s, 20s, 40s
                print(f"Rate limited, waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
    raise RuntimeError("Max retries exceeded")
Check the result status and handle failures gracefully:
results = client.get_results(session_id)
if results and results[0]['status'] == 'failed':
    # Log error, retry, or notify
    print(f"Extraction failed for session {session_id}")
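At batch scale it helps to partition sessions by outcome before deciding what to retry. A small sketch over `{session_id: latest_result}` pairs, using the status strings shown above:

```python
def partition_by_status(results_by_session):
    """Split session ids into (completed, failed, pending) lists by result status."""
    completed, failed, pending = [], [], []
    for session_id, result in results_by_session.items():
        status = result["status"]
        if status == "completed":
            completed.append(session_id)
        elif status == "failed":
            failed.append(session_id)
        else:
            pending.append(session_id)
    return completed, failed, pending
```

Failed ids can then be re-uploaded or re-created, while pending ones keep polling.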

Next Steps

Full Workflow

Create templates and set up complete extraction pipelines

Extraction Schema

Design powerful extraction schemas