This guide shows the fastest way to extract data from documents using an existing template. Perfect for getting started quickly.
This example assumes you already have a workspace and template set up. For creating templates from scratch, see the Full Workflow guide.

Prerequisites

  • A Raydocs API token with sessions-write ability
  • An existing extraction template ID
  • Documents to process (PDF, PNG, JPG)

The 3-Step Process

  1. Upload Files: upload your documents to temporary storage using signed URLs (the sketch after this list shows the underlying flow).
  2. Create Sessions: create extraction sessions with auto_extract: true so processing starts automatically.
  3. Get Results: poll for completion and retrieve your structured data.
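
The Complete Example below uses client.upload_file, which handles the signed-URL exchange for you. If you are not using the Python client, the underlying flow is typically: request a signed upload URL from the API, then PUT the file bytes to that URL. The endpoint path and response fields in this sketch are assumptions for illustration, not the documented Raydocs API:

import requests

# Illustrative only: the endpoint path, payload, and response fields below
# are assumptions; prefer client.upload_file() when using the Python client.
API_BASE = "https://api.raydocs.example/v1"  # hypothetical base URL
TOKEN = "your_api_token"

def upload_via_signed_url(path: str) -> str:
    # 1. Request a signed upload URL (hypothetical endpoint)
    resp = requests.post(
        f"{API_BASE}/uploads/signed-url",
        headers={"Authorization": f"Bearer {TOKEN}"},
        json={"filename": path},
    )
    resp.raise_for_status()
    upload = resp.json()  # assumed to contain "url" and "file_key"

    # 2. PUT the raw file bytes to the signed URL (temporary storage)
    with open(path, "rb") as f:
        requests.put(upload["url"], data=f).raise_for_status()

    # 3. Pass the returned file_key when creating extraction sessions
    return upload["file_key"]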

Complete Example

import time
from raydocs_client import RaydocsClient

# Initialize client
client = RaydocsClient("your_api_token")

# Your template ID (from Raydocs dashboard or API)
TEMPLATE_ID = "550e8400-e29b-41d4-a716-446655440000"

# ─────────────────────────────────────────────────────────────
# Step 1: Upload your documents
# ─────────────────────────────────────────────────────────────
documents = ["invoice1.pdf", "invoice2.pdf", "receipt.pdf"]

file_keys = []
for doc in documents:
    print(f"Uploading {doc}...")
    key = client.upload_file(doc)
    file_keys.append(key)

print(f"✓ Uploaded {len(file_keys)} documents")

# ─────────────────────────────────────────────────────────────
# Step 2: Create sessions with auto-extract enabled
# ─────────────────────────────────────────────────────────────
sessions = client.batch_create_sessions(
    template_id=TEMPLATE_ID,
    file_keys=file_keys,
    auto_extract=True  # Extraction starts automatically!
)

print(f"✓ Created {len(sessions)} extraction sessions")

# ─────────────────────────────────────────────────────────────
# Step 3: Poll for results and retrieve extracted data
# ─────────────────────────────────────────────────────────────
for i, session in enumerate(sessions):
    print(f"\nProcessing {documents[i]}...")
    
    # Poll until extraction completes
    while True:
        results = client.get_results(session['id'])
        
        if results:
            result = results[0]
            if result['status'] == 'completed':
                # Get full result with extracted data
                full_result = client.get_result(result['id'])
                print(f"✅ Extraction complete!")
                print(f"   Extracted data: {full_result['data']}")
                break
            elif result['status'] == 'failed':
                print(f"❌ Extraction failed")
                break
        
        time.sleep(5)  # Wait 5 seconds before polling again
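
The inline loop above polls forever if a session never reaches a terminal state. A small helper with a timeout avoids that; it only uses the get_results and get_result calls already shown, and the 10-minute default is an arbitrary choice for this sketch, not a service limit:

def wait_for_result(client, session_id, timeout=600, interval=5):
    """Poll until the session's result completes or fails, or until timeout.

    Returns the full result dict on success, None on failure or timeout.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        results = client.get_results(session_id)
        if results:
            result = results[0]
            if result['status'] == 'completed':
                return client.get_result(result['id'])
            if result['status'] == 'failed':
                return None
        time.sleep(interval)
    return None  # timed out

With this helper, Step 3 reduces to full_result = wait_for_result(client, session['id']) for each session.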

Understanding the Response

When extraction completes, you get structured data matching your template schema:
{
  "id": "result-uuid-here",
  "status": "completed",
  "data": {
    "invoice_header": {
      "invoice_number": "INV-2024-001",
      "invoice_date": "2024-01-15",
      "total_amount": 1250.00,
      "currency": "USD"
    },
    "vendor_info": {
      "vendor_name": "Acme Corp",
      "vendor_address": "123 Business St, City, ST 12345"
    },
    "line_items": {
      "items": [
        {
          "description": "Consulting Services",
          "quantity": 10,
          "unit_price": 100.00,
          "total": 1000.00
        },
        {
          "description": "Travel Expenses",
          "quantity": 1,
          "unit_price": 250.00,
          "total": 250.00
        }
      ]
    }
  },
  "reasoning": {
    "invoice_header": {
      "invoice_number": {
        "reasoning": "Found 'Invoice #: INV-2024-001' in header section on page 1",
        "confidence": 0.95
      }
    }
  }
}
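
Because data mirrors your template schema, you can read fields with plain dictionary access. The field names below come from the invoice example above; substitute whatever groups and fields your own template defines:

data = full_result['data']

# Header fields
header = data['invoice_header']
print(header['invoice_number'], header['total_amount'], header['currency'])

# Line items are nested under the group's "items" list
for item in data['line_items']['items']:
    print(f"{item['description']}: {item['quantity']} x {item['unit_price']} = {item['total']}")

# Per-field reasoning and confidence, where provided
field_info = full_result.get('reasoning', {}).get('invoice_header', {}).get('invoice_number', {})
print(field_info.get('confidence'), field_info.get('reasoning'))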

Error Handling

If you hit rate limits, implement exponential backoff:
import time

import requests

def upload_with_retry(client, file_path, max_retries=3):
    for attempt in range(max_retries):
        try:
            return client.upload_file(file_path)
        except requests.HTTPError as e:
            if e.response.status_code == 429:
                wait = 2 ** attempt * 10  # 10s, 20s, 40s
                print(f"Rate limited, waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
    raise Exception("Max retries exceeded")
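Drop the helper into the upload loop from Step 1 so a rate-limited upload retries instead of failing the whole batch:

file_keys = [upload_with_retry(client, doc) for doc in documents]
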
Check the result status and handle failures gracefully:
results = client.get_results(session_id)
if results and results[0]['status'] == 'failed':
    # Log error, retry, or notify
    print(f"Extraction failed for session {session_id}")

Next Steps