PL Sub-Dataset Sync (Hourly)

This workflow automatically synchronizes PromptLayer request logs into specialized sub-datasets based on metadata filters, running every hour to keep each sub-dataset current for analysis.

Purpose

No business context provided yet — add a context.md to enrich this documentation.

This workflow serves as an automated data pipeline that:

  • Monitors PromptLayer for new request logs from a specific prompt (ID: 175569)
  • Routes matching requests into 9 different sub-datasets based on metadata criteria
  • Maintains incremental sync state to avoid duplicate processing
  • Handles pagination to process large volumes of data efficiently

How It Works

  1. Scheduled Trigger: Every hour, the workflow begins execution
  2. Configuration Setup: Loads sub-dataset definitions and retrieves the last sync timestamp from workflow static data (see the sketch after this list)
  3. Initial Search: Queries PromptLayer API for new requests since the last sync (page 1, up to 25 items)
  4. Routing Logic: Examines each request's metadata and routes matching items to appropriate sub-datasets based on predefined filters
  5. Dataset Addition: Adds matching request logs to their corresponding PromptLayer sub-datasets via API calls
  6. Pagination Handling: If more than 25 results exist, generates additional page requests (up to 10 pages per run)
  7. Remaining Pages Processing: Fetches and processes additional pages with the same routing logic
  8. State Management: Updates the last sync timestamp to prevent reprocessing in future runs
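
Steps 2 and 8 both hinge on n8n's workflow static data. A minimal sketch of the "Get Config & Last Sync" Code node follows; the configuration shape and variable names are illustrative assumptions, not the workflow's verbatim code:

    // Sketch of the "Get Config & Last Sync" Code node (n8n JavaScript).
    const staticData = $getWorkflowStaticData('global');

    // First run: no timestamp stored yet, so fall back to the epoch.
    const lastSync = staticData.lastSync || '1970-01-01T00:00:00Z';

    // One entry per sub-dataset: which metadata field to test, and how.
    const subDatasets = [
      { name: 'cea-alerts-v3', field: 'cea_alert_fired', value: 'true', match: 'exact' },
      { name: 'onboarding-day-1-v3', field: 'current_stage', value: 'day_1', match: 'prefix' },
      // ...remaining definitions omitted for brevity
    ];

    return [{ json: { lastSync, subDatasets, promptId: 175569 } }];

Step 8 is the mirror image: after a successful run, the node logic would write staticData.lastSync = new Date().toISOString() so the next run only picks up newer requests.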

Workflow Diagram

graph TD
    A[Every Hour] --> B[Get Config & Last Sync]
    B --> C[Search New Requests Page 1]
    B --> D[Generate Remaining Pages]
    C --> E[Route to Sub-Datasets]
    E --> F[Add to Sub-Dataset]
    D --> G[Search Remaining Pages]
    G --> H[Route Remaining to Sub-Datasets]
    H --> I[Add Remaining to Sub-Dataset]

Trigger

  • Type: Schedule Trigger
  • Frequency: Every 1 hour
  • Status: Currently inactive (active: false)

Nodes Used

| Node Type | Node Name | Purpose |
| --- | --- | --- |
| Schedule Trigger | Every Hour | Initiates workflow execution hourly |
| Code | Get Config & Last Sync | Defines sub-dataset configurations and retrieves last sync state |
| HTTP Request | Search New Requests (Page 1) | Queries PromptLayer API for new request logs |
| Code | Route to Sub-Datasets | Filters and routes requests based on metadata criteria |
| HTTP Request | Add to Sub-Dataset | Adds matching requests to PromptLayer sub-datasets |
| Code | Generate Remaining Pages | Creates pagination requests for additional data |
| HTTP Request | Search Remaining Pages | Fetches additional pages of request logs |
| Code | Route Remaining to Sub-Datasets | Processes remaining pages with same routing logic |
| HTTP Request | Add Remaining to Sub-Dataset | Adds remaining matches to sub-datasets |

External Services & Credentials Required

PromptLayer API

  • Service: PromptLayer (api.promptlayer.com)
  • Authentication: API Key via X-API-KEY header
  • Required Permissions:
    • Read access to request logs and search endpoints
    • Write access to dataset version management
  • Endpoints Used:
    • POST /api/public/v2/requests/search - Search request logs
    • POST /api/public/v2/dataset-versions/add-request-log - Add logs to datasets

Note: The API key is currently hardcoded in the workflow (pl_80a83a0db8150339b213693376a60afb) and should be moved to environment variables for security.
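
For illustration, a search call using an environment-based key might look like the sketch below. The payload field names (prompt_id, start_time, page, per_page) are assumptions inferred from this workflow's behavior, not the documented PromptLayer schema; consult the PromptLayer API reference for the authoritative shape.

    // Illustrative search request (Node 18+ fetch); payload field names are assumptions.
    const lastSync = '2024-01-01T00:00:00Z'; // normally read from workflow static data

    const res = await fetch('https://api.promptlayer.com/api/public/v2/requests/search', {
      method: 'POST',
      headers: {
        'X-API-KEY': process.env.PROMPTLAYER_API_KEY, // never hardcode the key
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        prompt_id: 175569,    // assumed field name; the prompt being monitored
        start_time: lastSync, // assumed field name; only return newer requests
        page: 1,
        per_page: 25,
      }),
    });
    const data = await res.json();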

Environment Variables

Currently, the workflow uses hardcoded values that should be converted to environment variables:

  • PROMPTLAYER_API_KEY: PromptLayer API authentication key
  • PROMPT_ID: Target prompt ID to monitor (currently: 175569)
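
Inside an n8n Code node these values can then be read via $env rather than hardcoded; a minimal sketch, assuming environment access is not blocked on the instance:

    // Read configuration from the environment instead of hardcoding it.
    const apiKey = $env.PROMPTLAYER_API_KEY;
    const promptId = Number($env.PROMPT_ID);

    if (!apiKey || Number.isNaN(promptId)) {
      throw new Error('PROMPTLAYER_API_KEY and PROMPT_ID must both be set');
    }

    // Keep the key out of item data; prefer an n8n credential for HTTP Request nodes.
    return [{ json: { promptId } }];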

Data Flow

Input

  • Trigger: Time-based (hourly schedule)
  • Static Data: Last sync timestamp from previous runs
  • Configuration: 9 sub-dataset definitions with filtering criteria

Processing

The workflow processes requests through metadata-based filtering:

Sub-Dataset Filters:

  • cea-alerts-v3: cea_alert_fired = "true" (exact match)
  • credit-module-v3: current_stage = "phase_4a_credit" (exact match)
  • phase-4b-active-v3: current_stage = "phase_4b" (exact match)
  • daytime-engagement-v3: session_mode = "daytime_engagement" (exact match)
  • evening-sessions-v3: session_mode = "evening_session" (exact match)
  • onboarding-step-2-2-v3: current_stage = "step_2.2" (exact match)
  • onboarding-step-2-7-v3: current_stage = "step_2.7" (exact match)
  • onboarding-day-1-v3: current_stage starts with "day_1" (prefix match)
  • onboarding-day-3-v3: current_stage starts with "day_3" (prefix match)
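
A minimal sketch of the routing predicate these filters imply, reusing the { name, field, value, match } filter shape assumed in the configuration sketch earlier:

    // Returns the sub-datasets a single request log should be routed to.
    function routeRequest(metadata, filters) {
      return filters.filter(({ field, value, match }) => {
        const actual = metadata?.[field];
        if (typeof actual !== 'string') return false;
        return match === 'prefix' ? actual.startsWith(value) : actual === value;
      });
    }

    // Example: current_stage = "day_1_morning" matches onboarding-day-1-v3
    // (prefix match) but none of the exact-match filters.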

Output

  • PromptLayer Sub-Datasets: Request logs added to appropriate datasets
  • Static Data: Updated last sync timestamp
  • Execution Logs: Processing statistics and routing information

Error Handling

The workflow includes basic error handling through:

  • Rate Limiting: Batched HTTP requests (2 requests per batch, 1-7 second intervals)
  • Pagination Limits: Maximum 10 pages per run (250 items) to prevent timeouts (see the sketch below)
  • Graceful Degradation: Empty result handling and validation checks
  • State Recovery: Incremental sync prevents data loss on failures
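
The pagination cap reduces to a small Code-node sketch like the following; the total-count field in the search response is an assumed name:

    // Sketch of "Generate Remaining Pages": one output item per extra page,
    // capped at 10 pages (250 items) per run.
    const PER_PAGE = 25;
    const MAX_PAGES = 10;

    const total = $input.first().json.total ?? 0; // assumed response field
    const totalPages = Math.min(Math.ceil(total / PER_PAGE), MAX_PAGES);

    const pages = [];
    for (let page = 2; page <= totalPages; page++) {
      pages.push({ json: { page, per_page: PER_PAGE } });
    }
    return pages;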

No explicit error nodes or retry mechanisms are implemented.

Known Limitations

  • Security: API key is hardcoded in the workflow code
  • Pagination: Limited to 250 items per run (10 pages × 25 items)
  • Rate Limits: No explicit rate limit handling beyond batching
  • Error Recovery: No retry logic for failed API calls
  • Monitoring: No alerting for sync failures or data quality issues

Related Workflows

No related workflows identified in the current configuration.

Setup Instructions

  1. Import Workflow

    # Import the workflow JSON into your n8n instance


  2. Configure Credentials

    • Create a PromptLayer API credential in n8n
    • Replace the hardcoded API key with the credential reference
    • Update the prompt ID if monitoring a different prompt
  3. Environment Setup

    • Set up environment variables for API key and prompt ID
    • Configure appropriate PromptLayer sub-dataset group IDs
  4. Activate Workflow

    • Enable the workflow (currently set to inactive)
    • Verify the schedule trigger configuration
    • Test with a manual execution first
  5. Monitor Initial Run

    • Check workflow execution logs
    • Verify sub-dataset population in PromptLayer
    • Confirm static data state management is working
  6. Production Considerations

    • Set up monitoring and alerting for failed executions
    • Consider implementing retry logic for API failures
    • Review and adjust pagination limits based on data volume
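
The retry logic suggested above could start from a generic wrapper like this sketch (callPromptLayer is a hypothetical stand-in for the actual API call):

    // Illustrative retry helper with exponential backoff.
    async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
      for (let i = 0; i < attempts; i++) {
        try {
          return await fn();
        } catch (err) {
          if (i === attempts - 1) throw err; // out of attempts: surface the error
          await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
        }
      }
    }

    // Usage (hypothetical): const data = await withRetry(() => callPromptLayer(payload));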