PL Sub-Dataset Sync (Hourly)

This workflow automatically synchronizes PromptLayer request logs into specialized sub-datasets based on metadata filters, running every hour to keep each sub-dataset current for analysis.

Purpose

No business context provided yet — add a context.md to enrich this documentation.

This workflow serves as an automated data pipeline that:

  • Monitors PromptLayer for new request logs from a specific prompt (ID: 175569)
  • Routes matching requests into 9 different sub-datasets based on metadata criteria
  • Maintains incremental sync state to avoid duplicate processing
  • Handles pagination to process large volumes of data efficiently

How It Works

  1. Scheduled Trigger: Every hour, the workflow begins execution
  2. Configuration Setup: Loads sub-dataset definitions and retrieves the last sync timestamp from workflow static data (see the sketch after this list)
  3. Initial Search: Queries PromptLayer API for new requests since the last sync (page 1, up to 25 items)
  4. Routing Logic: Examines each request's metadata and routes matching items to appropriate sub-datasets based on predefined filters
  5. Dataset Addition: Adds matching request logs to their corresponding PromptLayer sub-datasets via API calls
  6. Pagination Handling: If more than 25 results exist, generates additional page requests (up to 10 pages per run)
  7. Remaining Pages Processing: Fetches and processes additional pages with the same routing logic
  8. State Management: Updates the last sync timestamp to prevent reprocessing in future runs
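
Steps 2 and 8 both hinge on n8n's workflow static data. A minimal sketch of the "Get Config & Last Sync" Code node follows; the configuration shape and variable names are illustrative assumptions, not the workflow's verbatim code:

    // Sketch of the "Get Config & Last Sync" Code node (n8n JavaScript).
    const staticData = $getWorkflowStaticData('global');

    // First run: no timestamp stored yet, so fall back to the epoch.
    const lastSync = staticData.lastSync || '1970-01-01T00:00:00Z';

    // One entry per sub-dataset: which metadata field to test, and how.
    const subDatasets = [
      { name: 'cea-alerts-v3', field: 'cea_alert_fired', value: 'true', match: 'exact' },
      { name: 'onboarding-day-1-v3', field: 'current_stage', value: 'day_1', match: 'prefix' },
      // ...remaining definitions omitted for brevity
    ];

    return [{ json: { lastSync, subDatasets, promptId: 175569 } }];

Step 8 is the mirror image: after a successful run, the node logic would write staticData.lastSync = new Date().toISOString() so the next run only picks up newer requests.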

Workflow Diagram

graph TD
    A[Every Hour] --> B[Get Config & Last Sync]
    B --> C[Search New Requests Page 1]
    B --> D[Generate Remaining Pages]
    C --> E[Route to Sub-Datasets]
    E --> F[Add to Sub-Dataset]
    D --> G[Search Remaining Pages]
    G --> H[Route Remaining to Sub-Datasets]
    H --> I[Add Remaining to Sub-Dataset]

Trigger

  • Type: Schedule Trigger
  • Frequency: Every 1 hour
  • Status: Currently inactive (active: false)

Nodes Used

| Node Type | Node Name | Purpose |
| --- | --- | --- |
| Schedule Trigger | Every Hour | Initiates workflow execution hourly |
| Code | Get Config & Last Sync | Defines sub-dataset configurations and retrieves last sync state |
| HTTP Request | Search New Requests (Page 1) | Queries PromptLayer API for new request logs |
| Code | Route to Sub-Datasets | Filters and routes requests based on metadata criteria |
| HTTP Request | Add to Sub-Dataset | Adds matching requests to PromptLayer sub-datasets |
| Code | Generate Remaining Pages | Creates pagination requests for additional data |
| HTTP Request | Search Remaining Pages | Fetches additional pages of request logs |
| Code | Route Remaining to Sub-Datasets | Processes remaining pages with same routing logic |
| HTTP Request | Add Remaining to Sub-Dataset | Adds remaining matches to sub-datasets |

External Services & Credentials Required

PromptLayer API

  • Service: PromptLayer (api.promptlayer.com)
  • Authentication: API Key via X-API-KEY header
  • Required Permissions:
    • Read access to request logs and search endpoints
    • Write access to dataset version management
  • Endpoints Used:
    • POST /api/public/v2/requests/search - Search request logs
    • POST /api/public/v2/dataset-versions/add-request-log - Add logs to datasets

Note: The API key is currently hardcoded in the workflow (pl_80a83a0db8150339b213693376a60afb) and should be moved to environment variables for security.
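
For illustration, a search call using an environment-based key might look like the sketch below. The payload field names (prompt_id, start_time, page, per_page) are assumptions inferred from this workflow's behavior, not the documented PromptLayer schema; consult the PromptLayer API reference for the authoritative shape.

    // Illustrative search request (Node 18+ fetch); payload field names are assumptions.
    const lastSync = '2024-01-01T00:00:00Z'; // normally read from workflow static data

    const res = await fetch('https://api.promptlayer.com/api/public/v2/requests/search', {
      method: 'POST',
      headers: {
        'X-API-KEY': process.env.PROMPTLAYER_API_KEY, // never hardcode the key
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({
        prompt_id: 175569,    // assumed field name; the prompt being monitored
        start_time: lastSync, // assumed field name; only return newer requests
        page: 1,
        per_page: 25,
      }),
    });
    const data = await res.json();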

Environment Variables

Currently, the workflow uses hardcoded values that should be converted to environment variables:

  • PROMPTLAYER_API_KEY: PromptLayer API authentication key
  • PROMPT_ID: Target prompt ID to monitor (currently: 175569)
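
Inside an n8n Code node these values can then be read via $env rather than hardcoded; a minimal sketch, assuming environment access is not blocked on the instance:

    // Read configuration from the environment instead of hardcoding it.
    const apiKey = $env.PROMPTLAYER_API_KEY;
    const promptId = Number($env.PROMPT_ID);

    if (!apiKey || Number.isNaN(promptId)) {
      throw new Error('PROMPTLAYER_API_KEY and PROMPT_ID must both be set');
    }

    // Keep the key out of item data; prefer an n8n credential for HTTP Request nodes.
    return [{ json: { promptId } }];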

Data Flow

Input

  • Trigger: Time-based (hourly schedule)
  • Static Data: Last sync timestamp from previous runs
  • Configuration: 9 sub-dataset definitions with filtering criteria

Processing

The workflow processes requests through metadata-based filtering:

Sub-Dataset Filters:

  • cea-alerts-v3: cea_alert_fired = "true" (exact match)
  • credit-module-v3: current_stage = "phase_4a_credit" (exact match)
  • phase-4b-active-v3: current_stage = "phase_4b" (exact match)
  • daytime-engagement-v3: session_mode = "daytime_engagement" (exact match)
  • evening-sessions-v3: session_mode = "evening_session" (exact match)
  • onboarding-step-2-2-v3: current_stage = "step_2.2" (exact match)
  • onboarding-step-2-7-v3: current_stage = "step_2.7" (exact match)
  • onboarding-day-1-v3: current_stage starts with "day_1" (prefix match)
  • onboarding-day-3-v3: current_stage starts with "day_3" (prefix match)
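
A minimal sketch of the routing predicate these filters imply, reusing the { name, field, value, match } filter shape assumed in the configuration sketch earlier:

    // Returns the sub-datasets a single request log should be routed to.
    function routeRequest(metadata, filters) {
      return filters.filter(({ field, value, match }) => {
        const actual = metadata?.[field];
        if (typeof actual !== 'string') return false;
        return match === 'prefix' ? actual.startsWith(value) : actual === value;
      });
    }

    // Example: current_stage = "day_1_morning" matches onboarding-day-1-v3
    // (prefix match) but none of the exact-match filters.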

Output

  • PromptLayer Sub-Datasets: Request logs added to appropriate datasets
  • Static Data: Updated last sync timestamp
  • Execution Logs: Processing statistics and routing information

Error Handling

The workflow includes basic error handling through:

  • Rate Limiting: Batched HTTP requests (2 requests per batch, 1-7 second intervals)
  • Pagination Limits: Maximum 10 pages per run (250 items) to prevent timeouts (see the sketch below)
  • Graceful Degradation: Empty result handling and validation checks
  • State Recovery: Incremental sync prevents data loss on failures
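
The pagination cap reduces to a small Code-node sketch like the following; the total-count field in the search response is an assumed name:

    // Sketch of "Generate Remaining Pages": one output item per extra page,
    // capped at 10 pages (250 items) per run.
    const PER_PAGE = 25;
    const MAX_PAGES = 10;

    const total = $input.first().json.total ?? 0; // assumed response field
    const totalPages = Math.min(Math.ceil(total / PER_PAGE), MAX_PAGES);

    const pages = [];
    for (let page = 2; page <= totalPages; page++) {
      pages.push({ json: { page, per_page: PER_PAGE } });
    }
    return pages;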

No explicit error nodes or retry mechanisms are implemented.

Known Limitations

  • Security: API key is hardcoded in the workflow code
  • Pagination: Limited to 250 items per run (10 pages × 25 items)
  • Rate Limits: No explicit rate limit handling beyond batching
  • Error Recovery: No retry logic for failed API calls
  • Monitoring: No alerting for sync failures or data quality issues

Related Workflows

No related workflows identified in the current configuration.

Setup Instructions

  1. Import Workflow

    # Import the workflow JSON into your n8n instance


  2. Configure Credentials

    • Create a PromptLayer API credential in n8n
    • Replace the hardcoded API key with the credential reference
    • Update the prompt ID if monitoring a different prompt
  3. Environment Setup

    • Set up environment variables for API key and prompt ID
    • Configure appropriate PromptLayer sub-dataset group IDs
  4. Activate Workflow

    • Enable the workflow (currently set to inactive)
    • Verify the schedule trigger configuration
    • Test with a manual execution first
  5. Monitor Initial Run

    • Check workflow execution logs
    • Verify sub-dataset population in PromptLayer
    • Confirm static data state management is working
  6. Production Considerations

    • Set up monitoring and alerting for failed executions
    • Consider implementing retry logic for API failures
    • Review and adjust pagination limits based on data volume
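
The retry logic suggested above could start from a generic wrapper like this sketch (callPromptLayer is a hypothetical stand-in for the actual API call):

    // Illustrative retry helper with exponential backoff.
    async function withRetry(fn, attempts = 3, baseDelayMs = 1000) {
      for (let i = 0; i < attempts; i++) {
        try {
          return await fn();
        } catch (err) {
          if (i === attempts - 1) throw err; // out of attempts: surface the error
          await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
        }
      }
    }

    // Usage (hypothetical): const data = await withRetry(() => callPromptLayer(payload));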