PL Sub-Dataset Sync (Hourly)¶
This workflow automatically synchronizes PromptLayer request logs into specialized sub-datasets based on metadata filters. It runs every hour to keep the sub-datasets organized and current for analysis.
Purpose¶
No business context provided yet; add a context.md to enrich this documentation.
This workflow serves as an automated data pipeline that:

- Monitors PromptLayer for new request logs from a specific prompt (ID: 175569)
- Routes matching requests into 9 different sub-datasets based on metadata criteria
- Maintains incremental sync state to avoid duplicate processing
- Handles pagination to process large volumes of data efficiently
How It Works¶
- Scheduled Trigger: Every hour, the workflow begins execution
- Configuration Setup: Loads sub-dataset definitions and retrieves the last sync timestamp from workflow static data
- Initial Search: Queries PromptLayer API for new requests since the last sync (page 1, up to 25 items)
- Routing Logic: Examines each request's metadata and routes matching items to appropriate sub-datasets based on predefined filters
- Dataset Addition: Adds matching request logs to their corresponding PromptLayer sub-datasets via API calls
- Pagination Handling: If more than 25 results exist, generates additional page requests (up to 10 pages per run)
- Remaining Pages Processing: Fetches and processes additional pages with the same routing logic
- State Management: Updates the last sync timestamp to prevent reprocessing in future runs
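The pagination and sync-state bookkeeping above can be sketched as plain JavaScript, as it might appear in an n8n Code node (function names and the state shape here are illustrative, not lifted from the workflow):

```javascript
const PAGE_SIZE = 25;   // items fetched per search request
const MAX_PAGES = 10;   // hard cap per run to avoid timeouts

// Decide which extra pages to fetch after page 1 reports `total` hits.
function remainingPages(total) {
  const pages = Math.min(Math.ceil(total / PAGE_SIZE), MAX_PAGES);
  const out = [];
  for (let page = 2; page <= pages; page += 1) out.push(page);
  return out;
}

// After a successful run, advance the watermark so the next run
// only sees requests newer than this run's start time.
function nextSyncState(state, runStartedAt) {
  return { ...state, lastSync: runStartedAt };
}
```

In n8n, the watermark would persist between runs via `$getWorkflowStaticData('global')` inside a Code node.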
Workflow Diagram¶
```mermaid
graph TD
    A[Every Hour] --> B[Get Config & Last Sync]
    B --> C[Search New Requests Page 1]
    B --> D[Generate Remaining Pages]
    C --> E[Route to Sub-Datasets]
    E --> F[Add to Sub-Dataset]
    D --> G[Search Remaining Pages]
    G --> H[Route Remaining to Sub-Datasets]
    H --> I[Add Remaining to Sub-Dataset]
```
Trigger¶
- Type: Schedule Trigger
- Frequency: Every 1 hour
- Status: Currently inactive (active: false)
Nodes Used¶
| Node Type | Node Name | Purpose |
|---|---|---|
| Schedule Trigger | Every Hour | Initiates workflow execution hourly |
| Code | Get Config & Last Sync | Defines sub-dataset configurations and retrieves last sync state |
| HTTP Request | Search New Requests (Page 1) | Queries PromptLayer API for new request logs |
| Code | Route to Sub-Datasets | Filters and routes requests based on metadata criteria |
| HTTP Request | Add to Sub-Dataset | Adds matching requests to PromptLayer sub-datasets |
| Code | Generate Remaining Pages | Creates pagination requests for additional data |
| HTTP Request | Search Remaining Pages | Fetches additional pages of request logs |
| Code | Route Remaining to Sub-Datasets | Processes remaining pages with same routing logic |
| HTTP Request | Add Remaining to Sub-Dataset | Adds remaining matches to sub-datasets |
External Services & Credentials Required¶
PromptLayer API¶
- Service: PromptLayer (api.promptlayer.com)
- Authentication: API Key via X-API-KEY header
- Required Permissions:
- Read access to request logs and search endpoints
- Write access to dataset version management
- Endpoints Used:
- `POST /api/public/v2/requests/search` - Search request logs
- `POST /api/public/v2/dataset-versions/add-request-log` - Add logs to datasets
Note: The API key is currently hardcoded in the workflow (`pl_80a83a0db8150339b213693376a60afb`) and should be moved to environment variables for security.
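A minimal sketch of the search call with the key taken from an environment variable instead (the request-body field names here are assumptions for illustration; verify them against the PromptLayer API documentation):

```javascript
// Build the PromptLayer search request. The body fields
// (prompt_id, start_time, page, per_page) are illustrative
// guesses at the schema, not confirmed from the API docs.
function buildSearchRequest(promptId, lastSync, page) {
  return {
    url: 'https://api.promptlayer.com/api/public/v2/requests/search',
    method: 'POST',
    headers: {
      'X-API-KEY': process.env.PROMPTLAYER_API_KEY, // never hardcode
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      prompt_id: promptId,
      start_time: lastSync,
      page,
      per_page: 25,
    }),
  };
}
```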
Environment Variables¶
Currently, the workflow uses hardcoded values that should be converted to environment variables:
- `PROMPTLAYER_API_KEY`: PromptLayer API authentication key
- `PROMPT_ID`: Target prompt ID to monitor (currently: 175569)
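In a Code node, these variables could be loaded with a fail-fast check so a missing key never silently falls back to a hardcoded literal (a sketch; the function name and fallback behavior are illustrative):

```javascript
// Read configuration from environment variables, throwing early
// if the API key is absent instead of using a baked-in value.
function loadConfig(env) {
  const apiKey = env.PROMPTLAYER_API_KEY;
  if (!apiKey) throw new Error('PROMPTLAYER_API_KEY is not set');
  return {
    apiKey,
    promptId: Number(env.PROMPT_ID ?? 175569), // current prompt as default
  };
}
```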
Data Flow¶
Input¶
- Trigger: Time-based (hourly schedule)
- Static Data: Last sync timestamp from previous runs
- Configuration: 9 sub-dataset definitions with filtering criteria
Processing¶
The workflow processes requests through metadata-based filtering:
Sub-Dataset Filters:
- cea-alerts-v3: cea_alert_fired = "true" (exact match)
- credit-module-v3: current_stage = "phase_4a_credit" (exact match)
- phase-4b-active-v3: current_stage = "phase_4b" (exact match)
- daytime-engagement-v3: session_mode = "daytime_engagement" (exact match)
- evening-sessions-v3: session_mode = "evening_session" (exact match)
- onboarding-step-2-2-v3: current_stage = "step_2.2" (exact match)
- onboarding-step-2-7-v3: current_stage = "step_2.7" (exact match)
- onboarding-day-1-v3: current_stage starts with "day_1" (prefix match)
- onboarding-day-3-v3: current_stage starts with "day_3" (prefix match)
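These rules reduce to exact and prefix matching on two metadata fields. A sketch of the routing step (the filter table mirrors the list above; the function name and metadata access are illustrative):

```javascript
// Sub-dataset filters: exact match unless `prefix` is set.
const FILTERS = [
  { dataset: 'cea-alerts-v3',          field: 'cea_alert_fired', value: 'true' },
  { dataset: 'credit-module-v3',       field: 'current_stage',   value: 'phase_4a_credit' },
  { dataset: 'phase-4b-active-v3',     field: 'current_stage',   value: 'phase_4b' },
  { dataset: 'daytime-engagement-v3',  field: 'session_mode',    value: 'daytime_engagement' },
  { dataset: 'evening-sessions-v3',    field: 'session_mode',    value: 'evening_session' },
  { dataset: 'onboarding-step-2-2-v3', field: 'current_stage',   value: 'step_2.2' },
  { dataset: 'onboarding-step-2-7-v3', field: 'current_stage',   value: 'step_2.7' },
  { dataset: 'onboarding-day-1-v3',    field: 'current_stage',   value: 'day_1', prefix: true },
  { dataset: 'onboarding-day-3-v3',    field: 'current_stage',   value: 'day_3', prefix: true },
];

// Return the names of every sub-dataset a request's metadata matches.
function route(metadata) {
  return FILTERS
    .filter((f) => {
      const v = metadata[f.field];
      if (v == null) return false;
      return f.prefix ? v.startsWith(f.value) : v === f.value;
    })
    .map((f) => f.dataset);
}
```

Note that a single request can match more than one filter (e.g. a `session_mode` and a `current_stage` rule at once), in which case it is added to every matching sub-dataset.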
Output¶
- PromptLayer Sub-Datasets: Request logs added to appropriate datasets
- Static Data: Updated last sync timestamp
- Execution Logs: Processing statistics and routing information
Error Handling¶
The workflow includes basic error handling through:

- Rate Limiting: Batched HTTP requests (2 requests per batch, 1-7 second intervals)
- Pagination Limits: Maximum 10 pages per run (250 items) to prevent timeouts
- Graceful Degradation: Empty result handling and validation checks
- State Recovery: Incremental sync prevents data loss on failures
No explicit error nodes or retry mechanisms are implemented.
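The 2-requests-per-batch behavior amounts to simple chunking, which the HTTP Request node's batching options provide; a standalone sketch for clarity (illustrative, not the workflow's own code):

```javascript
// Split pending API calls into batches of `size` (the workflow
// uses 2), to be sent with a pause between batches.
function chunk(items, size) {
  const batches = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}
```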
Known Limitations¶
- Security: API key is hardcoded in the workflow code
- Pagination: Limited to 250 items per run (10 pages × 25 items)
- Rate Limits: No explicit rate limit handling beyond batching
- Error Recovery: No retry logic for failed API calls
- Monitoring: No alerting for sync failures or data quality issues
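If retry logic is added later, a minimal exponential-backoff wrapper might look like this (entirely illustrative; no such helper exists in the workflow today):

```javascript
// Retry an async API call with exponential backoff.
// `fn` is any function returning a Promise.
async function withRetry(fn, { attempts = 3, baseDelayMs = 1000 } = {}) {
  let lastErr;
  for (let i = 0; i < attempts; i += 1) {
    try {
      return await fn();
    } catch (err) {
      lastErr = err;
      if (i < attempts - 1) {
        // Wait 1x, 2x, 4x, ... the base delay between attempts.
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastErr;
}
```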
Related Workflows¶
No related workflows identified in the current configuration.
Setup Instructions¶
1. Import Workflow
   - Import the workflow JSON into your n8n instance
2. Configure Credentials
   - Create a PromptLayer API credential in n8n
   - Replace the hardcoded API key with the credential reference
   - Update the prompt ID if monitoring a different prompt
3. Environment Setup
   - Set up environment variables for the API key and prompt ID
   - Configure appropriate PromptLayer sub-dataset group IDs
4. Activate Workflow
   - Enable the workflow (currently set to inactive)
   - Verify the schedule trigger configuration
   - Test with a manual execution first
5. Monitor Initial Run
   - Check workflow execution logs
   - Verify sub-dataset population in PromptLayer
   - Confirm static data state management is working
6. Production Considerations
   - Set up monitoring and alerting for failed executions
   - Consider implementing retry logic for API failures
   - Review and adjust pagination limits based on data volume