AI Trainer: Course Converter
This workflow transforms educational course documents (PDFs) into a structured JSON format suitable for digital learning platforms. It uses AI to parse course content, extract learning objectives, identify interactive elements, and organize the result into a hierarchical course structure of units and blocks.
Purpose
No business context has been provided yet. Add a context.md file to enrich this documentation.
How It Works
- Document Upload: A PDF course document is received via webhook
- Text Extraction: The PDF content is extracted and converted to plain text
- Course Structure Parsing: The document is analyzed to identify course metadata, learning objectives, and individual EDU sections
- Section Classification: Each section is classified by type (text, audio, knowledge check, response, etc.) based on content markers
- AI Content Extraction: An OpenAI chat model (GPT-4.1-mini) processes each section to extract structured content according to predefined schemas
- Content Block Assembly: Extracted content is formatted into standardized content blocks with metadata
- Course Assembly: All blocks are grouped into units and assembled into the final course JSON structure
- Response Delivery: The structured course data is returned via webhook response
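The parsing and classification steps above can be sketched in JavaScript, the language of the workflow's Code nodes. This is a minimal sketch, not the workflow's actual code: the regular expression, the function names `splitSections` and `classifySection`, and the marker-to-type mapping are all assumptions based on the example ID `EDU_6_1_1_INTRO` and the content markers named in this document.

```javascript
// Hypothetical pattern: "EDU" followed by numeric parts and an optional
// uppercase suffix, modeled on the example ID EDU_6_1_1_INTRO.
const EDU_ID = /EDU(?:_\d+)+(?:_[A-Z]+)*/g;

// Split plain course text into sections, one per EDU ID occurrence.
function splitSections(text) {
  const matches = [...text.matchAll(EDU_ID)];
  return matches.map((m, i) => {
    const start = m.index + m[0].length;
    const end = i + 1 < matches.length ? matches[i + 1].index : text.length;
    return { id: m[0], body: text.slice(start, end).trim() };
  });
}

// Classify a section from its content markers (assumed mapping).
function classifySection(section) {
  if (section.body.includes("AUDIO ONLY")) return "audio";
  if (section.body.includes("CR:")) return "knowledge_check";
  if (section.body.includes("TEXT:")) return "text";
  return "unknown";
}

const sample = [
  "EDU_6_1_1_INTRO",
  "TEXT: Welcome to the unit.",
  "EDU_6_1_2",
  "AUDIO ONLY",
  "EDU_6_1_3_KC",
  "Which option is right? CR: B",
].join("\n");

const classified = splitSections(sample).map((s) => ({
  ...s,
  type: classifySection(s),
}));
console.log(classified.map((s) => `${s.id}: ${s.type}`));
```

A marker-based classifier like this is cheap and deterministic, which is presumably why classification happens before the model is called. Sections with no recognizable marker fall through to `unknown` here rather than being guessed.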
Workflow Diagram
graph TD
A[Webhook] --> B[Extract from File]
B --> C[Set Course Text]
C --> D[Parse Course Structure]
D --> E[Split Sections]
E --> F[Classify Sections]
F --> G[Basic LLM Chain]
G --> H[Code]
H --> I[Aggregate]
I --> J[Assemble Final JSON]
J --> K[Format Output]
K --> L[Respond to Webhook]
M[OpenAI Chat Model] --> G
N[Structured Output Parser] --> G
Trigger
Webhook (POST): The workflow is triggered by a POST request to the webhook endpoint 1a6c5fc2-2cac-410f-898b-289e638e25d9. The request should include a PDF file in the body.
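A client call to this trigger might look like the following sketch. It assumes Node.js 18 or later (for the global `fetch`, `FormData`, `Blob`, and `Request`), uses the placeholder domain `your-n8n-domain`, and guesses the form field name `file`. The field name the Webhook node actually reads is not stated in this document, and the helper names `buildUploadRequest` and `convertCourse` are hypothetical.

```javascript
// Requires Node.js 18+ (global fetch, FormData, Blob, Request).
const WEBHOOK_URL =
  "https://your-n8n-domain/webhook/1a6c5fc2-2cac-410f-898b-289e638e25d9";

// Build the multipart POST request for a PDF held in memory.
// The form field name "file" is an assumption; use whatever name the
// Webhook node is configured to read binary data from.
function buildUploadRequest(pdfBytes, filename) {
  const form = new FormData();
  form.append(
    "file",
    new Blob([pdfBytes], { type: "application/pdf" }),
    filename
  );
  return new Request(WEBHOOK_URL, { method: "POST", body: form });
}

// Send the request and return the parsed course JSON.
async function convertCourse(pdfBytes, filename) {
  const response = await fetch(buildUploadRequest(pdfBytes, filename));
  if (!response.ok) {
    throw new Error(`Webhook returned HTTP ${response.status}`);
  }
  return response.json();
}

// Build (but do not send) a request for a tiny placeholder payload.
const request = buildUploadRequest(
  new Uint8Array([0x25, 0x50, 0x44, 0x46]),
  "course.pdf"
);
console.log(request.method, request.url);
```

Separating request construction from sending makes the upload easy to inspect without contacting the server.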
Nodes Used
| Node Type | Purpose |
|---|---|
| Webhook | Receives incoming PDF documents via HTTP POST |
| Extract from File | Converts PDF content to plain text |
| Set | Stores extracted text for processing |
| Code | Custom JavaScript for parsing course structure and classification |
| Item Lists | Splits course sections into individual items for processing |
| Basic LLM Chain | Orchestrates AI content extraction using structured prompts |
| OpenAI Chat Model | Provides the GPT-4.1-mini language model for content analysis |
| Structured Output Parser | Ensures AI responses conform to predefined JSON schema |
| Aggregate | Collects processed content blocks back into arrays |
| Respond to Webhook | Returns the final structured course JSON |
External Services & Credentials Required
- OpenAI API: Required for content extraction and parsing
  - Credential name: "OpenAI Assistants API"
  - Model used: GPT-4.1-mini
  - Permissions needed: Chat completions access
Environment Variables
No environment variables are explicitly configured in this workflow.
Data Flow
Input:
- PDF course document uploaded via webhook
- Document should contain EDU-formatted sections (e.g., EDU_6_1_1_INTRO)
- Content markers such as "TEXT:", "AUDIO ONLY", and "CR:" (which marks correct responses)
Output:
- Structured course JSON in which content blocks are grouped into units, returned in the webhook response
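As an illustration of the Course Assembly step that produces the output, the sketch below groups content blocks into units by a prefix of their EDU ID. Every field name here (`unitId`, `blocks`, `units`, `title`, `type`) and the choice of the first three ID parts as the unit key are assumptions made for illustration, not the workflow's actual schema.

```javascript
// Derive a unit key from an EDU ID, e.g. "EDU_6_1_1_INTRO" -> "EDU_6_1".
// Using the first three underscore-separated parts is an assumption.
function unitKey(eduId) {
  return eduId.split("_").slice(0, 3).join("_");
}

// Group content blocks into units, preserving first-seen order.
function assembleCourse(metadata, blocks) {
  const units = new Map();
  for (const block of blocks) {
    const key = unitKey(block.id);
    if (!units.has(key)) units.set(key, { unitId: key, blocks: [] });
    units.get(key).blocks.push(block);
  }
  return { ...metadata, units: [...units.values()] };
}

const course = assembleCourse({ title: "Sample course" }, [
  { id: "EDU_6_1_1_INTRO", type: "text" },
  { id: "EDU_6_1_2", type: "audio" },
  { id: "EDU_6_2_1", type: "knowledge_check" },
]);
console.log(JSON.stringify(course, null, 2));
```

A `Map` keeps units in the order their first block appears, so the assembled course follows the order of the source document.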
Error Handling
The workflow includes basic error handling:
- JSON parsing fallbacks in case AI responses are malformed
- Safe property access with default values for missing data
- Validation of EDU ID format patterns
- Graceful handling of missing course metadata
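The error-handling patterns listed above could look roughly like this in a Code node. This is a sketch under assumptions, not the workflow's code: the helper names, the default values, the `title` and `content` fields, and the EDU ID pattern (modeled on the example `EDU_6_1_1_INTRO`) are all hypothetical.

```javascript
// Parse model output as JSON; return a fallback value instead of throwing.
function parseOrFallback(raw, fallback) {
  try {
    return JSON.parse(raw);
  } catch {
    return fallback;
  }
}

// Check an ID against an assumed EDU pattern such as EDU_6_1_1_INTRO.
function isValidEduId(id) {
  return /^EDU(?:_\d+)+(?:_[A-Z]+)*$/.test(id);
}

// Build a content block, substituting defaults for missing properties.
function toBlock(sectionId, rawResponse) {
  const parsed = parseOrFallback(rawResponse, {});
  return {
    id: isValidEduId(sectionId) ? sectionId : null,
    title: parsed?.title ?? "Untitled",
    content: parsed?.content ?? "",
  };
}

console.log(toBlock("EDU_6_1_1_INTRO", '{"title":"Intro","content":"Hello"}'));
console.log(toBlock("not-an-id", "this is not JSON"));
```

Optional chaining (`?.`) with nullish coalescing (`??`) covers both a missing property and a response that parses to `null`, so one malformed section does not abort the whole run.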
Known Limitations
- Currently processes only PDF documents
- Relies on specific EDU section formatting conventions
- Limited to 10 sections when the Limit node is enabled
- AI extraction quality depends on document structure and clarity
- No validation of final JSON structure completeness
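The last limitation could be narrowed with a small completeness check run before the response is returned. The sketch below is one possible approach, not part of the workflow. The expected shape (`title`, `units`, `blocks`) and the function name `findCourseProblems` are assumptions, since the actual output schema is not shown in this document.

```javascript
// Return a list of problems found in an assembled course object.
// Assumed shape: { title, units: [{ unitId, blocks: [...] }] }.
function findCourseProblems(course) {
  const problems = [];
  if (typeof course?.title !== "string" || course.title === "") {
    problems.push("missing course title");
  }
  if (!Array.isArray(course?.units) || course.units.length === 0) {
    problems.push("course has no units");
    return problems;
  }
  course.units.forEach((unit, i) => {
    if (!Array.isArray(unit?.blocks) || unit.blocks.length === 0) {
      problems.push(`unit ${i} has no blocks`);
    }
  });
  return problems;
}

const complete = { title: "T", units: [{ unitId: "u", blocks: [{}] }] };
const incomplete = { units: [{ unitId: "u", blocks: [] }] };
console.log(findCourseProblems(complete));
console.log(findCourseProblems(incomplete));
```

Returning a list of problems rather than throwing lets the caller decide whether to reject the course or return it with warnings attached.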
Related Workflows
No related workflows are mentioned in the current context.
Setup Instructions
- Import Workflow: Import the workflow JSON into your n8n instance
- Configure OpenAI Credentials:
  - Create an OpenAI API credential named "OpenAI Assistants API"
  - Add your OpenAI API key with access to the GPT-4.1-mini model
- Webhook Configuration:
  - The webhook URL will be: https://your-n8n-domain/webhook/1a6c5fc2-2cac-410f-898b-289e638e25d9
  - Configure your client application to POST PDF files to this endpoint
- Test the Workflow:
  - Upload a sample EDU-formatted course PDF
  - Verify the structured JSON output contains expected course data
- Optional Adjustments:
  - Disable the "Limit" node to process all sections (currently limited to 10)
  - Modify the content extraction schema in "Structured Output Parser" if needed
  - Adjust the AI prompt in "Basic LLM Chain" for different content types
- Production Deployment:
  - Ensure adequate OpenAI API rate limits and quotas
  - Monitor workflow execution times for large documents
  - Set up error notifications for failed extractions