AI Trainer: Course Converter

This workflow transforms raw educational course documents into structured JSON format using AI-powered content parsing. It intelligently extracts learning objectives, instructional text, audio scripts, questions, answers, and feedback from course materials, making them ready for digital learning platforms.

Purpose

This workflow serves educational content creators and learning management system administrators who need to convert traditional course documents into structured, machine-readable formats. It automates the tedious process of manually extracting and organizing course content, enabling rapid digitization of educational materials.

How It Works

  1. Document Reception: The workflow receives course documents via webhook, either as structured section data or PDF files
  2. Content Extraction: PDF documents are processed to extract raw text content
  3. Structure Parsing: The system identifies course metadata (title, objectives) and splits content into individual EDU sections
  4. Section Classification: Each section is analyzed to determine its type (learning objectives, text, audio, knowledge check, etc.)
  5. AI Content Analysis: OpenAI's GPT-4.1-mini processes each section to extract structured content according to predefined schemas
  6. Content Structuring: Extracted content is organized into standardized content blocks with proper metadata
  7. Final Assembly: All sections are combined into a complete course JSON structure with units and ordered blocks
  8. Response Delivery: The structured course data is returned via webhook response
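The classification step (step 4) can be sketched as a marker-based heuristic. The marker strings below are illustrative assumptions, not the workflow's actual rules:

```javascript
// Illustrative sketch of step 4 (Section Classification).
// The marker strings are assumptions; the actual Code node may use different rules.
function classifySection(rawText) {
  const text = rawText.toLowerCase();
  if (text.includes('learning objectives')) return 'learning_objectives';
  if (text.includes('audio script') || text.includes('narration')) return 'audio';
  if (text.includes('knowledge check') || text.includes('question:')) return 'knowledge_check';
  if (text.includes('your response') || text.includes('reflect on')) return 'response';
  return 'text'; // default block type for plain instructional content
}
```

Sections that match no marker fall through to the plain `text` type, which mirrors the workflow's default behavior of treating unmarked content as instructional text.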

Workflow Diagram

graph TD
    A[Webhook Trigger] --> B[Extract from File]
    B --> C[Set Course Text]
    C --> D[Parse Course Structure]
    D --> E[Split Sections]
    E --> F[Classify Sections]
    F --> G[Basic LLM Chain]
    G --> H[Structure Content Blocks]
    H --> I[Code Processing]
    I --> J[Aggregate Results]
    J --> K[Assemble Final JSON]
    K --> L[Format Output]
    L --> M[Respond to Webhook]

    N[OpenAI Chat Model] --> G
    O[Structured Output Parser] --> G

    style A fill:#e1f5fe
    style M fill:#e8f5e8
    style G fill:#fff3e0

Trigger

Webhook: POST request to /webhook/1a6c5fc2-2cac-410f-898b-289e638e25d9

The workflow can be triggered with either:

  • Structured course section data (JSON)
  • PDF course documents (multipart/form-data)
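A minimal sketch of building a JSON trigger request (the base URL is a placeholder for your n8n instance; the payload field values are illustrative):

```javascript
// Build a POST request for the course-converter webhook.
// The base URL is a placeholder; the payload follows the documented input schema.
function buildWebhookRequest(baseUrl, section) {
  return {
    url: `${baseUrl}/webhook/1a6c5fc2-2cac-410f-898b-289e638e25d9`,
    options: {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(section),
    },
  };
}

const req = buildWebhookRequest('https://your-n8n-instance.com', {
  id: 'EDU_1_1_intro',               // illustrative section id
  blockType: 'text',
  metadata: { hasAudio: false, requiresResponse: false, isIntro: true },
  content: 'Welcome to the course...',
});
// Send with: fetch(req.url, req.options)
```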

Nodes Used

| Node Type | Purpose |
| --- | --- |
| Webhook | Receives incoming course documents and section data |
| Extract from File | Processes PDF documents to extract text content |
| Set | Assigns extracted text to variables for processing |
| Code | Parses course structure, classifies sections, and assembles final output |
| Item Lists | Splits course sections into individual items for processing |
| Basic LLM Chain | Analyzes content using AI to extract structured information |
| Structured Output Parser | Enforces JSON schema compliance on AI responses |
| Aggregate | Combines processed sections back into a single dataset |
| Respond to Webhook | Returns the structured course JSON to the caller |

External Services & Credentials Required

  • OpenAI API: GPT-4.1-mini model for content analysis
    • Credential: openAiApi (ID: CrM3JP0wordbyyCE, Name: Waringa)
    • Required for AI-powered content extraction and structuring

Environment Variables

No environment variables are explicitly configured in this workflow. All configuration is handled through node parameters and credentials.

Data Flow

Input

Structured Section Data:

{
  "id": "section_id",
  "blockType": "learning_objectives|text|audio|knowledge_check|response|conversation_starter|ai_prompt",
  "metadata": {
    "hasAudio": true/false,
    "requiresResponse": true/false,
    "isIntro": true/false
  },
  "content": "raw course text content..."
}

PDF Document: Binary file upload via multipart form data
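Incoming section payloads can be checked against the shape above. A hedged sketch (field names are taken from the documented schema; the specific validation rules are assumptions):

```javascript
// Validate an incoming section object against the documented input shape.
// Returns a list of problems; an empty list means the payload looks usable.
const BLOCK_TYPES = [
  'learning_objectives', 'text', 'audio', 'knowledge_check',
  'response', 'conversation_starter', 'ai_prompt',
];

function validateSection(section) {
  const errors = [];
  if (typeof section.id !== 'string' || section.id.length === 0) {
    errors.push('id must be a non-empty string');
  }
  if (!BLOCK_TYPES.includes(section.blockType)) {
    errors.push(`blockType must be one of: ${BLOCK_TYPES.join(', ')}`);
  }
  if (typeof section.content !== 'string') {
    errors.push('content must be a string of raw course text');
  }
  return errors;
}
```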

Output

{
  "final_course_json": {
    "title": "Course Title",
    "description": "Course description",
    "course_code": "EDU",
    "units": [
      {
        "id": "EDU_X_Y",
        "title": "Unit Title",
        "description": "Unit description",
        "order_index": 1,
        "learning_objectives": ["objective1", "objective2"],
        "blocks": [
          {
            "id": "section_id",
            "type": "text|audio|knowledge_check|response",
            "order_index": 1,
            "content": {
              "text": "instructional content",
              "audio_script": "narration script",
              "question": "assessment question",
              "correct_responses": ["answer1", "answer2"],
              "feedback_correct": "positive feedback",
              "feedback_incorrect": "corrective feedback"
            },
            "metadata": {
              "response_type": "open_ended|specific_answer",
              "asset_references": ["asset_id"]
            }
          }
        ]
      }
    ]
  },
  "parsing_statistics": {
    "total_units": 1,
    "total_blocks": 33,
    "block_types": ["text", "audio"],
    "unit_ids": ["EDU_1_1"],
    "items_processed": 1
  }
}
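The `parsing_statistics` object can be derived entirely from the assembled course. A sketch assuming the final JSON shape shown above (the counting logic itself is an assumption about the Code node):

```javascript
// Derive parsing_statistics from an assembled final_course_json.
// Assumes the unit/block shape documented above.
function computeStatistics(course, itemsProcessed) {
  const units = course.units || [];
  const blocks = units.flatMap((u) => u.blocks || []);
  return {
    total_units: units.length,
    total_blocks: blocks.length,
    block_types: [...new Set(blocks.map((b) => b.type))], // unique types only
    unit_ids: units.map((u) => u.id),
    items_processed: itemsProcessed,
  };
}
```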

Error Handling

The workflow includes basic error handling:

  • JSON Parsing Fallback: If the AI response is malformed JSON, the workflow falls back to plain text extraction
  • ID Validation: Validates the EDU section ID format and provides error messages for invalid formats
  • Safe Property Access: Uses fallback values when expected data properties are missing
  • Content Type Detection: Automatically determines block types based on content markers and patterns
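The JSON parsing fallback might look like the following sketch (the fallback field name `text` is an assumption, chosen to match the output schema):

```javascript
// Parse an AI response as JSON; on malformed output, fall back to
// wrapping the raw string as plain text (field name is illustrative).
function parseAiResponse(raw) {
  try {
    return JSON.parse(raw);
  } catch (e) {
    return { text: raw.trim() };
  }
}
```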

Known Limitations

  • Currently processes a maximum of 10 sections per execution (Limit node is disabled but present)
  • Hardcoded payload node suggests the workflow may be in development/testing mode
  • Token splitter is configured but not actively used in the current flow
  • Some processing paths appear to be disabled or bypassed

Related Workflows

No related workflows are mentioned in the current configuration.

Setup Instructions

  1. Import Workflow: Import the JSON workflow definition into your n8n instance

  2. Configure OpenAI Credentials:

    • Create an OpenAI API credential in n8n
    • Add your OpenAI API key
    • Update the credential reference in the "OpenAI Chat Model" node
  3. Test Webhook Endpoint:

    • Note the webhook URL: https://your-n8n-instance.com/webhook/1a6c5fc2-2cac-410f-898b-289e638e25d9
    • Test with sample course section data or PDF upload
  4. Customize Content Schema (Optional):

    • Modify the "Structured Output Parser" schema to match your course content requirements
    • Update the AI prompt in "Basic LLM Chain" to reflect any schema changes
  5. Enable Processing Limits (Optional):

    • Enable the "Limit" node if you want to restrict the number of sections processed per execution
    • Adjust the limit value as needed
  6. Production Deployment:

    • Remove or disable the "Hardcoded payload" node
    • Ensure the main processing path through "Extract from File" is properly connected
    • Test with real course documents to validate output quality

The workflow is ready to use once OpenAI credentials are configured and the webhook endpoint is accessible.