
AI Trainer: Course Converter

This workflow transforms educational course documents (PDFs) into structured JSON format suitable for digital learning platforms. It uses AI to intelligently parse course content, extract learning objectives, identify interactive elements, and organize everything into a hierarchical course structure with units and blocks.

Purpose

No business context provided yet — add a context.md to enrich this documentation.

How It Works

  1. Document Upload: A PDF course document is received via webhook
  2. Text Extraction: The PDF content is extracted and converted to plain text
  3. Course Structure Parsing: The document is analyzed to identify course metadata, learning objectives, and individual EDU sections
  4. Section Classification: Each section is classified by type (text, audio, knowledge check, response, etc.) based on content markers
  5. AI Content Extraction: OpenAI's GPT-4.1-mini model processes each section to extract structured content according to predefined schemas
  6. Content Block Assembly: Extracted content is formatted into standardized content blocks with metadata
  7. Course Assembly: All blocks are grouped into units and assembled into the final course JSON structure
  8. Response Delivery: The structured course data is returned via webhook response
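Steps 3–4 above hinge on recognizing the content markers described under Data Flow. A minimal sketch of what such a classifier could look like (the marker strings follow the conventions this document describes; the function name and exact matching logic are illustrative, not the workflow's actual Code node):

```javascript
// Illustrative sketch: classify a section by its content markers.
// Markers ("AUDIO ONLY", "CR:", "TEXT:") follow this document's
// conventions; the real Code node may use different rules.
function classifySection(sectionText) {
  if (/^AUDIO ONLY/m.test(sectionText)) return 'audio';
  if (/^CR:/m.test(sectionText)) return 'knowledge_check';
  if (/^TEXT:/m.test(sectionText)) return 'text';
  return 'response'; // no marker: treat as an open-ended response block
}

console.log(classifySection('TEXT: Welcome to the unit.')); // → "text"
console.log(classifySection('What is X?\nCR: the answer')); // → "knowledge_check"
```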

Workflow Diagram

graph TD
    A[Webhook] --> B[Extract from File]
    B --> C[Set Course Text]
    C --> D[Parse Course Structure]
    D --> E[Split Sections]
    E --> F[Classify Sections]
    F --> G[Basic LLM Chain]
    G --> H[Code]
    H --> I[Aggregate]
    I --> J[Assemble Final JSON]
    J --> K[Format Output]
    K --> L[Respond to Webhook]

    M[OpenAI Chat Model] --> G
    N[Structured Output Parser] --> G

Trigger

Webhook (POST): The workflow is triggered by a POST request to the webhook endpoint 1a6c5fc2-2cac-410f-898b-289e638e25d9. The request should include a PDF file in the body.

Nodes Used

  • Webhook: Receives incoming PDF documents via HTTP POST
  • Extract from File: Converts PDF content to plain text
  • Set: Stores extracted text for processing
  • Code: Custom JavaScript for parsing course structure and classifying sections
  • Item Lists: Splits course sections into individual items for processing
  • Basic LLM Chain: Orchestrates AI content extraction using structured prompts
  • OpenAI Chat Model: Provides the GPT-4.1-mini language model for content analysis
  • Structured Output Parser: Ensures AI responses conform to the predefined JSON schema
  • Aggregate: Collects processed content blocks back into arrays
  • Respond to Webhook: Returns the final structured course JSON

External Services & Credentials Required

  • OpenAI API: Required for content extraction and parsing
    • Credential name: "OpenAI Assistants API"
    • Model used: GPT-4.1-mini
    • Permissions needed: Chat completions access

Environment Variables

No environment variables are explicitly configured in this workflow.

Data Flow

Input:

  • PDF course document uploaded via webhook
  • Document should contain EDU-formatted sections (e.g., EDU_6_1_1_INTRO)
  • Content markers like "TEXT:", "AUDIO ONLY", and "CR:" for correct responses
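The EDU section IDs appear to encode the unit/block hierarchy (e.g., EDU_6_1_1_INTRO belongs to unit EDU_6_1). A hedged sketch of parsing such an ID; the pattern is inferred from the single example above, not taken from the workflow's actual Code node:

```javascript
// Sketch: split an EDU section ID into unit and block parts.
// The convention EDU_<course>_<unit>_<block>[_LABEL] is an assumption
// inferred from "EDU_6_1_1_INTRO"; verify it against your documents.
function parseEduId(id) {
  const m = id.match(/^EDU_(\d+)_(\d+)_(\d+)(?:_([A-Z]+))?$/);
  if (!m) return null; // not a valid EDU section ID
  return {
    unitId: `EDU_${m[1]}_${m[2]}`,
    blockIndex: Number(m[3]),
    label: m[4] || null,
  };
}

console.log(parseEduId('EDU_6_1_1_INTRO'));
// → { unitId: 'EDU_6_1', blockIndex: 1, label: 'INTRO' }
```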

Output:

{
  "final_course_json": {
    "title": "Course Title",
    "description": "Course description",
    "course_code": "EDU",
    "units": [
      {
        "id": "EDU_X_Y",
        "title": "Unit Title",
        "description": "Unit description",
        "order_index": 1,
        "learning_objectives": ["objective1", "objective2"],
        "blocks": [
          {
            "id": "EDU_X_Y_Z",
            "type": "text|audio|knowledge_check|response",
            "order_index": 1,
            "content": {
              "text": "Content text",
              "audio_script": "Audio narration",
              "question": "Question text",
              "correct_responses": ["answer1", "answer2"],
              "feedback_correct": "Positive feedback",
              "media_reference": "ASSET_ID"
            },
            "metadata": {
              "response_type": "open_ended|specific_answer",
              "asset_references": ["ASSET_ID"]
            }
          }
        ]
      }
    ]
  },
  "parsing_statistics": {
    "total_units": 1,
    "total_blocks": 33,
    "block_types": ["text", "audio"],
    "items_processed": 1
  }
}
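Because the response carries its own parsing_statistics, a client can sanity-check the payload by comparing those counts against the assembled structure. A minimal sketch (field names follow the example output above):

```javascript
// Sketch: cross-check parsing_statistics against the course structure.
function checkStats(response) {
  const units = response.final_course_json.units || [];
  const blocks = units.flatMap(u => u.blocks || []);
  const stats = response.parsing_statistics;
  return {
    unitsMatch: units.length === stats.total_units,
    blocksMatch: blocks.length === stats.total_blocks,
  };
}
```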

Error Handling

The workflow includes basic error handling:

  • JSON parsing fallbacks in case AI responses are malformed
  • Safe property access with default values for missing data
  • Validation of EDU ID format patterns
  • Graceful handling of missing course metadata
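The JSON-parsing fallback mentioned above can be sketched as follows (illustrative only; the default block shape mirrors the output example in this document, not the workflow's actual Code node):

```javascript
// Sketch: tolerant parsing of an AI response with a safe default.
// The fallback block shape follows this document's output example.
function safeParseBlock(raw) {
  try {
    const parsed = JSON.parse(raw);
    return {
      type: parsed.type || 'text',
      content: parsed.content || {},
      metadata: parsed.metadata || {},
    };
  } catch (e) {
    // Malformed AI output: fall back to an empty text block.
    return { type: 'text', content: {}, metadata: {} };
  }
}
```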

Known Limitations

  • Currently processes only PDF documents
  • Relies on specific EDU section formatting conventions
  • Limited to 10 sections when the Limit node is enabled
  • AI extraction quality depends on document structure and clarity
  • No validation of final JSON structure completeness

Related Workflows

No related workflows are mentioned in the current context.

Setup Instructions

  1. Import Workflow: Import the workflow JSON into your n8n instance

  2. Configure OpenAI Credentials:

    • Create an OpenAI API credential named "OpenAI Assistants API"
    • Add your OpenAI API key with access to the GPT-4.1-mini model
  3. Webhook Configuration:

    • The webhook URL will be: https://your-n8n-domain/webhook/1a6c5fc2-2cac-410f-898b-289e638e25d9
    • Configure your client application to POST PDF files to this endpoint
  4. Test the Workflow:

    • Upload a sample EDU-formatted course PDF
    • Verify the structured JSON output contains expected course data
  5. Optional Adjustments:

    • Disable the "Limit" node to process all sections (currently limited to 10)
    • Modify the content extraction schema in "Structured Output Parser" if needed
    • Adjust the AI prompt in "Basic LLM Chain" for different content types
  6. Production Deployment:

    • Ensure adequate OpenAI API rate limits and quotas
    • Monitor workflow execution times for large documents
    • Set up error notifications for failed extractions