
AI Trainer: Course Converter

This workflow transforms educational course documents (PDFs) into structured JSON format suitable for digital learning platforms. It uses AI to intelligently parse course content, extract learning objectives, identify interactive elements, and organize everything into a hierarchical course structure with units and blocks.

Purpose

No business context provided yet — add a context.md to enrich this documentation.

How It Works

  1. Document Upload: A PDF course document is received via webhook
  2. Text Extraction: The PDF content is extracted and converted to plain text
  3. Course Structure Parsing: The document is analyzed to identify course metadata, learning objectives, and individual EDU sections
  4. Section Classification: Each section is classified by type (text, audio, knowledge check, response, etc.) based on content markers
  5. AI Content Extraction: OpenAI's GPT-4.1-mini model processes each section to extract structured content according to predefined schemas
  6. Content Block Assembly: Extracted content is formatted into standardized content blocks with metadata
  7. Course Assembly: All blocks are grouped into units and assembled into the final course JSON structure
  8. Response Delivery: The structured course data is returned via webhook response
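Steps 3–4 above hinge on recognizing the content markers described under Data Flow. A minimal sketch of what such a classifier could look like (the marker strings follow the conventions this document describes; the function name and exact matching logic are illustrative, not the workflow's actual Code node):

```javascript
// Illustrative sketch: classify a section by its content markers.
// Markers ("AUDIO ONLY", "CR:", "TEXT:") follow this document's
// conventions; the real Code node may use different rules.
function classifySection(sectionText) {
  if (/^AUDIO ONLY/m.test(sectionText)) return 'audio';
  if (/^CR:/m.test(sectionText)) return 'knowledge_check';
  if (/^TEXT:/m.test(sectionText)) return 'text';
  return 'response'; // no marker: treat as an open-ended response block
}

console.log(classifySection('TEXT: Welcome to the unit.')); // → "text"
console.log(classifySection('What is X?\nCR: the answer')); // → "knowledge_check"
```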

Workflow Diagram

graph TD
    A[Webhook] --> B[Extract from File]
    B --> C[Set Course Text]
    C --> D[Parse Course Structure]
    D --> E[Split Sections]
    E --> F[Classify Sections]
    F --> G[Basic LLM Chain]
    G --> H[Code]
    H --> I[Aggregate]
    I --> J[Assemble Final JSON]
    J --> K[Format Output]
    K --> L[Respond to Webhook]

    M[OpenAI Chat Model] --> G
    N[Structured Output Parser] --> G

Trigger

Webhook (POST): The workflow is triggered by a POST request to the webhook endpoint 1a6c5fc2-2cac-410f-898b-289e638e25d9. The request should include a PDF file in the body.

Nodes Used

  • Webhook: Receives incoming PDF documents via HTTP POST
  • Extract from File: Converts PDF content to plain text
  • Set: Stores extracted text for processing
  • Code: Custom JavaScript for parsing course structure and classifying sections
  • Item Lists: Splits course sections into individual items for processing
  • Basic LLM Chain: Orchestrates AI content extraction using structured prompts
  • OpenAI Chat Model: Provides the GPT-4.1-mini language model for content analysis
  • Structured Output Parser: Ensures AI responses conform to the predefined JSON schema
  • Aggregate: Collects processed content blocks back into arrays
  • Respond to Webhook: Returns the final structured course JSON

External Services & Credentials Required

  • OpenAI API: Required for content extraction and parsing
    • Credential name: "OpenAI Assistants API"
    • Model used: GPT-4.1-mini
    • Permissions needed: Chat completions access

Environment Variables

No environment variables are explicitly configured in this workflow.

Data Flow

Input:

  • PDF course document uploaded via webhook
  • Document should contain EDU-formatted sections (e.g., EDU_6_1_1_INTRO)
  • Content markers like "TEXT:", "AUDIO ONLY", and "CR:" for correct responses
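The EDU section IDs appear to encode the unit/block hierarchy (e.g., EDU_6_1_1_INTRO belongs to unit EDU_6_1). A hedged sketch of parsing such an ID; the pattern is inferred from the single example above, not taken from the workflow's actual Code node:

```javascript
// Sketch: split an EDU section ID into unit and block parts.
// The convention EDU_<course>_<unit>_<block>[_LABEL] is an assumption
// inferred from "EDU_6_1_1_INTRO"; verify it against your documents.
function parseEduId(id) {
  const m = id.match(/^EDU_(\d+)_(\d+)_(\d+)(?:_([A-Z]+))?$/);
  if (!m) return null; // not a valid EDU section ID
  return {
    unitId: `EDU_${m[1]}_${m[2]}`,
    blockIndex: Number(m[3]),
    label: m[4] || null,
  };
}

console.log(parseEduId('EDU_6_1_1_INTRO'));
// → { unitId: 'EDU_6_1', blockIndex: 1, label: 'INTRO' }
```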

Output:

{
  "final_course_json": {
    "title": "Course Title",
    "description": "Course description",
    "course_code": "EDU",
    "units": [
      {
        "id": "EDU_X_Y",
        "title": "Unit Title",
        "description": "Unit description",
        "order_index": 1,
        "learning_objectives": ["objective1", "objective2"],
        "blocks": [
          {
            "id": "EDU_X_Y_Z",
            "type": "text|audio|knowledge_check|response",
            "order_index": 1,
            "content": {
              "text": "Content text",
              "audio_script": "Audio narration",
              "question": "Question text",
              "correct_responses": ["answer1", "answer2"],
              "feedback_correct": "Positive feedback",
              "media_reference": "ASSET_ID"
            },
            "metadata": {
              "response_type": "open_ended|specific_answer",
              "asset_references": ["ASSET_ID"]
            }
          }
        ]
      }
    ]
  },
  "parsing_statistics": {
    "total_units": 1,
    "total_blocks": 33,
    "block_types": ["text", "audio"],
    "items_processed": 1
  }
}
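Because the response carries its own parsing_statistics, a client can sanity-check the payload by comparing those counts against the assembled structure. A minimal sketch (field names follow the example output above):

```javascript
// Sketch: cross-check parsing_statistics against the course structure.
function checkStats(response) {
  const units = response.final_course_json.units || [];
  const blocks = units.flatMap(u => u.blocks || []);
  const stats = response.parsing_statistics;
  return {
    unitsMatch: units.length === stats.total_units,
    blocksMatch: blocks.length === stats.total_blocks,
  };
}
```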

Error Handling

The workflow includes basic error handling:

  • JSON parsing fallbacks in case AI responses are malformed
  • Safe property access with default values for missing data
  • Validation of EDU ID format patterns
  • Graceful handling of missing course metadata
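The JSON-parsing fallback mentioned above can be sketched as follows (illustrative only; the default block shape mirrors the output example in this document, not the workflow's actual Code node):

```javascript
// Sketch: tolerant parsing of an AI response with a safe default.
// The fallback block shape follows this document's output example.
function safeParseBlock(raw) {
  try {
    const parsed = JSON.parse(raw);
    return {
      type: parsed.type || 'text',
      content: parsed.content || {},
      metadata: parsed.metadata || {},
    };
  } catch (e) {
    // Malformed AI output: fall back to an empty text block.
    return { type: 'text', content: {}, metadata: {} };
  }
}
```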

Known Limitations

  • Currently processes only PDF documents
  • Relies on specific EDU section formatting conventions
  • Limited to 10 sections when the Limit node is enabled
  • AI extraction quality depends on document structure and clarity
  • No validation of final JSON structure completeness

Related Workflows

No related workflows are mentioned in the current context.

Setup Instructions

  1. Import Workflow: Import the workflow JSON into your n8n instance

  2. Configure OpenAI Credentials:

    • Create an OpenAI API credential named "OpenAI Assistants API"
    • Add your OpenAI API key with access to the GPT-4.1-mini model
  3. Webhook Configuration:

    • The webhook URL will be: https://your-n8n-domain/webhook/1a6c5fc2-2cac-410f-898b-289e638e25d9
    • Configure your client application to POST PDF files to this endpoint
  4. Test the Workflow:

    • Upload a sample EDU-formatted course PDF
    • Verify the structured JSON output contains expected course data
  5. Optional Adjustments:

    • Disable the "Limit" node to process all sections (currently limited to 10)
    • Modify the content extraction schema in "Structured Output Parser" if needed
    • Adjust the AI prompt in "Basic LLM Chain" for different content types
  6. Production Deployment:

    • Ensure adequate OpenAI API rate limits and quotas
    • Monitor workflow execution times for large documents
    • Set up error notifications for failed extractions