AI Assistant Agent: New RAG Input Flow

This workflow automates the synchronization of documents from Google Drive to OpenAI vector stores for AI assistants. It monitors Google Drive folders for changes, processes various document formats (PDF, DOCX, PPTX, Markdown, JSON), and maintains up-to-date vector stores that power AI assistants with the latest knowledge base content.

Purpose

This workflow serves as an enhanced RAG (Retrieval Augmented Generation) ingestion system that automatically keeps AI assistant knowledge bases current. It eliminates the manual overhead of updating vector stores when documents change, ensuring AI assistants always have access to the most recent information from organizational knowledge repositories.

How It Works

  1. Change Detection: The workflow runs every 30 minutes, checking Google Drive for any file modifications using the Drive Changes API
  2. Folder Processing: It scans configured subfolders within a main Google Drive directory, identifying which folders contain updated files
  3. Assistant Matching: For each changed folder, it finds the corresponding OpenAI assistant by matching folder names to assistant names
  4. Vector Store Cleanup: Before adding new content, it removes all existing files from the assistant's vector store to prevent duplicates
  5. File Download: Downloads all non-trashed files from the updated folders, converting Google Docs formats to standard formats (DOCX, PDF)
  6. Format Processing: Routes files through different processing paths based on their format (DOCX, PDF, Markdown, PowerPoint, JSON)
  7. File Upload: Uploads processed files to OpenAI's file storage and attaches them to the appropriate vector stores
  8. Completion Notification: Sends a Telegram message confirming which assistants were updated
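
The change-detection step can be sketched as a paging loop over the Drive Changes API. This is a minimal illustration, not the workflow's actual node logic: `collect_changes`, `live_changes`, and the injected `fetch_page` callable are hypothetical names, and the response fields used (`changes`, `nextPageToken`, `newStartPageToken`, `removed`, `file.trashed`) are those of the Drive v3 `changes.list` response.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: a callable that takes a page token and returns one
# changes.list response body as a dict (e.g. a thin HTTP wrapper).
FetchPage = Callable[[str], Dict]


def collect_changes(start_token: str, fetch_page: FetchPage) -> Tuple[List[Dict], str]:
    """Follow nextPageToken until the last page is reached.

    Returns all change records plus the token to persist for the next run.
    """
    changes: List[Dict] = []
    token = start_token
    while True:
        page = fetch_page(token)
        changes.extend(page.get("changes", []))
        if "nextPageToken" in page:
            token = page["nextPageToken"]
            continue
        # The last page carries newStartPageToken, the cursor for the next run.
        return changes, page.get("newStartPageToken", token)


def live_changes(changes: List[Dict]) -> List[Dict]:
    """Drop changes for removed or trashed files."""
    return [
        c for c in changes
        if not c.get("removed") and not (c.get("file") or {}).get("trashed")
    ]
```

The token returned by `collect_changes` is what would be written back to PostgreSQL so that the next scheduled run only sees newer changes.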

Workflow Diagram

graph TD
    A[Schedule Trigger - Every 30min] --> B[Configuration Setup]
    B --> C[List Google Drive Subfolders]
    C --> D[Get Drive Changes API Token]
    D --> E[Check for Drive Changes]
    E --> F[Match Changes to Folders]
    F --> G[Get OpenAI Assistants]
    G --> H[Match Folders to Assistants]
    H --> I[Get Vector Store Files]
    I --> J[Delete Existing Files]
    J --> K[Download Updated Files]
    K --> L[Process by File Type]
    L --> M{File Format?}
    M -->|DOCX| N[Process DOCX]
    M -->|PDF| O[Process PDF]
    M -->|Markdown| P[Process Markdown]
    M -->|PPTX| Q[Process PowerPoint]
    M -->|JSON| R[Process JSON]
    M -->|XLSX| S[Skip - Not Supported]
    N --> T[Upload to OpenAI]
    O --> T
    P --> T
    Q --> T
    R --> T
    T --> U[Attach to Vector Store]
    U --> V[Send Completion Message]
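
The two matching steps in the diagram can be sketched as plain functions. This is an illustrative sketch under stated assumptions: the function names are hypothetical, each change record is assumed to include `file.parents`, names are compared exactly (the source does not say whether matching is case-sensitive), and the vector store ID is read from the assistant's `tool_resources.file_search.vector_store_ids` field.

```python
from typing import Dict, List, Optional, Tuple


def folders_with_changes(changes: List[Dict], subfolders: List[Dict]) -> List[Dict]:
    """Return the subfolders that are a parent of at least one changed file."""
    changed_parents = {
        parent
        for change in changes
        for parent in (change.get("file") or {}).get("parents", [])
    }
    return [folder for folder in subfolders if folder["id"] in changed_parents]


def first_vector_store_id(assistant: Dict) -> Optional[str]:
    """Read the first file-search vector store ID, if the assistant has one."""
    resources = assistant.get("tool_resources") or {}
    ids = (resources.get("file_search") or {}).get("vector_store_ids") or []
    return ids[0] if ids else None


def match_assistants(
    folders: List[Dict], assistants: List[Dict]
) -> Tuple[List[Tuple[str, str, str]], List[str]]:
    """Pair folders with same-named assistants.

    Returns (folder_id, assistant_id, vector_store_id) triples and the names
    of folders that have no usable assistant.
    """
    by_name = {a["name"]: a for a in assistants if a.get("name")}
    matched, unmatched = [], []
    for folder in folders:
        assistant = by_name.get(folder["name"])
        store_id = first_vector_store_id(assistant) if assistant else None
        if store_id is None:
            unmatched.append(folder["name"])
        else:
            matched.append((folder["id"], assistant["id"], store_id))
    return matched, unmatched
```

Returning the unmatched folder names separately makes it easy to report folders that changed but have no assistant to update.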

Trigger

  • Primary: Schedule Trigger (every 30 minutes)
  • Secondary: Manual trigger for testing
  • Webhook: Available at /webhook/4c6d12fd-c3a4-46db-8c47-2afd4d5e49e8 (currently disabled)

Nodes Used

Node Type | Purpose
Schedule Trigger | Runs workflow every 30 minutes
Google Drive | Lists folders, downloads files, tracks changes
OpenAI (LangChain) | Manages assistants, uploads files, deletes files
HTTP Request | Interacts with OpenAI API for vector store operations
Telegram | Sends notifications and confirmations
PostgreSQL | Stores Drive API change tokens
Filter | Removes trashed files, matches assistants
Switch | Routes files by format type
Merge | Combines data streams
Split Out/Split In Batches | Processes arrays and batches
Set/Edit Fields | Data transformation
Remove Duplicates | Prevents duplicate processing
Code | Adds file extensions for OpenAI compatibility
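
The Switch and Code nodes can be illustrated together with a short sketch. The names `ensure_extension` and `route` are hypothetical, and the mapping from Google MIME types to export extensions is an assumption: the source only states that Google Docs formats are converted to DOCX or PDF, not which type maps to which.

```python
import os

# Processing paths named in this document, keyed by file extension.
ROUTES = {
    ".docx": "docx",
    ".pdf": "pdf",
    ".md": "markdown",
    ".pptx": "powerpoint",
    ".json": "json",
}

# Assumption: which export format each Google-native type is converted to.
EXPORT_EXTENSIONS = {
    "application/vnd.google-apps.document": ".docx",
    "application/vnd.google-apps.presentation": ".pdf",
}


def ensure_extension(name: str, mime_type: str) -> str:
    """Append an extension when a Drive file name lacks a recognized one."""
    ext = os.path.splitext(name)[1].lower()
    if ext in ROUTES or ext == ".xlsx":
        return name
    return name + EXPORT_EXTENSIONS.get(mime_type, "")


def route(name: str) -> str:
    """Pick a processing path; anything without a path (e.g. XLSX) is skipped."""
    ext = os.path.splitext(name)[1].lower()
    return ROUTES.get(ext, "skip")
```

For example, a Google Doc titled `Q3 Report` would become `Q3 Report.docx` and take the DOCX path, while `budget.xlsx` would be skipped.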

External Services & Credentials Required

  • Google Drive OAuth2 API: For accessing and monitoring Google Drive folders
  • OpenAI API: For managing assistants, vector stores, and file uploads
  • Telegram Bot API: For sending status notifications
  • PostgreSQL Database: For storing Drive API change tracking tokens
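
As a rough sketch of the vector store operations made through the HTTP Request node, the two helpers below only build the requests and do not send them. The function names are hypothetical, and the `OpenAI-Beta: assistants=v2` header is an assumption based on the Assistants v2 API; check the current OpenAI API reference before relying on it.

```python
import json
import urllib.request

API_BASE = "https://api.openai.com/v1"


def _headers(api_key: str) -> dict:
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # Assumption: required for the Assistants v2 vector store endpoints.
        "OpenAI-Beta": "assistants=v2",
    }


def detach_file_request(api_key: str, vector_store_id: str, file_id: str) -> urllib.request.Request:
    """Build the request that removes one file from a vector store."""
    url = f"{API_BASE}/vector_stores/{vector_store_id}/files/{file_id}"
    return urllib.request.Request(url, headers=_headers(api_key), method="DELETE")


def attach_file_request(api_key: str, vector_store_id: str, file_id: str) -> urllib.request.Request:
    """Build the request that attaches an already uploaded file to a vector store."""
    url = f"{API_BASE}/vector_stores/{vector_store_id}/files"
    body = json.dumps({"file_id": file_id}).encode("utf-8")
    return urllib.request.Request(url, data=body, headers=_headers(api_key), method="POST")
```

Passing a built request to `urllib.request.urlopen` would send it. Note that removing a file from a vector store does not delete the uploaded file itself; that is a separate call to the files endpoint.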

Environment Variables

The workflow uses hardcoded configuration values that should be moved to environment variables:

  • folder_id: Main Google Drive folder ID (1sfTnMGube-MTyEbchWLQE_Cn-oKTU2G8)
  • admin_chat_id: Telegram chat ID for notifications (5207485332)
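
One way to move these values out of the workflow is to read them from the environment and fail fast when they are missing. This is a minimal sketch; the variable names `FOLDER_ID` and `ADMIN_CHAT_ID` and the `Config` class are hypothetical and not defined by the workflow.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Config:
    folder_id: str       # main Google Drive folder ID
    admin_chat_id: str   # Telegram chat ID for notifications


def load_config(env: Optional[Mapping[str, str]] = None) -> Config:
    """Read configuration from the environment, raising if anything is missing."""
    env = os.environ if env is None else env
    required = ("FOLDER_ID", "ADMIN_CHAT_ID")  # hypothetical variable names
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Config(folder_id=env["FOLDER_ID"], admin_chat_id=env["ADMIN_CHAT_ID"])
```

Accepting an optional mapping keeps the loader easy to test without touching the real process environment.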

Data Flow

Input:

  • Google Drive folder structure with documents
  • OpenAI assistants with existing vector stores

Processing:

  • File metadata (name, modification time, parent folder)
  • Binary file content in various formats
  • Assistant and vector store configurations

Output:

  • Updated vector stores with latest document content
  • Telegram notifications with processing results
  • Updated change tracking tokens in PostgreSQL
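
The Telegram output can be sketched as a request to the Bot API `sendMessage` method. The helper names and the message wording are hypothetical; the workflow's actual message text is not shown in this document. The request is built but not sent.

```python
import json
import urllib.request
from typing import Iterable


def completion_message(updated_assistants: Iterable[str]) -> str:
    """Format a short status line (wording is illustrative only)."""
    names = sorted(updated_assistants)
    if not names:
        return "No assistants needed updating."
    return "Updated assistants: " + ", ".join(names)


def send_message_request(bot_token: str, chat_id: str, text: str) -> urllib.request.Request:
    """Build a Telegram Bot API sendMessage request."""
    url = f"https://api.telegram.org/bot{bot_token}/sendMessage"
    body = json.dumps({"chat_id": chat_id, "text": text}).encode("utf-8")
    return urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}, method="POST"
    )
```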

Error Handling

  • File Download Errors: Continues processing other files if individual downloads fail
  • File Upload Errors: Attempts fallback vector store attachment if primary fails
  • API Rate Limiting: Uses batch processing and wait nodes to manage API calls
  • Format Support: Gracefully skips unsupported formats (XLSX) with notifications
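
The continue-on-failure and batching behavior described above can be sketched as follows. This is an illustration of the pattern rather than the workflow's node configuration; `process_in_batches`, the batch size, and the wait duration are hypothetical.

```python
import time
from typing import Callable, List, Sequence, Tuple


def process_in_batches(
    items: Sequence,
    handler: Callable,
    batch_size: int = 10,
    wait_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[List, List[Tuple[object, str]]]:
    """Run handler over items in batches, pausing between batches.

    A failing item is recorded and skipped so the remaining items are
    still processed.
    """
    succeeded: List = []
    failed: List[Tuple[object, str]] = []
    for start in range(0, len(items), batch_size):
        if start:
            sleep(wait_seconds)  # wait between batches to ease API rate limits
        for item in items[start:start + batch_size]:
            try:
                succeeded.append(handler(item))
            except Exception as exc:  # keep going on per-item failures
                failed.append((item, str(exc)))
    return succeeded, failed
```

The `failed` list gives one place to collect download or upload errors for a final notification instead of aborting the run.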

Known Limitations

  • XLSX Files: Excel files are not currently supported and are skipped
  • Large Files: No explicit file size limits implemented
  • Concurrent Changes: May miss rapid successive changes within the 30-minute interval
  • Manual Approval: Some deletion operations require manual Telegram confirmation (currently disabled)

This appears to be an enhanced version of an existing "RAG Input Flow" workflow, with improvements in duplicate detection, quality filtering, and error handling.

Setup Instructions

  1. Import Workflow: Import the JSON workflow definition into your n8n instance

  2. Configure Credentials:

    • Set up Google Drive OAuth2 API credentials
    • Configure OpenAI API credentials
    • Add Telegram Bot API credentials
    • Set up PostgreSQL database connection
  3. Database Setup: Create the drive_changes table in PostgreSQL:

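    -- Stores the Drive Changes API page token between workflow runs.
    -- PostgreSQL folds the unquoted identifier pageToken to lowercase (pagetoken).
    CREATE TABLE drive_changes (
      created_at TIMESTAMP,
      pageToken TEXT,
      updated_at TIMESTAMP
    );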
    

  4. Update Configuration:

    • Replace hardcoded folder_id with your Google Drive folder ID
    • Update admin_chat_id with your Telegram chat ID
    • Ensure your Google Drive folder contains subfolders named to match your OpenAI assistants
  5. Assistant Setup:

    • Create OpenAI assistants with names matching your Google Drive subfolder names
    • Ensure each assistant has a vector store configured for file search
  6. Test: Run the manual trigger to verify the workflow processes your files correctly

  7. Activate: Enable the schedule trigger for automated operation