AI Assistant Agent: RAG Input Flow

This workflow automates the ingestion of documents from Google Drive into a vector database for AI-powered search and retrieval. It processes various file formats, extracts metadata, chunks content appropriately, and stores everything in Supabase with proper embeddings for semantic search capabilities.

Purpose

No business context provided yet — add a context.md to enrich this documentation.

How It Works

  1. Trigger Activation: The workflow starts via webhook, manual trigger, or scheduled execution
  2. Configuration Setup: Defines Google Drive folder ID and admin chat ID for notifications
  3. Folder Discovery: Lists all subfolders in the specified Google Drive directory
  4. Human Approval: Sends Telegram messages to admin for folder selection approval
  5. File Processing: Downloads files from approved folders, supporting DOCX, PDF, Markdown, PowerPoint, and JSON formats
  6. Content Extraction: Extracts text content using appropriate parsers for each file type
  7. Document Chunking: Splits large documents into manageable chunks with overlap for better retrieval
  8. Metadata Extraction: Uses AI to extract themes and keywords from document content
  9. Vector Storage: Generates embeddings and stores documents in Supabase vector database
  10. Cleanup Operations: Optionally deletes old documents with human confirmation
  11. Notifications: Sends completion status via Telegram
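The chunking step (step 7) can be sketched as a sliding window with overlap, the pattern typically implemented in an n8n Code node. This is a minimal illustration; the actual chunk size and overlap values used by the workflow are not documented, so the numbers below are assumptions.

```javascript
// Hypothetical sketch of overlap chunking as a Code node might do it.
// chunkSize and overlap are assumed values, not taken from the workflow.
function chunkText(text, chunkSize = 1000, overlap = 200) {
  const chunks = [];
  let start = 0;
  while (start < text.length) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
    // Step forward by less than a full chunk so consecutive
    // chunks share `overlap` characters of context.
    start += chunkSize - overlap;
  }
  return chunks;
}
```

The overlap means a sentence split at a chunk boundary still appears whole in the neighboring chunk, which improves retrieval quality.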

Mermaid Diagram

graph TD
    A[Webhook/Manual Trigger] --> B[Configuration]
    B --> C[List Subfolders]
    C --> D[Choose Folder]
    D --> E{Approval?}
    E -->|Yes| F[List Files]
    E -->|No| G[Loop Over Items1]
    F --> H[Mapping]
    H --> I[File Id List]
    H --> J[Collection Name]
    H --> K[Wait for delete flow]
    I --> L[Merge1]
    J --> L
    L --> M[Confirm Delete Vectors]
    M --> N{Delete Approved?}
    N -->|Yes| O[Delete Old Documents]
    N -->|No| P[Send Declined Message]
    O --> Q[Start Upsert]
    P --> Q
    Q --> K
    K --> R[Loop Over Items]
    R --> S[Download File From Google Drive]
    S --> T[Switch]
    T -->|DOCX| U[Google Docs]
    T -->|PDF| V[Extract from File]
    T -->|JSON_FIN| W[Extract from JSON]
    T -->|Invalid| X[Send Invalid Filetype Message]
    U --> Y[Edit Fields]
    V --> Y
    Y --> Z[Split Out1]
    Z --> AA[Chunking]
    AA --> BB[Extract Meta Data]
    BB --> CC[Merge]
    CC --> DD[Data Loader]
    DD --> EE[Supabase Vector Store]
    EE --> FF[Wait]
    FF --> R
    W --> GG[Split Out Codes]
    W --> HH[Split Out Categories]
    W --> II[Intents]
    GG --> JJ[Upsert Codes]
    HH --> KK[Upsert Categories]
    II --> LL[Split out intents]
    LL --> MM[Upsert Intents]
    JJ --> NN[Merge2]
    KK --> NN
    MM --> NN
    NN --> R

Trigger

  • Webhook: Accessible at /webhook/upsert endpoint for external integrations
  • Manual Trigger: For testing and manual execution
  • Schedule Trigger: Daily at 12:00 PM (currently disabled)
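An external integration can start the workflow by POSTing to the webhook. A minimal client sketch follows; the base URL and the payload fields are assumptions (the workflow documentation does not specify a request body), so check your n8n instance for the exact contract.

```javascript
// Sketch of calling the /webhook/upsert endpoint from an external system.
// N8N_URL and the folder_id body field are assumptions.
const baseUrl = process.env.N8N_URL || "https://n8n.example.com";

function buildUpsertRequest(folderId) {
  return {
    url: `${baseUrl}/webhook/upsert`,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ folder_id: folderId }), // hypothetical field
    },
  };
}

// Usage:
// const { url, options } = buildUpsertRequest("your-drive-folder-id");
// await fetch(url, options);
```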

Nodes Used

  • Webhook: Receives external HTTP requests to start the workflow
  • Manual Trigger: Allows manual workflow execution for testing
  • Schedule Trigger: Automated daily execution (disabled)
  • Google Drive: Lists folders/files and downloads documents
  • Google Docs: Extracts content from Google Docs format
  • Extract from File: Processes PDF and other binary file formats
  • Extract from JSON: Handles JSON financial data files
  • Switch: Routes files based on type/extension
  • Telegram: Sends notifications and approval requests to admin
  • Code: Custom chunking logic for document splitting
  • Information Extractor: AI-powered metadata extraction
  • Supabase Vector Store: Stores documents with embeddings
  • OpenAI Chat Model: Powers AI extraction and processing
  • Embeddings OpenAI: Generates vector embeddings
  • Split In Batches: Processes items in controlled batches
  • Merge: Combines data from multiple sources
  • Set: Data transformation and field mapping
  • If/Switch: Conditional logic and routing

External Services & Credentials Required

  • Google Drive OAuth2: For accessing and downloading files
  • Google Docs OAuth2: For reading Google Docs content
  • OpenAI API: For embeddings and AI-powered metadata extraction
  • Supabase: Vector database for document storage
  • Telegram Bot: For admin notifications and approvals
  • PostgreSQL: For structured data storage (budget codes, categories, intents)

Environment Variables

No specific environment variables are documented. Configuration is handled through the Configuration node with hardcoded values for:

  • folder_id: Google Drive folder ID
  • admin_chat_id: Telegram chat ID for notifications
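The Configuration node is effectively a Set node that pins these two values. A minimal sketch of the shape it produces, with placeholder values and a small validation helper (the helper is illustrative, not part of the workflow):

```javascript
// Sketch of the values the Configuration (Set) node hardcodes.
// Both IDs below are placeholders, not real values.
const config = {
  folder_id: "1aBcDeFgHiJkLmNoP", // Google Drive folder to ingest
  admin_chat_id: "123456789",     // Telegram chat for approvals/notifications
};

// Hypothetical guard: fail fast if either required value is missing.
function validateConfig(cfg) {
  return Boolean(cfg.folder_id && cfg.admin_chat_id);
}
```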

Data Flow

Input:

  • Google Drive folder containing documents (DOCX, PDF, MD, PPTX, JSON)
  • Admin approval decisions via Telegram

Processing:

  • Document content extraction and chunking
  • AI-generated metadata (themes, keywords)
  • Vector embeddings generation

Output:

  • Documents stored in Supabase vector database with metadata
  • Financial codes/categories stored in PostgreSQL
  • Status notifications via Telegram
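The record written to the vector store combines the chunk text, the AI-extracted metadata, and the embedding. The field names below follow the common LangChain/Supabase layout and are an assumption, not read from the workflow:

```javascript
// Illustrative shape of one chunk record as it might be stored in the
// Supabase vector table. Field names are assumptions based on the
// typical LangChain/Supabase schema.
function toVectorRecord(chunk, meta, embedding) {
  return {
    content: chunk,              // the chunk text
    metadata: {
      source: meta.fileName,     // originating Drive file
      themes: meta.themes,       // AI-extracted by Information Extractor
      keywords: meta.keywords,   // AI-extracted by Information Extractor
    },
    embedding,                   // e.g. 1536 floats from OpenAI embeddings
  };
}
```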

Error Handling

  • File Download Errors: Continues processing other files if individual downloads fail
  • Invalid File Types: Sends notification to admin about unsupported files
  • Processing Errors: Continues with remaining items in batch
  • Human Approval Timeout: 15-minute limit for approval decisions
  • Deletion Confirmation: Double approval required for destructive operations
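The "continue on individual failure" behavior described above follows a standard pattern: record the failure and keep iterating rather than aborting the batch. A minimal sketch, with `processOne` standing in for the download/extract/store steps (the function name is hypothetical):

```javascript
// Sketch of continue-on-failure batch processing: one bad file is
// recorded and reported, but does not stop the remaining items.
function processAll(files, processOne) {
  const results = { ok: [], failed: [] };
  for (const file of files) {
    try {
      results.ok.push(processOne(file));
    } catch (err) {
      results.failed.push({ file, error: String(err) }); // report, don't abort
    }
  }
  return results;
}
```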

Known Limitations

  • Hardcoded configuration values require manual updates
  • Limited to specific file formats (DOCX, PDF, MD, PPTX, JSON)
  • Requires manual approval for each folder processing
  • PowerPoint conversion functionality is disabled
  • Financial JSON processing is specific to budget code structure

Related Workflows

No related workflows are mentioned in the current documentation.

Setup Instructions

  1. Import Workflow: Import the JSON workflow definition into your n8n instance

  2. Configure Credentials:

    • Set up Google Drive OAuth2 connection
    • Configure Google Docs OAuth2 access
    • Add OpenAI API key for embeddings and chat
    • Set up Supabase connection with vector store enabled
    • Create Telegram bot and get API credentials
    • Configure PostgreSQL connection for structured data
  3. Update Configuration Node:

    • Set folder_id to your Google Drive folder ID
    • Update admin_chat_id with your Telegram chat ID
  4. Database Setup:

    • Ensure Supabase has documents table configured for vector storage
    • Create PostgreSQL tables: budget_codes, budget_categories, budget_intents
  5. Test Workflow:

    • Use manual trigger to test the complete flow
    • Verify Telegram notifications are working
    • Confirm documents are properly stored in vector database
  6. Production Deployment:

    • Enable webhook endpoint for external integrations
    • Configure schedule trigger if automated processing is needed
    • Set up monitoring for error notifications