AI Assistant Agent: RAG Input Flow

This workflow serves as the primary knowledge ingestion pipeline for an AI assistant system, automatically processing documents from Google Drive, extracting and chunking content, generating embeddings, and storing them in a vector database to power intelligent document retrieval and question-answering capabilities.

Purpose

This workflow enables organizations to build and maintain a searchable knowledge base by automatically processing documents stored in Google Drive. The system transforms various document formats (DOCX, PDF, Google Docs, JSON) into searchable vector embeddings that can be queried by AI assistants to provide accurate, context-aware responses based on organizational knowledge.

How It Works

  1. Trigger Activation: The workflow starts via manual trigger, webhook call, or scheduled execution
  2. Configuration Setup: Loads Google Drive folder ID and admin notification settings
  3. Folder Discovery: Scans the configured Google Drive folder for subfolders containing documents
  4. Human Approval: Requests admin approval via Telegram for each folder to be processed
  5. Document Retrieval: Downloads files from approved folders, supporting multiple formats
  6. Content Extraction: Extracts text content based on file type (Google Docs API for .docx, PDF extraction, etc.)
  7. Text Processing: Chunks documents into manageable segments with configurable overlap
  8. Metadata Extraction: Uses AI to extract themes and keywords from document content
  9. Vector Generation: Creates embeddings using OpenAI's embedding model
  10. Database Storage: Stores vectors and metadata in Supabase vector database
  11. Cleanup Operations: Optionally deletes old document versions with human approval
  12. Notification: Sends completion status via Telegram
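The chunking step (7) can be sketched as follows. This is a minimal illustration, not the workflow's actual Code-node implementation: it uses the 3000/200 chunk-size and overlap values documented under Known Limitations, and approximates tokens with whitespace-separated words, which is an assumption.

```python
def chunk_text(text, chunk_size=3000, overlap=200):
    """Split text into overlapping chunks.

    chunk_size and overlap mirror the values documented under Known
    Limitations. Tokens are approximated here by whitespace-separated
    words; the workflow's Code node may tokenize differently.
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap  # advance by chunk size minus overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # last chunk reached the end of the document
    return chunks
```

Each chunk shares its last 200 "tokens" with the start of the next chunk, so sentences straddling a chunk boundary still appear intact in at least one chunk.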

Workflow Diagram

graph TD
    A[Manual Trigger/Webhook] --> B[Configuration]
    B --> C[List Subfolders]
    C --> D[Split Subfolders]
    D --> E[Loop Over Items1]
    E --> F[Choose Folder]
    F --> G[If2 - Approval Check]
    G -->|Approved| H[List Files]
    G -->|Declined| E
    H --> I[Mapping]
    I --> J[Collection Name]
    I --> K[File Id List]
    I --> L[Wait for delete flow]
    K --> M[Merge1]
    J --> M
    M --> N[Confirm Delete Vectors]
    N --> O[If - Delete Approval]
    O -->|Approved| P[Delete Old Documents]
    O -->|Declined| Q[Send Declined Message]
    P --> R[Start Upsert]
    Q --> R
    R --> L
    L --> S[Loop Over Items]
    S --> T[Download File From Google Drive]
    T --> U[Switch - File Type]
    U -->|DOCX/MD| V[Google Docs]
    U -->|PDF| W[Extract from File]
    U -->|JSON_FIN| X[Extract from JSON]
    U -->|Invalid| Y[Send Invalid Filetype Message]
    V --> Z[Edit Fields]
    W --> Z
    Z --> AA[Split Out1]
    AA --> BB[Chunking]
    BB --> CC[Extract Meta Data]
    CC --> DD[3.5-turbo]
    DD --> EE[Merge]
    EE --> FF[Data Loader]
    FF --> GG[Supabase Vector Store]
    GG --> HH[Wait]
    HH --> S
    X --> II[Split Out Codes]
    X --> JJ[Split Out Categories]
    X --> KK[Intents]
    II --> LL[Upsert Codes]
    JJ --> MM[Upsert Categories]
    KK --> NN[Split out intents]
    NN --> OO[Upsert Intents]
    LL --> PP[Merge2]
    MM --> PP
    OO --> PP
    PP --> S
    Y --> S
    S -->|Complete| QQ[Send Completed Message]

Triggers

  • Manual Trigger: Click "Test workflow" button for manual execution
  • Webhook: HTTP POST to /webhook/upsert endpoint
  • Schedule Trigger: Daily execution at 12:00 PM (currently disabled)
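An external system can fire the webhook trigger with a plain HTTP POST. The sketch below builds such a request with the standard library; the host is a placeholder and the payload fields are assumptions, since the webhook's expected body is not documented.

```python
import json
import urllib.request

# Hypothetical payload; the webhook's expected body is not documented,
# so this field name is an assumption.
payload = {"folder_id": "YOUR_FOLDER_ID"}

req = urllib.request.Request(
    url="https://your-n8n-host/webhook/upsert",  # host is a placeholder
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
    method="POST",
)
# urllib.request.urlopen(req)  # uncomment to actually fire the trigger
```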

Nodes Used

  • Manual Trigger: Starts the workflow for testing
  • Webhook: Accepts external trigger requests
  • Set (Configuration): Stores the Google Drive folder ID and admin chat ID
  • Google Drive: Lists folders/files and downloads documents
  • Split Out/Split In Batches: Processes multiple items iteratively
  • Switch: Routes files based on type (DOCX, PDF, JSON, etc.)
  • Google Docs: Extracts content from Google Docs
  • Extract from File: Processes PDF and other file formats
  • Code (Chunking): Splits documents into overlapping text chunks
  • Information Extractor: Uses AI to extract metadata and keywords
  • OpenAI Chat Model: Powers metadata extraction
  • Embeddings OpenAI: Generates vector embeddings
  • Supabase Vector Store: Stores documents in the vector database
  • Postgres: Manages budget codes and categories
  • Telegram: Sends notifications and approval requests
  • If/Merge: Controls workflow logic and data combination
  • Wait: Pauses execution between batch processing

External Services & Credentials Required

  • Google Drive OAuth2: Access to Google Drive folders and files
  • Google Docs OAuth2: Read Google Docs content
  • OpenAI API: Generate embeddings and power AI extraction
  • Supabase: Vector database storage
  • PostgreSQL: Structured data storage for budget codes
  • Telegram Bot: User notifications and approvals

Environment Variables

Configuration is handled through the "Configuration" node with hardcoded values:

  • folder_id: Google Drive folder ID (currently: "1sfTnMGube-MTyEbchWLQE_Cn-oKTU2G8")
  • admin_chat_id: Telegram chat ID for notifications (currently: "5207485332")

Data Flow

Input:

  • Google Drive folder containing documents (DOCX, PDF, Google Docs, JSON)
  • Webhook requests or manual triggers

Processing:

  • Document content extraction and text chunking
  • AI-powered metadata extraction (themes, keywords)
  • Vector embedding generation
  • Structured data parsing for financial codes

Output:

  • Vector embeddings stored in the Supabase documents table
  • Budget codes/categories in PostgreSQL tables
  • Telegram notifications with processing status
  • Metadata including file IDs, themes, keywords, and chunk information
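The outputs above imply a per-chunk record shape along these lines. Field names here are illustrative assumptions, not the workflow's exact schema:

```python
# Illustrative shape of one stored chunk record; field names are
# assumptions based on the outputs listed above, not the exact schema.
record = {
    "content": "chunk text goes here",
    "embedding": [0.012, -0.034],  # truncated; OpenAI embeddings are typically 1536-dimensional
    "metadata": {
        "file_id": "google-drive-file-id",
        "theme": "budgeting",
        "keywords": ["budget", "codes"],
        "chunk_index": 0,
    },
}
```

Keeping the file ID in the metadata is what makes the later "delete old document versions" step possible: stale chunks can be located by their source file.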

Error Handling

  • File Download Errors: Continues processing other files if individual downloads fail
  • Invalid File Types: Sends notification and skips unsupported files
  • Processing Failures: Uses "Continue on Error" for batch operations
  • Human Approval: Requires explicit confirmation for destructive operations (deletions)
  • Timeout Protection: 15-minute limit on approval requests
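The 15-minute approval window behaves like a poll-with-deadline loop. This is a sketch of the pattern, not n8n's internal implementation; `poll` stands in for checking the Telegram conversation:

```python
import time

def wait_for_approval(poll, timeout_s=15 * 60, interval_s=5):
    """Poll an approval source until it returns a decision or the
    timeout window elapses.

    `poll` is a stand-in for checking the Telegram conversation and
    should return "approved", "declined", or None (no answer yet).
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        decision = poll()
        if decision in ("approved", "declined"):
            return decision
        time.sleep(interval_s)
    return "timeout"  # treat silence as a non-approval
```

Treating a timeout as a non-approval keeps destructive operations (vector deletions) safe by default.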

Known Limitations

  • Hardcoded configuration values require manual updates
  • Limited to specific Google Drive folder structure
  • Requires human approval for each folder before processing
  • PowerPoint files need conversion to PDF (currently disabled)
  • No automatic retry mechanism for failed operations
  • Chunk size fixed at 3000 tokens with 200-token overlap

Integration Points

This workflow likely connects to:

  • AI Assistant query/response workflows that use the generated embeddings
  • Document management workflows for content updates
  • Budget management systems that consume the financial codes data

Setup Instructions

  1. Import Workflow: Copy the workflow JSON into your n8n instance

  2. Configure Credentials:

    • Set up Google Drive OAuth2 connection
    • Configure Google Docs OAuth2 access
    • Add OpenAI API key
    • Set up Supabase connection with vector database
    • Configure PostgreSQL connection
    • Create Telegram bot and get API credentials
  3. Update Configuration Node:

    • Replace folder_id with your Google Drive folder ID
    • Update admin_chat_id with your Telegram chat ID
  4. Database Setup:

    • Ensure Supabase has documents table with vector support
    • Create PostgreSQL tables: budget_codes, budget_categories, budget_intents
  5. Test Execution:

    • Start with manual trigger to verify all connections
    • Test with a small folder containing sample documents
    • Verify vector storage and metadata extraction
  6. Production Deployment:

    • Enable webhook trigger for external integrations
    • Configure schedule trigger if needed
    • Set up monitoring for failed executions
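For step 4 (Database Setup), the statements below sketch one plausible schema. Only the table names (documents, budget_codes, budget_categories, budget_intents) come from this document; the column definitions are assumptions, and Supabase vector search normally relies on the pgvector extension:

```python
# Schema sketch for the Database Setup step. Table names come from this
# document; columns are assumptions. The 1536 dimension matches OpenAI's
# common embedding size; adjust it to your embedding model.
DDL = [
    "CREATE EXTENSION IF NOT EXISTS vector;",
    """CREATE TABLE IF NOT EXISTS documents (
        id BIGSERIAL PRIMARY KEY,
        content TEXT,
        metadata JSONB,
        embedding VECTOR(1536)
    );""",
    "CREATE TABLE IF NOT EXISTS budget_codes (code TEXT PRIMARY KEY, label TEXT);",
    "CREATE TABLE IF NOT EXISTS budget_categories (id BIGSERIAL PRIMARY KEY, name TEXT);",
    "CREATE TABLE IF NOT EXISTS budget_intents (id BIGSERIAL PRIMARY KEY, intent TEXT);",
]
# Run each statement through your Postgres client, e.g.:
# for stmt in DDL:
#     cursor.execute(stmt)
```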