# AI Assistant Agent: RAG Input Flow
This workflow automates the ingestion of documents from Google Drive into a vector database for AI-powered search and retrieval. It processes various file formats, extracts metadata, chunks content appropriately, and stores everything in Supabase with proper embeddings for semantic search capabilities.
## Purpose
No business context provided yet — add a context.md to enrich this documentation.
## How It Works
- Trigger Activation: The workflow starts via webhook, manual trigger, or scheduled execution
- Configuration Setup: Defines Google Drive folder ID and admin chat ID for notifications
- Folder Discovery: Lists all subfolders in the specified Google Drive directory
- Human Approval: Sends Telegram messages to admin for folder selection approval
- File Processing: Downloads files from approved folders, supporting DOCX, PDF, Markdown, PowerPoint, and JSON formats
- Content Extraction: Extracts text content using appropriate parsers for each file type
- Document Chunking: Splits large documents into manageable chunks with overlap for better retrieval
- Metadata Extraction: Uses AI to extract themes and keywords from document content
- Vector Storage: Generates embeddings and stores documents in Supabase vector database
- Cleanup Operations: Optionally deletes old documents with human confirmation
- Notifications: Sends completion status via Telegram
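The chunking step above (implemented as a Code node in the workflow) splits large documents into overlapping pieces so that context is not lost at chunk boundaries. A minimal Python sketch of fixed-size chunking with overlap; the actual chunk size and overlap values used by the workflow are not documented, so the defaults below are illustrative:

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into chunks of chunk_size characters, each overlapping
    the previous chunk by `overlap` characters for better retrieval."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

The overlap means the tail of each chunk is repeated at the head of the next, so a sentence cut at a boundary still appears intact in one of the two chunks.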
## Mermaid Diagram

```mermaid
graph TD
    A[Webhook/Manual Trigger] --> B[Configuration]
    B --> C[List Subfolders]
    C --> D[Choose Folder]
    D --> E{Approval?}
    E -->|Yes| F[List Files]
    E -->|No| G[Loop Over Items1]
    F --> H[Mapping]
    H --> I[File Id List]
    H --> J[Collection Name]
    H --> K[Wait for delete flow]
    I --> L[Merge1]
    J --> L
    L --> M[Confirm Delete Vectors]
    M --> N{Delete Approved?}
    N -->|Yes| O[Delete Old Documents]
    N -->|No| P[Send Declined Message]
    O --> Q[Start Upsert]
    P --> Q
    Q --> K
    K --> R[Loop Over Items]
    R --> S[Download File From Google Drive]
    S --> T[Switch]
    T -->|DOCX| U[Google Docs]
    T -->|PDF| V[Extract from File]
    T -->|JSON_FIN| W[Extract from JSON]
    T -->|Invalid| X[Send Invalid Filetype Message]
    U --> Y[Edit Fields]
    V --> Y
    Y --> Z[Split Out1]
    Z --> AA[Chunking]
    AA --> BB[Extract Meta Data]
    BB --> CC[Merge]
    CC --> DD[Data Loader]
    DD --> EE[Supabase Vector Store]
    EE --> FF[Wait]
    FF --> R
    W --> GG[Split Out Codes]
    W --> HH[Split Out Categories]
    W --> II[Intents]
    GG --> JJ[Upsert Codes]
    HH --> KK[Upsert Categories]
    II --> LL[Split out intents]
    LL --> MM[Upsert Intents]
    JJ --> NN[Merge2]
    KK --> NN
    MM --> NN
    NN --> R
```
## Trigger

- Webhook: Accessible at the `/webhook/upsert` endpoint for external integrations
- Manual Trigger: For testing and manual execution
- Schedule Trigger: Daily at 12:00 PM (currently disabled)
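An external system can start the workflow by calling the webhook. A hedged sketch using only the Python standard library; the n8n host name and the payload shape are assumptions (the workflow reads its folder ID from the Configuration node, so a body may not be required at all):

```python
import json
import urllib.request

# Assumption: replace with your n8n instance's base URL.
N8N_BASE_URL = "https://n8n.example.com"

def build_upsert_request(folder_id: str) -> urllib.request.Request:
    """Build a POST request to the /webhook/upsert endpoint.
    The payload field is illustrative, not documented by the workflow."""
    payload = json.dumps({"folder_id": folder_id}).encode("utf-8")
    return urllib.request.Request(
        url=f"{N8N_BASE_URL}/webhook/upsert",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

# To actually trigger the workflow:
# urllib.request.urlopen(build_upsert_request("your-google-drive-folder-id"))
```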
## Nodes Used
| Node Type | Purpose |
|---|---|
| Webhook | Receives external HTTP requests to start the workflow |
| Manual Trigger | Allows manual workflow execution for testing |
| Schedule Trigger | Automated daily execution (disabled) |
| Google Drive | Lists folders/files and downloads documents |
| Google Docs | Extracts content from Google Docs format |
| Extract from File | Processes PDF and other binary file formats |
| Extract from JSON | Handles JSON financial data files |
| Switch | Routes files based on type/extension |
| Telegram | Sends notifications and approval requests to admin |
| Code | Custom chunking logic for document splitting |
| Information Extractor | AI-powered metadata extraction |
| Supabase Vector Store | Stores documents with embeddings |
| OpenAI Chat Model | Powers AI extraction and processing |
| Embeddings OpenAI | Generates vector embeddings |
| Split In Batches | Processes items in controlled batches |
| Merge | Combines data from multiple sources |
| Set | Data transformation and field mapping |
| If/Switch | Conditional logic and routing |
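The Switch node's routing by file type can be pictured as a simple extension map. A sketch whose branch names mirror the diagram; the exact matching rule inside the node is not documented, and the `JSON_FIN` label suggests only financial JSON files take the JSON branch, so this mapping is an assumption:

```python
# Map file extensions to the Switch node's output branches (names from
# the diagram). Extensions not listed fall through to the invalid branch.
BRANCHES = {
    ".docx": "Google Docs",        # content extracted via Google Docs node
    ".pdf": "Extract from File",   # binary extraction
    ".json": "Extract from JSON",  # financial JSON (JSON_FIN branch)
}

def route_file(filename: str) -> str:
    """Return the processing branch for a file, or the invalid-type branch."""
    ext = "." + filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return BRANCHES.get(ext, "Send Invalid Filetype Message")
```

Markdown and PowerPoint files are listed as supported inputs, but the diagram does not show dedicated branches for them (PowerPoint conversion is noted as disabled), so they are omitted here.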
## External Services & Credentials Required
- Google Drive OAuth2: For accessing and downloading files
- Google Docs OAuth2: For reading Google Docs content
- OpenAI API: For embeddings and AI-powered metadata extraction
- Supabase: Vector database for document storage
- Telegram Bot: For admin notifications and approvals
- PostgreSQL: For structured data storage (budget codes, categories, intents)
## Environment Variables

No specific environment variables are documented. Configuration is handled through the Configuration node, with hardcoded values for:

- `folder_id`: Google Drive folder ID
- `admin_chat_id`: Telegram chat ID for notifications
## Data Flow

Input:

- Google Drive folder containing documents (DOCX, PDF, MD, PPTX, JSON)
- Admin approval decisions via Telegram

Processing:

- Document content extraction and chunking
- AI-generated metadata (themes, keywords)
- Vector embeddings generation

Output:

- Documents stored in the Supabase vector database with metadata
- Financial codes/categories stored in PostgreSQL
- Status notifications via Telegram
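Each chunk is stored alongside its AI-extracted metadata and embedding. A sketch of the record shape an upsert into the Supabase `documents` table might use; the field names beyond `content` and `metadata` are assumptions, and the embedding is passed in rather than generated here:

```python
def build_document_record(chunk: str, source_file: str, chunk_index: int,
                          themes: list[str], keywords: list[str],
                          embedding: list[float]) -> dict:
    """Assemble one vector-store row. The metadata keys (source,
    chunk_index, themes, keywords) are illustrative; themes and
    keywords come from the workflow's Information Extractor node."""
    return {
        "content": chunk,
        "embedding": embedding,  # e.g. produced by an OpenAI embeddings model
        "metadata": {
            "source": source_file,
            "chunk_index": chunk_index,
            "themes": themes,
            "keywords": keywords,
        },
    }
```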
## Error Handling
- File Download Errors: Continues processing other files if individual downloads fail
- Invalid File Types: Sends notification to admin about unsupported files
- Processing Errors: Continues with remaining items in batch
- Human Approval Timeout: 15-minute limit for approval decisions
- Deletion Confirmation: Double approval required for destructive operations
## Known Limitations
- Hardcoded configuration values require manual updates
- Limited to specific file formats (DOCX, PDF, MD, PPTX, JSON)
- Requires manual approval for each folder processing
- PowerPoint conversion functionality is disabled
- Financial JSON processing is specific to budget code structure
## Related Workflows
No related workflows mentioned in the current documentation.
## Setup Instructions

1. Import Workflow: Import the JSON workflow definition into your n8n instance.

2. Configure Credentials:
   - Set up the Google Drive OAuth2 connection
   - Configure Google Docs OAuth2 access
   - Add an OpenAI API key for embeddings and chat
   - Set up the Supabase connection with the vector store enabled
   - Create a Telegram bot and obtain its API credentials
   - Configure the PostgreSQL connection for structured data

3. Update Configuration Node:
   - Set `folder_id` to your Google Drive folder ID
   - Update `admin_chat_id` with your Telegram chat ID

4. Database Setup:
   - Ensure Supabase has a `documents` table configured for vector storage
   - Create the PostgreSQL tables: `budget_codes`, `budget_categories`, `budget_intents`

5. Test Workflow:
   - Use the manual trigger to test the complete flow
   - Verify that Telegram notifications are working
   - Confirm documents are properly stored in the vector database

6. Production Deployment:
   - Enable the webhook endpoint for external integrations
   - Configure the schedule trigger if automated processing is needed
   - Set up monitoring for error notifications
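For the database setup step, the PostgreSQL tables can be created with DDL along these lines. Only the three table names come from the workflow; every column is a hypothetical placeholder to adapt to your actual budget-code structure:

```python
# Hypothetical schemas: only the table names (budget_codes,
# budget_categories, budget_intents) come from the workflow.
DDL = {
    "budget_codes": """
        CREATE TABLE IF NOT EXISTS budget_codes (
            code TEXT PRIMARY KEY,
            description TEXT
        )""",
    "budget_categories": """
        CREATE TABLE IF NOT EXISTS budget_categories (
            id SERIAL PRIMARY KEY,
            name TEXT NOT NULL
        )""",
    "budget_intents": """
        CREATE TABLE IF NOT EXISTS budget_intents (
            id SERIAL PRIMARY KEY,
            intent TEXT NOT NULL
        )""",
}

# With a psycopg2 connection `conn`:
# with conn.cursor() as cur:
#     for statement in DDL.values():
#         cur.execute(statement)
#     conn.commit()
```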