WhatsApp Voice Note Processing

This workflow processes WhatsApp voice messages received through Twilio, transcribes them with Google Gemini, and replies with contextual answers drawn from program knowledge bases. Small and medium audio files are transcribed directly. Large files are split into 1MB chunks that are transcribed separately and then merged. The transcribed text is routed to specialized AI assistants for the SkillUp and E!BA programs.

Purpose

No business context has been provided yet; add a context.md file to enrich this documentation.

Based on the workflow structure, it appears to serve program staff and participants who need to:
- Send voice messages via WhatsApp for quick communication
- Receive text replies that respond to the content of their voice messages
- Access program-specific guidance through AI assistants
- Have audio files of varying sizes handled efficiently

How It Works

Message Reception: The webhook receives incoming WhatsApp messages forwarded by Twilio
Message Type Detection: System determines if the message contains audio or text content
Audio Download: For voice messages, the system downloads the audio file from Twilio's media URL
File Size Analysis: Audio files are analyzed and categorized as small (<2MB), medium (2-10MB), large (10-25MB), or too large (>25MB)
Transcription Routing:
- Small/medium files: Direct transcription via Google Gemini
- Large files: Split into 1MB chunks, transcribed separately, then merged
- Oversized files: Error handling with user notification
Content Processing: Transcribed text is processed by specialized AI agents (SkillUp or E!BA) that consult program handbooks
Response Generation: AI generates contextual responses based on program knowledge bases
Text Cleanup: Markdown formatting is removed to ensure clean WhatsApp delivery
Response Delivery: Final response is sent back to the user via WhatsApp
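
The File Size Analysis and Transcription Routing steps can be sketched as a small function of the kind an n8n Code node might run. This is a minimal sketch, not the workflow's actual code: the function names are hypothetical, 1MB is assumed to mean 1,048,576 bytes, and the handling of exact boundaries (for example, whether a file of exactly 10MB counts as medium or large) is a guess.

```javascript
// Hypothetical sketch of the "Analyze File Size" + "Route by Size" logic.
// Thresholds come from the documented tiers; 1MB = 1024 * 1024 bytes is an assumption.
const MB = 1024 * 1024;

/**
 * Classify a downloaded audio file into a routing tier.
 * @param {number} sizeBytes - file size in bytes
 * @returns {"small" | "medium" | "large" | "too_large"}
 */
function classifyAudioSize(sizeBytes) {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    throw new RangeError(`Invalid file size: ${sizeBytes}`);
  }
  if (sizeBytes < 2 * MB) return "small";
  if (sizeBytes < 10 * MB) return "medium";
  if (sizeBytes <= 25 * MB) return "large";
  return "too_large";
}

/** Map a size tier to the transcription path described in How It Works. */
function routeForTier(tier) {
  switch (tier) {
    case "small":
    case "medium":
      return "direct_transcription";
    case "large":
      return "chunked_transcription";
    default:
      return "error_response";
  }
}
```

For example, routeForTier(classifyAudioSize(512 * 1024)) returns "direct_transcription". Keeping classification and routing separate means the thresholds can change without touching the Switch branches.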

Workflow Diagram

```mermaid
graph TD
    A[WhatsApp Webhook] --> B[Extract Metadata]
    B --> C[Download Audio File]
    C --> D[Analyze File Size]
    D --> E{Route by Size}

    E -->|Small/Medium| F[Direct Transcription]
    E -->|Large| G[Chunk Audio]
    E -->|Too Large| H[Error Response]

    G --> I[Transcribe Chunks]
    I --> J[Merge Transcriptions]

    F --> K[AI Agent Processing]
    J --> K

    K --> L[Text Cleanup]
    L --> M[Format Response]
    M --> N[Send WhatsApp Reply]

    O[Knowledge Base] --> K
    P[Vector Store] --> K
```

Trigger

Webhook: POST endpoint 33898c22-f5f0-45f0-9983-3fd19c2daebb
- Receives WhatsApp message webhooks from Twilio
- Triggered when users send voice messages or text to the connected WhatsApp number
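
Because the same webhook receives both voice notes and text, a Code node has to tell them apart. The sketch below inspects Twilio's inbound-message parameters NumMedia, MediaContentType0, MediaUrl0, and Body. It assumes the form fields are available as a plain object (in n8n the Webhook node typically exposes them under $json.body); the function name and return shape are hypothetical.

```javascript
// Hypothetical sketch of "Message Type Detection" using Twilio's inbound
// message parameters. Only the first media item (index 0) is inspected.
function detectMessageType(body) {
  // Twilio sends form fields as strings, so NumMedia must be parsed.
  const numMedia = parseInt(body.NumMedia ?? "0", 10) || 0;
  const contentType = String(body.MediaContentType0 ?? "");

  if (numMedia > 0 && contentType.startsWith("audio/")) {
    return { type: "audio", mediaUrl: body.MediaUrl0, contentType };
  }
  if (typeof body.Body === "string" && body.Body.trim() !== "") {
    return { type: "text", text: body.Body.trim() };
  }
  // Images, documents, or empty messages fall through here.
  return { type: "unsupported" };
}
```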

Nodes Used

Node Type	Purpose
Webhook	Receives incoming WhatsApp messages from Twilio
Code (JavaScript)	Extracts metadata, analyzes file sizes, processes audio chunks
HTTP Request	Downloads audio files from Twilio media URLs
Switch	Routes messages based on content type and file size
Google Gemini	Transcribes audio content using AI
AI Agent	Processes transcribed content with program-specific context
Vector Store (PostgreSQL)	Retrieves relevant information from knowledge bases
OpenAI Chat Model	Powers the conversational AI responses
Memory Buffer	Maintains conversation context per user
Text Splitter	Prepares documents for vector storage
Twilio	Sends WhatsApp responses back to users
Write Binary File	Temporarily stores audio chunks for processing

External Services & Credentials Required

Required Services

Twilio: WhatsApp Business API integration
Google Gemini: Audio transcription service
OpenAI: Chat completion and embeddings
PostgreSQL: Vector database for knowledge storage
Cohere: Text reranking for improved search results

Credentials Needed

twilioApi - Twilio account credentials for WhatsApp messaging
googlePalmApi - Google AI API key for Gemini transcription
openAiApi - OpenAI API key for chat and embeddings
postgres - Database connection for vector storage
cohereApi - Cohere API key for reranking
httpBasicAuth - Twilio media download authentication
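
The httpBasicAuth credential exists because Twilio media URLs can require HTTP Basic authentication, with the Account SID as the username and the Auth Token as the password. Outside n8n, the equivalent download might look like the sketch below. It assumes Node.js 18 or later (for the global fetch); the function names are hypothetical.

```javascript
// Hypothetical sketch of the "Download Audio File" step outside n8n.
// Assumes Node.js 18+ (global fetch) and Basic auth on the media URL.
function basicAuthHeader(accountSid, authToken) {
  const token = Buffer.from(`${accountSid}:${authToken}`, "utf8").toString("base64");
  return `Basic ${token}`;
}

async function downloadTwilioMedia(mediaUrl, accountSid, authToken) {
  const res = await fetch(mediaUrl, {
    headers: { Authorization: basicAuthHeader(accountSid, authToken) },
    redirect: "follow", // Twilio may redirect to the stored media location
  });
  if (!res.ok) {
    // Throw a descriptive error so a download-failure branch can handle it.
    throw new Error(`Media download failed: HTTP ${res.status}`);
  }
  const buffer = Buffer.from(await res.arrayBuffer());
  return {
    buffer,
    contentType: res.headers.get("content-type"),
    sizeBytes: buffer.length, // feeds the file size analysis
  };
}
```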

Environment Variables

No explicit environment variables are defined in this workflow. All configuration is handled through n8n credential management.

Data Flow

Input¶

WhatsApp voice messages via Twilio webhook
Audio files in OGG format
User metadata (phone number, profile name, message ID)
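
The user metadata above maps onto Twilio's inbound WhatsApp parameters From, ProfileName, and MessageSid. A hypothetical extraction function is sketched below; the output field names, and the choice of the phone number as the conversation-memory session key, are assumptions rather than the workflow's confirmed schema.

```javascript
// Hypothetical sketch of "Extract Metadata" from a Twilio WhatsApp webhook body.
function extractMetadata(body) {
  const from = String(body.From ?? "");
  // Twilio prefixes WhatsApp senders with "whatsapp:"; strip it for a plain number.
  const phoneNumber = from.replace(/^whatsapp:/, "");
  return {
    phoneNumber,
    replyTo: from, // keep the prefixed form for addressing the reply
    profileName: body.ProfileName ?? null,
    messageId: body.MessageSid ?? null,
    sessionId: phoneNumber, // one memory buffer per user (assumed key)
  };
}
```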

Processing¶

Audio transcription to text
File size analysis and chunking logic
Vector similarity search against knowledge bases
AI-powered response generation with program context
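
The chunking logic for large files follows a split, transcribe, merge pattern, sketched below. Here transcribe is a placeholder for whatever Gemini call the workflow makes, and the function names are hypothetical. One caveat applies to any byte-level split: an OGG/Opus stream carries its codec headers at the start of the file, so a raw 1MB slice from the middle is generally not a valid standalone audio file. A production version may need to split on Ogg page boundaries or re-encode each segment (for example with ffmpeg) before transcription.

```javascript
// Hypothetical sketch of the large-file path: split, transcribe in order, merge.
// NOTE: raw byte slices of an OGG stream are not independently decodable;
// this shows the control flow only.
const CHUNK_SIZE = 1024 * 1024; // 1MB, as documented (1024 * 1024 is assumed)

function splitIntoChunks(buffer, chunkSize = CHUNK_SIZE) {
  if (!(chunkSize > 0)) throw new RangeError("chunkSize must be positive");
  const chunks = [];
  for (let offset = 0, index = 0; offset < buffer.length; offset += chunkSize, index++) {
    chunks.push({ index, data: buffer.subarray(offset, offset + chunkSize) });
  }
  return chunks;
}

function mergeTranscriptions(results) {
  // Sort by chunk index so the merged text follows the original audio order.
  return [...results]
    .sort((a, b) => a.index - b.index)
    .map((r) => r.text.trim())
    .filter((text) => text.length > 0)
    .join(" ");
}

async function transcribeLargeAudio(buffer, transcribe) {
  const results = [];
  // Sequential calls keep ordering simple and reduce the risk of hitting rate limits.
  for (const chunk of splitIntoChunks(buffer)) {
    results.push({ index: chunk.index, text: await transcribe(chunk.data, chunk.index) });
  }
  return mergeTranscriptions(results);
}
```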

Output¶

Text responses sent via WhatsApp
Conversation memory storage
Error notifications for failed processing
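
Before a text response goes out, the Text Cleanup step removes Markdown from the AI output. The workflow's exact rules are not documented; the function below is a hypothetical sketch that handles common constructs (headings, bullets, images, links, bold, italics, inline code, fences) while trying not to mangle snake_case identifiers or arithmetic such as 2 * 3.

```javascript
// Hypothetical sketch of Markdown stripping for WhatsApp delivery.
function stripMarkdown(text) {
  return text
    .replace(/^`{3}[^\n]*$/gm, "") // blank out code-fence lines, keep code content
    .replace(/^#{1,6}[ \t]+/gm, "") // heading markers
    .replace(/^[ \t]*[-*+][ \t]+/gm, "- ") // normalize bullet markers to "- "
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1") // images -> alt text
    .replace(/\[([^\]]+)\]\(([^)]+)\)/g, "$1 ($2)") // links -> "text (url)"
    .replace(/\*\*(.+?)\*\*/g, "$1") // **bold**
    .replace(/\*(?!\s)([^*\n]+?)\*/g, "$1") // *italic*, but not "2 * 3"
    .replace(/(^|\W)_(?!\s)([^_\n]+?)_(?=\W|$)/g, "$1$2") // _italic_, but not snake_case
    .replace(/`([^`\n]+)`/g, "$1") // `inline code`
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
    .trim();
}
```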

Error Handling

The workflow includes several error handling mechanisms:

File Size Limits: Files over 25MB trigger error responses
Download Failures: HTTP request errors are caught and handled gracefully
Transcription Errors: Failed transcriptions result in user-friendly error messages
Retry Logic: Transcription nodes have retry-on-fail enabled
Fallback Responses: When processing fails, users receive helpful error messages
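
The Retry Logic entry corresponds to n8n's per-node Retry On Fail setting, which retries a fixed number of times with a configurable wait between tries. The sketch below mirrors that behavior in plain JavaScript and adds a fallback reply so the user still hears back. The names, defaults, and message wording are illustrative assumptions.

```javascript
// Hypothetical sketch: fixed-count retry with a fixed wait, plus a fallback reply.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(task, { maxTries = 3, waitBetweenTriesMs = 1000 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= maxTries; attempt++) {
    try {
      return await task(attempt);
    } catch (err) {
      lastError = err;
      // Wait only if another attempt remains.
      if (attempt < maxTries) await sleep(waitBetweenTriesMs);
    }
  }
  throw lastError;
}

async function transcribeWithFallback(transcribe, audio, options) {
  try {
    return { ok: true, text: await withRetry(() => transcribe(audio), options) };
  } catch (err) {
    // Illustrative user-facing message; the workflow's real wording is not documented.
    return {
      ok: false,
      text: "Sorry, we couldn't process your voice note. Please try again.",
      error: String(err && err.message ? err.message : err),
    };
  }
}
```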

Known Limitations

Based on the workflow structure:
- Maximum audio file size of 25MB
- Processing time increases with file size because large files are chunked
- Requires a stable internet connection for external API calls
- Limited to the OGG audio format sent by WhatsApp
- Conversation memory is session-based and may not persist long-term

The workflow appears to be part of a larger system with multiple similar implementations:
- Multiple webhook endpoints suggest parallel processing paths
- References to the "SkillUp" and "E!BA" programs indicate specialized variants
- Vector storage suggests integration with document management workflows

Setup Instructions

Import Workflow: Import the JSON into your n8n instance
Configure Credentials:
- Set up Twilio API credentials for WhatsApp integration
- Add Google Gemini API key for transcription
- Configure OpenAI API access for chat and embeddings
- Set up PostgreSQL database connection
- Add Cohere API key for reranking
Database Setup:
- Create a PostgreSQL database with the pgvector extension enabled
- Set up tables for SkillUp and E!BA knowledge bases
- Load program handbooks into vector storage
Webhook Configuration:
- Copy the webhook URL from n8n
- Configure Twilio WhatsApp webhook to point to this endpoint
- Test with a sample voice message
Knowledge Base Population:
- Upload program documents through the vector storage nodes
- Verify embeddings are generated correctly
- Test retrieval with sample queries
Testing:
- Send test voice messages to verify transcription
- Check AI responses for accuracy and context
- Validate error handling with oversized files
Monitoring:
- Set up logging for failed transcriptions
- Monitor API usage and rate limits
- Track conversation quality and user satisfaction
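
During webhook configuration and testing, it can help to replay a Twilio-style request against the n8n webhook without sending a real voice note, for example to exercise the download-failure path with an unreachable media URL. This hypothetical helper assumes Node.js 18 or later. The URLs, phone numbers, and MessageSid are placeholders, and it does not compute the X-Twilio-Signature header that real Twilio requests carry, so the request will be rejected if the workflow validates signatures.

```javascript
// Hypothetical test helper: builds and POSTs a Twilio-style form payload.
// All values are placeholders; no X-Twilio-Signature is generated.
function buildTwilioTestPayload({ from, to, mediaUrl, contentType = "audio/ogg", body = "" }) {
  const params = new URLSearchParams({
    From: from,
    To: to,
    Body: body,
    NumMedia: mediaUrl ? "1" : "0",
    MessageSid: "SM" + "0".repeat(32), // placeholder SID
  });
  if (mediaUrl) {
    params.set("MediaUrl0", mediaUrl);
    params.set("MediaContentType0", contentType);
  }
  return params;
}

async function postToWebhook(webhookUrl, params) {
  const res = await fetch(webhookUrl, {
    method: "POST",
    headers: { "Content-Type": "application/x-www-form-urlencoded" },
    body: params.toString(),
  });
  return { status: res.status, text: await res.text() };
}
```

Example with a placeholder host: postToWebhook("https://your-n8n-host/webhook/33898c22-f5f0-45f0-9983-3fd19c2daebb", buildTwilioTestPayload({ from: "whatsapp:+15550001111", to: "whatsapp:+15550002222", mediaUrl: "https://example.invalid/voice.ogg" })).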