WhatsApp Voice Note Processing
This workflow processes voice messages sent via WhatsApp. It transcribes them with Google Gemini or OpenAI Whisper, then responds through AI assistants that draw on specific program handbooks. Small audio files are transcribed directly; large files are split into chunks before transcription.
Purpose
No business context has been provided yet. Add a context.md to enrich this documentation.
How It Works
- Webhook Reception: Receives WhatsApp messages via Twilio webhook
- Message Type Detection: Determines if the message contains audio or text
- Audio Processing: For voice messages, downloads the audio file from Twilio
- File Size Analysis: Analyzes audio file size to determine processing strategy:
  - Small files (< 2MB): Direct transcription
  - Medium files (2-10MB): Direct transcription
  - Large files (10-25MB): Chunked transcription with 1MB or 0.5MB segments
  - Too large files (> 25MB): Error handling
- Transcription: Uses Google Gemini or OpenAI Whisper to convert audio to text
- Chunk Processing: For large files, processes chunks in parallel and merges results
- AI Response: Feeds transcribed text to specialized AI agents (SkillUp or E!BA) that reference program handbooks
- Response Delivery: Sends intelligent responses back via WhatsApp
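The size-based routing in the File Size Analysis step can be sketched as a small function of the kind an n8n Code node might run ahead of the Switch node. This is only an illustration: the function names and return labels are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and the handling of exact boundary values (for example a file of exactly 10MB) is an assumption, since only the 2, 10, and 25MB thresholds are stated above.

```javascript
// Hypothetical helper: names and boundary handling are assumptions;
// only the 2 / 10 / 25 MB thresholds come from the list above.
const MB = 1024 * 1024; // assumption: binary megabytes

function classifyAudioSize(sizeBytes) {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    throw new RangeError("sizeBytes must be a non-negative number");
  }
  if (sizeBytes > 25 * MB) return "too_large"; // answered with an error message
  if (sizeBytes >= 10 * MB) return "large";    // chunked transcription
  if (sizeBytes >= 2 * MB) return "medium";    // direct transcription
  return "small";                              // direct transcription
}

// Small and medium files share the direct-transcription route.
function routeFor(sizeBytes) {
  const category = classifyAudioSize(sizeBytes);
  if (category === "too_large") return "error";
  return category === "large" ? "chunked" : "direct";
}
```

Keeping the category separate from the route makes it easy to give small and medium files different handling later without touching the thresholds.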
Workflow Diagram
graph TD
A[Webhook Trigger] --> B[Message Type Detection]
B -->|Voice Note| C[Extract Metadata]
B -->|Text| D[AI Agent Direct]
C --> E[Download Audio File]
E --> F[Analyze File Size]
F -->|Small/Medium| G[Direct Transcription]
F -->|Large| H[Chunk Audio]
F -->|Too Large| I[Error Response]
H --> J[Process Chunks]
J --> K[Merge Transcriptions]
G --> L[AI Agent Processing]
K --> L
L --> M[Text Cleanup]
M --> N[Send WhatsApp Response]
D --> M
I --> O[Send Error Message]
Trigger
- Type: Webhook (POST)
- Path: /33898c22-f5f0-45f0-9983-3fd19c2daebb
- Source: Twilio WhatsApp webhook for incoming messages
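For local testing, a request shaped like Twilio's inbound webhook can be built by hand. The sketch below assumes Twilio's usual form-encoded POST format and standard parameter names (`From`, `To`, `MessageSid`, `NumMedia`, `MediaUrl0`, `MediaContentType0`); the host, phone numbers, SID, and media URL are placeholders, and `buildTestRequest` is a hypothetical helper, not part of the workflow.

```javascript
// Placeholders throughout: the host, phone numbers, SID, and media URL are made up.
const WEBHOOK_PATH = "/33898c22-f5f0-45f0-9983-3fd19c2daebb"; // path from this section

// Build a form-encoded POST shaped like an inbound Twilio WhatsApp webhook.
function buildTestRequest(baseUrl, fields) {
  return {
    url: baseUrl.replace(/\/+$/, "") + WEBHOOK_PATH,
    options: {
      method: "POST",
      headers: { "Content-Type": "application/x-www-form-urlencoded" },
      body: new URLSearchParams(fields).toString(),
    },
  };
}

const voiceNoteRequest = buildTestRequest("https://n8n.example.com/webhook", {
  MessageSid: "SM00000000000000000000000000000000",
  From: "whatsapp:+15550001111",
  To: "whatsapp:+15550002222",
  NumMedia: "1",
  MediaContentType0: "audio/ogg",
  MediaUrl0: "https://example.com/media/voice-note.ogg",
});
```

On Node.js 18 or later the request could then be sent with `fetch(voiceNoteRequest.url, voiceNoteRequest.options)`.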
Nodes Used
| Node Type | Purpose |
|---|---|
| Webhook | Receives incoming WhatsApp messages from Twilio |
| Code (JavaScript) | Extracts metadata, analyzes file sizes, processes chunks, cleans text |
| HTTP Request | Downloads audio files from Twilio media URLs |
| Switch | Routes processing based on message type and file size |
| Google Gemini | Transcribes audio files to text |
| OpenAI Transcription | Alternative transcription service |
| AI Agent | Provides intelligent responses using program handbooks |
| Vector Store (PGVector) | Stores and retrieves handbook knowledge |
| OpenAI Chat Model | Powers the AI agents |
| Memory Buffer | Maintains conversation context |
| Twilio | Sends responses back via WhatsApp |
| Write Binary File | Temporarily stores audio chunks |
| Function Item | Collects and processes multiple chunk results |
External Services & Credentials Required
Twilio
- Purpose: WhatsApp messaging platform
- Credentials: Twilio API credentials
- Usage: Receiving webhooks, downloading media, sending responses
Google Gemini
- Purpose: Audio transcription
- Credentials: Google Gemini (PaLM) API key
- Usage: Converting voice notes to text
OpenAI
- Purpose: AI chat models and alternative transcription
- Credentials: OpenAI API key
- Usage: Powering AI agents and Whisper transcription
PostgreSQL with PGVector
- Purpose: Vector database for handbook storage
- Credentials: PostgreSQL connection details
- Usage: Storing and retrieving program handbook knowledge
Cohere
- Purpose: Re-ranking search results
- Credentials: Cohere API key
- Usage: Improving handbook search relevance
Environment Variables
No specific environment variables are documented in the workflow configuration. Credentials are managed through n8n's credential system.
Data Flow
Input
- Twilio WhatsApp webhook payload containing:
  - Message metadata (sender, receiver, message ID)
  - Audio file URL (for voice messages)
  - Text content (for text messages)
  - Message type indicator
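A metadata-extraction step over this payload might look like the sketch below. It assumes Twilio's standard webhook parameter names and treats any first media item with an `audio/` content type as a voice note; the `extractMessage` name and the shape of the returned object are hypothetical conventions, not taken from the workflow. In n8n the form fields would typically be read from the Webhook node's `body` property.

```javascript
// Sketch only: the returned shape is a hypothetical convention.
// Field names follow Twilio's standard inbound webhook parameters.
function extractMessage(payload) {
  const numMedia = Number.parseInt(payload.NumMedia ?? "0", 10) || 0;
  const contentType = payload.MediaContentType0 ?? "";
  const isAudio = numMedia > 0 && contentType.startsWith("audio/");

  return {
    messageId: payload.MessageSid ?? null,
    sender: payload.From ?? null,
    receiver: payload.To ?? null,
    type: isAudio ? "audio" : "text",
    // Only voice notes carry a media URL to download.
    audioUrl: isAudio ? (payload.MediaUrl0 ?? null) : null,
    // Only text messages carry content for the AI agent directly.
    text: isAudio ? "" : (payload.Body ?? "").trim(),
  };
}
```

For example, `extractMessage({ NumMedia: "0", Body: " hello " })` yields a `text` message whose `text` field is `"hello"` and whose `audioUrl` is `null`.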
Processing
- Audio files are downloaded and analyzed for size
- Large files are chunked into manageable segments
- Audio is transcribed to text using AI services
- Text is processed by specialized AI agents with handbook context
- Responses are formatted for WhatsApp delivery
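The chunk-and-merge part of this processing can be illustrated with two small helpers. This is a sketch under assumptions: the workflow's actual splitting logic is not documented here, so the example simply cuts a `Buffer` into fixed-size byte ranges (1MB by default, with 0.5MB as the alternative mentioned earlier) and tags each with its index so transcripts can be reassembled in order even when chunks finish out of order. Byte-level slices of compressed audio are not guaranteed to be independently decodable, so a real implementation may need format-aware splitting. All names are hypothetical.

```javascript
const MB = 1024 * 1024; // assumption: binary megabytes

// Split a Buffer into consecutive byte ranges, each tagged with its position.
function chunkAudio(buffer, chunkSizeBytes = 1 * MB) {
  if (!Number.isInteger(chunkSizeBytes) || chunkSizeBytes <= 0) {
    throw new RangeError("chunkSizeBytes must be a positive integer");
  }
  const chunks = [];
  for (let offset = 0, index = 0; offset < buffer.length; offset += chunkSizeBytes, index++) {
    // subarray clamps the end offset, so the last chunk may be shorter.
    chunks.push({ index, data: buffer.subarray(offset, offset + chunkSizeBytes) });
  }
  return chunks;
}

// Merge per-chunk transcripts in their original order,
// regardless of the order in which they completed.
function mergeTranscriptions(results) {
  return [...results]
    .sort((a, b) => a.index - b.index)
    .map((result) => result.text.trim())
    .filter((text) => text.length > 0)
    .join(" ");
}
```

Carrying the index alongside each chunk is what lets parallel processing stay order-safe: the merge step sorts by index instead of relying on arrival order.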
Output
- WhatsApp message responses sent via Twilio
- Processed transcriptions stored in conversation memory
- Error messages for failed processing attempts
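What the text cleanup step does before delivery is not spelled out here, so the following is only one plausible sketch: it converts common Markdown from the AI model into WhatsApp's own formatting (`*bold*`) and splits long replies into several messages. The 1600-character limit is an assumption based on Twilio's usual per-message body limit, and both function names are hypothetical.

```javascript
const MAX_BODY_CHARS = 1600; // assumption: Twilio's per-message body limit

// Convert common Markdown into WhatsApp-friendly text.
function cleanForWhatsApp(text) {
  return text
    .replace(/^#{1,6}[ \t]+/gm, "")      // drop Markdown heading markers
    .replace(/\*\*(.+?)\*\*/g, "*$1*")   // **bold** becomes WhatsApp *bold*
    .replace(/\n{3,}/g, "\n\n")          // collapse runs of blank lines
    .trim();
}

// Split a long reply into parts no longer than `limit` characters.
function splitForWhatsApp(text, limit = MAX_BODY_CHARS) {
  const parts = [];
  let rest = text;
  while (rest.length > limit) {
    // Prefer to break at the last space within the limit.
    let cut = rest.lastIndexOf(" ", limit);
    if (cut <= 0) cut = limit; // no usable space: hard cut
    const part = rest.slice(0, cut).trimEnd();
    if (part) parts.push(part);
    rest = rest.slice(cut).trimStart();
  }
  if (rest.length > 0) parts.push(rest);
  return parts;
}
```

Each element returned by `splitForWhatsApp` would then be sent as its own message through the Twilio node.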
Error Handling
- File Size Errors: Files over 25MB trigger error responses
- Download Failures: HTTP request errors are handled with retry logic
- Transcription Failures: Handled through backup transcription services and error messages
- Chunking Errors: Individual chunk failures don't stop the entire process
- AI Agent Errors: Fallback responses when handbook lookup fails
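Two of these behaviours, retrying a failed request and letting individual chunk failures pass without aborting the run, can be sketched as follows. How the workflow actually configures retries is not documented here, so the attempt count, the exponential backoff, and all function names are illustrative assumptions; `transcribe` stands in for whichever transcription call is used.

```javascript
// Retry an async task with exponential backoff (illustrative defaults).
async function withRetry(task, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await task(attempt);
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        const delayMs = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delayMs));
      }
    }
  }
  throw lastError;
}

// Turn Promise.allSettled output into merged text plus a list of failed chunks.
function collectChunkResults(settled) {
  const texts = [];
  const failedIndexes = [];
  settled.forEach((outcome, index) => {
    if (outcome.status === "fulfilled") texts.push(String(outcome.value).trim());
    else failedIndexes.push(index);
  });
  return { text: texts.filter(Boolean).join(" "), failedIndexes };
}

// Transcribe all chunks in parallel; one failing chunk does not stop the rest.
async function transcribeChunks(chunks, transcribe, retryOptions) {
  const settled = await Promise.allSettled(
    chunks.map((chunk) => withRetry(() => transcribe(chunk), retryOptions))
  );
  return collectChunkResults(settled);
}
```

`Promise.allSettled` is the key choice: unlike `Promise.all`, it waits for every chunk and reports failures individually, so the caller can decide whether a partial transcript is acceptable.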
Known Limitations
- Maximum audio file size of 25MB
- Chunking may introduce slight transcription inconsistencies at boundaries
- Processing time increases significantly for large files
- Dependent on external service availability (Twilio, Google, OpenAI)
Related Workflows
No related workflows are documented in the current context.
Setup Instructions
- Import Workflow: Import the JSON configuration into your n8n instance
- Configure Credentials:
  - Set up Twilio API credentials for WhatsApp integration
  - Configure the Google Gemini (PaLM) API key for Gemini transcription
  - Add OpenAI API credentials for chat models and Whisper
  - Set up PostgreSQL connection with PGVector extension
  - Configure Cohere API for re-ranking
- Database Setup:
  - Create PostgreSQL database with PGVector extension
  - Set up tables for handbook storage (skillup_original, skillup)
- Webhook Configuration:
  - Configure Twilio WhatsApp webhook to point to your n8n webhook URL
  - Ensure webhook authentication is properly set up
- Handbook Upload:
  - Use the document loader nodes to upload program handbooks
  - Verify vector embeddings are created successfully
- Testing:
  - Send test voice messages to verify transcription
  - Test text messages to ensure AI responses work
  - Verify error handling with oversized files
- Production Deployment:
  - Enable the webhook trigger
  - Monitor execution logs for any issues
  - Set up appropriate retry and error handling policies
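One way to approach the webhook authentication step is to verify Twilio's `X-Twilio-Signature` header. The sketch below follows Twilio's documented scheme as understood here: append each POST parameter name and value, sorted by name, to the full request URL, sign the result with HMAC-SHA1 using the account's auth token, and Base64-encode it. Whether this workflow uses signature validation at all is an assumption, and the function names are hypothetical. The code is plain Node.js; running it inside an n8n Code node depends on whether the instance allows built-in modules such as `crypto`.

```javascript
const crypto = require("node:crypto");

// Compute the signature Twilio would send for this URL and set of POST params.
function expectedTwilioSignature(authToken, url, params) {
  const data = Object.keys(params)
    .sort() // case-sensitive ordering by parameter name
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto.createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

// Compare against the X-Twilio-Signature header in constant time.
function isValidTwilioRequest(authToken, url, params, headerSignature) {
  const expected = Buffer.from(expectedTwilioSignature(authToken, url, params));
  const received = Buffer.from(String(headerSignature || ""));
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

The URL passed in must be exactly the public URL configured in Twilio, including the webhook path; a mismatch caused by a reverse proxy rewriting the host or scheme is a common reason for valid requests to fail this check.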