WhatsApp Voice Note Processing

This workflow processes voice messages sent via WhatsApp. It transcribes them with Google Gemini or OpenAI Whisper, then answers through AI agents that draw on specific program handbooks. Small audio files are transcribed directly; large files are split into chunks and transcribed piece by piece.

Purpose

No business context provided yet. Add a context.md to enrich this documentation.

How It Works

Webhook Reception: Receives WhatsApp messages via Twilio webhook
Message Type Detection: Determines if the message contains audio or text
Audio Processing: For voice messages, downloads the audio file from Twilio
File Size Analysis: Analyzes audio file size to determine processing strategy:
- Small files (< 2MB): Direct transcription
- Medium files (2-10MB): Direct transcription
- Large files (10-25MB): Chunked transcription with 1MB or 0.5MB segments
- Oversized files (> 25MB): Rejected with an error response
Transcription: Uses Google Gemini or OpenAI Whisper to convert audio to text
Chunk Processing: For large files, processes chunks in parallel and merges results
AI Response: Feeds transcribed text to specialized AI agents (SkillUp or E!BA) that reference program handbooks
Response Delivery: Sends intelligent responses back via WhatsApp
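
The size-based routing in the File Size Analysis step can be sketched as a small function. This is a minimal illustration, not the workflow's actual Code node: the function name `classifyAudioSize`, the returned field names, the use of 1 MB = 1024 × 1024 bytes, and the handling of files that land exactly on a threshold are all assumptions.

```javascript
// Assumption: 1 MB is treated as 1024 * 1024 bytes.
const MB = 1024 * 1024;

// Hypothetical helper: maps a file size in bytes to a processing strategy.
// Thresholds come from the list above; which bucket an exact boundary
// value (2 MB, 10 MB, 25 MB) falls into is an assumption.
function classifyAudioSize(sizeBytes) {
  if (sizeBytes > 25 * MB) {
    return { category: "too_large", strategy: "error" };
  }
  if (sizeBytes >= 10 * MB) {
    return { category: "large", strategy: "chunked" };
  }
  if (sizeBytes >= 2 * MB) {
    return { category: "medium", strategy: "direct" };
  }
  return { category: "small", strategy: "direct" };
}

console.log(classifyAudioSize(12 * MB));
```

A Switch node could then route on the `strategy` value rather than re-deriving the thresholds.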

Workflow Diagram

graph TD
    A[Webhook Trigger] --> B[Message Type Detection]
    B -->|Voice Note| C[Extract Metadata]
    B -->|Text| D[AI Agent Direct]
    C --> E[Download Audio File]
    E --> F[Analyze File Size]
    F -->|Small/Medium| G[Direct Transcription]
    F -->|Large| H[Chunk Audio]
    F -->|Too Large| I[Error Response]
    H --> J[Process Chunks]
    J --> K[Merge Transcriptions]
    G --> L[AI Agent Processing]
    K --> L
    L --> M[Text Cleanup]
    M --> N[Send WhatsApp Response]
    D --> M
    I --> O[Send Error Message]

Trigger

Type: Webhook (POST)
Path: /33898c22-f5f0-45f0-9983-3fd19c2daebb
Source: Twilio WhatsApp webhook for incoming messages

Nodes Used

Node Type	Purpose
Webhook	Receives incoming WhatsApp messages from Twilio
Code (JavaScript)	Extracts metadata, analyzes file sizes, processes chunks, cleans text
HTTP Request	Downloads audio files from Twilio media URLs
Switch	Routes processing based on message type and file size
Google Gemini	Transcribes audio files to text
OpenAI Transcription	Alternative transcription service
AI Agent	Provides intelligent responses using program handbooks
Vector Store (PGVector)	Stores and retrieves handbook knowledge
OpenAI Chat Model	Powers the AI agents
Memory Buffer	Maintains conversation context
Twilio	Sends responses back via WhatsApp
Write Binary File	Temporarily stores audio chunks
Function Item	Collects and processes multiple chunk results

External Services & Credentials Required

Twilio

Purpose: WhatsApp messaging platform
Credentials: Twilio API credentials
Usage: Receiving webhooks, downloading media, sending responses

Google Gemini

Purpose: Audio transcription
Credentials: Google Gemini (PaLM) API key
Usage: Converting voice notes to text

OpenAI

Purpose: AI chat models and alternative transcription
Credentials: OpenAI API key
Usage: Powering AI agents and Whisper transcription

PostgreSQL with PGVector

Purpose: Vector database for handbook storage
Credentials: PostgreSQL connection details
Usage: Storing and retrieving program handbook knowledge

Cohere

Purpose: Re-ranking search results
Credentials: Cohere API key
Usage: Improving handbook search relevance

Environment Variables

No specific environment variables are documented in the workflow configuration. Credentials are managed through n8n's credential system.

Data Flow

Input

Twilio WhatsApp webhook payload containing:
- Message metadata (sender, receiver, message ID)
- Audio file URL (for voice messages)
- Text content (for text messages)
- Message type indicator
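
As an illustration of how these payload fields might be read, the sketch below classifies an incoming message as voice or text. The field names (`NumMedia`, `MediaUrl0`, `MediaContentType0`, `Body`, `From`, `To`, `MessageSid`) are standard Twilio webhook parameters; the function name `detectMessageType`, the output shape, and the rule "first attachment with an `audio/` content type means voice note" are assumptions, since the workflow's own detection logic is not shown here.

```javascript
// Hypothetical helper: inspects a parsed Twilio webhook body (form fields
// as a plain object) and decides whether it carries a voice note.
function detectMessageType(body) {
  const numMedia = parseInt(body.NumMedia || "0", 10);
  const contentType = body.MediaContentType0 || "";
  // Assumption: a voice note is any first attachment with an audio/* type.
  const isAudio = numMedia > 0 && contentType.startsWith("audio/");

  return {
    type: isAudio ? "voice" : "text",
    from: body.From,
    to: body.To,
    messageSid: body.MessageSid,
    mediaUrl: isAudio ? body.MediaUrl0 : null,
    text: body.Body || "",
  };
}

// Example with made-up values.
console.log(
  detectMessageType({
    From: "whatsapp:+15550001111",
    To: "whatsapp:+15550002222",
    MessageSid: "SMxxxxxxxx",
    NumMedia: "1",
    MediaContentType0: "audio/ogg",
    MediaUrl0: "https://example.invalid/media/1",
  })
);
```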

Processing

Audio files are downloaded and analyzed for size
Large files are chunked into manageable segments
Audio is transcribed to text using AI services
Text is processed by specialized AI agents with handbook context
Responses are formatted for WhatsApp delivery
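
The chunking and merging steps might look roughly like the following. This is a sketch under stated assumptions: `chunkBuffer` and `mergeTranscriptions` are hypothetical names, the 1 MB segment size is one of the two sizes mentioned earlier, and plain byte slicing is used for simplicity. Byte slices of a compressed audio file are not guaranteed to be independently decodable, so the real workflow may split differently.

```javascript
// Assumption: 1 MB segments, with 1 MB = 1024 * 1024 bytes.
const CHUNK_SIZE_BYTES = 1024 * 1024;

// Hypothetical helper: splits a Buffer into fixed-size segments and keeps
// each segment's position so results can be reordered later.
function chunkBuffer(buffer, chunkSizeBytes = CHUNK_SIZE_BYTES) {
  const chunks = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSizeBytes) {
    chunks.push({
      index: chunks.length,
      // subarray returns a view; no audio data is copied.
      data: buffer.subarray(offset, offset + chunkSizeBytes),
    });
  }
  return chunks;
}

// Hypothetical helper: joins per-chunk transcripts in their original order,
// skipping empty results.
function mergeTranscriptions(results) {
  return [...results]
    .sort((a, b) => a.index - b.index)
    .map((r) => r.text.trim())
    .filter(Boolean)
    .join(" ");
}

const chunks = chunkBuffer(Buffer.alloc(2.5 * CHUNK_SIZE_BYTES));
console.log(chunks.map((c) => c.data.length));
```

Carrying the `index` with each chunk matters because parallel transcription calls can finish out of order.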

Output

WhatsApp message responses sent via Twilio
Processed transcriptions stored in conversation memory
Error messages for failed processing attempts
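
Before a response is sent, the workflow's text cleanup step prepares it for WhatsApp. The sketch below shows one plausible version of that step. Everything here is an assumption for illustration: the function names, the specific Markdown rules being rewritten, and the 1600-character limit per message, which should be checked against Twilio's current documentation.

```javascript
// Assumption: per-message character limit; verify against Twilio's docs.
const MAX_MESSAGE_LENGTH = 1600;

// Hypothetical cleanup: rewrites common Markdown from the chat model into
// something that renders sensibly in WhatsApp.
function cleanForWhatsApp(text) {
  return text
    .replace(/^#{1,6}\s+/gm, "") // drop Markdown heading markers
    .replace(/\*\*(.+?)\*\*/g, "*$1*") // Markdown bold -> WhatsApp bold
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
    .trim();
}

// Hypothetical splitter: breaks a long reply into parts no longer than
// maxLen, preferring to cut at a space.
function splitMessage(text, maxLen = MAX_MESSAGE_LENGTH) {
  const parts = [];
  let rest = text;
  while (rest.length > maxLen) {
    let cut = rest.lastIndexOf(" ", maxLen);
    if (cut <= 0) cut = maxLen; // no space found: hard cut
    parts.push(rest.slice(0, cut).trim());
    rest = rest.slice(cut).trim();
  }
  if (rest.length > 0) parts.push(rest);
  return parts;
}

console.log(splitMessage(cleanForWhatsApp("## Title\n\n**Bold** text")));
```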

Error Handling

File Size Errors: Files over 25MB trigger error responses
Download Failures: HTTP request errors are handled with retry logic
Transcription Failures: Handled through backup transcription services and error messages
Chunking Errors: Individual chunk failures don't stop the entire process
AI Agent Errors: Fallback responses when handbook lookup fails
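
Two of the behaviours above, retrying failed downloads and letting individual chunks fail without aborting the run, can be sketched in plain JavaScript. This is an illustration only: `withRetry`, `transcribeChunks`, the retry count, and the exponential backoff delays are assumptions, and the workflow may instead rely on n8n's built-in node retry settings.

```javascript
// Hypothetical helper: retries an async operation with exponential backoff.
// fn receives the zero-based attempt number.
async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < retries) {
        const delay = baseDelayMs * 2 ** attempt;
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  throw lastError;
}

// Hypothetical helper: transcribes all chunks in parallel. A failed chunk
// is recorded with ok: false instead of rejecting the whole batch.
// `transcribe` stands in for whichever transcription call is used.
async function transcribeChunks(chunks, transcribe) {
  const settled = await Promise.allSettled(chunks.map((c) => transcribe(c)));
  return settled.map((result, index) =>
    result.status === "fulfilled"
      ? { index, ok: true, text: result.value }
      : { index, ok: false, error: String(result.reason) }
  );
}
```

`Promise.allSettled` is used rather than `Promise.all` because the latter rejects as soon as any single chunk fails.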

Known Limitations

Maximum audio file size of 25MB
Chunking may introduce slight transcription inconsistencies at boundaries
Processing time increases significantly for large files
Dependent on external service availability (Twilio, Google, OpenAI)

No related workflows are documented in the current context.

Setup Instructions

Import Workflow: Import the JSON configuration into your n8n instance
Configure Credentials:
- Set up Twilio API credentials for WhatsApp integration
- Configure the Google Gemini (PaLM) API credential for Gemini transcription
- Add OpenAI API credentials for chat models and Whisper
- Set up PostgreSQL connection with PGVector extension
- Configure Cohere API for re-ranking
Database Setup:
- Create PostgreSQL database with PGVector extension
- Set up tables for handbook storage (skillup_original, skillup)
Webhook Configuration:
- Configure Twilio WhatsApp webhook to point to your n8n webhook URL
- Ensure webhook authentication is properly set up
Handbook Upload:
- Use the document loader nodes to upload program handbooks
- Verify vector embeddings are created successfully
Testing:
- Send test voice messages to verify transcription
- Test text messages to ensure AI responses work
- Verify error handling with oversized files
Production Deployment:
- Enable the webhook trigger
- Monitor execution logs for any issues
- Set up appropriate retry and error handling policies
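
For the webhook authentication step, one option is to verify the `X-Twilio-Signature` header that Twilio attaches to its requests. The sketch below follows Twilio's documented scheme for form-encoded POSTs: append each parameter name and value, sorted by name, to the full request URL, sign the result with HMAC-SHA1 using the account auth token, and Base64-encode it. The function names are hypothetical, and nothing in the workflow confirms that this is the method it uses. The code runs under plain Node.js; whether the `crypto` module is available inside an n8n Code node depends on how the instance is configured.

```javascript
const crypto = require("crypto");

// Hypothetical helper: computes the signature Twilio would send for a
// form-encoded POST to `url` carrying `params`.
function computeTwilioSignature(authToken, url, params) {
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  return crypto
    .createHmac("sha1", authToken)
    .update(Buffer.from(data, "utf-8"))
    .digest("base64");
}

// Hypothetical helper: constant-time comparison against the header value.
function isValidTwilioRequest(authToken, url, params, headerSignature) {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(headerSignature || "");
  return (
    expected.length === received.length &&
    crypto.timingSafeEqual(expected, received)
  );
}
```

The URL passed in must match the one configured in Twilio exactly, including scheme and path, or the signatures will not agree.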