WhatsApp Voice Note Processing

This workflow processes voice messages sent via WhatsApp, transcribing them using Google Gemini AI and responding with contextual information from knowledge bases. It handles both small audio files through direct transcription and larger files through intelligent chunking, then routes the transcribed content to specialized AI assistants for SkillUp and E!BA programs.

Purpose

No business context provided yet — add a context.md to enrich this documentation.

Based on the workflow structure, this appears to serve program staff and participants who need to:

  • Send voice messages via WhatsApp for quick communication
  • Get transcribed responses from audio content
  • Access program-specific guidance through AI assistants
  • Handle various audio file sizes efficiently

How It Works

  1. Message Reception: WhatsApp webhook receives incoming voice messages from users
  2. Message Type Detection: System determines if the message contains audio or text content
  3. Audio Download: For voice messages, the system downloads the audio file from Twilio's media URL
  4. File Size Analysis: Audio files are analyzed and categorized as small (<2MB), medium (2-10MB), large (10-25MB), or too large (>25MB)
  5. Transcription Routing:
    • Small/medium files: Direct transcription via Google Gemini
    • Large files: Split into 1MB chunks, transcribed separately, then merged
    • Oversized files: Error handling with user notification
  6. Content Processing: Transcribed text is processed by specialized AI agents (SkillUp or E!BA) that consult program handbooks
  7. Response Generation: AI generates contextual responses based on program knowledge bases
  8. Text Cleanup: Markdown formatting is removed to ensure clean WhatsApp delivery
  9. Response Delivery: Final response is sent back to the user via WhatsApp
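The size analysis and routing in steps 4 and 5 can be sketched as two small functions. This is a minimal illustration, not the workflow's actual Code node: the function names are hypothetical, 1 MB is assumed to mean 1024 × 1024 bytes, and the bucket that receives a file of exactly 2, 10, or 25 MB is an assumption, since the documented ranges share their endpoints.

```javascript
// Assumption: 1 MB = 1024 * 1024 bytes.
const MB = 1024 * 1024;

// Classify an audio file by its size in bytes, using the documented thresholds.
// Assumption: exactly 2 MB counts as medium; exactly 10 MB and 25 MB stay in
// the lower bucket.
function classifyAudioSize(bytes) {
  if (bytes < 2 * MB) return "small";
  if (bytes <= 10 * MB) return "medium";
  if (bytes <= 25 * MB) return "large";
  return "too_large";
}

// Map a size category to the processing path described in step 5.
function routeForCategory(category) {
  switch (category) {
    case "small":
    case "medium":
      return "direct_transcription";
    case "large":
      return "chunked_transcription";
    default:
      return "error_response";
  }
}
```

For example, `routeForCategory(classifyAudioSize(12 * MB))` selects the chunked path, while a 30 MB file falls through to the error response.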

Workflow Diagram

graph TD
    A[WhatsApp Webhook] --> B[Extract Metadata]
    B --> C[Download Audio File]
    C --> D[Analyze File Size]
    D --> E{Route by Size}

    E -->|Small/Medium| F[Direct Transcription]
    E -->|Large| G[Chunk Audio]
    E -->|Too Large| H[Error Response]

    G --> I[Transcribe Chunks]
    I --> J[Merge Transcriptions]

    F --> K[AI Agent Processing]
    J --> K

    K --> L[Text Cleanup]
    L --> M[Format Response]
    M --> N[Send WhatsApp Reply]

    O[Knowledge Base] --> K
    P[Vector Store] --> K

Trigger

Webhook: POST endpoint 33898c22-f5f0-45f0-9983-3fd19c2daebb

  • Receives WhatsApp message webhooks from Twilio
  • Triggered when users send voice messages or text to the connected WhatsApp number
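As a rough illustration of what the first Code node might do with the webhook payload, the sketch below normalizes the form fields Twilio posts for an incoming message (`MessageSid`, `From`, `ProfileName`, `Body`, `NumMedia`, `MediaUrl0`, `MediaContentType0`). The helper name and the shape of the returned object are hypothetical, and the actual node may read additional fields.

```javascript
// Hypothetical helper: turn a Twilio webhook body into the metadata the
// rest of the workflow needs. `body` is the parsed form payload.
function extractMessageMetadata(body) {
  const numMedia = Number.parseInt(body.NumMedia ?? "0", 10) || 0;
  const contentType = body.MediaContentType0 ?? "";
  // Treat the message as audio only if media is attached and it is an audio type.
  const isAudio = numMedia > 0 && contentType.startsWith("audio/");

  return {
    messageId: body.MessageSid ?? null,
    from: body.From ?? null, // e.g. "whatsapp:+15551234567"
    profileName: body.ProfileName ?? null,
    messageType: isAudio ? "audio" : "text",
    mediaUrl: isAudio ? body.MediaUrl0 ?? null : null,
    text: body.Body ?? "",
  };
}
```

The `messageType` field is what a downstream Switch node could branch on to separate voice notes from plain text.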

Nodes Used

| Node Type | Purpose |
| --- | --- |
| Webhook | Receives incoming WhatsApp messages from Twilio |
| Code (JavaScript) | Extracts metadata, analyzes file sizes, processes audio chunks |
| HTTP Request | Downloads audio files from Twilio media URLs |
| Switch | Routes messages based on content type and file size |
| Google Gemini | Transcribes audio content using AI |
| AI Agent | Processes transcribed content with program-specific context |
| Vector Store (PostgreSQL) | Retrieves relevant information from knowledge bases |
| OpenAI Chat Model | Powers the conversational AI responses |
| Memory Buffer | Maintains conversation context per user |
| Text Splitter | Prepares documents for vector storage |
| Twilio | Sends WhatsApp responses back to users |
| Write Binary File | Temporarily stores audio chunks for processing |

External Services & Credentials Required

Required Services

  • Twilio: WhatsApp Business API integration
  • Google Gemini: Audio transcription service
  • OpenAI: Chat completion and embeddings
  • PostgreSQL: Vector database for knowledge storage
  • Cohere: Text reranking for improved search results

Credentials Needed

  • twilioApi - Twilio account credentials for WhatsApp messaging
  • googlePalmApi - Google AI API key for Gemini transcription
  • openAiApi - OpenAI API key for chat and embeddings
  • postgres - Database connection for vector storage
  • cohereApi - Cohere API key for reranking
  • httpBasicAuth - Twilio media download authentication
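In n8n the httpBasicAuth credential attaches the authentication header automatically, so no code is needed in the workflow itself. For readers debugging the media download outside n8n, the sketch below shows the equivalent header. It assumes the credential stores the Twilio Account SID as the username and the Auth Token as the password; the function name is hypothetical.

```javascript
// Build the value of an HTTP Basic "Authorization" header.
// Assumption: username = Twilio Account SID, password = Auth Token.
function buildBasicAuthHeader(username, password) {
  const encoded = Buffer.from(`${username}:${password}`, "utf8").toString("base64");
  return `Basic ${encoded}`;
}
```

The returned string would be sent as the `Authorization` header on the GET request to the media URL.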

Environment Variables

No explicit environment variables are defined in this workflow. All configuration is handled through n8n credential management.

Data Flow

Input

  • WhatsApp voice messages via Twilio webhook
  • Audio files in OGG format
  • User metadata (phone number, profile name, message ID)

Processing

  • Audio transcription to text
  • File size analysis and chunking logic
  • Vector similarity search against knowledge bases
  • AI-powered response generation with program context
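The chunking logic for large files can be pictured as a split step and a merge step. The sketch below is an assumption about how such logic could look, not the workflow's actual code: it slices a Buffer into 1 MB pieces and later joins per-chunk transcripts in their original order. Whether byte-level slices of a compressed audio file can be decoded on their own depends on the container format, so the real workflow may split differently.

```javascript
// Assumption: "1MB chunks" means 1024 * 1024 bytes per chunk.
const CHUNK_BYTES = 1024 * 1024;

// Split a Buffer into consecutive slices of at most `chunkBytes` bytes.
// Each chunk keeps its index so transcripts can be reordered later.
function splitIntoChunks(buffer, chunkBytes = CHUNK_BYTES) {
  if (!Number.isInteger(chunkBytes) || chunkBytes <= 0) {
    throw new RangeError("chunkBytes must be a positive integer");
  }
  const chunks = [];
  for (let offset = 0; offset < buffer.length; offset += chunkBytes) {
    chunks.push({
      index: chunks.length,
      data: buffer.subarray(offset, offset + chunkBytes),
    });
  }
  return chunks;
}

// Merge per-chunk transcripts ({ index, text }) into one string,
// restoring the original order and dropping empty pieces.
function mergeTranscripts(parts) {
  return [...parts]
    .sort((a, b) => a.index - b.index)
    .map((part) => part.text.trim())
    .filter((text) => text.length > 0)
    .join(" ");
}
```

Keeping the index on each chunk matters because the transcription calls may finish out of order.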

Output

  • Text responses sent via WhatsApp
  • Conversation memory storage
  • Error notifications for failed processing
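Before a text response is sent, the workflow removes Markdown so the message reads cleanly in WhatsApp. The following is a simplified sketch of that kind of cleanup, assuming regular-expression replacement; the function name is hypothetical and the patterns cover only common constructs (headings, `**bold**`, `*italic*`, links, code, bullets), not the full Markdown syntax.

```javascript
// Strip common Markdown syntax from a model response, keeping the readable text.
function stripMarkdown(text) {
  return text
    .replace(/```(?:[a-z]*\n)?([\s\S]*?)```/gi, "$1") // fenced code: keep contents
    .replace(/`([^`\n]+)`/g, "$1")                    // inline code
    .replace(/!\[([^\]]*)\]\([^)]*\)/g, "$1")         // images: keep alt text
    .replace(/\[([^\]]+)\]\(([^)]+)\)/g, "$1 ($2)")   // links: "label (url)"
    .replace(/^#{1,6}[ \t]+/gm, "")                   // heading markers
    .replace(/^[ \t]*[-*+][ \t]+/gm, "- ")            // normalize bullet markers
    .replace(/\*\*(.+?)\*\*/g, "$1")                  // bold
    .replace(/\*(\S(?:[^*\n]*\S)?)\*/g, "$1")         // italic
    .replace(/\n{3,}/g, "\n\n")                       // collapse extra blank lines
    .trim();
}
```

Bullets are normalized before emphasis is removed so that a leading `* ` list marker is not mistaken for the start of an italic span.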

Error Handling

The workflow includes several error handling mechanisms:

  1. File Size Limits: Files over 25MB trigger error responses
  2. Download Failures: HTTP request errors are caught and handled gracefully
  3. Transcription Errors: Failed transcriptions result in user-friendly error messages
  4. Retry Logic: Transcription nodes have retry-on-fail enabled
  5. Fallback Responses: When processing fails, users receive helpful error messages
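One way to keep fallback responses consistent is to map each failure type to a fixed reply. The sketch below is illustrative only: the error keys, the function name, and the message wording are placeholders, since the workflow's actual messages are not documented here.

```javascript
// Placeholder wording: the workflow's real error messages are not documented.
const ERROR_REPLIES = {
  too_large: "Your voice note is larger than 25 MB. Please send a shorter recording.",
  download_failed: "We could not download your voice note. Please try sending it again.",
  transcription_failed: "We could not transcribe your voice note. Please try again or send text.",
};

const DEFAULT_ERROR_REPLY = "Something went wrong while processing your message. Please try again.";

// Return the user-facing reply for a failure type, or a generic fallback
// when the type is unknown.
function buildErrorReply(kind) {
  return Object.prototype.hasOwnProperty.call(ERROR_REPLIES, kind)
    ? ERROR_REPLIES[kind]
    : DEFAULT_ERROR_REPLY;
}
```

Using an own-property check means unexpected keys such as `"toString"` still resolve to the generic fallback instead of an inherited object member.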

Known Limitations

Based on the workflow structure:

  • Maximum audio file size of 25MB
  • Processing time increases with file size due to chunking
  • Requires stable internet connection for external API calls
  • Limited to OGG audio format from WhatsApp
  • Conversation memory is session-based and may not persist long-term

The workflow appears to be part of a larger system with multiple similar implementations:

  • Multiple webhook endpoints suggest parallel processing paths
  • References to "SkillUp" and "E!BA" programs indicate specialized variants
  • Vector storage suggests integration with document management workflows

Setup Instructions

  1. Import Workflow: Import the JSON into your n8n instance

  2. Configure Credentials:

    • Set up Twilio API credentials for WhatsApp integration
    • Add Google Gemini API key for transcription
    • Configure OpenAI API access for chat and embeddings
    • Set up PostgreSQL database connection
    • Add Cohere API key for reranking
  3. Database Setup:

    • Create PostgreSQL database with vector extension
    • Set up tables for SkillUp and E!BA knowledge bases
    • Load program handbooks into vector storage
  4. Webhook Configuration:

    • Copy the webhook URL from n8n
    • Configure Twilio WhatsApp webhook to point to this endpoint
    • Test with a sample voice message
  5. Knowledge Base Population:

    • Upload program documents through the vector storage nodes
    • Verify embeddings are generated correctly
    • Test retrieval with sample queries
  6. Testing:

    • Send test voice messages to verify transcription
    • Check AI responses for accuracy and context
    • Validate error handling with oversized files
  7. Monitoring:

    • Set up logging for failed transcriptions
    • Monitor API usage and rate limits
    • Track conversation quality and user satisfaction