
WhatsApp Voice Note Processing

This workflow processes voice messages sent via WhatsApp. It transcribes them with Google Gemini or OpenAI Whisper, then replies through AI assistants that answer from specific program handbooks. Small and medium audio files are transcribed directly; large files are split into chunks that are transcribed separately and then merged.

Purpose

No business context has been provided yet. Add a context.md file to enrich this documentation.

How It Works

  1. Webhook Reception: Receives WhatsApp messages via a Twilio webhook
  2. Message Type Detection: Determines if the message contains audio or text
  3. Audio Processing: For voice messages, downloads the audio file from Twilio
  4. File Size Analysis: Checks the audio file size to choose a processing strategy:
    • Small files (< 2MB): direct transcription
    • Medium files (2-10MB): direct transcription
    • Large files (10-25MB): chunked transcription using 1MB or 0.5MB segments
    • Files over 25MB: rejected with an error message
  5. Transcription: Uses Google Gemini or OpenAI Whisper to convert audio to text
  6. Chunk Processing: For large files, processes chunks in parallel and merges results
  7. AI Response: Feeds transcribed text to specialized AI agents (SkillUp or E!BA) that reference program handbooks
  8. Response Delivery: Sends the AI agent's reply back to the user via WhatsApp
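
The size-based routing in step 4 can be sketched as a small function of the kind that could live in one of the workflow's Code (JavaScript) nodes. This is a minimal sketch, not the workflow's actual code: the name classifyAudioSize, the route labels, and the use of binary megabytes (1MB = 1,048,576 bytes) are assumptions. The chunk size defaults to 1MB; the documented 0.5MB alternative can be passed explicitly.

```javascript
// Hypothetical helper mirroring the documented size thresholds.
// Assumption: 1 MB = 1024 * 1024 bytes (the workflow may use decimal megabytes).
const MB = 1024 * 1024;

function classifyAudioSize(sizeBytes, chunkSize = 1 * MB) {
  if (!Number.isFinite(sizeBytes) || sizeBytes < 0) {
    throw new RangeError(`Invalid file size: ${sizeBytes}`);
  }
  if (sizeBytes < 2 * MB) return { route: "small", strategy: "direct" };
  if (sizeBytes < 10 * MB) return { route: "medium", strategy: "direct" };
  // 10-25MB: transcribe in segments (1MB by default, 0.5MB if passed in).
  if (sizeBytes <= 25 * MB) return { route: "large", strategy: "chunked", chunkSize };
  // Over 25MB: route to the error branch.
  return { route: "too_large", strategy: "error" };
}
```

For example, classifyAudioSize(12 * MB, 0.5 * MB) returns { route: "large", strategy: "chunked", chunkSize: 524288 }.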

Workflow Diagram

graph TD
    A[Webhook Trigger] --> B[Message Type Detection]
    B -->|Voice Note| C[Extract Metadata]
    B -->|Text| D[AI Agent Direct]
    C --> E[Download Audio File]
    E --> F[Analyze File Size]
    F -->|Small/Medium| G[Direct Transcription]
    F -->|Large| H[Chunk Audio]
    F -->|Too Large| I[Error Response]
    H --> J[Process Chunks]
    J --> K[Merge Transcriptions]
    G --> L[AI Agent Processing]
    K --> L
    L --> M[Text Cleanup]
    M --> N[Send WhatsApp Response]
    D --> M
    I --> O[Send Error Message]

Trigger

  • Type: Webhook (POST)
  • Path: /33898c22-f5f0-45f0-9983-3fd19c2daebb
  • Source: Twilio WhatsApp webhook for incoming messages
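
Because the webhook path is publicly reachable, the workflow should only act on requests that really come from Twilio. Twilio signs each webhook: it appends the POST parameters, sorted by name, to the full request URL, computes an HMAC-SHA1 of that string keyed with the account's Auth Token, and sends the Base64 result in the X-Twilio-Signature header. The sketch below recomputes that signature with Node's built-in crypto module. The function names are hypothetical, and the URL passed in must exactly match the public URL Twilio called (scheme, host, path, and any query string), which can differ from the URL n8n sees behind a reverse proxy. Inside an n8n Code node, built-in modules such as crypto typically have to be allowed via the NODE_FUNCTION_ALLOW_BUILTIN environment variable.

```javascript
const crypto = require("crypto");

// Hypothetical helper: recomputes Twilio's webhook signature.
// Assumes a form-encoded POST body with one value per parameter name.
function computeTwilioSignature(authToken, url, params) {
  // Append each parameter name and value, sorted by name, to the full URL.
  const data = Object.keys(params)
    .sort()
    .reduce((acc, key) => acc + key + params[key], url);
  // HMAC-SHA1 keyed with the Auth Token, Base64-encoded.
  return crypto.createHmac("sha1", authToken).update(data, "utf8").digest("base64");
}

// Returns true only if the X-Twilio-Signature header matches the expected value.
function isValidTwilioRequest(authToken, url, params, signatureHeader) {
  const expected = Buffer.from(computeTwilioSignature(authToken, url, params));
  const received = Buffer.from(signatureHeader || "");
  // timingSafeEqual throws when lengths differ, so check the length first.
  return expected.length === received.length && crypto.timingSafeEqual(expected, received);
}
```

Twilio's official Node.js helper library also ships request validation, which is preferable where external packages can be installed.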

Nodes Used

Node Type | Purpose
--- | ---
Webhook | Receives incoming WhatsApp messages from Twilio
Code (JavaScript) | Extracts metadata, analyzes file sizes, processes chunks, cleans text
HTTP Request | Downloads audio files from Twilio media URLs
Switch | Routes processing based on message type and file size
Google Gemini | Transcribes audio files to text
OpenAI Transcription | Alternative transcription service
AI Agent | Provides intelligent responses using program handbooks
Vector Store (PGVector) | Stores and retrieves handbook knowledge
OpenAI Chat Model | Powers the AI agents
Memory Buffer | Maintains conversation context
Twilio | Sends responses back via WhatsApp
Write Binary File | Temporarily stores audio chunks
Function Item | Collects and processes multiple chunk results

External Services & Credentials Required

Twilio

  • Purpose: WhatsApp messaging platform
  • Credentials: Twilio API credentials
  • Usage: Receiving webhooks, downloading media, sending responses

Google Gemini

  • Purpose: Audio transcription
  • Credentials: Google PaLM API key
  • Usage: Converting voice notes to text

OpenAI

  • Purpose: AI chat models and alternative transcription
  • Credentials: OpenAI API key
  • Usage: Powering AI agents and Whisper transcription
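
For reference, a Whisper transcription is a multipart POST to OpenAI's /v1/audio/transcriptions endpoint with a file field and a model field. That endpoint documents a 25 MB upload limit, which lines up with the workflow's 25MB ceiling. The sketch below builds and sends such a request using the fetch, FormData, and Blob globals available in Node.js 18+. It is not the workflow's n8n OpenAI node: the helper names, the whisper-1 model choice, and the audio/ogg default (WhatsApp voice notes are usually OGG/Opus) are assumptions.

```javascript
// Hypothetical helpers for OpenAI's audio transcription endpoint (Node.js 18+).
const OPENAI_TRANSCRIPTIONS_URL = "https://api.openai.com/v1/audio/transcriptions";

// Builds, but does not send, the multipart request.
function buildWhisperRequest(audioBuffer, filename, apiKey, mimeType = "audio/ogg") {
  const form = new FormData();
  form.append("model", "whisper-1");
  // The filename extension helps the API detect the audio format.
  form.append("file", new Blob([audioBuffer], { type: mimeType }), filename);
  return {
    url: OPENAI_TRANSCRIPTIONS_URL,
    init: {
      method: "POST",
      headers: { Authorization: `Bearer ${apiKey}` },
      body: form, // fetch sets the multipart Content-Type boundary itself
    },
  };
}

// Sends the request and returns the transcribed text.
async function transcribeWithWhisper(audioBuffer, filename, apiKey) {
  const { url, init } = buildWhisperRequest(audioBuffer, filename, apiKey);
  const res = await fetch(url, init);
  if (!res.ok) {
    throw new Error(`Transcription failed: HTTP ${res.status} ${await res.text()}`);
  }
  const json = await res.json(); // default JSON response has a "text" field
  return json.text;
}
```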

PostgreSQL with PGVector

  • Purpose: Vector database for handbook storage
  • Credentials: PostgreSQL connection details
  • Usage: Storing and retrieving program handbook knowledge

Cohere

  • Purpose: Re-ranking search results
  • Credentials: Cohere API key
  • Usage: Improving handbook search relevance

Environment Variables

No specific environment variables are documented in the workflow configuration. Credentials are managed through n8n's credential system.

Data Flow

Input

  • Twilio WhatsApp webhook payload containing:
    • Message metadata (sender, receiver, message ID)
    • Audio file URL (for voice messages)
    • Text content (for text messages)
    • Message type indicator
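
Twilio posts these fields as form-encoded parameters such as MessageSid, From, To, Body, NumMedia, MediaUrl0, and MediaContentType0. The hypothetical helper below shows one way the Extract Metadata and Message Type Detection steps could normalize them. Treating any audio/* media type as a voice note is an assumption, and in n8n the Webhook node typically exposes these fields under $json.body.

```javascript
// Hypothetical helper: normalizes a Twilio inbound WhatsApp webhook body.
function extractMessageMetadata(body) {
  const numMedia = Number.parseInt(body.NumMedia ?? "0", 10) || 0;
  const contentType = body.MediaContentType0 ?? "";
  // Assumption: any audio/* attachment is treated as a voice note.
  const isVoiceNote = numMedia > 0 && contentType.startsWith("audio/");

  let type = "text";
  if (isVoiceNote) type = "voice";
  else if (numMedia > 0) type = "other_media";

  return {
    messageSid: body.MessageSid ?? null,
    from: body.From ?? null, // e.g. "whatsapp:+15551234567"
    to: body.To ?? null,
    type,
    text: body.Body ?? "",
    mediaUrl: isVoiceNote ? body.MediaUrl0 : null,
    mediaContentType: isVoiceNote ? contentType : null,
  };
}
```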

Processing

  • Audio files are downloaded and analyzed for size
  • Large files are chunked into manageable segments
  • Audio is transcribed to text using Google Gemini or OpenAI Whisper
  • Text is processed by specialized AI agents with handbook context
  • Responses are formatted for WhatsApp delivery
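
The chunking step can be illustrated with a plain byte-level split using the documented 1MB or 0.5MB segment sizes. This is a simplified sketch with a hypothetical chunkBuffer function, not necessarily how the workflow splits audio. A byte split is only safe when each segment can be decoded on its own: in OGG/Opus files (the usual WhatsApp voice-note format) only the start of the file carries the stream headers, so later segments may be rejected by a transcription service unless the audio is instead split by duration with a tool such as ffmpeg.

```javascript
// Hypothetical helper: splits a Buffer into fixed-size byte segments.
// Caveat: a naive byte split; later segments of OGG/Opus audio lack stream headers.
function chunkBuffer(buffer, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new RangeError("chunkSize must be a positive integer");
  }
  const chunks = [];
  for (let offset = 0; offset < buffer.length; offset += chunkSize) {
    // subarray shares memory with the original Buffer and clamps the end index.
    chunks.push(buffer.subarray(offset, offset + chunkSize));
  }
  return chunks;
}
```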

Output

  • WhatsApp message responses sent via Twilio
  • Processed transcriptions stored in conversation memory
  • Error messages for failed processing attempts
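
Chat models often answer in Markdown, while WhatsApp uses its own lightweight syntax (*bold*, _italic_, ~strikethrough~). The exact rules of the Text Cleanup step are not documented; the hypothetical toWhatsAppText function below is one possible set of conversions applied before the text is handed to the Twilio node.

```javascript
// Hypothetical cleanup: converts common Markdown into WhatsApp formatting.
function toWhatsAppText(markdown) {
  return markdown
    .replace(/\*\*(.+?)\*\*/g, "*$1*") // **bold** -> *bold*
    .replace(/__(.+?)__/g, "*$1*") // __bold__ -> *bold*
    .replace(/~~(.+?)~~/g, "~$1~") // ~~strike~~ -> ~strike~
    .replace(/^#{1,6}[ \t]+(.*)$/gm, "*$1*") // "## Heading" -> *Heading*
    .replace(/\[([^\]]+)\]\(([^)]+)\)/g, "$1 ($2)") // [text](url) -> text (url)
    .replace(/\n{3,}/g, "\n\n") // collapse runs of blank lines
    .trim();
}
```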

Error Handling

  1. File Size Errors: Files over 25MB trigger error responses
  2. Download Failures: HTTP request errors are handled with retry logic
  3. Transcription Failures: Handled with a backup transcription service and error messages
  4. Chunking Errors: A failed chunk does not stop processing of the remaining chunks
  5. AI Agent Errors: Fallback responses are sent when handbook lookup fails
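
The documented chunk behaviour (parallel processing, merging, and tolerance of individual chunk failures) maps naturally onto Promise.allSettled. In this sketch, transcribeChunks and the injected transcribeFn callback are hypothetical. Failed chunks are skipped, and their indices are returned so the caller can decide whether to warn the user that part of the audio was not transcribed.

```javascript
// Hypothetical merge step: transcribes chunks concurrently and joins the
// successful results in their original order.
async function transcribeChunks(chunks, transcribeFn) {
  // allSettled never rejects, so one failed chunk cannot abort the others.
  const results = await Promise.allSettled(chunks.map((chunk, i) => transcribeFn(chunk, i)));
  const texts = [];
  const failedChunks = [];
  results.forEach((result, i) => {
    if (result.status === "fulfilled" && typeof result.value === "string") {
      texts.push(result.value.trim());
    } else {
      failedChunks.push(i);
    }
  });
  // Drop empty transcriptions and join with single spaces.
  return { text: texts.filter(Boolean).join(" "), failedChunks };
}
```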

Known Limitations

  • Maximum audio file size of 25MB
  • Chunking may introduce slight transcription inconsistencies at chunk boundaries, for example when a word is split between two segments
  • Processing time increases significantly for large files
  • Dependent on external service availability (Twilio, Google, OpenAI)

No related workflows are documented in the current context.

Setup Instructions

  1. Import Workflow: Import the JSON configuration into your n8n instance

  2. Configure Credentials:

    • Set up Twilio API credentials for WhatsApp integration
    • Configure Google PaLM API credentials for Gemini transcription
    • Add OpenAI API credentials for chat models and Whisper
    • Set up PostgreSQL connection with PGVector extension
    • Configure Cohere API for re-ranking
  3. Database Setup:

    • Create PostgreSQL database with PGVector extension
    • Set up tables for handbook storage (skillup_original, skillup)
  4. Webhook Configuration:

    • Configure Twilio WhatsApp webhook to point to your n8n webhook URL
    • Validate incoming webhook requests (for example, by checking Twilio's X-Twilio-Signature header) so that only Twilio can trigger the workflow
  5. Handbook Upload:

    • Use the document loader nodes to upload program handbooks
    • Verify vector embeddings are created successfully
  6. Testing:

    • Send test voice messages to verify transcription
    • Test text messages to ensure AI responses work
    • Verify error handling with files larger than 25MB
  7. Production Deployment:

    • Enable the webhook trigger
    • Monitor execution logs for any issues
    • Set up appropriate retry and error handling policies
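
A retry policy for the media download can be as simple as exponential backoff around the HTTP call. The sketch below is an assumption, not the workflow's configuration: withRetry and downloadTwilioMedia are hypothetical names, it assumes the Twilio media URL requires HTTP Basic auth with the Account SID and Auth Token, and it relies on the global fetch from Node.js 18+. n8n's per-node Retry On Fail setting on the HTTP Request node may be sufficient on its own.

```javascript
// Hypothetical retry helper with exponential backoff.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

async function withRetry(fn, { retries = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  // One initial attempt plus up to `retries` retries.
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      // Wait baseDelayMs, 2x, 4x, ... before the next attempt.
      if (attempt < retries) await sleep(baseDelayMs * 2 ** attempt);
    }
  }
  throw lastError;
}

// Example use: download a Twilio media URL with HTTP Basic auth.
async function downloadTwilioMedia(mediaUrl, accountSid, authToken) {
  const auth = Buffer.from(`${accountSid}:${authToken}`).toString("base64");
  return withRetry(async () => {
    const res = await fetch(mediaUrl, { headers: { Authorization: `Basic ${auth}` } });
    if (!res.ok) throw new Error(`Media download failed: HTTP ${res.status}`);
    return Buffer.from(await res.arrayBuffer());
  });
}
```

Responses such as 401 or 404 are usually not worth retrying; a production version might retry only on network errors and 5xx responses.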