WhatsApp Multi Model

A comprehensive AI-powered WhatsApp bot that accepts multiple file formats (spreadsheets, PDFs, images, audio, Word documents) and lets users ask questions about their content in natural conversation, with memory of earlier messages in the conversation.

Purpose

This workflow serves as an intelligent document analysis assistant accessible through WhatsApp. Users can upload any supported file type and engage in conversational Q&A about the content, with the AI maintaining context across the conversation session.

How It Works

  1. Message Reception: WhatsApp messages are received via a Twilio webhook
  2. Content Analysis: The system analyzes incoming messages to determine if they contain files, links, or text questions
  3. File Type Detection: Identifies the type of content (spreadsheet, PDF, image, audio, Word document, or text question)
  4. File Processing: Downloads and processes files based on their type:
    • Spreadsheets: Converts rows to JSON, then to plain text for analysis
    • PDFs: Extracts text content directly
    • Images: Uses OpenAI's vision model to analyze and describe content
    • Audio: Large files are first converted through CloudConvert; the audio is then transcribed with OpenAI Whisper
    • Word Documents: Converts to CSV format via CloudConvert
  5. AI Analysis: Processes extracted content through an AI agent with conversation memory
  6. Response Generation: Generates contextual responses based on file content and user questions
  7. Reply Delivery: Sends responses back to the user via WhatsApp
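Steps 2 and 3 decide which Switch branch a message takes. The actual rules in the "analyze type of the input" Code node are not documented, so the following is a minimal sketch under assumptions: it reads Twilio's `NumMedia`, `MediaContentType0`, and `Body` fields, and the MIME-type sets and link heuristics (`WORD_TYPES`, `SHEET_TYPES`, `is_spreadsheet_link`) are hypothetical.

```python
import re
from urllib.parse import urlsplit

# Hypothetical MIME-type buckets; the real Code node's rules are not documented.
WORD_TYPES = {
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}
SHEET_TYPES = {
    "text/csv",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}
URL_PATTERN = re.compile(r"https?://\S+")


def is_spreadsheet_link(url: str) -> bool:
    """True for Google Sheets links or URLs whose path ends in a spreadsheet extension."""
    parts = urlsplit(url)
    path = parts.path.lower()
    if parts.netloc == "docs.google.com" and path.startswith("/spreadsheets/"):
        return True
    return path.endswith((".csv", ".xls", ".xlsx"))


def classify_message(payload: dict) -> str:
    """Map a Twilio WhatsApp webhook payload to one of the Switch routes."""
    if int(payload.get("NumMedia") or 0) > 0:
        content_type = payload.get("MediaContentType0", "").lower()
        if content_type == "application/pdf":
            return "pdf"
        if content_type.startswith("image/"):
            return "image"
        if content_type.startswith("audio/"):
            return "voice"
        if content_type in WORD_TYPES:
            return "word"
        if content_type in SHEET_TYPES:
            return "spreadsheet"
    body = payload.get("Body", "")
    if any(is_spreadsheet_link(url) for url in URL_PATTERN.findall(body)):
        return "spreadsheet"
    # Anything else is treated as a question about previously shared content.
    return "question"
```

Unrecognized attachment types fall through to the "question" route here; a production version might instead reply that the format is unsupported.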

Workflow Diagram

```mermaid
graph TD
    A[WhatsApp message] --> B[analyze type of the input]
    B --> C[Switch]

    C -->|spreadsheet| D[download sheet]
    C -->|pdf| E[download file image pdf]
    C -->|image| E
    C -->|voice| F[get audio from twilio1]
    C -->|question| G[content analyze]
    C -->|word| E

    D --> H[convert to j son]
    H --> I[all in row]
    I --> J[json to text]
    J --> G

    E --> K[path]
    K -->|pdf| L[pdf to text]
    K -->|image| M[Analyze image]
    K -->|word| N[create Job]

    L --> G
    M --> O[content to text]
    O --> G

    N --> P[Wait1]
    P --> Q[convert Docs to text]
    Q --> R[Switch2]
    R -->|finished| S[Download text]
    R -->|processing| P

    F --> T[If]
    T -->|large file| U[download voice file]
    T -->|small file| V[Transcribe a recording]

    U --> W[import audio file to cloud convert]
    W --> X[Wait]
    X --> Y[convert audio file]
    Y --> Z[Switch1]
    Z -->|finished| AA[Download audio file]
    Z -->|processing| X
    AA --> V

    V --> BB[Edit Fields]
    BB --> G

    G --> CC[send to WhatsApp]

    DD[OpenAI Chat Model] -.->|ai_languageModel| G
    EE[Simple Memory] -.->|ai_memory| G
```
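The If node in the diagram splits audio into "large file" and "small file" branches, but its exact condition is not spelled out. A minimal sketch, assuming the size is read from the `Content-Length` header of the Twilio media response (for example from a HEAD request) and that the threshold is the 25 MB limit mentioned under Error Handling; `TRANSCRIPTION_LIMIT_BYTES` and `audio_route` are hypothetical names.

```python
# Assumption: "25 MB" read conservatively as 25,000,000 bytes.
TRANSCRIPTION_LIMIT_BYTES = 25_000_000


def audio_route(headers: dict) -> str:
    """Pick the If-node branch from the media response headers (case-insensitive)."""
    lowered = {key.lower(): value for key, value in headers.items()}
    size = lowered.get("content-length")
    if size is None:
        # Unknown size: take the slower but safer CloudConvert path.
        return "large file"
    return "large file" if int(size) > TRANSCRIPTION_LIMIT_BYTES else "small file"
```

Routing unknown sizes to the conversion path trades speed for a lower chance of the transcription request being rejected.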

Trigger

  • Type: Webhook
  • Method: POST
  • Path: /webhook/whatsapp-multimedia
  • Source: Twilio WhatsApp integration
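Twilio posts incoming-message webhooks as `application/x-www-form-urlencoded` fields such as `From`, `Body`, `NumMedia`, `MediaUrl0`, and `MediaContentType0`. The n8n Webhook node parses this body itself; the sketch below only illustrates the payload shape the downstream Code node works with. `parse_twilio_form` and `media_items` are hypothetical helpers.

```python
from urllib.parse import parse_qs


def parse_twilio_form(raw_body: str) -> dict:
    """Decode a form-encoded webhook body, keeping the first value of each field."""
    fields = parse_qs(raw_body, keep_blank_values=True)
    return {key: values[0] for key, values in fields.items()}


def media_items(fields: dict) -> list:
    """Return (url, content_type) pairs for MediaUrl0..MediaUrl{NumMedia-1}."""
    count = int(fields.get("NumMedia") or 0)
    return [
        (fields[f"MediaUrl{i}"], fields.get(f"MediaContentType{i}", ""))
        for i in range(count)
    ]
```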

Nodes Used

| Node Type | Purpose |
|---|---|
| Webhook | Receives WhatsApp messages from Twilio |
| Code | Analyzes input type and formats data |
| Switch | Routes processing based on file type |
| HTTP Request | Downloads files from Twilio and external services |
| Extract From File | Converts spreadsheets and PDFs to text |
| OpenAI (Vision) | Analyzes image content |
| OpenAI (Audio) | Transcribes audio files |
| LangChain Agent | Processes content and generates responses |
| OpenAI Chat Model | Powers the conversational AI |
| Buffer Window Memory | Maintains conversation context |
| If | Handles conditional logic for file size |
| Wait | Manages timing for file conversion processes |
| Set/Edit Fields | Formats data between nodes |
| Twilio | Sends responses back to WhatsApp |

External Services & Credentials Required

Required Credentials

  • Twilio API: For WhatsApp messaging
    • Account SID
    • Auth Token
    • WhatsApp phone number
  • OpenAI API: For AI processing
    • API key with access to GPT-4, GPT-4 Vision, and Whisper
  • CloudConvert API: For file format conversion
    • API token
  • HTTP Basic Auth: For Twilio media downloads
    • Username (your Twilio Account SID) and password (your Auth Token)
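Twilio media URLs are fetched with HTTP Basic authentication, using the Account SID as the username and the Auth Token as the password. In n8n this is handled by the HTTP Basic Auth credential; the sketch below just shows the header that credential produces. `twilio_basic_auth` is a hypothetical helper name.

```python
import base64


def twilio_basic_auth(account_sid: str, auth_token: str) -> dict:
    """Build the Authorization header for fetching Twilio media URLs."""
    credentials = f"{account_sid}:{auth_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(credentials).decode("ascii")}
```

For example, `urllib.request.Request(media_url, headers=twilio_basic_auth(sid, token))` would attach the header to a media download.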

External Services

  • Twilio (WhatsApp Business API)
  • OpenAI (GPT-4, GPT-4 Vision, Whisper)
  • CloudConvert (file conversion)

Environment Variables

No specific environment variables are configured in this workflow. All authentication is handled through n8n credential management.

Data Flow

Input

  • WhatsApp messages containing:
    • File attachments (PDF, images, audio, Word docs)
    • Spreadsheet links (Google Sheets, CSV, XLSX)
    • Text questions about previously uploaded content
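Google Sheets links are rewritten to a CSV export URL before download. The exact rewrite in the workflow is not shown; a minimal sketch, assuming the standard `/spreadsheets/d/<id>/export?format=csv` form and that the sheet is shared so the export can be fetched without signing in. `sheets_csv_export_url` is a hypothetical name.

```python
import re
from typing import Optional

SHEET_ID = re.compile(r"docs\.google\.com/spreadsheets/d/([A-Za-z0-9_-]+)")
TAB_GID = re.compile(r"[#?&]gid=(\d+)")


def sheets_csv_export_url(share_url: str) -> Optional[str]:
    """Rewrite a Google Sheets share link into its CSV export URL (None if not a Sheets link)."""
    match = SHEET_ID.search(share_url)
    if match is None:
        return None
    export_url = f"https://docs.google.com/spreadsheets/d/{match.group(1)}/export?format=csv"
    tab = TAB_GID.search(share_url)
    if tab is not None:
        # Keep the tab the user linked to; without gid the export uses the first sheet.
        export_url += f"&gid={tab.group(1)}"
    return export_url
```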

Processing

  • File content extraction and conversion to text
  • AI analysis and summarization
  • Conversational Q&A with memory retention
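For spreadsheets, the "convert to j son", "all in row", and "json to text" nodes turn rows into plain text for the agent. The text layout they produce is not documented, so this is a hypothetical sketch: each CSV row becomes a `Row n: column: value, ...` line, and the `max_rows` cap is an added assumption to keep the prompt size bounded.

```python
import csv
import io


def csv_to_text(csv_text: str, max_rows: int = 500) -> str:
    """Flatten CSV rows into 'Row n: column: value, ...' lines for the agent prompt."""
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for index, row in enumerate(reader, start=1):
        if index > max_rows:
            lines.append(f"(truncated after {max_rows} rows)")
            break
        cells = ", ".join(f"{column}: {value}" for column, value in row.items())
        lines.append(f"Row {index}: {cells}")
    return "\n".join(lines)
```

Repeating column names on every row costs tokens but lets the model answer questions about a single row without cross-referencing a header.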

Output

  • WhatsApp messages containing:
    • File content summaries
    • Answers to user questions
    • Analysis insights and key findings

Error Handling

The workflow includes several error handling mechanisms:

  • File Size Check: Audio files larger than 25 MB (the upload limit of OpenAI's transcription API) are routed through CloudConvert for format conversion before transcription
  • Conversion Status Monitoring: Wait nodes and switch logic handle CloudConvert job status (processing, finished, error)
  • Fallback Processing: Multiple paths for different file types ensure robust content extraction
  • Memory Management: Conversation context is limited to a 50-message window so the history sent to the model stays bounded
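The status monitoring above is drawn in the diagram as a Wait node, a status request, and a Switch that loops back while the job is still processing. A minimal sketch of that loop as a bounded poll, assuming CloudConvert's job status strings; `get_status` is a hypothetical callable standing in for the status request, and the `max_polls` cap is an addition, since the documented loop has no stated upper bound.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    interval_s: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a conversion job until it finishes, mirroring the Wait -> status -> Switch loop."""
    for _ in range(max_polls):
        sleep(interval_s)  # the Wait / Wait1 node
        status = get_status()  # the status request node
        if status == "finished":
            return status  # Switch branch: continue to the download node
        if status == "error":
            raise RuntimeError("conversion job reported status 'error'")
        # "waiting" / "processing": loop back to the Wait node
    raise TimeoutError(f"job still unfinished after {max_polls} polls")
```

Injecting `sleep` keeps the function testable without real delays.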

Known Limitations

  • Audio files larger than 25 MB take longer to process because they pass through CloudConvert first
  • Google Sheets links are automatically converted to CSV export format
  • Conversation memory is keyed on the workflow ID rather than on the sender, so all users share a single conversation history
  • CloudConvert API has rate limits and processing time constraints
  • Image analysis quality depends on image clarity and content
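Because memory is keyed on the workflow ID, one way to give each user a separate history is to key it on the sender's `From` address instead. The sketch below models that idea outside n8n: `session_key` and `WindowMemory` are hypothetical names, and the default window of 50 mirrors the Error Handling section. In n8n itself the equivalent change would be an expression on the memory node's session key; the memory node keeps its own window.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Tuple


def session_key(payload: dict) -> str:
    """Use the sender's WhatsApp address (e.g. 'whatsapp:+15551234567') as the session ID."""
    return payload["From"]


class WindowMemory:
    """Keep only the most recent `window` messages per session."""

    def __init__(self, window: int = 50) -> None:
        self._sessions: Dict[str, Deque[Tuple[str, str]]] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def add(self, session_id: str, role: str, text: str) -> None:
        # deque(maxlen=...) silently drops the oldest message once full.
        self._sessions[session_id].append((role, text))

    def history(self, session_id: str) -> List[Tuple[str, str]]:
        # .get avoids creating empty sessions for unknown senders.
        return list(self._sessions.get(session_id, ()))
```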

Setup Instructions

  1. Import Workflow: Import the JSON workflow into your n8n instance

  2. Configure Credentials:

    • Set up Twilio API credentials with your WhatsApp Business account
    • Add OpenAI API key with appropriate model access
    • Configure CloudConvert API token
    • Set up HTTP Basic Auth for Twilio media downloads
  3. Webhook Configuration:

    • Copy the Production webhook URL from the "WhatsApp message" node
    • Paste it into the "When a message comes in" field of your Twilio WhatsApp sender (or Sandbox) settings, with the method set to HTTP POST
  4. Test the Setup:

    • Send a test file via WhatsApp to verify file processing
    • Ask questions about the uploaded content to test AI responses
    • Verify conversation memory by asking follow-up questions
  5. Activate Workflow: Activate the workflow in n8n; the production webhook URL only receives messages while the workflow is active

  6. Monitor Performance: Check execution logs for any errors or processing delays
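To exercise the webhook before wiring up WhatsApp, one option is to send a Twilio-style form POST directly to n8n. A minimal sketch, assuming the `/webhook/whatsapp-multimedia` path from the Trigger section; the host and `build_test_request` are hypothetical. A text-only question is the simplest case, because the file paths download media with Twilio credentials and so may only work with a real Twilio media URL.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request


def build_test_request(
    webhook_url: str,
    sender: str,
    body: str,
    media_url: Optional[str] = None,
    media_type: Optional[str] = None,
) -> Request:
    """Build a Twilio-style form POST for exercising the n8n webhook without WhatsApp."""
    fields = {"From": sender, "Body": body, "NumMedia": "0"}
    if media_url is not None:
        fields.update(
            {"NumMedia": "1", "MediaUrl0": media_url, "MediaContentType0": media_type or ""}
        )
    return Request(
        webhook_url,
        data=urlencode(fields).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` sends it; the execution should then appear in the n8n execution log.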

The workflow is ready to handle multi-format file analysis through WhatsApp once all credentials are properly configured and the webhook is connected to your Twilio account.