
WhatsApp Multi-Model Document Processor

This workflow creates an intelligent WhatsApp bot that can analyze and respond to various types of content including spreadsheets, PDFs, images, Word documents, and voice messages. Users can send files or ask questions via WhatsApp, and the system will process the content using AI models and provide intelligent responses.

Purpose

No business context provided yet — add a context.md to enrich this documentation.

How It Works

  1. Message Reception: The workflow starts when a WhatsApp message is received via webhook
  2. Content Analysis: The system analyzes the incoming message to determine the type of content (spreadsheet, PDF, image, voice, text, or Word document)
  3. File Processing: Based on the content type, the workflow downloads and processes the file:
    • Spreadsheets: Downloads CSV/Excel files and converts them to structured JSON
    • PDFs: Extracts text content from PDF documents
    • Images: Uses OpenAI's vision model to analyze and describe image content
    • Voice Messages: Converts audio to text using speech-to-text transcription
    • Word Documents: Converts DOCX files to text format
    • Text Questions: Processes direct text queries
  4. AI Analysis: All processed content is sent to an AI agent powered by GPT-4 for intelligent analysis and response generation
  5. Response Delivery: The AI-generated response is sent back to the user via WhatsApp
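The type detection in steps 2 and 3 happens in the "analyze type of the input" Code node, whose source is not shown in this document. The Python sketch below shows what that decision could look like. It assumes the node reads Twilio's standard webhook fields `NumMedia` and `MediaContentType0` and returns the Switch branch names from the diagram. The MIME-to-branch mapping is hypothetical, not the node's actual logic.

```python
# Hypothetical MIME-type mapping; the real Code node may use different rules.
SPREADSHEET_TYPES = {
    "text/csv",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}
DOCX_TYPE = "application/vnd.openxmlformats-officedocument.wordprocessingml.document"


def classify_message(payload: dict) -> str:
    """Return a Switch branch name for a Twilio WhatsApp webhook payload."""
    # Twilio sends NumMedia as a string in the form-encoded body.
    if int(payload.get("NumMedia", "0")) == 0:
        return "text"
    mime = payload.get("MediaContentType0", "").lower()
    if mime in SPREADSHEET_TYPES:
        return "spreadsheet"
    if mime == "application/pdf":
        return "pdf"
    if mime == DOCX_TYPE:
        return "word"
    if mime.startswith("image/"):
        return "image"
    if mime.startswith("audio/"):
        return "voice"
    # Unrecognized attachments fall back to the text branch (an assumption).
    return "text"
```

Under these assumptions, a WhatsApp voice note (usually `audio/ogg`) lands on the `voice` branch. A message with no attachment goes to `text`.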

Workflow Diagram

graph TD
    A[WhatsApp message] --> B[analyze type of the input]
    B --> C[Switch]
    C -->|spreadsheet| D[download sheet]
    C -->|pdf| E[download file image pdf]
    C -->|image| E
    C -->|voice| F[get audio from twilio1]
    C -->|text| G[content analyze]
    C -->|word| E

    D --> H[convert to j son]
    H --> I[all in row]
    I --> J[json to text]
    J --> G

    E --> K[path]
    K -->|pdf| L[pdf to text]
    K -->|image| M[Analyze image]
    K -->|word| N[create Job]

    L --> G
    M --> O[content to text]
    O --> G

    N --> P[Wait1]
    P --> Q[convert Docs to text]
    Q --> R[Switch2]
    R -->|finished| S[Download text]
    R -->|processing| P
    S --> G

    F --> T[If]
    T -->|large file| U[download voice file]
    T -->|small file| V[Transcribe a recording]
    U --> W[import audio file to cloud convert]
    W --> X[Wait]
    X --> Y[convert audio file]
    Y --> Z[Switch1]
    Z -->|finished| AA[Download audio file]
    Z -->|processing| X
    AA --> V
    V --> BB[Edit Fields]
    BB --> G

    G --> CC[send to WhatsApp]

    DD[OpenAI Chat Model] -.-> G
    EE[Simple Memory] -.-> G

Trigger

Webhook: The workflow is triggered by incoming WhatsApp messages sent to the endpoint /whatsapp-multimedia via HTTP POST requests.
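You can exercise the trigger without a phone by sending it a request shaped like Twilio's: an HTTP POST with a form-encoded body. The sketch below uses only Python's standard library. The host name and sender number are placeholders, and it sends only three of the many fields Twilio includes. Treat it as a smoke test, not a faithful replay of a real Twilio request.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_test_request(webhook_url: str, body: str, sender: str) -> Request:
    """Build a form-encoded POST resembling a Twilio WhatsApp text message."""
    form = {
        "From": f"whatsapp:{sender}",
        "Body": body,
        "NumMedia": "0",
    }
    return Request(
        webhook_url,
        data=urlencode(form).encode("utf-8"),
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder host. n8n serves production webhooks under /webhook/<path>
# and test-mode webhooks under /webhook-test/<path>.
req = build_test_request(
    "https://n8n.example.com/webhook/whatsapp-multimedia",
    "Summarize my last file",
    "+15550001111",
)
# Passing req to urllib.request.urlopen() would actually send it.
```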

Nodes Used

| Node Type | Purpose |
| --- | --- |
| Webhook | Receives incoming WhatsApp messages |
| Code | Analyzes message content type and formats data |
| Switch | Routes messages based on content type |
| HTTP Request | Downloads files from URLs and external APIs |
| Extract From File | Converts files to text/JSON format |
| OpenAI (Vision) | Analyzes image content using GPT-4o |
| OpenAI (Audio) | Transcribes voice messages to text |
| LangChain Agent | Processes content with AI analysis |
| OpenAI Chat Model | Provides GPT-4 language model capabilities |
| Memory Buffer | Maintains conversation context |
| Twilio | Sends responses back to WhatsApp |
| Wait | Handles processing delays for file conversions |
| If | Routes audio files by size |
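The Memory Buffer (presumably the "Simple Memory" node in the diagram) keeps a sliding window of recent messages. The Known Limitations section puts that window at 50. Below is a minimal Python sketch of the idea. It makes two assumptions: there is one window per conversation, keyed by something like the sender's WhatsApp number (the workflow's actual session key is not documented), and the window counts individual messages rather than user/assistant pairs.

```python
from __future__ import annotations

from collections import defaultdict, deque

# 50 matches the limit stated under Known Limitations.
WINDOW = 50


class WindowMemory:
    """Keep only the most recent `window` messages per conversation."""

    def __init__(self, window: int = WINDOW) -> None:
        self._history: defaultdict[str, deque] = defaultdict(
            lambda: deque(maxlen=window)
        )

    def add(self, session_id: str, role: str, text: str) -> None:
        # deque(maxlen=...) drops the oldest entry once the window is full.
        self._history[session_id].append((role, text))

    def context(self, session_id: str) -> list[tuple[str, str]]:
        """Return the retained (role, text) pairs, oldest first."""
        return list(self._history[session_id])
```

With a window of 3, adding five messages leaves only the last three in the context passed to the model.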

External Services & Credentials Required

Required Credentials:

  • OpenAI API: For image analysis, voice transcription, and chat responses
  • Twilio API: For WhatsApp message sending and receiving
  • HTTP Basic Auth (Twilio): For downloading media files from Twilio
  • CloudConvert API: For file format conversions (audio and documents)
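The HTTP Basic Auth credential is needed because Twilio can require authentication on media URLs. The sketch below shows the kind of request that credential produces. It assumes the Twilio account SID is the username and the auth token is the password, which is Twilio's usual scheme for media access. The URL and credentials are placeholders.

```python
import base64
from urllib.request import Request


def media_request(media_url: str, account_sid: str, auth_token: str) -> Request:
    """Build a GET for a Twilio media URL using HTTP Basic Auth."""
    # Basic Auth sends base64("username:password"); here that is SID:token.
    raw = f"{account_sid}:{auth_token}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return Request(media_url, headers={"Authorization": f"Basic {token}"})


# Placeholder values; a real MediaUrl0 arrives in the webhook payload.
req = media_request("https://api.twilio.com/media/example", "ACexample", "secret")
```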

External Services:

  • OpenAI: GPT-4 and GPT-4o models for content analysis
  • Twilio: WhatsApp Business API integration
  • CloudConvert: File conversion service for audio and document processing

Environment Variables

No specific environment variables are defined in this workflow. All configuration is handled through n8n credentials.

Data Flow

Input:

  • WhatsApp messages containing:
    • File attachments (PDF, images, spreadsheets, Word docs, audio)
    • Text questions
    • URLs to documents

Processing:

  • File download and format conversion
  • Content extraction and text processing
  • AI-powered analysis and response generation
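The spreadsheet branch ("convert to j son", then "all in row", then "json to text") flattens rows into text the agent can read. The Python sketch below does the same for CSV input using only the standard library. The output format is an assumption. In the workflow itself, the Extract From File node handles parsing; an .xlsx file would need a library such as openpyxl outside n8n.

```python
import csv
import io
import json


def csv_to_prompt_text(csv_text: str) -> str:
    """Turn CSV content into a header line plus one JSON object per row."""
    # Parse rows into dicts keyed by the header row ("convert to j son").
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    # Serialize every row and join them into one text block ("json to text").
    lines = [json.dumps(row, ensure_ascii=False) for row in rows]
    return f"Spreadsheet with {len(rows)} rows:\n" + "\n".join(lines)
```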

Output:

  • Intelligent text responses sent back via WhatsApp
  • Summaries, analysis, and answers based on the input content

Error Handling

The workflow includes several error handling mechanisms:

  1. File Size Checking: Large audio files are processed through CloudConvert before transcription
  2. Processing Status Monitoring: Wait nodes and switches monitor conversion job status
  3. Retry Logic: Processing jobs are checked repeatedly until completion
  4. Fallback Paths: Different processing routes for various file types and sizes
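In the diagram, Switch1 and Switch2 route only on `finished` and `processing`. The `processing` branch loops back to a Wait node with no visible upper bound. The Python sketch below adds two things the original workflow does not show: an exit on a failed job and a cap on the number of polls. It assumes CloudConvert's job status values include `finished` and `error`. `fetch_status` is a hypothetical stand-in for the HTTP Request that reads the job.

```python
import time
from typing import Callable

# Assumed CloudConvert job statuses that end polling.
DONE, FAILED = "finished", "error"


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_s: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job finishes or fails; raise after max_polls checks."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in (DONE, FAILED):
            return status
        # Mirrors the Wait node: pause before checking the job again.
        sleep(interval_s)
    raise TimeoutError(f"job not finished after {max_polls} polls")
```

Injecting `sleep` keeps the loop testable without real delays.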

Known Limitations

  • Audio files larger than 25 MB are routed through CloudConvert before transcription, which adds processing time
  • The workflow depends on external services (OpenAI, CloudConvert) which may have rate limits
  • Memory buffer is limited to 50 context messages
  • File processing times vary based on file size and external service availability
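OpenAI's transcription endpoint rejects uploads above 25 MB, which is presumably why the If node sends larger audio files through CloudConvert first. This document does not say how the If node measures file size. The sketch below assumes it reads the `Content-Length` header of the media response and treats 25 MB as 25 × 1024 × 1024 bytes. Both choices are assumptions.

```python
from typing import Mapping

# Hypothetical reading of "25 MB" as mebibytes.
LIMIT_BYTES = 25 * 1024 * 1024


def audio_branch(headers: Mapping[str, str], limit: int = LIMIT_BYTES) -> str:
    """Pick the If-node branch name from media response headers."""
    size = headers.get("Content-Length")
    # An unknown size goes down the CloudConvert path to be safe (an assumption).
    if size is None or int(size) > limit:
        return "large file"
    return "small file"
```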

No related workflows are specified.

Setup Instructions

  1. Import the Workflow: Import the JSON configuration into your n8n instance

  2. Configure Credentials:

    • Set up OpenAI API credentials with access to GPT-4 and GPT-4o models
    • Configure Twilio API credentials for WhatsApp integration
    • Add HTTP Basic Auth credentials for Twilio media downloads
    • Set up CloudConvert API credentials for file conversions
  3. Webhook Configuration:

    • Note the webhook URL generated for the "WhatsApp message" node
    • Enter this URL as the incoming-message webhook in your Twilio WhatsApp settings, with the method set to HTTP POST to match the trigger
  4. Test the Setup:

    • Send a test message to your WhatsApp number
    • Verify that files are properly downloaded and processed
    • Check that AI responses are generated and sent back
  5. Activate the Workflow: Enable the workflow to start processing incoming messages

  6. Monitor Performance: Check execution logs to ensure all external service integrations are working correctly