WhatsApp Multi-Model Document Processing

A WhatsApp integration that processes files (PDFs, images, spreadsheets, Word documents, and audio) sent in WhatsApp messages, extracts their content with AI services, and sends an analysis or answer back to the sender.

Purpose

This workflow serves as a universal document processing assistant accessible through WhatsApp. Users can send various types of files or ask questions, and the system will automatically detect the content type, extract relevant information, and provide intelligent analysis or answers. It's particularly useful for teams or individuals who need quick document analysis capabilities without switching between multiple applications.

How It Works

  1. Message Reception: The workflow receives WhatsApp messages through a webhook endpoint
  2. Content Analysis: Incoming messages are analyzed to determine the type of content (spreadsheet link, PDF, image, audio, Word document, or text question)
  3. File Processing: Based on the content type, files are downloaded and processed:
    • Spreadsheets: Downloaded and converted to JSON format for analysis
    • PDFs: Text content is extracted directly
    • Images: Analyzed using OpenAI's vision capabilities
    • Audio: Transcribed using OpenAI's Whisper; large files are first converted to a compatible format through CloudConvert
    • Word Documents: Converted to text using CloudConvert service
    • Text Questions: Processed directly
  4. AI Analysis: All extracted content is passed to a LangChain agent backed by an OpenAI chat model (GPT-4), which keeps conversation memory
  5. Response: The AI-generated analysis or answer is sent back to the original WhatsApp sender
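
The content analysis in step 2 can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual Code node: the function name `classify_message`, the extension list, and the Google Sheets URL check are assumptions. The field names `NumMedia`, `MediaContentType0`, and `Body` are the parameters Twilio sends with an incoming message.

```python
from urllib.parse import urlparse

# Assumed set of MIME types treated as Word documents.
WORD_TYPES = {
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}
# Assumed file extensions treated as spreadsheet links.
SPREADSHEET_EXTENSIONS = (".csv", ".xls", ".xlsx")


def classify_message(payload: dict) -> str:
    """Return a route name: spreadsheet, pdf, image, voice, word, or question."""
    num_media = int(payload.get("NumMedia", "0") or 0)
    if num_media > 0:
        content_type = payload.get("MediaContentType0", "").lower()
        if content_type == "application/pdf":
            return "pdf"
        if content_type.startswith("image/"):
            return "image"
        if content_type.startswith("audio/"):
            return "voice"
        if content_type in WORD_TYPES:
            return "word"
        # Unknown attachment types fall through to the text checks below.

    # Look for a spreadsheet link in the message text.
    for token in payload.get("Body", "").split():
        if not token.startswith(("http://", "https://")):
            continue
        parsed = urlparse(token)
        is_sheet_file = parsed.path.lower().endswith(SPREADSHEET_EXTENSIONS)
        is_google_sheet = (
            parsed.netloc == "docs.google.com"
            and parsed.path.startswith("/spreadsheets")
        )
        if is_sheet_file or is_google_sheet:
            return "spreadsheet"

    # Anything else is treated as a plain text question.
    return "question"
```

The returned labels match the branch names of the Switch node in the diagram, so each label maps to one processing path.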

Workflow Diagram

graph TD
    A[WhatsApp message] --> B[analyze type of the input]
    B --> C[Switch]

    C -->|spreadsheet| D[download sheet]
    C -->|pdf| E[download file image pdf]
    C -->|image| E
    C -->|voice| F[get audio from twilio1]
    C -->|question| G[content analyze]
    C -->|word| E

    D --> H[convert to j son]
    H --> I[all in row]
    I --> J[json to text]
    J --> G

    E --> K[path]
    K -->|pdf| L[pdf to text]
    K -->|image| M[Analyze image]
    K -->|word| N[create Job]

    L --> G
    M --> O[content to text]
    O --> G

    N --> P[Wait1]
    P --> Q[convert Docs to text]
    Q --> R[Switch2]
    R -->|finished| S[Download text]
    R -->|processing| P
    S --> G

    F --> T[If]
    T -->|large file| U[download voice file]
    T -->|small file| V[Transcribe a recording]
    U --> W[import audio file to cloud convert]
    W --> X[Wait]
    X --> Y[convert audio file]
    Y --> Z[Switch1]
    Z -->|finished| AA[Download audio file]
    Z -->|processing| X
    AA --> V
    V --> BB[Edit Fields]
    BB --> G

    G --> CC[send to WhatsApp]

    DD[OpenAI Chat Model] -.-> G
    EE[Simple Memory] -.-> G

Trigger

Webhook: POST endpoint at /whatsapp-multimedia that receives WhatsApp messages from Twilio's messaging service.
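
Twilio delivers incoming messages as a form-encoded POST body. The sketch below only illustrates the shape of that payload; inside the workflow the Webhook node is expected to do this parsing. The function name `parse_twilio_webhook` and the sample phone number are hypothetical.

```python
from urllib.parse import parse_qs


def parse_twilio_webhook(raw_body: str) -> dict:
    """Flatten a form-encoded webhook body into a dict of single values."""
    parsed = parse_qs(raw_body, keep_blank_values=True)
    # parse_qs returns a list per key; Twilio sends one value per field.
    return {key: values[0] for key, values in parsed.items()}


# Made-up example body for a text-only message.
sample = "From=whatsapp%3A%2B15551234567&Body=Hello+there&NumMedia=0"
message = parse_twilio_webhook(sample)
print(message["From"], message["Body"], message["NumMedia"])
```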

Nodes Used

Node Type | Purpose
--- | ---
Webhook | Receives incoming WhatsApp messages
Code | Analyzes message content to determine file type
Switch | Routes processing based on content type
HTTP Request | Downloads files from URLs and interacts with external APIs
Extract From File | Converts spreadsheets to JSON and PDFs to text
OpenAI (Vision) | Analyzes image content
OpenAI (Audio) | Transcribes audio files
LangChain Agent | Provides AI-powered content analysis with memory
OpenAI Chat Model | Powers the conversational AI responses
Buffer Window Memory | Maintains conversation context
If | Handles conditional logic for file size checks
Wait | Manages timing for file conversion processes
Set | Formats data for downstream processing
Twilio | Sends responses back to WhatsApp

External Services & Credentials Required

Required Credentials:

  • OpenAI API: For image analysis, audio transcription, and chat responses
  • Twilio API: For WhatsApp message handling (receiving and sending)
  • CloudConvert API: For audio format conversion and Word document processing
  • HTTP Basic Auth: For accessing Twilio media files

External Services:

  • Twilio WhatsApp Business API: Message routing and media handling
  • OpenAI GPT-4/GPT-4o: Content analysis and conversation
  • OpenAI Whisper: Audio transcription
  • CloudConvert: File format conversion
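
The HTTP Basic Auth credential for Twilio media can be pictured as follows. This is a sketch that assumes the Twilio Account SID and Auth Token serve as username and password; `build_media_request` is a hypothetical helper, and in the workflow the HTTP Request node handles this through n8n's credential store.

```python
import base64
import urllib.request


def build_media_request(media_url: str, account_sid: str, auth_token: str) -> urllib.request.Request:
    """Build a GET request for a media URL with a Basic Auth header."""
    credentials = f"{account_sid}:{auth_token}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    request = urllib.request.Request(media_url)
    request.add_header("Authorization", f"Basic {token}")
    return request


# Placeholder values; the file could then be fetched with
# urllib.request.urlopen(request).read().
request = build_media_request("https://example.com/media/1", "ACxxxx", "secret")
```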

Environment Variables

No specific environment variables are configured in this workflow. All authentication is handled through n8n credential management.

Data Flow

Input:

  • WhatsApp messages containing:
    • File attachments (PDF, images, audio, Word documents)
    • Spreadsheet URLs (Google Sheets, CSV, Excel)
    • Text questions

Output:

  • WhatsApp messages containing:
    • Summaries of uploaded documents
    • Answers to questions based on document content
    • Analysis of images, spreadsheets, or other media
    • Transcriptions of audio messages

Data Transformations:

  1. Binary files → Text content
  2. Spreadsheet data → JSON → Formatted text
  3. Audio files → AAC format → Text transcription
  4. Images → Descriptive text analysis
  5. All content → AI-powered insights and responses
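
As a rough illustration of transformation 2 (spreadsheet data to JSON-like rows to formatted text), the sketch below handles the CSV case with the standard library. The function names and the "Row N: key: value" layout are assumptions; the workflow's own nodes may format the text differently.

```python
import csv
import io
from typing import Dict, List


def csv_to_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def rows_to_text(rows: List[Dict[str, str]]) -> str:
    """Render rows as plain text lines that can be placed in a prompt."""
    lines = []
    for index, row in enumerate(rows, start=1):
        fields = ", ".join(f"{key}: {value}" for key, value in row.items())
        lines.append(f"Row {index}: {fields}")
    return "\n".join(lines)


rows = csv_to_rows("name,amount\nAda,10\nLin,25\n")
print(rows_to_text(rows))
```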

Error Handling

The workflow includes several error handling mechanisms:

  • File Size Checks: Large audio files are processed through CloudConvert before transcription
  • Conversion Status Monitoring: Wait loops check CloudConvert job status before proceeding
  • Format Compatibility: Audio files are converted to AAC format for reliable transcription
  • Fallback Paths: Multiple switch nodes handle different processing outcomes
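
The size check and the Wait/Switch loop can be sketched as below. The status strings `finished` and `processing` come from the diagram; everything else is an assumption made for illustration, including the function names, the byte value used for the 25 MB threshold, the polling interval, and the attempt cap (the workflow as documented does not state a retry limit).

```python
import time
from typing import Callable

# Assumed byte value for the 25 MB threshold mentioned under Known Limitations.
SIZE_LIMIT_BYTES = 25 * 1024 * 1024


def needs_conversion(size_bytes: int, limit: int = SIZE_LIMIT_BYTES) -> bool:
    """Mirror the If node: large files take the CloudConvert path."""
    return size_bytes > limit


def wait_for_job(
    get_status: Callable[[], str],
    interval_seconds: float = 5.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll a job status until it is finished, like the Wait and Switch nodes."""
    for _ in range(max_attempts):
        status = get_status()
        if status == "finished":
            return status
        if status != "processing":
            # Any other status is treated as a failure in this sketch.
            raise RuntimeError(f"unexpected job status: {status}")
        sleep(interval_seconds)
    raise TimeoutError("job did not finish within the allowed attempts")
```

Passing `get_status` as a callable keeps the loop independent of any particular API client; a real caller would wrap its CloudConvert status request in that function.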

Known Limitations

  • Audio files larger than 25 MB take longer to process because they are converted through CloudConvert before transcription
  • Spreadsheet processing is limited to CSV and Excel formats
  • Image analysis depends on OpenAI's vision model capabilities
  • Conversation memory is limited to a 50-message context window
  • File download requires proper authentication with Twilio
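
The conversation memory limit behaves like a sliding window: once the window is full, the oldest entries drop out. The sketch below illustrates that idea with a bounded deque. It is not how the n8n memory node is implemented, and whether the node counts single messages or full exchanges is not stated here; the class name `WindowMemory` is hypothetical.

```python
from collections import deque
from typing import Dict, List


class WindowMemory:
    """Keep only the most recent messages, up to a fixed window size."""

    def __init__(self, window_size: int = 50) -> None:
        # A deque with maxlen discards the oldest item when a new one is added.
        self._messages = deque(maxlen=window_size)

    def add(self, role: str, content: str) -> None:
        self._messages.append({"role": role, "content": content})

    def history(self) -> List[Dict[str, str]]:
        return list(self._messages)


memory = WindowMemory(window_size=3)
for i in range(5):
    memory.add("user", f"msg {i}")
print([m["content"] for m in memory.history()])
```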

Setup Instructions

  1. Import the Workflow: Import the JSON into your n8n instance
  2. Configure Credentials:
    • Set up OpenAI API credentials with access to GPT-4, GPT-4o, and Whisper
    • Configure Twilio API credentials for WhatsApp Business
    • Add CloudConvert API credentials for file conversion
    • Set up HTTP Basic Auth for Twilio media access
  3. Webhook Configuration:
    • Note the webhook URL generated for the "WhatsApp message" node
    • Configure this URL in your Twilio WhatsApp webhook settings
  4. Test the Integration:
    • Send a test message to your WhatsApp Business number
    • Verify that files are processed and responses are received
  5. Customize Analysis Prompts:
    • Modify the system message in the "content analyze" node to match your specific use case
    • Adjust the memory window size if needed for longer conversations
  6. Monitor Performance:
    • Check CloudConvert usage limits for file conversion
    • Monitor OpenAI API usage for cost management
    • Verify Twilio message delivery rates