WhatsApp Multi-Model Document Processor

This workflow implements a WhatsApp bot that analyzes and responds to several types of content: spreadsheets, PDFs, images, Word documents, and voice messages. Users send files or ask questions via WhatsApp; the workflow processes the content with AI models and sends a response back to the user.

How It Works

1. Message Reception: The workflow starts when a WhatsApp message is received via webhook.
2. Content Analysis: The system analyzes the incoming message to determine the type of content (spreadsheet, PDF, image, voice, text, or Word document).
3. File Processing: Based on the content type, the workflow downloads and processes the file:
   - Spreadsheets: Downloads CSV/Excel files and converts them to structured JSON
   - PDFs: Extracts text content from PDF documents
   - Images: Uses OpenAI's vision model to analyze and describe image content
   - Voice Messages: Converts audio to text using speech-to-text transcription
   - Word Documents: Converts DOCX files to text format
   - Text Questions: Processes direct text queries
4. AI Analysis: All processed content is sent to an AI agent powered by GPT-4 for analysis and response generation.
5. Response Delivery: The AI-generated response is sent back to the user via WhatsApp.
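
The content analysis step can be illustrated with a minimal sketch. It assumes the message arrives as Twilio webhook parameters (`NumMedia`, `MediaContentType0`) and that routing is decided by MIME type; the function name `classify_message` and the MIME-type groupings are hypothetical, and the workflow's actual Code node may apply different rules.

```python
from typing import Mapping

# Assumed MIME-type groupings; the real Code node may use other rules.
SPREADSHEET_TYPES = {
    "text/csv",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}
WORD_TYPES = {
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}


def classify_message(payload: Mapping[str, str]) -> str:
    """Return one of: spreadsheet, pdf, image, voice, word, text."""
    num_media = int(payload.get("NumMedia", "0") or 0)
    if num_media == 0:
        return "text"  # no attachment, so treat the body as a text question

    # Drop MIME parameters such as "; codecs=opus" before comparing.
    mime = payload.get("MediaContentType0", "").split(";")[0].strip().lower()
    if mime in SPREADSHEET_TYPES:
        return "spreadsheet"
    if mime == "application/pdf":
        return "pdf"
    if mime in WORD_TYPES:
        return "word"
    if mime.startswith("image/"):
        return "image"
    if mime.startswith("audio/"):
        return "voice"
    return "text"  # unknown attachment type: fall back to the text route


if __name__ == "__main__":
    print(classify_message({"NumMedia": "1", "MediaContentType0": "audio/ogg"}))
```

The returned label corresponds to the branch names used by the Switch node in the diagram.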

Workflow Diagram

graph TD
    A[WhatsApp message] --> B[analyze type of the input]
    B --> C[Switch]
    C -->|spreadsheet| D[download sheet]
    C -->|pdf| E[download file image pdf]
    C -->|image| E
    C -->|voice| F[get audio from twilio1]
    C -->|text| G[content analyze]
    C -->|word| E

    D --> H[convert to j son]
    H --> I[all in row]
    I --> J[json to text]
    J --> G

    E --> K[path]
    K -->|pdf| L[pdf to text]
    K -->|image| M[Analyze image]
    K -->|word| N[create Job]

    L --> G
    M --> O[content to text]
    O --> G

    N --> P[Wait1]
    P --> Q[convert Docs to text]
    Q --> R[Switch2]
    R -->|finished| S[Download text]
    R -->|processing| P
    S --> G

    F --> T[If]
    T -->|large file| U[download voice file]
    T -->|small file| V[Transcribe a recording]
    U --> W[import audio file to cloud convert]
    W --> X[Wait]
    X --> Y[convert audio file]
    Y --> Z[Switch1]
    Z -->|finished| AA[Download audio file]
    Z -->|processing| X
    AA --> V
    V --> BB[Edit Fields]
    BB --> G

    G --> CC[send to WhatsApp]

    DD[OpenAI Chat Model] -.-> G
    EE[Simple Memory] -.-> G
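
As an illustration of the spreadsheet branch in the diagram (download sheet, convert to JSON, gather all rows, convert to text), the following sketch turns CSV text into a list of row objects and then into a single string for the agent. It covers CSV input only; the helper names `csv_to_rows` and `rows_to_text` are hypothetical, and the workflow itself uses n8n nodes rather than this code.

```python
import csv
import io
import json


def csv_to_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [dict(row) for row in reader]


def rows_to_text(rows: list[dict[str, str]]) -> str:
    """Serialize all rows into one JSON string that can be placed in a prompt."""
    return json.dumps(rows, ensure_ascii=False)


if __name__ == "__main__":
    sample = "name,qty\napple,3\npear,5\n"
    print(rows_to_text(csv_to_rows(sample)))
```

Passing the whole sheet as one string keeps the agent input simple, at the cost of long prompts for large spreadsheets.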

Trigger

Webhook: The workflow is triggered by incoming WhatsApp messages sent to the endpoint /whatsapp-multimedia via HTTP POST requests.
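
For orientation, Twilio delivers incoming-message webhooks as form-encoded POST bodies. The sketch below shows how such a body decodes into the parameters the later steps rely on. The n8n Webhook node performs this parsing itself; `parse_twilio_form` is a hypothetical helper used only for illustration, and the sample phone number is made up.

```python
from urllib.parse import parse_qs


def parse_twilio_form(body: bytes) -> dict[str, str]:
    """Decode an application/x-www-form-urlencoded body into a flat dict."""
    parsed = parse_qs(body.decode("utf-8"), keep_blank_values=True)
    # Each parameter is expected once, so keep the first value only.
    return {key: values[0] for key, values in parsed.items()}


if __name__ == "__main__":
    raw = b"From=whatsapp%3A%2B15551234567&Body=Hello+there&NumMedia=0"
    print(parse_twilio_form(raw))
```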

Nodes Used

| Node Type | Purpose |
| --- | --- |
| Webhook | Receives incoming WhatsApp messages |
| Code | Analyzes message content type and formats data |
| Switch | Routes messages based on content type |
| HTTP Request | Downloads files from URLs and external APIs |
| Extract From File | Converts files to text/JSON format |
| OpenAI (Vision) | Analyzes image content using GPT-4o |
| OpenAI (Audio) | Transcribes voice messages to text |
| LangChain Agent | Processes content with AI analysis |
| OpenAI Chat Model | Provides GPT-4 language model capabilities |
| Memory Buffer | Maintains conversation context |
| Twilio | Sends responses back to WhatsApp |
| Wait | Handles processing delays for file conversions |
| If | Conditional logic for file size handling |

External Services & Credentials Required

Required Credentials:

OpenAI API: For image analysis, voice transcription, and chat responses
Twilio API: For WhatsApp message sending and receiving
HTTP Basic Auth (Twilio): For downloading media files from Twilio
CloudConvert API: For file format conversions (audio and documents)
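
The HTTP Basic Auth credential is what allows the workflow to fetch media from Twilio. As a rough sketch of what that amounts to, the code below builds an authenticated request using the Twilio Account SID as username and the Auth Token as password. The function name, the placeholder URL, and the placeholder credentials are assumptions; in the workflow this is handled by the HTTP Request node and its stored credential.

```python
import base64
import urllib.request


def build_media_request(media_url: str, account_sid: str, auth_token: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header."""
    raw = f"{account_sid}:{auth_token}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    request = urllib.request.Request(media_url)
    request.add_header("Authorization", f"Basic {token}")
    return request


if __name__ == "__main__":
    # Placeholder values; a real call would use the MediaUrl0 value from the webhook.
    req = build_media_request("https://example.com/media/ME123", "ACxxxx", "secret")
    print(req.get_header("Authorization"))
    # urllib.request.urlopen(req) would perform the actual download.
```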

External Services:

OpenAI: GPT-4 and GPT-4o models for content analysis
Twilio: WhatsApp Business API integration
CloudConvert: File conversion service for audio and document processing

Environment Variables

No specific environment variables are defined in this workflow. All configuration is handled through n8n credentials.

Data Flow

Input:

WhatsApp messages containing:
- File attachments (PDF, images, spreadsheets, Word docs, audio)
- Text questions
- URLs to documents

Processing:

File download and format conversion
Content extraction and text processing
AI-powered analysis and response generation

Output:

Intelligent text responses sent back via WhatsApp
Summaries, analysis, and answers based on the input content

Error Handling

The workflow includes several error handling mechanisms:

File Size Checking: Large audio files are processed through CloudConvert before transcription
Processing Status Monitoring: Wait nodes and switches monitor conversion job status
Retry Logic: While a conversion job still reports "processing", the workflow loops back to the Wait node and checks the status again until the job is finished
Fallback Paths: Different processing routes for various file types and sizes
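
The size check and the status polling described above can be sketched as follows. The 25 MB threshold comes from the Known Limitations section; the route labels, the function names, and the `max_attempts` cap are assumptions added for illustration. The diagram shows the loop repeating for as long as the job reports "processing", with no explicit upper bound.

```python
import time
from typing import Callable

# 25 MB threshold; counting in binary megabytes is an assumption.
MAX_DIRECT_BYTES = 25 * 1024 * 1024


def audio_route(size_bytes: int) -> str:
    """Pick the branch for an audio file based on its size."""
    if size_bytes > MAX_DIRECT_BYTES:
        return "convert_first"  # large file: convert via CloudConvert first
    return "transcribe_directly"  # small file: send straight to transcription


def poll_until_finished(
    get_status: Callable[[], str],
    wait_seconds: float = 5.0,
    max_attempts: int = 60,  # hypothetical safety cap, not part of the workflow
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait, check the job status, and repeat while it is still processing."""
    for _ in range(max_attempts):
        status = get_status()
        if status != "processing":
            return status
        sleep(wait_seconds)
    raise TimeoutError(f"job still processing after {max_attempts} checks")


if __name__ == "__main__":
    statuses = iter(["processing", "processing", "finished"])
    print(poll_until_finished(lambda: next(statuses), wait_seconds=0.0))
```

A cap like `max_attempts` prevents an execution from waiting indefinitely if a conversion job never completes, which may be worth adding to the workflow itself.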

Known Limitations

Audio files larger than 25 MB require additional processing time through CloudConvert
The workflow depends on external services (OpenAI, Twilio, CloudConvert), which may have rate limits
Memory buffer is limited to 50 context messages
File processing times vary based on file size and external service availability
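
To show what the memory limit means in practice, here is a minimal sliding-window sketch in which only the 50 most recent entries are kept and older ones are dropped. The `WindowMemory` class is hypothetical, and whether the workflow's memory node counts individual messages or whole exchanges is not stated here.

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent messages; older ones fall out of the window."""

    def __init__(self, max_messages: int = 50) -> None:
        self._messages: deque[tuple[str, str]] = deque(maxlen=max_messages)

    def add(self, role: str, content: str) -> None:
        # Appending to a full deque silently discards the oldest entry.
        self._messages.append((role, content))

    def history(self) -> list[tuple[str, str]]:
        return list(self._messages)


if __name__ == "__main__":
    memory = WindowMemory()
    for i in range(60):
        memory.add("user", f"message {i}")
    print(len(memory.history()), memory.history()[0])
```

In a long conversation, anything sent before the window (for example, an early file upload) is no longer available to the agent.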

Setup Instructions

1. Import the Workflow: Import the JSON configuration into your n8n instance.
2. Configure Credentials:
   - Set up OpenAI API credentials with access to GPT-4 and GPT-4o models
   - Configure Twilio API credentials for WhatsApp integration
   - Add HTTP Basic Auth credentials for Twilio media downloads
   - Set up CloudConvert API credentials for file conversions
3. Webhook Configuration:
   - Note the webhook URL generated for the "WhatsApp message" node
   - Configure this URL in your Twilio WhatsApp webhook settings
4. Test the Setup:
   - Send a test message to your WhatsApp number
   - Verify that files are properly downloaded and processed
   - Check that AI responses are generated and sent back
5. Activate the Workflow: Enable the workflow to start processing incoming messages.
6. Monitor Performance: Check execution logs to ensure all external service integrations are working correctly.