WhatsApp Multi-Model Document Processing

An n8n workflow that processes files sent over WhatsApp (PDFs, images, spreadsheets, Word documents, and audio), extracts their content using AI and file-conversion services, and replies to the sender with an analysis or answer.

Purpose

This workflow serves as a universal document processing assistant accessible through WhatsApp. Users can send various types of files or ask questions, and the system will automatically detect the content type, extract relevant information, and provide intelligent analysis or answers. It's particularly useful for teams or individuals who need quick document analysis capabilities without switching between multiple applications.

How It Works

1. Message Reception: The workflow receives WhatsApp messages through a webhook endpoint.
2. Content Analysis: Each incoming message is analyzed to determine its content type (spreadsheet link, PDF, image, audio, Word document, or text question).
3. File Processing: Files are downloaded and processed according to content type:
   - Spreadsheets: Downloaded and converted to JSON for analysis
   - PDFs: Text content is extracted directly
   - Images: Analyzed using OpenAI's vision capabilities
   - Audio: Converted to a compatible format where needed and transcribed using OpenAI's Whisper
   - Word Documents: Converted to text using the CloudConvert service
   - Text Questions: Processed directly
4. AI Analysis: All extracted content is analyzed by GPT-4 with conversation memory.
5. Response: The AI-generated analysis or answer is sent back to the original WhatsApp sender.
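The content-analysis step can be sketched as a small classifier. This is only an illustration, not the code in the workflow's Code node: it assumes the webhook payload uses Twilio's standard field names (`NumMedia`, `MediaContentType0`, `Body`), and the helper name `classify_message` and the spreadsheet URL hints are hypothetical.

```python
# Hypothetical classifier mirroring the six branches described above.
# Assumption: payload keys follow Twilio's incoming-message webhook fields.

WORD_MIME_TYPES = {
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}

# Assumed hints for recognising a spreadsheet link in the message text.
SPREADSHEET_HINTS = ("docs.google.com/spreadsheets", ".csv", ".xlsx", ".xls")


def classify_message(payload: dict) -> str:
    """Return one of: spreadsheet, pdf, image, voice, word, question."""
    num_media = int(payload.get("NumMedia", "0") or 0)
    if num_media > 0:
        mime = payload.get("MediaContentType0", "").lower()
        if mime == "application/pdf":
            return "pdf"
        if mime.startswith("image/"):
            return "image"
        if mime.startswith("audio/"):
            return "voice"
        if mime in WORD_MIME_TYPES:
            return "word"

    # No recognised attachment: look for a spreadsheet link in the text.
    body = payload.get("Body", "").strip().lower()
    is_link = body.startswith(("http://", "https://"))
    if is_link and any(hint in body for hint in SPREADSHEET_HINTS):
        return "spreadsheet"

    # Anything else is treated as a plain text question.
    return "question"
```

Falling back to `question` keeps unknown attachment types from stalling the flow, although the real workflow may handle them differently.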

Workflow Diagram

```mermaid
graph TD
    A[WhatsApp message] --> B[analyze type of the input]
    B --> C[Switch]

    C -->|spreadsheet| D[download sheet]
    C -->|pdf| E[download file image pdf]
    C -->|image| E
    C -->|voice| F[get audio from twilio1]
    C -->|question| G[content analyze]
    C -->|word| E

    D --> H[convert to j son]
    H --> I[all in row]
    I --> J[json to text]
    J --> G

    E --> K[path]
    K -->|pdf| L[pdf to text]
    K -->|image| M[Analyze image]
    K -->|word| N[create Job]

    L --> G
    M --> O[content to text]
    O --> G

    N --> P[Wait1]
    P --> Q[convert Docs to text]
    Q --> R[Switch2]
    R -->|finished| S[Download text]
    R -->|processing| P
    S --> G

    F --> T[If]
    T -->|large file| U[download voice file]
    T -->|small file| V[Transcribe a recording]
    U --> W[import audio file to cloud convert]
    W --> X[Wait]
    X --> Y[convert audio file]
    Y --> Z[Switch1]
    Z -->|finished| AA[Download audio file]
    Z -->|processing| X
    AA --> V
    V --> BB[Edit Fields]
    BB --> G

    G --> CC[send to WhatsApp]

    DD[OpenAI Chat Model] -.-> G
    EE[Simple Memory] -.-> G
```
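The top-level routing in the diagram can be read as a lookup table plus one size check on the audio branch. The sketch below copies the node names from the diagram edges. The function names are hypothetical, and the 25 MB threshold is an assumption taken from the Known Limitations section rather than from the `If` node's actual configuration.

```python
# Branch label -> first node on that branch, copied from the diagram edges.
FIRST_NODE = {
    "spreadsheet": "download sheet",
    "pdf": "download file image pdf",
    "image": "download file image pdf",
    "word": "download file image pdf",
    "voice": "get audio from twilio1",
    "question": "content analyze",
}

# Assumption: the "If" node compares the file size against 25 MB.
LARGE_AUDIO_BYTES = 25 * 1024 * 1024


def first_node(content_type: str) -> str:
    """Return the node that the top-level Switch hands the message to."""
    return FIRST_NODE[content_type]


def audio_next_node(size_bytes: int) -> str:
    """Large files go through CloudConvert first; small ones are transcribed directly."""
    if size_bytes > LARGE_AUDIO_BYTES:
        return "download voice file"
    return "Transcribe a recording"
```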

Trigger

Webhook: POST endpoint at /whatsapp-multimedia that receives WhatsApp messages from Twilio's messaging service.
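Twilio delivers incoming messages as a form-encoded POST body. The n8n Webhook node parses this automatically, so the following is only a sketch of what arrives at `/whatsapp-multimedia`. The helper name `parse_twilio_form` and the sample values are made up for illustration.

```python
from urllib.parse import parse_qs


def parse_twilio_form(raw_body: str) -> dict:
    """Decode an application/x-www-form-urlencoded body into a flat dict.

    Each field is expected once, so only the first value per key is kept.
    """
    parsed = parse_qs(raw_body, keep_blank_values=True)
    return {key: values[0] for key, values in parsed.items()}


# Illustrative request body (sample phone number, not real data).
sample = "From=whatsapp%3A%2B15550100&Body=Hello+there&NumMedia=0"
message = parse_twilio_form(sample)
```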

Nodes Used

Node Type	Purpose
Webhook	Receives incoming WhatsApp messages
Code	Analyzes message content to determine file type
Switch	Routes processing based on content type
HTTP Request	Downloads files from URLs and interacts with external APIs
Extract From File	Converts spreadsheets to JSON and PDFs to text
OpenAI (Vision)	Analyzes image content
OpenAI (Audio)	Transcribes audio files
LangChain Agent	Provides AI-powered content analysis with memory
OpenAI Chat Model	Powers the conversational AI responses
Buffer Window Memory	Maintains conversation context
If	Handles conditional logic for file size checks
Wait	Manages timing for file conversion processes
Set	Formats data for downstream processing
Twilio	Sends responses back to WhatsApp

External Services & Credentials Required

Required Credentials:

OpenAI API: For image analysis, audio transcription, and chat responses
Twilio API: For WhatsApp message handling (receiving and sending)
CloudConvert API: For audio format conversion and Word document processing
HTTP Basic Auth: For accessing Twilio media files
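For the HTTP Basic Auth credential, the media download amounts to sending an `Authorization` header with the request. A minimal sketch, assuming the username is the Twilio Account SID and the password is the Auth Token; in n8n this is handled by the credential itself, and `basic_auth_header` is a hypothetical helper.

```python
import base64


def basic_auth_header(account_sid: str, auth_token: str) -> dict:
    """Build the Authorization header for HTTP Basic authentication."""
    raw = f"{account_sid}:{auth_token}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return {"Authorization": f"Basic {token}"}


# Placeholder values; real credentials should come from n8n's credential store.
headers = basic_auth_header("user", "pass")
```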

External Services:

Twilio WhatsApp Business API: Message routing and media handling
OpenAI GPT-4/GPT-4o: Content analysis and conversation
OpenAI Whisper: Audio transcription
CloudConvert: File format conversion

Environment Variables

No specific environment variables are configured in this workflow. All authentication is handled through n8n credential management.

Data Flow

Input:

WhatsApp messages containing:
- File attachments (PDF, images, audio, Word documents)
- Spreadsheet URLs (Google Sheets, CSV, Excel)
- Text questions

Output:

WhatsApp messages containing:
- Summaries of uploaded documents
- Answers to questions based on document content
- Analysis of images, spreadsheets, or other media
- Transcriptions of audio messages

Data Transformations:

Binary files → Text content
Spreadsheet data → JSON → Formatted text
Audio files → AAC format → Text transcription
Images → Descriptive text analysis
All content → AI-powered insights and responses
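The spreadsheet path (data → JSON → formatted text) can be illustrated with a small CSV example. This is a sketch of the idea only: the workflow uses n8n's Extract From File node rather than this code, and the function names and the `Row N:` text layout are assumptions.

```python
import csv
import io


def csv_to_rows(csv_text: str) -> list:
    """Parse CSV text into a list of JSON-like dicts, one per data row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(csv_text))]


def rows_to_text(rows: list) -> str:
    """Flatten the rows into plain text that can be passed to the AI step."""
    lines = []
    for index, row in enumerate(rows, start=1):
        fields = ", ".join(f"{key}: {value}" for key, value in row.items())
        lines.append(f"Row {index}: {fields}")
    return "\n".join(lines)


rows = csv_to_rows("name,score\nAda,90\nLin,85\n")
text = rows_to_text(rows)
```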

Error Handling

The workflow includes several error handling mechanisms:

File Size Checks: Large audio files are processed through CloudConvert before transcription
Conversion Status Monitoring: Wait loops check CloudConvert job status before proceeding
Format Compatibility: Audio files are converted to AAC format for reliable transcription
Fallback Paths: Multiple switch nodes handle different processing outcomes
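The conversion status monitoring described above follows a wait-then-check pattern. The sketch below shows that pattern in isolation. `fetch_status` stands in for the HTTP request that reads the CloudConvert job status, the `error` status handling is an assumption, and the attempt limit is an addition that the diagram's loop does not show.

```python
import time
from typing import Callable


def wait_for_job(
    fetch_status: Callable[[], str],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports "finished", waiting between checks."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status == "finished":
            return status
        if status == "error":  # assumed failure status
            raise RuntimeError("conversion job failed")
        # Any other status (for example "processing") means: wait and retry.
        sleep(interval_seconds)
    raise TimeoutError(f"job not finished after {max_attempts} checks")
```

Bounding the number of checks avoids an execution that loops forever if a job never completes. Whether the workflow itself needs such a limit depends on how its Wait nodes and execution timeouts are configured.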

Known Limitations

Audio files larger than 25 MB require additional processing time through CloudConvert
Spreadsheet processing is limited to CSV and Excel formats
Image analysis depends on OpenAI's vision model capabilities
Conversation memory is limited to a 50-message context window
File download requires proper authentication with Twilio
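The memory limit behaves like a sliding window: once the window is full, the oldest entries are dropped as new ones arrive. A simplified sketch of that behaviour follows. The class `WindowMemory` is hypothetical, and n8n's memory node may count entries differently (for example per exchange rather than per message).

```python
from collections import deque


class WindowMemory:
    """Keep only the most recent messages, discarding the oldest first."""

    def __init__(self, max_messages: int = 50):
        # deque with maxlen drops items from the left when it overflows.
        self._messages = deque(maxlen=max_messages)

    def add(self, role: str, text: str) -> None:
        self._messages.append((role, text))

    def history(self) -> list:
        return list(self._messages)


# Small window to make the eviction visible.
memory = WindowMemory(max_messages=3)
for number in range(5):
    memory.add("user", f"message {number}")
```

With a window of 3, only messages 2, 3, and 4 remain, which is why details from early in a long conversation can stop influencing the answers.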

Setup Instructions

1. Import the Workflow: Import the JSON into your n8n instance.
2. Configure Credentials:
   - Set up OpenAI API credentials with access to GPT-4, GPT-4o, and Whisper
   - Configure Twilio API credentials for WhatsApp Business
   - Add CloudConvert API credentials for file conversion
   - Set up HTTP Basic Auth for Twilio media access
3. Webhook Configuration:
   - Note the webhook URL generated for the "WhatsApp message" node
   - Configure this URL in your Twilio WhatsApp webhook settings
4. Test the Integration:
   - Send a test message to your WhatsApp Business number
   - Verify that files are processed and responses are received
5. Customize Analysis Prompts:
   - Modify the system message in the "content analyze" node to match your specific use case
   - Adjust the memory window size if needed for longer conversations
6. Monitor Performance:
   - Check CloudConvert usage limits for file conversion
   - Monitor OpenAI API usage for cost management
   - Verify Twilio message delivery rates
- Verify Twilio message delivery rates