WhatsApp Multi-Model Document Processor
This workflow creates an intelligent WhatsApp bot that can analyze and respond to various types of content including spreadsheets, PDFs, images, Word documents, and voice messages. Users can send files or ask questions via WhatsApp, and the system will process the content using AI models and provide intelligent responses.
Purpose
No business context has been provided yet. Add a context.md file to enrich this documentation.
How It Works
- Message Reception: The workflow starts when a WhatsApp message is received via webhook
- Content Analysis: The system analyzes the incoming message to determine the type of content (spreadsheet, PDF, image, voice, text, or Word document)
- File Processing: Based on the content type, the workflow downloads and processes the file:
    - Spreadsheets: Downloads CSV/Excel files and converts them to structured JSON
    - PDFs: Extracts text content from PDF documents
    - Images: Uses OpenAI's vision model to analyze and describe image content
    - Voice Messages: Converts audio to text using speech-to-text transcription
    - Word Documents: Converts DOCX files to text format
    - Text Questions: Processes direct text queries
- AI Analysis: All processed content is sent to an AI agent powered by GPT-4 for intelligent analysis and response generation
- Response Delivery: The AI-generated response is sent back to the user via WhatsApp
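The content analysis step can be illustrated with a minimal Python sketch. The logic of the workflow's Code node is not shown in this documentation, so the mapping below is an assumption: it relies on the `NumMedia` and `MediaContentType0` fields that Twilio includes in incoming-message webhooks, and `classify_message` and `MIME_ROUTES` are hypothetical names. It does not cover messages that contain URLs to documents.

```python
from typing import Mapping

# Hypothetical mapping from MIME type to the route labels used in this document.
MIME_ROUTES = {
    "application/pdf": "pdf",
    "text/csv": "spreadsheet",
    "application/vnd.ms-excel": "spreadsheet",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet": "spreadsheet",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "word",
}


def classify_message(payload: Mapping[str, str]) -> str:
    """Return one of: spreadsheet, pdf, image, voice, word, text."""
    # Twilio reports the attachment count in NumMedia; no media means plain text.
    if int(payload.get("NumMedia", "0") or 0) == 0:
        return "text"
    # Drop any parameters such as "; codecs=opus" before matching.
    mime = payload.get("MediaContentType0", "").split(";")[0].strip().lower()
    if mime in MIME_ROUTES:
        return MIME_ROUTES[mime]
    if mime.startswith("image/"):
        return "image"
    if mime.startswith("audio/"):
        return "voice"
    # Unknown attachments fall back to the text route (an assumption of this sketch).
    return "text"
```

The returned label corresponds to the branch names that the Switch node routes on.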
Workflow Diagram
graph TD
A[WhatsApp message] --> B[analyze type of the input]
B --> C[Switch]
C -->|spreadsheet| D[download sheet]
C -->|pdf| E[download file image pdf]
C -->|image| E
C -->|voice| F[get audio from twilio1]
C -->|text| G[content analyze]
C -->|word| E
D --> H[convert to j son]
H --> I[all in row]
I --> J[json to text]
J --> G
E --> K[path]
K -->|pdf| L[pdf to text]
K -->|image| M[Analyze image]
K -->|word| N[create Job]
L --> G
M --> O[content to text]
O --> G
N --> P[Wait1]
P --> Q[convert Docs to text]
Q --> R[Switch2]
R -->|finished| S[Download text]
R -->|processing| P
S --> G
F --> T[If]
T -->|large file| U[download voice file]
T -->|small file| V[Transcribe a recording]
U --> W[import audio file to cloud convert]
W --> X[Wait]
X --> Y[convert audio file]
Y --> Z[Switch1]
Z -->|finished| AA[Download audio file]
Z -->|processing| X
AA --> V
V --> BB[Edit Fields]
BB --> G
G --> CC[send to WhatsApp]
DD[OpenAI Chat Model] -.-> G
EE[Simple Memory] -.-> G
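The spreadsheet branch of the diagram (download sheet, convert to JSON, collect all rows, JSON to text) can be approximated in a few lines of Python. This is only a sketch of the idea for CSV input: the workflow itself uses n8n nodes, and the function names `csv_to_rows` and `rows_to_text` are hypothetical.

```python
import csv
import io
import json
from typing import Dict, List


def csv_to_rows(csv_text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row, keyed by the header line."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def rows_to_text(rows: List[Dict[str, str]]) -> str:
    """Collapse all rows into a single JSON string for the agent prompt."""
    return json.dumps(rows, ensure_ascii=False)
```

For example, `rows_to_text(csv_to_rows("name,qty\napple,3\n"))` yields a one-element JSON array that can be passed to the agent as plain text.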
Trigger
Webhook: The workflow is triggered by incoming WhatsApp messages sent to the endpoint /whatsapp-multimedia via HTTP POST requests.
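To exercise the trigger without a phone, a request shaped like a Twilio incoming-message webhook can be built by hand. The sketch below assumes a form-encoded body with `From`, `Body`, and `NumMedia` fields, as Twilio sends them. The host in `WEBHOOK_URL` is hypothetical, and `build_test_request` is an illustrative helper, not part of the workflow.

```python
from urllib.parse import parse_qs, urlencode
from urllib.request import Request

# Hypothetical host; replace with the webhook URL n8n shows for the node.
WEBHOOK_URL = "https://n8n.example.com/webhook/whatsapp-multimedia"


def build_test_request(body: str, sender: str) -> Request:
    """Build (but do not send) a POST that mimics a text-only WhatsApp message."""
    fields = {"From": sender, "Body": body, "NumMedia": "0"}
    data = urlencode(fields).encode("utf-8")
    return Request(
        WEBHOOK_URL,
        data=data,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )
```

Passing the returned object to `urllib.request.urlopen` sends the request. Only do this against an instance you control.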
Nodes Used
| Node Type | Purpose |
|---|---|
| Webhook | Receives incoming WhatsApp messages |
| Code | Analyzes message content type and formats data |
| Switch | Routes messages based on content type |
| HTTP Request | Downloads files from URLs and external APIs |
| Extract From File | Converts files to text/JSON format |
| OpenAI (Vision) | Analyzes image content using GPT-4o |
| OpenAI (Audio) | Transcribes voice messages to text |
| LangChain Agent | Processes content with AI analysis |
| OpenAI Chat Model | Provides GPT-4 language model capabilities |
| Memory Buffer | Maintains conversation context |
| Twilio | Sends responses back to WhatsApp |
| Wait | Handles processing delays for file conversions |
| If | Conditional logic for file size handling |
External Services & Credentials Required
Required Credentials:
- OpenAI API: For image analysis, voice transcription, and chat responses
- Twilio API: For WhatsApp message sending and receiving
- HTTP Basic Auth (Twilio): For downloading media files from Twilio
- CloudConvert API: For file format conversions (audio and documents)
External Services:
- OpenAI: GPT-4 and GPT-4o models for content analysis
- Twilio: WhatsApp Business API integration
- CloudConvert: File conversion service for audio and document processing
Environment Variables
No specific environment variables are defined in this workflow. All configuration is handled through n8n credentials.
Data Flow
Input:
- WhatsApp messages containing:
    - File attachments (PDF, images, spreadsheets, Word docs, audio)
    - Text questions
    - URLs to documents
Processing:
- File download and format conversion
- Content extraction and text processing
- AI-powered analysis and response generation
Output:
- Intelligent text responses sent back via WhatsApp
- Summaries, analysis, and answers based on the input content
Error Handling
The workflow includes several error handling mechanisms:
- File Size Checking: Large audio files are processed through CloudConvert before transcription
- Processing Status Monitoring: Wait nodes and switches monitor conversion job status
- Retry Logic: Processing jobs are checked repeatedly until completion
- Fallback Paths: Different processing routes for various file types and sizes
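The file size check and the Wait/Switch polling loop can be sketched as follows. This is a simplified model, not the workflow's actual node configuration: `get_status` stands in for whatever call reports the conversion job's state, the `"finished"` value mirrors the branch label in the diagram, and `max_checks` is an added safety bound that the workflow as described does not have. Interpreting 25MB as 25 * 1024 * 1024 bytes is also an assumption.

```python
import time
from typing import Callable

# Threshold taken from the 25MB figure in this document (byte interpretation assumed).
MAX_DIRECT_BYTES = 25 * 1024 * 1024


def needs_conversion(size_bytes: int) -> bool:
    """Large audio goes through conversion before transcription."""
    return size_bytes > MAX_DIRECT_BYTES


def wait_for_job(
    get_status: Callable[[], str],
    delay_s: float = 5.0,
    max_checks: int = 60,
) -> bool:
    """Poll until the job reports 'finished'; give up after max_checks attempts."""
    for _ in range(max_checks):
        if get_status() == "finished":
            return True
        # Any other status is treated as still processing, so wait and retry.
        time.sleep(delay_s)
    return False
```

Bounding the loop avoids an execution that waits forever if a conversion job never completes.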
Known Limitations
- Audio files larger than 25MB require additional processing time through CloudConvert
- The workflow depends on external services (OpenAI, Twilio, CloudConvert), each of which may impose rate limits
- Memory buffer is limited to 50 context messages
- File processing times vary based on file size and external service availability
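The effect of the 50-message memory limit can be pictured as a sliding window: once the window is full, each new message pushes out the oldest one. The sketch below is an illustration only, not how n8n implements its memory node. `WindowMemory` is a hypothetical class, and keying the history by sender number is an assumption.

```python
from collections import deque
from typing import Deque, Dict, List, Tuple


class WindowMemory:
    """Keep only the most recent messages per conversation."""

    def __init__(self, max_messages: int = 50) -> None:
        self._max = max_messages
        self._sessions: Dict[str, Deque[Tuple[str, str]]] = {}

    def add(self, session_id: str, role: str, text: str) -> None:
        # deque(maxlen=...) silently drops the oldest entry when full.
        window = self._sessions.setdefault(session_id, deque(maxlen=self._max))
        window.append((role, text))

    def history(self, session_id: str) -> List[Tuple[str, str]]:
        return list(self._sessions.get(session_id, ()))
```

In practice this means the agent cannot refer back to anything older than the last 50 messages in a conversation.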
Related Workflows
No related workflows specified in the current context.
Setup Instructions
- Import the Workflow: Import the JSON configuration into your n8n instance
- Configure Credentials:
    - Set up OpenAI API credentials with access to GPT-4 and GPT-4o models
    - Configure Twilio API credentials for WhatsApp integration
    - Add HTTP Basic Auth credentials for Twilio media downloads
    - Set up CloudConvert API credentials for file conversions
- Webhook Configuration:
    - Note the webhook URL generated for the "WhatsApp message" node
    - Configure this URL in your Twilio WhatsApp webhook settings
- Test the Setup:
    - Send a test message to your WhatsApp number
    - Verify that files are properly downloaded and processed
    - Check that AI responses are generated and sent back
- Activate the Workflow: Enable the workflow to start processing incoming messages
- Monitor Performance: Check execution logs to ensure all external service integrations are working correctly