WhatsApp Multi Model
A comprehensive AI-powered WhatsApp bot that accepts multiple file formats (spreadsheets, PDFs, images, audio, Word documents) and enables users to ask questions about their content through natural conversation with memory persistence.
Purpose
This workflow serves as an intelligent document analysis assistant accessible through WhatsApp. Users can upload any supported file type and engage in conversational Q&A about the content, with the AI maintaining context across the conversation session.
How It Works
- Message Reception: WhatsApp messages are received via Twilio webhook
- Content Analysis: The system analyzes incoming messages to determine if they contain files, links, or text questions
- File Type Detection: Identifies the type of content (spreadsheet, PDF, image, audio, Word document, or text question)
- File Processing: Downloads and processes files based on their type:
    - Spreadsheets: Converts to JSON then text format for analysis
    - PDFs: Extracts text content directly
    - Images: Uses OpenAI's vision model to analyze and describe content
    - Audio: Converts format if needed, then transcribes using OpenAI Whisper
    - Word Documents: Converts to CSV format via CloudConvert
- AI Analysis: Processes extracted content through an AI agent with conversation memory
- Response Generation: Generates contextual responses based on file content and user questions
- Reply Delivery: Sends responses back to the user via WhatsApp
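The content-analysis and type-detection steps above can be sketched as a small routing function. This is a minimal illustration, not the workflow's actual Code node: the function name `classify_message` and the MIME-type sets are assumptions, while `NumMedia`, `MediaContentType0`, and `Body` are standard fields of Twilio's incoming-message webhook. The returned labels mirror the Switch branches in the diagram.

```python
from urllib.parse import urlparse

# Hypothetical lookup sets; the real workflow may recognize other types.
WORD_TYPES = {
    "application/msword",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
}
SPREADSHEET_TYPES = {
    "text/csv",
    "application/vnd.ms-excel",
    "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}
SPREADSHEET_EXTENSIONS = (".csv", ".xlsx", ".xls")


def _is_spreadsheet_link(token: str) -> bool:
    """True if the token looks like a Google Sheets, CSV, or XLSX URL."""
    try:
        parsed = urlparse(token)
    except ValueError:
        return False
    if parsed.scheme not in ("http", "https"):
        return False
    if parsed.netloc == "docs.google.com" and parsed.path.startswith("/spreadsheets/"):
        return True
    return parsed.path.lower().endswith(SPREADSHEET_EXTENSIONS)


def classify_message(payload: dict) -> str:
    """Return one of: spreadsheet, pdf, image, voice, word, question."""
    num_media = int(payload.get("NumMedia", "0") or 0)
    if num_media > 0:
        content_type = payload.get("MediaContentType0", "").lower()
        if content_type == "application/pdf":
            return "pdf"
        if content_type.startswith("image/"):
            return "image"
        if content_type.startswith("audio/"):
            return "voice"
        if content_type in WORD_TYPES:
            return "word"
        if content_type in SPREADSHEET_TYPES:
            return "spreadsheet"
    # No recognized attachment: look for a spreadsheet link in the text.
    body = payload.get("Body", "")
    if any(_is_spreadsheet_link(token) for token in body.split()):
        return "spreadsheet"
    # Anything else is treated as a question about earlier content.
    return "question"
```

Falling back to `question` for unrecognized input keeps the agent reachable even when the attachment type is unknown; whether the real workflow does the same is not stated in this document.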
Workflow Diagram
graph TD
A[WhatsApp message] --> B[analyze type of the input]
B --> C[Switch]
C -->|spreadsheet| D[download sheet]
C -->|pdf| E[download file image pdf]
C -->|image| E
C -->|voice| F[get audio from twilio1]
C -->|question| G[content analyze]
C -->|word| E
D --> H[convert to j son]
H --> I[all in row]
I --> J[json to text]
J --> G
E --> K[path]
K -->|pdf| L[pdf to text]
K -->|image| M[Analyze image]
K -->|word| N[create Job]
L --> G
M --> O[content to text]
O --> G
N --> P[Wait1]
P --> Q[convert Docs to text]
Q --> R[Switch2]
R -->|finished| S[Download text]
R -->|processing| P
F --> T[If]
T -->|large file| U[download voice file]
T -->|small file| V[Transcribe a recording]
U --> W[import audio file to cloud convert]
W --> X[Wait]
X --> Y[convert audio file]
Y --> Z[Switch1]
Z -->|finished| AA[Download audio file]
Z -->|processing| X
AA --> V
V --> BB[Edit Fields]
BB --> G
G --> CC[send to WhatsApp]
DD[OpenAI Chat Model] -.->|ai_languageModel| G
EE[Simple Memory] -.->|ai_memory| G
Trigger
- Type: Webhook
- Method: POST
- Path: /webhook/whatsapp-multimedia
- Source: Twilio WhatsApp integration
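Twilio delivers incoming messages as a form-encoded POST body. The sketch below shows what that payload looks like once decoded. The helper name `parse_twilio_form` and the sample values are illustrative assumptions; n8n's Webhook node normally performs this decoding itself, so the code is only meant to make the field names concrete.

```python
from urllib.parse import parse_qs


def parse_twilio_form(raw_body: str) -> dict:
    """Decode an application/x-www-form-urlencoded body into a flat dict."""
    parsed = parse_qs(raw_body, keep_blank_values=True)
    # parse_qs returns a list per key; Twilio sends each field once.
    return {key: values[0] for key, values in parsed.items()}


# Hypothetical sample body with made-up values.
sample = (
    "From=whatsapp%3A%2B15551234567"
    "&Body=What+is+in+this+file%3F"
    "&NumMedia=1"
    "&MediaContentType0=application%2Fpdf"
)
message = parse_twilio_form(sample)
print(message["From"], message["NumMedia"], message["MediaContentType0"])
```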
Nodes Used
| Node Type | Purpose |
|---|---|
| Webhook | Receives WhatsApp messages from Twilio |
| Code | Analyzes input type and formats data |
| Switch | Routes processing based on file type |
| HTTP Request | Downloads files from Twilio and external services |
| Extract From File | Converts spreadsheets and PDFs to text |
| OpenAI (Vision) | Analyzes image content |
| OpenAI (Audio) | Transcribes audio files |
| LangChain Agent | Processes content and generates responses |
| OpenAI Chat Model | Powers the conversational AI |
| Buffer Window Memory | Maintains conversation context |
| If | Handles conditional logic for file size |
| Wait | Manages timing for file conversion processes |
| Set/Edit Fields | Formats data between nodes |
| Twilio | Sends responses back to WhatsApp |
External Services & Credentials Required
Required Credentials
- Twilio API: For WhatsApp messaging
    - Account SID
    - Auth Token
    - WhatsApp phone number
- OpenAI API: For AI processing
    - API key with access to GPT-4, GPT-4 Vision, and Whisper
- CloudConvert API: For file format conversion
    - API token
- HTTP Basic Auth: For Twilio media downloads
    - Username and password (for Twilio media URLs, typically the Account SID and Auth Token)
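To illustrate what the HTTP Basic Auth credential does for media downloads, here is a minimal sketch that builds an authenticated request with the standard library. The function name `build_media_request` and the example URL are assumptions; in the workflow itself the HTTP Request node attaches this header from the stored credential.

```python
import base64
import urllib.request


def build_media_request(media_url: str, username: str, password: str) -> urllib.request.Request:
    """Build a GET request carrying an HTTP Basic Authorization header."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(media_url, headers={"Authorization": f"Basic {token}"})


# Placeholder URL and credentials; nothing is sent until urlopen() is called.
request = build_media_request("https://api.twilio.com/example-media", "user", "pass")
print(request.get_header("Authorization"))
```

Passing the request to `urllib.request.urlopen` would perform the actual download.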
External Services
- Twilio (WhatsApp Business API)
- OpenAI (GPT-4, GPT-4 Vision, Whisper)
- CloudConvert (file conversion)
Environment Variables
No specific environment variables are configured in this workflow. All authentication is handled through n8n credential management.
Data Flow
Input
- WhatsApp messages containing:
    - File attachments (PDF, images, audio, Word docs)
    - Spreadsheet links (Google Sheets, CSV, XLSX)
    - Text questions about previously uploaded content
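This document notes that Google Sheets links are converted to the CSV export format before download. One way to do that rewrite is sketched below. The function name `to_csv_export_url` is an assumption, and the workflow's own Code node may handle more URL variants. The `export?format=csv` endpoint only works without login when the sheet is shared by link.

```python
import re
from urllib.parse import parse_qs, urlparse

_SHEET_ID = re.compile(r"/spreadsheets/d/([a-zA-Z0-9_-]+)")


def to_csv_export_url(url: str) -> str:
    """Rewrite a Google Sheets link to its CSV export URL; pass others through."""
    parsed = urlparse(url)
    if parsed.netloc != "docs.google.com":
        return url
    match = _SHEET_ID.search(parsed.path)
    if not match:
        return url
    export = f"https://docs.google.com/spreadsheets/d/{match.group(1)}/export?format=csv"
    # The tab id (gid) can appear in the query string or in the fragment.
    for part in (parsed.query, parsed.fragment):
        gid_values = parse_qs(part).get("gid")
        if gid_values:
            export += f"&gid={gid_values[0]}"
            break
    return export
```

Direct CSV or XLSX links are returned unchanged, so the same helper can sit in front of every spreadsheet download.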
Processing
- File content extraction and conversion to text
- AI analysis and summarization
- Conversational Q&A with memory retention
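As an illustration of the "conversion to text" step for spreadsheets, the following sketch flattens parsed rows into plain lines that can be placed in a prompt. The function name `rows_to_text` and the `Row N:` layout are assumptions; the workflow's "json to text" node may format rows differently.

```python
def rows_to_text(rows: list[dict]) -> str:
    """Render each row as 'Row N: column: value, ...' on its own line."""
    lines = []
    for index, row in enumerate(rows, start=1):
        cells = ", ".join(f"{column}: {value}" for column, value in row.items())
        lines.append(f"Row {index}: {cells}")
    return "\n".join(lines)


# Made-up sample rows.
print(rows_to_text([{"name": "Ana", "total": 3}, {"name": "Ben", "total": 5}]))
```

Keeping the column name next to every value costs extra tokens but lets the model answer questions about a single row without seeing the header line.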
Output
- WhatsApp messages containing:
    - File content summaries
    - Answers to user questions
    - Analysis insights and key findings
Error Handling
The workflow includes several error handling mechanisms:
- File Size Check: Audio files larger than 25 MB, the upload limit of OpenAI's transcription API, are first sent through CloudConvert for format conversion
- Conversion Status Monitoring: Wait nodes and switch logic handle CloudConvert job status (processing, finished, error)
- Type-Specific Processing: Each file type is routed to its own extraction path instead of passing through a single generic converter
- Memory Management: Conversation context is maintained with a 50-message window to prevent memory overflow
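The windowed memory described above can be pictured as a bounded queue per session. This is a conceptual sketch only: the class `WindowMemory` is hypothetical and is not the n8n memory node's implementation. It shows how a 50-message window drops the oldest turns once the limit is reached.

```python
from collections import defaultdict, deque


class WindowMemory:
    """Keep only the most recent messages for each session key."""

    def __init__(self, window_size: int = 50):
        # deque(maxlen=...) discards the oldest item when a new one is appended.
        self._sessions = defaultdict(lambda: deque(maxlen=window_size))

    def add(self, session_key: str, role: str, text: str) -> None:
        self._sessions[session_key].append((role, text))

    def history(self, session_key: str) -> list:
        return list(self._sessions[session_key])


memory = WindowMemory(window_size=3)
for number in range(5):
    memory.add("demo-session", "user", f"message {number}")
print(memory.history("demo-session"))
```

The session key decides who shares a history. Keying on the sender's number would give each user a separate window, which is a design option and not something this document says the workflow does.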
Known Limitations
- Audio files larger than 25MB require additional processing time through CloudConvert
- Google Sheets links are automatically converted to CSV export format
- Conversation memory uses the workflow ID as its session key, so it is scoped to the workflow and not to an individual sender
- CloudConvert API has rate limits and processing time constraints
- Image analysis quality depends on image clarity and content
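The two CloudConvert-related limitations can be made concrete with a size check and a bounded polling loop. Everything here is a sketch under assumptions: `needs_conversion`, `wait_for_job`, the retry count, and the delay are hypothetical, and `fetch_status` stands in for whatever call reads the job status. The status strings `processing`, `finished`, and `error` are the ones this document names.

```python
import time
from typing import Callable

# 25 MB threshold from this document, interpreted here as 25 * 1024 * 1024 bytes.
MAX_DIRECT_BYTES = 25 * 1024 * 1024


def needs_conversion(size_bytes: int) -> bool:
    """True when an audio file is too large to transcribe directly."""
    return size_bytes > MAX_DIRECT_BYTES


def wait_for_job(
    fetch_status: Callable[[], str],
    delay_seconds: float = 5.0,
    max_attempts: int = 20,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job finishes, fails, or the attempt budget runs out."""
    for _ in range(max_attempts):
        status = fetch_status()
        if status == "finished":
            return status
        if status == "error":
            raise RuntimeError("conversion job reported an error")
        # Any other status is treated as still in progress.
        sleep(delay_seconds)
    raise TimeoutError(f"job not finished after {max_attempts} checks")


# Simulated status sequence; no network call and no real waiting.
statuses = iter(["processing", "processing", "finished"])
result = wait_for_job(lambda: next(statuses), sleep=lambda seconds: None)
print(result)
```

Capping the number of checks keeps a stuck job from looping between the Wait and Switch nodes indefinitely and limits the number of status requests counted against the API's rate limits.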
Related Workflows
No related workflows specified in the current context.
Setup Instructions
- Import Workflow: Import the JSON workflow into your n8n instance
- Configure Credentials:
    - Set up Twilio API credentials with your WhatsApp Business account
    - Add OpenAI API key with appropriate model access
    - Configure CloudConvert API token
    - Set up HTTP Basic Auth for Twilio media downloads
- Webhook Configuration:
    - Copy the webhook URL from the "WhatsApp message" node
    - Configure this URL in your Twilio WhatsApp webhook settings
- Test the Setup:
    - Send a test file via WhatsApp to verify file processing
    - Ask questions about the uploaded content to test AI responses
    - Verify conversation memory by asking follow-up questions
- Activate Workflow: Ensure the workflow is activated in n8n
- Monitor Performance: Check execution logs for any errors or processing delays
The workflow is ready to handle multi-format file analysis through WhatsApp once all credentials are properly configured and the webhook is connected to your Twilio account.