WhatsApp Multi Model¶

An AI-powered WhatsApp bot that accepts spreadsheets, PDFs, images, audio, and Word documents, and lets users ask questions about the content in a natural conversation that retains context between messages.

Purpose¶

This workflow serves as an intelligent document analysis assistant accessible through WhatsApp. Users can upload any supported file type and engage in conversational Q&A about the content, with the AI maintaining context across the conversation session.

How It Works¶

Message Reception: WhatsApp messages are received via Twilio webhook
Content Analysis: The system inspects each incoming message to determine whether it contains a file, a link, or a text question
File Type Detection: Identifies the type of content (spreadsheet, PDF, image, audio, Word document, or text question)
File Processing: Downloads and processes files based on their type:
- Spreadsheets: Converts to JSON then text format for analysis
- PDFs: Extracts text content directly
- Images: Uses OpenAI's vision model to analyze and describe content
- Audio: Converts format if needed, then transcribes using OpenAI Whisper
- Word Documents: Converts to text via CloudConvert
AI Analysis: Processes extracted content through an AI agent with conversation memory
Response Generation: Generates contextual responses based on file content and user questions
Reply Delivery: Sends responses back to the user via WhatsApp
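
The file type detection step above can be sketched as a small classifier. This is a minimal illustration, not the workflow's actual Code node: `NumMedia`, `MediaContentType0`, and `Body` are standard Twilio webhook parameters, while the function name, the MIME mapping, and the link rules are assumptions chosen to match the branch names used in this document.

```python
import re
from urllib.parse import urlparse

# Assumed mapping from MIME type to the branch names used in this document.
MIME_BRANCHES = {
    "application/pdf": "pdf",
    "application/msword": "word",
    "application/vnd.openxmlformats-officedocument.wordprocessingml.document": "word",
}


def classify_message(params: dict) -> str:
    """Return one of: spreadsheet, pdf, image, voice, word, question."""
    num_media = int(params.get("NumMedia") or 0)
    if num_media > 0:
        mime = params.get("MediaContentType0", "").lower()
        if mime in MIME_BRANCHES:
            return MIME_BRANCHES[mime]
        if mime.startswith("image/"):
            return "image"
        if mime.startswith("audio/"):
            return "voice"

    # No recognised attachment: look for a spreadsheet link in the text.
    for url in re.findall(r"https?://\S+", params.get("Body", "")):
        parsed = urlparse(url)
        if parsed.netloc == "docs.google.com" and parsed.path.startswith("/spreadsheets/"):
            return "spreadsheet"
        if parsed.path.lower().endswith((".csv", ".xlsx")):
            return "spreadsheet"

    # Anything else is treated as a plain text question.
    return "question"
```

Falling back to `question` means a message the classifier does not recognise still reaches the AI agent as plain text.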

Workflow Diagram¶

graph TD
    A[WhatsApp message] --> B[analyze type of the input]
    B --> C[Switch]

    C -->|spreadsheet| D[download sheet]
    C -->|pdf| E[download file image pdf]
    C -->|image| E
    C -->|voice| F[get audio from twilio1]
    C -->|question| G[content analyze]
    C -->|word| E

    D --> H[convert to j son]
    H --> I[all in row]
    I --> J[json to text]
    J --> G

    E --> K[path]
    K -->|pdf| L[pdf to text]
    K -->|image| M[Analyze image]
    K -->|word| N[create Job]

    L --> G
    M --> O[content to text]
    O --> G

    N --> P[Wait1]
    P --> Q[convert Docs to text]
    Q --> R[Switch2]
    R -->|finished| S[Download text]
    R -->|processing| P

    F --> T[If]
    T -->|large file| U[download voice file]
    T -->|small file| V[Transcribe a recording]

    U --> W[import audio file to cloud convert]
    W --> X[Wait]
    X --> Y[convert audio file]
    Y --> Z[Switch1]
    Z -->|finished| AA[Download audio file]
    Z -->|processing| X
    AA --> V

    V --> BB[Edit Fields]
    BB --> G

    G --> CC[send to WhatsApp]

    DD[OpenAI Chat Model] -.->|ai_languageModel| G
    EE[Simple Memory] -.->|ai_memory| G
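
The two Wait and Switch loops in the diagram amount to polling a conversion job until it finishes. The sketch below shows that pattern in isolation. The `get_status` callable stands in for the CloudConvert status request and is hypothetical, as are the delay and the poll limit; the status strings follow the ones named in this document.

```python
import time
from typing import Callable


def wait_for_job(
    get_status: Callable[[], str],
    delay_seconds: float = 5.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Wait, check the job status, and repeat until it is finished."""
    for _ in range(max_polls):
        sleep(delay_seconds)          # the Wait node
        status = get_status()         # the status request
        if status == "finished":      # Switch: finished branch
            return status
        if status == "error":
            raise RuntimeError("conversion job failed")
        # Any other status (for example "processing") loops back to Wait.
    raise TimeoutError("conversion job did not finish in time")
```

The poll limit is an addition of this sketch: it keeps a job that never leaves the processing state from looping indefinitely.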

Trigger¶

Type: Webhook
Method: POST
Path: /webhook/whatsapp-multimedia
Source: Twilio WhatsApp integration
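
Twilio delivers incoming messages to the webhook as a form-encoded POST body. The n8n Webhook node decodes this automatically; the sketch below only illustrates what the payload looks like once decoded. The sample values are made up, and only a few of the fields Twilio sends are shown.

```python
from urllib.parse import parse_qs


def parse_twilio_body(raw_body: str) -> dict:
    """Decode a form-encoded Twilio webhook body into a flat dict."""
    parsed = parse_qs(raw_body, keep_blank_values=True)
    # parse_qs returns a list per key; Twilio sends each field once.
    return {key: values[0] for key, values in parsed.items()}


# Made-up sample body for a text-only message.
sample = "From=whatsapp%3A%2B15550100&Body=What+is+the+total%3F&NumMedia=0"
message = parse_twilio_body(sample)
print(message["From"], "|", message["Body"])
```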

Nodes Used¶

Node Type	Purpose
Webhook	Receives WhatsApp messages from Twilio
Code	Analyzes input type and formats data
Switch	Routes processing based on file type
HTTP Request	Downloads files from Twilio and external services
Extract From File	Converts spreadsheets and PDFs to text
OpenAI (Vision)	Analyzes image content
OpenAI (Audio)	Transcribes audio files
LangChain Agent	Processes content and generates responses
OpenAI Chat Model	Powers the conversational AI
Buffer Window Memory	Maintains conversation context
If	Routes audio by file size (large files go through CloudConvert first)
Wait	Pauses between CloudConvert job status checks
Set/Edit Fields	Formats data between nodes
Twilio	Sends responses back to WhatsApp

External Services & Credentials Required¶

Required Credentials¶

Twilio API: For WhatsApp messaging
- Account SID
- Auth Token
- WhatsApp phone number
OpenAI API: For AI processing
- API key with access to GPT-4, GPT-4 Vision, and Whisper
CloudConvert API: For file format conversion
- API token
HTTP Basic Auth: For Twilio media downloads
- Username and password
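
As a rough illustration of the HTTP Basic Auth credential, the sketch below builds an authenticated request for a Twilio media URL. It assumes the username is the Account SID and the password is the Auth Token, which is Twilio's usual convention; this document does not state which values the workflow uses. The function name is hypothetical, and the request is only constructed, not sent.

```python
import base64
import urllib.request


def build_media_request(media_url: str, account_sid: str, auth_token: str) -> urllib.request.Request:
    """Build a GET request for a media URL with an HTTP Basic Auth header."""
    # Assumption: Account SID as username, Auth Token as password.
    raw = f"{account_sid}:{auth_token}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(media_url, headers={"Authorization": f"Basic {token}"})
```

In n8n the same header is produced by attaching the HTTP Basic Auth credential to the HTTP Request node, so no code is needed in the workflow itself.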

External Services¶

Twilio (WhatsApp Business API)
OpenAI (GPT-4, GPT-4 Vision, Whisper)
CloudConvert (file conversion)

Environment Variables¶

No specific environment variables are configured in this workflow. All authentication is handled through n8n credential management.

Data Flow¶

Input¶

WhatsApp messages containing:
- File attachments (PDF, images, audio, Word docs)
- Spreadsheet links (Google Sheets, CSV, XLSX)
- Text questions about previously uploaded content

Processing¶

File content extraction and conversion to text
AI analysis and summarization
Conversational Q&A with memory retention
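
For spreadsheets, conversion to text means flattening parsed rows into lines the model can read. The sketch below shows one plausible way to do that. The output layout and the function name are assumptions; the workflow's own "json to text" node may format rows differently.

```python
def rows_to_text(rows: list[dict]) -> str:
    """Flatten spreadsheet rows (one dict per row) into readable text."""
    lines = []
    for index, row in enumerate(rows, start=1):
        # Keep the column name next to each value so the model sees the header.
        cells = ", ".join(f"{column}: {value}" for column, value in row.items())
        lines.append(f"Row {index}: {cells}")
    return "\n".join(lines)


print(rows_to_text([{"item": "pen", "qty": 2}, {"item": "ink", "qty": 5}]))
```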

Output¶

WhatsApp messages containing:
- File content summaries
- Answers to user questions
- Analysis insights and key findings

Error Handling¶

The workflow includes several error handling mechanisms:

File Size Check: Large audio files (>25MB) are processed through CloudConvert for format conversion
Conversion Status Monitoring: Wait and Switch nodes poll the CloudConvert job status (processing, finished, error) until the job completes
Separate Processing Paths: Each supported file type follows its own extraction path
Memory Management: Conversation context is limited to a 50-message window so the stored history stays bounded
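
Two of the mechanisms above, the audio size check and the bounded conversation window, can be sketched in a few lines. The 25 MB threshold and the 50-message window come from this document; the names are hypothetical, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
from collections import deque

# Assumption: 1 MB is taken as 1024 * 1024 bytes.
MAX_DIRECT_BYTES = 25 * 1024 * 1024


def audio_route(size_bytes: int) -> str:
    """Pick the branch for an audio file based on its size."""
    return "convert" if size_bytes > MAX_DIRECT_BYTES else "transcribe"


class WindowMemory:
    """Keep only the most recent messages of a conversation."""

    def __init__(self, window: int = 50) -> None:
        # A deque with maxlen drops the oldest entry when full.
        self._messages = deque(maxlen=window)

    def add(self, role: str, text: str) -> None:
        self._messages.append((role, text))

    def history(self) -> list:
        return list(self._messages)
```

In the workflow itself the window is provided by the memory node attached to the agent; the class here only illustrates the behaviour.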

Known Limitations¶

Audio files larger than 25MB require additional processing time through CloudConvert
Google Sheets links are automatically converted to CSV export format
Conversation memory is session-based and keyed by the workflow ID
CloudConvert API has rate limits and processing time constraints
Image analysis quality depends on image clarity and content
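
The Google Sheets conversion listed above can be illustrated as a URL rewrite. This sketch assumes the common `/export?format=csv` form of Google Sheets export URLs and carries the sheet's `gid` over when one is present. The function name is hypothetical, and the workflow's own logic may differ.

```python
import re
from urllib.parse import parse_qs, urlparse


def to_csv_export_url(link: str) -> str:
    """Rewrite a Google Sheets link to its CSV export URL."""
    parsed = urlparse(link)
    match = re.search(r"/spreadsheets/d/([A-Za-z0-9_-]+)", parsed.path)
    if parsed.netloc != "docs.google.com" or match is None:
        return link  # not a Sheets link: leave it unchanged

    export = f"https://docs.google.com/spreadsheets/d/{match.group(1)}/export?format=csv"
    # The sheet tab id may appear in the fragment (#gid=...) or the query.
    gid = parse_qs(parsed.fragment).get("gid") or parse_qs(parsed.query).get("gid")
    if gid:
        export += f"&gid={gid[0]}"
    return export
```

Export URLs like this only work for sheets the requester is allowed to read, typically ones shared by link.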

Setup Instructions¶

Import Workflow: Import the JSON workflow into your n8n instance
Configure Credentials:
- Set up Twilio API credentials with your WhatsApp Business account
- Add OpenAI API key with appropriate model access
- Configure CloudConvert API token
- Set up HTTP Basic Auth for Twilio media downloads
Webhook Configuration:
- Copy the webhook URL from the "WhatsApp message" node
- Configure this URL in your Twilio WhatsApp webhook settings
Test the Setup:
- Send a test file via WhatsApp to verify file processing
- Ask questions about the uploaded content to test AI responses
- Verify conversation memory by asking follow-up questions
Activate Workflow: Ensure the workflow is activated in n8n
Monitor Performance: Check execution logs for any errors or processing delays
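
Before connecting Twilio, the webhook can be smoke-tested with a hand-built request that imitates a text-only message. The sketch below constructs such a request. The base URL and sender number are placeholders, the function name is hypothetical, and only three of the fields Twilio normally sends are included.

```python
import urllib.request
from urllib.parse import urlencode


def build_test_request(base_url: str, sender: str, text: str) -> urllib.request.Request:
    """Build a form-encoded POST that imitates a Twilio text message."""
    form = urlencode({"From": sender, "Body": text, "NumMedia": "0"}).encode("ascii")
    return urllib.request.Request(
        base_url.rstrip("/") + "/webhook/whatsapp-multimedia",
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder host; replace with the address of your n8n instance.
request = build_test_request("https://n8n.example.com", "whatsapp:+15550100", "hello")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` sends it. The reply step will only succeed if the sender is a number your Twilio account can message.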

The workflow is ready to handle multi-format file analysis through WhatsApp once all credentials are properly configured and the webhook is connected to your Twilio account.