Learning Timeline
Key Insights
The Advantages of Gemini's Multimodal Capabilities
Text-only models need a separate speech-to-text step (for example, Whisper) before they can work with a recording. Gemini accepts audio files directly as multimodal input, so transcription can happen in a single node, which keeps the workflow shorter and easier to manage.
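To make the single-step idea concrete, the sketch below builds the JSON body for one `generateContent` call that carries both the prompt and the audio. It follows the publicly documented REST request shape (a text part plus an `inline_data` part). The function name, the sample prompt, and the placeholder bytes are illustrative assumptions, and nothing is sent over the network. Inside n8n, the Gemini node assembles an equivalent request for you.

```python
import base64
import json


def build_transcription_request(audio_bytes: bytes, mime_type: str, prompt: str) -> dict:
    """Build a generateContent request body with the audio inlined as base64.

    Hypothetical helper for illustration; it only constructs the payload.
    """
    encoded = base64.b64encode(audio_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    # The instruction and the audio travel in the same request,
                    # so no separate speech-to-text call is needed.
                    {"text": prompt},
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                ]
            }
        ]
    }


if __name__ == "__main__":
    # Placeholder bytes stand in for a real recording.
    body = build_transcription_request(
        b"placeholder-audio-bytes",
        "audio/mp3",
        "Please provide a complete and accurate transcription of the attached audio file.",
    )
    print(json.dumps(body, indent=2))
```

Inlining the file like this is only practical for short recordings; longer files are normally uploaded separately and referenced from the request.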
Folder Organization Tips
Store audio files and their transcripts in a consistent location, either side by side in one folder or in a dedicated transcripts directory, so that each recording can be matched to its transcript when you are managing hundreds of files.
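A consistent layout also makes it easy to check for gaps. The sketch below lists recordings that have no matching transcript, assuming the '_transcript.txt' naming used in the steps of this guide and a local copy of the folders. The extension set and the function name are assumptions chosen for illustration.

```python
import tempfile
from pathlib import Path

# Assumed set of audio extensions; adjust to match your recordings.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}
TRANSCRIPT_SUFFIX = "_transcript.txt"


def recordings_without_transcript(audio_dir: Path, transcript_dir: Path) -> list[str]:
    """Return names of audio files that lack '<stem>_transcript.txt'."""
    missing = []
    for audio in sorted(audio_dir.iterdir()):
        if audio.suffix.lower() not in AUDIO_EXTENSIONS:
            continue  # skip transcripts and any other non-audio files
        expected = transcript_dir / f"{audio.stem}{TRANSCRIPT_SUFFIX}"
        if not expected.exists():
            missing.append(audio.name)
    return missing


if __name__ == "__main__":
    # Demo on a throwaway folder where audio and transcripts sit side by side.
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "session1.mp3").write_bytes(b"")
        (folder / "session2.mp3").write_bytes(b"")
        (folder / "session1_transcript.txt").write_text("example transcript")
        print(recordings_without_transcript(folder, folder))
```

Pass the same path twice for the single-folder layout, or two different paths when transcripts live in their own directory.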
Prompts
Audio Transcription Prompt
Target: Google Gemini (Multimodal)
Please provide a complete and accurate transcription of the attached audio file. Maintain the original structure of the conversation and include speaker labels if possible.
Step by Step
Setting Up an n8n Automated Transcription Workflow
- Open the n8n dashboard and create a new workflow.
- Add a 'Google Drive Trigger' node and set 'Watch for' to 'File Created' within the 'Workshop Recordings' folder.
- Add a 'Google Drive' node with the 'Download File' action to fetch the newly uploaded audio file.
- Connect that node to a 'Google Gemini' node and select a multimodal model (for example, Gemini 1.5 Pro or Flash).
- In the Gemini node, pass the downloaded file's binary data as the audio input and enter your transcription prompt.
- Add another 'Google Drive' node with the 'Upload File' action after the Gemini node.
- Map the text output from Gemini to the content of the new file in Google Drive.
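- Name the output file after the source recording, replacing its extension with the suffix '_transcript.txt' (for example, 'session1.mp3' becomes 'session1_transcript.txt').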
- Click 'Execute Workflow' to test the automation.
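The naming step above can be sketched in a few lines. The helpers below derive the transcript name from the recording's name and check whether a new file is audio at all. That check is worth having if transcripts are saved into the watched folder, since a 'File Created' trigger would otherwise fire on them too. Both function names are hypothetical, and the logic is an assumption about how you might implement these steps, not n8n's own behavior.

```python
from pathlib import PurePosixPath


def transcript_filename(audio_name: str) -> str:
    """Replace the final extension with '_transcript.txt'."""
    stem = PurePosixPath(audio_name).stem
    return f"{stem}_transcript.txt"


def is_audio(mime_type: str) -> bool:
    """True for MIME types such as 'audio/mpeg' or 'audio/wav'."""
    return mime_type.startswith("audio/")


if __name__ == "__main__":
    print(transcript_filename("session1.mp3"))   # session1_transcript.txt
    print(is_audio("audio/mpeg"))                # True
    print(is_audio("text/plain"))                # False
```

Only the last extension is removed, so a name like 'workshop.day2.m4a' becomes 'workshop.day2_transcript.txt'.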