Using Buzz on TranscribeDesktop

Buzz transcribes and translates audio offline and lets you queue multiple recordings for processing.

Features

Import audio and video files and export transcripts to TXT, SRT, and VTT files
Transcribe and translate live audio from your computer’s microphone to text (resource-intensive and may not be real-time)
Supports Whisper, Whisper.cpp, Faster Whisper, Whisper-compatible Hugging Face models, and the OpenAI Whisper API
Command-Line Interface

Current limitations

The first time you run Buzz, it may take around two minutes to start. Later launches should be much faster.

Getting started

Transfer the audio and video recordings you want to transcribe to the virtual desktop, following the instructions in the general guide.
From the desktop, click the Buzz icon [IMAGE] to open Buzz.

File import

To import a file, click ‘File’ or the plus sign (+). In the window that appears, select the file you want to transcribe. By default, Buzz shows only audio files; to see other file types, select ‘All files’ from the drop-down menu.
Select the model type and size. For most recordings, the Whisper model type with the medium size gives accurate transcriptions without long processing delays.
Select the task (transcription or translation).
Select the language spoken in the recording.
Click ‘Run’.

Note: You can import multiple files; they will be queued and transcribed one after another. If you are not sure which language is spoken, Buzz can also detect it automatically (“Detect language”).
Review and refine: once transcription is complete, double-click the finished item to open it. The transcript is displayed, and you can export it to other file types such as SRT and VTT. These files include timestamps, which are useful for synchronizing the text with audio or video.
Download the transcript from the virtual desktop to your computer, following the instructions in the general guide.

Buzz offers advanced options for more specific needs. For example, word-level timings create a timestamp for every word in the transcript, which is useful for creators generating captions for short-form videos. For general use, leave this option disabled. See the links below for more information.

Useful instruction links

Import: https://chidiwilliams.github.io/buzz/docs/usage/file_import

Live recording: https://chidiwilliams.github.io/buzz/docs/usage/live_recording

Translation: https://chidiwilliams.github.io/buzz/docs/usage/translations

Edit and resize: https://chidiwilliams.github.io/buzz/docs/usage/edit_and_resize

Setting Preferences: https://chidiwilliams.github.io/buzz/docs/preferences

FAQs: https://chidiwilliams.github.io/buzz/docs/faq