This month I realized I was not going to be able to get the novel in progress ready for publication. Too much, too soon, and I didn’t want to rush my editors. So I rifled through the virtual stack of stories in my virtual filing cabinet to see what I had either ready to go or close enough to done. I came up with a novella I’d started writing a few years ago, one that is markedly different from my usual fare, but a story that has been haunting me ever since I set it aside. I opened it up, started to read, and got sucked right in.

Right up until about chapter three or four, where I hit the ‘dirty dictation’ parts. See, this was one of the pieces I was working on when I was trying dictation on my commute, and found out the hard way just how terribly Dragon NaturallySpeaking handles any kind of background noise. The automatically generated transcriptions were word salad; I could barely make heads or tails of what I was trying to say. But the story wasn’t going to let me go now that I’d brought it out of hibernation, so I went looking for the audio files to transcribe them manually. I finally found them. Most of them. Then I remembered what Mel Dunay had told me about how she uses a local AI model installed on her own computer to aid in her transcription, and I asked for help.
She uses Whisper AI, which was written to generate subtitles for audio and video files. She talked a little on her blog about how she installed it, and I wound up asking Grok for detailed installation instructions, which I’ll append below. I am by no means adept with Python, or coding for that matter, so if I could make this work, I think most people can figure it out as well! Most important, there’s a secondary step for cleaning up the transcription files you get out of Whisper, which Mel details in this post. I tested her prompt with Grok where she used Claude, so I can say it works with more than one of the chat-style AI models. If you’re concerned about exposing your work to an AI, keep in mind you’re not putting up a whole story, just bits and chunks at a time. I was also worried the AI would alter or insert things on its own, and it does not. Part of Mel’s prompt involves asking it to show its work where it has changed potential misspellings, so you have the option to accept or correct each change. Between Whisper and Grok, it did a superlative job, and I was able to salvage several big parts of the story that would otherwise have required a lot of time listening to my own voice (ick) and transcribing by hand.
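If you’re curious what the “bits and chunks” workflow looks like in practice, here’s a minimal sketch in plain Python for splitting a transcript into pasteable pieces. This is my own convenience snippet, not Mel’s method or anything Whisper provides; the `chunk_transcript` name and the 400-word default are arbitrary choices.

```python
def chunk_transcript(text, words_per_chunk=400):
    """Split a raw transcript into chunks of roughly words_per_chunk
    words each, suitable for pasting into a chat AI one at a time."""
    words = text.split()
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]

# Tiny demonstration with a 2-word chunk size:
chunks = chunk_transcript("one two three four five six", words_per_chunk=2)
print(chunks)  # → ['one two', 'three four', 'five six']
```

You’d read your Whisper .txt output into a string, call the function, and paste the resulting chunks into the chat model one by one with the cleanup prompt.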
I might even, with this tool, start dictating again. I can dictate story, which not everyone can, and it’s much faster than typing in some instances. It’s very useful when you can’t be in front of a keyboard, and I could do it (like Mel does) while exercising, which is great for not sitting at the desk all day every day. It can also alleviate the pressure on my hands, since I write all day, every day, and run a risk of repetitive stress injuries over the years I plan to do this for a living. It’s a fantastic tool, once you get it up and running. Healthier!
What follows is the Grok instructional:
To install OpenAI’s Whisper AI for audio transcription on your computer, follow these steps. Note that Whisper requires Python (version 3.8–3.13) and works on major operating systems like Windows, macOS, and Linux. If you don’t have Python installed, download it from python.org first and ensure it’s added to your system’s PATH.
- Install ffmpeg: This is a required command-line tool for handling audio files. Use your OS’s package manager:
- On Ubuntu/Debian: Run sudo apt update && sudo apt install ffmpeg in the terminal.
- On Arch Linux: Run sudo pacman -S ffmpeg.
- On macOS (with Homebrew installed): Run brew install ffmpeg.
- On Windows (with Chocolatey installed): Run choco install ffmpeg.
- On Windows (with Scoop installed): Run scoop install ffmpeg.
Verify the installation by running ffmpeg -version in your terminal or command prompt.
- Install Whisper via pip: Open a terminal or command prompt and run:
pip install -U openai-whisper
This installs the latest stable release. Alternatively, for the cutting-edge version directly from GitHub:
pip install git+https://github.com/openai/whisper.git
To update an existing installation from GitHub:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
- Handle optional dependencies if needed: If the installation fails due to missing Rust (e.g., for the tiktoken package on certain platforms), install Rust from rust-lang.org and add it to your PATH (e.g., export PATH="$HOME/.cargo/bin:$PATH" on Unix-like systems). If you see an error about setuptools-rust, run pip install setuptools-rust.
- Test the installation: Once installed, you can transcribe an audio file (e.g., test.mp3) by running:
whisper test.mp3 --model tiny
Replace test.mp3 with your file path and --model tiny with a larger model like base, small, medium, or large for better accuracy (models download automatically on first use).
If you encounter issues, ensure your Python environment is up to date and PyTorch-compatible (Whisper handles this automatically in most cases).
Here’s how to actually use Whisper for transcription once it’s installed and Python + ffmpeg are in your PATH.
1. Basic command-line usage (easiest and most common)
Open Terminal (macOS/Linux) or Command Prompt / PowerShell (Windows) and run:
whisper your_audio_file.mp3
That’s it! Whisper will automatically:
- Download the default model (turbo in recent releases; older versions defaulted to small)
- Transcribe the file
- Save these files in the same folder:
- .txt – plain text
- .srt – subtitles
- .vtt – web subtitles
- .json – full details with timestamps and confidence scores
- .tsv – tab-separated values
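If you only kept the .srt subtitles but want flowing prose to paste into your manuscript, the timestamps are easy to strip with a few lines of stdlib Python. This is my own little helper, not part of Whisper; `srt_to_text` is just a name I picked:

```python
def srt_to_text(srt: str) -> str:
    """Strip sequence numbers and timestamp lines from SRT-format
    subtitles, keeping only the spoken text joined into one paragraph."""
    kept = []
    for line in srt.splitlines():
        line = line.strip()
        # Skip blank lines, bare sequence numbers, and timestamp lines
        if not line or line.isdigit() or "-->" in line:
            continue
        kept.append(line)
    return " ".join(kept)

sample = """1
00:00:00,000 --> 00:00:02,500
Hello there.

2
00:00:02,500 --> 00:00:05,000
This is a test."""

print(srt_to_text(sample))  # → Hello there. This is a test.
```

Point it at the contents of any .srt file Whisper produced and you get plain text ready for the cleanup prompt.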
2. Most useful options you’ll want to know
| Goal | Command | Notes |
|---|---|---|
| Faster but slightly less accurate | whisper audio.mp3 --model tiny | tiny → base → small → medium → large → turbo (turbo is fastest with almost-large quality) |
| Best possible quality | whisper audio.mp3 --model large-v3 | large-v3 is the newest and best model (as of 2025) |
| Force English only (faster) | whisper audio.mp3 --language en | Skips language detection |
| Auto-detect language | Omit --language (the default) | |
| Translate to English (non-English audio) | whisper audio.mp3 --task translate | Transcribes and translates to English |
| Output only text (no extra files) | whisper audio.mp3 --output_format txt | Add --output_dir folder_name to choose where files go |
| Use GPU (much faster if you have NVIDIA) | No extra flag needed | Whisper automatically uses the GPU if PyTorch + CUDA is installed |
| Process an entire folder | whisper *.mp3 --model medium | Your shell expands the wildcard into a list of files |
| Add word-level timestamps | whisper audio.mp3 --word_timestamps True | Great for subtitle editing |
3. Real-world examples
# Best quality + English translation + subtitles
whisper podcast_spanish.mp3 --model large-v3 --task translate --language es
# Fast transcription of a 2-hour podcast (good enough for most people)
whisper long_podcast.mp3 --model turbo
# Transcribe everything in the current folder quickly
whisper *.wav --model small --language en
# Save only .srt subtitles in a subfolder called "subs"
whisper video.mp4 --output_format srt --output_dir subs
4. Supported audio/video formats
Whisper accepts almost anything ffmpeg can read: .mp3 .wav .m4a .mp4 .mpeg .ogg .webm .flac and many more. (YouTube audio works too, once you’ve downloaded it with yt-dlp.)
5. Quick test right now
- Download a short sample (or use any audio file you have).
- Run:
whisper sample.mp3 --model tiny
- In a few seconds you’ll see the transcription in the terminal and a sample.txt file created.
That’s literally all there is to it. Once installed, Whisper is one of the easiest AI tools to use from the command line. Enjoy!



