This month I realized I was not going to be able to get the novel in progress ready for publication. Too much, too soon, and I didn't want to rush my editors. So I rifled through the virtual stack of stories in my virtual filing cabinet to see what I had either ready to go or close enough to done. I came up with a novella I'd started writing a few years ago, one that is markedly different from my usual fare, but a story that has been haunting me ever since I set it aside. I opened it up, started to read, and got sucked right in.

Coming soon! And I am using an open pen name to sort my darker, non-fantasy, non-science-fiction stories for my readers, to allow them to choose what side they want from me: the dark side or the light side!

Right up until about chapter three or four, where I hit the 'dirty dictation' parts. See, this was one of the pieces I was working on when I was trying dictation on my commute, and I found out the hard way just how terribly Dragon NaturallySpeaking handles any kind of background noise. The automatically generated transcriptions were word salad; I could barely make heads or tails of what I was trying to say. But this story wasn't going to let me go now that I'd brought it out of hibernation, so I started looking for the audio files to transcribe it manually. I finally found them. Most of them. And I remembered what Mel Dunay had told me about how she uses a local AI model installed on her own computer to aid in her transcription. I asked for help.

She uses Whisper AI, which was written to generate subtitles for audio or video files. She talked a little on her blog about how she installed it; I wound up asking Grok for detailed installation instructions, which I'll append below. I am by no means adept with Python, or coding for that matter, so if I could make this work, I think most people can figure it out as well! Most important, there's a secondary step for cleaning up the transcription files you get out of Whisper, which Mel details in this post. I tested her prompt with Grok where she used Claude, so I can say it works with more than one of the chat-style AI models. If you're concerned about exposing your work to an AI, keep in mind you're not putting up a whole story, just bits and chunks at a time. Also, as I was running this process I was concerned the AI would alter or insert things itself, and it does not do that. Part of Mel's prompt involves asking it to show its work where it's changed potential misspellings, so you have the option to accept or correct each change. Between Whisper and Grok, it did a superlative job, and I was able to salvage several big parts of the story that would otherwise have meant spending a lot of time listening to my own voice (ick) and transcribing by hand.
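For anyone who wants the general shape of that cleanup step before clicking through to Mel's post, a prompt along these lines covers the same ground. This is only a rough sketch of the idea, not her exact wording:

Below is a chunk of a raw Whisper transcription of dictated fiction. Fix obvious
mis-heard words, punctuation, and paragraph breaks, but do not add, remove, or
rewrite anything else. List every word you changed, original and replacement,
so I can accept or correct each one.

[paste a chunk of the .txt file here]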

I might even, with this tool, start back on dictating. I can dictate story (not everyone can), and it's much faster than typing in some instances, and very useful when you can't be in front of a keyboard. I could also do it (like Mel does) while I'm exercising, which is great for not sitting at the desk all day, every day. It would alleviate the pressure on my hands, too, since I write all day, every day, and run a risk of repetitive stress injuries over the years I plan to do this for a living. It's a fantastic tool, once you get it up and running. Healthier!

What follows is the Grok instructional:

To install OpenAI's Whisper AI for audio transcription on your computer, follow these steps. Note that Whisper requires Python (version 3.8–3.13) and works on the major operating systems: Windows, macOS, and Linux. If you don't have Python installed, download it from python.org first and ensure it's added to your system's PATH.
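If you are not sure whether Python is already set up, a quick check from the terminal or command prompt looks like this (on some systems the commands are python3 and pip3 instead):

python --version
pip --version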

  1. Install ffmpeg: This is a required command-line tool for handling audio files. Use your OS’s package manager:
    • On Ubuntu/Debian: Run sudo apt update && sudo apt install ffmpeg in the terminal.
    • On Arch Linux: Run sudo pacman -S ffmpeg.
    • On macOS (with Homebrew installed): Run brew install ffmpeg.
    • On Windows (with Chocolatey installed): Run choco install ffmpeg.
    • On Windows (with Scoop installed): Run scoop install ffmpeg. Verify installation by running ffmpeg -version in your terminal or command prompt.
  2. Install Whisper via pip: Open a terminal or command prompt and run:

    pip install -U openai-whisper

    This installs the latest stable release. Alternatively, for the cutting-edge version directly from GitHub:

    pip install git+https://github.com/openai/whisper.git

    To update an existing installation from GitHub:

    pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
  3. Handle optional dependencies if needed: If the installation fails due to missing Rust (e.g., for the tiktoken package on certain platforms), install Rust from rust-lang.org and add it to your PATH (e.g., export PATH="$HOME/.cargo/bin:$PATH" on Unix-like systems). If you see an error about setuptools-rust, run pip install setuptools-rust.
  4. Test the installation: Once installed, you can transcribe an audio file (e.g., test.mp3) by running:

    whisper test.mp3 --model tiny
    Replace test.mp3 with your file path and --model tiny with a larger model like base, small, medium, or large for better accuracy (downloads automatically on first use).

If you encounter issues, ensure your Python environment is up to date and PyTorch-compatible (Whisper handles this automatically in most cases).
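If you want to confirm what got installed, and whether Whisper will be able to see a GPU, a quick check from the terminal looks like this (it prints False on a CPU-only machine, which still works, just more slowly):

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"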

Here's how to actually use Whisper for transcription once it's installed and Python + ffmpeg are in your PATH.

1. Basic command-line usage (easiest and most common)

Open Terminal (macOS/Linux) or Command Prompt / PowerShell (Windows) and run:


whisper your_audio_file.mp3

That’s it! Whisper will automatically:

  • Download its default model the first time you run it (small in older releases, turbo in newer ones)
  • Transcribe the file
  • Save these files in the same folder:
    • .txt – plain text
    • .srt – subtitles
    • .vtt – web subtitles
    • .json – full details with timestamps and confidence scores
    • .tsv – tab-separated values
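
For example (interview.m4a here is just a stand-in for whatever you recorded), running:

whisper interview.m4a --model small

leaves interview.txt, interview.srt, interview.vtt, interview.tsv, and interview.json sitting next to the audio file, all named after the input.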

2. Most useful options you’ll want to know

  • Faster but slightly less accurate: whisper audio.mp3 --model tiny (models run tiny → base → small → medium → large → turbo; turbo is fastest with almost-large quality)
  • Best possible quality: whisper audio.mp3 --model large-v3 (large-v3 is the newest and best model as of 2025)
  • Force English only (faster): whisper audio.mp3 --language en (skips language detection)
  • Auto-detect language: whisper audio.mp3 with no --language flag (detection is the default)
  • Translate non-English audio to English: whisper audio.mp3 --task translate (transcribes and translates to English)
  • Output only text (no extra files): whisper audio.mp3 --output_format txt (add --output_dir folder_name to choose where files go)
  • Use GPU (much faster if you have NVIDIA): no extra flag needed; Whisper automatically uses the GPU if PyTorch + CUDA is installed
  • Process an entire folder: whisper *.mp3 --model medium (on macOS/Linux the shell expands the wildcard to every matching file)
  • Add word-level timestamps: whisper audio.mp3 --word_timestamps True (great for subtitle editing)

3. Real-world examples


# Best quality + English translation + subtitles
whisper podcast_spanish.mp3 --model large-v3 --task translate --language es

# Fast transcription of a 2-hour podcast (good enough for most people)
whisper long_podcast.mp3 --model turbo

# Transcribe everything in the current folder quickly
whisper *.wav --model small --language en

# Save only .srt subtitles in a subfolder called "subs"
whisper video.mp4 --output_format srt --output_dir subs

4. Supported audio/video formats

Whisper accepts almost anything ffmpeg can read: .mp3 .wav .m4a .mp4 .mpeg .ogg .webm .flac and many more. It can even handle YouTube videos if you first grab the audio with yt-dlp (see the sketch below).
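If you do want the YouTube route, the usual pattern (a sketch; yt-dlp is a separate install, and the URL below is just a placeholder) is to pull down the audio first and then hand the resulting file to Whisper:

# Grab just the audio track as an mp3, then transcribe it
yt-dlp -x --audio-format mp3 -o "talk.%(ext)s" "https://www.youtube.com/watch?v=VIDEO_ID"
whisper talk.mp3 --model small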

5. Quick test right now

  1. Download a short sample (or use any audio file you have).
  2. Run:

    whisper sample.mp3 --model tiny
  3. In a few seconds you’ll see the transcription in the terminal and a sample.txt file created.

That’s literally all there is to it. Once installed, Whisper is one of the easiest AI tools to use from the command line. Enjoy!

13 responses to “Whisper It”

  1. That illustration is intriguing! And clearly, audio transcription has made strides since I last used it at work a few years ago.

  2. A lot of writers seem to have gotten into that career due to a physical ailment or loss of physical ability for other trades. So, anything that makes it physically easier for them is probably a good thing. But considering the amount of time spent writing/typing, it's a wonder there aren't more people complaining about repetitive stress injuries. Carpal tunnel sucks, and I've had to physical-therapy my way out of it a couple of times; and I'm ‘just' a coder.

    And writers almost all “assume the position” when writing. Sitting on one's fundament for long periods of time isn't conducive to good health either. Thirty to ninety minutes of exercise each day help, as do standing workstations. The recommendation is that people stand and stretch and look at something distant every 20 minutes, but that runs the risk of breaking your chain of thought or tossing you out of the zone when you're on a roll. Dictation while moving seems to be the logical solution.

    Actually, dictation while moving matches the behavior of the best storytellers I've listened to and watched. The good ones inject so much animation into the story, playing out scenes, switching voices, sometimes even bringing some of the listeners physically into their storytelling. That's certainly not static rump parking while writing. Aerobic exercise while telling your story. It also works well while visiting or walking through inspirational scenery and environments. Bring a partner or listener to keep an eye out for hazards if you get too into your story. We don't want you mugged in a back alley while dictating your detective mystery.

    1. Kevin J. Anderson used to dictate his books while hiking in the mountains, and mail the results off to a human transcription service. Although I’ve never had the money (or the enthusiasm for altitude) to do anything like that, I always thought it sounded like a fun way to draft a book.

    2. If you are careful about your posture and typing positions, you can type a great deal without developing carpal tunnel.

      I've been using a keyboard since the mid-'80s without issue, but I also took typing in high school and stuck to it.

  3. Grok has some amazingly thorough installation instructions! I was installing the whisper dependencies at a time (about a year ago, I think) when I didn’t really trust the AIs to give me a straight answer, so there was a lot of wandering the web involved. And I had never figured out that whisper could do a whole folder for you!

    1. And thank you for the hat tip, of course!

  4. I wonder if this would help people who can’t get Dragon to understand their accents.

    1. Yes. I had really loud road noise in some of the audio and it handled it perfectly.

    2. I have a fairly neutral High Plains accent with a tendency to mumble, and Whisper/AI chatbot cleanup does pretty well for me. (And I can back Cedar up on Whisper’s handling of road noise; I do dictate while driving three or four times a month.) I’ve seen people who said that Whisper handled accented people very well, and people who said it didn’t, so I guess it depends on the accent? And maybe have the user tweak the chatbot cleanup commands to point out that the speaker of the transcribed text has X accent, and the chatbot needs to take that into account.

  5. So, what are you using to record the audio?

    1. I have an Olympus handheld recorder and a lapel mike (which isn't very good; that's why the audio has so much background noise). Getting a proper cardioid microphone will likely be my next upgrade for it. I have a couple of mikes I'll try first, though.

  6. Off topic, since I never dictate. My mind seems to work better through my fingers, at least for anything longer than a two-minute discourse. 😉 I'm not getting anything useful done since I'm slowly recovering from my first (and I hope last) winter cold of the season and a serious case of pink-eye. Don't know where I got that one.

    In any case I signed up for FenCon, and since you’re the Toastmaster, I hope to meet you in person.

    1. I will hopefully see you there! And I do hope your writing progresses as you recover.
