How to Use ChatGPT and AI to Create Subtitles for Free
I wanted to watch some classic TV, but none of the videos had subtitles. And with kids running around, subtitles make it easier to enjoy the show when background noise abounds.
An alternative approach to creating subtitles is to upload your videos to YouTube, which will generate subtitles automatically. It's a great option if you want to use YouTube. However, if you prefer not to go that route, read on or check out the script on GitHub.
We'll use Whisper, OpenAI's open-source speech recognition neural net, released in September 2022. It handles video transcription and translation; unfortunately, it only processes one file at a time. So let's wrap a script around it to batch-process videos.
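To see what we'll be wrapping, a single-file run looks roughly like this (the flags come from `whisper --help`; the file name and model choice are placeholders, so pick the model size your hardware can handle):

```shell
# Transcribe one video and write an .srt subtitle file next to it.
# "episode01.mp4" and the "small" model are example choices, not requirements.
whisper "episode01.mp4" --model small --task transcribe --output_format srt
```

Run that once per video and you quickly see why a batch script is worth writing.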
The first step is to install Whisper; see the README for details. The installation is fairly straightforward, but if you run into any errors, don't hesitate to ask ChatGPT for help, like I had to do.
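For reference, the setup boils down to two pieces (these commands follow Whisper's README; the exact ffmpeg install command depends on your platform):

```shell
# Install the Whisper CLI and Python package.
pip install -U openai-whisper
# Whisper shells out to ffmpeg to decode audio, so it must be installed too,
# e.g. `brew install ffmpeg` on macOS or `sudo apt install ffmpeg` on Debian/Ubuntu.
```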
Resolving Installation Errors with ChatGPT
The first time I ran Whisper, I received the following warning:

```
The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator.
```
So I asked ChatGPT about it, and it suggested filing a report on Whisper's GitHub:
The warning you're seeing is from Numba, a Just-in-Time compiler for Python that's used in the Whisper package. It's telling you that a default behavior related to the 'nopython' argument in the @numba.jit decorator is going to change in a future version of Numba.
As a user, you don't have control over this warning unless you're willing to modify the Whisper package's source code yourself. It would be best to notify the developers of the Whisper package about this warning so they can address it in a future update.
To report this issue, you can visit the GitHub page for the Whisper package and open an issue there. In the issue, provide as much detail as possible about the warning message, including the full text of the warning and the conditions under which it occurred.
But after looking around, it seems this issue has already been fixed in the latest (unreleased) version of Whisper. So we can safely ignore the warning.
The next error I received when I ran Whisper was about CPU usage, so I again asked ChatGPT. Here's the error message I posted:
```
/usr/local/lib/python3.11/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")
```
And I received quite a lengthy response!
The warning you're seeing is from the Whisper package, and it's informing you that 16-bit floating point numbers (FP16) are not supported on your CPU, and instead, 32-bit floating point numbers (FP32) will be used. This may be because your CPU doesn't have the necessary hardware support for FP16, or because the PyTorch version you're using doesn't support FP16 on the CPU.
This change can lead to increased memory usage and potentially slower computation times, but the actual impact will depend on the size of your models and the specific operations you're performing. If you're not seeing any issues with memory usage or speed, you can likely ignore this warning.
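If the warning bothers you, Whisper's CLI also lets you make the CPU default explicit, which silences it (the `--fp16` flag is in `whisper --help`; the file name below is a placeholder):

```shell
# Explicitly request FP32 so Whisper stops warning about the fallback.
whisper "episode01.mp4" --fp16 False
```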
To use a GPU with PyTorch (and by extension, Whisper), you can set the device argument when running your command. Here is an example:
```
whisper --device cuda:0
```
This will instruct PyTorch to perform its computations on the first GPU (index 0). If you have multiple GPUs and want to use a different one, you can change the index accordingly. If you're not sure whether you have a GPU available or which one to use, you can run the command `nvidia-smi` in a terminal to see information about your system's GPU(s).
Please note that if your system doesn't have a GPU or if your GPU doesn't support CUDA (the parallel computing platform and application programming interface model created by Nvidia), you won't be able to use this option and will have to stick with using your CPU and FP32.
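One quick way to check whether CUDA is even an option, assuming PyTorch is installed (Whisper depends on it):

```shell
# Prints True if PyTorch can see a CUDA-capable GPU, False otherwise.
python3 -c "import torch; print(torch.cuda.is_available())"
```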
I was unsure whether the `whisper --device cuda:0` command would work on my computer, but I wanted to find out. So I told ChatGPT which laptop I use:
And ChatGPT essentially responded with an "In that case, there's nothing to see here." But it let me know that this issue only affects speed, not quality, and that if speed became a concern, I could run the script in the cloud on faster hardware.
Now that Whisper is up and running, and the errors are either understood or resolved, let's move on to our batch-processing script!
A Bash Script for Whisper Video Transcription and Translation
Now let's dig in and ask ChatGPT to help us make a script:
And ChatGPT gave me the following script! Don't use this one, because the last `mv` command will not work. But as an outline, it was a decent start.
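The core loop of such a wrapper can be sketched like this. This is a minimal sketch, not the final script: the `.mp4` extension, the `small` model, the skip-if-done check, and the `DRY_RUN` switch are all my own assumptions for illustration.

```shell
#!/usr/bin/env bash
# Batch-transcribe every .mp4 in a directory with the Whisper CLI.
batch_transcribe() {
  local dir="${1:-.}"            # directory of videos; quoting keeps spaces safe
  local f srt
  for f in "$dir"/*.mp4; do
    [ -e "$f" ] || continue      # glob matched nothing; skip
    srt="${f%.mp4}.srt"
    [ -e "$srt" ] && continue    # subtitles already exist; don't redo the file
    if [ "${DRY_RUN:-0}" = "1" ]; then
      # Dry-run mode prints what would run instead of running it.
      echo "whisper \"$f\" --model small --output_format srt --output_dir \"$dir\""
    else
      whisper "$f" --model small --output_format srt --output_dir "$dir"
    fi
  done
}
```

You'd call it as `batch_transcribe "/path/to/My Videos"`; running with `DRY_RUN=1` first is a cheap way to confirm it picks up the right files before committing hours of CPU time.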
When I ran a similar version of this, I ran into a PATH issue (`WARNING: The script lit is installed in '~/.local/bin' which is not on PATH.`) while getting things up and running, which I also resolved with ChatGPT. If you receive this error, paste it into ChatGPT to fix it. It's telling you the script was installed in a directory that is not currently in your system's PATH. The PATH is an environment variable in Unix-like operating systems specifying the directories where executable programs are located.
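The usual fix, assuming a bash-like shell and that the warning names `~/.local/bin` as it did for me, is to put that directory on your PATH:

```shell
# Make executables in ~/.local/bin findable for the current session.
export PATH="$HOME/.local/bin:$PATH"
# To persist it for future shells, add the same line to ~/.bashrc or ~/.zshrc.
```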
As you play with the script, there are several questions you can ask ChatGPT along the way. Here are some of the ones I've used:
How can I make it so this script doesn't consume all the CPU?
How can I pass the directory as an argument to the script? Make sure I use a directory path that allows spaces in the directory name.
What additional error checking should this script provide?
What's the best way to run this script in the background?
How do I make this script more memory efficient?
Here is whisper's {help documentation} and here is my {script}. Does my {script} make whisper work as expected? {help documentation}='...' {script}='...'
Does this code follow modern best practices?
Examine this code step by step. Are there any logical inconsistencies or security concerns?
My script hangs when it runs. How can I best debug it and see what is going on?
Remember that just because ChatGPT tells you there are no errors and your code is secure doesn't mean it is. ShellCheck is a good helper tool in this case.
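Those last two points pair well in practice (the script name below is a placeholder for whatever you saved yours as):

```shell
# Trace each command as it executes -- usually reveals where a hang occurs.
bash -x ./transcribe_batch.sh
# Statically lint the script for common bash pitfalls before trusting it.
shellcheck ./transcribe_batch.sh
```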
Remember, ChatGPT can forget what it's working on as you go along, so feel free to drop reminders along the way. For example, ChatGPT forgot what Whisper was in my case, so I would drop in "Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, translation, and language identification." as a prompt every now and then.
Also, sometimes when you ask ChatGPT for code optimizations or feedback, it will give you a list of proposed changes to make formatted as a numbered list. If you'd like ChatGPT to make some but not all of the changes, you can tell it which numbers you want it to implement. Here's an example.
And ChatGPT might respond with a big list like this:
The great news is that when ChatGPT gives you a list, you can respond with the items you wish to implement. For example, you can say:
Resolve items 1,2,3,5 please.
And ChatGPT will draft the changes. Review them carefully, because while ChatGPT might add code, it can just as easily remove parts of the code you've already worked on!
Sharing the Work
Rather than paste the final script here (it's long), I uploaded it to GitHub. Feel free to check it out by clicking the link below.
See the README file for installation instructions and usage.
Using ChatGPT made the coding process faster. The biggest issue is that ChatGPT can lose context and hallucinate code, so you should always maintain a solid understanding of what your code is doing. ChatGPT is particularly useful here: you can ask it to program something, have it explain the result, and then deliberately incorporate those changes into your program. If you mindlessly copy and paste, you will run into countless issues. Revision control is a must.
Happy prompting!