How to Use ChatGPT and AI to Create Subtitles for Free

by Matt

I wanted to watch some classic TV, but none of the videos had subtitles. And with kids running around, subtitles make it easier to enjoy the show when background noise abounds.

💡 An alternative approach to creating subtitles is to upload your videos to YouTube, which will generate subtitles automatically. It's a great option if you want to use YouTube. If you'd rather not go that route, read on or check out the script on GitHub.

We'll use Whisper, OpenAI's open-source speech recognition neural net released in September 2022. It handles video transcription and translation; unfortunately, it only processes one file at a time. So let's wrap a script around it to batch-process videos.

The first step is to install Whisper; see the README for details. The installation is fairly straightforward, but if you run into any errors, don't hesitate to ask ChatGPT for help, as I had to do.
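Before running the installer, it can help to verify the one external dependency Whisper's README calls out: ffmpeg, which Whisper uses to extract audio. A minimal pre-flight sketch:

```
# Pre-flight check before installing Whisper (a sketch; the README has
# the authoritative steps). Whisper needs ffmpeg on PATH to extract audio.
if command -v ffmpeg >/dev/null 2>&1; then
    ffmpeg_status="found"
else
    ffmpeg_status="missing"
fi
echo "ffmpeg: $ffmpeg_status"

# Then install Whisper itself (per the README):
#   pip install -U openai-whisper
```

On macOS, `brew install ffmpeg` is the usual way to get ffmpeg if it's missing.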

Resolving Installation Errors with ChatGPT

The first time I ran Whisper, I received the following error: The 'nopython' keyword argument was not supplied to the 'numba.jit' decorator. So I asked ChatGPT about it, and its response was to consider filing a report on Whisper's GitHub:

The warning you're seeing is from Numba, a Just-in-Time compiler for Python that's used in the Whisper package. It's telling you that a default behavior related to the 'nopython' argument in the @numba.jit decorator is going to change in a future version of Numba.

As a user, you don't have control over this warning unless you're willing to modify the Whisper package's source code yourself. It would be best to notify the developers of the Whisper package about this warning so they can address it in a future update.

To report this issue, you can visit the GitHub page for the Whisper package and open an issue there. In the issue, provide as much detail as possible about the warning message, including the full text of the warning and the conditions under which it occurred.

But after looking around, it seems this issue has already been resolved in the latest unreleased version of the program. So we can safely ignore the warning.
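If the warning still bothers you before the fix ships, you can silence Python deprecation warnings for a single run with the `PYTHONWARNINGS` environment variable (the whisper invocation shown in the comment is illustrative):

```
# Silence DeprecationWarnings (Numba's included) for one run only:
#   PYTHONWARNINGS="ignore::DeprecationWarning" whisper clip.mp4
# Quick demonstration that the filter takes effect:
out=$(PYTHONWARNINGS="ignore::DeprecationWarning" python3 -c \
    'import warnings; warnings.warn("old", DeprecationWarning); print("ok")' 2>&1)
echo "$out"
```

With the filter in place, the demonstration prints only "ok"; without it, the deprecation warning would appear on stderr as well.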

The next error I received when I ran Whisper was about CPU usage, so I again asked ChatGPT. Here's the error message I posted:

/usr/local/lib/python3.11/site-packages/whisper/transcribe.py:114: UserWarning: FP16 is not supported on CPU; using FP32 instead
  warnings.warn("FP16 is not supported on CPU; using FP32 instead")

And I received quite a lengthy response!

The warning you're seeing is from the Whisper package, and it's informing you that 16-bit floating point numbers (FP16) are not supported on your CPU, and instead, 32-bit floating point numbers (FP32) will be used. This may be because your CPU doesn't have the necessary hardware support for FP16, or because the PyTorch version you're using doesn't support FP16 on the CPU.

This change can lead to increased memory usage and potentially slower computation times, but the actual impact will depend on the size of your models and the specific operations you're performing. If you're not seeing any issues with memory usage or speed, you can likely ignore this warning.

To use a GPU with PyTorch (and by extension, Whisper), you can set the device argument when running your command. Here is an example: 

```
whisper --device cuda:0
```

This will instruct PyTorch to perform its computations on the first GPU (index 0). If you have multiple GPUs and want to use a different one, you can change the index accordingly. If you're not sure whether you have a GPU available or which one to use, you can run the command `nvidia-smi` in a terminal to see information about your system's GPU(s).

Please note that if your system doesn't have a GPU or if your GPU doesn't support CUDA (the parallel computing platform and application programming interface model created by Nvidia), you won't be able to use this option and will have to stick with using your CPU and FP32.
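That check can be automated. Here's a small sketch; the `nvidia-smi` probe is just a rough heuristic for "an Nvidia GPU is present," not a definitive CUDA test:

```
# Pick a compute device for Whisper: use the first GPU when nvidia-smi
# exists (a rough proxy for CUDA being available), otherwise the CPU.
pick_device() {
    if command -v nvidia-smi >/dev/null 2>&1; then
        echo "cuda:0"
    else
        echo "cpu"
    fi
}

# Hypothetical usage:
#   whisper clip.mp4 --device "$(pick_device)"
```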

I wasn't sure whether the whisper --device cuda:0 command would work on my computer, so I told ChatGPT which laptop I use:

I have a macbook pro with 1.4 GHz Quad-Core Intel Core i5

User question

And ChatGPT essentially responded with, "In that case, there's nothing to see here." But it let me know that this issue only affects speed, not quality, and that if speed became a concern, I could run the script in the cloud on faster hardware.

The Intel Core i5 in your MacBook Pro is a CPU (Central Processing Unit) and doesn't directly support FP16 computations used in certain machine learning tasks. Therefore, the Whisper program is reverting to FP32 computations, which is what the warning is about.

In addition, Apple's MacBook Pro models do not come with Nvidia GPUs, which are usually necessary for CUDA computations. They instead use either Intel integrated graphics or AMD discrete graphics, neither of which support CUDA. Therefore, you won't be able to instruct Whisper to use CUDA and GPU computations via the `--device cuda:0` command.

Given this, you'll need to proceed with using your CPU for computations in Whisper. This is perfectly fine, though it might be slower compared to using a GPU. The warning about FP16 not being supported and Whisper using FP32 instead is informational and doesn't mean the process has failed. You should still be able to generate the subtitles you need, though the processing might take longer. 

If you need to run tasks faster and they're significantly computational, you might want to consider using an external service with better hardware capabilities, such as cloud-based services that offer high-performance GPUs. Services like Google Cloud, Amazon AWS, or Microsoft Azure have machine learning specialized instances which you can rent for the computation time you use. 

Remember that using such services usually comes with a cost, so you'd want to balance the time saved with the additional expense of using these services. They are generally used for larger, more computationally demanding machine learning tasks.

ChatGPT's response

Now that Whisper is up and running and the errors were either understood or resolved, let's move on to our batch-processing script!

A Bash Script for Whisper Video Transcription and Translation

Now let's dig in and ask ChatGPT to help us make a script:

Make a bash script to recursively iterate through the folder "~/Car_Commericals_1949-1955" and run whisper on every ".mp4" file in those folders. I want to transcribe and create English subtitles for each video. The subtitle should have the same name as the video except with an srt extension. The subtitle should be in the same location as the video file. 

User question to ChatGPT

And ChatGPT gave me the following script! Don't use this one: the last mv command will not work, because Whisper already names its output after the input file rather than writing an output.srt. But as an outline, it was a decent start.

DO NOT USE. DOES NOT WORK!

```
#!/bin/bash

# specify your directory
dir="~/Car_Commericals_1949-1955"

# iterate over all .mp4 files in the directory and its subdirectories
find "$dir" -type f -name "*.mp4" | while read -r video_file; do
    # get the directory of the current file
    file_dir=$(dirname "$video_file")

    # get the name of the current file without extension
    base_name=$(basename "$video_file" .mp4)

    # run Whisper to generate subtitles for the current file
    whisper --task transcribe --language en --output_format srt --output_dir "$file_dir" "${video_file}"

    # move the output srt file to match the original video file name
    mv "$file_dir"/output.srt "$file_dir"/"$base_name".srt
done
```

ChatGPT's response. Don't use this script. There's a better one coming a few paragraphs from now.
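For the curious, here is a minimal corrected sketch of that loop. Two fixes: a tilde doesn't expand inside double quotes (use $HOME instead), and the mv is unnecessary because Whisper names its .srt output after the input file's basename. The folder path is the one from my prompt; the full script on GitHub does more.

```
#!/bin/bash
# Corrected sketch of ChatGPT's loop (assumes whisper is on PATH).
batch_subtitle() {
    local dir="$1"
    # -print0 / read -d '' keeps paths with spaces intact
    find "$dir" -type f -name "*.mp4" -print0 |
    while IFS= read -r -d '' video_file; do
        # Whisper writes <basename>.srt into --output_dir, so no mv needed
        whisper --task transcribe --language en --output_format srt \
            --output_dir "$(dirname "$video_file")" "$video_file"
    done
}

# Example call ("~" must be unquoted or spelled $HOME to expand):
# batch_subtitle "$HOME/Car_Commericals_1949-1955"
```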

When I ran a similar version of this, I hit a PATH issue (WARNING: The script lit is installed in '~/.local/bin' which is not on PATH.), which I also resolved with ChatGPT. If you receive this warning, paste it into ChatGPT for a fix. It's telling you that a script was installed in a directory that is not currently in your system's PATH, the environment variable in Unix-like operating systems that specifies the directories where executable programs live.
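The usual fix, assuming your shell is bash or zsh, is to prepend that directory to PATH:

```
# Put pip's user-script directory on PATH for the current session:
export PATH="$HOME/.local/bin:$PATH"
# To make it permanent, add the same line to ~/.bashrc or ~/.zshrc.
```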

As you play with the script, there are several questions you can ask ChatGPT along the way. Here are some of the ones I've used:

  • How can I make it so this script doesn't consume all the CPU?
  • How can I pass the directory as an argument to the script? Make sure I use a directory path that allows spaces in the directory name.
  • What additional error checking should this script provide?
  • What's the best way to run this script in the background?
  • How do I make this script more memory efficient?
  • Here is whisper's {help documentation} and here is my {script}. Does my {script} make whisper work as expected? {help documentation}='...' {script}='...'
  • Does this code follow modern best practices?
  • Examine this code step by step. Are there any logical inconsistencies or security concerns? (Remember: just because ChatGPT tells you there are no errors and your code is secure doesn't mean it is. ShellCheck is a good helper tool in this case.)
  • My script hangs when it runs. How can I best debug it and see what is going on?

Remember, ChatGPT can forget what it's working on as you go along, so feel free to drop reminders along the way. For example, ChatGPT forgot what Whisper was in my case, so I would drop in "Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multitasking model that can perform multilingual speech recognition, translation, and language identification." as a prompt every now and then.

Also, sometimes when you ask ChatGPT for code optimizations or feedback, it will give you a list of proposed changes to make formatted as a numbered list. If you'd like ChatGPT to make some but not all of the changes, you can tell it which numbers you want it to implement. Here's an example.

What optimizations should I consider with the following script: '...'?

User question to ChatGPT

And ChatGPT might respond with a big list like this:

Your script looks quite good! It covers a variety of options and inputs, has appropriate error handling, and utilizes good Bash scripting practices. It's flexible, robust, and efficient. Here are a couple of potential areas of improvement:

1. All your output files will end up in the root directory, not in their original subdirectories. 

2. Consider adding validation for the `--task` option. 

3. The `extensions` variable is intended to accept a list of extensions. You could add a validation step to ensure the provided extensions are valid, or to avoid names that could potentially cause issues with the `find` command.

4. If you want to make the script even more robust, you can add error handling for the `whisper` command itself. 

5. Lastly, regarding the output formats, you could add a validation step similar to the language validation. 

Other than these, your script is well-written and makes good use of shell programming techniques! Great job!

ChatGPT's response.

The great news is that when ChatGPT gives you a list, you can respond with the items you wish to implement. For example, you can say:

Resolve items 1,2,3,5 please.

And ChatGPT will draft changes. Carefully review those changes because while ChatGPT might add code, it can easily remove other parts of the code you worked on!

Sharing the Work

Rather than paste the final script here (it's long), I uploaded it to GitHub. Feel free to check it out by clicking the link below.

GitHub - mettamatt/whisper-batch-subtitler: Automates subtitle creation for multiple videos using OpenAI’s speech recognition model, Whisper, for transcriptions.

See the README file for installation instructions and usage.

Using ChatGPT made the coding process faster. The biggest issue is when ChatGPT loses context and hallucinates code. You should always have a solid understanding of what the code is doing, and ChatGPT is particularly useful here: you can ask it to program something, explain it, and then deliberately incorporate those changes into your program. If you mindlessly copy and paste, you will run into countless issues. Revision control is a must.

Happy prompting!