Simple Scene Boundary/Shot Transition Detection with OpenCV

In this tutorial, you will learn how to implement a simple scene boundary/shot transition detector with OpenCV.

Two weeks ago I flew out to San Diego, CA for a vacation with my Dad.

We were on the first flight out of Philadelphia and landed in San Diego at 10:30 AM, but unfortunately, our hotel rooms weren’t ready yet so we couldn’t check in.

Both of us were a bit tired from waking up early, not to mention the six-hour flight, so we decided to hang out in the hotel lounge and relax until our rooms were ready.

I settled into a cozy hotel lounge chair, opened my iPhone, and started scrolling through notifications I missed while flying. A text message from my buddy Justin caught my eye:

Dude, I picked up issue #7 of The Batman Who Laughs last night. It’s SO GOOD. You’re going to love it. Let me know when you’ve read it so we can talk about it.

I’m a bit of a comic book nerd and DC’s latest series, The Batman Who Laughs, is hands down my favorite series of the year — and according to Justin, the final issue in the story arc had just been released!

I opened Google Maps to see if there was a local comic book shop where I could pick up a copy.

No dice.

The closest store was two miles away — I wasn’t going to trek that far and leave my Dad at the hotel.

I’m not the biggest fan of reading comics on a screen, but in this case, I decided to make an exception.

I opened up the comiXology app on my iPhone (an app that lets you purchase and download digital comics), found the latest issue of The Batman Who Laughs, paid my $5, and downloaded it to my iPhone.

Now, you might be thinking that it would be a terribly painful experience to read a comic on a digital screen, especially a screen as small as an iPhone.

How in the world would you handle pinching, zooming, and scrolling on such a small screen? Wouldn’t that be a dreadful user experience, one that would potentially ruin reading a comic?

Trust me, it used to be.

But comic book publishers have wised up.

Instead of forcing you to use the equivalent of a mobile PDF viewer to read digital comics, publishers such as DC, Marvel, comiXology, etc. have locked up some poor intern in a dark dingy basement (hopefully kidding), and forced them to annotate the location of each panel in a comic.

Now, instead of having to manually scroll to the next panel in a comic, all you need to do is tap either the left or right side of your phone screen and the app automatically scrolls/zooms for you!

It’s a pretty neat feature, and while I will always prefer having the physical comic in my hands, the automatic scroll and zoom is a real game-changer for reading digital comics.

After I finished reading The Batman Who Laughs #7 (which was absolutely AWESOME, by the way), I got to thinking…

…what if I could use computer vision to automatically extract each panel from a digital comic?

The general algorithm would work like this:

  1. Record my iPhone screen as I’m reading the comic in the comiXology app.
  2. Post-process the video by using OpenCV to detect when the comic app is finished zooming, scrolling, etc.
  3. Save the current comic book panel to disk.
  4. Repeat for the entire length of the video.

The end result would be a directory containing each individual panel of the comic book!

You might think that such an algorithm would be challenging and tedious to implement — but it’s actually quite easy once you realize that it’s just an application of scene boundary detection!

Today I’ll be showing you how to implement the exact algorithm detailed above (and in only 100 lines of code).

To learn how to perform scene boundary detection with OpenCV, just keep reading!


Simple Scene Boundary/Shot Transition Detection with OpenCV

In the first part of this tutorial, we’ll discuss scene boundary and shot transition detection, including how computer vision algorithms can be used to automatically segment clips from video files.

From there, we’ll look at how scene boundary detection can be applied to digital comic books, essentially creating an algorithm that can automatically extract comic book panels from a video.

Finally, we’ll implement the actual algorithm and review the results.

What are “scene boundaries” and “shot transitions”?

Figure 1: A boundary scene transition from a TV series trailer, HBO’s Six Feet Under (video credit). We will learn to extract boundary scene transitions with OpenCV.

A “scene boundary” or a “shot transition” in a movie, TV show, or video is a natural way for the producers and editors to indicate that the current scene is complete and the next scene is starting. Shot transitions, when done correctly, are nonintrusive to the person watching the video — we intuitively process that the current “chapter” of the story is over and the next chapter is starting.

The most common type of scene boundary is a “fade to black”, similar to Figure 1 above. Notice how, as the current scene ends, the video fades to black, then fades back in, indicating that the next scene is starting.

Using computer vision, we seek to automatically find these scene boundaries, enabling us to create a “smart video segmentation” system.

Such a video segmentation system could be used to automatically:

  • Extract scenes from a movie/TV show, saving each scene/clip in a separate video file.
  • Segment commercials from a given TV station for advertising research.
  • Summarize slower-moving sports, such as baseball, golf, and American football.

Scene boundary detection is an active area of research and one that has existed for years.

I encourage you to use Google Scholar to search for the phrase “scene boundary detection” if you are interested in reading some of the publications.

Applying the scene boundary detection algorithm to digital comic books

Figure 2: Using a motion detection based OpenCV method, we can extract boundary scenes from videos in less than 100 lines of Python code.

In the context of this tutorial, we’ll be applying scene boundary detection through a real-world application — automatically extracting frames/panels from a digital comic book.

You might be thinking:

But Adrian, digital comic books are images, not video! How are you going to apply scene boundary detection to an image?

You’re right, comics are images — but part of being a computer vision practitioner is learning how to look at problems differently.

Using my iPhone, I can:

  • Start recording my screen
  • Open up the comiXology app
  • Open a specific comic in the app
  • Start reading the comic
  • Tap my screen when I want to advance to the next panel
  • Stop the video recording when I’m done reading the comic

Figure 2 at the top of this section demonstrates how I’ve turned a digital comic book into a video file. Notice how the app animates the pinching, zooming, and scrolling. After the app has finished “moving” the comic, the frame settles out, and I’m left with the current panel.

The trick to extracting comic book panels from this video is to detect when the moving stops, like in the following figure:

Figure 3: Detecting when motion stops is the basis of our system to extract scene boundaries from a comic book using OpenCV and Python.

To accomplish this task, all we need is a basic scene boundary detection algorithm.

Project structure

Let’s review our project structure:
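The exact directory listing isn’t reproduced here, but based on the files referenced in this section, the layout looks roughly like this:

    .
    ├── detect_scene.py
    ├── batman_who_laughs_7.mp4
    └── output/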

Our project is quite simple.

We have a single Python script, detect_scene.py, which reads an input video (such as batman_who_laughs_7.mp4 or one of your own videos). The script then runs our boundary scene detection method to extract frames from the video. Each frame is exported to the output/ directory.

Implementing our scene boundary detector with OpenCV

Let’s go ahead and implement our basic scene boundary detector which we’ll later use to extract panels from comic books.

This algorithm is based on background subtraction/motion detection: if the “scene” in the video has no motion for a given amount of time, then we know the comic book app has finished scrolling/zooming to the current panel, at which point we can capture the panel and save it to disk.

Are you ready to implement our scene boundary detector?

Open up the detect_scene.py file and insert the following code:
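The full, runnable script is included with the “Downloads” for this post; the condensed sketch below shows roughly what the imports and command line argument parsing look like. The default values are assumptions on my part, and the line numbers referenced in the explanation refer to the full download rather than this sketch:

    # import the necessary packages
    import argparse
    import imutils
    import cv2
    import os

    # construct the argument parser and parse the arguments
    # (NOTE: the default values below are assumptions, not necessarily
    # the exact values used in the downloadable script)
    ap = argparse.ArgumentParser()
    ap.add_argument("-v", "--video", required=True,
        help="path to input video file")
    ap.add_argument("-o", "--output", required=True,
        help="path to output directory to store panel images")
    ap.add_argument("-p", "--min-percent", type=float, default=1.0,
        help="lower boundary of percentage of frame motion")
    ap.add_argument("-m", "--max-percent", type=float, default=10.0,
        help="upper boundary of percentage of frame motion")
    ap.add_argument("-w", "--warmup", type=int, default=200,
        help="# of frames used to build the background model")
    args = vars(ap.parse_args())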

Lines 2-5 import necessary packages. You need OpenCV and imutils  installed for this project. I recommend that you install OpenCV in a virtual environment using pip.

From there, Lines 8-19 parse our command line arguments:

  • --video : The path to the input video file.
  • --output : The path to the output directory to store comic book panel images.
  • --min-percent : Lower boundary of the percentage of frame motion (a default value is supplied).
  • --max-percent : Upper boundary of the percentage of frame motion (a default value is supplied).
  • --warmup : Number of frames used to build our background model (a default value is supplied).

Let’s go ahead and initialize our background subtractor along with other important variables:
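Here is a sketch of the initialization block. The GMG background subtractor is the one referenced in the comments on this post; the remaining variables follow the description below:

    # initialize the background subtractor model (requires the
    # opencv-contrib-python package for the cv2.bgsegm module)
    fgbg = cv2.bgsegm.createBackgroundSubtractorGMG()

    # initialize a boolean indicating whether the current frame has been
    # captured, along with two counters -- the total number of panels
    # written to disk and the number of frames processed so far
    captured = False
    total = 0
    frames = 0

    # open a pointer to the input video file and initialize the frame
    # dimensions (we'll set them from the first frame we read)
    vs = cv2.VideoCapture(args["video"])
    (W, H) = (None, None)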

Line 22 initializes our background subtractor model. We will apply it to every frame in our while  loop in the next code block.

Lines 28-30 then initialize three housekeeping variables. The captured  boolean indicates whether a frame has been captured. Two counters are initialized to 0 :

  • total  indicates how many frames we have captured
  • frames  indicates how many frames from our video we have processed

Line 34 initializes our video stream using the input video file specified via command line argument in your terminal. The frame dimensions are set to None  for now.

Let’s begin looping over video frames:
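A sketch of the frame-processing portion of the loop follows (the resize width of 600 pixels is an assumption, not a value confirmed by the post):

    # loop over the frames of the video
    while True:
        # grab the next frame from the video file
        (grabbed, frame) = vs.read()

        # if the frame is None, we have reached the end of the video
        if frame is None:
            break

        # clone the original frame (so we can save the full resolution
        # version to disk later), then resize it to speed up processing
        orig = frame.copy()
        frame = imutils.resize(frame, width=600)

        # apply background subtraction to obtain our foreground mask,
        # then clean it up with a series of erosions and dilations
        mask = fgbg.apply(frame)
        mask = cv2.erode(mask, None, iterations=2)
        mask = cv2.dilate(mask, None, iterations=2)

        # if the frame dimensions are empty, grab them from the mask
        if W is None or H is None:
            (H, W) = mask.shape[:2]

        # compute the percentage of the mask that is "foreground"
        p = (cv2.countNonZero(mask) / float(W * H)) * 100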

Line 40 grabs the next frame  from the video file.

Subsequently, Line 49 makes a copy (so we can save the original frame to disk later) and Line 50 resizes it. The smaller the frame is, the faster our algorithm will run.

Line 51 applies background subtraction, yielding our mask . White pixels in the mask  are our foreground while the black pixels represent the background.

Lines 54 and 55 apply a series of morphological operations to eliminate noise.

Line 62 computes the percentage of the mask  that is “foreground” versus “background”. Next, we’ll analyze the percentage, p , to determine if motion has stopped:
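Continuing inside the while loop, here is a sketch of the capture logic (the zero-padded .png filename format is my own choice for illustration, not necessarily what the downloadable script uses):

        # if the amount of motion is below our minimum threshold, we have
        # not already captured this scene, and the background model has
        # finished warming up, then save the current panel to disk
        if p < args["min_percent"] and not captured and frames > args["warmup"]:
            # show the captured frame and update the captured flag
            cv2.imshow("Captured", frame)
            captured = True

            # build the output filename and path, then increment the
            # total number of panels written to disk
            filename = "{}.png".format(str(total).zfill(4))
            path = os.path.sep.join([args["output"], filename])
            total += 1

            # write the *original, full resolution* frame to disk
            cv2.imwrite(path, orig)

        # otherwise, if enough motion has re-entered the frame, reset the
        # captured flag so the next still panel can be saved
        elif captured and p >= args["max_percent"]:
            captured = False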

Line 67 compares the foreground pixel percentage, p , to the "min_percent"  constant. If (1) p  indicates that less than N% of the frame has motion, (2) we have not captured  this frame, and (3) we are done warming up, then we’ll save this comic scene to disk!

Assuming we are saving this frame, we:

  • Display the frame  in the "Captured"  window (Line 70) and mark it as captured  (Line 71).
  • Build our filename  and path (Lines 75-76).
  • Increment the  total  number of panels written to disk (Line 77).
  • Write the orig  frame to disk (Line 81).

Otherwise, we mark captured  as False  (Lines 86 and 87), indicating that the above if  statement did not pass and the frame was not written to disk.

To wrap up, we’ll display the frame  and mask  until we are done processing all frames :
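And a sketch of the display and cleanup logic that closes out the loop:

        # display the current frame and foreground mask so we can
        # visualize the pipeline, then check for a keypress
        cv2.imshow("Frame", frame)
        cv2.imshow("Mask", mask)
        key = cv2.waitKey(1) & 0xFF

        # if the `q` key was pressed, break from the loop
        if key == ord("q"):
            break

        # increment the number of frames processed so far
        frames += 1

    # release the video file pointer and close any open windows
    vs.release()
    cv2.destroyAllWindows()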

The frame  and mask  are displayed until either the q  key is pressed or there are no more frames left to process in the video.

In the next section, we’ll analyze our results.

Scene boundary detection results

Now that we’ve implemented our scene boundary detector, let’s give it a try.

Make sure you’ve used the “Downloads” section of this tutorial to download the source code and example video for this guide.

From there, open up a terminal and execute the following command:
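The exact command isn’t reproduced here, but given the arguments described above, it will look something like this:

    $ python detect_scene.py --video batman_who_laughs_7.mp4 --output output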

Figure 4: Our Python + OpenCV scene boundary/shot transition detection algorithm is based on a background detection method to determine when motion has stopped. When the motion stops, the panel is captured and saved to disk.

Figure 4 shows our comic book panel extractor in action.

Our algorithm is able to detect when the app is automatically “moving” the page of the comic by zooming, scrolling, etc. — when this movement stops, we consider it the scene boundary. In the context of our end goal, this scene boundary marks when we have arrived at the next panel of the comic.

We then save this panel to disk and continue to monitor the video file for when the next movement occurs, indicating that we’re moving to the next panel in the comic.

If you check the contents of the output/ directory after processing the video, you’ll see that we’ve successfully extracted each panel from the comic:

Figure 5: Each comic panel frame is exported to disk as an image file in the output/ directory as shown. This scene boundary detection system was built with OpenCV and Python.

I’ve included a full video of the demo, including my commentary, below:

As I mentioned earlier in this post, being a successful computer vision practitioner often involves looking at problems differently — sometimes you can repurpose video processing algorithms and apply them to images, simply by figuring out how to take images and capture them as a video instead.

In this post, we were able to apply scene boundary detection to extract panels from a comic book, simply by recording ourselves reading a comic via the comiXology app!

Sometimes all you need is a slightly different viewpoint to solve a potentially challenging problem.

Credits

  • Music: “Sci-Fi” — Benjamin Tissot
  • Comic: The Batman Who Laughs #7 — DC Comics (Written by: Scott Snyder, Art by: Jock)
    • Note: I have only used the first few frames of the comic in the example video. I have not included the entire comic as that would be quite the severe copyright violation! Again, this demo is for educational purposes only.

What’s next?

Computer vision, machine learning, and deep learning are all the rage right now.

But to become a successful, well-rounded computer vision practitioner, you must bring the right tools to the job.

You wouldn’t try to bang in a screw with a hammer; you would simply use a screwdriver instead. Similarly, you wouldn’t use a crowbar to cut a piece of wire — you would use pliers.

The same concept is true with computer vision — you must bring the right tools to the job.

In order to help build your toolbox of computer vision algorithms and methodologies, I have put together the PyImageSearch Gurus course.

Inside the course you’ll learn:

  • Machine learning and image classification
  • Automatic License/Number Plate Recognition (ANPR)
  • Face recognition
  • How to train your own custom object detectors
  • Content-based Image Retrieval (i.e., image search engines)
  • Processing image datasets with Hadoop and MapReduce
  • Hand gesture recognition
  • Deep learning fundamentals
  • …and much more!

PyImageSearch Gurus is the most comprehensive computer vision education online today, covering 13 modules broken out into 168 lessons, with over 2,161 pages of content. You won’t find a more detailed computer vision course anywhere else online, I guarantee it.

The PyImageSearch Gurus course also includes private community forums. I participate in the Gurus forum virtually every day. The community is a great way to get expert advice, both from me and from the other advanced students, on a daily basis.

To learn more about the PyImageSearch Gurus course + community (and grab 10 FREE sample lessons), just click the button below:

Click here to learn more about PyImageSearch Gurus!

Summary

In this tutorial, you learned how to implement a simple scene boundary detection algorithm using OpenCV.

We specifically applied this algorithm to digital comic books, enabling us to automatically extract each individual panel of a comic book.

You can take this algorithm and apply it to your own video files as well.

If you are interested in learning more about scene boundary detection algorithms, use the comment form at the bottom of this post to let me know — I may decide to cover these algorithms in more detail in the future!

I hope you enjoyed the tutorial!

To download the source code to this post (and be notified when future tutorials are published here on PyImageSearch), just enter your email address in the form below!

Downloads:

If you would like to download the code and images used in this post, please enter your email address in the form below. Not only will you get a .zip of the code, I’ll also send you a FREE 17-page Resource Guide on Computer Vision, OpenCV, and Deep Learning. Inside you'll find my hand-picked tutorials, books, courses, and libraries to help you master CV and DL! Sound good? If so, enter your email address and I’ll send you the code immediately!


4 Responses to Simple Scene Boundary/Shot Transition Detection with OpenCV

  1. Srinath August 19, 2019 at 12:31 pm #

    Hey Adrian, nice tutorial.
    But I was wondering about the background subtraction: how is it different from the cv2.absdiff method that you used in the motion detection lesson?

    In this case, you use fgbg = cv2.bgsegm.createBackgroundSubtractorGMG(). Is this a better approach to segmenting the foreground from the background than the one explained in the motion detection tutorial?

  2. HyunChul Jung August 20, 2019 at 8:33 pm #

    Thanks for the great article. The key idea is capturing a frame when there is little movement in the video.

    But I wonder how well it would work on real video, like soccer. Maybe that needs a more advanced algorithm, or it could be another research topic.

    • Adrian Rosebrock August 21, 2019 at 6:37 am #

      There are plenty of papers on boundary/shot transition detection. If there is enough interest in the topic I’ll do more tutorials on it 🙂
