Processing webcam images in real time from the notebook

In this recipe, we show how to let the notebook and the Python kernel communicate in both directions.

Specifically, we will retrieve the webcam feed from the browser using HTML5's <video> element and pass it to Python in real time using the interactive capabilities of the IPython notebook 2.0+. Then, we will process each frame in Python with an edge detector (implemented in scikit-image) and display the result in the notebook in real time.

Most of the code for this recipe comes from Jason Grout's example, available at https://github.com/jasongrout/ipywidgets.

Getting ready

You need Pillow and scikit-image for this recipe. (For more information, refer to Chapter 11, Image and Audio Processing.)

You also need a recent browser that supports the HTML5 media capture API (getUserMedia). You can find the specification at http://dev.w3.org/2011/webrtc/editor/getusermedia.html.

How to do it...

  1. We need to import several modules as follows:
    In [1]: from IPython.html.widgets import DOMWidget
            from IPython.utils.traitlets import (Unicode, Bytes,
                                                 Instance)
            from IPython.display import display
            
            from skimage import io, filter, color  # `filters` in scikit-image >= 0.11
            import urllib.parse  # Python 3; use urllib.quote in Python 2
            import base64
            from PIL import Image
            from io import BytesIO
            import numpy as np
            from numpy import array, ndarray
            import matplotlib.pyplot as plt
  2. We define two functions to convert images to and from base64-encoded PNG strings. Base64 is a common way to pass binary data as text between processes (in our case, the browser and Python):
    In [2]: def to_b64(img):
                imgdata = BytesIO()
                pil = Image.fromarray(img)
                pil.save(imgdata, format='PNG')
                imgdata.seek(0)
                return urllib.parse.quote(
                            base64.b64encode(
                                imgdata.getvalue()))
    In [3]: def from_b64(b64):
                im = Image.open(BytesIO(
                                        base64.b64decode(b64)))
                return array(im)
  3. We define a Python function that will process the webcam image in real time. It accepts and returns a NumPy array. It applies an edge detector, the roberts() function in scikit-image, to the red channel, and inverts the result so that edges appear dark on a white background:
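    In [4]: def process_image(image):
                # Scale the red channel to [0, 1] and compute its
                # Roberts edge magnitude.
                img = filter.roberts(image[:,:,0]/255.)
                # Invert and convert to 8 bits: dark edges on white.
                return (255-img*255).astype(np.uint8)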
  4. Now, we create a custom widget to handle the bidirectional communication of the video stream between the browser and Python:
    In [5]: 
    class Camera(DOMWidget):
        _view_name = Unicode('CameraView', sync=True)
        
        # Raw webcam image as a base64-encoded PNG data URL
        # (browser -> Python).
        imageurl = Unicode('', sync=True)

        # Processed image as a base64-encoded PNG data URL
        # (Python -> browser).
        imageurl2 = Unicode('', sync=True)
        
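        # IPython calls this method automatically whenever
        # `imageurl` changes, i.e., when the browser sends a new frame.
        def _imageurl_changed(self, name, new):
            # Strip the "data:image/png;base64," header; ignore
            # empty or malformed values.
            head, _, data = new.partition(',')
            if not data:
                return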
        
            # We convert the base64-encoded string
            # to a NumPy array.
            image = from_b64(data)
        
            # We process the image.
            image = process_image(image)
        
            # We convert the processed image
            # to a base64-encoded string.
            b64 = to_b64(image)
        
            self.imageurl2 = 'data:image/png;base64,' + b64
  5. The next step is to write the JavaScript code for the widget. The code is long, so we only show the important parts here ([...] marks omitted code). The full code is available on the book's website:
    In [6]: %%javascript 
    
    var video        = $('<video>')[0];
    var canvas       = $('<canvas>')[0];
    var canvas2      = $('<img>')[0];  // an <img>, despite its name
    [...]
    
    require(["widgets/js/widget"], function(WidgetManager){
        var CameraView = IPython.DOMWidgetView.extend({
            render: function(){
                var that = this;
    
                // We append the HTML elements.
                setTimeout(function() {
                    that.$el.append(video).
                             append(canvas).
                             append(canvas2);}, 200);
                
                // We initialize the webcam.
                [...]
                
                // We initialize the size of the canvas.
                video.addEventListener('canplay', function(ev){
                    if (!streaming) {
                      height = video.videoHeight / (
                          video.videoWidth/width);
                      video.setAttribute('width', width);
                      video.setAttribute('height', height);
                      [...]
                      streaming = true;
                    }
                }, false);
                
                // Play/Pause functionality.
                var interval;
                video.addEventListener('play', function(ev){
                    // We get the picture every 100ms.
                    interval = setInterval(takepicture, 100);
                });
                video.addEventListener('pause', function(ev){
                    clearInterval(interval);
                });
                // This function is called at each time step.
                // It takes a picture and sends it to the model.
                function takepicture() {
                    canvas.width = width; canvas.height = height;
                    canvas2.width = width; canvas2.height = height;
                    
                    video.style.display = 'none';
                    canvas.style.display = 'none';
                    
                    // We take a screenshot from the webcam feed and 
                    // we put the image in the first canvas.
                    canvas.getContext('2d').drawImage(video, 
                        0, 0, width, height);
                    
                    // We export the canvas image to the model.
                    that.model.set('imageurl',
                                   canvas.toDataURL('image/png'));
                    that.touch();
                }
            },
            
            update: function(){
                // This function is called whenever Python modifies
                // the second (processed) image. We retrieve its data
                // URL and display it in the <img> element (canvas2).
                var img = this.model.get('imageurl2');
                canvas2.src = img;
                return CameraView.__super__.update.apply(this);
            }
        });
        
        // Register the view with the widget manager.
        WidgetManager.register_widget_view('CameraView', 
                                           CameraView);
    });
  6. Finally, we create and display the widget as follows:
    In [7]: c = Camera()
            display(c)
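
To see what step 3 computes without a webcam or a notebook, here is a minimal NumPy-only sketch. It swaps scikit-image's roberts() for a hypothetical roberts_cross() helper that implements the Roberts cross operator directly: it takes the differences along the two diagonals of each 2x2 neighbourhood and combines them as sqrt((d1**2 + d2**2) / 2). This is our own simplification, not scikit-image's code, so border handling and pixel alignment may differ. The hypothetical process_image_numpy() then applies the same scaling and inversion as process_image() to a synthetic frame.

```python
import numpy as np


def roberts_cross(gray):
    """Roberts cross edge magnitude of a 2D image with values in [0, 1].

    Simplified stand-in for scikit-image's roberts(): the result for each
    2x2 neighbourhood is stored at its top-left pixel, and the last row and
    column are left at 0, so borders and alignment may differ.
    """
    gray = np.asarray(gray, dtype=float)
    # Differences along the two diagonals of every 2x2 neighbourhood.
    d1 = gray[:-1, :-1] - gray[1:, 1:]
    d2 = gray[:-1, 1:] - gray[1:, :-1]
    mag = np.zeros_like(gray)
    # Dividing by 2 keeps the magnitude in [0, 1] for inputs in [0, 1].
    mag[:-1, :-1] = np.sqrt((d1 ** 2 + d2 ** 2) / 2.0)
    return mag


def process_image_numpy(image):
    """Same steps as process_image(), with roberts_cross() instead of roberts()."""
    edges = roberts_cross(image[:, :, 0] / 255.)  # red channel scaled to [0, 1]
    return (255 - edges * 255).astype(np.uint8)   # dark edges on white


# Synthetic 4x4 RGBA frame: dark left half, bright right half (red channel).
frame = np.zeros((4, 4, 4), dtype=np.uint8)
frame[:, 2:, 0] = 255
frame[:, :, 3] = 255  # opaque alpha, as in a canvas capture
edges = process_image_numpy(frame)
# Column 1 of the first three rows straddles the step, so it is 0 (an edge);
# every other pixel is 255.
print(edges)
```

Keeping the magnitude in [0, 1] matters because the inversion `255 - edges * 255` must stay non-negative before the cast to uint8.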

How it works...

Let's explain the principle of this implementation. The model has two attributes: the incoming (raw) image from the browser and the outgoing (processed) image from Python. Every 100 milliseconds, JavaScript captures a frame of the webcam feed (read from the <video> HTML element) by drawing it onto a first canvas. The canvas image is serialized as a base64-encoded PNG data URL and assigned to the first model attribute, imageurl. The widget machinery synchronizes this attribute with Python, which triggers the _imageurl_changed() method. There, the image is deserialized, processed by scikit-image, and reserialized. Python then sets the second attribute, imageurl2, to the serialized processed image. Finally, the update() function in JavaScript retrieves this data URL and assigns it to the src attribute of an <img> element (canvas2), so that the browser decodes and displays the processed image.
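
The string handling in this round trip can be checked with the standard library alone. The sketch below uses two hypothetical helpers, to_data_url() and from_data_url(), that mirror what to_b64() and _imageurl_changed() do to the string, but on arbitrary bytes rather than a PNG image. Because to_b64() percent-quotes its output, urllib.parse.quote() turns the '+' and '=' characters of base64 into '%2B' and '%3D' (it keeps '/'). In the recipe, the browser undoes this when it reads the data URL; here, we call unquote() ourselves as a stand-in for that step.

```python
import base64
import urllib.parse


def to_data_url(payload, mime="image/png"):
    """Encode bytes as a data URL, the same way to_b64() encodes the PNG."""
    # base64 first, then percent-quoting: '+' -> '%2B', '=' -> '%3D' ('/' is kept).
    b64 = urllib.parse.quote(base64.b64encode(payload))
    return "data:%s;base64,%s" % (mime, b64)


def from_data_url(url):
    """Decode a data URL back to bytes, as _imageurl_changed() does before from_b64()."""
    # Split the "data:...;base64" header from the payload on the first comma.
    _header, _, data = url.partition(",")
    if not data:
        return b""
    # Stand-in for the percent-decoding that the browser applies to data URLs.
    return base64.b64decode(urllib.parse.unquote(data))


payload = bytes(range(256))  # every byte value, standing in for PNG data
url = to_data_url(payload)
print(from_data_url(url) == payload)  # True
```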

There's more...

This example could probably be made much faster by capturing the webcam image from Python rather than from the browser. The bottleneck likely stems from the two transfers that occur at every time step: the raw image goes from the browser to Python, and the processed image comes back from Python to the browser, each encoded as a base64 PNG string.
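
A back-of-the-envelope calculation gives a sense of the cost of these two transfers. The sketch assumes a hypothetical 640x480 frame (the actual width is set in the elided JavaScript), the 100 ms capture interval from step 5 (10 frames per second), an RGBA capture going to Python, and a single-channel grayscale image coming back, as returned by process_image(). It counts uncompressed bytes only. The real payloads are PNG-compressed, so they are usually smaller for natural webcam images, by an amount that depends on the content. The function names are hypothetical.

```python
def base64_length(n_bytes):
    """Number of base64 characters for n_bytes of input (including '=' padding)."""
    # Every group of 3 input bytes becomes 4 output characters; round groups up.
    return 4 * ((n_bytes + 2) // 3)


def base64_chars_per_second(width, height, fps, channels_up=4, channels_down=1):
    """Base64 characters per second for both transfers, ignoring PNG compression."""
    up = base64_length(width * height * channels_up)      # browser -> Python (RGBA)
    down = base64_length(width * height * channels_down)  # Python -> browser (grayscale)
    return fps * (up + down)


# Hypothetical 640x480 frames at 10 frames per second (one capture every 100 ms).
rate = base64_chars_per_second(640, 480, fps=10)
print(f"{rate / 1e6:.1f} million characters per second")
```

Under these assumptions, the uncompressed figure is 20,480,000 characters per second, most of it from the RGBA frames sent to Python, in addition to the PNG encoding and decoding done on both sides at every step.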

It would be more efficient to capture the webcam image directly from Python, using a library such as OpenCV or SimpleCV. However, these libraries may be difficult to install, so letting the browser access the webcam device is much simpler.
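
For comparison, here is a minimal sketch of capturing frames directly from Python with OpenCV (the opencv-python package). The camera_frames() generator is a hypothetical helper, not part of the recipe, and it assumes a webcam at device index 0. OpenCV returns frames in BGR order, so the sketch converts them to RGB so that channel 0 is red, as process_image() expects. The frames never leave the Python process, so there is no base64 round trip; displaying the results in the notebook is not covered here.

```python
def camera_frames(device=0, max_frames=None):
    """Yield RGB frames from a local webcam using OpenCV (opencv-python).

    Hypothetical helper, not part of the recipe. OpenCV is imported lazily,
    so nothing is opened until the generator is first iterated.
    """
    import cv2

    cap = cv2.VideoCapture(device)
    try:
        count = 0
        while cap.isOpened() and (max_frames is None or count < max_frames):
            ok, frame = cap.read()
            if not ok:
                break  # no frame available (device busy or unplugged)
            # OpenCV returns BGR frames; convert to RGB so that channel 0 is
            # red, as process_image() expects.
            yield cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
            count += 1
    finally:
        # Free the device when the loop ends or the generator is closed.
        cap.release()
```

For example, `for frame in camera_frames(max_frames=50): edges = process_image(frame)` would run the edge detector from step 3 on 50 frames without any transfer between the browser and Python.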

See also

  • The Creating a custom JavaScript widget in the notebook – a spreadsheet editor for pandas recipe