Want to know how many nanoseconds, seconds, or minutes are in your audio or video file? Want to know your current time position while listening to your audio or video file? There are two super useful methods provided in the Gstreamer framework to answer these two questions (examples are using the Python language):

position, format = pipeline.query_position(gst.FORMAT_TIME)
duration, format = pipeline.query_duration(gst.FORMAT_TIME)

Remember, these queries will only work once a media file has been pre-rolled (i.e. the state must be changed to PAUSED or PLAYING). A common way to continuously query a media file is to create a timer which fires a method every 500 milliseconds (or whatever time interval makes sense for your app). That timer method can call these 2 methods and update a progress bar... or any number of widgets.

The next important thing to note: these methods return the time in nanoseconds, which is not the most obvious thing in the world. The return values will be quite large. So divide them by 1,000,000,000 to convert them into seconds. For example:

position_seconds = position / 1000000000
duration_seconds = duration / 1000000000

Don't forget to put a Try / Except around your queries. If the media is not pre-rolled, or if for some reason the media file does not support those queries, it will return a null value... and thus break your nanoseconds to seconds conversion.
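The guard and the conversion can be wrapped up in a tiny helper. Here is a plain Python sketch (the `format_time` name is my own, and the pretend nanosecond value stands in for a real `query_position` result):

```python
def format_time(nanoseconds):
    # GStreamer reports time in nanoseconds, so divide by 1,000,000,000
    seconds = nanoseconds // 1000000000
    return "%d:%02d" % (seconds // 60, seconds % 60)

try:
    # In a real app: position, format = pipeline.query_position(gst.FORMAT_TIME)
    position = 272 * 1000000000  # pretend value: 4 minutes, 32 seconds
    print(format_time(position))
except Exception:
    # Media not pre-rolled yet (or query unsupported)... just skip this update
    pass
```

Running this prints "4:32", which is what you'd feed to a label next to your progress bar.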


The Gstreamer framework is multi-threaded, which means it runs in a different thread (or many threads)... and thus you cannot directly communicate with it from Python. Requests have to be sent to the pipeline, and they are completed as soon as possible. However, it goes both ways. When Gstreamer wants to communicate with your program, it must send messages through what they call a "bus".

Think of the bus as a signal. All you have to do in Python is create a signal handler method and connect it to the bus signal.

# (somewhere in your __init__, after creating the pipeline)
self.bus = self.pipeline.get_bus()
self.bus.add_signal_watch()              # needed for the 'message' signal
self.bus.enable_sync_message_emission()  # needed for 'sync-message'
self.bus.connect('sync-message::element', self.on_sync_message)
self.bus.connect('message', self.on_message)

def on_message(self, bus, message):
    print message
    # Tag messages carry the metadata (Artist, Album, Bitrate, etc...)
    if message.type == gst.MESSAGE_TAG:
        taglist = message.parse_tag()
        for key in taglist.keys():
            print key, taglist[key]

All sorts of interesting information is passed through the bus, such as:
  • MP3 Tags: Artist, Album, Track, Genre, etc...
  • Codec
  • Bitrate
  • and many more...
The pipeline will only send messages through the bus when media has been pre-rolled, which is a term that means it's been put in a "PAUSED" state... thus, it has started to buffer. I'm still working on my media inspector Python program, and it will use the bus heavily to figure out the details of a specific media file (video or audio).


Here is my first mockup of a timeline for my open-source, Linux-based, non-linear video editor... which still remains unnamed. I imagine I will make countless changes to this mockup, but I just wanted to share the first of many. Click on the image for a larger version.


While trying to complete my video / media inspector demo program (in Python of course), I got sidetracked playing with Gnonlin. Gnonlin is a plug-in for Gstreamer which helps layout and prioritize video and audio clips on a timeline. It was designed to help people like me make non-linear video / audio editors. So, I figured I better go ahead and get familiar with gnonlin, and use it in my media inspector program.

So far, I feel like I've taken a few steps back. I have an example using just Gstreamer which combines 2 audio files (including MP3 files). Both songs play at the same time (which sounds crazy), but it works nonetheless.

I retrofitted that code to use gnonlin instead of just Gstreamer, and it still doesn't work. It only plays 1 of the audio files (the one with the highest priority). And to top it off, it won't work with MP3 files... just WAV and OGG files. Grrrrrrr.

Here is a screenshot (found on this blog) of how gnonlin works. As you can see, it lets us lay out our clips in a timeline fashion (via code... not a GUI), and then it does the hard work of converting this into a gstreamer bin, which as you know is just another gstreamer element. We then have to add this bin to our pipeline, and use a dynamic pad to hook up the bin with the rest of our pipeline (but more on that later).

I am trying to contact someone (anyone) who can help me solve the riddle of gnonlin, so that I can have a working gnonlin python example to share with everyone. I've been living in the #gstreamer IRC channel on FreeNode's server... but no luck yet.


I have decided to create a simple mini-project (depending on your definition of simple): a video inspector (& player). There will be a gnome / GTK graphical interface, a simple file chooser dialog, a small box on the screen for the video, and a table of labels to display important clip information. This will test my knowledge of gstreamer and Python, and help lay the foundation for the more complicated non-linear video editor that I want to create.

It's really important that I can identify the following clip-related details:

  • Length of clip (seconds)
  • Height & Width of clip
  • Frames per second of clip
These details are vitally important to know about each clip, because I need them to be able to display a timeline for the non-linear editor. For example, I need to know how big of a rectangle to draw (i.e. clip length)... to represent the video clip on the timeline.
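To make that concrete, here is a tiny sketch of the math I have in mind (plain Python; the `PIXELS_PER_SECOND` zoom factor and the function name are my own inventions, not part of any library):

```python
PIXELS_PER_SECOND = 10  # my made-up zoom level: 10 pixels of timeline per second

def clip_rect_width(length_seconds):
    # The longer the clip, the wider its rectangle on the timeline
    return int(length_seconds * PIXELS_PER_SECOND)

print(clip_rect_width(30))
```

A 30 second clip becomes a 300 pixel wide rectangle; change the zoom factor and the whole timeline scales with it.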

As soon as I finish this mini-project, I will post screenshots and the full source-code... in Python of course.


I'm going to put all the GUI and Python issues aside for a moment, and share a few things I've learned about creating a non-linear video editor using the Gstreamer framework. One of the original authors of Gstreamer also created a library called gnonlin (i.e. Gstreamer Non-Linear). This library assists people like me in using the Gstreamer framework to create original compositions / sequences... whether it be audio, video, or both. I found a great tutorial of gnonlin with a few code snippets to help me (and you) get started.

There is also a great introduction to gstreamer I found which is full of illustrations. The illustrations demonstrate the various concepts used in gnonlin (and gstreamer).


As great as Cairo seems, I just recently learned about a widget in the gnome / GTK+ library that can also be used for drawing: Gnome Canvas. While Cairo seems to be more for drawing static, non-interactive pictures, the Gnome Canvas is more for layering shapes, bitmaps, etc... onto a "canvas", and it keeps track of which shape is clicked on, allows for drag 'n drop, and only re-draws the part of the canvas that is changing.

(Here is an example of the Gnome Canvas being used):

Now I'm not so sure which is the best technology to use for my video editor timeline. Cairo seems more powerful (i.e. supports all sorts of cool vector drawing techniques), but the Gnome Canvas seems more practical, since it wraps up all the hard work (such as tracking items). Also, I did discover that someone created a Cairo Canvas widget... although I haven't had time to research it yet.
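That "tracking items" bookkeeping is exactly what I'd otherwise have to write myself on top of Cairo. A rough sketch in plain Python of what a canvas does when you click (the `hit_test` helper and the clip tuples are my own illustration, not any GTK API):

```python
def hit_test(clips, x, y):
    # clips: list of (name, x, y, width, height) rectangles, topmost drawn last
    for name, cx, cy, cw, ch in reversed(clips):
        if cx <= x < cx + cw and cy <= y < cy + ch:
            return name
    return None  # click landed on empty canvas

clips = [("clip1", 0, 10, 300, 50), ("clip2", 250, 10, 200, 50)]
print(hit_test(clips, 260, 20))  # overlap area: topmost clip wins -> clip2
```

Multiply that by drag 'n drop, partial redraws, and z-ordering, and the Gnome Canvas starts looking pretty attractive.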


Glade 3.0 is a great GUI designer. I can't imagine creating a GUI using code when this option is available. It takes no more than a few minutes to setup a screen, name your signals, and set the default properties. It's also super easy to hook up the handler methods.

Check out this code example of showing a Glade-based GUI and hooking up some signals in Python:

import gtk, gtk.glade

# Load the Form1 instance
Form1 = gtk.glade.XML("Form1.glade")

# Signal Handlers
def CloseWindow(widget):
    print widget
    gtk.main_quit()

def button1_clicked(widget):
    print widget
    print "bottom button clicked"
    NewForm = Form1.get_widget("window2")

def on_hscale1_change_value(widget, scroll, value):
    # the 2nd argument is the scroll type for 'change-value' signals
    print value

def on_vscale1_change_value(widget, scroll, value):
    print value

def on_entry1_changed(widget):
    print widget.get_text()

def on_checkbutton1_toggled(widget):
    print widget.get_active()

def on_button4_clicked(widget):
    print "close color picker"

# Create a dictionary of the event names and functions
MySignals = {"on_window1_destroy" : CloseWindow,
             "on_button1_clicked" : button1_clicked,
             "on_hscale1_change_value" : on_hscale1_change_value,
             "on_vscale1_change_value" : on_vscale1_change_value,
             "on_entry1_changed" : on_entry1_changed,
             "on_checkbutton1_toggled" : on_checkbutton1_toggled,
             "on_button4_clicked" : on_button4_clicked}

# Connect all the signals to the Form1
Form1.signal_autoconnect(MySignals)

# Start the main loop
gtk.main()


Now that I have completed many tutorials and logged many hours coding, I feel good about my Python skills. I can easily create classes, use inheritance, call methods, and my favorite, list comprehensions. Now what? I need to learn GTK+, and how to use it with Python. I found a great tutorial for PyGTK (which is the Python bindings for GTK+) on the official PyGTK site.

I will spend the next many days creating GUIs, hooking up signals, and trying to absorb all that I can.


On my quest to adapt my programming skills to the Python language, I ran across a great ebook / tutorial, Dive Into Python. Although it has some pretty complicated examples, where I would have preferred simple examples, it's still a great resource. It is written with experienced programmers in mind, so it's not supposed to be easy... just effective.


Soon I will have to answer the question, "How do I render a video timeline using Python & Glade / PyGTK?". There are obviously no widgets made for this purpose, and I can't exactly assemble buttons, images, and labels on the screen and make it look very nice or believable.

The answer is Cairo. Not the city in Egypt, but rather the open-source, multi-platform 2D graphics library, with a special set of bindings for Python (PyCairo).

So, given my limited knowledge so far, the plan is to use Glade for almost all of the interface, dialogs, and use Cairo only to draw the video / audio timeline on the screen. Cairo should give me all the control I need to allow the user to drag clips around, snap them to the timeline, trim clips, etc... And best of all, it will give me 100% control over how I want that experience to look & feel.

Not that it matters, but the failed Diva project & the Jokosher project also used Cairo to render their timelines. That adds to my confidence that this is the right direction to go.

Here is a simple Python example of some Cairo code, which draws a happy face on the screen:

#! /usr/bin/env python
import pygtk
import gtk, gobject, cairo
from math import pi

# Create a GTK+ widget on which we will draw using Cairo
class Screen(gtk.DrawingArea):

    # Draw in response to an expose-event
    __gsignals__ = { "expose-event": "override" }

    # Handle the expose-event by drawing
    def do_expose_event(self, event):
        # Create the cairo context
        cr = self.window.cairo_create()

        # Restrict Cairo to the exposed area; avoid extra work
        cr.rectangle(event.area.x, event.area.y,
                     event.area.width, event.area.height)
        cr.clip()

        self.draw(cr, *self.window.get_size())

    def draw(self, cr, width, height):
        # Fill the background with gray
        cr.set_source_rgb(0.5, 0.5, 0.5)
        cr.rectangle(0, 0, width, height)
        cr.fill()

        # draw a rectangle
        cr.set_source_rgb(1.0, 1.0, 1.0)
        cr.rectangle(10, 10, width - 20, height - 20)
        cr.fill()

        # draw lines (the eyes)
        cr.set_source_rgb(0.0, 0.0, 0.8)
        cr.move_to(width / 3.0, height / 3.0)
        cr.rel_line_to(0, height / 6.0)
        cr.move_to(2 * width / 3.0, height / 3.0)
        cr.rel_line_to(0, height / 6.0)
        cr.stroke()

        # and a circle (the face), plus an arc (the smile)
        cr.set_source_rgb(1.0, 0.0, 0.0)
        radius = min(width, height)
        cr.arc(width / 2.0, height / 2.0, radius / 2.0 - 20, 0, 2 * pi)
        cr.stroke()
        cr.arc(width / 2.0, height / 2.0, radius / 3.0 - 10, pi / 3, 2 * pi / 3)
        cr.stroke()

# GTK mumbo-jumbo to show the widget in a window and quit when it's closed
def run(Widget):
    window = gtk.Window()
    window.connect("delete-event", gtk.main_quit)
    widget = Widget()
    widget.show()
    window.add(widget)
    window.present()
    gtk.main()

if __name__ == "__main__":
    run(Screen)

In my opinion, the #1 most important aspect of any software is its interface (aside from doing something useful, of course). How easy is it to use? How intuitive and streamlined is it? Case in point, there are many video editors for Linux... so why is nobody happy? Many of them can combine video clips and add an audio track or two. The answer? They are all hard to use, offer a terrible user experience, require crazy command line hacks, or the user interface severely limits the feature set (i.e. not a non-linear editor).

Now that I am becoming more familiar with Python as a language, I realized that there are many choices (i.e. frameworks) available for user interfaces. I don't understand why there are so many, but that doesn't change the fact that there are many. So the question of the day is... which interface choice is the best one for a video editor?

Believe it or not, there are 40+ different graphical user interface (i.e. GUI) toolkits available for Python. Here is a complete list of them.

After many hours of research, downloads, and examples, I have selected my favorite (and hopefully the best GUI designer / framework): Glade (using PyGTK). The Glade designer seems to be the most feature packed, and is considered by many as the #1 GUI designer for GTK. Best of all, I don't have to create the interface with endless lines of code, rather I can load the interface with a single line of code: MyInterface = gtk.glade.XML("MyGUI.glade"). The interface is stored as an XML file, and loaded at runtime. Very nice.

(Screenshot of the Glade Interface Designer)


I wanted to share this example I found on another blog. It's my first working Python code using the gstreamer framework. It loads an mpeg movie file and an MP3 into the pipeline at the same time, in effect compositing them together.

I will spend the next many days fine-tuning my Python skills and learning more about the Eclipse IDE, but this example gives me some confidence that this project might actually work!

Video & Audio Example Python Code:

import pygst
pygst.require("0.10")
import gst
import gtk

class Main:
    def __init__(self):
        # Build the window, video area, and buttons
        self.window = gtk.Window()
        self.window.connect("unmap", self.OnQuit)
        self.vbox = gtk.VBox()
        self.da = gtk.DrawingArea()
        self.bb = gtk.HButtonBox()
        self.da.set_size_request(300, 150)
        self.playButton = gtk.Button(stock='gtk-media-play')
        self.playButton.connect("clicked", self.OnPlay)
        self.stopButton = gtk.Button(stock='gtk-media-stop')
        self.stopButton.connect("clicked", self.OnStop)
        self.quitButton = gtk.Button(stock='gtk-quit')
        self.quitButton.connect("clicked", self.OnQuit)
        self.bb.add(self.playButton)
        self.bb.add(self.stopButton)
        self.bb.add(self.quitButton)
        self.vbox.pack_start(self.da, expand=True)
        self.vbox.pack_start(self.bb, expand=False)
        self.window.add(self.vbox)
        self.window.show_all()

        # Create GStreamer bits and bobs
        self.pipeline = gst.Pipeline("mypipeline")
        pbin = gst.element_factory_make("playbin", "pbin")
        pbin.set_property("uri", "file:///home/jonathan/Cow.mpg")
        vsink = gst.element_factory_make("xvimagesink", "vsink")
        self.vsink = vsink
        pbin.set_property("video-sink", vsink)
        vsink.set_property("force-aspect-ratio", True)
        self.pipeline.add(pbin)

        pbin1 = gst.element_factory_make("playbin", "pbin1")
        pbin1.set_property("uri", "file:///home/jonathan/Fly.mp3")
        pbin1.set_property("video-sink", vsink)
        self.pipeline.add(pbin1)

    def OnPlay(self, widget):
        # Setting the pipeline to PLAYING starts both playbins at once
        self.pipeline.set_state(gst.STATE_PLAYING)

    def OnStop(self, widget):
        self.pipeline.set_state(gst.STATE_NULL)

    def OnQuit(self, widget):
        self.pipeline.set_state(gst.STATE_NULL)
        gtk.main_quit()

start = Main()
gtk.main()


Now that I've chosen a language for this project (Python), I have been investigating many different Python IDEs (Integrated Development Environments). I had no idea how many different IDEs existed, and I have downloaded every one (that I could find) and tried them out. The only 2 which I liked were the Eric Python IDE and the Eclipse Europa IDE combined with the PyDev plug-in. Having used Microsoft Visual Studio for many, many years, I found that the Eclipse Europa IDE was the most refined and professional, even more so than Visual Studio (in many ways).

Some of the highlights:

  • Great layout / tabbed interface
  • Good project / file organization
  • Responsive intellisense / code completion
  • Code templates
  • Code collapsing (which is great)
  • A nice integrated debugger
(Screenshot of Eclipse IDE - just imagine Python code instead of Java and you'll get an idea of what it's like from this picture)


After much turmoil, I have decided that multimedia support for my language has to be the #1 priority for this project to succeed. It doesn't matter how efficient the language is if it can't encode / decode certain video files.

I have also decided which multimedia framework I am going to use. Again, after much thought and many questions to the Linux community, I have decided that the GStreamer framework is the future. It's where all the development is happening, and it's by far the most supported and used media framework out there today (in Linux).

The MLT framework was very tempting, but it is developing at a much slower rate, and it supports fewer audio / video formats and fewer languages. Thus, I am not choosing it.

Now, back to the winning language: I am choosing Python! Not because of its speed (obviously it's quite a slow language compared to C++), but because of the following factors:

  • Great support of the Gstreamer framework (#1 on my list)
  • Easy to read / brief syntax & less lines of code
  • Very popular in the Linux community
  • All the heavy lifting is done with Gstreamer (written in C) and not Python, so I doubt the speed will be an issue.
  • Even though it's not thought of as a "GUI" language, it is quite capable. Take a look at an open-source audio mixing program written in Python: Jokosher.
(Screenshot of Jokosher, programmed in Python)


One of my co-workers was inquiring about why I am taking on such a large project. Was I going to get paid? If successful, would I become rich? So, I thought I would share my underlying motivations with everyone.

My motivations are simple: I want to create a powerful, stable, and artistically kick-ass video editor for all the Linux users in the world. That's it. No hidden agendas. And honestly, if there were already a kick-ass video editor for Linux (for free), I would never have started this project.

I really like the work that was done on Diva (a previously failed video editor), and I will definitely use that for inspiration. Here is a screenshot from the Diva project, may it rest in peace. =)

[screenshot of diva video editor: source]


My research has opened my eyes to several points that are important to making this project a success. The most important point is that I need to use an underlying multimedia framework. This framework is responsible for encoding / decoding video & audio, importing & exporting various formats of media files, and generally doing all the hard work. I am not ambitious enough to write my own media framework, so that has led me to find the following possible frameworks:

  1. MLT - is an open source multimedia framework, designed and developed for television broadcasting. It provides a toolkit for broadcasters, video editors, media players, transcoders, web streamers and many more types of applications. The functionality of the system is provided via an assortment of ready to use tools, xml authoring components, and an extendible plug-in based API.

  2. GStreamer - is a library that allows the construction of graphs of media-handling components, ranging from simple Ogg/Vorbis playback to complex audio (mixing) and video (non-linear editing) processing.
So, I have a lot to consider, and it's critical I choose the correct framework since the rest of this project will require me to implement the various interfaces. I would hate to get this decision wrong.

Also, these 2 frameworks each support various languages, and they might not provide equal support to all the languages I am considering using. So this could also impact which language I decide to choose.


I have been programming for over 10 years professionally, and even longer if you count the years of hacking before that. However, all my experience comes from the Microsoft / Windows world of languages. I regularly use C# for most of my *real* programming jobs. However, I have dabbled in C, C++, Pascal, Basic, Perl, and a few others.

If I am going to pull this project off successfully, I need to choose the best language for building a video editor in Linux, and slightly less important, a language that is widely used and liked in the Linux world. So, the idea of continuing to use C# for this project looks unlikely. =(

I have researched the Mono framework (which is a Linux version of the .NET framework), however I just found out that it has poor built-in support for multimedia. So, I don't feel too good about that. Also, it might be hard to convince Linux users to program in a Microsoft language... since I would like to open-source this project at some point.

I have whittled my options down to the 3 most widely used languages in Linux:

  • C++ - scary language... hard to debug... executes very fast
  • Python - great support of multi-media... easy to use... performance sucks
  • Java - fairly verbose language... medium difficulty... good performance... incomplete multi-media support


I started my journey over 6 months ago to the world of Linux (Ubuntu 7 to be specific). It was a beautiful place. It had photos, music, eye candy, and lots of free, open-source software to enjoy. Life was good. It wasn't too long before I noticed that something was very, very wrong. You see, I wanted to make a video (which I love to do), and I could not seem to find any *good* open-source software to use.

So, I considered helping the Cinelerra project, but they were in the middle of a transition from an old code base to a new code base (very early in that process), and were not interested in any help from me.

Long story short, I have decided to put my programming and design skills to the ultimate test. I am going to take the best ideas from all the non-linear video editors and compositors that I know and love (from Windows) and make an open-source, kick-ass video editor. And I figured, just in case I don't make it out alive, I will document my journey to help the next developer who might walk in my footsteps.

There are many questions I still need answers to, such as what multimedia framework do I use? What language do I use? What libraries will I need? And so on...

Stay tuned...
