November 15, 2006

Next-gen video

Adobe Flash Player is more than a traditional video player. It treats video as a first-class media type in an interactive, programmable environment. Grant Skinner shows where this is going... he greenscreens an actor against varying backgrounds, but then applies vector graphics which interact with the actor... falling leaves settle on his arms, buildings light up in concert with his motion, the actor wipes away steam which fogs the screen, a virtual ball is tossed between two actors. This is heavy stuff. But it's where our abilities are taking us next....

Posted by JohnDowdell at November 15, 2006 03:16 PM

Comments

Having a native BitmapData class makes all the difference in the world ;)

[jd sez: And I'm happy you're happy too.... :) ]

Posted by: aq at November 15, 2006 06:29 PM

Processing.org was at this juncture approximately two years ago. Moreover, the Sony EyeToy has been available since 2003.

derivative.

Posted by: Xiaolei Shi at November 15, 2006 09:46 PM

... and I'm happy that you're happy too, wait, hold it.... ;-)

Got links? One thing I was thinking about after posting this is that it's easy to misunderstand what Grant's doing here... this doesn't require a special compositing machine, but is done realtime, on any ol' computer. Local, realtime. He's also using webcam input as his interactive input, which is pretty wild too.

Got objections? got links?

Posted by: John Dowdell at November 15, 2006 10:07 PM

No objection here... yeah, I think the fact this works on "off the shelf" equipment is the real story... and, that there's a conventional advertising usage. But, since it's my nature, John... you're not accurate when you say "any ol' computer". At least not what Grant says in that piece. Having said that... Grant has some experiments online which really do work with modest computers... I guess my point is, that example project uses a custom hardware configuration.

As far as 2003... I first saw edge detection stuff like this back in the early 90s. But, definitely, not off the shelf.

Posted by: Phillip Kerman at November 15, 2006 10:27 PM

P.S. (double negatives approaching...) Failing to include a link doesn't make a comment false. Having a link doesn't make it true either.

[jd sez: ... just means that the sentence needs to be completed.... ;-) ]

Posted by: Phillip Kerman at November 15, 2006 10:31 PM

Forgive the brevity, I didn't realize that processing.org has a terrible way of archiving its collective progress:

http://processing.v3ga.net/show.php?id=6&type=0

Though, I must admit the video is static, amounting to nothing more than a proof of concept. However, given the fact that Processing has a very simple method of capturing video:

http://processing.org/learning/examples/usingcapture.html

it would not be difficult to create something dynamic.

I'm also working under the assumption that even abstracted Java (Processing) is faster than AS3/JIT at the moment and thus more suitable for more isolated projects such as these.

In regards to EyeToy, I'd expect people to google/wikipedia the term and pay particular attention to its limitations:

"Due to the camera's need to "see" the player as they play, the camera can be very finicky about how much light is in the room. Different games have a different tolerance for varying light conditions"

Echoing the point made by v3ga's exploration of image recognition algorithms.

I will give Grant props for being refreshing in that he is bringing image recognition into a more popular medium like Flash. However, in terms of sheer theory and/or blind explorations into image recognition, it is derivative in the sense that other works preceded it. This point at least should be made clear.

[jd sez: No worries. I also didn't explicitly point out that this is different from what Ray Harryhausen did. This isn't a console, nor a static video of a onetime interactive video. We're going to see video where some of the image was synthesized on the spot in reaction to the actor.]

Posted by: Xiaolei Shi at November 16, 2006 05:31 AM

... hmm, sorry, I see the problems with that "any ol' computer" line... it sounds like this particular installation uses high-performance hardware, but if you've followed Grant's prior work in this area, he has used the visitor's webcam for realtime edge-detection, even without benefit of a solid background to the actor.

The algorithms and capabilities are available on consumer machines, even though the touring rig used special large display screens and matching processors.

Posted by: John Dowdell at November 16, 2006 08:45 AM

This is definitely not a technological first for computer vision. It is, however, the first time this has been done in Flash, which I think is very exciting, primarily because it makes it so easy to add rich interaction. I think it may also be the first time it has been done in so short a time, with off the shelf equipment in so public a setting. Other languages (C, Java, and to an extent Processing) are easier to build the underlying technology with (at least running at an appropriate speed), but are much harder to build the experience with. Once the core engine was built, adding the marketing video and interactions was very simple, relative to the work it would have required in another environment.

Regarding the hardware, total hardware cost per installation was <$1500 for the camera, computer and remotes. Pretty good for an application like this. Much of it could be run on any average home system from the past 2 years. As you'd expect, CPU speed was the bottleneck with video decoding, keying, interactions and graphics compositing (oh, how I wish Flash had GPU support!!). Consider that our total cost on the project came in under 1/5th that of companies using other technologies (yes, I retroactively wish I had bid it higher *grin*), and you'll get an idea of where we're going with this.

Cheers,
Grant.

Posted by: Grant Skinner at November 16, 2006 01:52 PM

I certainly hope the "experience" you speak of doesn't entail awkward spasming of the hands and feet so that the algorithm can produce a force vector or hitTest area.

- Kevin

Posted by: Xiaolei Shi at November 16, 2006 02:42 PM

Kevin,

I'm not sure if you are referring to a specific example, or just generally requiring motion to detect interaction, but you should check out the video in the post that JD linked to above. There's no "awkward spasming" that I'm aware of (unless you count Kyle's creative dance moves).

We worked to make the interactions really intuitive. Not necessarily highly realistic, but simple and straightforward.

Cheers,
Grant.

Posted by: Grant Skinner at November 16, 2006 07:08 PM
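
For readers wondering what a motion-driven "hitTest area" could look like in practice, here is a minimal sketch, assuming plain frame differencing on grayscale frames. It is written in Python for brevity rather than ActionScript, and it is not the engine discussed in this thread; the names `motion_mask` and `hit_test`, the frame layout, and the threshold value are all hypothetical choices made for illustration.

```python
from typing import List, Tuple

# Assumption: a frame is a list of rows of grayscale values in 0..255.
Frame = List[List[int]]
Mask = List[List[bool]]


def motion_mask(prev: Frame, curr: Frame, threshold: int = 30) -> Mask:
    """Mark each pixel whose brightness changed by more than `threshold`."""
    return [
        [abs(c - p) > threshold for p, c in zip(prev_row, curr_row)]
        for prev_row, curr_row in zip(prev, curr)
    ]


def hit_test(mask: Mask, region: Tuple[int, int, int, int], min_pixels: int = 1) -> bool:
    """Return True if at least `min_pixels` moving pixels fall inside region (x, y, w, h)."""
    x, y, w, h = region
    moving = sum(
        1
        for row in mask[y:y + h]
        for cell in row[x:x + w]
        if cell
    )
    return moving >= min_pixels
```

In a setup like this, an on-screen object would simply register a rectangle and react whenever `hit_test` reports motion inside it, which is one way to get interaction without demanding exaggerated gestures. Raising `threshold` or `min_pixels` would trade sensitivity for robustness against camera noise and changing room light.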