Google, Yahoo Video, AltaVista, Singingfish, Dogpile, Blinkx.tv, YouTube, Truveo, AOL Video, Live Search and other search engines have been indexing videos for a while now.
Traditional search engines on the World Wide Web index web pages by treating them as plain text documents and indexing the content of each page so that users can search for it. This, however, is not feasible for images, videos and audio content. Searching for multimedia data (images, video and audio) therefore introduces challenging problems in many areas. Both what should be used to index multimedia data and how that index should be queried to retrieve the information again remain grand challenges in this field.
Different search engines use slightly different techniques to index video files; a small sketch combining several of them follows the list. Among those techniques are:
- Using text from the filename.
- Using alternate text.
- Using the text of the hyperlink that links to the particular video, or other relevant text from the linking web page.
- Using the video header information, which usually includes the title, the author and, depending on the video format, copyright information.
- Using textual metadata.
- Using user tags attached to the video file.
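To make the combination of these textual signals concrete, here is a minimal sketch of such a text-based video index in Python. All function names and the example data are hypothetical; real engines use far more elaborate ranking, but the basic idea of pooling filename, alt text, anchor text, header fields, metadata and tags into one inverted index is the same.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lower-case the text and split it on non-alphanumeric characters."""
    return [t for t in re.split(r"[^a-z0-9]+", text.lower()) if t]


def index_video(index, video_id, *, filename="", alt_text="", anchor_text="",
                header_title="", header_author="", metadata="", tags=()):
    """Add every textual signal known for one video to a simple inverted index."""
    signals = [filename, alt_text, anchor_text, header_title, header_author, metadata]
    signals.extend(tags)
    for signal in signals:
        for token in tokenize(signal):
            index[token].add(video_id)


def search(index, query):
    """Return the ids of videos whose signals contain every query token."""
    results = [index.get(token, set()) for token in tokenize(query)]
    return set.intersection(*results) if results else set()


# Hypothetical example: one video described only by its surrounding text.
index = defaultdict(set)
index_video(index, "vid-001",
            filename="tour_de_france_stage_12.flv",
            anchor_text="Watch stage 12 highlights",
            header_title="Tour de France 2007 - Stage 12",
            header_author="Example Broadcaster",
            tags=("cycling", "sports"))

print(search(index, "tour de france highlights"))   # -> {'vid-001'}
```

Note that the index never looks at a single pixel or audio sample of the video itself, which is exactly the limitation discussed next.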
Relying entirely on this handful of approaches has many drawbacks. Metadata, for instance, often does not contain enough information to identify a video, and the weakness of user tags is that they can be misused. In recent years, a couple of more innovative approaches for indexing videos have therefore emerged. The following two I find particularly interesting:
- The search engine Blinkx, for example, uses speech-recognition technology in addition to standard metadata and surrounding-text searches. It extracts the audio track that accompanies most video files and converts the speech into text, using it to create a searchable index of the spoken words (a rough sketch of this idea follows after the list).
- Researchers at the University of Leeds (formerly at Oxford) are working on another innovative solution, which aims to make the content of a video itself searchable rather than only its text description and metadata. To do this they have developed a system that combines face recognition, closed-captioning information and the original television scripts to automatically name the faces appearing in the videos (the second sketch below illustrates the alignment idea). There is still a long way to go, but this innovation is seen as a first step towards automatically describing what happens in a video.
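Blinkx's actual pipeline is proprietary, so the following is only a minimal sketch of the speech-to-text indexing idea, using ffmpeg for audio extraction and the open-source Whisper model as a stand-in recognizer. The file name and the word stored per segment are hypothetical; the point is that a query word maps back to the time ranges in which it was spoken.

```python
import subprocess
from collections import defaultdict

import whisper  # open-source speech recognizer, used here purely as an illustration


def extract_audio(video_path, audio_path="audio.wav"):
    """Pull the audio track out of the video with ffmpeg (16 kHz mono WAV)."""
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-ar", "16000", "-ac", "1", audio_path],
        check=True,
    )
    return audio_path


def build_spoken_word_index(video_path):
    """Transcribe the audio and map each spoken word to the segments it occurs in."""
    audio_path = extract_audio(video_path)
    model = whisper.load_model("base")
    result = model.transcribe(audio_path)
    index = defaultdict(list)
    for segment in result["segments"]:
        for word in segment["text"].lower().split():
            index[word.strip(".,?!")].append((segment["start"], segment["end"]))
    return index


# Hypothetical usage: searching for a word returns the times it was spoken,
# so the video becomes searchable by its audio content, not just its metadata.
index = build_spoken_word_index("lecture.mp4")
print(index.get("transistor", []))
```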
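The Leeds/Oxford system itself is not spelled out in detail here, but the description suggests the key step: subtitles say *what* is said and *when*, while scripts say *who* says what but not when. The sketch below shows a heavily simplified version of that alignment in plain Python, with hypothetical example data and face tracks assumed to come from a separate face detector; it also ignores the hard case where the visible face is not the speaker.

```python
import difflib

# Subtitles: what is said and when, but not by whom (hypothetical data).
subtitles = [
    {"start": 12.0, "end": 14.5, "text": "I told you not to open the door."},
    {"start": 15.0, "end": 16.8, "text": "It was already open when I got here."},
]

# Script: who says what, but without timestamps (hypothetical data).
script = [
    {"speaker": "ANNA", "line": "I told you not to open the door!"},
    {"speaker": "BEN",  "line": "It was already open when I got here."},
]

# Face tracks from some face detector/tracker, with the span each face is visible.
face_tracks = [
    {"track_id": 1, "start": 11.5, "end": 14.9},
    {"track_id": 2, "start": 14.8, "end": 17.0},
]


def align_subtitles_to_script(subtitles, script, cutoff=0.6):
    """Attach a speaker name to each subtitle by fuzzy-matching its text to a script line."""
    lines = [entry["line"] for entry in script]
    named = []
    for sub in subtitles:
        match = difflib.get_close_matches(sub["text"], lines, n=1, cutoff=cutoff)
        if match:
            speaker = script[lines.index(match[0])]["speaker"]
            named.append({**sub, "speaker": speaker})
    return named


def name_face_tracks(face_tracks, named_subtitles):
    """Label each face track with the speaker whose subtitle overlaps it in time."""
    labels = {}
    for track in face_tracks:
        for sub in named_subtitles:
            overlap = min(track["end"], sub["end"]) - max(track["start"], sub["start"])
            if overlap > 0:
                labels[track["track_id"]] = sub["speaker"]
                break
    return labels


named_subs = align_subtitles_to_script(subtitles, script)
print(name_face_tracks(face_tracks, named_subs))   # -> {1: 'ANNA', 2: 'BEN'}
```

Once face tracks carry names, a query for a character or actor can return the exact scenes in which they appear, which is what makes this a first step towards searching the content of the video itself.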