In an article about the next ten years for Final Cut Pro, we suggested automatic keywording of media. That's now a reality with a new app from Ulti.Media called FCP Video Tag.
Hello everyone, let me introduce myself: I'm Alex Raccuglia and I'm a director and editor of promotional and advertising videos, but I'm also a software developer.
In recent years I founded Ulti.Media, a small software house that makes tools for editors, graphic designers and podcasters. Several of its apps work in tandem with Final Cut Pro, including BeatMark 2, FCP Cut Finder, FCP Diet 2 and FCP SRT Importer.
Today I'm here to introduce my latest app, FCP Video Tag: an automatic keyword generator for Final Cut Pro.
The philosophy behind FCP Video Tag, and the way you use it, is very simple: before opening Final Cut Pro, you drag all the media to be analyzed onto the application's main window and press the "Analyze" button.
The application processes and produces the keywords for each asset.
At this point you only have to press the "Export FCP XML" button and drag the resulting file into Final Cut Pro.
In Final Cut Pro you choose which library to import into, and immediately afterwards you'll see all the analyzed media, with their keywords.
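To give a rough idea of what an exported file of this kind might contain, here is a minimal Python sketch that writes keywords into an FCPXML-style document. It is not the app's actual exporter. The element and attribute names (`asset`, `asset-clip`, `keyword` with a comma-separated `value`) and the version number are assumptions based on the general shape of FCPXML and have not been validated against Apple's DTD; the function name `build_keyword_xml` and the example path are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import PurePosixPath
from urllib.parse import quote


def build_keyword_xml(assets, event_name="Tagged media"):
    """Build a simplified FCPXML-style string.

    `assets` is a list of (absolute_path, duration_seconds, keywords) tuples.
    The structure below is an assumption, not a validated FCPXML document.
    """
    root = ET.Element("fcpxml", version="1.8")  # version chosen for illustration
    resources = ET.SubElement(root, "resources")
    library = ET.SubElement(root, "library")
    event = ET.SubElement(library, "event", name=event_name)

    for index, (file_path, seconds, keywords) in enumerate(assets, start=1):
        asset_id = f"r{index}"
        name = PurePosixPath(file_path).stem
        duration = f"{seconds}s"
        # One resource entry per media file, referenced by the clip below.
        ET.SubElement(resources, "asset", id=asset_id, name=name,
                      src="file://" + quote(file_path),
                      start="0s", duration=duration)
        clip = ET.SubElement(event, "asset-clip", ref=asset_id,
                             name=name, duration=duration)
        # A single keyword range covering the whole clip.
        ET.SubElement(clip, "keyword", start="0s", duration=duration,
                      value=", ".join(keywords))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_keyword_xml([("/Volumes/Media/beach clip.mov", 12,
                              ["beach", "sunset"])]))
```

A real export would also need format information and frame-accurate durations, which this sketch leaves out.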
When I was launching FCP Cut Finder, I was invited to participate in a live streaming video on a YouTube channel, and before doing so I prepared myself by listening to several episodes of the associated podcast.
In one episode a guest, Marc Bach, raised an interesting point: Apple's Photos application recognizes and identifies what is in an image, and he would have found it convenient to have the same feature in Final Cut Pro.
In the meantime I had been working on PoweResize, another application that makes use of artificial intelligence, and had gained some familiarity with Apple's machine learning frameworks, so I said to myself, "I can do this!"
And as a result, I immediately got to work on this new app.
Since in the meantime I had also developed AudioTags, another application that extracts keywords from audio files, I decided to bring all this knowledge together in one program that uses machine learning on both the visual side (video and images) and the sound side (the audio of video tracks or audio files).
In a few weeks I made the first prototype, and I must say I was reasonably satisfied.
I had already started posting some previews on YouTube when I received an interesting piece of feedback: a request to also identify the type of framing used in the video.
I couldn't find any machine learning model suitable for the purpose, so just before Christmas 2020 I took some time and started developing a new system to manage data collection for image classification.
With the collaboration of some friends and beta testers, I built a management system based on iPhone and iPad. Each contributor automatically receives a series of images (extracted from videos of every kind, from short films to TV commercials, from interviews to reportage) and is asked to classify each one using just three categories: close-up, medium shot and long shot.
Together we classified a thousand different movies, and from that data we generated the new machine learning model that I then built into the application.
FCP Video Tag also lets you generate keywords from the text that appears in the videos, so any text, even handwritten text, is "captured" and matching keywords are generated.
While the system is quite stable when it comes to classifying images and shots, things are much more complex when it comes to audio, because simply transcribing audio involves two problems:
1. The actual transcription is not a summary of the audio, but a series of words, most of which are not relevant (think of words like "the", "I", "you," "many",...);
2. Often the quality of the transcription is not 100% perfect, so, especially if the speaker is not a professional voice actor, or if the audio is not particularly clean, many "misunderstandings" can occur.
That's why, as my experience with AudioTags taught me, generating keywords for audio requires a bit of work on the part of the user, at least at first.
For the first few audio files (or movie audio tracks), the user is asked to go in and delete irrelevant keywords by hand.
In this way, after a maximum of ten audio tracks, the system will be perfectly trained to recognize only the keywords that actually make sense.
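The article does not describe how this training works internally. As a purely illustrative sketch, the idea of "deleting irrelevant keywords by hand so they stop appearing" can be modelled as an ignore list that grows each time the user rejects a word. The class name `KeywordFilter` and its methods are hypothetical and are not part of the app.

```python
import re


class KeywordFilter:
    """Remembers rejected words and leaves them out of later results.

    Assumption: the real app may use something more sophisticated;
    this only models a growing ignore list.
    """

    def __init__(self, ignored=None):
        self.ignored = set(ignored or [])

    def extract(self, transcript):
        """Return unique words from the transcript, in order of appearance."""
        words = re.findall(r"[a-z']+", transcript.lower())
        keywords = []
        for word in words:
            if word not in self.ignored and word not in keywords:
                keywords.append(word)
        return keywords

    def reject(self, *words):
        """Record words the user deleted by hand."""
        self.ignored.update(word.lower() for word in words)


if __name__ == "__main__":
    keyword_filter = KeywordFilter()
    print(keyword_filter.extract("The drone films the coast"))
    keyword_filter.reject("the", "films")
    # Later files no longer produce the rejected words.
    print(keyword_filter.extract("The ship leaves the harbour"))
```

After a handful of files the ignore list covers most filler words, which is consistent with the claim that only the first few audio tracks need manual cleanup.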
I worked a lot on the user interface of the keyword manager so that training the application to recognize important keywords and discard irrelevant ones would be very fast.
And while I was at it, I structured the system so that additional keywords can be generated from the basic ones: if the words "intelligence" and "artificial" are spoken in a file, the system can automatically generate "artificial intelligence", but also "machine learning" and so on.
In the preferences you can also choose to generate keywords automatically from the date the content was created, and from the names of the folders and subfolders being dragged.
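One simple way to picture these derived keywords is a table of rules, each saying "if all of these words are present, also add those". The sketch below is an assumption about the general idea rather than the app's actual rule format; `DERIVATION_RULES` and `expand_keywords` are hypothetical names.

```python
# Each rule: (words that must all be present, keywords to add).
DERIVATION_RULES = [
    ({"artificial", "intelligence"},
     ["artificial intelligence", "machine learning"]),
]


def expand_keywords(keywords, rules=DERIVATION_RULES):
    """Return the keywords plus any derived ones whose rule is satisfied."""
    present = {keyword.lower() for keyword in keywords}
    result = list(keywords)
    for required, derived in rules:
        if required <= present:  # every required word was found
            for keyword in derived:
                if keyword not in result:
                    result.append(keyword)
    return result


if __name__ == "__main__":
    print(expand_keywords(["artificial", "podcast", "intelligence"]))
```

Rules that are not fully matched add nothing, so a file that only mentions "artificial" keeps its original keywords.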
You can drag and drop not only individual assets but also the folders that contain them, so keyword generation uses the same principle as in Final Cut Pro when dragging and dropping files from folders.
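As a rough sketch of folder-based and date-based keywords, the function below turns the dropped folder and its subfolders into keywords and optionally adds a year-month keyword. The function name `path_keywords` and the `YYYY-MM` date format are assumptions; the article does not say how the app formats dates.

```python
from datetime import date
from pathlib import PurePosixPath


def path_keywords(dropped_folder, file_path, created=None):
    """Keywords from the dropped folder, its subfolders and a creation date."""
    root = PurePosixPath(dropped_folder)
    # Keep the dropped folder itself plus every subfolder above the file.
    relative = PurePosixPath(file_path).relative_to(root.parent)
    keywords = list(relative.parts[:-1])
    if created is not None:
        keywords.append(created.strftime("%Y-%m"))  # assumed date format
    return keywords


if __name__ == "__main__":
    print(path_keywords("/Media/Stock",
                        "/Media/Stock/Beach/Drone/clip.mov",
                        date(2021, 5, 10)))
```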
Ease of Use
FCP Video Tag's user interface is designed to be very easy to use. Internally this is a complex application, and I've worked hard to review and simplify the way it is used.
When you drag in an asset, four icons let you specify which kinds of analysis you want to run: image classification, framing identification, text and audio.
All these operations based on artificial intelligence are very time-consuming. That's why FCP Video Tag does not run image classification and text identification on every frame, but only samples a frame at intervals.
You can set how many scans to do in the preferences.
By default, no more than five frames are scanned per video (FCP Video Tag is designed to be used mainly with stock videos and images), but you can still choose to scan for the entire duration of the movie.
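To illustrate the sampling idea, the sketch below picks a fixed number of timestamps spread across a clip. Even spacing at the middle of each slice is an assumption; the article only says that no more than five frames are scanned by default. `sample_times` is a hypothetical helper.

```python
def sample_times(duration, max_frames=5):
    """Return up to `max_frames` timestamps (in seconds) spread over the clip.

    The clip is cut into equal slices and the middle of each slice is used,
    which avoids the very first and very last frames.
    """
    if duration <= 0 or max_frames <= 0:
        return []
    step = duration / max_frames
    return [round(step * (index + 0.5), 3) for index in range(max_frames)]


if __name__ == "__main__":
    print(sample_times(10))  # five timestamps for a 10-second clip
```

With this approach the cost of the analysis depends on the number of samples rather than on the length of the clip, which suits short stock footage.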
Since file analysis is a relatively lengthy process anyway, FCP Video Tag keeps every analysis it has performed in its archive, so that you can easily run a keyword search across all the assets analyzed over time.
This is convenient when you scan all the media in a folder, an entire project, or even an entire disk: once scanned, all this metadata is saved and fully searchable.
In addition, search results can be exported directly to Final Cut Pro so that you can import the files that were found.
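A searchable archive of this kind can be pictured as an inverted index that maps each keyword to the files that carry it. The sketch below keeps everything in memory and is only an illustration; how the app actually stores its archive is not described, and `KeywordArchive` is a hypothetical name.

```python
from collections import defaultdict


class KeywordArchive:
    """Maps each keyword to the set of analyzed files that carry it."""

    def __init__(self):
        self._index = defaultdict(set)

    def add(self, path, keywords):
        """Store the keywords produced for one analyzed file."""
        for keyword in keywords:
            self._index[keyword.lower()].add(path)

    def search(self, *keywords):
        """Return the files that carry all of the given keywords."""
        matches = [self._index.get(keyword.lower(), set())
                   for keyword in keywords]
        if not matches:
            return []
        return sorted(set.intersection(*matches))


if __name__ == "__main__":
    archive = KeywordArchive()
    archive.add("beach.mov", ["beach", "sunset", "long shot"])
    archive.add("interview.mov", ["office", "close-up"])
    archive.add("surf.mov", ["beach", "close-up"])
    print(archive.search("beach"))
    print(archive.search("beach", "close-up"))
```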
A few resources for learning how to use the app
I realize that despite all the efforts we've made to make the app as simple and easy to use as possible, you may still need to get some practice.
That's why we've made several tutorials available on this page that explain how to take your first steps in using FCP Video Tag, and how to become familiar with the tag manager.
The generation of all keywords is done directly on the computer on which the application is running. No data is sent to Ulti.Media servers.
As for the audio, the transcription of the text is done by Apple's servers, using the Siri engine. Again, no information is sent to Ulti.Media servers.
But there is an exception...
I have to be honest: I use FCP Video Tag very often to do the audio analysis of the episodes of the two podcasts I host, so as to generate, in a sensible and automated way, the hashtags for publication.
On average, these episodes are 30 to 60 minutes long, so the processing time is very long, especially because the transcription of the audio is not done directly on the device, but through Apple's servers with the Siri engine.
Since the waiting time is not negligible (a 30-minute file takes about 30 minutes to transcribe), I came up with the nice and crazy idea of distributing this processing among several computers.
In my company alone I have four other Macs, and I told myself that if I can divide the workload among all these devices, I can also decrease, in proportion, the waiting time for an audio file to be processed.
And so, in the span of a few days, I developed a system that performs exactly this processing in a distributed manner.
In the audio preferences you can turn on distributed processing (Enable Shared Transcription).
From then on, each time an audio file is analyzed, it is split into segments that are sent to all the other computers that have made the app available for processing. This works across both FCP Video Tag and AudioTags.
You can also watch how the work has been split, how it is progressing, and which other computers are contributing to the processing.
At this point, however, a small disclaimer is needed: each file segment (usually 60 seconds long at most) is actually sent to the cloud and downloaded locally by the other computers, then it is analyzed (again through Apple's servers), and the result is finally sent back to the originating computer.
In this specific case the data really is sent to the computers taking part in the distributed processing, but the users of those computers still do not have access to the file that is transferred.
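The splitting step can be sketched as follows: cut the audio into segments of at most 60 seconds and hand them out to the available computers in turn. Round-robin assignment is an assumption made for illustration, since the article does not say how segments are scheduled; the function names and the machine names in the example are hypothetical.

```python
def split_segments(duration, max_length=60.0):
    """Cut a duration in seconds into (start, end) pairs of at most max_length."""
    segments = []
    start = 0.0
    while start < duration:
        end = min(start + max_length, duration)
        segments.append((start, end))
        start = end
    return segments


def assign_round_robin(segments, workers):
    """Distribute segments across workers in turn (assumed scheduling policy)."""
    if not workers:
        raise ValueError("at least one worker is required")
    plan = {worker: [] for worker in workers}
    for index, segment in enumerate(segments):
        plan[workers[index % len(workers)]].append(segment)
    return plan


if __name__ == "__main__":
    macs = ["mac-1", "mac-2", "mac-3", "mac-4", "mac-5"]  # hypothetical names
    plan = assign_round_robin(split_segments(30 * 60), macs)
    for mac, work in plan.items():
        print(mac, len(work), "segments")
```

In this idealized picture a 30-minute file becomes 30 segments, so five computers would each handle six of them. Actual gains depend on upload, download and server response times.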
This is a system in which everyone is invited to contribute, but it is not absolutely necessary or mandatory to do so.
All Ulti.Media computers are always available to participate in processing, so anyone can take advantage of this distributed computing resource that is being shared.
What else to add? FCP Video Tag has officially been on sale since May 10, 2021, and until the end of the month it has an introductory price of €4.99.
Starting June 1, the app will be priced at €9.99.
I have no idea if this price is high or low; the market will probably decide that...
FCP Video Tag is also included in all our app bundles, which let you buy several apps together and save some money. You can find all the information here.
For the rest I refer you to our website where you can find all the information about FCP Video Tag and all the other apps we develop.
If you have read this far, I can only thank you and (virtually) hug you from Italy!