TOPIC: S2T - news from 'The Dark Side'

S2T - news from 'The Dark Side' 29 Aug 2018 06:35 #96953

… a bit outside the box, but a recent topic…:

Microsoft announced that it will add a Speech-To-Text service 'later this year' to its OneDrive (Business) cloud service…

Quote/ www.microsoft.com/en-us/microsoft-365/bl...-store-your-content/
…Video and audio transcription—Beginning later this year, automated transcription services will be natively available for video and audio files in OneDrive and SharePoint using the same AI technology available in Microsoft Stream. While viewing a video or listening to an audio file, a full transcript (improving both accessibility and search) will show directly in our industry-leading viewer, which supports over 320 different file types. This will help you utilize your personal video and audio assets, as well as collaborate with others to produce your best work. …

… that chitchat-to-type-that thingie is more and more becoming the next hot sh** :silly:

And Photos/iOS automatically identifies and tags picture content in 4k+ categories …
The administrator has disabled public write access.

S2T - news from 'The Dark Side' 29 Aug 2018 11:01 #96954

  • FCPX.guru
  • Platinum Boarder
  • bbalser.com
  • Posts: 2944
  • Thank you received: 391
  • Karma: 34
We have one plugin for FCPX that does this, and a second one in beta, and neither are 100 percent accurate. The technology is close, though. Adobe had this in PPro at one time and removed it because it didn't work nearly well enough. I hope MS is doing it right, because not everything they do works out well (Zoom anyone?).

But yes, in 10 more years, I don't think video editing will look like it does now, not for any current NLE.
Last Edit: 29 Aug 2018 11:02 by FCPX.guru.

S2T - news from 'The Dark Side' 30 Aug 2018 07:24 #96963

FCPX.guru wrote:
… neither are 100 percent accurate. …

…for a 'first run of tagging', I dare say that isn't needed.
As P.Hodgett reports in his test…
…99.8% in an 8,000-word document, so roughly 16 words missed. What's more dramatic is that the software identifies a single speaker as 15 different ones… LOL.
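
A quick check of what a given word accuracy means in missed words. This is only a sketch: it assumes accuracy is counted per word, and the function name is mine, not from any tool mentioned here.

```python
def missed_words(total_words: int, accuracy_percent: float) -> int:
    """Expected number of wrongly transcribed words at a given word accuracy."""
    error_fraction = (100.0 - accuracy_percent) / 100.0
    return round(total_words * error_fraction)


# Numbers from the test quoted above: 8000 words at 99.8% accuracy.
print(missed_words(8000, 99.8))  # 16
```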

Automatic S2T for a 20-second commercial is of little use, but for larger documentaries … you let the Mac do the bulky first run, and then you just edit the whole piece.

Will be fun to test the different systems. From a customer perspective, there should be standard sound bites (studio, street, bar, English/Spanish/Chinese) to give those S2T systems a 'normed' value… like CRI for lights…
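
One candidate for such a 'normed' value is word error rate (WER): the word-level edit distance between a reference transcript and the S2T output, divided by the length of the reference. A minimal sketch, assuming lowercase comparison and whitespace tokenization; the clip names and transcripts are made up for illustration, not taken from any real test set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance (substitutions, insertions, deletions)
    divided by the number of words in the reference transcript."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen
    # so far (minus the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical clips: (reference transcript, S2T output)
clips = {
    "studio_english": ("the quick brown fox jumps", "the quick brown fox jumps"),
    "bar_english": ("the quick brown fox jumps", "the quick crown fox"),
}
for name, (ref, hyp) in clips.items():
    print(f"{name}: WER = {word_error_rate(ref, hyp):.2f}")
```

Whitespace splitting would not work for Chinese, so that clip set would need a character-based or segmenter-based variant.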
