Palestinian Remix
Last updated
Last updated
Project : http://PalestineRemix.com 20 documentaries in 4 languages (English, Arabic, Hebrew, Bosnian) - Based on Hyperaud.io technology.
When people say that they have transcripts they may not have them in the format that you need.
Getting transcripts in a word document… in tables…??? Built a tool with a vertical timeline, each paragraph with tabs for each language.
Helped by QCRI who use Kaldi and have a very good arabic language model.
To deal with hardcoded subtitles: Duplicate video and in javascript canvas, frame-by-frame matching looking for changes in the subtitles, and send it to tesseract server-side with English back (with bad spelling because hard to do).
“One night I managed to write something terrifying like this …”