
Needs For Captioning Tool

  • July 17, 2017

  • Joseph Polizzotto (HTCTU), Marshall Sunnes (NYU)

General Needs

  • major goal: offer a faster and simpler process for creating quality caption files (e.g., SRT files; an example block follows this list)

  • open source tool

  • accessible design (WCAG 2.0 AA guidelines)

  • Mac, PC, and Linux versions

  • ability to save work in progress

  • documentation and best practices (e.g., clarify that time-chunked items in "raw transcript" do not correspond to eventual caption blocks)
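
For reference, a minimal SRT file looks like this: each cue carries a numeric index, a start --> end timecode in HH:MM:SS,mmm form, and one or two lines of caption text (the content below is invented for illustration):

    1
    00:00:01,000 --> 00:00:04,200
    [Interviewer] Welcome back.
    Let's pick up where we left off.

    2
    00:00:04,400 --> 00:00:07,900
    Today: captions, start to finish.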

Transcription (Step 1)

  • Produce a "raw transcript" from an audio file (e.g., WAV, MP3, MP4, OGG, M4A, FLAC, RAW, WMA, etc.) using a Speech-to-Text (STT) system (a minimal sketch follows this list)

  • The optimal STT integration would include:

    • an open-source tool (e.g., Kaldi)

    • ability to train models on speech data

    • offline computation available
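
A minimal sketch of this step in Python, using the SpeechRecognition package with the offline CMU PocketSphinx engine as a stand-in for the Kaldi integration the notes call for (the file name and engine choice are illustrative; PocketSphinx returns plain text, so word-level timings would still have to come from the STT engine itself):

    # pip install SpeechRecognition pocketsphinx
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    with sr.AudioFile("interview.wav") as source:  # WAV/AIFF/FLAC inputs
        audio = recognizer.record(source)          # read the whole file

    # Offline decoding; a production tool would swap in Kaldi with
    # models trained on the target speech data.
    raw_transcript = recognizer.recognize_sphinx(audio)
    print(raw_transcript)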

Editing (Step 2)

  • Edit the "raw transcript", which is time-coded to the audio, in a text editor window

  • NICE TO HAVE: download auto-captions from YouTube

  • The optimal text editor screen contains the following:

    • time-coding of chunks at coarser-than-word-level granularity (phrase-level, or ~7-second intervals)

    • NICE TO HAVE: speaker diarisation (group chunks into sections based on different speakers)

    • shortcuts for Play / Pause, Previous / Next (chunk), Increase / Decrease speed

    • automatic adjustments to text, e.g., capitalization after full stops, question marks, exclamation marks, and ellipses (a sketch follows this list)

    • quick key for wrapping speaker identification and sound information in brackets (bracketed or parenthesised text should signal the aligner to skip it for alignment purposes)

    • settings menu for user configuration:

      • playback stops when keyboard is activated (when editing begins)

      • customize keyboard shortcuts

      • adjust the size of segmented chunks; see step #3 (e.g., number of characters per line, number of lines per caption block)

      • add honorifics and abbreviations to ignore for segmentation step; see step #3

      • adjust (optional) Aeneas parameters (other essential parameters will already be generated, such as input file types); see step #4

        • select output format (e.g., SRT, VTT, TTML, JSON)

        • Audio-head/tail length

        • Ignore non-speech sound minimum duration

        • save out an additional HTML file (for fine-tuning the syncmap; see step #5)

        • see Aeneas docs for additional parameters, flags that could be added

    • NICE TO HAVE: RegEx find and replace
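
A minimal sketch of the automatic capitalization adjustment mentioned above, using only Python's standard re module (the exact correction rules a real editor should apply are an open design choice):

    import re

    def auto_capitalize(text: str) -> str:
        """Capitalize the first letter of the transcript and any letter
        that follows a full stop, question mark, exclamation mark, or
        ellipsis."""
        pattern = re.compile(r"([.?!]|\.{3})(\s+)([a-z])")
        text = pattern.sub(lambda m: m.group(1) + m.group(2) + m.group(3).upper(), text)
        # Also capitalize the very first letter.
        return re.sub(r"^\s*([a-z])", lambda m: m.group(1).upper(), text)

    print(auto_capitalize("right. so what happened next? we waited... then it worked!"))
    # -> "Right. So what happened next? We waited... Then it worked!"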

Segmenting (Step 3)

  • Segment the now-edited transcript into chunks that meet quality captioning standards (http://www.captioningkey.org/quality_captioning.html), e.g.:

      • lines should not exceed 35 characters per line

      • 1-2 lines per caption block

      • end-sentence punctuation always found at the end of a captioning block, not in the middle

  • scripting could be done with Perl or Natural Language Processing (NLP) libraries (NLTK, spaCy) to complete this step (a minimal sketch follows this list):

    • e.g.,

      • reference a database of honorifics (Mr., Mrs., Fr., etc.) that will be ignored for segmenting purposes

      • place each sentence on its own line

      • break sentences into desirable chunks (see above)

      • add a space between each chunk (for purposes of alignment with Aeneas "subtitle" format; see step #4)

  • NICE TO HAVE:

    • review/edit screen for segmented chunks

    • preset segmenting settings based on previous projects
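
A minimal standard-library sketch of this step: a naive sentence splitter that skips a small honorifics list, then textwrap to break sentences into 35-character lines grouped two per caption block, with a blank line between blocks (the layout Aeneas's "subtitle" text type expects). The honorifics list and limits are illustrative; a real tool would use NLTK or spaCy for more robust sentence detection:

    import re
    import textwrap

    HONORIFICS = {"Mr.", "Mrs.", "Ms.", "Dr.", "Fr.", "Prof.", "St."}  # illustrative
    MAX_CHARS = 35  # max characters per caption line
    MAX_LINES = 2   # max lines per caption block

    def split_sentences(text):
        """Naive splitter: break after . ? ! unless the preceding
        token is a known honorific/abbreviation."""
        sentences, start = [], 0
        for m in re.finditer(r"[.?!]\s+", text):
            if text[start:m.end()].rstrip().split()[-1] in HONORIFICS:
                continue  # don't split after "Mr." etc.
            sentences.append(text[start:m.end()].strip())
            start = m.end()
        if text[start:].strip():
            sentences.append(text[start:].strip())
        return sentences

    def segment(text):
        """One caption chunk per block, blank line between blocks."""
        blocks = []
        for sentence in split_sentences(text):
            lines = textwrap.wrap(sentence, MAX_CHARS)
            for i in range(0, len(lines), MAX_LINES):
                blocks.append("\n".join(lines[i:i + MAX_LINES]))
        return "\n\n".join(blocks)

    print(segment("Mr. Smith arrived late. He blamed the trains! Dr. Jones was not surprised."))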

Aligning (Step 4)

  • Align the segmented transcript with the audio using a forced-alignment tool (e.g., Aeneas) and output to the desired syncmap format (e.g., SRT, TTML, VTT, SBV, etc.); an example invocation follows this list

  • Optimal use of the aligner will include:

    • the Python C extensions (cdtw, cmfcc, cew) are compiled on PC, Linux, and Mac
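
For illustration, a typical invocation of the Aeneas task runner under these assumptions (file names are placeholders; is_text_type=subtitles consumes the blank-line-separated chunks produced in step #3):

    python -m aeneas.tools.execute_task \
        interview.mp3 \
        segmented_transcript.txt \
        "task_language=eng|is_text_type=subtitles|os_task_file_format=srt" \
        output/captions.srt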

NICE TO HAVE: Checking/Fine-Tuning (Step 5)

  • Review the resulting syncmap file (e.g., SRT), correct any discrepancies in the time-stamps, and export the corrected syncmap

  • Optimal use of the fine-tuning tool will include the following (see https://github.com/ozdefir/finetuneas for an example; it also ships in the "Third Party" directory of the Aeneas package, and a sketch for generating its HTML input follows this list):

    • increase / decrease speed

    • toggle time-stamp display format

    • save in multiple formats

  • Note: some audio files with variable bit rate seek unpredictably in web browsers (consider adding a note about this to the fine-tuning window)
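
A sketch of producing the fine-tuning HTML mentioned in step #2: recent Aeneas versions expose an --output-html option on execute_task that writes a finetuneas-compatible page alongside the syncmap (treat the flag as an assumption and check the docs of your installed version; file names are placeholders). The generated page can then be opened in a browser to nudge cue boundaries and export the corrected file:

    python -m aeneas.tools.execute_task \
        interview.mp3 \
        segmented_transcript.txt \
        "task_language=eng|is_text_type=subtitles|os_task_file_format=srt" \
        output/captions.srt \
        --output-html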
