textav-event-2017

🔪✅⬇️ (Annotations models)



Notes

  • Killing Markdown

  • GML (written in 1969), an IBM project for writing text into files so that computers could display it in a better-formatted style

  • GML gave birth to SGML, which gave birth to HTML and XML

  • “Our tools were based on presumptions that simply were not true anymore”

  • "Content is not hierarchical"

  • Can we come up with a better mental model for thinking about content?

  • We can write code that does this, but it's not reproducible.

  • Example of a manually edited document: the "content" is the printed material, and the notes written over it in pen fit in … where?

  • JSON, content, annotations: the “contents” are kept distinct from the “annotations” in a JSON format.

    • Here’s a simplified version:

{
    "contents": "Main text document goes here etc etc etc",
    "annotations": [
        { "position": 4, "text": "Change this" },
        { "position": 12, "text": "I like this" }
    ]
}
  • Theoretically possible for transcript/audio data too.

    Not a standard format – “I’m allergic to standards”

    Even the format Blaine showed on screen is “massively simplified”
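
A minimal sketch of how the simplified contents/annotations shape above could be consumed, added here for illustration only. It is not the format or code shown in the talk. The function name `render_inline` is hypothetical, and it assumes that `position` is a zero-based character offset into `contents`, which the notes do not specify.

```python
import json

# Hypothetical document in the simplified shape from the notes:
# "contents" holds the text, "annotations" point into it by offset.
doc = json.loads("""
{
    "contents": "Main text document goes here etc etc etc",
    "annotations": [
        { "position": 4, "text": "Change this" },
        { "position": 12, "text": "I like this" }
    ]
}
""")


def render_inline(document):
    """Return the contents with each annotation spliced in as [note].

    Assumption: "position" is a zero-based character offset into
    "contents". The original contents string is left untouched.
    """
    contents = document["contents"]
    # Insert from the highest offset down so earlier offsets stay valid.
    ordered = sorted(document["annotations"],
                     key=lambda a: a["position"], reverse=True)
    for note in ordered:
        pos = note["position"]
        contents = contents[:pos] + "[" + note["text"] + "]" + contents[pos:]
    return contents


print(render_inline(doc))
```

Because the annotations live beside the text rather than inside it, the same `contents` can be rendered with or without the notes. For transcript or audio data, `position` could plausibly be a timestamp instead of a character offset, but that is an assumption and not something the notes spell out.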