textav-event-2017

TV Archive AI pipeline



Notes

  • SLIDES:

  • Internet Archive!

  • Petabytes of data! The TV Archive! The TV News Archive!

    Recording TV: cable news, Philadelphia local news, C-SPAN (people LOVE it).

  • There's an API written in Python

    Clip-based metadata storage: "this clip was fact-checked", "this clip is a political ad".

    Check it out:

    What is in the data? VIDEO / AUDIO / CAPTIONS / but also information about… PROGRAM / CONTEXT / AUDIENCE

    Not all of this data is easy to grab, but each kind carries signals that data can be pulled from.

  • [See the excellent slide with detailed examples for each.]

  • How can we use these signals to provide more information?

  • Speaker diarization / who is talking?

  • Experimentation at the Internet Archive:

    • They want to enable experimentation as a library, not necessarily do it (but they do it too)

  • Example: Political TV Ad Archive (https://politicaladarchive.org)

  • Frequency analysis turns the sound into a hash that is easily searchable. They fingerprinted each political ad and found all other copies of that ad.

  • Example: Face-o-matic

  • Broadcasts are recorded in real time; it takes 4 hours to convert them into MP4 and other formats, and then they are run through an algorithm that "doesn't take much time at all".

  • This is a prototype: what do YOU want to use it for? What format would you like the output in?

  • Example output is in the #face-o-matic channel on the Hyperaudio Slack.

  • Example: Chyron extraction. Extracting the "lower thirds": how do they compare? This will be discussed tomorrow.

  • The Vision: What's happening on TV? What's all the metadata associated with what's happening? They don't have a great API for these things, but want to use sockets to push the data out.

  • Be able to look back historically (last month, etc.). INTERNET ARCHIVE IS A LIBRARY! Dan wants to make sure you know this. ← it’s true.

  • Question:

    • Can you add more data to the archive? How can submitters use all this cool stuff?

    • "Good question."

    • "Reach out to me, basically." → DAN @slifty SCHULTZ

    • "We should talk" dan.schultz@archive.org everyone talk to him!
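The clip-based metadata storage described in the notes ("this clip was fact-checked", "this clip is a political ad") can be pictured as labels attached to time ranges of a recording. The sketch below is a minimal illustration under assumptions: the `Clip` record, its fields, the `find_clips` helper and the recording IDs are all hypothetical, not the Internet Archive's actual Python API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Clip:
    """A labelled time range within one recording (hypothetical schema)."""
    recording_id: str  # identifier of the broadcast recording (hypothetical)
    start: float       # clip start, in seconds from the start of the recording
    end: float         # clip end, in seconds
    labels: frozenset = field(default_factory=frozenset)


def overlaps(clip: Clip, start: float, end: float) -> bool:
    """True if the clip shares any time with the half-open window [start, end)."""
    return clip.start < end and start < clip.end


def find_clips(clips, recording_id, start, end, label=None):
    """Clips from one recording that overlap a window, optionally filtered by label."""
    return [
        c for c in clips
        if c.recording_id == recording_id
        and overlaps(c, start, end)
        and (label is None or label in c.labels)
    ]


# Made-up example data for illustration only.
clips = [
    Clip("news-2017-06-01", 120.0, 150.0, frozenset({"fact-checked"})),
    Clip("news-2017-06-01", 600.0, 630.0, frozenset({"political ad"})),
    Clip("news-2017-06-02", 10.0, 40.0, frozenset({"political ad"})),
]

ads = find_clips(clips, "news-2017-06-01", 0.0, 3600.0, label="political ad")
print([(c.start, c.end) for c in ads])  # [(600.0, 630.0)]
```

Keeping labels on time ranges, rather than on whole recordings, is what allows one broadcast to carry several independent annotations (a fact-check here, an ad there) that can be queried separately.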
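The political-ad fingerprinting in the notes (frequency analysis turning sound into a searchable hash) can be illustrated with a generic spectral-peak "landmark" scheme. This is a minimal sketch of that general technique, not the Internet Archive's actual method. It assumes frame-aligned audio split into non-overlapping frames, uses a naive DFT, and treats consecutive peak pairs as hashable landmarks; `FRAME`, `peak_bins`, `fingerprint`, `find_copies`, `tone` and the `min_votes` threshold are hypothetical names and parameters.

```python
import cmath
import math
from collections import Counter, defaultdict

FRAME = 64  # samples per analysis frame (assumed value)


def peak_bins(samples, frame=FRAME):
    """Dominant DFT bin (1..frame//2) of each non-overlapping frame."""
    peaks = []
    for start in range(0, len(samples) - frame + 1, frame):
        chunk = samples[start:start + frame]
        best_bin, best_mag = 0, -1.0
        for k in range(1, frame // 2 + 1):
            # Naive DFT coefficient for bin k; fine for a sketch, use an FFT in practice.
            coeff = sum(x * cmath.exp(-2j * math.pi * k * n / frame)
                        for n, x in enumerate(chunk))
            if abs(coeff) > best_mag:
                best_bin, best_mag = k, abs(coeff)
        peaks.append(best_bin)
    return peaks


def fingerprint(peaks):
    """Map each consecutive peak pair (a landmark) to the frame indices where it starts."""
    index = defaultdict(list)
    for i in range(len(peaks) - 1):
        index[(peaks[i], peaks[i + 1])].append(i)
    return index


def find_copies(ad_peaks, broadcast_peaks, min_votes):
    """Frame offsets in the broadcast where at least min_votes ad landmarks line up."""
    broadcast_index = fingerprint(broadcast_peaks)
    votes = Counter()
    for landmark, ad_positions in fingerprint(ad_peaks).items():
        for b in broadcast_index.get(landmark, []):
            for a in ad_positions:
                votes[b - a] += 1  # a true copy piles its votes onto one offset
    return sorted(offset for offset, n in votes.items() if n >= min_votes)


def tone(k, frames=1, frame=FRAME):
    """Synthetic sine wave centred exactly on DFT bin k, lasting whole frames."""
    return [math.sin(2 * math.pi * k * n / frame) for n in range(frames * frame)]


# Synthetic test signal: an "ad" of five tones aired twice between filler.
ad = tone(5) + tone(9) + tone(14) + tone(7) + tone(20)
filler = tone(3, frames=4)
broadcast = filler + ad + filler + ad + filler
ad_peaks = peak_bins(ad)  # [5, 9, 14, 7, 20]
print(find_copies(ad_peaks, peak_bins(broadcast), min_votes=3))  # [4, 13]
```

Voting on the offset between matching landmarks is what lets one short ad be located inside a long recording: chance collisions scatter across many offsets, while each real copy concentrates its votes on a single one. A production system would also need overlapping windows, several peaks per frame and compact integer hashes to cope with misaligned, noisy audio.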
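The notes mention wanting to use sockets to push metadata out rather than relying on a request/response API. A minimal sketch of that push model follows, assuming plain TCP with one JSON object per line (a browser-facing service would more likely use WebSockets, which Python's standard library does not include). `MetadataFeed`, `demo` and the event fields are hypothetical.

```python
import asyncio
import json


class MetadataFeed:
    """Pushes metadata events to every connected subscriber as newline-delimited JSON."""

    def __init__(self):
        self.subscribers = set()

    async def handle(self, reader, writer):
        # Register the connection and keep it open until the subscriber disconnects.
        self.subscribers.add(writer)
        try:
            await reader.read()  # returns b"" once the subscriber closes its end
        finally:
            self.subscribers.discard(writer)
            writer.close()

    async def publish(self, event):
        """Send one event to all current subscribers, dropping any that have gone away."""
        line = (json.dumps(event) + "\n").encode()
        for writer in list(self.subscribers):
            try:
                writer.write(line)
                await writer.drain()
            except ConnectionError:
                self.subscribers.discard(writer)


async def demo():
    feed = MetadataFeed()
    server = await asyncio.start_server(feed.handle, "127.0.0.1", 0)  # port 0: any free port
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    # Wait until the server side has registered the new subscriber.
    while not feed.subscribers:
        await asyncio.sleep(0.01)
    # Hypothetical event shape: which channel, when, and what was detected.
    await feed.publish({"channel": "EXAMPLE", "start": 120.0, "label": "political ad"})
    received = json.loads(await reader.readline())
    writer.close()
    await writer.wait_closed()
    server.close()
    await server.wait_closed()
    return received


print(asyncio.run(demo()))
```

Pushing events as they are produced means a client interested in "what's happening on TV right now" does not have to poll; the same feed could be replayed from storage to support looking back historically.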

Links

  • Slides: https://docs.google.com/presentation/d/17s8tvqYY8zb4cGVk0MPi2HeA85af-x0macLn-WijAdc/edit?usp=sharing
  • Internet Archive: https://archive.org/
  • TV News Archive: https://archive.org/details/tv
  • Political TV Ad Archive: https://politicaladarchive.org
  • TV research server: http://tv-research5.us.archive.org:8000/auth