
As of four days ago, you could send a GET request to (or simply visit) https://video.google.com/timedtext?lang=en&v={youtubeVideoId} and receive an XML response containing the caption track of a given YouTube video. Does anyone know whether this support has been removed? As of tonight, it no longer returns the XML captions: the page is simply empty for every video, including many videos that worked four days ago. Thanks in advance.
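For reference, here is a minimal sketch of how the old endpoint's response could be consumed in Python. It assumes the legacy response was a transcript root element holding text elements with start and dur attributes in seconds (an assumption, not something this page confirms), and it treats an empty body, which is what the endpoint now returns, as "no captions". build_timedtext_url and parse_timedtext_xml are hypothetical helper names.

```python
import html
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def build_timedtext_url(video_id: str, lang: str = "en") -> str:
    # Same URL shape as in the question: https://video.google.com/timedtext?lang=en&v={youtubeVideoId}
    return "https://video.google.com/timedtext?" + urlencode({"lang": lang, "v": video_id})


def parse_timedtext_xml(body: str) -> list[tuple[float, float, str]]:
    """Return (start, duration, text) tuples; an empty body yields an empty list."""
    if not body.strip():
        return []  # the endpoint now returns an empty page for every video
    # Assumed legacy format: <transcript><text start="0.5" dur="2.25">...</text></transcript>
    root = ET.fromstring(body)
    cues = []
    for node in root.iter("text"):
        start = float(node.get("start", "0"))
        dur = float(node.get("dur", "0"))
        # Unescape any HTML entities still left after XML parsing (harmless if there are none)
        text = html.unescape(node.text or "")
        cues.append((start, dur, text))
    return cues
```

Because the endpoint is gone, the parser can only be exercised on saved or hand-written XML.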

  • see related (but old) issue issuetracker.google.com/issues/170235670
    – user555121
    Commented Nov 12, 2021 at 17:46
  • Let me add that this did not require access to the API whatsoever: no API key was needed, and the URL served the XML file regardless of how you accessed it. Commented Nov 16, 2021 at 5:11
  • See this issue on Google tracker: issuetracker.google.com/issues/207527674
    – user555121
    Commented Dec 13, 2021 at 14:05
  • Why do sites like YouGlish still work? How do they get captions from YouTube videos?
    – Nam Lee
    Commented Sep 25, 2022 at 4:29

5 Answers

13

The following Python scripts use the Protobuf encoding scheme, more precisely the blackboxprotobuf library.

Replace the VIDEO_ID value with the ID of the video you are interested in.

Automatic captions in the given language:

import requests
import blackboxprotobuf
import base64
import json

VIDEO_ID = 'qQV6guvFmM0'

def getBase64Protobuf(message, typedef):
    # Encode the dict as a protobuf message using the given field types, then base64 it
    data = blackboxprotobuf.encode_message(message, typedef)
    return base64.b64encode(data).decode('ascii')

# Inner message: field 1 = caption kind ('asr' = automatic captions), field 2 = language code
message = {
    '1': 'asr',
    '2': 'en',
}

typedef = {
    '1': {
        'type': 'string'
    },
    '2': {
        'type': 'string'
    },
}

two = getBase64Protobuf(message, typedef)

# Outer message: field 1 = video ID, field 2 = the base64-encoded inner message
message = {
    '1': VIDEO_ID,
    '2': two,
}

typedef = {
    '1': {
        'type': 'string'
    },
    '2': {
        'type': 'string'
    },
}

params = getBase64Protobuf(message, typedef)

url = 'https://www.youtube.com/youtubei/v1/get_transcript'
headers = {
    'Content-Type': 'application/json'
}
data = {
    'context': {
        'client': {
            'clientName': 'WEB',
            'clientVersion': '2.20240313'
        }
    },
    'params': params
}

data = requests.post(url, headers = headers, json = data).json()
print(json.dumps(data, indent = 4))
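If you would rather not depend on blackboxprotobuf, the two nested messages above only contain length-delimited string fields, so they can be encoded by hand with the standard library. This is a sketch under that assumption: it only handles string fields and emits them in ascending field-number order. encode_string_field and build_transcript_params are hypothetical helpers, not part of any library.

```python
import base64
from typing import Optional


def encode_varint(value: int) -> bytes:
    # Protobuf base-128 varint: 7 bits per byte, high bit set on every byte except the last
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def encode_string_field(field_number: int, text: str) -> bytes:
    # Key = (field_number << 3) | wire_type, where wire type 2 means length-delimited
    payload = text.encode("utf-8")
    key = (field_number << 3) | 2
    return encode_varint(key) + encode_varint(len(payload)) + payload


def build_transcript_params(video_id: str, language: str, kind: Optional[str] = "asr") -> str:
    # Inner message: field 1 = kind (e.g. 'asr'), field 2 = language code
    inner = b""
    if kind is not None:
        inner += encode_string_field(1, kind)
    inner += encode_string_field(2, language)
    inner_b64 = base64.b64encode(inner).decode("ascii")
    # Outer message: field 1 = video ID, field 2 = base64 of the inner message, stored as a string
    outer = encode_string_field(1, video_id) + encode_string_field(2, inner_b64)
    return base64.b64encode(outer).decode("ascii")
```

Passing kind=None mirrors the second script, which only sets the language field. The returned string goes into the 'params' key of the JSON body, as in the scripts.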

Captions in the desired language, if available:

import requests
import blackboxprotobuf
import base64
import json

VIDEO_ID = 'lo0X2ZdElQ4'
LANGUAGE_INITIALS = 'ru'

def getBase64Protobuf(message, typedef):
    # Encode the dict as a protobuf message using the given field types, then base64 it
    data = blackboxprotobuf.encode_message(message, typedef)
    return base64.b64encode(data).decode('ascii')

# Public key used by the YouTube web UI (see the note below), sent as the ?key= query parameter
requestsParams = {
    'key': 'AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8'
}
url = 'https://www.youtube.com/youtubei/v1/get_transcript'
headers = {
    'Content-Type': 'application/json'
}

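# Inner message: only field 2 (language code) is set; no 'asr' kind, unlike the automatic-captions script
message = {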
    '2': LANGUAGE_INITIALS,
}

typedef = {
    '2': {
        'type': 'string'
    },
}

two = getBase64Protobuf(message, typedef)

# Outer message: field 1 = video ID, field 2 = the base64-encoded inner message
message = {
    '1': VIDEO_ID,
    '2': two
}

typedef = {
    '1': {
        'type': 'string'
    },
    '2': {
        'type': 'string'
    }
}

params = getBase64Protobuf(message, typedef)

data = {
    'context': {
        'client': {
            'clientName': 'WEB',
            'clientVersion': '2.20240313'
        }
    },
    'params': params
}

data = requests.post(url, params = requestsParams, headers = headers, json = data).json()
print(json.dumps(data, indent = 4))

Set LANGUAGE_INITIALS to the language code you are interested in.

Note: the key is not a YouTube Data API v3 key. It is the first public key (tested on several computers in different countries) that appears when you run curl https://www.youtube.com/ | grep AIzaSy
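The curl | grep step in the note above can be done from Python too. This sketch assumes Google API keys are 39 characters long, "AIza" followed by 35 URL-safe characters, so the keys matched by grep AIzaSy are "AIzaSy" plus 33 characters. find_api_keys and fetch_homepage are hypothetical helper names, and whether the first key found keeps working is not guaranteed.

```python
import re
import urllib.request

# Assumption: key format is "AIza" + 35 URL-safe characters, i.e. "AIzaSy" + 33 here
API_KEY_PATTERN = re.compile(r"AIzaSy[0-9A-Za-z_-]{33}")


def find_api_keys(page: str) -> list[str]:
    """Return the distinct AIzaSy keys in the page, in order of first appearance."""
    seen = []
    for match in API_KEY_PATTERN.findall(page):
        if match not in seen:
            seen.append(match)
    return seen


def fetch_homepage(url: str = "https://www.youtube.com/") -> str:
    # Same page as `curl https://www.youtube.com/`; requires network access
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")
```

Taking find_api_keys(fetch_homepage())[0] mirrors the "first key" approach described in the note.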

Note: if you are interested in how I reverse-engineered this YouTube feature, say so in the comments and I will write a paragraph explaining it.

  • I tried the curl command but got an error: "405. That’s an error. The request method GET is inappropriate for the URL /youtubei/v1/get_transcript. That’s all we know." Commented Dec 14, 2021 at 3:35
  • When I wrote it, it worked. Testing again now, I first got the same result as you, but on a further retry the error is gone, even over Tor: torsocks curl -s 'https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8' -X POST -H 'Content-Type: application/json' --data-raw "{\"context\":{\"client\":{\"clientName\":\"WEB\",\"clientVersion\":\"2.2021111\"}},\"params\":\"$(printf '\n\x0bWdy4YBULvdo' | base64)\"}" Can you confirm? Commented Dec 14, 2021 at 16:05
  • That seems to work. I'll need to figure out how to adapt it to my application. It looks like all I have to do is replace the key with mine, but I'm not sure where to insert my video ID. Is it "x0bWdy4YBULvdo"? Commented Dec 14, 2021 at 19:30
  • It's not a YouTube Data API key; it's the key used by the YouTube web UI. I simply opened my web browser's Network tab, loaded captions on a video, and reverse-engineered how the requests worked. Commented Jun 9, 2022 at 11:15
  • @John_Sheares On Linux Mint 21.1 (curl 7.81.0), I ran the command you shared (without the -v) and got a response containing the expected subtitles. That said, I would recommend using youtube-dl or yt-dlp to retrieve captions, as they provide this feature and are well-established tools. Commented Mar 8, 2023 at 21:13
4

I recommend that anyone using Python try the youtube_transcript_api module. I used to send a GET request to https://video.google.com/timedtext?lang=en&v={videoId}, but now the page is blank. Here is a code example; this method does not need an API key either.

from youtube_transcript_api import YouTubeTranscriptApi

# Returns caption segments (dicts with 'text', 'start' and 'duration' keys), not SRT text
transcript = YouTubeTranscriptApi.get_transcript("videoId", languages=['en'])
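get_transcript returns caption segments rather than SRT text. In the versions I am aware of, each segment is a dict with 'text', 'start' and 'duration' keys, with times in seconds; treat that shape as an assumption and check it against the version you install. If you need an actual SRT file, a small formatter such as the hypothetical to_srt below can convert the list.

```python
def format_timestamp(seconds: float) -> str:
    # SRT timestamps look like HH:MM:SS,mmm
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[dict]) -> str:
    """Turn [{'text', 'start', 'duration'}, ...] segments into SRT text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = seg["start"]
        end = start + seg["duration"]
        blocks.append(
            f"{index}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{seg['text']}\n"
        )
    # SRT cues are separated by a blank line
    return "\n".join(blocks)
```

Pass the list returned by get_transcript to to_srt and write the result to a .srt file.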
2

The YouTube change around captions caused me a lot of hassle, which I worked around by using youtube-dl. youtube-dl won GitHub's legal support and is again available to download or clone.

The software is available as a source or binary download for all major platforms; details are on their GitHub page.

Sample use is this simple:

youtube-dl --write-sub --sub-lang en --skip-download --sub-format vtt https://www.youtube.com/watch?v=E-lZ8lCG7WY
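The command writes a .vtt subtitle file. To get plain caption text out of it in Python, a minimal WebVTT reader like the sketch below is often enough. It is a simplification: it only looks at cue timing lines containing "-->", drops cue settings after the end time, and strips inline tags such as c or timestamp tags, which may appear in auto-generated tracks. parse_vtt is a hypothetical helper, not part of youtube-dl.

```python
import re

# Inline markup such as <c>, </c> or <00:00:01.000> timestamp tags
TAG_RE = re.compile(r"<[^>]+>")


def parse_vtt(text: str) -> list[tuple[str, str, str]]:
    """Return (start, end, caption) tuples from WebVTT text."""
    cues = []
    # Cues are separated by blank lines; normalise Windows line endings first
    for block in re.split(r"\n\s*\n", text.replace("\r\n", "\n")):
        lines = block.strip().split("\n")
        for i, line in enumerate(lines):
            if "-->" not in line:
                continue  # header lines, cue identifiers, NOTE blocks
            start, _, rest = line.partition("-->")
            end = rest.strip().split(" ")[0]  # drop cue settings such as "align:start"
            parts = (TAG_RE.sub("", raw).strip() for raw in lines[i + 1:])
            cues.append((start.strip(), end, " ".join(p for p in parts if p)))
            break
    return cues
```

For example, parse_vtt(open(path, encoding="utf-8").read()) on the downloaded file returns one tuple per cue.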
1

This is a working Python implementation of the curl command from Benjamin Loison's answer. Replace ZhT6BeHNmvo with your video ID.

import base64
import json
import requests

# Protobuf field 1 ("\n" = 0x0a) of length 11 ("\v" = 0x0b), followed by the 11-character video ID
base64_string = base64.b64encode("\n\vZhT6BeHNmvo".encode("utf-8")).decode("utf-8")

headers = {
    "Content-Type": "application/json",
}

body = json.dumps(
    {
        "context": {"client": {"clientName": "WEB", "clientVersion": "2.9999099"}},
        "params": base64_string,
    }
)

response = requests.post(
    "https://www.youtube.com/youtubei/v1/get_transcript?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8",
    headers=headers,
    data=body,
)

print(response.text)
  • I don't think it returns the expected result. It returns JSON, but there are no captions in it.
    – Yusuf
    Commented Apr 28 at 17:01
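When the response comes back as JSON with no obvious caption text, as the comment above reports, it can help to search the whole structure for the segment objects instead of guessing their path. The recursive helper below yields every value stored under a given key at any depth. The key name transcriptSegmentRenderer and the snippet/runs layout used by runs_text are assumptions about how the web UI's response has looked; inspect your own response to confirm them.

```python
from typing import Any, Iterator


def find_key(node: Any, key: str) -> Iterator[Any]:
    """Yield every value stored under `key` anywhere in nested dicts and lists."""
    if isinstance(node, dict):
        for k, value in node.items():
            if k == key:
                yield value
            yield from find_key(value, key)
    elif isinstance(node, list):
        for item in node:
            yield from find_key(item, key)


def runs_text(renderer: dict) -> str:
    # Assumption: segment text is split into {"snippet": {"runs": [{"text": ...}, ...]}}
    return "".join(run.get("text", "") for run in renderer.get("snippet", {}).get("runs", []))
```

With data = response.json(), the list [runs_text(s) for s in find_key(data, "transcriptSegmentRenderer")] contains the caption lines if the assumption holds; an empty list suggests the response really has no segments.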
0

The old API currently returns 404 for every request. YouTube now uses a new version of this API:

https://www.youtube.com/api/timedtext?v={youtubeVideoId}&asr_langs=de%2Cen%2Ces%2Cfr%2Cid%2Cit%2Cja%2Cko%2Cnl%2Cpt%2Cru%2Ctr%2Cvi&caps=asr&exp=xftt%2Cxctw&xoaf=5&hl=en&ip=0.0.0.0&ipbits=0&expire=1637102374&sparams=ip%2Cipbits%2Cexpire%2Cv%2Casr_langs%2Ccaps%2Cexp%2Cxoaf&signature=0BEBD68A2638D8A18A5BC78E1851D28300247F93.7D5E6D26397D8E8A93F65CCA97260D090C870462&key=yt8&kind=asr&lang=en&fmt=json3
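To see what this URL is made of, it can be taken apart with urllib.parse. Reading sparams as the list of parameters covered by the signature, and expire as a Unix timestamp after which the URL stops being accepted, are inferences from the parameter names rather than documented behaviour. describe_timedtext_url is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def describe_timedtext_url(url: str) -> dict:
    # parse_qs decodes %2C into "," and returns a list per key; keep the first value
    query = {key: values[0] for key, values in parse_qs(urlparse(url).query).items()}
    expire: Optional[datetime] = None
    if "expire" in query:
        # Assumption: expire is a Unix timestamp in seconds
        expire = datetime.fromtimestamp(int(query["expire"]), tz=timezone.utc)
    return {
        "video_id": query.get("v"),
        "lang": query.get("lang"),
        "kind": query.get("kind"),
        "format": query.get("fmt"),
        # Assumption: sparams names the parameters covered by the signature
        "signed_params": query["sparams"].split(",") if "sparams" in query else [],
        "expires_utc": expire,
        "has_signature": "signature" in query,
    }
```

For the URL above, signed_params is ip, ipbits, expire, v, asr_langs, caps, exp and xoaf, and the expire value corresponds to 16 November 2021, 22:39:34 UTC, which is consistent with the URL no longer working.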

The main problem with this API is calculating the request's signature field. Unfortunately, I couldn't find the algorithm. Maybe someone can reverse-engineer it from the YouTube player.

  • That link doesn't work. Commented Dec 14, 2021 at 3:13
  • What does the signature parameter mean? I see it also has an expire parameter; will this URL work forever?
    – Nam Lee
    Commented Sep 25, 2022 at 4:34
  • It is not possible to get subtitles for many videos this way. How can you deploy something with such a link? :)
    – Nam Lee
    Commented Sep 27, 2022 at 17:21
  • @NamLêQuý The signature is calculated from all parameters, including expire, plus some unknown secret data. The signature parameter allows the backend to validate requests. Commented Sep 28, 2022 at 11:43
  • @NamLêQuý This request is used by YouTube itself. Anyone can check it in the browser's developer tools. Commented Sep 28, 2022 at 11:44
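If you copy a working URL from the browser, as the last comment suggests, the fmt=json3 response can be flattened into plain text. This sketch assumes the commonly reported json3 layout: a top-level "events" list whose entries carry tStartMs, dDurationMs and a "segs" list of {"utf8": ...} pieces. Those field names are assumptions, and parse_json3 is a hypothetical helper.

```python
import json


def parse_json3(body: str) -> list[tuple[int, int, str]]:
    """Return (start_ms, duration_ms, text) for each caption event that carries text."""
    cues = []
    for event in json.loads(body).get("events", []):
        segs = event.get("segs")
        if not segs:
            continue  # e.g. window or styling events that carry no text
        text = "".join(seg.get("utf8", "") for seg in segs).strip()
        if text:
            cues.append((event.get("tStartMs", 0), event.get("dDurationMs", 0), text))
    return cues
```

Events whose text is only whitespace (such as bare line breaks) are skipped, so the result holds only cues with visible text.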
