Downloading videos in FLV format from YouTube

urllib2 youtube download python

Jakob Bowyer · 15 years ago

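I can't really work out how YouTube serves its videos, though I have been reading as much as I can. It seems the old method of getting at a video is now out of date and can no longer be used, so I am asking: is there another Pythonic, simple way to fetch YouTube videos?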

3 answers | last activity 10 years ago

Hank · 15 years ago

You might have some luck with youtube-dl:

http://rg3.github.com/youtube-dl/documentation.html

I'm not sure whether it has a good API, but it's written in Python, so in theory you could do something a little better than popen :)
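For reference, the popen-style approach mentioned above might look like the sketch below. It assumes the `youtube-dl` executable is installed and on `PATH`; `build_command` and `download` are hypothetical helper names, and the code is written for Python 3 rather than the Python 2 used elsewhere in this thread. `-o` is youtube-dl's output filename option.

```python
import subprocess


def build_command(url, dest):
    """Return the argument list for a youtube-dl invocation."""
    # -o sets the output filename; passing a list (not a string)
    # avoids any shell quoting problems with '?' and '&' in the URL.
    return ["youtube-dl", "-o", dest, url]


def download(url, dest):
    """Run youtube-dl and return its exit code (0 means success)."""
    return subprocess.call(build_command(url, dest))
```

Calling `download('http://www.youtube.com/watch?v=Je_iqbgGXFw', 'stevegadd.flv')` would then hand the whole job to youtube-dl, at the cost of having no access to its internals.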

samplebias · 15 years ago

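Here is a quick Python script that downloads a YouTube video. No bells and whistles: it just scrapes out the necessary URLs, hits the generate_204 URL, and then streams the data to a file: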

import lxml.html
import re
import sys
import urllib
import urllib2

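# raw strings keep the regex escapes intact; the closing quote sits outside
# the capture group so it does not end up in the extracted URL
_RE_G204 = re.compile(r'"(http:.+.youtube.com.*\/generate_204[^"]+)"', re.M)
_RE_URLS = re.compile(r'"fmt_url_map": "(\d*[^"]+)",.*', re.M)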

def _fetch_url(url, ref=None, path=None):
    # fetch url, optionally sending a Referer header; return the body,
    # or stream it to path when one is given
    headers = {}
    if ref:
        headers['Referer'] = ref
    request = urllib2.Request(url, headers=headers)
    handle = urllib2.urlopen(request)
    if not path:
        return handle.read()
    sys.stdout.write('saving: ')
    # write result to file
    with open(path, 'wb') as out:
        while True:
            part = handle.read(65536)
            if not part:
                break
            out.write(part)
            sys.stdout.write('.')
            sys.stdout.flush()
        sys.stdout.write('\nFinished.\n')

def _extract(html):
    tree = lxml.html.fromstring(html)
    res = {'204': _RE_G204.findall(html)[0].replace('\\', '')}
    for script in tree.findall('.//script'):
        text = script.text_content()
        if 'fmt_url_map' not in text:
            continue
        # found it, extract the urls we need
        for tmp in _RE_URLS.findall(text)[0].split(','):
            url_id, url = tmp.split('|')
            res[url_id] = url.replace('\\', '')
        break
    return res

def main():
    target = sys.argv[1]
    dest = sys.argv[2]
    html = _fetch_url(target)
    res = dict(_extract(html))
    # hit the 'generate_204' url first and remove it
    _fetch_url(res['204'], ref=target)
    del res['204']
    # download the video: dict order is arbitrary, so this simply uses
    # whichever format URL happens to come first
    first = res.values()[0]
    _fetch_url(first, ref=target, path=dest)

if __name__ == '__main__':
    main()

Running it:

python youdown.py 'http://www.youtube.com/watch?v=Je_iqbgGXFw' stevegadd.flv
saving: ...........................
Finished.
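To make the parsing step in `_extract` easier to follow, here is a small standalone sketch of how a `fmt_url_map` value is split into a dictionary of format id to URL. The function name `parse_fmt_url_map` and the sample string are made up for illustration (the sample is not real YouTube output), the regex is a simplified variant of `_RE_URLS`, and the code targets Python 3.

```python
import re

# Simplified variant of _RE_URLS: capture the quoted value of "fmt_url_map".
_RE_URLS = re.compile(r'"fmt_url_map": "([^"]+)"')


def parse_fmt_url_map(text):
    """Return {format_id: url} from text containing a fmt_url_map entry."""
    match = _RE_URLS.search(text)
    if match is None:
        return {}
    urls = {}
    # Entries are comma-separated; each one is "<format id>|<escaped url>".
    for entry in match.group(1).split(','):
        fmt_id, url = entry.split('|', 1)
        # The page escapes slashes as "\/", so drop the backslashes.
        urls[fmt_id] = url.replace('\\', '')
    return urls


# Hypothetical sample in the shape the script expects.
sample = (r'var cfg = {"fmt_url_map": '
          r'"34|http:\/\/v1.example.com\/a,5|http:\/\/v2.example.com\/b", '
          r'"x": 1};')
print(parse_fmt_url_map(sample))
```

Like the script, this sketch would mis-split a URL that itself contains a comma, so treat it as an illustration of the data shape rather than a robust parser.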

gok · 15 years ago

I'd suggest writing your own parser with urllib2 or BeautifulSoup. You can look at the source code of DownThemAll to see how that plugin finds the video URL.
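As a rough idea of what a hand-written parser could start from, the sketch below collects the text of every `<script>` element and keeps those containing a marker string. It deliberately swaps in Python 3's standard-library `html.parser` in place of BeautifulSoup so that it has no third-party dependencies; `ScriptCollector`, `scripts_containing`, and the marker value are hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collect the text content of every <script> element in a page."""

    def __init__(self):
        super().__init__()
        self._in_script = False
        self.scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self._in_script = True
            self.scripts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Data may arrive in several chunks, so append to the current script.
        if self._in_script:
            self.scripts[-1] += data


def scripts_containing(html, marker):
    """Return the script bodies in html that contain marker."""
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return [s for s in collector.scripts if marker in s]
```

The matching script bodies can then be searched with a regular expression for whatever URL field the page currently exposes.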
