Scraping a novel with Python

fancyboy2050

Blog category: python

Tags: python, html

I'm just getting started. I'm using Python 3 and have installed BeautifulSoup, a Python library for parsing HTML.
Project page: http://www.crummy.com/software/BeautifulSoup/
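
To make the parsing step concrete, here is a minimal sketch that runs BeautifulSoup on an inline HTML string instead of a live page. The markup and the helper name extract_paragraphs are made up for illustration; the markup only assumes what the script appears to expect, namely paragraphs inside an element with id="chapterContent", each with its text followed by a <span>.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an assumption about the page layout, not a real page.
SAMPLE_HTML = """
<div id="chapterContent">
  <p>First paragraph.<span class="watermark">site</span></p>
  <p>Second paragraph.<span class="watermark">site</span></p>
  <p>No span here, so this one is skipped.</p>
</div>
"""


def extract_paragraphs(html):
    """Return the text node that precedes the <span> in each <p>."""
    soup = BeautifulSoup(html, "html.parser")
    paragraphs = []
    for content in soup.find_all(id="chapterContent"):
        for p in content.find_all("p"):
            # p.span is None when the paragraph has no <span> child.
            if p.span is not None:
                paragraphs.append(str(p.span.previous_sibling))
    return paragraphs


paragraphs = extract_paragraphs(SAMPLE_HTML)
for text in paragraphs:
    print(text)
```

Passing "html.parser" explicitly selects the parser from the standard library, so the result does not depend on which optional parsers happen to be installed.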

from bs4 import BeautifulSoup as bs
import urllib.request


def parsechapter(url, out):
    # Download one chapter page and decode it as UTF-8.
    with urllib.request.urlopen(url) as data:
        dataDecode = data.read().decode('utf-8')
    soup = bs(dataDecode, 'html.parser')
    # The chapter text sits in <p> tags inside the element with id="chapterContent".
    for content in soup.find_all(id="chapterContent"):
        for nc in content.find_all("p"):
            # The paragraph text is the node right before the <span>;
            # skip paragraphs that have no <span>.
            if nc.span is not None:
                print(nc.span.previous_sibling, file=out)


try:
    a_file = open("test.txt", mode="a", encoding="utf-8")
    showchapter_url = 'http://book.zongheng.com/showchapter/189169.html'
    # Download the table-of-contents page.
    with urllib.request.urlopen(showchapter_url) as chapterData:
        chapterDataDecode = chapterData.read().decode('utf-8')

    chapterDataSoup = bs(chapterDataDecode, 'html.parser')
    # Every chapter link lives inside <div class="booklist">.
    for chapters in chapterDataSoup.find_all("div", attrs={'class': "booklist"}):
        for chapter in chapters.find_all("a"):
            print(chapter.get_text(), file=a_file)  # chapter title
            parsechapter(chapter['href'], a_file)   # chapter body
except IOError:
    print('file error!')
finally:
    if 'a_file' in locals():
        a_file.close()
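
The script passes chapter['href'] straight to urlopen, which only works when the links on the table-of-contents page are absolute. As a hedged sketch, urllib.parse.urljoin from the standard library can resolve each link against the page it came from. The helper name absolute_chapter_url and the relative path in the example are hypothetical; whether the site actually uses relative links is not known here.

```python
from urllib.parse import urljoin

# The table-of-contents URL used by the script.
INDEX_URL = 'http://book.zongheng.com/showchapter/189169.html'


def absolute_chapter_url(index_url, href):
    """Resolve a chapter link against the page it was found on.

    Absolute links come back unchanged; relative links are
    completed with the scheme and host of index_url.
    """
    return urljoin(index_url, href)


# Hypothetical relative link, for illustration only.
relative_result = absolute_chapter_url(INDEX_URL, '/chapter/189169/1.html')
# An absolute link is returned as-is.
absolute_result = absolute_chapter_url(INDEX_URL, 'http://book.zongheng.com/chapter/189169/2.html')
print(relative_result)
print(absolute_result)
```

In the script, this would amount to calling parsechapter(absolute_chapter_url(showchapter_url, chapter['href']), a_file) inside the inner loop.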

2012-11-07 14:29
Category: Programming languages

发表评论

您还没有登录,请您登录后再发表评论

最近访客更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

python抓取小说

评论

发表评论

相关推荐

最近访客 更多访客>>

博主相关

文章分类

社区版块

存档分类

最新评论

python抓取小说

评论

发表评论

相关推荐

python-Processing data

python-save data

Python内置数据类型

python正则

python字符串格式化转换类型

最近访客更多访客>>