> wget and proxy settings, http://blog.csdn.net/rubysolution/archive/2006/09/12/1214205.aspx
猫猫草
post 2009-07-04 14:13:36, Sat
Post #1


猫猫猫
***

Group: Power Cat
Posts: 626
Joined: 2006-12-8
Member No.: 2



Code
1. Set the proxy in the bash shell
.bashrc:
export http_proxy="166.111.53A.167:3128"
export ftp_proxy="166.111.53A.167:3128"
# Don't ask me why the proxy above is invalid, it's just an example ^_*
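Exporting the variables in .bashrc applies the proxy to every program started from that shell. If you only want it for one command, you can set the variables in that command's environment instead. A minimal sketch, assuming bash; the helper name `with_proxy` is hypothetical, and the address is the author's example, which may not be a working proxy:

```shell
#!/usr/bin/env bash
# Placeholder proxy (host:port) from the example above; it may not work.
PROXY="166.111.53.167:3128"

# Hypothetical helper: run one command with the proxy variables set only in
# that command's environment; the calling shell is left unchanged.
with_proxy() {
  http_proxy="$PROXY" https_proxy="$PROXY" ftp_proxy="$PROXY" "$@"
}

# Show what the child process sees (no network access needed).
with_proxy env | grep '^[a-z]*_proxy='
```

For example, `with_proxy wget http://www.hq.nasa.gov/office/pao/History/SP-468/contents.htm` would fetch that page through the proxy without touching .bashrc.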

2. For wget alone, you can create a separate .wgetrc:
http-proxy = 166.111.53.167:3128
ftp-proxy = 166.111.53.167:3128
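To try proxy settings without editing a personal ~/.wgetrc, you can write them to another file and point wget at it through the WGETRC environment variable, which wget loads instead of ~/.wgetrc when it is set. wgetrc command names ignore case, dashes and underscores, so `http_proxy` here is the same command as `http-proxy`. A sketch under those assumptions; the temporary file and the `use_proxy = on` line (proxy use is already on by default) are additions for illustration, and the address is the author's example:

```shell
#!/usr/bin/env bash
set -euo pipefail

PROXY="166.111.53.167:3128"        # author's example address; may not work
rcfile="$(mktemp)"                 # temporary file instead of ~/.wgetrc
trap 'rm -f "$rcfile"' EXIT        # clean up when the script exits

# wgetrc syntax: one "command = value" per line; '#' starts a comment.
cat > "$rcfile" <<EOF
# Proxy settings for wget
use_proxy = on
http_proxy = $PROXY
ftp_proxy = $PROXY
EOF

cat "$rcfile"
# To use it (not run here):
#   WGETRC="$rcfile" wget http://www.hq.nasa.gov/office/pao/History/SP-468/contents.htm
```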

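3. Download an entire site with wget
#wget -k -m -np -d --proxy-user=username --proxy-password=passwd http://www.hq.nasa.gov/office/pao/History/SP-468/contents.htm
-k, --convert-links: after downloading, convert links so the pages can be browsed locally (links to downloaded files become relative).
-m, --mirror: equivalent to recursive download + don't re-fetch a file unless the remote copy is newer + unlimited recursion depth + keep the ".listing" files (i.e. -r -N -l inf --no-remove-listing).
-np, --no-parent: never ascend to the parent directory.
Note that -d only prints debug output; replace it with -q for a "quiet" download.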
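-m is shorthand for -r -N -l inf --no-remove-listing, so the mirror command can also be assembled with -m expanded. A dry-run sketch: the helper `mirror_args` is hypothetical and only prints the wget arguments, one per line, without contacting any server; the username and password are placeholders.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the wget arguments for mirroring URL through an
# authenticating proxy, one per line, without running wget.
mirror_args() {
  local url="$1" user="$2" pass="$3"
  local -a args=()
  args+=(-k)                                # --convert-links
  args+=(-r -N -l inf --no-remove-listing)  # same as -m (--mirror)
  args+=(-np)                               # --no-parent
  args+=(-d)                                # debug output; use -q to run quietly
  args+=(--proxy-user="$user" --proxy-password="$pass")
  args+=("$url")
  printf '%s\n' "${args[@]}"
}

mirror_args http://www.hq.nasa.gov/office/pao/History/SP-468/contents.htm username passwd
# To actually run it (bash 4+, not run here):
#   mapfile -t a < <(mirror_args URL username passwd)
#   wget "${a[@]}"
```

Keeping the arguments in an array preserves each one as a single word even if a password contains spaces. Note that a password given on the command line is visible to other users via `ps`; the wgetrc commands `proxy_user` and `proxy_password` are an alternative.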

Two more options may also come in handy:
-b: run wget in the background
-c: resume a partially downloaded file instead of starting over
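Per the wget manual, -c treats an existing local file of the same name as the first part of the remote file and asks the server to continue from an offset equal to the local file's length; over HTTP that is a Range request. The sketch below only computes that header for illustration (the helper `resume_range` is hypothetical; wget builds the request itself), and the comment at the end shows a typical way to combine -b and -c with a log file, using a placeholder URL:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: print the HTTP Range header a resumed download would
# need, based on how many bytes of FILE are already on disk.
# Prints nothing if FILE does not exist (a fresh download).
resume_range() {
  local file="$1"
  if [ -f "$file" ]; then
    local size
    size=$(( $(wc -c < "$file") ))   # arithmetic strips wc's padding
    printf 'Range: bytes=%d-\n' "$size"
  fi
}

# Typical background + resume invocation (not run here). With -b and no -o,
# wget writes its messages to ./wget-log instead.
#   wget -b -c -o download.log http://example.com/big.iso
#   tail -f download.log
```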
Replies
猫猫草
post 2016-12-02 08:59:13, Fri
Post #2





Crawling an entire site:

wget -r -p -np -k -E

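-r recursive retrieval
-p download everything a page needs to display (images, stylesheets, etc.)
-np don't ascend to the parent directory
-k fix up links after downloading, for local browsing
-E add the .html extension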
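-E (--adjust-extension) matters for pages served from URLs such as `article.cgi?25`. Per the wget manual, when an HTML file (text/html or application/xhtml+xml) is saved and its name does not end in .htm or .html (in any case), wget appends .html so the local copy opens in a browser. The hypothetical helper below only mimics that naming rule for illustration; it is not wget's code.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper mimicking the -E naming rule (illustration only):
# append ".html" to an HTML file's local name unless it already ends in
# .htm or .html, in any letter case.
adjusted_name() {
  local name="$1" content_type="$2"
  local re='\.[Hh][Tt][Mm][Ll]?$'
  case "$content_type" in
    text/html*|application/xhtml+xml*)
      if ! [[ "$name" =~ $re ]]; then
        name="$name.html"
      fi
      ;;
  esac
  printf '%s\n' "$name"
}

adjusted_name 'article.cgi?25' 'text/html'   # article.cgi?25.html
adjusted_name 'index.HTM' 'text/html'        # index.HTM (unchanged)
adjusted_name 'logo.png' 'image/png'         # logo.png (not HTML)
```

A complete crawl then looks like `wget -r -p -np -k -E http://example.com/docs/`, where the URL is a placeholder.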

Posts in this topic
猫猫草   wget and proxy settings   2009-07-04 14:13:36, Sat
猫猫草   RE: wget and proxy settings   2010-11-23 22:48:43, Tue
猫猫草   Ignore robots.txt: -e robots=off   2016-04-24 23:42:09, Sun
猫猫草   Crawling an entire site: wget -r -p -np -k -E -r recursive...   2016-12-02 08:59:13, Fri

