
Does iconv in Ruby 1.8.6p287 on Windows have a bug??

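For the past couple of days I have been writing a crawler and needed iconv to convert character encodings. Yesterday I spent the day solving a problem that does not exist..
Well, what I mean is that the problem was someone else's, not mine.
Iconv.iconv("UTF-8//IGNORE","GB18030//IGNORE",str) still failed and aborted the program. What I kept overlooking was that the failure printed no Ruby exception at all, only a string like "\277". I searched for a whole day and found nothing. Thinking it over afterwards, the output looked off: could the problem be in the C layer underneath iconv? So I switched to Ubuntu and tried REE 1.8.7, and the problem went away...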
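For anyone on a newer Ruby: Iconv was removed from the standard library in Ruby 2.0, so the call above no longer exists there. A rough equivalent can be sketched with String#encode instead. This is a different technique from the Iconv call in this post, the helper name gb_to_utf8 is made up for illustration, and the sample bytes are my own (the GBK bytes for "作者" plus a stray lead byte like the "\277" above).

```ruby
# Sketch for Ruby 1.9+ using String#encode; this is NOT the Iconv call from the post.
# gb_to_utf8 is a hypothetical helper name.
def gb_to_utf8(bytes)
  # Tag the raw bytes as GB18030, then transcode to UTF-8,
  # dropping anything invalid or unmappable (similar in spirit to //IGNORE).
  bytes.dup.force_encoding(Encoding::GB18030)
       .encode(Encoding::UTF_8, invalid: :replace, undef: :replace, replace: '')
end

good = "\xD7\xF7\xD5\xDF".b   # "作者" encoded as GBK/GB18030
bad  = good + "\xBF".b        # same text followed by a lone lead byte (octal \277)

puts gb_to_utf8(good)
puts gb_to_utf8(bad)
```

Both calls should print the same two characters, because the incomplete trailing byte is dropped rather than raised as an error.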
========================================================

There is another strange problem as well...
begin
  @topic_page = @topic_page.link_with(:text => "#{@page}").click
rescue Net::HTTPInternalServerError
  retry
end

Even with this in place it still raises Net::HTTPInternalServerError, which makes no sense at all...
=======================================================
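Frustrating. After all that fiddling, it turns out Net::HTTPInternalServerError is not an Error at all: it is not a subclass of StandardError. It is a response class under Net::HTTPResponse, so the rescue clause above never matched.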
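A quick way to check this without a server is to look at the class hierarchy and to try rescuing with the class. The sketch below only uses net/http from the standard library; rescued_by? is a throwaway helper name made up for this example. A rescue clause accepts any class or module and matches with ===, which is why the earlier code did not complain and simply never matched.

```ruby
require 'net/http'

klass = Net::HTTPInternalServerError

puts klass.superclass                             # Net::HTTPServerError
puts klass.ancestors.include?(Net::HTTPResponse)  # true
puts klass.ancestors.include?(Exception)          # false

# Hypothetical helper: does `rescue klass` catch an ordinary RuntimeError?
def rescued_by?(klass)
  begin
    raise 'simulated failure'   # a plain RuntimeError
  rescue klass
    true
  end
rescue StandardError
  # The inner rescue did not match, so the error propagated to here.
  false
end

puts rescued_by?(RuntimeError)                    # true
puts rescued_by?(Net::HTTPInternalServerError)    # false
```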

I started a server of my own, put a raise '' in the index method of the articles controller, and then hit it with the following code:
require 'mechanize'

agent=WWW::Mechanize.new
begin
  page = agent.get 'http://localhost:3000/admin/articles'
rescue =>err
  puts "#{err.class}##{err}"
end


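Output: WWW::Mechanize::ResponseCodeError#500 => Net::HTTPInternalServerError
So that's how it is.... Do the exception messages printed by rake tasks never include the class name? If so, the fix for the first error above was also a lucky accident...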
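To make the confusion concrete, here is a small sketch with a stand-in exception. FakeResponseCodeError and describe are invented for illustration and are not Mechanize's real classes; the message format simply mimics the output above. Printing only the message looks like a class name, while printing the class as well shows what was really raised.

```ruby
# Hypothetical stand-in for an exception whose message alone is misleading.
class FakeResponseCodeError < RuntimeError
  def initialize(code, response_class)
    super("#{code} => #{response_class}")
  end
end

# Hypothetical helper: always report the exception class next to the message.
def describe(err)
  "#{err.class}: #{err.message}"
end

begin
  raise FakeResponseCodeError.new(500, 'Net::HTTPInternalServerError')
rescue => err
  puts err.message      # message only, easy to mistake for the exception class
  puts describe(err)    # class plus message
end
```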

Here is a beefed-up version, wahahaha:
def self.access(target)
  begin
    if target.respond_to? 'click'
      url = target.href
      target.click
    else
      url = target
      @@agent.get(target)
    end
  rescue WWW::Mechanize::ResponseCodeError => err
    # FIXME: replace puts with logging later
    puts "#{err}, caused by accessing: #{url}"
    sleep 10
    retry
  end
end
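One thing to watch: the method above retries forever if the server keeps answering 500. A bounded variant could look roughly like the sketch below. with_retries and its parameters (max_attempts, wait, on) are names made up here, not part of Mechanize, and the failure is simulated with IOError so the example runs without a network.

```ruby
# Hypothetical helper: retry a block a limited number of times.
def with_retries(max_attempts: 3, wait: 0, on: StandardError)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue on => err
    # Give up and re-raise once the attempt budget is spent.
    raise if attempts >= max_attempts
    warn "#{err.class}: #{err.message} (attempt #{attempts}), retrying"
    sleep wait
    retry
  end
end

calls = 0
result = with_retries(max_attempts: 3) do
  calls += 1
  raise IOError, 'simulated 500' if calls < 3   # fail twice, then succeed
  :ok
end
puts "result=#{result} after #{calls} calls"
```

In the crawler, the block would wrap target.click or @@agent.get(target), with on: set to WWW::Mechanize::ResponseCodeError and wait: set to something like the 10 seconds used above.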
Comments
#10 Hooopo 2010-01-02
Er.. that one does have a problem..
#9 yuan 2010-01-02
Hooopo wrote:
Iconv.conv 'UTF-8//IGNORE', 'GBK//IGNORE', str  
No problem at all

Try the code in #6
#8 Hooopo 2010-01-02
Nothing wrong at all...
#7 Hooopo 2010-01-02
Iconv.conv 'UTF-8//IGNORE', 'GBK//IGNORE', str  
No problem at all
#6 yuan 2010-01-02
Fx, here is the repro you asked for:
require 'hpricot'
require 'open-uri'
require 'iconv'

def gbk_to_utf8(str)
  Iconv.iconv 'UTF-8//IGNORE', 'GB18030//IGNORE', str
end
doc = Hpricot open('http://www.tianya.cn/new/publicforum/content.asp?stritem=free&idarticle=1693330&part=0&flag=1')
tables = doc.at('#pContentDiv').search('table')
tables.each do |table|
  puts "Author: #{gbk_to_utf8(table.at('a').inner_text)}"
end
#5 yuan 2010-01-01
RednaxelaFX wrote:
Could you post a minimal repro so we can take a look?

I misread that as "report".. what is a repro?
#4 yuan 2010-01-01
Line 238:
if head.at('a') && GBK_TO_UTF8.iconv(head.at('a').inner_text)==@topic.author
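#3 yuan 2010-01-01
RednaxelaFX wrote:
Could you post a minimal repro so we can take a look?

I posted a temporary thread yesterday and deleted it later. Its content looked like this:
Quote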
E:\yuan\workspace\rails\tieku>rake spider:scrape SITE=tianya --trace
(in E:/yuan/workspace/rails/tieku)
** Invoke spider:scrape (first_time)
** Invoke environment (first_time)
** Execute environment
** Execute spider:scrape
Entering site: 天涯社区
Entering forum: 澶╂动鏉傝皥,uri:http://www.tianya.cn/publicforum/articleslist/0/free.shtml
creating topic...............
Title: 姝诲垜鐘€斺€旀鍒戠姱琛屽垜鍓嶇殑涓嶇湢澶湥瑄ri:http://www.tianya.cn/publicforum/content/free/1/1676828.shtml
Visiting page 2
Visiting page 3
Visiting page 4
Visiting page 5
Visiting page 6
Visiting page 7
Visiting page 8
............
Visiting page 45
Visiting page 46
Visiting page 47
Visiting page 48
Visiting page 49
rake aborted!
"\300"
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:238:in `iconv'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:238:in `read_rest_posts'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:237:in `each'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:237:in `read_rest_posts'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:200:in `read_posts_of_current_page'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:167:in `create_topic'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:140:in `process'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:123:in `iterate_hot_topics'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:120:in `each'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:120:in `iterate_hot_topics'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:91:in `iterate_pages'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:80:in `process'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:59:in `iterate_forums'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:56:in `each'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:56:in `iterate_forums'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:47:in `process'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:19:in `dispatch'
E:/yuan/workspace/rails/tieku/lib/tasks/scrape.rake:288
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:636:in `call'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:636:in `execute'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:631:in `each'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:631:in `execute'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:597:in `invoke_with_call_chain'
d:/ruby/lib/ruby/1.8/monitor.rb:242:in `synchronize'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:590:in `invoke_with_call_chain'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:583:in `invoke'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2051:in `invoke_task'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2029:in `top_level'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2029:in `each'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2029:in `top_level'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2068:in `standard_exception_handling'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2023:in `top_level'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2001:in `run'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:2068:in `standard_exception_handling'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/lib/rake.rb:1998:in `run'
d:/ruby/lib/ruby/gems/1.8/gems/rake-0.8.7/bin/rake:31
d:/ruby/bin/rake:19:in `load'
d:/ruby/bin/rake:19
#2 Hooopo 2010-01-01
Yeah, post one of those…
I think it's some other problem…
#1 RednaxelaFX 2010-01-01
Could you post a minimal repro so we can take a look?
