problem with NZBindex RSS feed
Posted: June 8th, 2014, 4:32 am
by sander
I'm trying to add the NZBindex RSS feed
http://www.nzbindex.nl/rss/?q=hockey&so ... esc&max=25 to SAB. SAB says:
I can't find any "hr" in the resulting XML, so I'm stuck.
According to http://wiki.sabnzbd.org/nzb-sources, nzbindex.nl is supported, so I would say this is a bug somewhere between nzbindex and SABnzbd.
EDIT:
Ah, I think I know the cause:
That RSS URL sometimes says:
503 Service Temporarily Unavailable
nginx
... of which the source is:
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body bgcolor="white">
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
... and there's the <HR>!
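To see why an XML parser stumbles on this page, here is a minimal Python 3 sketch that feeds the nginx page above to the standard-library `xml.etree.ElementTree` parser. It is an illustration only, not SAB's code path (SAB goes through feedparser), and the helper name `try_parse_as_xml` is hypothetical. The unclosed `<hr>` is fine in HTML but not well-formed XML, so parsing fails when `</body>` arrives while `<hr>` is still open; the wording of the error differs from the message SAB logs.

```python
import xml.etree.ElementTree as ET

# Body of the nginx 503 page quoted above (padding comments omitted)
NGINX_503 = """<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body bgcolor="white">
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>"""


def try_parse_as_xml(text):
    """Return the root element, or None if text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        # err.position holds the (line, column) where the parser gave up
        print("XML parse error:", err)
        return None


root = try_parse_as_xml(NGINX_503)
```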
A manual session shows this:
Code: Select all
$ wget 'http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25' -O blabla
--2014-06-08 11:51:40-- http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
Resolving www.nzbindex.nl (www.nzbindex.nl)... 178.20.172.26
Connecting to www.nzbindex.nl (www.nzbindex.nl)|178.20.172.26|:80... connected.
HTTP request sent, awaiting response... 503 Service Temporarily Unavailable
2014-06-08 11:51:40 ERROR 503: Service Temporarily Unavailable.
Assuming the 503 is visible as the return code, I would expect SAB to check for it, NOT parse the HTML, and retry in a few minutes.
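The behaviour proposed above can be sketched as a small decision function. This is a hypothetical Python 3 helper, not SABnzbd code: `plan_feed_read` and the five-minute `RETRY_DELAY` are assumptions for illustration. Only a 200 response is handed to the parser, a 5xx schedules a retry, and anything else is reported without parsing.

```python
from datetime import datetime, timedelta

# Hypothetical retry interval; SABnzbd's real scheduling may differ
RETRY_DELAY = timedelta(minutes=5)


def plan_feed_read(status, now):
    """Decide what to do with a fetched feed based on its HTTP status.

    Returns (parse, retry_at): parse is True only for 200 OK;
    retry_at is a datetime for server-side (5xx) errors, else None.
    """
    if status == 200:
        return True, None
    if 500 <= status <= 599:
        # Server is temporarily unhappy: don't parse its HTML error page
        return False, now + RETRY_DELAY
    # Other codes (e.g. 4xx): don't parse and don't retry automatically
    return False, None


now = datetime(2014, 6, 8, 11, 51, 40)
print(plan_feed_read(503, now))
```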
Re: problem with NZBindex RSS feed
Posted: June 8th, 2014, 5:13 am
by sander
I wrote some code to check the returned response code, and it is indeed a 503. So is SAB ignoring that info and trying to parse anyway?
Code: Select all
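import sys
import urllib2

url = 'http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25'
try:
    req = urllib2.Request(url)
    # urlopen() raises HTTPError for 4xx/5xx answers, so the next
    # lines only run when the server returns success (e.g. 200)
    response = urllib2.urlopen(req)
    print "HTTP code is", response.getcode()
    htmlstuff = response.read()
    print htmlstuff
except urllib2.HTTPError as e:
    # e.code holds the numeric status, e.g. 503
    print "Error:", str(e)
    sys.exit(0)
except Exception as e:
    print "something else went wrong:", e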
with result:
Code: Select all
$ python rss-tester.py
Error: HTTP Error 503: Service Temporarily Unavailable
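For anyone repeating this test on Python 3, where `urllib2` became `urllib.request`, here is a minimal sketch. To stay self-contained it starts a throwaway local HTTP server that always answers 503 (a hypothetical stand-in for nzbindex; the `Always503` handler and `probe` function are illustrative names). It shows the same thing as the script above: `urlopen()` raises `HTTPError`, and its `.code` attribute carries the 503, so nothing reaches a parser unless you read the error body yourself.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Always503(BaseHTTPRequestHandler):
    """Hypothetical stand-in server that mimics the nginx 503 reply."""

    def do_GET(self):
        body = b"<html><body><hr><center>nginx</center></body></html>"
        self.send_response(503, "Service Temporarily Unavailable")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # silence the default per-request logging


def probe(url):
    """Return the HTTP status code for url, success or error."""
    # An empty ProxyHandler ignores proxy environment variables,
    # so the request goes straight to the given host
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=10) as response:
            return response.getcode()
    except urllib.error.HTTPError as err:
        # 4xx/5xx replies are raised as HTTPError; err.code is the status
        return err.code


server = HTTPServer(("127.0.0.1", 0), Always503)  # port 0 = any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/rss/?q=hockey" % server.server_address[1]
code = probe(url)
print("HTTP code is", code)
server.shutdown()
server.server_close()
```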
Re: problem with NZBindex RSS feed
Posted: June 8th, 2014, 5:00 pm
by shypike
Possibly. The RSS parser is a (quite complex) third-party module.
So maybe it doesn't handle the 503 properly, or SABnzbd does something wrong.
I'll check later this week.
Re: problem with NZBindex RSS feed
Posted: June 9th, 2014, 4:15 am
by sander
FWIW: this is the current logging:
Code: Select all
$ cat sabnzbd.log | grep -i nzbindex
2014-06-09 10:56:38,674::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:56:38,712::DEBUG::[rss:334] Done parsing http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:56:38,712::INFO::[rss:349] Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25: <unknown>:6:-1: Opening and ending tag mismatch: hr line 6 and body
2014-06-09 10:59:31,209::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:59:32,245::DEBUG::[rss:334] Done parsing http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:59:32,246::INFO::[rss:349] Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25: <unknown>:6:-1: Opening and ending tag mismatch: hr line 6 and body
2014-06-09 11:13:44,355::INFO::[rss:509] Starting scheduled RSS read-out for "hockey nzbindex"
2014-06-09 11:13:44,356::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 11:13:44,803::DEBUG::[rss:334] Done parsing http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
The third one looks successful.
Re: problem with NZBindex RSS feed
Posted: June 9th, 2014, 4:39 am
by sander
OK, I changed the code of def run_feed in rss.py a bit. The result is now:
2014-06-09 11:36:13,460::INFO::[rss:340] Got HTTP Error 503 - Service unavailable for hockey nzbindex on http://www.nzbindex.nl/rss/?q=hockey&so ... esc&max=25
Code changed (the middle part):
Code: Select all
d = feedparser.parse(uri.replace('feed://', 'http://'))
status = d.get('status', 999)
if status in (503, 888):
    msg = Ta('Got HTTP Error 503 - Service unavailable for %s on %s') % (feed, uri)
    logging.info(msg)
    return unicoder(msg)
if not d:
    msg = Ta('Failed to retrieve RSS from %s: %s') % (uri, '?')
    logging.info(msg)
    return unicoder(msg)
EDIT: previously, the logging was:
Code: Select all
2014-06-08 21:35:13,038::INFO::[rss:349] Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25: <unknown>:6:-1: Opening and ending tag mismatch: hr line 6 and body
from the corresponding code:
Code: Select all
msg = Ta('Failed to retrieve RSS from %s: %s') % (uri, xml_name(str(d['bozo_exception'])))
With my code, that point is no longer reached. The parse is still attempted, but its error is no longer reported. The logging now shows a clearer 503 message instead of the confusing "hr line 6".
EDIT 3:
Code: Select all
value of d is:
{'feed': {'summary': u'<center><h1>503 Service Temporarily Unavailable</h1></center>\n<hr /><center>nginx</center>'}, 'status': 503, 'version': u'', 'encoding': u'us-ascii', 'bozo': 1, 'headers': {'date': 'Mon, 09 Jun 2014 10:48:53 GMT', 'content-length': '206', 'content-type': 'text/html', 'connection': 'close', 'server': 'nginx'}, 'href': u'http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25', 'namespaces': {}, 'entries': [], 'bozo_exception': SAXParseException('Opening and ending tag mismatch: hr line 6 and body\n',)}
2014-06-09 12:47:12,867::INFO::[rss:341] Server side error (code 503); could not get hockey nzbindex on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
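The order of checks sander describes (look at `status` first, and only fall back to `bozo_exception` when the HTTP layer looked fine) can be sketched as a standalone Python 3 function over a dict shaped like the `d` printed above. `describe_feed_problem` is hypothetical, not SABnzbd's `run_feed`; it relies only on keys visible in that dump (`status`, `bozo`, `bozo_exception`, `entries`), and a `ValueError` stands in for the `SAXParseException`.

```python
def describe_feed_problem(d, feed, uri):
    """Return a log message for a failed feed read, or None if it looks usable.

    d is a plain dict shaped like the feedparser result dumped above.
    """
    status = d.get('status', 999)  # same fallback as the rss.py snippet
    # Check the HTTP status first: on a 4xx/5xx the body is an error
    # page, so the parser exception ("hr line 6 ...") is only noise
    if 400 <= status <= 599:
        return 'HTTP error %s; could not get %s on %s' % (status, feed, uri)
    # Only when HTTP looked fine is the parser exception worth reporting
    if d.get('bozo') and not d.get('entries'):
        return 'Failed to retrieve RSS from %s: %s' % (uri, d.get('bozo_exception'))
    return None


uri = 'http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25'
# Trimmed copy of the dump above; ValueError stands in for SAXParseException
d = {'status': 503, 'bozo': 1, 'entries': [],
     'bozo_exception': ValueError('Opening and ending tag mismatch: hr line 6 and body')}
print(describe_feed_problem(d, 'hockey nzbindex', uri))
```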
Re: problem with NZBindex RSS feed
Posted: June 9th, 2014, 5:44 am
by sander
I changed the code to catch all server-side problems (5xx, see http://en.wikipedia.org/wiki/List_of_HT ... rver_Error ):
Code: Select all
status = d.get('status', 999)
if 500 <= status <= 599:
    msg = Ta('Server side error (code %s); could not get %s on %s') % (status, feed, uri)
    logging.info(msg)
    return unicoder(msg)
Logging result:
Code: Select all
2014-06-09 12:39:02,826::INFO::[rss:515] Starting scheduled RSS read-out for "hockey nzbindex"
2014-06-09 12:39:02,826::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 12:39:02,850::INFO::[rss:338] Server side error (code 503); could not get hockey nzbindex on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
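A range test like `500 <= status <= 599` can also be written as a check on the first digit. The hypothetical `status_class` below is a Python 3 sketch (names are assumptions, not SAB code) that also uses the standard `http.HTTPStatus` enum to turn the number into its standard reason phrase for the log. Note that the standard phrase for 503 is "Service Unavailable", not nginx's "Service Temporarily Unavailable".

```python
from http import HTTPStatus

CLASSES = {
    1: 'informational',
    2: 'success',
    3: 'redirection',
    4: 'client side error',
    5: 'server side error',
}


def status_class(status):
    """Return (class name, standard reason phrase) for an HTTP status code."""
    try:
        phrase = HTTPStatus(status).phrase
    except ValueError:
        phrase = 'unknown status'  # non-standard codes such as 999
    return CLASSES.get(status // 100, 'unknown'), phrase


print(status_class(503))  # ('server side error', 'Service Unavailable')
```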
Re: problem with NZBindex RSS feed
Posted: June 9th, 2014, 6:11 am
by sander
Re: problem with NZBindex RSS feed
Posted: June 9th, 2014, 6:17 am
by jcfp
May want to expand the range you catch to include 4xx codes as well. After all, these too signal problems where sab shouldn't expect a usable rss feed in the reply.
Re: problem with NZBindex RSS feed
Posted: June 9th, 2014, 6:39 am
by sander
jcfp wrote: May want to expand the range you catch to include 4xx codes as well. After all, these too signal problems where sab shouldn't expect a usable rss feed in the reply.
... OK, but then in a separate if/logging branch, since "The 4xx class of status code is intended for cases in which the client seems to have errored.". Source:
http://en.wikipedia.org/wiki/List_of_HT ... ient_Error
Current code:
Code: Select all
if status in (401, 402, 403):
    msg = Ta('Do not have valid authentication for feed %s') % feed
    logging.info(msg)
    return unicoder(msg)
could change to something like:
Code: Select all
if 400 <= status <= 499:
    msg = Ta('Client side error (server code %s); could not get %s on %s') % (status, feed, uri)
    logging.info(msg)
    if status in (401, 402, 403):
        msg = Ta('Do not have valid authentication for feed %s') % feed
        logging.info(msg)
    return unicoder(msg)
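The proposed 4xx handling, pulled out into a standalone Python 3 function so it can be tried without SABnzbd. `client_error_messages` is hypothetical and uses plain strings where SAB wraps them in `Ta()`; it returns every message that would be logged, adding the authentication hint only for 401, 402 and 403, the same set as the current check.

```python
AUTH_CODES = (401, 402, 403)  # same set as the current rss.py check


def client_error_messages(status, feed, uri):
    """Return the log messages for a 4xx status (empty list otherwise)."""
    if not 400 <= status <= 499:
        return []
    messages = ['Client side error (server code %s); could not get %s on %s'
                % (status, feed, uri)]
    if status in AUTH_CODES:
        messages.append('Do not have valid authentication for feed %s' % feed)
    return messages


for line in client_error_messages(403, 'nzb.su', 'http://api.nzb.su/api'):
    print(line)
```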
Does anyone have an RSS feed URL without valid authentication that I can test with?
Re: problem with NZBindex RSS feed
Posted: June 9th, 2014, 8:04 am
by sander
I played with url = 'http://api.nzb.su/api?apikey=777777777& ... 030%2C5040', which in my Python code gives a loud and clear "urllib2.HTTPError: HTTP Error 403: Forbidden", as expected.
However, the same URL as an RSS feed in SAB only results in:
Code: Select all
2014-06-09 15:00:59,814::INFO::[rss:520] Starting scheduled RSS read-out for "nzb.su without correct api"
2014-06-09 15:00:59,814::DEBUG::[rss:332] Running feedparser on http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040
2014-06-09 15:01:00,088::DEBUG::[rss:334] Done parsing http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040
2014-06-09 15:01:00,089::DEBUG::[rss:342] Return status from RSS feed is 200
2014-06-09 15:01:00,089::INFO::[rss:364] RSS Feed http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040 was empty
... a plain 200?
Needless to say, it is not caught by "status >= 400 and status <=499".
Value of d is:
Code: Select all
{'feed': {'error': {'code': u'100', 'description': u'Incorrect user credentials'}}, 'status': 200, 'version': u'', 'encoding': u'us-ascii', 'bozo': 0, 'headers': {'x-powered-by': 'PHP/5.5.8-3', 'transfer-encoding': 'chunked', 'set-cookie': 'PHPSESSID=d563fsm4mmphdbr64ar0qmftg3; path=/', 'expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'vary': 'Accept-Encoding', 'content-encoding': 'gzip', 'connection': 'close', 'pragma': 'no-cache', 'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'date': 'Mon, 09 Jun 2014 13:06:19 GMT', 'server': 'nginx', 'content-type': 'text/xml'}, 'href': u'http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040', 'namespaces': {}, 'entries': []}
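As the dump shows, a status check alone cannot catch this: the server answers 200 and reports the problem inside the document, which ends up in `d['feed']['error']`. A hypothetical Python 3 helper (`api_error`, not SAB code) that looks for that key, relying only on the structure visible in the dump above:

```python
def api_error(d):
    """Return 'code: description' if the feed body carried an API error, else None."""
    error = d.get('feed', {}).get('error')
    if not error:
        return None
    return '%s: %s' % (error.get('code', '?'), error.get('description', '?'))


# Trimmed copy of the dump above
d = {'feed': {'error': {'code': u'100', 'description': u'Incorrect user credentials'}},
     'status': 200, 'bozo': 0, 'entries': []}
print(api_error(d))  # 100: Incorrect user credentials
```

Because this check is independent of the HTTP status, it would complement a 4xx/5xx range test rather than replace it.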
So does rss.py do something different from my plain urllib2 code?
Re: problem with NZBindex RSS feed
Posted: June 9th, 2014, 8:42 am
by jcfp
No, it's the server side doing that. The response differs if you set the user agent to something sab would send.
Code: Select all
$ wget -q -O- -U 'SABnzbd/0.7.666' -S 'http://api.nzb.su/api?apikey=666666666666&t=tvsearch&cat=5030%2C5040'
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 09 Jun 2014 13:42:04 GMT
Content-Type: text/xml
Content-Length: 100
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.5.8-3
Set-Cookie: PHPSESSID=1v4n73efvqak28tg3ijlg6avi0; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
<?xml version="1.0" encoding="UTF-8"?>
<error code="100" description="Incorrect user credentials"/>
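To reproduce this from Python rather than wget, set the `User-Agent` header explicitly. A minimal Python 3 sketch; the helper name `build_feed_request` is hypothetical, and the agent string is the one from the wget command above, not necessarily what SAB actually sends.

```python
import urllib.request


def build_feed_request(url, user_agent):
    """Build a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


req = build_feed_request(
    'http://api.nzb.su/api?apikey=666666666666&t=tvsearch&cat=5030%2C5040',
    'SABnzbd/0.7.666',
)
# urllib normalises header names with str.capitalize(), hence 'User-agent'
print(req.get_header('User-agent'))  # SABnzbd/0.7.666
# urllib.request.urlopen(req) would then send it with that agent string
```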
Re: problem with NZBindex RSS feed
Posted: June 9th, 2014, 8:55 am
by sander
Oehhh ... good find!
Code: Select all
sander@flappie:~$ wget -q -O- -S 'http://api.nzb.su/api?apikey=666666666666&t=tvsearch&cat=5030%2C5040'
HTTP/1.1 403 Forbidden
Server: nginx
Date: Mon, 09 Jun 2014 13:52:28 GMT
Content-Type: text/html
Content-Length: 162
Connection: keep-alive
Vary: Accept-Encoding
sander@flappie:~$
Any user agent seems to give the 200:
Code: Select all
sander@flappie:~$ wget -q -O- -U 'anything' -S 'http://api.nzb.su/api?apikey=666666666666&t=tvsearch&cat=5030%2C5040'
HTTP/1.1 200 OK
Server: nginx
Date: Mon, 09 Jun 2014 13:55:08 GMT
Content-Type: text/xml
Content-Length: 100
Connection: keep-alive
Vary: Accept-Encoding
X-Powered-By: PHP/5.5.8-3
Set-Cookie: PHPSESSID=c040qb35hh978df0voh6o7qhd3; path=/
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
<?xml version="1.0" encoding="UTF-8"?>
<error code="100" description="Incorrect user credentials"/>
sander@flappie:~$