SABnzbd Forums • problem with NZBindex RSS feed


problem with NZBindex RSS feed

Posted: June 8th, 2014, 4:32 am

by sander

I'm trying to add the NZBindex RSS feed http://www.nzbindex.nl/rss/?q=hockey&so ... esc&max=25 to SAB. SAB says:

RSS » hockey nzbindex

Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&so ... esc&max=25: <unknown>:6 Opening and ending tag mismatch: hr line 6 and body

I can't find any "hr" in the resulting XML, so I'm stuck.

According to http://wiki.sabnzbd.org/nzb-sources, nzbindex.nl is supported, so I would say this is a bug somewhere between nzbindex and SABnzbd.

EDIT:

Ah, I think I know the cause:

That RSS URL sometimes says:

503 Service Temporarily Unavailable

nginx

... of which the source is:

<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body bgcolor="white">
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>

... and there's the <HR>!

A manual session shows this:

Code:

$ wget 'http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25' -O blabla
--2014-06-08 11:51:40--  http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
Resolving www.nzbindex.nl (www.nzbindex.nl)... 178.20.172.26
Connecting to www.nzbindex.nl (www.nzbindex.nl)|178.20.172.26|:80... connected.
HTTP request sent, awaiting response... 503 Service Temporarily Unavailable
2014-06-08 11:51:40 ERROR 503: Service Temporarily Unavailable.

Assuming the 503 status is visible to SAB, I would expect it to check for that, NOT parse the HTML, and retry a few minutes later.
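As a rough sketch of the behaviour expected here (check the status, skip parsing on an error, try again a few minutes later), assuming two hypothetical helpers, should_parse and next_attempt_delay, which are not part of SABnzbd. The 5-minute base and 1-hour cap are arbitrary choices:

```python
def should_parse(status: int) -> bool:
    """Only 2xx responses are expected to carry a usable RSS document."""
    return 200 <= status <= 299


def next_attempt_delay(failures: int, base: int = 300, cap: int = 3600) -> int:
    """Seconds to wait before retrying a feed after `failures` consecutive errors.

    Hypothetical policy: 5 minutes, doubling after each failure, capped at 1 hour.
    """
    if failures <= 0:
        return 0
    return min(cap, base * 2 ** (failures - 1))


# Example: a 503 from nginx is not parsed; the feed is retried later instead.
status = 503
if not should_parse(status):
    for n in range(1, 6):
        print(n, next_attempt_delay(n))
```

Exponential backoff keeps a struggling server from being hammered while still picking the feed up again once it recovers.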

Re: problem with NZBindex RSS feed

Posted: June 8th, 2014, 5:13 am

by sander

I wrote some code to check the returned response code, and it is indeed a 503. So is SAB ignoring that information and trying to parse the page anyway?

Code:

# Python 2 (urllib2)
import urllib2
import sys

url = 'http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25'

try:
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)  # raises urllib2.HTTPError for 4xx/5xx replies
    print "HTTP code is", response.getcode()
    htmlstuff = response.read()
    print htmlstuff

except urllib2.HTTPError as e:
    # e.code holds the numeric status, e.g. 503
    print "Error:", str(e)
    sys.exit(1)

except Exception as e:
    print "something else went wrong:", e
    sys.exit(1)

with result:

Code:

$ python rss-tester.py 
Error: HTTP Error 503: Service Temporarily Unavailable
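urllib2 only exists in Python 2. A rough Python 3 equivalent of the tester follows; so that it runs without network access, it starts a local stub server (an assumption standing in for nzbindex.nl) that answers every request with the nginx 503 page quoted earlier. It shows that urlopen raises HTTPError for the 503, and that the HTML with the <hr> is still readable from the exception:

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

NGINX_503 = (
    b"<html>\r\n<head><title>503 Service Temporarily Unavailable</title></head>\r\n"
    b'<body bgcolor="white">\r\n'
    b"<center><h1>503 Service Temporarily Unavailable</h1></center>\r\n"
    b"<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n"
)


class Always503(BaseHTTPRequestHandler):
    """Local stand-in for the overloaded feed server (not the real nzbindex)."""

    def do_GET(self):
        self.send_response(503, "Service Temporarily Unavailable")
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(NGINX_503)))
        self.end_headers()
        self.wfile.write(NGINX_503)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def check_feed(url):
    """Return (status, body); error pages come back with their real status code."""
    try:
        with urllib.request.urlopen(url, timeout=10) as response:
            return response.getcode(), response.read()
    except urllib.error.HTTPError as e:
        # urlopen raises for 4xx/5xx; the error page body is still in e.read()
        return e.code, e.read()


server = HTTPServer(("127.0.0.1", 0), Always503)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/rss/?q=hockey" % server.server_address[1]

status, body = check_feed(url)
print("HTTP code is", status)

server.shutdown()
server.server_close()
```

With the status in hand, a caller can decide not to feed the body to an RSS/XML parser at all.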

Re: problem with NZBindex RSS feed

Posted: June 8th, 2014, 5:00 pm

by shypike

Possibly. The RSS parser is a (quite complex) third-party module.
So maybe it doesn't handle a 503 properly, or SABnzbd does something wrong.
I'll check later this week.

Re: problem with NZBindex RSS feed

Posted: June 9th, 2014, 4:15 am

by sander

FWIW: this is the current logging:

Code:

$ cat sabnzbd.log | grep -i nzbindex

2014-06-09 10:56:38,674::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:56:38,712::DEBUG::[rss:334] Done parsing http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:56:38,712::INFO::[rss:349] Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25: <unknown>:6:-1: Opening and ending tag mismatch: hr line 6 and body

2014-06-09 10:59:31,209::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:59:32,245::DEBUG::[rss:334] Done parsing http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:59:32,246::INFO::[rss:349] Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25: <unknown>:6:-1: Opening and ending tag mismatch: hr line 6 and body

2014-06-09 11:13:44,355::INFO::[rss:509] Starting scheduled RSS read-out for "hockey nzbindex"
2014-06-09 11:13:44,356::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 11:13:44,803::DEBUG::[rss:334] Done parsing http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25

The third one looks successful.

Re: problem with NZBindex RSS feed

Posted: June 9th, 2014, 4:39 am

by sander

OK, I changed the code of def run_feed in rss.py a bit. The result is now:

2014-06-09 11:36:13,460::INFO::[rss:340] Got HTTP Error 503 - Service unavailable for hockey nzbindex on http://www.nzbindex.nl/rss/?q=hockey&so ... esc&max=25

Code changed (the middle part):

Code:

            d = feedparser.parse(uri.replace('feed://', 'http://'))

            status = d.get('status', 999)
            if status in (503, 888):
                msg = Ta('Got HTTP Error 503 - Service unavailable for %s on %s') % (feed, uri)
                logging.info(msg)
                return unicoder(msg)

            if not d:
                msg = Ta('Failed to retrieve RSS from %s: %s') % (uri, '?')
                logging.info(msg)
                return unicoder(msg)

EDIT: before this change, the logging was:

Code:

2014-06-08 21:35:13,038::INFO::[rss:349] Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25: <unknown>:6:-1: Opening and ending tag mismatch: hr line 6 and body

from the corresponding code:

Code:

msg = Ta('Failed to retrieve RSS from %s: %s') % (uri, xml_name(str(d['bozo_exception'])))

With my code, that point is no longer reached. feedparser still attempts to parse the page, but that result is not reported. The log now shows a clear 503 message instead of the confusing "hr line 6".

EDIT 3:

Code:

value of d is:
{'feed': {'summary': u'<center><h1>503 Service Temporarily Unavailable</h1></center>\n<hr /><center>nginx</center>'}, 'status': 503, 'version': u'', 'encoding': u'us-ascii', 'bozo': 1, 'headers': {'date': 'Mon, 09 Jun 2014 10:48:53 GMT', 'content-length': '206', 'content-type': 'text/html', 'connection': 'close', 'server': 'nginx'}, 'href': u'http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25', 'namespaces': {}, 'entries': [], 'bozo_exception': SAXParseException('Opening and ending tag mismatch: hr line 6 and body\n',)}
2014-06-09 12:47:12,867::INFO::[rss:341] Server side error (code 503); could not get hockey nzbindex on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
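The dump shows why the original message was misleading: feedparser stores both 'status': 503 and a bozo_exception for the HTML it tried to parse. A sketch of the check order, working on a plain dict shaped like d so feedparser itself is not needed; feed_problem is a hypothetical name, not a function in rss.py:

```python
def feed_problem(d, feed, uri):
    """Return a log message if a feedparser-style result is unusable, else None.

    The HTTP status is checked before 'bozo', so an nginx error page is
    reported as a server error instead of an XML tag mismatch.
    """
    status = d.get("status", 999)  # same fallback as in the patch above
    if 500 <= status <= 599:
        return "Server side error (code %s); could not get %s on %s" % (status, feed, uri)
    # Assumption: a "bozo" (malformed) feed that still produced entries is usable.
    if d.get("bozo") and not d.get("entries"):
        return "Failed to retrieve RSS from %s: %s" % (uri, d.get("bozo_exception"))
    return None


# Abbreviated copy of the dump above (bozo_exception reduced to its text).
d = {
    "status": 503,
    "bozo": 1,
    "entries": [],
    "bozo_exception": "Opening and ending tag mismatch: hr line 6 and body",
}
print(feed_problem(d, "hockey nzbindex",
                   "http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25"))
```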

Re: problem with NZBindex RSS feed

Posted: June 9th, 2014, 5:44 am

by sander

I changed the code to catch all server-side problems (5xx, see http://en.wikipedia.org/wiki/List_of_HT ... rver_Error ):

Code:

            status = d.get('status', 999)
            if 500 <= status <= 599:
                msg = Ta('Server side error (code %s); could not get %s on %s') % (status, feed, uri)
                logging.info(msg)
                return unicoder(msg)

Logging result:

Code:

2014-06-09 12:39:02,826::INFO::[rss:515] Starting scheduled RSS read-out for "hockey nzbindex"
2014-06-09 12:39:02,826::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 12:39:02,850::INFO::[rss:338] Server side error (code 503); could not get hockey nzbindex on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
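Instead of hard-coding the reason text, a log line could take the standard phrase from Python 3's http.HTTPStatus enum. A minimal sketch with a hypothetical describe_status helper; note that the standard phrase for 503 is "Service Unavailable" (this nginx sends "Service Temporarily Unavailable"), and that the 999 fallback from d.get('status', 999) is not a valid HTTPStatus:

```python
from http import HTTPStatus


def describe_status(code):
    """Return (error class or None, standard reason phrase) for an HTTP status."""
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:  # codes Python does not know, e.g. the 999 fallback
        phrase = "Unknown"
    if 500 <= code <= 599:
        kind = "Server side error"
    elif 400 <= code <= 499:
        kind = "Client side error"
    else:
        kind = None
    return kind, phrase


print(describe_status(503))
```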

Re: problem with NZBindex RSS feed

Posted: June 9th, 2014, 6:11 am

by sander

Pull request sent: https://github.com/sabnzbd/sabnzbd/pull/164

Re: problem with NZBindex RSS feed

Posted: June 9th, 2014, 6:17 am

by jcfp

May want to expand the range you catch to include 4xx codes as well. After all, these too signal problems where sab shouldn't expect a usable rss feed in the reply.

Re: problem with NZBindex RSS feed

Posted: June 9th, 2014, 6:39 am

by sander

jcfp wrote: May want to expand the range you catch to include 4xx codes as well. After all, these too signal problems where sab shouldn't expect a usable rss feed in the reply.

... OK, but then in a separate if/logging branch, since "The 4xx class of status code is intended for cases in which the client seems to have errored." Source: http://en.wikipedia.org/wiki/List_of_HT ... ient_Error

Current code:

Code:

            if status in (401, 402, 403):
                msg = Ta('Do not have valid authentication for feed %s') % feed
                logging.info(msg)
                return unicoder(msg)

could change to something like:

Code:

            if 400 <= status <= 499:
                msg = Ta('Client side error (server code %s); could not get %s on %s') % (status, feed, uri)
                logging.info(msg)
                if status in (401, 402, 403):
                    msg = Ta('Do not have valid authentication for feed %s') % feed
                    logging.info(msg)
                return unicoder(msg)

Does anyone have an RSS feed URL without valid authentication that I can test with?
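The proposed 4xx branch can also be written as a small pure function, which is easy to test without a live feed. A sketch with a hypothetical name (client_error_message); unlike the proposal, it only builds the final message instead of logging both lines for 401/402/403:

```python
AUTH_CODES = (401, 402, 403)


def client_error_message(status, feed, uri):
    """Return the message for a 4xx reply, or None for any other status."""
    if not 400 <= status <= 499:
        return None
    if status in AUTH_CODES:
        return "Do not have valid authentication for feed %s" % feed
    return "Client side error (server code %s); could not get %s on %s" % (status, feed, uri)


print(client_error_message(403, "nzb.su", "http://api.nzb.su/api"))
```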

Re: problem with NZBindex RSS feed

Posted: June 9th, 2014, 8:04 am

by sander

I played with url = 'http://api.nzb.su/api?apikey=777777777& ... 030%2C5040', which in my Python code gives a loud and clear "urllib2.HTTPError: HTTP Error 403: Forbidden", as expected.

However, the same URL as an RSS feed in SAB only results in:

Code:

2014-06-09 15:00:59,814::INFO::[rss:520] Starting scheduled RSS read-out for "nzb.su without correct api"
2014-06-09 15:00:59,814::DEBUG::[rss:332] Running feedparser on http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040
2014-06-09 15:01:00,088::DEBUG::[rss:334] Done parsing http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040
2014-06-09 15:01:00,089::DEBUG::[rss:342] Return status from RSS feed is 200
2014-06-09 15:01:00,089::INFO::[rss:364] RSS Feed http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040 was empty

... a plain 200?

Needless to say, it is not caught by "status >= 400 and status <=499".

Value of d is:

Code:

{'feed': {'error': {'code': u'100', 'description': u'Incorrect user credentials'}}, 'status': 200, 'version': u'', 'encoding': u'us-ascii', 'bozo': 0, 'headers': {'x-powered-by': 'PHP/5.5.8-3', 'transfer-encoding': 'chunked', 'set-cookie': 'PHPSESSID=d563fsm4mmphdbr64ar0qmftg3; path=/', 'expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'vary': 'Accept-Encoding', 'content-encoding': 'gzip', 'connection': 'close', 'pragma': 'no-cache', 'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'date': 'Mon, 09 Jun 2014 13:06:19 GMT', 'server': 'nginx', 'content-type': 'text/xml'}, 'href': u'http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040', 'namespaces': {}, 'entries': []}

So does rss.py do something different from my plain urllib2 code?

Re: problem with NZBindex RSS feed

Posted: June 9th, 2014, 8:42 am

by jcfp

No, that's the server side. The response differs if you set the user agent to something SAB would send:

Code:

$ wget -q -O- -U 'SABnzbd/0.7.666' -S 'http://api.nzb.su/api?apikey=666666666666&t=tvsearch&cat=5030%2C5040'
  HTTP/1.1 200 OK
  Server: nginx
  Date: Mon, 09 Jun 2014 13:42:04 GMT
  Content-Type: text/xml
  Content-Length: 100
  Connection: keep-alive
  Vary: Accept-Encoding
  X-Powered-By: PHP/5.5.8-3
  Set-Cookie: PHPSESSID=1v4n73efvqak28tg3ijlg6avi0; path=/
  Expires: Thu, 19 Nov 1981 08:52:00 GMT
  Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
  Pragma: no-cache
<?xml version="1.0" encoding="UTF-8"?>
<error code="100" description="Incorrect user credentials"/>
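To repeat this comparison from Python rather than wget, the User-Agent header has to be set explicitly; otherwise urllib sends its own "Python-urllib/3.x" value. The sketch below talks to a local echo server (an assumption, not nzb.su), so it only shows which User-Agent goes out, not how nzb.su reacts to it; fetch_as is a hypothetical helper:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class EchoUserAgent(BaseHTTPRequestHandler):
    """Local stub that replies with the User-Agent header it received."""

    def do_GET(self):
        body = (self.headers.get("User-Agent") or "").encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):  # keep the demo output quiet
        pass


def fetch_as(url, user_agent):
    # An explicit header replaces urllib's default "Python-urllib/3.x".
    req = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(req, timeout=10) as response:
        return response.read().decode("utf-8")


server = HTTPServer(("127.0.0.1", 0), EchoUserAgent)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = "http://127.0.0.1:%d/api" % server.server_address[1]

seen = fetch_as(url, "SABnzbd/0.7.666")
print(seen)

server.shutdown()
server.server_close()
```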

Re: problem with NZBindex RSS feed

Posted: June 9th, 2014, 8:55 am

by sander

Oehhh ... good find!

Code:

sander@flappie:~$ wget -q -O-  -S 'http://api.nzb.su/api?apikey=666666666666&t=tvsearch&cat=5030%2C5040'
  HTTP/1.1 403 Forbidden
  Server: nginx
  Date: Mon, 09 Jun 2014 13:52:28 GMT
  Content-Type: text/html
  Content-Length: 162
  Connection: keep-alive
  Vary: Accept-Encoding
sander@flappie:~$

Any user agent set with -U seems to give the 200:

Code:

sander@flappie:~$ wget -q -O- -U 'anything' -S 'http://api.nzb.su/api?apikey=666666666666&t=tvsearch&cat=5030%2C5040'
  HTTP/1.1 200 OK
  Server: nginx
  Date: Mon, 09 Jun 2014 13:55:08 GMT
  Content-Type: text/xml
  Content-Length: 100
  Connection: keep-alive
  Vary: Accept-Encoding
  X-Powered-By: PHP/5.5.8-3
  Set-Cookie: PHPSESSID=c040qb35hh978df0voh6o7qhd3; path=/
  Expires: Thu, 19 Nov 1981 08:52:00 GMT
  Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
  Pragma: no-cache
<?xml version="1.0" encoding="UTF-8"?>
<error code="100" description="Incorrect user credentials"/>
sander@flappie:~$
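Since this indexer answers bad credentials with HTTP 200 and a one-element XML body, a status check alone will not catch it. A minimal sketch that looks for that <error> root element with Python 3's xml.etree.ElementTree; api_error is a hypothetical helper, and the error format is taken from the reply above. In feedparser terms, the same information shows up as d['feed']['error'] in the dump earlier in the thread:

```python
import xml.etree.ElementTree as ET


def api_error(body):
    """Return (code, description) if the reply is an <error .../> document, else None."""
    if isinstance(body, str):
        body = body.encode("utf-8")
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not XML at all; leave that case to the normal feed parser
    if root.tag == "error":
        return root.get("code"), root.get("description")
    return None


reply = (b'<?xml version="1.0" encoding="UTF-8"?>\n'
         b'<error code="100" description="Incorrect user credentials"/>')
print(api_error(reply))
```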