problem with NZBindex RSS feed

sander
Release Testers
Posts: 9429
Joined: January 22nd, 2008, 2:22 pm

Post by sander »

I'm trying to add the NZBindex RSS feed http://www.nzbindex.nl/rss/?q=hockey&so ... esc&max=25 to SAB. SAB says:
RSS » hockey nzbindex

Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&so ... esc&max=25: <unknown>:6:-1: Opening and ending tag mismatch: hr line 6 and body
I can't find any "hr" in the resulting XML, so I'm stuck.

According to http://wiki.sabnzbd.org/nzb-sources, nzbindex.nl is supported, so I would say this is a bug somewhere between nzbindex and SABnzbd.

EDIT:

Ah, I think I know the cause:

That RSS URL sometimes says:
503 Service Temporarily Unavailable

nginx

... of which the source is:
<html>
<head><title>503 Service Temporarily Unavailable</title></head>
<body bgcolor="white">
<center><h1>503 Service Temporarily Unavailable</h1></center>
<hr><center>nginx</center>
</body>
</html>
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
<!-- a padding to disable MSIE and Chrome friendly error page -->
... and there's the <HR>!

A manual session shows this:

Code: Select all

$ wget 'http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25' -O blabla
--2014-06-08 11:51:40--  http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
Resolving www.nzbindex.nl (www.nzbindex.nl)... 178.20.172.26
Connecting to www.nzbindex.nl (www.nzbindex.nl)|178.20.172.26|:80... connected.
HTTP request sent, awaiting response... 503 Service Temporarily Unavailable
2014-06-08 11:51:40 ERROR 503: Service Temporarily Unavailable.
Assuming the 503 is visible as the return status, I would expect SAB to check for it, NOT parse the HTML, and retry after a few minutes.
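
Editor's note: the behaviour asked for here (check the status, skip parsing, retry later) could look roughly like the sketch below. It is written for Python 3, unlike the Python 2 snippets in this thread, and it is not SABnzbd's code. The function name fetch_feed, the retry count and the 300 second delay are assumptions made up for the example.

```python
import time
import urllib.error
import urllib.request


def fetch_feed(url, fetch=urllib.request.urlopen, retries=3, delay=300,
               sleep=time.sleep):
    """Return the feed body as bytes, or None if the server keeps failing.

    A 5xx answer is treated as temporary: wait `delay` seconds and try again.
    The HTML error page is never returned, so it never reaches the XML parser.
    """
    for attempt in range(retries):
        try:
            response = fetch(url)
            return response.read()
        except urllib.error.HTTPError as err:
            if not 500 <= err.code <= 599:
                raise  # not a temporary server problem; let the caller decide
            if attempt < retries - 1:
                sleep(delay)  # hypothetical fixed delay between attempts
    return None
```

The fetch and sleep parameters exist only so the function can be tried without a network connection or real waiting.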

Re: problem with NZBindex RSS feed

Post by sander »

I wrote some code to check the response code returned and it indeed is a 503. So is SAB ignoring that info and trying to parse anyway?

Code: Select all

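import urllib2
import sys

url = 'http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25'

try:
    req = urllib2.Request(url)
    response = urllib2.urlopen(req)
    # Only reached on success; urlopen raises HTTPError for 4xx/5xx answers
    print "HTTP code is", response.getcode()
    htmlstuff = response.read()
    print htmlstuff

except urllib2.HTTPError as e:
    # The server answered, but with an HTTP error status such as 503
    print "Error:", str(e)
    sys.exit(1)

except Exception as e:
    # Anything else, e.g. a DNS failure or a refused connection
    print "something else went wrong:", e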
with result:

Code: Select all

$ python rss-tester.py 
Error: HTTP Error 503: Service Temporarily Unavailable
shypike
Administrator
Posts: 19773
Joined: January 18th, 2008, 12:49 pm

Re: problem with NZBindex RSS feed

Post by shypike »

Possibly. The RSS parser is a (quite complex) third-party module.
So maybe it doesn't handle a 503 properly, or SABnzbd does something wrong.
I'll check later this week.

Re: problem with NZBindex RSS feed

Post by sander »

FWIW: this is the current logging:

Code: Select all

$ cat sabnzbd.log | grep -i nzbindex

2014-06-09 10:56:38,674::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:56:38,712::DEBUG::[rss:334] Done parsing http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:56:38,712::INFO::[rss:349] Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25: <unknown>:6:-1: Opening and ending tag mismatch: hr line 6 and body

2014-06-09 10:59:31,209::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:59:32,245::DEBUG::[rss:334] Done parsing http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 10:59:32,246::INFO::[rss:349] Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25: <unknown>:6:-1: Opening and ending tag mismatch: hr line 6 and body

2014-06-09 11:13:44,355::INFO::[rss:509] Starting scheduled RSS read-out for "hockey nzbindex"
2014-06-09 11:13:44,356::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 11:13:44,803::DEBUG::[rss:334] Done parsing http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25

The third one looks successful.

Re: problem with NZBindex RSS feed

Post by sander »

OK, I changed the code of run_feed() in rss.py a bit. The result is now:

2014-06-09 11:36:13,460::INFO::[rss:340] Got HTTP Error 503 - Service unavailable for hockey nzbindex on http://www.nzbindex.nl/rss/?q=hockey&so ... esc&max=25

Code changed (the middle part):

Code: Select all

            d = feedparser.parse(uri.replace('feed://', 'http://'))

            status = d.get('status', 999)
            if status in (503, 888):
                msg = Ta('Got HTTP Error 503 - Service unavailable for %s on %s') % (feed, uri)
                logging.info(msg)
                return unicoder(msg)

            if not d:
                msg = Ta('Failed to retrieve RSS from %s: %s') % (uri, '?')
                logging.info(msg)
                return unicoder(msg)

EDIT: the logging was:

Code: Select all

2014-06-08 21:35:13,038::INFO::[rss:349] Failed to retrieve RSS from http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25: <unknown>:6:-1: Opening and ending tag mismatch: hr line 6 and body
from the corresponding code:

Code: Select all

msg = Ta('Failed to retrieve RSS from %s: %s') % (uri, xml_name(str(d['bozo_exception'])))
With my code, that point is no longer reached. The attempt to parse is still made, but its result is not shown. The logging is now a clearer 503 message instead of the confusing "hr line 6".

EDIT 3:

Code: Select all

value of d is:
{'feed': {'summary': u'<center><h1>503 Service Temporarily Unavailable</h1></center>\n<hr /><center>nginx</center>'}, 'status': 503, 'version': u'', 'encoding': u'us-ascii', 'bozo': 1, 'headers': {'date': 'Mon, 09 Jun 2014 10:48:53 GMT', 'content-length': '206', 'content-type': 'text/html', 'connection': 'close', 'server': 'nginx'}, 'href': u'http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25', 'namespaces': {}, 'entries': [], 'bozo_exception': SAXParseException('Opening and ending tag mismatch: hr line 6 and body\n',)}
2014-06-09 12:47:12,867::INFO::[rss:341] Server side error (code 503); could not get hockey nzbindex on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
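
Editor's note: going only by the dump above, the parse result behaves like a dict with 'status', 'bozo', 'bozo_exception' and 'entries' keys. A small helper along these lines could decide which of them to report first. This is a Python 3 sketch; describe_result is a made-up name, not a feedparser or SABnzbd function, and it assumes 'status' may be missing when there was no HTTP response.

```python
def describe_result(d):
    """Summarise a feedparser-style result dict in one short line."""
    status = d.get('status')
    if status is not None and status >= 400:
        # The body is an error page, so the parse error is only a side effect
        return 'HTTP error %d, body ignored' % status
    if d.get('bozo'):
        return 'feed not well-formed: %s' % d.get('bozo_exception')
    return '%d entries' % len(d.get('entries', []))
```

Checking the status before the bozo flag is what turns the confusing "hr line 6" message into a plain HTTP error report.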

Re: problem with NZBindex RSS feed

Post by sander »

I changed the code to catch all server-side (5xx, see http://en.wikipedia.org/wiki/List_of_HT ... rver_Error ) problems:

Code: Select all

            status = d.get('status', 999)
            if 500 <= status <= 599:
                msg = Ta('Server side error (code %s); could not get %s on %s') % (status, feed, uri)
                logging.info(msg)
                return unicoder(msg)
Logging result:

Code: Select all

2014-06-09 12:39:02,826::INFO::[rss:515] Starting scheduled RSS read-out for "hockey nzbindex"
2014-06-09 12:39:02,826::DEBUG::[rss:332] Running feedparser on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
2014-06-09 12:39:02,850::INFO::[rss:338] Server side error (code 503); could not get hockey nzbindex on http://www.nzbindex.nl/rss/?q=hockey&sort=agedesc&max=25
jcfp
Release Testers
Posts: 1032
Joined: February 7th, 2008, 12:45 pm

Re: problem with NZBindex RSS feed

Post by jcfp »

May want to expand the range you catch to include 4xx codes as well. After all, these too signal problems where sab shouldn't expect a usable rss feed in the reply.

Re: problem with NZBindex RSS feed

Post by sander »

jcfp wrote:May want to expand the range you catch to include 4xx codes as well. After all, these too signal problems where sab shouldn't expect a usable rss feed in the reply.
... OK, but then in a separate if/logging branch, because "The 4xx class of status code is intended for cases in which the client seems to have errored." Source: http://en.wikipedia.org/wiki/List_of_HT ... ient_Error

Current code

Code: Select all

            if status in (401, 402, 403):
                msg = Ta('Do not have valid authentication for feed %s') % feed
                logging.info(msg)
                return unicoder(msg)
could change to something like:

Code: Select all

            if 400 <= status <= 499:
                msg = Ta('Client side error (server code %s); could not get %s on %s') % (status, feed, uri)
                logging.info(msg)
                if status in (401, 402, 403):
                    msg = Ta('Do not have valid authentication for feed %s') % feed
                    logging.info(msg)
                return unicoder(msg)
Does anyone have an RSS feed URL without valid authentication that I can test with?
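
Editor's note: the 4xx and 5xx checks discussed so far can be folded into one small, testable helper. The sketch below is Python 3 and standalone. classify_status and status_message are hypothetical names, and the plain strings stand in for the Ta(...) translated messages used in rss.py.

```python
def classify_status(status):
    """Map an HTTP status code to 'auth', 'client', 'server' or 'other'."""
    if status in (401, 402, 403):
        return 'auth'
    if 400 <= status <= 499:
        return 'client'
    if 500 <= status <= 599:
        return 'server'
    return 'other'


def status_message(status, feed, uri):
    """Return a log message for an error status, or None for any other status."""
    kind = classify_status(status)
    if kind == 'auth':
        return 'Do not have valid authentication for feed %s' % feed
    if kind == 'client':
        return ('Client side error (server code %s); could not get %s on %s'
                % (status, feed, uri))
    if kind == 'server':
        return ('Server side error (code %s); could not get %s on %s'
                % (status, feed, uri))
    return None
```

Testing the authentication codes first keeps the more specific message for 401, 402 and 403, while the rest of the 4xx range gets the generic client-side one.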

Re: problem with NZBindex RSS feed

Post by sander »

I played with url = 'http://api.nzb.su/api?apikey=777777777& ... 030%2C5040', which in my Python code gives a loud and clear "urllib2.HTTPError: HTTP Error 403: Forbidden", as expected.

However, the same URL as RSS in SAB only results in

Code: Select all

2014-06-09 15:00:59,814::INFO::[rss:520] Starting scheduled RSS read-out for "nzb.su without correct api"
2014-06-09 15:00:59,814::DEBUG::[rss:332] Running feedparser on http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040
2014-06-09 15:01:00,088::DEBUG::[rss:334] Done parsing http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040
2014-06-09 15:01:00,089::DEBUG::[rss:342] Return status from RSS feed is 200
2014-06-09 15:01:00,089::INFO::[rss:364] RSS Feed http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040 was empty
... a plain 200? ???

Needless to say it is not caught by "status >= 400 and status <=499". >:(

Value of d is:

Code: Select all

{'feed': {'error': {'code': u'100', 'description': u'Incorrect user credentials'}}, 'status': 200, 'version': u'', 'encoding': u'us-ascii', 'bozo': 0, 'headers': {'x-powered-by': 'PHP/5.5.8-3', 'transfer-encoding': 'chunked', 'set-cookie': 'PHPSESSID=d563fsm4mmphdbr64ar0qmftg3; path=/', 'expires': 'Thu, 19 Nov 1981 08:52:00 GMT', 'vary': 'Accept-Encoding', 'content-encoding': 'gzip', 'connection': 'close', 'pragma': 'no-cache', 'cache-control': 'no-store, no-cache, must-revalidate, post-check=0, pre-check=0', 'date': 'Mon, 09 Jun 2014 13:06:19 GMT', 'server': 'nginx', 'content-type': 'text/xml'}, 'href': u'http://api.nzb.su/api?apikey=777777777&t=tvsearch&cat=5030%2C5040', 'namespaces': {}, 'entries': []}
So does rss.py do something different from my plain urllib2 code?
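
Editor's note: since the status is 200 here, a status check alone cannot catch this case. Going only by the dump above, where the parsed result has an 'error' dict under 'feed', a check like the following Python 3 sketch could pick it up. feed_error is a made-up helper name, and whether other indexers report errors the same way is not known from this thread.

```python
def feed_error(d):
    """Return (code, description) if the parsed feed carries an error, else None."""
    error = d.get('feed', {}).get('error')
    if not error:
        return None
    return error.get('code'), error.get('description')
```

With the value of d shown above this returns the code '100' and the description 'Incorrect user credentials', which is more useful in the log than "was empty".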

Re: problem with NZBindex RSS feed

Post by jcfp »

No, it's the server side doing that. The response differs if you set the user agent to something sab would send.

Code: Select all

$ wget -q -O- -U 'SABnzbd/0.7.666' -S 'http://api.nzb.su/api?apikey=666666666666&t=tvsearch&cat=5030%2C5040'
  HTTP/1.1 200 OK
  Server: nginx
  Date: Mon, 09 Jun 2014 13:42:04 GMT
  Content-Type: text/xml
  Content-Length: 100
  Connection: keep-alive
  Vary: Accept-Encoding
  X-Powered-By: PHP/5.5.8-3
  Set-Cookie: PHPSESSID=1v4n73efvqak28tg3ijlg6avi0; path=/
  Expires: Thu, 19 Nov 1981 08:52:00 GMT
  Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
  Pragma: no-cache
<?xml version="1.0" encoding="UTF-8"?>
<error code="100" description="Incorrect user credentials"/>

Re: problem with NZBindex RSS feed

Post by sander »

Oehhh ... good find!

Code: Select all

sander@flappie:~$ wget -q -O-  -S 'http://api.nzb.su/api?apikey=666666666666&t=tvsearch&cat=5030%2C5040'
  HTTP/1.1 403 Forbidden
  Server: nginx
  Date: Mon, 09 Jun 2014 13:52:28 GMT
  Content-Type: text/html
  Content-Length: 162
  Connection: keep-alive
  Vary: Accept-Encoding
sander@flappie:~$
Any explicitly set User-Agent seems to give the 200:

Code: Select all

sander@flappie:~$ wget -q -O- -U 'anything' -S 'http://api.nzb.su/api?apikey=666666666666&t=tvsearch&cat=5030%2C5040'
  HTTP/1.1 200 OK
  Server: nginx
  Date: Mon, 09 Jun 2014 13:55:08 GMT
  Content-Type: text/xml
  Content-Length: 100
  Connection: keep-alive
  Vary: Accept-Encoding
  X-Powered-By: PHP/5.5.8-3
  Set-Cookie: PHPSESSID=c040qb35hh978df0voh6o7qhd3; path=/
  Expires: Thu, 19 Nov 1981 08:52:00 GMT
  Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
  Pragma: no-cache
<?xml version="1.0" encoding="UTF-8"?>
<error code="100" description="Incorrect user credentials"/>
sander@flappie:~$ 
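
Editor's note: the two findings above (the answer depends on the User-Agent, and the failure arrives as a 200 with an error document as body) can be reproduced from Python 3 with the standard library only. This is a sketch, not SABnzbd code. build_request and api_error are hypothetical names, and api_error assumes the body has the shape shown in the wget output: a single error element with code and description attributes.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_request(url, user_agent):
    """Build (but do not send) a request that carries an explicit User-Agent."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def api_error(body):
    """Return (code, description) if body is a lone <error/> document, else None."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return None  # not well-formed XML, e.g. an HTML error page
    if root.tag != 'error':
        return None
    return root.get('code'), root.get('description')
```

Passing the result of build_request to urllib.request.urlopen and feeding the response body to api_error would then show whether a 200 answer is a real feed or an error in disguise.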