RSS feed layout compatible with SABnzbd...

Post by **teracow** » June 12th, 2015, 4:49 am

Hello,

This question is probably best answered by a SABnzbd developer-type person.

Due to the imminent demise of Yahoo Pipes, I've decided to code my own feed aggregator. This will run locally and act as a middle-man between my usenet indexer and my local copy of SABnzbd.

I'm hoping that a minimum XML layout can be advised in order for Sab to correctly read my feed. A sanitised adaptation of my current indexer's feed results in this:

Code:

<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/">
	<channel>
		<title>RSS title here</title>
		<description>RSS description here</description>
		<language>en-us</language>
		<item>
			<title>NZB title here</title>
			<pubDate>Tue, 09 Jun 2015 19:12:13 -0600</pubDate>
			<category>x264</category>
			<link>https://website.com/fetch/abc123</link>
			<guid>https://website.com/fetch/abc123</guid>
			<newznab:attr name="category" value="6000" />
			<newznab:attr name="size" value="342144435" />
			<newznab:attr name="grabs" value="2" />
			<newznab:attr name="comments" value="0" />
			<newznab:attr name="password" value="0" />
			<newznab:attr name="usenetdate" value="Tue 09 Jun 2015 19:12:13 -0600" />
			<newznab:attr name="group" value="alt.binaries.etc" />
		</item>
	</channel>
</rss>

This feed is read without issue by SABnzbd at present.

My aggregator will be generating its own XML. Can I output as shown below and still have SABnzbd read it?

Code:

<?xml version="1.0" encoding="utf-8" ?>
<rss version="2.0">
	<channel>
		<title>RSS title here</title>
		<description>RSS description here</description>
		<item>
			<title>NZB title here</title>
			<category>x264</category>
			<link>https://website.com/fetch/abc123</link>
		</item>
	</channel>
</rss>

Are there any mods that I should make for compatibility?

Thank you.
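
For anyone wanting to produce the reduced layout above programmatically, here is a minimal sketch using only the standard library. It is written for Python 3.8 or newer (the `xml_declaration` argument needs it), and the function name `build_minimal_feed` is made up for illustration. It only shows one way to emit the same elements; it says nothing about what SABnzbd requires.

```python
import xml.etree.ElementTree as ET


def build_minimal_feed(feed_title, feed_description, items):
    """Return a minimal RSS 2.0 document as UTF-8 bytes.

    `items` is an iterable of (title, category, link) tuples.
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "description").text = feed_description

    for title, category, link in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = title
        ET.SubElement(item, "category").text = category
        ET.SubElement(item, "link").text = link

    # xml_declaration=True requires Python 3.8+
    return ET.tostring(rss, encoding="utf-8", xml_declaration=True)


# Placeholder values taken from the sanitised feed in this post.
feed_bytes = build_minimal_feed(
    "RSS title here",
    "RSS description here",
    [("NZB title here", "x264", "https://website.com/fetch/abc123")],
)
print(feed_bytes.decode("utf-8"))
```

ElementTree escapes characters such as `&` and `<` in titles automatically, which is one reason to build the tree rather than concatenate strings.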

Post by **sander** » June 12th, 2015, 6:45 am

Well ... did you try it out with SABnzbd? The proof of the pudding is in the junk ;-)

EDIT: I did some junk:

put your XML into my /home/sander/weg.rss
I tried, and my web browser (Chromium) could successfully access and show file:///home/sander/weg.rss
in SABnzbd's RSS feed: add "file:///home/sander/weg.rss" as URL ... and, yes, SABnzbd accepts that
then in SAB's details of that feed ... SAB sees the content of the RSS/XML file. Cool
Clicking on Download really tries to download from the given URL https://website.com/fetch/abc123 (which does not exist / is empty).

Looks good!

Code:

2015-06-12 13:57:06,960::DEBUG::[interface:1744] RSS READOUT = True
2015-06-12 13:57:06,960::DEBUG::[rss:332] Running feedparser on file:///home/sander/weg.rss
2015-06-12 13:57:07,406::DEBUG::[rss:334] Done parsing file:///home/sander/weg.rss
2015-06-12 13:57:07,407::DEBUG::[rss:409] Trying title NZB title here
2015-06-12 13:57:07,407::DEBUG::[rss:436] Filter matched on rule 0

and

Code:

2015-06-12 14:00:17,947::DEBUG::[interface:1744] RSS READOUT = False
2015-06-12 14:00:17,949::DEBUG::[rss:409] Trying title NZB title here
2015-06-12 14:00:17,951::DEBUG::[rss:436] Filter matched on rule 0
2015-06-12 14:00:26,406::INFO::[__init__:513] Fetching https://website.com/fetch/abc123
2015-06-12 14:00:26,427::INFO::[nzbqueue:218] Saving queue
2015-06-12 14:00:26,429::DEBUG::[__init__:844] Saving data for SABnzbd_nzo_6fRAL8 in /home/sander/.sabnzbd/admin/future
2015-06-12 14:00:26,444::INFO::[__init__:919] Saving data for queue9.sab in /home/sander/.sabnzbd/admin/queue9.sab
2015-06-12 14:00:26,955::INFO::[growler:298] Send to NotifyOSD: NZB added to queue / Trying to fetch NZB from https://website.com/fetch/abc123
2015-06-12 14:00:26,994::INFO::[urlgrabber:116] Grabbing URL https://website.com/fetch/abc123
2015-06-12 14:00:27,073::DEBUG::[interface:1744] RSS READOUT = False
2015-06-12 14:00:28,973::INFO::[misc:814] Creating directories: /media/sander/SanderJ1/Downloads/incomplete/NZB title here
2015-06-12 14:00:28,982::WARNING::[nzbstuff:744] Empty NZB file abc123.nzb [https://website.com/fetch/abc123]
2015-06-12 14:00:28,985::INFO::[__init__:908] /home/sander/.sabnzbd/admin/future/SABnzbd_nzo_6fRAL8 removed

So ... format approved!
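
A quick pre-flight check can be run on a generated feed before pointing SABnzbd at it. The sketch below is only a stand-in: per the log above SABnzbd runs feedparser, whereas this uses the standard library's ElementTree, so passing it does not prove SABnzbd will accept the feed. Treating `title` and `link` as the fields every item needs is an assumption based on the test feed above, and `check_feed` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: title and link are the fields every item needs, because the
# accepted test feed carried them. This is not SABnzbd's own rule.
REQUIRED_ITEM_FIELDS = ("title", "link")


def check_feed(xml_text):
    """Return a list of problems found in an RSS 2.0 document (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as err:
        return ["not well-formed XML: {}".format(err)]

    if root.tag != "rss":
        return ["root element is <{}>, expected <rss>".format(root.tag)]

    channel = root.find("channel")
    if channel is None:
        return ["no <channel> element"]

    problems = []
    for index, item in enumerate(channel.findall("item"), start=1):
        for field in REQUIRED_ITEM_FIELDS:
            # findtext returns "" for an empty element and the default if absent
            value = item.findtext(field, default="").strip()
            if not value:
                problems.append("item {}: missing or empty <{}>".format(index, field))
    return problems


sample = """<rss version="2.0">
<channel>
    <title>RSS title here</title>
    <description>RSS description here</description>
    <item>
        <title>NZB title here</title>
        <category>x264</category>
        <link>https://website.com/fetch/abc123</link>
    </item>
</channel>
</rss>"""

print(check_feed(sample))  # prints [] for the reduced layout from this thread
```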

Post by **teracow** » June 12th, 2015, 9:02 pm

Thanks sander.

Post by **sander** » June 13th, 2015, 1:05 am

You're welcome.

How did you create the XML/RSS output? Can you post the source code?

Post by **teracow** » June 13th, 2015, 2:30 pm

I only started coding this on Wednesday so it's got a way to go yet:

it reads a feed (live from the website or from a local file),
checks the size of each item (newznab:attr 'size'),
grabs an item if its title contains particular words,
ignores it if it is smaller than 700MB or larger than 15GB,
creates XML for any items that make it through,
displays the entire XML tree on screen.
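
The steps in the list above can be sketched with the standard library alone. This is a Python 3 illustration of the list as written, not the script that follows: it swaps BeautifulSoup for namespace-aware ElementTree, the helper names `item_size_mb` and `select_items` are invented for the example, and skipping items that report no size is an assumption. The second item's URL is a made-up placeholder.

```python
import xml.etree.ElementTree as ET

# Namespace URI copied from the indexer feed quoted earlier in the thread.
NEWZNAB_NS = {"newznab": "http://www.newznab.com/DTD/2010/feeds/attributes/"}

LOWER_SIZE_LIMIT_MB = 700      # items must be larger than this
UPPER_SIZE_LIMIT_MB = 15000    # items must be smaller than this


def item_size_mb(item):
    """Return the item's newznab 'size' attribute in whole MB, or None if absent."""
    for attr in item.findall("newznab:attr", NEWZNAB_NS):
        if attr.get("name") == "size":
            return int(attr.get("value")) // (1024 * 1024)
    return None


def select_items(feed_xml, wanted_phrases):
    """Yield (title, link) for items that match a phrase and pass the size check."""
    root = ET.fromstring(feed_xml)
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        size_mb = item_size_mb(item)
        if size_mb is None:
            continue  # assumption: skip items that report no size
        if size_mb <= LOWER_SIZE_LIMIT_MB or size_mb >= UPPER_SIZE_LIMIT_MB:
            continue
        if any(phrase in title.lower() for phrase in wanted_phrases):
            yield title, item.findtext("link", default="")


sample = """<rss version="2.0" xmlns:newznab="http://www.newznab.com/DTD/2010/feeds/attributes/">
<channel>
  <item>
    <title>Some.Show.x264</title>
    <link>https://website.com/fetch/abc123</link>
    <newznab:attr name="size" value="1073741824" />
  </item>
  <item>
    <title>Some.Show.x264.sample</title>
    <link>https://website.com/fetch/def456</link>
    <newznab:attr name="size" value="342144435" />
  </item>
</channel>
</rss>"""

# The first item is 1024 MB and is kept; the second is about 326 MB and is dropped.
print(list(select_items(sample, ["some.show"])))
```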

Code:

#!/usr/bin/env python

import re
import sys
import os
from urllib import urlopen
from bs4 import BeautifulSoup
from xml.etree.ElementTree import Element, SubElement, Comment, tostring
from xml.dom import minidom
import ConfigParser

def load_list_from_file(path):
	# build a list with one stripped entry per line of the text file
	with open(path, 'r') as handle:
		return [line.strip() for line in handle]

def load_list_from_dir(dir):
	# get list of all subdirectories
	mynames = next(os.walk("{}.".format(dir)))[1]

	# remove compilation dirs like '(dump)'
	mynames = [x for x in mynames if not re.match(r"\(.*?\)", x)]

	# change dir names with '(ignore-audio)'
	mynames = [x.replace(' (ignore-audio)', '') for x in mynames]

	return mynames
	
def sizeof_fmt(num, suffix='B'):
    # refer - http://stackoverflow.com/questions/1094841/reusable-library-to-get-human-readable-version-of-file-size
    
    for unit in ['','Ki','Mi','Gi','Ti','Pi','Ei','Zi']:
        if abs(num) < 1024.0:
            return "%3.1f %s%s" % (num, unit, suffix)
        num /= 1024.0

    return "%.1f %s%s" % (num, 'Yi', suffix)
    
def convert_to_MB(number):
	# bytes -> whole megabytes (1 MB is taken as 1024 * 1024 bytes here)
	return int(number / pow(1024, 2))
	
def strip_desc(text):
	# remove every <description> element whose content is a CDATA block.
	# this is a workaround so lxml doesn't choke on CDATA sections.

	return re.sub(r"<description><!\[CDATA.*?\]\]></description>", "", text, flags=re.DOTALL)

def prettify(elem):
	rough_string = tostring(elem, 'utf-8')
	reparsed = minidom.parseString(rough_string)

	return reparsed.toprettyxml(indent="   ", encoding='utf-8')

def init_xml_tree(title):
	global rss_root
	global rss_channel
	
	rss_root = Element('rss')
	rss_root.set('version', '2.0')
	rss_channel = SubElement(rss_root, 'channel') 
	rss_title = SubElement(rss_channel, 'title') 
	rss_title.text = title
	
	return 0
	
def append_to_xml_tree(title, link, category, size):
	# add another 'item' to the main tree
	global rss_channel

	feed_item = SubElement(rss_channel, 'item')
	
	feed_title = SubElement(feed_item, 'title')
	feed_title.text = title

	feed_link = SubElement(feed_item, 'link')
	feed_link.text = link

	feed_category = SubElement(feed_item, 'category')
	feed_category.text = category

	feed_size = SubElement(feed_item, 'size')
	feed_size.text = size

	return 0

LOWER_SIZE_LIMIT = 700			# (MB) files must be larger than this
UPPER_SIZE_LIMIT = 15000		# (MB) files must be smaller than this

script_path = os.path.dirname(os.path.realpath(sys.argv[0]))
script_name = os.path.basename(__file__)							# this script's name without path
script_basename = os.path.splitext(script_name)[0]					# this script's name without path or extension
config_file = script_path + "/rss-agg.cfg"
config = ConfigParser.ConfigParser()
config.read(config_file)
primary_feed = config.get('feeds', 'primary') 

included_phrases_file = config.get('paths', 'included_phrases_file') 
included_names_path = config.get('paths', 'included_names_path') 
included_phrases = load_list_from_file(included_phrases_file) + load_list_from_dir(included_names_path)
included_phrases = [s.replace(' ', '.') for s in included_phrases]
included_phrases = [s.lower() for s in included_phrases]

excluded_phrases_file = config.get('paths', 'excluded_phrases_file') 
excluded_phrases = load_list_from_file(excluded_phrases_file)
excluded_phrases = [s.replace(' ', '.') for s in excluded_phrases]
excluded_phrases = [s.lower() for s in excluded_phrases]

print("\n * keeping these phrases: ({} found)\n{}".format(len(included_phrases), included_phrases))

print("\n * discarding these phrases: ({} found)\n{}".format(len(excluded_phrases), excluded_phrases))

print("\n - reading feed...")
#handler = urlopen(primary_feed).read()
handler = open('feedcopy.rss.cfm').read()

handler = strip_desc(handler)
soup = BeautifulSoup(handler, 'lxml')

init_xml_tree(soup.description.text)

for item in soup.findAll('item'):
	newznabs = item.findAll('newznab:attr')
	newz_dict = {}

	# build dictionary of newznab attributes 
	for attribute in newznabs:
		newz_dict[attribute['name'].split(".")[0]] = attribute['value'].split(".")[0]

	item_size = convert_to_MB(int(newz_dict['size']))
#	print(" file size: {:,} Mbytes".format(item_size)),

	# grab any items with these phrases
	must_grab_flag = False
	for phrase in included_phrases:
		if phrase in item.title.text.lower():
			must_grab_flag = True
			break

	# disregard anything with these phrases
	should_grab_flag = True
	for phrase in excluded_phrases:
		if phrase in item.title.text.lower():
			should_grab_flag = False
			break

	if must_grab_flag or should_grab_flag:
		# disregard if too small or too large
		if (item_size <= LOWER_SIZE_LIMIT) or (item_size >= UPPER_SIZE_LIMIT):
			must_grab_flag = False
			should_grab_flag = False

	if must_grab_flag or should_grab_flag:
		append_to_xml_tree(item.title.text, item.guid.text, item.category.text, sizeof_fmt(int(newz_dict['size'])))

print("\n * RSS XML output:\n{}".format(prettify(rss_root)))

and the config file is:

Code:

[paths]
included_names_path = /names/
included_phrases_file = included_phrases.lst
excluded_phrases_file = excluded_phrases.lst

[feeds]
primary = https://website.com/rss.cfm?r=abc123

Currently working on loading a list of 'include' and 'exclude' words from file which will then be matched against the incoming feed (the Yahoo Pipes equivalent of a filter module).

I'm fairly new to Python so please excuse any glaring newbie-type issues that are present.
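
The include/exclude matching described above can be isolated into two small functions, which makes it easier to test without a live feed. This is a hedged sketch with invented names (`normalize_phrases`, `keep_title`). It mirrors the rule in the script (an include match wins, otherwise keep the item unless an excluded phrase appears) and adds one thing the script does not do: blank lines are dropped, because an empty phrase is a substring of every title and would match everything.

```python
def normalize_phrases(lines):
    """Strip whitespace, drop blank lines, lower-case, and turn spaces into dots."""
    cleaned = (line.strip() for line in lines)
    return [line.lower().replace(" ", ".") for line in cleaned if line]


def keep_title(title, included, excluded):
    """Include match wins; otherwise keep the title unless an excluded phrase appears."""
    lowered = title.lower()
    if any(phrase in lowered for phrase in included):
        return True
    return not any(phrase in lowered for phrase in excluded)


# Lines as they might come from the two phrase files (made-up examples).
included = normalize_phrases(["Some Show\n", "\n"])
excluded = normalize_phrases(["sample\n"])

print(keep_title("Some.Show.S01E01.x264", included, excluded))
print(keep_title("Other.Thing.sample", included, excluded))
print(keep_title("Some.Show.sample", included, excluded))
```

The size check would then be applied only to titles for which `keep_title` returns True.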
