Bryan Lawrence : Bryan's Blog 2006/03/01

Bryan Lawrence

... personal wiki, blog and notes

Bryan's Blog 2006/03/01

openid part one, why I might have to roll my own

(One day I'll blog about my day job again ... sometimes it's hard to remember that I'm an atmospheric scientist first and a computer geek second).

Anyway, on joining the trackback working group, the first thing one wants to do is annotate the wiki, which means you have to have an openid identity. Hmm. This is the first I've ever heard of openid. sixapart who run the wiki recommend one of MyOpenID or videntity to get an identity if you don't have one.

Ok, so off I trot. Firstly, MyOpenID. Let's have a look at their terms of service, and in particular this bit:

You agree to indemnify and hold JanRain, and its subsidiaries, affiliates, officers, agents, co-branders or other partners, and employees, harmless from any claim or demand, including reasonable attorneys' fees, made by any third party due to or arising out of your Content, your use of the Service, your connection to the Service, your violation of the TOS, or your violation of any rights of another.

This states that if someone in the U.S. get's pissed off with my content, and can't get me in the UK, they can go after me via JanRain, and then I'm liable for JanRain's attorney's fees! So, a denial of service attack on me, would be to threaten to sue JanRain (not me, which would require them to come to a UK court I think). Don't like this much. Reject.

Ok, let's have a look at videntity, and their terms of service:

You agree to hold harmless and indemnify the Provider, and its subsidiaries, affiliates, officers, agents, and employees from and against any third party claim arising from or in any way related to your use of the Service, including any liability or expense arising from all claims, losses, damages (actual and consequential), suits, judgments, litigation costs and attorneys' fees, of every kind and nature. In such a case, the Provider will provide you with written notice of such claim, suit or action.

This time it's the Costa Rican courts, but it's the same deal.

While I hope I never write anything that would piss someone off so much they'd sue, the point is that they should be suing me, and these particular indemnification clauses seems to imply that if they can't get to me in my court system, they can get to me in another court system via my identity manager ...

... thanks, but no thanks! I can see why these folk want to be indemnified, but by putting either an explicit clause covering content or loose wording that has the same affect, I think they're making things worse - if they didn't have this, there would be no way anyone would sue them because one couldn't win - there is no method that they have to control my content, so a prospective suer would be wasting his/her money bringing a case. However, by putting this in, despite the fact that they (the identity managers) can't control my content, there is a route for a prospective suer to get at least costs out of me, so suddenly it's worth it, because they can directly affect me.

Now I reckon there is nearly zero risk that this would happen, but it's a principle thing. These sorts of clauses are bad news, and it's up for us (the users) not to accept them! We shouldn't mindlessly click through!

So, I guess if I want to use openid, I need to role my own identity. Well, I'll probably not bother, but it might, just might be worth investigating for NDG ... in which case, the python openid software may be of interest ... but really, I don't think I'll ever get to a part two on this subject ... there are just too many other things I should be doing.

Update (7th March, 2006): Actually these indemnity clauses are rampant. Even blogger.com - see clause 13 - has one!

by Bryan Lawrence : 2006/03/01 : Categories computing : 1 comment (permalink)

Some Trackback code

So I've joined up to the trackback working group, and I decided I'd better have a proper trackback implementation to play with ... given this is all very important for claddier the time investment was worth it.

So here is my toy standalone implementation for playing with. I have no idea whether it is actually properly compliant, but it works for me. If folk want to point out why it's wrong I'll be grateful. But note it's only toy code, it's not for real work.

There are five files:

  • NewTrackbackStuff.py - which consists of three classes which I'll document below: a TrackbackProvider (handles the trackbacks), which uses the Handler to provide a dumb but persistent store. There is also a StandardPing class to help with various manipulations ... (it's really unnecessary, but helped me with debugging and thinking etc).

  • cgi_server.py - a dumb server which runs

  • tb_server.py on the localhost at port 8001.

  • ElementTree.py (part of the fabulous package by Fredrik Lundh and included here for completeness)

  • test_tb_server.py which tests the trackback to the persistent store.

It's all pretty simple stuff, anyway, this is how you invoke trackback, assuming you have the tb_server running locally on port 8001:

import urllib,urllib2

payload={'title':'a new citing article title',
		'url':'url of citing article',
		'excerpt':'And so we cite blah ... carefully  ...',
		'blog_name':'Name of collection which hosts citing article'}

s=urllib.urlencode(payload)

req=urllib2.Request('http://localhost:8001/cgi/tb_server.py/link1')
req.add_header('User-Agent','bnl trackback tester v0.2')
req.add_header('Content-Type','application/x-www-form-urlencoded')
fd=urllib2.urlopen(req,s)
print fd.readlines()

***
highlight file error
***

There's not much to say about that. The server looks like this:

#!/usr/bin/env python
from NewTrackbackStuff import TrackBackProvider,Handler
import cgi,os
#mport cgitb
#cgitb.enable()

handler=Handler()
relative_path=os.environ.get("PATH_INFO").strip('/')
method=os.environ.get("REQUEST_METHOD")
cgifields=cgi.FieldStorage()

print "Content-type: text/html"
print

tb=TrackBackProvider(method,relative_path,cgifields,handler)
print tb.result()

***
highlight file error
***

So all the fun stuff is in NewTrackBackStuff in the TrackBackProvider:

class TrackBackProvider:
    	''' This is a very simpler CGI handler for incoming trackback pings, all it
	does is accept the ping if appropriate, and biff it in a dumb persistence
	store '''
	def __init__(self, method, relative_path, cgifields, handler):
		''' Provides trackback services
		    (The handler provides a persistent store)
		'''
		self.method=method
		self.relative_path=relative_path
		self.fields={}
		#lose the MiniFieldStorage syntax:
		for key in cgifields: self.fields[key]=cgifields[key].value
        	self.handler = handler
		
		self.noid='Incorrect permalink ID'
        	self.nostore='Cannot store the ping information'
        	self.invalid='Invalid ping format'
		self.nourl='Invalid or nonexistent ping url'
		self.noretrieve='Unable to retreive information'
		self.xmlhdr=''
		
	def result(self):
        	target=self.relative_path
            	if target=='': return self.__Response(self.noid)
		if not self.handler.checkTargetExists(target): self.__Response(self.noid)
		if self.method=='POST':
			''' All incoming pings should be a post '''	
			try:
				ping=StandardPing(self.fields)
				if ping.noValidURL(): return self.__Response(self.nourl)
			except:
				return self.__Response(self.invalid+str(self.fields))
			try:
				r=self.handler.store(target,ping)
				return self.__Response()
			except:
				return self.__Response(self.nostore)
		elif self.method=='GET':
			# this should get the resource at the trackback target ...
			try:
				r=self.handler.retrieve(target)
				return self.xmlhdr+r
			except:
            			return self.__Response(self.noretrieve)

	def __Response(self,error=''):
        	''' Format a compliant reply to an incoming ping '''
        	e='0'
        	if error!='': e='1%s'%error
        	r=''.join([self.xmlhdr,'',e,''])
        	return r

***
highlight file error
***

which is mainly about handling the error returns. The persistent store, as I say is very dumb, and simply consists of an XML file for this toy, which uses:

class Handler:
	''' This provides a simple persistence store for incoming trackbacks.
	Makes no assumptions beyond assuming the incoming ping is
	an xml fragment. No logic for ids for the trackbacks etc ... this is 
	supposed to be dumb, and essentially useless for real applications!!'''
	def __init__(self):
		self.xmlfile='trackback-archive.xml'
		# if file doesn't exist create it on store ...
		try:
			t=ElementTree.parse(self.xmlfile)
			self.data=t.getroot()
		except:
			self.data=ElementTree.Element("tracbackArchive")
	def checkTargetExists(self,target):
		''' check whether the target link exists '''
		for t in self.data:
			if t.attrib.get('permalink')==target: return 1
		return 0
	def urlstore(self,target,ping):
		'''stores a url encoded "standard" ping '''
		node=ElementTree.fromstring(ping)
		return self.store(target,node)
	def store(self,target,ping):
		''' stores a ping and associates it with target, quite happy for the
		moment to have duplicates - assumes the ping is already an ET instance '''
		#regrettably I think we have to test each child for the attribute name,
		#I'd prefer to use an xpath like expression ... but don't know how.
		for t in self.data:
			if t.attrib.get('permalink')==target:
				#add to an existing target element
				t.append(ping.element)
				break
		else:
			#create a new target element
			t=ElementTree.Element('target',permalink=target)
			t.append(ping.element)
			self.data.append(t)
		ElementTree.ElementTree(self.data).write(self.xmlfile)
		return 0
	def retrieve(self,target):
		''' retrieves the target and any pings associate with it '''
		for t in self.data:
			if t.attrib.get('permalink')==target: return ElementTree.tostring(t)

***
highlight file error
***

Finally, for handling the ping, I found this useful:

class StandardPing:
	''' Defines a standard trackback ping payload. Use as a toy to validate
	existing standard pings and convert to XML or urlencode ...'''
	def __init__(self,argdict=None):
		''' Instantiate with payload or empty '''
		self.allowed=('title','url','excerpt','blog_name')
		self.element=ElementTree.Element('ping')
        	for key in argdict:
			self.__setitem__(key,argdict[key])
	def __setitem__(self,key,item):
        	''' set item just as if it is dictionary, but keys are limited'''
        	if key not in self.allowed: self.reject(key)
        	e=ElementTree.SubElement(self.element,key)
		e.text=item
	def reject(self,key):
        	raise 'Invalid key in TraceBack Ping:'+key
	def toXML(self):
		''' take element tree instance and create xml string '''
		s=ElementTree.tostring(self.element)
		return s
	def toURLdata(self):
		''' take element tree instance and create url encoded payload string '''
		s=[]
		for item in self.element:
			s.append((item.tag,item.text))
		return urllib.urlencode(s)
	def noValidURL(self):
		''' At some point this could check the url for validity etc '''
		e=self.element.find('url')
		if e is None: return 1
		if e.text=='': return 1
		return 0

***
highlight file error
***

Update: 15th March, 2006: Use of blogname replaced with blog_name (might as well get it right :-)

by Bryan Lawrence : 2006/03/01 : Categories ndg computing (permalink)


DISCLAIMER: This is a personal blog. Nothing written here reflects an official opinion of my employer or any funding agency.