Journal tags: xtech08


XTech 2008

I enjoyed being back in Ireland. Jessica and I arrived into Dublin last Saturday but went straight from the airport to the train station so that we could spend the weekend in seeing family and friends. Said town was somewhat overwhelmed by the arrival of .

We were back in Dublin in plenty of time for the start of this year’s XTech conference. A good time was had by the übergeeks gathered in the salubrious surroundings of a newly-opened hotel in the heart of Ireland’s capital. This was my third XTech and it had much the same feel as the previous two I’ve attended: very techy but nice and cosy. In some ways it resembles a BarCamp (but with a heftier price tag). The talks are held in fairly intimate rooms that lend themselves well to participation and discussion.

I didn’t try to attend every talk — an impossible task anyway given the triple-track nature of the schedule — but I did my damnedest to liveblog the talks I did attend:

  1. Opening Keynote by David Recordon.
  2. Using socially-authored content to provide new routes through existing content archives by Rob Lee.
  3. Browsers on the Move: The Year in Review, the Year Ahead by Michael Smith.
  4. Building the Real-time Web by Matt Biddulph, Seth Fitzsimmons, Rabble and Ralph Meijer.
  5. AMEE — The World’s Energy Meter by Gavin Starks.
  6. Ni Hao, Monde: Connecting Communities Across Cultural and Linguistic Boundaries by Simon Batistoni.
  7. Data Portability For Whom? by Gavin Bell.
  8. Why You Should Have a Web Site by Steven Pemberton.
  9. Orangutans, Oxen and Ogham Stones by Sean McGrath.

There were a number of emergent themes around social networks and portability. There was plenty of SemWeb stuff which finally seems to be moving from the theoretical to the practical. And the importance of XMPP, first impressed upon me at the Social Graph Foo Camp, was once again made clear.

Amongst all these high-level technical talks, I gave a presentation that was ludicrously simple and simplistic: Creating Portable Social Networks with Microformats. To be honest, I could have delivered the talk in 60 seconds: Add rel="me" to these links, add rel="contact" to those links, and that’s it. If you’re interested, you can download a PDF of the presentation including notes.
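For anyone who wants to see the consuming side of that, here’s a rough sketch (not from my slides) of how a site might pull those rel values out of a profile page:

    // A rough sketch: gather the social network published on a profile page.
    // rel="me" marks other pages belonging to the same person (XFN);
    // rel="contact" (or "friend", "acquaintance") marks people they know.
    function portableSocialNetwork(doc) {
      var identities = [];
      var contacts = [];
      var links = doc.querySelectorAll('a[rel]');
      for (var i = 0; i < links.length; i++) {
        var rels = links[i].rel.split(/\s+/);
        if (rels.indexOf('me') !== -1) {
          identities.push(links[i].href);
        }
        if (rels.indexOf('contact') !== -1 || rels.indexOf('friend') !== -1) {
          contacts.push(links[i].href);
        }
      }
      return { identities: identities, contacts: contacts };
    }

    // Run it against the current page, e.g. someone's profile:
    // portableSocialNetwork(document);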

I made an attempt to record my talk using Audio Hijack. It seems to have worked okay so I’ll set about getting that audio file transcribed. The audio includes an unusual gap at around the four minute mark, just as I was hitting my stride. This was the point when Aral came into the room and very gravely told me that he needed me to come out into the corridor for an important message. I feared the worst. I was almost relieved when I was confronted by a group of geeks who proceeded to break into song. You can guess what the song was.

Ian caught the whole thing on video. Why does this keep happening to me?

Orangutans, Oxen and Ogham Stones

Sean McGrath is delivering the closing keynote at XTech 2008. Sean would like to reach inside and mess with our heads today. He plans to modify our brain structures, talking about the movable Web.

Even though Sean has been doing tech stuff for a long time he freely admits that he doesn’t know what the Web is. He quotes Dylan:

I was so much older then, I’m younger than that now.

Algorithms + Data Structures = Programs is a book by Niklaus Wirth from 1978. Anyone remember Pascal? Sean went to college here at Trinity in 1983 doing four years of computer science, which is where he came across that book.

Computing is all about language …human language. People first, machines second. Information is really about words, not numbers. Words give the numbers context.

Sean used to sit in his student bedsit and think about what algorithms actually are. He was also around at the birth of SGML in 1985. More words, then. Then he got involved in the creation of XML …even more words. Then the Web came along. HTML is, yup, more words. Even JavaScript is words. His epiphany was realising that HTTP was about sending words across the wire. The Web is fundamentally words.

There’s a Bob Dylan documentary called . Sean took this as a sign from God …or at least from Dylan.

Sean explains Ogham stones: horizontal lines, read from top to bottom. One stone in particular is the Rosetta Stone of Ogham writing. The translation on this particular stone is If I were you, I would not stand here. The Irish have been using words for a long time. They’ve also been hacking for a long time. Dolmens are an example of neolithic hacking.

demonstrate the long Irish history of writing unit test cases for Cascading Style Sheets. A common thread in these books, from the Book Of Ballymote up to later manuscripts, was that they came from a religious background. Then Joyce came along with the world’s first hypertext novel, Finnegans Wake. Sean goes from Yeats to Shane MacGowan, quoting Summer In Siam as a sublime piece of Zen metaphysics:

When it’s Summer in Siam then all I really know is that I truly am in the Summer in Siam.

The Irish will even go to war over words. Copyright was a big bone of contention between St. Finnian and his student St. Columba in the 6th Century. St. Columba ran a proto-Pirate Bay. If you saw him coming, you’d bury your books. There was a war between St. Finnian and St. Columba in which 3,000 people lost their lives. Finally, the High King of Ireland said As to every cow its calf, so to every book its copy, the first official statement on copyright. But because books were actually written on cows (vellum), the statement is ambiguous.

Here’s a picture. Nobody in the room knows what it is. We haven’t had our brains rewired yet.

Sean loves the simplicity of the idea that computing is words. Sadly, it’s just not true. There are plenty of images and video on the Web.

Back to that picture. It’s a cow. One person in the room sees the cow.

Sean likes the idea of the Web as electronic Ogham stones. But he sought the 2nd path to Web enlightenment. He realised that not only is the Web not just all words, the Web doesn’t exist at all.

What is the true nature of the words on the Web? Here’s something Sean created called Finite State Machines for a mobile app called Mission Control that generated documents based on the user, the device, the location and the network. There were no persistent documents. No words, just evaporation as Leonard Cohen said.

There are three models for the world.

  • Model A is the platonic model. Documents exist on the server before you observe them. You request them over HTTP.
  • Model B is Bishop Berkeley’s model. Stuff exists but we twist it (using CSS for example).
  • Model C is that nothing exists until you observe it. In quantum physics there is the idea that observing a system actually defines the system.

Model A exists within Model B which exists within Model C. Model C is the general case. If you have a system that is that dynamic, you could generate Model B and therefore Model A. Look at the way our sites have evolved over time. We used to create Model A websites. Then we switched over to Model B with Web Standards. Now we’re at Model C — we’re not going to create any actual content at all. There is no content but there is also an infinite amount of content at the same time. We generate a tailor-made document for each user but we don’t hold on to that document, we throw it away. So what content actually exists on the Web?

PHP, Django, Rails, Google App Engine …on the Web, Model C wins. It’s even starting to happen on the client side with Ajax, Silverlight and Air. It’s spooky sometimes to view source and see no actual content, just JavaScript to generate the content.
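As a rough illustration of that view-source spookiness (my own sketch, not Sean’s), here’s a page whose served markup contains nothing at all; everything the visitor reads is generated at load time and then thrown away:

    // Model C in miniature: the document served over HTTP contains nothing
    // but this script. The "page" is generated for the current visitor at
    // load time and never stored anywhere.
    window.addEventListener('load', function () {
      var heading = document.createElement('h1');
      heading.textContent = 'Hello, visitor';

      var paragraph = document.createElement('p');
      paragraph.textContent = 'This page was generated at ' +
        new Date().toISOString() +
        ' and will never exist in exactly this form again.';

      document.body.appendChild(heading);
      document.body.appendChild(paragraph);
      // View source: none of this text is there. The document only exists
      // in the moment you observe it.
    });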

Doing everything dynamically is fine as long as it scales. It’s better to solve the problems of scalability than to revert to the static model. The benefits of Model C are just so much greater than those of Model A.

Amazon are making great services but they are rubbish at naming things, like Mechanical Turk.

So where are all the words? HTTP still delivers words to me but they are generated on the fly. The programs that generate them are hidden.

The Web is becoming a Web of silos. As the Web becomes more dynamic, it’s harder for the little guy to compete (behind me I hear Simon grumble something about Moore’s Law). So we build silos on the client side; so-called Rich Internet Applications. We’re losing URIs.

Model C is Turing complete, user-sensitive, location-sensitive and device-sensitive. It’s scalable if it’s designed right. It’s commercially viable if it’s deployed right.

But we lose hypertext and deep linking as we know them. Perhaps we will lose search. Will the Googlebot download that JavaScript and eval it in order to spider it? URIs have emergent properties because they can be bookmarked, tagged and mashed up. We are also losing simplicity: simply surfing documents.

So is it worth it?

Mu. That means I reject the premise of the question. We have no choice. We are heading towards Model C whether we want to or not. That’s bad for the librarians, such as the Orangutan librarian from Discworld. Read Borges’s The Garden of Forking Paths. Sean recommends reading Borges first and Pratchett second — it just doesn’t work the other way around. Now Sean mentions Borges and John Wilkins — Jesus, this is just like my Hypertext talk at Reboot! Everyone has a good laugh about taxonomies. Model C makes it possible to build the library of Babel — every possible book that is 401 pages long. But the library of Babel is, in Standish’s view, useless. He says that a library is not useful for the books it contains but for the books that it doesn’t contain — the rubbish has been filtered out. How will we filter out the rubbish on a Model C Web?

Information content is inversely related to probability said Claude Shannon. George Dyson figured out that the library of Babel would be between a googol and googolplex of books.

Nothing that Sean has seen this week at XTech has rocked his belief that we are marching towards Model C. Our content is going into the cloud, despite what Steven Pemberton would wish for.

When Sean first started using the Web, you had static documents and you had a cgi-bin. Now we generate our documents dynamically. We are at an interesting crossroads right now between Joycean documents and Turing applications. Is there a middle way, a steady-state model? Sean doesn’t think so because he now believes that the Web doesn’t actually exist. The Web is really just HTTP. The value of URIs is that we can name things. It’s still important that we use URIs wisely.

Perhaps HTML, to anthropomorphise it for a moment, is trying to be too clever. Perhaps HTML, in trying to balance documents and applications, is a jack of all trades and a master of none.

Sean now understands what Fielding was talking about. There is no such thing as a document. All there is is HTTP. Dan Connolly has a URI for his Volkswagen Beetle because it’s on the Web. Sean is now at peace, understanding the real value of HTTP + URIs.

Now Sean will rewire our brains by showing us the cow in the picture. Once we see the cow, we cannot unsee it.

Why You Should Have a Web Site

The enigmatic Steven Pemberton is at XTech to tell us Why you should have a Web site: it’s the law! (and other Web 3.0 issues). God, I hope he’s using Web 3.0 ironically.

Steven has heard many predictions in his time: that we will never have LCD screens, that digital photography could never replace film, etc. But the one he wants to talk about is Moore’s Law. People have been saying that it hasn’t got long to go since 1977. Steven is going to assume that Moore’s Law is not going to go away in his lifetime.

In the 1980s the most powerful computers were the Crays. People used to say One day we will all have a Cray on our desk. In fact most laptops are about 120 Craysworth and mobile phones are about 35 Craysworth.

There is actually an LED correlation to Moore’s Law (LEDs are getting brighter and cheaper ever faster). Steven predicts that within our lifetime all lighting will be LEDs.

Bandwidth follows a similar trend. Jakob Nielsen likes to claim this law; that bandwidth will double every year. In fact the timescale is closer to 10.5 months.

Following on from Moore’s and Nielsen’s laws, there’s Metcalfe’s Law: the value of a network is proportional to the square of the number of nodes. This is why it’s really good that there is only one email network and bad that there are so many instant messenger networks.
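A quick back-of-the-envelope illustration, with made-up numbers of my own:

    // Back-of-the-envelope Metcalfe arithmetic: value grows with the square
    // of the number of users, so one network beats several fragmented ones.
    function metcalfeValue(users) {
      return users * users; // value is proportional to n squared
    }

    var oneEmailNetwork = metcalfeValue(300);     // 90000
    var threeIMNetworks = 3 * metcalfeValue(100); // 30000
    // Same number of people, a third of the value: the IM problem in a nutshell.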

Let’s define the term Web 2.0 using Tim O’Reilly’s definition: sites that gain value by their users adding data to them. Note that these kinds of sites existed before the term was coined. There are some dangers to Web 2.0. When you contribute data to a web site, you are locking yourself in. You are making a commitment just like when you commit to a data format. This was actually one of the justifications for XML — data portability. But there are no standard ways of getting your data out of one Web 2.0 site and into another. What if you want to move your photos from one website to another? How do you choose which social networking sites to commit to? What about when a Web 2.0 site dies? This happened with MP3.com and Stage6. Or what about if your account gets closed down? There are documented cases of people whose Google accounts were hacked so those accounts were subsequently shut down — they lost all their data.

These are examples of Metcalfe’s law in action. What should really happen is that you keep all your data on your website and then aggregators can distribute it across the Web. Most people won’t want to write all the angle brackets but software should enable you to do this.

What do we need to realize this vision? First and foremost, we need machine-readable pages so that aggregators can identify and extract data. They can then create the added value by joining up all the data that is spread across the whole Web. Steven now pimps RDFa. It’s like microformats but it will invalidate your markup.

Once you have machine-readable semantics, a browser can do a lot more with the data. If a browser can identify something as an event, it can offer to add it to your calendar, show it on a map, look up flights and so on. (At this point, I really have to wonder… why do the RDFa examples always involve contact details or events? These are the very things that are more easily solved with microformats. If the whole point of RDFa is that it’s more extensible than microformats, then show some examples of that instead of showing examples that apply equally well to hCalendar or hCard)
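To make my own aside concrete, here’s the sort of thing a browser could do with hCalendar today; a rough sketch using the published class names (vevent, summary, dtstart, location), not anything from Steven’s talk:

    // Rough sketch: lift hCalendar events out of a page so that a browser
    // (or extension) could offer "add to calendar" or "show on a map".
    function extractEvents(doc) {
      var events = [];
      var nodes = doc.querySelectorAll('.vevent');
      for (var i = 0; i < nodes.length; i++) {
        var summary = nodes[i].querySelector('.summary');
        var dtstart = nodes[i].querySelector('.dtstart');
        var location = nodes[i].querySelector('.location');
        events.push({
          summary: summary ? summary.textContent : '',
          // hCalendar usually puts the machine-readable date in a title attribute
          start: dtstart ? (dtstart.getAttribute('title') || dtstart.textContent) : '',
          location: location ? location.textContent : ''
        });
      }
      return events;
    }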

So rather than putting all your data on other people’s Web sites, put all your data on your Web site and then you get the full Metcalfe value. But where can you store all this stuff? Steven is rather charmed by routers that double up as web servers, complete with FTP. For a personal site, you don’t need that much power and bandwidth. In any case, just look at all the power and bandwidth we do have.

To summarise, Web 2.0 is damaging to the Web. It divides the Web into topical sub-webs. With machine-readable pages, we don’t need those separate sites. We can reclaim our data and still get the value. Web 3.0 sites will aggregate your data (Oh God, he is using the term unironically).

Questions? Hell, yeah!

Kellan kicks off. Flickr is one of the world’s largest providers of RDFa. He also maintains his own site. Even he had to deal with open source software that got abandoned; he had to hack to ensure that his data survived. How do we stop that happening? Steven says we need agreed data formats like RDFa. So, Kellan says, first we have to decide on formats, then we have to build the software and then we have to build the aggregators? Yes, says Steven.

Dan says that Web 2.0 sites like Flickr add the social value that you just don’t get from building a site yourself. Steven points to MP3.com as a counter-example. Okay, says Dan, there are bad sites. Simon interjects, didn’t Flickr build their API to provide reassurance to people that they could get their data out? Not quite, says Kellan, it was created so that they could build the site in the first place.

Someone says they are having trouble envisioning Steven’s vision. Steven says I’m not saying there won’t be a Flickr — they’ll just be based on aggregation.

Someone else says that far from being worried about losing their data on Flickr, they use Flickr for backup. They can suck down their data at regular intervals (having written a script on hearing of the Microsoft bid on Yahoo). But what Flickr owns is the URI space.

Gavin Starks asks about the metrics of energy usage increases. No, it drops, says Steven.

Ian says that Steven hit on a bug in social websites: people never read the terms of service. If we encouraged best practices in EULAs we could avoid worst-case scenarios.

Someone else says that our focusing on Flickr is missing the point of Steven’s presentation.

Someone else agrees. The issue here is where the normative copy of your data exists. So instead of the normative copy living on Flickr, it lives on your own server. Flickr can still have a copy though. Steven nods his head. He says that the point is that it should be easy to move data around.

Time’s up. That was certainly a provocative and contentious talk for this crowd.

Data Portability For Whom?

It’s time for my second Gavin of the day at XTech. Gavin Bell asks Data portability for whom?

To start with, we’ve got a bunch of great technologies like OpenID and OAuth that we’re using to build an infrastructure of openness and portability but right now, these technologies don’t interoperate very cleanly. Getting a show of hands, everyone here knows of OpenID and OAuth and almost everyone here has an OpenID and uses it every week.

But we’re the alpha geeks. We forget how ahead of the curve we are. Think of RSS. We imagine it’s a widely-accepted technology but most people don’t know what it is. That doesn’t matter though as long as they are using RSS readers and subscribing to content; people don’t need to know what the underlying technology is.

Clay Shirky talked about cognitive surplus recently. We should try to tap into that cognitive surplus as Wikipedia has done. Time for some psychology.

Cognitive psychology as a field is about the same age as the study of artificial intelligence. A core tool is something called a schema, a model of understanding of the world. For example, we have a schema for a restaurant. They tend to have tables, chairs, cutlery, waiters, menus. But there is room for variation. Chinese restaurants have chopsticks instead of knives and forks, for example. We have a schema for the Web involving documents that reside at URLs. Schema congruence is the degree to which our model of the world matches the ideal model of the world.

Schemas change and adapt. Our idea of what a mobile phone is, or is capable of, has changed in the last few years. Schemas teach us that gradual change is better than big bang changes. We need a certain level of stability. When you’re pushing the envelope and changing the mental model of how something can work, you still need to support the old mental model. A good example of mental model extension is the graceful way that Flickr added video support. However, because the change was quite sudden, a portion of people got very upset. Gradual change is less scary.

Cognitive dissonance, a phrase that is often misused, is the unfortunate tension that can result from holding two conflicting thoughts at the same time. On the web, the cognitive dissonance of seeing content outside its originating point is dissipating.

J.J. Gibson came up with the idea of affordances. Chairs afford sitting on. Cups afford liquid to be poured into them. When we’re using affordances, it’s important to stick to common convention. If, on a website, you use a plus sign to allow someone to add something to a cart, you shouldn’t use the same symbol later on to allow an image to be enlarged.

Flow is the immensely enjoyable state of being fully immersed in what you’re doing. This is like the WWILFing experience on Wikipedia. You get it on Flickr too. Now we’re getting flow with multiple sites as we move between del.icio.us and Dopplr and Twitter, etc. Previously we would have experienced cognitive dissonance. Now we’re pivoting.

B.F. Skinner did a lot of research into reinforcement. We are sometimes like rats and pigeons on the Web as we click the buttons in an expectation of change (refreshing RSS, email, etc.).

Experience vs. features …don’t be feature led. A single website is just one part of people’s interaction with one another. Here’s the obligatory iPod reference: they split the features up so that the bare minimum were on the device and the rest were put into the iTunes software.

We’ve all lost count of the number of social networks we’ve signed up to. That’s not true of — excuse me, Brian — regular people. Regular people won’t upgrade their browser for your website. Regular people won’t install a plug-in for their browser. We shouldn’t be trying to sell technologies like OpenID, we should be making the technology invisible.

Gavin uses Leslie’s design of the Satisfaction sign-up process as an example. She never mentions hCard. Nobody needs to know that.

We’re trying lots of different patterns and we often get it wrong. The evil password antipattern signup page on the Spokeo website is the classic example of getting it wrong.

We must remember the hinternet. Here’s a trite but true example: Gavin’s mum …she doesn’t have her own email address. She shares it with Gavin’s dad. According to most social network sites, they are one person. And be careful of exposing stuff publicly that people don’t expect. Also, are we being elitist with things like OpenID delegation that is only for people who have their own web page and can edit it?

Our data might be portable but what about the context? If I can move a picture from Flickr but I can’t move the associated comments then what’s the point?

We’re getting very domain-centric. It would be great if everyone was issued with their own domain name. Most people don’t even think about buying a domain name. They might have a MySpace page or Facebook profile but that’s different.

Some things are getting better. People have stopped mentioning the http:// prefix. But many people don’t even see or care about your lovely URL structure. Anyway, with portable data, when you move something (like a blog post), you lose the lovely URL path.

Larry Tesler came up with the law of conservation of complexity. There is a certain basic level of complexity. We are starting to build this basic foundation with OpenID and OAuth — they could be like copy and paste on the desktop.

We built a Web for us, geeks, but we built it in a social way. We are discoverable. We live online. This lends itself well to smaller, narrower, tailored services like Dopplr for travel, Fire Eagle for location, AMEE for carbon emissions. But everything should integrate even better. Why can’t clicking “done” in Basecamp generate an invoice in Blinksale, for example? If they were desktop applications, we’d script something. Simon interjects that if they were open source, we would modify them. That’s what Gavin is agitating for. The boundaries are blurring. We have lots of applications both on and off the Web but they are all connected by the internet. People don’t care that much these days about what application they are currently using or who built it; it’s the experience that’s important.

Here’s something Gavin wants somebody to make: identity brokerage. This builds on his id6 idea from last year. That was about contact portability. Now he wants something to deal with all the invitations he gets from social networks. Now that we’ve got OpenID, why can’t we automate the acceptance or rejection of friend requests?

We are heading towards a distributed future. DiSo points the way. But let’s learn from RSS and make the technology invisible. We need to make sense of the Web for the people coming after us. That may sound elitist but Gavin doesn’t mean it to be.

Kellan asks if we can just change the schema. Gavin says we can but we should change it gradually.

Step-by-step reassurance is important. Get the details right. Magnolia is starting to get this right with its sign-in form which lists the services you can sign in through, rather than the technology (OpenID).

We are sharing content, not making friends. Dopplr gets this right by never using the word friend. Instead it lists people with whom you share your trips. The Pownce approach of creating sub-groups from a master list is close to how people really work.

Scaffolding and gradual change are important. As a child, we are told two apples plus three apples is five apples. Later we learn that two plus three equals five; the scaffolding is removed. We must first build the scaffolding but we can remove it later.

Gavin wraps up and even though the time is up, the discussion kicks off. Points and counterpoints are flying thick and fast. The main thrust of the discussion is whether we need to teach the people of the hinternet about the way things work or to hide all that stuff from them. There’s a feast of food for thought here.

Ni Hao, Monde: Connecting Communities Across Cultural and Linguistic Boundaries

Simon Batistoni is responsible for Flickr’s internationalisation and he’s going to share his knowledge here at XTech. Flickr is in a lucky position; its core content is pictures. Pictures of cute kittens are relatively universal.

We, especially the people at this conference, are becoming hyperconnected with lots of different ways of communicating. But we tend to forget that there is this brick wall that many of us never run into; we are divided.

In the beginning was the Babelfish. When some people think of translation, this is what they think of. We’ve all played the round-trip translation game, right? Oh my, that’s a tasty salad becomes that’s my OH — this one is insalata of tasty pleasure. It’s funny, but you can actually trace the moment where tasty becomes of tasty pleasure (it’s de buen gusto in Spanish). Language is subtle.

It cannot really be encoded into rules. It evolves over time. Even 20 years ago if you came into the office and said I had a good weekend surfing it may have meant something different. Human beings can parse and disambiguate very well but machines can’t.

Apocryphal story alert. In 1945, the terms for Japanese surrender were drawn up using a word which was intended to convey no comment. But the Japanese news agency interpreted this as we ignore and reported it as such. When this was picked up by the Allies, they interpreted it as a rejection of the terms of surrender and so an atomic bomb was dropped on Hiroshima.

Simon plugs The Language Instinct, that excellent Steven Pinker book. Pinker nails the idea of ungrammaticality: it’s essentially a gut instinct. This is why reading machine translations is uncomfortable. Luckily we have access to language processors that are far better than machines …human brains.

Here’s an example from Flickr’s groups feature. The goal was to provide a simple interface for group members to translate their own content: titles and descriptions. A group about abandoned trains and railways was originally Spanish but a week after internationalisation, the group exploded in size.

Here’s another example: 43 Things. The units of content are nice and succinct; visit Paris, fall in love, etc. So when you provide an interface for people to translate these granular bits, the whole thing snowballs.

Dopplr is another example. They have a “tips” feature. That unit of content is nice and small and so it’s relatively easy to internationalise. Because Dopplr is location-based, you could bubble up local knowledge.

So look out for some discrete chunks of content that you can allow the community to translate. But there’s no magic recipe because each site is different.

Google Translate is the great white hope of translation — a mixture of machine analysis on human translations. The interface allows you to see the original text and offers you the opportunity to correct translations. So it’s self-correcting by encouraging human intervention. If it actually works, it will be great.

Wait, they don’t love you like I love you… Maaa-aa-a-aa-aa-a-aa-aaps.

Maps are awesome, says Simon. Flickr places, created by Kellan who is sitting in front of me, is a great example of exposing the size and variation of the world. It’s kind of like the Dopplr Raumzeitgeist map. Both give you an exciting sense of the larger, international community that you are a part of. They open our minds. Twittervision is much the same; just look at this amazing multicultural world we live in.

Maps are one form of international communication. Gestures are similar. We can order beers in a foreign country by pointing. Careful about what assumptions you make about gestures though. The thumbs up gesture means something different in Corsica. There are perhaps six universal facial expressions. The game Phantasy Star Online allowed users to communicate using a limited range of facial expressions. You could also construct very basic sentences by using drop downs of verbs and nouns.

Simon says he just wants to provide a toolbox of things that we can think about.

Road signs are quite universal. The roots of this communication stretch back years. In a way, they have rudimentary verbs: yellow triangles (“be careful of”), red circles (“don’t”).

Star ratings have become quite ubiquitous. Music is universal so why does Apple segment the star rating portion of reviews between different nationality stores? People they come together, people they fall apart, no one can stop us now ‘cause we are all made of stars.

To summarise:

  • We don’t have phasers and transporters and we certainly don’t have universal translators. It’s AI hard.
  • Think about the little bits of textual content that you can break down and translate.

Grab the slides of this talk at hitherto.net/talks.

It’s question time and I ask whether there’s a danger in internationalisation of thinking about language in a binary way. Most people don’t have a single language, they have a hierarchy of languages that they speak to a greater or lesser degree of fluency. Why not allow people to set a preference of language hierarchy? Simon says that Flickr don’t allow that kind of preference setting but they do something simpler; so if you are on a group page and it isn’t available in your language of choice, it will default to the language of that group. Also, Kellan points out, there’s a link at the bottom of each page to take you to different language versions. Crucially, that link will take you to a different version of the current page you’re on, not take you back to the front of the site. Some sites get this wrong and it really pisses Jessica off.

Someone asks about the percentage of users who are from a non-English speaking country but who speak English. I jump in to warn of thinking about speaking English in such a binary way — there are different levels of fluency. Simon also warns about taking a culturally imperialist attitude to developing applications.

There are more questions but I’m too busy getting involved with the discussion to write everything down here. Great talk; great discussion.

AMEE — The World’s Energy Meter

Gavin Starks, the man behind AMEE — the Avoiding Mass Extinction Engine — is back at XTech this year. The service was launched at XTech in Paris last year.

Data providers have been added in the last year, including the Irish government. There’s also a bunch of new sources that are data mined. There are plenty of consumers too, including Google and change.ie from the Irish government. It’s cool to have countries on board. Here’s Edenbee. Yay! Gavin really likes it. The Carbon Account is another great one. But Gavin’s favourite is probably the Dopplr integration.

AMEE is tracking 850,000 carbon footprints now. That’s all happened in 12 months. There are over 500 organisations and individuals using AMEE. That’s over 500 calls to Gavin’s mobile number which he made available on the website.

Gavin describes AMEE as a neutral aggregation platform. The data is provided by agencies that can license or syndicate their data. This data is then used by developers who can build products and services on top of it. So AMEE is, by design, commercially enabling for 3rd parties.

Gavin says they are trying to catalyse change. They want to create a standard for measuring carbon emissions. To a large extent, they’ve achieved that. Even though there are lots of different data providers, AMEE provides a single point of measurement. The vision is to measure the CO2 emissions of everything. That’s a non-trivial task so they’ve concentrated solely on doing that one thing.

AMEE has profiles for your carbon identity and your energy identity but both are deliberately kept separate. The algorithms for energy measurement might change (for example, how carbon emissions from flights are measured) but your carbon identity should remain constant. This separation allows for real data portability e.g. integrating your Dopplr account with your Edenbee account. AMEE takes care of tracking energy but they don’t care about who you are: everything is anonymous and abstracted. It’s up to you as a developer of social apps to take care of establishing identity. There’s a lot of potential here, kind of like Fire Eagle; a service that concentrates on doing one single thing really well.

They’re partnering on tracking technology. For example, tracking Blackberries and using the speed of travel to guess what mode of transport you are using at any one time.

AMEE has a RESTful API that returns XML and JSON. They also provide more complicated, Enterprise-y stuff to please the Java people.

There are different pricing models. Media companies pay more than other companies. Charities pay nothing.

What’s next? AMEE version 2; making it easier for people to engage with the service. In the long term, let’s go after all the products that exist. Someone has that data in a spreadsheet somewhere — let us get at it.

Why do all this? Why do you think? Does anybody really need to be convinced about climate change at this stage? There will always be debate in science but even senior conservative scientists are coming out and saying that they may have underestimated the impact of carbon emissions. If a level of 450ppm continues long enough (and that’s the level we’re aiming for), that’s a sea rise of up to 75 metres. That’s an extinction-level event. We might well be fucked but as Stephen Fry says:

Doing nothing risks everything and gains comparatively little; doing something risks comparatively little and gains the whole world.

Here’s where AMEE comes in: if we can measure and visualise energy consumption change, that will drive social change. In the long term we will have to completely re-engineer our lifestyles and re-invent the power grid. Shut down power stations, shut down oil platforms, reduce all travel …measure and visualise all of it.

We don’t just need change; we need a systematic redesign of the future. We could start with the political language we use. Instead of using the word “consumer” with its positive connotations, let’s say “waster” which is more accurate.

What will you build? www.amee.cc

Building the Real-time Web

I skipped a lot of the afternoon presentations at XTech to spend some time in the Dublin sunshine. I came back to attend Blaine’s presentation on The Real Time Web only to find that Blaine and Maureen didn’t make it over to Ireland because of visa technicalities. That’s a shame. But Matt is stepping into the breach. He has taken Blaine’s slides and assembled a panel with Seth and Rabble from Fire Eagle to answer the questions raised by Blaine.

Matt poses the first question …what is the real-time Web? Rabble says that HTTP lets us load data but isn’t so good at real-time two-way interaction. Seth concurs. With HTTP you have to poll “has anything changed? has anything changed? has anything changed?” As Rabble says, this doesn’t scale very well. With Jabber the client holds a single persistent connection, and a response is sent only when something has actually changed.
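To make the polling problem concrete, here’s a minimal sketch of that loop; the /updates.json endpoint and the render() function are placeholders of my own, not any real API:

    // The "has anything changed?" loop. The /updates.json endpoint and the
    // render() function are illustrative placeholders, not any real API.
    var lastSeen = 0;

    function poll() {
      var xhr = new XMLHttpRequest();
      xhr.open('GET', '/updates.json?since=' + lastSeen, true);
      xhr.onreadystatechange = function () {
        if (xhr.readyState === 4 && xhr.status === 200) {
          var updates = JSON.parse(xhr.responseText);
          if (updates.length) {
            lastSeen = updates[updates.length - 1].id;
            render(updates); // whatever the app does with fresh data
          }
        }
      };
      xhr.send();
    }

    // One request per client every five seconds, whether anything changed or not.
    setInterval(poll, 5000);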

What’s wrong with HTTP, Comet or SMTP? Seth says that SMTP has no verifiable authentication and there’s no consistent API. Rabble says that pinging with HTTP has timeout problems. Seth says that Comet is a nice hack (or family of hacks, as Matt says) but it doesn’t scale. Bollocks! says Simon, Meebo!

Jabber has a lot of confusing documentation. What’s the state of play for the modern programmer? Rabble dives in. Jabber is just streaming XML documents and the specs tell you what to expect in that stream. Jabber addressing looks a lot like emails. Seth explains the federation aspect. Jabber servers authenticate with each other. The payload, like with email, is a message, explains Rabble. Apart from the basic body of the message, you can include other things like attachments. Seth points out that you can get presence information like whether a mobile device is on roaming. You can subscribe to Jabber nodes so that you receive notifications of change from that node. Matt makes the observation that at this point we’re talking about a lot more than just delivering documents.

So we can send and receive messages from either end, says Matt. There’s a sense of a “roster”: end points that you can send and receive data from. That sounds fine for IM but what happens when you apply this to applications? Twitter and Dopplr can both be operated from a chat client. Matt says that this is a great way to structure an API.

Rabble says that everything old is new again. Twitter, the poster child of the new Web, is applying the concept of IRC channels.

Matt asks Rabble to explain how this works with Fire Eagle. Rabble says that Fire Eagle is a fairly simple app but even a simple HTTP client will ping it a lot because they want to get updated location data quickly. With a subscribable end point that represents a user, you get a relatively real-time update of someone’s location.

What about state? The persistence of state in IM is what allows conversations. What are the gotchas of dealing with state?

Well, says Seth, you don’t have a consistent API. Rabble says there is SOAP over XMPP …the room chuckles. The biggest gotcha, says Seth, is XMPP’s heritage as a chat server. You will have a lot of connections.

Chat clients are good interfaces for humans. Twitter goes further and sends back the human-readable message but also a machine-readable description of the message. Are there design challenges in building this kind of thing?

Rabble says the first thing is to always include a body, the human-readable message. Then you can overload that with plenty of data formats; all the usual suspects. GEO Atom in Fire Eagle, for example.

Matt asks them to explain PubSub. It’s Publish/Subscribe, says Seth. Rather than a one-to-one interaction, you send a PubSub request to a particular node and then get back a stream of updates. In Twitter, for example, you can get a stream of the public timeline by subscribing to a node. Rabble mentions Ralph and Blaine’s Twitter/Jaiku bridge that they hacked together during one night at Social Graph Foo Camp. Seth says you can also filter the streams. Matt points out that this is what Tweetscan does now. They used to ping a lot but now they just subscribe. Rabble wonders if we can handle all of this activity. There’s just so much stuff coming back. With RSS we have tricks like “last modified” timestamps and etags but it would be so much easier if every blog had a subscribable node.
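For contrast with polling, the PubSub shape looks roughly like this. This is conceptual only: xmppClient and its methods are hypothetical stand-ins for whatever XMPP library you would actually use.

    // Conceptual only: xmppClient and its methods are hypothetical stand-ins
    // for a real XMPP/PubSub library. The point is the shape of the thing:
    // one long-lived connection, with updates pushed to you as they happen.
    var client = xmppClient.connect('updates.example.com');

    // Subscribe once to a node, e.g. a user's location or a public timeline...
    client.subscribe('/user/jane/location');

    // ...then handle whatever arrives, whenever it arrives.
    client.on('item', function (node, payload) {
      // the payload carries a human-readable body plus machine-readable data
      render(node, payload); // render() is a placeholder, as before
    });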

We welcome to the stage a special guest: it’s Ralph. Matt introduces the story of the all-night hackathon from Social Graph Foo Camp and asks Ralph to tell us what happened there. Ralph had two chat windows open: one for the Twitterbot, one for the Jaikubot. They hacked and hacked all night. Data was flowing from the US (Twitter) to Europe (Jaiku) and in the other direction. At 7:10am in Sebastopol one chat window went “ping!” and one and a half seconds later another chat window went “pong!”

Matt asks Ralph to stay on the panel for questions. The questions come thick and fast from Dan, Dave, Simon and Gavin. The answers come even faster. I can’t keep up with liveblogging this — always a good sign. You kind of had to be there.

Browsers on the Move: The Year in Review, the Year Ahead

Michael Smith from the W3C is talking about the changing browser landscape. Just in the last year we’ve had the release of the iPhone with WebKit, the Beta of IE8 and just yesterday, Opera’s Dragonfly technology.

In the mobile browser space, the great thing about the iPhone is that it has the same WebKit engine as Safari on the desktop. Opera Mini 4 — the proxy browser — is getting a lot better too. It even supports CSS3 selectors. Mozilla, having previously expressed no interest in Mobile, have started a project called Fennec. Then there’s Android which will use WebKit as the rendering engine for browsers.

Looking at the DOM/CSS space, there have been some interesting developments. The discovery of IE’s interesting quirk with generated elements was one. The support by other browsers for lots of CSS3 selectors is quite exciting. The selectors API is gaining ground. Michael says he’s not fond of using CSS syntax for DOM traversal but he’s definitely in the minority — this is a godsend.
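For anyone who hasn’t played with it, the Selectors API boils down to two methods; this is the godsend in question:

    // The Selectors API in two lines: DOM traversal using the CSS selector
    // syntax you already know, instead of chains of getElementsBy* calls.
    var externalLinks = document.querySelectorAll('#content a[rel~="external"]');
    var firstComment = document.querySelector('ol.comments > li');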

Now for an interlude to look at Web developer tools in browsers. Firebug really started a trend. IE8 has copied it almost verbatim. WebKit has its pretty Web Inspector and now Opera has Dragonfly. Dragonfly has a remote debugging feature, like Fiddler, which Michael is very excited by.

On the Ajax front, things are looking up in HTML5 for cross-site requests. He pimps Anne’s talk tomorrow. Then there’s Doug’s proposal for JSONRequest. Browser vendors haven’t shown too much interest in that. Meanwhile, Microsoft comes out with XDR, its own implementation that nobody is happy about. The other exciting thing in HTML5 is the offline storage stuff which works like Google Gears.

XSLT is supported very well on the client side now. But apart from Michael, who cares? Give me the selectors API any day. SVG is still strong in Mozilla and Opera.

ARIA is the one I’m happiest with. It’s supported across the board now.

The HTML5 video element is supported in WebKit nightlies and in Mozilla. There’s an experimental Opera build and, of course, no IE support. The biggest issues seem to be around licensing and deciding on a royalty-free format for video. Sun has some ideas for that.
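The element itself is easy enough to poke at in a WebKit nightly. A quick sketch of my own (the filename is illustrative) that feature-detects it and asks the codec question at the heart of that licensing debate:

    // Feature-detect the video element and ask the codec question that the
    // licensing argument is all about. The filename is illustrative.
    var video = document.createElement('video');
    if (typeof video.canPlayType === 'function') {
      // canPlayType answers "", "maybe" or "probably" - nothing firmer
      var ogg = video.canPlayType('video/ogg; codecs="theora, vorbis"');
      if (ogg === 'probably' || ogg === 'maybe') {
        video.src = 'clip.ogv';
        video.controls = true;
        document.body.appendChild(video);
      }
    }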

Ah, here comes the version targeting “innovation”. The good news, as Michael notes, is that it now defaults to the latest version. Damn straight!

Here are some Acid 3 measurements so that we can figure out which browser has the biggest willy.

Finally, look at all the CSS innovations that Dave Hyatt is putting in WebKit (and correctly prefixing with -webkit-).

Looking to the year ahead, you’ll see more CSS innovations and HTML5 movement. Michael is rushing through this part because he’s running out of time. In fact, he’s out of time.

The slides are up at w3.org/2008/Talks/05-07-smith-xtech/slides.pdf.

Using socially-authored content to provide new routes through existing content archives

Rob Lee is talking about making the most of user-authored (or user-generated) content. In other words, content written by you, Time’s person of the year.

Wikipedia is the poster child. It’s got lots of WWILFing: What Was I Looking For? (as illustrated by XKCD). Here’s a graph entitled Mapping the distraction that is Wikipedia generated from a greasemonkey script that tracks link paths.

Rob works for Rattle Research who were commissioned by the BBC Innovation Labs to do some research into bringing WWILFing to the BBC archive.

Grab the first ten internal links from any Wikipedia article and you will get ten terms that really define that subject matter. The external links at the end of an article provide interesting departure points. How could this be harnessed for BBC news articles? Categories are a bit flat. Semantic analysis is better but it takes a lot of time and resources to generate that for something as large as the BBC archives. Yahoo’s Term Extractor API is a handy shortcut. The terms extracted by the API can be related to pages on Wikipedia.

Look at this news story on organic food sales. The “see also” links point to related stories on organic food but don’t encourage WWILFing. The BBC is a bit of an ivory tower: it has lots of content that it can link to internally but it doesn’t spread out into the rest of the Web very well.

How do you decide what would be interesting terms to link off with? How do you define “interesting”? You could use Google page rank or Technorati buzz for the external pages to decide if they are considered “interesting”. But you still need contextual relevance. That’s where del.icio.us comes in. If extracted terms match well to tags for a URL, there’s a good chance it’s relevant (and del.icio.us also provides information on how many people have bookmarked a URL).
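Here’s a rough sketch of that relevance idea; the scoring is my own illustration rather than Rattle’s actual algorithm:

    // Illustrative scoring only, not Rattle Research's actual algorithm.
    // The more a candidate URL's del.icio.us tags overlap with the terms
    // extracted from the article, the more contextually relevant it is;
    // the bookmark count stands in for "interesting".
    function relevanceScore(extractedTerms, tags, bookmarkCount) {
      var overlap = 0;
      for (var i = 0; i < extractedTerms.length; i++) {
        if (tags.indexOf(extractedTerms[i].toLowerCase()) !== -1) {
          overlap++;
        }
      }
      return overlap * Math.log(1 + bookmarkCount);
    }

    relevanceScore(['organic', 'food', 'soil association'],
                   ['organic', 'food', 'environment'], 120); // roughly 9.6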

So that’s what they did. They called it “muddy boots” because it would create dirty footprints across the pristine content of the BBC.

The “muddy boots” links for the organic food article link off to articles on other news sites that are genuinely interesting for this subject matter.

Here’s another story, this one from last week about the dissection of a giant squid. In this case, the journalist has provided very good metadata. The result is that there’s some overlap between the “see also” links and the “muddy boots” links.

But there are problems. An article on Apple computing brings up a “muddy boots” link to an article on apples, the fruit. Disambiguation is hard. There are also performance problems if you are relying on an external API like del.icio.us’s. Also, try to make sure you recommend outside links that are written in the same language as the originating article.

Muddy boots was just one example of using some parts of the commons (Wikipedia and del.icio.us). There are plenty of others out there like Magnolia, for example.

But back to disambiguation, the big problem. Maybe the Semantic Web can help. Sources like Freebase and DBpedia add more semantic data to Wikipedia. They also pull in data from Geonames and MusicBrainz. DBpedia extracts the disambiguation data (for example, on the term “Apple”). Compare terms from disambiguation candidates to your extracted terms and see which page has the highest correlation.

But why stop there? Why not allow routes back into our content? For example, having used DBpedia to determine that your article is about Apple, the computer company, you could add an hCard for the Apple company to that article.

If you’re worried about the accuracy of commons data, you can stop worrying. It looks like Wikipedia is more accurate than traditional encyclopedias. It has authority, a formal review process and other tools to promote accuracy. There are also third-party services that will mark revisions of Wikipedia articles as being particularly good and accurate.

There’s some great commons data out there. Use it.

Rob is done. That was a great talk and now there’s time for some questions.

Brian asks if they looked into tying in non-text content. In short, no. But that was mostly for time and cost reasons.

Another question, this one about the automation of the process. Is there still room for journalists to spend a few minutes on disambiguating stories? Yes, definitely.

Gavin asks about data as journalism. Rob says that this is particularly relevant for breaking news.

Ian’s got a question. Journalists don’t have much time to add metadata. What can be done to make it easier — is it an interface issue? Rob says we can try to automate as much as possible to keep the time required to a minimum. But yes, building things into the BBC CMS would make a big difference.

Someone questions the wisdom of pushing people out to external sources. Doesn’t the BBC want to keep people on their site? In short, no. By providing good external references, people will keep coming back to you. The BBC understand this.

David Recordon’s XTech keynote

I’m at the first non-workshop day at XTech 2008 in Dublin’s fair city. David Recordon is delivering the second of the morning keynotes. I missed most of Simon Wardley’s opening salvo in favour of having breakfast — sorry Simon. David, unlike Simon, will not be using Comic Sans and nor will he have any foxes, the natural enemy of the duck.

Let’s take a look at how things have evolved in recent years.

Open is hip. Open can get you funded. Just look at MySQL.

Social is hip. But the sheer number of social web apps doesn’t scale. It’s frustrating repeating who you know over and over again. Facebook apps didn’t have to deal with that. The Facebook App platform was something of a game changer.

Networked devices are hip, from iPhones to Virgin America planes.

All of these things have common needs. We don’t just need portability, we need interoperability. We have some good formats for that:

  • Atom and to a lesser extent, RSS.
  • Microformats (David incorrectly states that Microsoft have added a microformat to IE8).
  • RDF and the Semantic Web.

We need a way to share abstract information …securely. The password anti-pattern, for example, is wrong, wrong, wrong. OAuth aims to solve this problem. Here’s a demo of David authorising FireEagle to have access to his location. He plugs Kellan’s OAuth session which is on tomorrow.

We need a way to communicate with people. Email just sucks. The IM wars were harmful. Jabber (XMPP) emerged as a leader because it tackled the interoperability problem. Even AOL are getting into it.

We need to know who someone is. Naturally, OpenID gets bigged up here. It’s been a busy year for OpenID. The number of relying parties has grown exponentially. The ability to get that done — to grow from a few people on a mailing list to having companies like Yahoo and Microsoft and AOL supporting that standard — that’s really something new that you wouldn’t have seen on the Web a few years ago.

But people don’t exist at just one place. The XFN microformat (using rel=”me”) is great for linking up multiple URLs. David demos his own website which has a bunch of rel=”me” links in his sidebar: “this isn’t just some profile, this is my profile.” He plugs my talk — nice one!

We need to know who someone knows. The traditional way of storing relationships has been address books and social networks but this is beginning to change. David demos the Google Social Graph API, plugging in his own URL. Then he uses Tantek’s URL to show that it works for anybody. The API has problems if you leave off the trailing slash, apparently.
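From what I remember of the API, a lookup is just a GET with a URL as the query; the endpoint and parameter names below are from memory, so treat them as approximate:

    // From-memory sketch of a Social Graph API lookup; the endpoint and
    // parameter names may not be exact. fme=1 asks it to follow rel="me"
    // links, edo=1 returns outbound edges. Note the trailing slash.
    function showGraph(data) {
      // As I recall, data.nodes maps each discovered URL to its claimed
      // identities and its outbound "me"/"contact" edges.
      console.log(data);
    }

    var script = document.createElement('script');
    script.src = 'http://socialgraph.apis.google.com/lookup' +
                 '?q=' + encodeURIComponent('http://example.com/') +
                 '&fme=1&edo=1&callback=showGraph'; // JSON-P style
    document.body.appendChild(script);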

We need to know what people are doing. Twitter, Fire Eagle, Facebook and Open Social all deal with realtime activity in some way. They’re battling things out in this space. Standards haven’t emerged yet. Watch this space. If Google ties Open Social to its App Engine we could see some interesting activity emerge.

Let’s look at where things stand right now. Who’s getting things right and who’s getting things wrong?

When you create an API, use existing standards. Don’t reinvent vcard or hCard, just use what people already publish.

Good APIs enable mashups. Google is a leader here with its mapping API.

Fire Eagle is an interesting one. You will hardly ever visit the site. Fire Eagle doesn’t care who your friends are. It just deals with one task: where are you. Here’s a demo of Fireball.

It’ll be interesting to see how these things play out in the next few years where you have services that don’t really involve a website very much at all. Just look at Twitter: people bitch and moan about the features they think it should support but the API allows you to build on top of Twitter to add the features you want. Tweetscan and Quotably are good examples of this in action.

David shows a Facebook app he built (yes, you heard right, a Facebook app). But this app allows you to publish from Facebook onto other services.

Plaxo Pulse runs your OpenID URL through the Google Social Graph API to see who you already know (I must remember this for my talk tomorrow).

The DiSo project is one to watch: figuring out how to handle activity, relationships and permissions in a distributed environment.

And that’s all, folks.