consider techniques for original post discovery that do not require a backlink/citation #125

kylewm · 2014-04-09T15:48:47Z

There are good reasons for including a link to original content in syndicated copies, but some would still prefer not to include permalinks/citations in POSSEd copies for aesthetic (Twitter length limit) or technical (Instagram's lack of a posting API) reasons. Opening this issue to collect ideas for how Bridgy could discover the original post without an explicit backlink in the syndicated copy.

Briefly discussed on IRC starting here.

Some quick ideas from @snarfed

14:04 i hadn't thought much about how to preserve backfeed. query endpoint, extra "syndicated" webmention param, searching h-feed entries for rel-syndication are all possibilities

snarfed · 2014-04-09T17:23:31Z

thanks for starting the conversation! cc @aaronpk

aaronpk · 2014-04-10T17:11:29Z

It looks like Instagram will not be allowing access to the comments API for anything except brand management or customer service apps, so we will most likely need to do original post discovery another way for Instagram photos.

kylewm · 2014-04-11T14:57:08Z

I hacked together a proof-of-concept for one possible implementation (query endpoint). When doing original post discovery, this would be an additional step:

1. Grab the replied-to author's homepage from their profile. Luckily this is already present in the activitystreams data.
2. Exactly like webmention endpoint discovery, fetch their homepage and extract <link rel="original-post-discovery" href="...">.
3. Construct a URL for the post by adding ?syndication=[URL of syndicated copy] to the discovery endpoint, e.g., http://kylewm.com/original_post_discovery?syndication=https://twitter.com/kyle_wm/status/454464347679371264
4. I add the constructed URL to the response's inReplyTos and somewhere down the line, Bridgy magically turns it into the real permalink URL.
5. On my site, the original_post_discovery endpoint translates the Twitter URL to the real post URL. If there is a corresponding original, it returns a 302 redirect to the original's permalink; if not, it returns a 404. Note one wrinkle: I've stored permalinks as https URLs, but Bridgy gives me http URLs.
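The client-side steps above (find the rel link, then build the query URL) could be sketched roughly like this in Python. This is only an illustration, not the code from my fork: the function and class names are made up for the example, the homepage HTML is passed in rather than fetched, and leaving `:` and `/` unescaped in the query value is an assumption chosen to match the example URL in step 3.

```python
from html.parser import HTMLParser
from urllib.parse import quote, urljoin


class _RelLinkFinder(HTMLParser):
    """Remembers the href of the first <link> whose rel list contains `rel`."""

    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        if self.rel in rels and attrs.get("href"):
            self.href = attrs["href"]


def find_discovery_endpoint(homepage_url, homepage_html,
                            rel="original-post-discovery"):
    """Return the absolute discovery endpoint URL, or None if not advertised."""
    finder = _RelLinkFinder(rel)
    finder.feed(homepage_html)
    if finder.href is None:
        return None
    # The href may be relative, so resolve it against the homepage URL.
    return urljoin(homepage_url, finder.href)


def build_discovery_url(endpoint, syndicated_url):
    """Append ?syndication=[URL of syndicated copy] to the endpoint."""
    separator = "&" if "?" in endpoint else "?"
    # Assumption: keep ':' and '/' readable, as in the example in step 3.
    return endpoint + separator + "syndication=" + quote(syndicated_url, safe=":/")
```

Feeding it a page containing `<link rel="original-post-discovery" href="/original_post_discovery">` served from http://kylewm.com/ and the tweet URL from step 3 yields the same constructed URL shown there.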

I wonder if Bridgy should throw out discovery URLs that return 404. Right now the effect is that it tries to send mentions either to the real permalink (after following the redirect) or, for orphans, to the 404'ing endpoint.
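To make the 302/404 behaviour and the "throw out 404s" option concrete, here is a rough sketch of both sides as plain functions, with no web framework involved. Everything in it is hypothetical: the function names, the in-memory dict standing in for my site's storage, and the example.com permalink. Normalizing http to https before the lookup is one possible way around the scheme wrinkle mentioned above, not necessarily what my site does.

```python
from urllib.parse import urlsplit


def normalize_scheme(url):
    """Treat http and https copies of the same silo URL as identical."""
    parts = urlsplit(url)
    if parts.scheme in ("http", "https"):
        parts = parts._replace(scheme="https")
    return parts.geturl()


def discovery_response(syndication_to_original, syndicated_url):
    """Endpoint side: return (status, headers) for a ?syndication= query.

    `syndication_to_original` is a hypothetical dict keyed by https URLs.
    """
    original = syndication_to_original.get(normalize_scheme(syndicated_url))
    if original is None:
        return 404, {}
    return 302, {"Location": original}


def resolve_target(status, headers):
    """Consumer side: keep the redirect target, drop everything else.

    Returning None for a 404 is the "throw it out" option, so no mention
    would be sent to the endpoint itself for orphaned syndicated copies.
    """
    if status in (301, 302) and headers.get("Location"):
        return headers["Location"]
    return None
```

With a table that maps the https tweet URL to a permalink, a query using the http form of the same tweet URL still gets a 302, while an unknown tweet gets a 404 and is dropped by `resolve_target`.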

Here is an example of a tweet without backlink

And here is the modified Bridgy sending links anyway!

The interesting changes are here in my fork of activitystreams kylewm/granary@79e19cd ... I cached the rel endpoint because otherwise it hits it a lot, concurrently, and times out on my server.
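For anyone curious what caching the rel endpoint might look like, here is a minimal sketch. It is not the code in the commit above: the class name, the one-hour TTL, and the injected `fetch` callable are all assumptions for illustration. The lock is held during the fetch, which serializes lookups; that is simple and avoids hammering a slow server, at the cost of making concurrent callers wait.

```python
import threading
import time


class EndpointCache:
    """Caches homepage URL -> discovery endpoint lookups for a limited time."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch          # callable: homepage URL -> endpoint or None
        self._ttl = ttl_seconds
        self._clock = clock          # injectable so expiry can be tested
        self._lock = threading.Lock()
        self._entries = {}           # homepage URL -> (expires_at, endpoint)

    def get(self, homepage_url):
        with self._lock:
            now = self._clock()
            entry = self._entries.get(homepage_url)
            if entry is not None and entry[0] > now:
                return entry[1]
            # Missing or expired: fetch once and remember the result,
            # including None, so sites without an endpoint are not refetched.
            endpoint = self._fetch(homepage_url)
            self._entries[homepage_url] = (now + self._ttl, endpoint)
            return endpoint
```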

Just to be totally clear this is not a proposal; the code is probably in the totally wrong place and I'd want to get approval from microformats people on a proper rel name.

cc @kartikprabhu, @gRegorLove

snarfed · 2014-04-11T15:52:51Z

oh man. so cool!!! can't wait to see discussion of this. (and the eventual pull request!)

looks like you omitted some of the discovery details? looks like step 2 might be truncated.

also, re twitter http vs https, i'd happily accept a PR that switches bridgy and a-u to always generate https twitter URLs. (and facebook and others too?)

finally, related feature request for if/when we merge this: #122

kylewm · 2014-04-12T17:15:12Z

oh interesting, I hadn't thought about the round trip case (bridgy publish -> silo -> bridgy).

for sort of an all-in-one, idiot-proof solution, bridgy could store all the original-post -> syndicated-url's that it's ever published (maybe it already does), and provide an original-post-discovery endpoint of its own. then sites that don't bother with getting the published ID back from bridgy and storing it could just use bridgy's endpoint (the way that some sites use webmention.io for receiving their mentions).
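as a very rough sketch of what that store could look like (purely hypothetical, i have no idea how bridgy actually keeps its publish records, and the class and method names here are invented):

```python
from collections import defaultdict


class PublishLog:
    """In-memory record of original post -> syndicated copies, both ways."""

    def __init__(self):
        self._by_original = defaultdict(set)
        self._by_syndicated = {}

    def record(self, original_url, syndicated_url):
        """Remember that `original_url` was published as `syndicated_url`."""
        self._by_original[original_url].add(syndicated_url)
        self._by_syndicated[syndicated_url] = original_url

    def original_for(self, syndicated_url):
        """What a discovery endpoint would look up; None if unknown."""
        return self._by_syndicated.get(syndicated_url)

    def syndicated_copies(self, original_url):
        """All known syndicated copies of a post, in sorted order."""
        return sorted(self._by_original.get(original_url, ()))
```

a discovery endpoint built on top of this would answer a ?syndication= query by calling original_for() and redirecting when it finds a match.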

also something Kartik said the other day made me think that people will want a way to pull in responses to old, pre-indieweb posts. e.g., when I started my site in 2013, I imported tweets back to 2009 and have all the original -> syndicated information for them. it'd be cool to do a one-time sweep of all the old stuff.

this is turning into quite a rabbit hole :)

snarfed · 2014-04-12T20:18:15Z

@kylewm, interesting ideas! bridgy publish could definitely store its own url mappings, and we could also definitely add backfill. if you're interested in either, feel free to file issues to capture brainstorming!

kylewm · 2014-04-15T16:26:38Z

I asked for feedback on the #microformats IRC yesterday, but no responses yet http://logs.glob.uno/?c=freenode%23microformats&s=15+Apr+2014&e=15+Apr+2014#c71399

The good news is no one hated the idea enough to smash it! If we end up going this way, I will likely stick with the rel type rel="original-post-discovery" and the endpoint query parameter "syndication=[url]", as I have not thought of anything better in the last few days.

snarfed · 2014-04-15T17:35:44Z

thanks for asking! sounds good to me. i'm all for shipping and iterating in public. after just a bit more discussion - and unit test(s) - i'd happily accept your new original_post_discovery() code.

i'd also love to see more discussion on the alternatives, e.g.:

why not add this as a new webmention parameter as opposed to a whole new endpoint?
returning 301/404/200 status codes is nice because we can embed query URLs in the webmention target param, but are there any drawbacks?
what's the current state of the art on hiding the original post URL in silo post metadata, for each silo?

etc...

snarfed · 2014-04-15T18:08:02Z

(i'm happy to move this discussion to a wiki page + IRC, too. it's definitely broader than just bridgy.)

kylewm · 2014-04-15T20:49:10Z

Good call. I tried to capture everything so far here http://indiewebcamp.com/posse-post-discovery

snarfed · 2014-04-15T21:55:05Z

thanks @kylewm! closing this issue now that we have the wiki page.

snarfed added maybe labels Apr 9, 2014

snarfed closed this as completed Apr 15, 2014

gijswijs mentioned this issue Apr 15, 2021

Use Bridgy Publish records for original post discovery in backfeed #1029

Open
