consider techniques for original post discovery that do not require a backlink/citation #125

Closed
kylewm opened this issue Apr 9, 2014 · 11 comments

@kylewm
Contributor

kylewm commented Apr 9, 2014

There are good reasons for including a link to original content in syndicated copies, but some would still prefer not to include permalinks/citations in POSSEd copies for aesthetic (Twitter length limit) or technical (Instagram's lack of a posting API) reasons. Opening this issue to collect ideas for how Bridgy could discover the original post without an explicit backlink in the syndicated copy.

Briefly discussed on IRC starting here.

Some quick ideas from @snarfed

14:04 i hadn't thought much about how to preserve backfeed. query endpoint, extra "syndicated" webmention param, searching h-feed entries for rel-syndication are all possibilities

@snarfed
Owner

snarfed commented Apr 9, 2014

thanks for starting the conversation! cc @aaronpk

@aaronpk
Contributor

aaronpk commented Apr 10, 2014

It looks like Instagram will not be allowing access to the comments API for anything except brand management or customer service apps, so we will most likely need to do original post discovery another way for Instagram photos.

@kylewm
Contributor Author

kylewm commented Apr 11, 2014

I hacked together a proof-of-concept for one possible implementation (query endpoint). When doing original post discovery, this would be an additional step:

  1. Grab the replied-to author's homepage from their profile. Luckily this is already present in the activitystreams data.
  2. Exactly like webmention endpoint discovery, fetch their homepage and extract <link rel="original-post-discovery" href="...">
  3. Construct a URL for the post by adding ?syndication=[URL of syndicated copy] to the discovery endpoint. E.g., http://kylewm.com/original_post_discovery?syndication=https://twitter.com/kyle_wm/status/454464347679371264
  4. I add the constructed URL to the response's inReplyTos and somewhere down the line, Bridgy magically turns it into the real permalink URL.
  5. On my site, the original_post_discovery endpoint translates the Twitter URL to the real post URL. If there is a corresponding original, it returns a 302 redirect to the original's permalink; if not, it returns a 404. Note: one wrinkle, I've stored permalinks as https URLs, but Bridgy gives me http URLs.
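Steps 2 and 3 above could look roughly like the following. This is a minimal sketch using only the Python standard library, not the code from the fork; the function names `find_discovery_endpoint` and `build_query_url` are hypothetical, and the sketch percent-encodes the syndication URL, which the example in step 3 does not.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlsplit, urlunsplit


class _RelLinkParser(HTMLParser):
    """Remembers the href of the first <link> whose rel contains `rel`."""

    def __init__(self, rel):
        super().__init__()
        self.rel = rel
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attrs = dict(attrs)
        # rel is a space-separated list of values.
        rels = (attrs.get("rel") or "").split()
        if self.rel in rels and attrs.get("href"):
            self.href = attrs["href"]


def find_discovery_endpoint(html, base_url, rel="original-post-discovery"):
    """Step 2: return the absolute endpoint URL, or None if the page has none."""
    parser = _RelLinkParser(rel)
    parser.feed(html)
    return urljoin(base_url, parser.href) if parser.href else None


def build_query_url(endpoint, syndication_url):
    """Step 3: append ?syndication=<url>, keeping any existing query string."""
    scheme, netloc, path, query, fragment = urlsplit(endpoint)
    extra = urlencode({"syndication": syndication_url})
    query = f"{query}&{extra}" if query else extra
    return urlunsplit((scheme, netloc, path, query, fragment))
```

Fetching the homepage itself is left out so the sketch stays network-free; any HTTP client could supply the `html` argument.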

I wonder if Bridgy should throw out discovery URLs that return 404. Right now the effect is that it tries to send mentions either to the real permalinks (via the redirect) or, for orphans, to the 404'ing endpoint.
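One way to throw out the orphans is to resolve each discovery URL before sending anything and keep only the ones that redirect. The sketch below assumes a caller-supplied `fetch_status` function (hypothetical) that returns the status code and `Location` header for a URL without following redirects; `to_https` is a hypothetical helper for the http vs https wrinkle, and all URLs in the usage note are placeholders.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url):
    """Rewrite an http:// URL to https://; leave anything else untouched."""
    parts = urlsplit(url)
    if parts.scheme != "http":
        return url
    return urlunsplit(("https",) + tuple(parts[1:]))


def resolve_discovery_url(query_url, fetch_status):
    """Return the original permalink, or None if the endpoint has no match.

    fetch_status(url) must return (status_code, location_or_None) and must
    not follow redirects itself.
    """
    status, location = fetch_status(query_url)
    if status in (301, 302) and location:
        return location
    return None


def resolve_all(query_urls, fetch_status):
    """Keep only discovery URLs that resolve to a permalink; drop 404s."""
    resolved = (resolve_discovery_url(u, fetch_status) for u in query_urls)
    return [permalink for permalink in resolved if permalink]
```

With a fake `fetch_status` that answers 302 for one URL and 404 for another, `resolve_all` returns only the permalink for the first, so no webmention is attempted against the 404'ing endpoint.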

Here is an example of a tweet without backlink

twitter

And here is the modified Bridgy sending links anyway!

bridgy

The interesting changes are here in my fork of activitystreams kylewm/granary@79e19cd ... I cached the discovered rel endpoint because otherwise Bridgy fetches it many times concurrently, and the requests time out on my server.
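For illustration only, caching the discovered endpoint per homepage could be sketched as below. This is not the code in the commit above; `EndpointCache`, the `fetch` callable, and the one-hour TTL are all assumptions. Holding a single lock across the fetch is the simplest way to make concurrent callers share one request, at the cost of serializing lookups for different homepages.

```python
import threading
import time


class EndpointCache:
    """Caches discovery endpoints so each homepage is fetched at most once per TTL."""

    def __init__(self, fetch, ttl_seconds=3600, clock=time.monotonic):
        self._fetch = fetch          # fetch(homepage_url) -> endpoint URL or None
        self._ttl = ttl_seconds
        self._clock = clock
        self._lock = threading.Lock()
        self._entries = {}           # homepage URL -> (expires_at, endpoint)

    def get(self, homepage_url):
        with self._lock:
            entry = self._entries.get(homepage_url)
            if entry and entry[0] > self._clock():
                return entry[1]
            # Cache misses too (endpoint may be None), so sites without an
            # endpoint are not refetched on every reply.
            endpoint = self._fetch(homepage_url)
            self._entries[homepage_url] = (self._clock() + self._ttl, endpoint)
            return endpoint
```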

Just to be totally clear this is not a proposal; the code is probably in the totally wrong place and I'd want to get approval from microformats people on a proper rel name.

cc @kartikprabhu, @gRegorLove

@snarfed
Owner

snarfed commented Apr 11, 2014

oh man. so cool!!! can't wait to see discussion of this. (and the eventual pull request!)

looks like you omitted some of the discovery details? looks like step 2 might be truncated.

also, re twitter http vs https, i'd happily accept a PR that switches bridgy and a-u to always generate https twitter URLs. (and facebook and others too?)

finally, related feature request for if/when we merge this: #122

@kylewm
Contributor Author

kylewm commented Apr 12, 2014

oh interesting, I hadn't thought about the round trip case (bridgy publish -> silo -> bridgy).

for sort of an all-in-one, idiot-proof solution, bridgy could store all the original-post -> syndicated-url's that it's ever published (maybe it already does), and provide an original-post-discovery endpoint of its own. then sites that don't bother with getting the published ID back from bridgy and storing it could just use bridgy's endpoint (the way that some sites use webmention.io for receiving their mentions).
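A rough sketch of what such a hosted endpoint might look like, written as a plain WSGI app over an in-memory mapping. `PublishedPosts` and `make_discovery_app` are hypothetical names, and how Bridgy would actually persist the mappings is not something this thread specifies.

```python
from urllib.parse import parse_qs


class PublishedPosts:
    """In-memory record of syndicated URL -> original post URL."""

    def __init__(self):
        self._by_syndication = {}

    def record(self, original_url, syndicated_url):
        self._by_syndication[syndicated_url] = original_url

    def original_for(self, syndicated_url):
        return self._by_syndication.get(syndicated_url)


def make_discovery_app(store):
    """Build a WSGI app answering ?syndication=<url> with a 302 or a 404."""

    def app(environ, start_response):
        query = parse_qs(environ.get("QUERY_STRING", ""))
        syndication = query.get("syndication", [None])[0]
        original = store.original_for(syndication) if syndication else None
        if original:
            start_response("302 Found", [("Location", original)])
            return [b""]
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"no original post known for that syndication URL\n"]

    return app
```

The app can be served for local experimentation with `wsgiref.simple_server.make_server` from the standard library.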

also something Kartik said the other day made me think that people will also want a way to pull in responses to old, pre-indieweb posts. e.g., when I started my site in 2013, I imported tweets back to 2009 and have all the original -> syndicated information for them. it'd be cool to do a one-time sweep of all the old stuff.

this is turning into quite a rabbit hole :)

@snarfed
Owner

snarfed commented Apr 12, 2014

@kylewm, interesting ideas! bridgy publish could definitely store its own url mappings, and we could also definitely add backfill. if you're interested in either, feel free to file issues to capture brainstorming!

@kylewm
Contributor Author

kylewm commented Apr 15, 2014

I asked for feedback on the #microformats IRC yesterday, but no responses yet http://logs.glob.uno/?c=freenode%23microformats&s=15+Apr+2014&e=15+Apr+2014#c71399

The good news is no one hated the idea enough to smash it! If we end up going this way, I will likely stick with the rel type rel="original-post-discovery" and the endpoint query parameter "syndication=[url]", as I have not thought of anything better in the last few days.

@snarfed
Owner

snarfed commented Apr 15, 2014

thanks for asking! sounds good to me. i'm all for shipping and iterating in public. after just a bit more discussion - and unit test(s) - i'd happily accept your new original_post_discovery() code.

i'd also love to see more discussion on the alternatives, e.g.:

  • why not add this as a new webmention parameter as opposed to a whole new endpoint?
  • returning 301/404/200 status codes is nice because we can embed query URLs in the webmention target param, but are there any drawbacks?
  • what's the current state of the art on hiding the original post URL in silo post metadata, for each silo?

etc...

@snarfed
Owner

snarfed commented Apr 15, 2014

(i'm happy to move this discussion to a wiki page + IRC, too. it's definitely broader than just bridgy.)

@kylewm
Contributor Author

kylewm commented Apr 15, 2014

Good call. I tried to capture everything so far here http://indiewebcamp.com/posse-post-discovery

@snarfed
Owner

snarfed commented Apr 15, 2014

thanks @kylewm! closing this issue now that we have the wiki page.

3 participants