Make WordPress Core

Opened 7 years ago

Last modified 7 years ago

#41746 new defect (bug)

oEmbed does not respect canonical provider url parameter

Reported by: dougal's profile dougal Owned by:
Milestone: Awaiting Review Priority: normal
Severity: normal Version: 2.9
Component: Embeds Keywords: has-patch dev-feedback 2nd-opinion needs-unit-tests
Focuses: Cc:

Description

I came across a Twitter URL format that would not embed correctly. Providing that URL to their provider endpoint returned an error. But the original page had a <link> element which already had a working, canonical url parameter in its querystring.

An example URL is:
https://twitter.com/i/web/status/898599373956722688

If you try to fetch oEmbed data for that URL by just adding it as a url querystring parameter on the standard Twitter oEmbed provider URL, it will return an error.

But view source on that page, and you'll see:
<link rel="alternate" type="application/json+oembed" href="https://publish.twitter.com/oembed?url=https://twitter.com/dimensionmedia/status/898599373956722688" title="David Bisset on Twitter: &quot;Agorakit is a web based open source &quot;groupware for citizens initiatives” (which i’ve seen @buddypress used for too) https://t.co/bFPw9ZZWi2 https://t.co/H1REt0QfcO&quot;">

Note that the path of this URL is .../{username}/status/{id}, whereas the original URL was .../i/web/status/{id}.

I've worked out a small patch and method for getting WordPress to use oEmbed discovery to extract and use the canonical URL.

When using wp_oembed_add_provider(), if you leave the provider URL falsey, then WP_oEmbed::get_provider() will use discovery to find it (assuming that you haven't forced discovery = false in $args). Then my patch will pull the url arg from there and use that, instead of the original URL that was passed in to the embed handling.

Later, when the JSON response is being handled, the code will still be able to see whether this is a whitelisted URL pattern, and bypass/perform security filtering such as kses() (see wp_filter_oembed_result()).

Attachments (1)

41746.patch (1.9 KB) - added by dougal 7 years ago.
Use canonical provider url param in discovery, if provided

Download all attachments as: .zip

Change History (10)

@dougal
7 years ago

Use canonical provider url param in discovery, if provided

#1 @dougal
7 years ago

  • Keywords has-patch dev-feedback added

This ticket was mentioned in Slack in #core by dougal. View the logs.


7 years ago

This ticket was mentioned in Slack in #core by dougal. View the logs.


7 years ago

#4 @azaozz
7 years ago

  • Keywords 2nd-opinion added

This ticket was mentioned in Slack in #core by dougal. View the logs.


7 years ago

#6 @dougal
7 years ago

Some more info on this...

I found another class of Twitter URLs that display this behavior -- media links in tweets (e.g., photos or videos):
https://www.twitter.com/{username}/status/{statusid}/photo/{num}
https://www.twitter.com/{username}/status/{statusid}/video/{num}

I ran across these when auto-cross-posting bookmarks from | Pinboard to my blog. Pinboard has a feature to automatically bookmark URLs from tweets that you mark as favorites. When it does this, it can extract these types of URLs that point directly to Twitter resources, but which the Twitter oEmbed provider does not recognize. Twitter apparently expects you to get the canonical provider URL from the original resource's head.

I believe that @azaozz said that he had seen other services that extract URLs from tweets similarly.

#7 @dougal
7 years ago

Also, to clarify, the actual bug here is that in cases where we are already using discovery, and the service is already giving us a URL in the discovered provider URL, we override it with whatever original URL we were spidering in the first place. Clear as mud? Let's break it down:

User: Hey WordPress, I need you to oEmbed URLA.
WP: Hmm, I don't already have a provider URL in my whitelist for that, let's try discovery!
WP: Hey, Service -- what's the oEmbed provider URL for URLA? (<spiders URLA>)
Service: The provider address for URLA is {ProvUrl}?url=URLB
WP: Thanks! But I'll just toss out that querystring, and rebuild it...
WP: The provider URL is now {ProvUrl}?url=URLA! I'll fetch some JSON from there...
Service: oEmbed for URLA? No such thing!
User: Dangit, why won't this URL embed correctly?

#8 @swissspidy
7 years ago

  • Keywords needs-unit-tests added
  • Version changed from trunk to 2.9

This ticket was mentioned in Slack in #core by jeffpaul. View the logs.


7 years ago

Note: See TracTickets for help on using tickets.