Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unclear purpose for tag visitors returning true or false in Optimization Detective #1342

Open
westonruter opened this issue Jul 10, 2024 · 7 comments
Assignees
Labels
[Plugin] Embed Optimizer Issues for the Embed Optimizer plugin (formerly Auto Sizes) [Plugin] Enhanced Responsive Images Issues for the Enhanced Responsive Images plugin [Plugin] Image Prioritizer Issues for the Image Prioritizer plugin (dependent on Optimization Detective) [Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Enhancement A suggestion for improvement of an existing feature

Comments

@westonruter
Copy link
Member

When registering a tag visitor in Optimization Detective, the callable must return a boolean. When the boolean returns true, then this is an indication that the tag was indeed "visited" or rather that the visitor is relevant to the current tag. For example:

add_action( 'od_register_tag_visitors', static function ( OD_Tag_Visitor_Registry $registry ) {
	$registry->register( 'async_images', static function ( OD_Tag_Visitor_Context $context ) {
		if ( $context->processor->get_tag() !== 'IMG' ) {
			return false;
		}
		if ( $context->processor->get_attribute( 'decoding' ) !== 'async' ) {
			$context->processor->set_attribute( 'decoding', 'async' );
		}
		return true;
	} );
} );

Clearly this visitor should return false if the current tag is not an IMG. But should it still also return false if it already has a decoding=async attribute?

In reality, when a visitor returns true this is a signal for whether or not the tag should be captured to be stored in the URL Metrics:

foreach ( $visitors as $visitor ) {
$did_visit = $visitor( $tag_visitor_context ) || $did_visit;
// Since the visitor may have traversed HTML tags, we need to make sure we go back to this tag so that
// in the next iteration any relevant tag visitors may apply, in addition to properly setting the data-od-xpath
// on this tag below.
$processor->seek( $current_tag_bookmark );
}
$processor->release_bookmark( $current_tag_bookmark );
if ( $did_visit && $needs_detection ) {
$processor->set_meta_attribute( 'xpath', $processor->get_xpath() );
}

So when a visitor returns true then this means that $did_visit will be true and if the URL Metrics for current URL are not complete (if $needs_detection is false), then the effect is that it will add the data-od-xpath attribute to the tag. What does this do? It means that when detect.js runs, it will include the element among all of the elements for which it captures metrics, including via IntersectionObserver and web-vitals.js:

const breadcrumbedElements = doc.body.querySelectorAll( '[data-od-xpath]' );

for ( const element of breadcrumbedElementsMap.keys() ) {
intersectionObserver.observe( element );
}

However, in the case above for a tag visitor that just makes sure that all img tags have decoding=async, there is no need for this tag to ever be captured in the URL metrics because the URL metrics do not impact how the applied optimization in any way. This is in contrast with what Image Prioritizer does with applying lazy-loading and fetchpriority=high, where it depends on the URL Metrics to know what optimization to perform. Embed Optimizer also depends on URL Metrics to know whether an embed should be lazy-loaded or instead should have preconnect links added. Conversely, the auto-sizes integration in #1322 doesn't depend on the URL Metrics and so at present its tag visitor could always return false.

In short, I think the explanation of what the boolean means being returned from a tag visitor. It's not whether the tag was visited, but rather whether or not the tag needs to be tracked in the URL Metrics.

It could also be possible to automatically detect whether or not a tag visitor depends on URL metrics by detecting whether or not it accessed the $context->url_metrics_group_collection. If so, then this would be a signal that indeed the tag should be tracked in the URL metrics, and the return value of the tag visitor could be eliminated entirely. However, this could be a bit too sophisticated.

@westonruter westonruter added [Type] Enhancement A suggestion for improvement of an existing feature [Plugin] Optimization Detective Issues for the Optimization Detective plugin [Plugin] Embed Optimizer Issues for the Embed Optimizer plugin (formerly Auto Sizes) [Plugin] Enhanced Responsive Images Issues for the Enhanced Responsive Images plugin [Plugin] Image Prioritizer Issues for the Image Prioritizer plugin (dependent on Optimization Detective) labels Jul 10, 2024
@westonruter westonruter changed the title Clarify purpose for tag visitors to return true or false in Optimization Detective Jul 10, 2024
@westonruter westonruter changed the title Purpose unclear for tag visitors returning true or false in Optimization Detective Jul 10, 2024
@adamsilverstein
Copy link
Member

Clearly this visitor should return false if the current tag is not an IMG. But should it still also return false if it already has a decoding=async attribute?

All core output images will have decoding=async by default.

In short, I think the explanation of what the boolean means being returned from a tag visitor. It's not whether the tag was visited, but rather whether or not the tag needs to be tracked in the URL Metrics.

That makes sense.

@westonruter
Copy link
Member Author

All core output images will have decoding=async by default.

Yes, this is just an example.

@joemcgill
Copy link
Member

joemcgill commented Jul 11, 2024

In short, I think the explanation of what the boolean means being returned from a tag visitor. It's not whether the tag was visited, but rather whether or not the tag needs to be tracked in the URL Metrics.

What determines whether the tag needs to be tracked in the URL Metrics? If any modifications have been made by the visitor at all, or are there specific types of changes that need to return true?

In other words, would we want to return true in your example if decoding=async was added by the visitor?

@westonruter
Copy link
Member Author

The tag would need to be tracked in URL metrics if the tag visitor needs to obtain any metrics for the element in the URL metrics. In the example case here with decoding=async, there's no need to return true because it doesn't need to be captured in URL metrics since the visitor doesn't apply any conditional optimizations. This is in contrast, for example, with the Embed Optimizer which does conditional optimizations based on whether the embed is in the initial viewport:

public function __invoke( OD_Tag_Visitor_Context $context ): bool {
$processor = $context->processor;
if ( ! (
'FIGURE' === $processor->get_tag()
&&
$processor->has_class( 'wp-block-embed' )
) ) {
return false;
}
$max_intersection_ratio = $context->url_metrics_group_collection->get_element_max_intersection_ratio( $processor->get_xpath() );
if ( $max_intersection_ratio > 0 ) {
/*
* The following embeds have been chosen for optimization due to their relative popularity among all embed types.
* See <https://colab.sandbox.google.com/drive/1nSpg3qoCLY-cBTV2zOUkgUCU7R7X2f_R?resourcekey=0-MgT7Ur0pT__vw-5_AHjgWQ#scrollTo=utZv59sXzXvS>.
* The list of hosts being preconnected to was obtained by inserting an embed into a post and then looking
* at the network log on the frontend as the embed renders. Each should include the host of the iframe src
* as well as URLs for assets used by the embed, _if_ the URL looks like it is not geotargeted (e.g. '-us')
* or load-balanced (e.g. 's0.example.com'). For the load balancing case, attempt to load the asset by
* incrementing the number appearing in the subdomain (e.g. s1.example.com). If the asset still loads, then
* it is a likely case of a load balancing domain name which cannot be safely preconnected since it could
* not end up being the load balanced domain used for the embed. Lastly, these domains are only for the URLs
* for GET requests, as POST requests are not likely to be part of the critical rendering path.
*/
$preconnect_hrefs = array();
if ( $processor->has_class( 'wp-block-embed-youtube' ) ) {
$preconnect_hrefs[] = 'https://www.youtube.com';
$preconnect_hrefs[] = 'https://i.ytimg.com';
} elseif ( $processor->has_class( 'wp-block-embed-twitter' ) ) {
$preconnect_hrefs[] = 'https://syndication.twitter.com';
$preconnect_hrefs[] = 'https://pbs.twimg.com';
} elseif ( $processor->has_class( 'wp-block-embed-vimeo' ) ) {
$preconnect_hrefs[] = 'https://player.vimeo.com';
$preconnect_hrefs[] = 'https://f.vimeocdn.com';
$preconnect_hrefs[] = 'https://i.vimeocdn.com';
} elseif ( $processor->has_class( 'wp-block-embed-spotify' ) ) {
$preconnect_hrefs[] = 'https://apresolve.spotify.com';
$preconnect_hrefs[] = 'https://embed-cdn.spotifycdn.com';
$preconnect_hrefs[] = 'https://encore.scdn.co';
$preconnect_hrefs[] = 'https://i.scdn.co';
} elseif ( $processor->has_class( 'wp-block-embed-videopress' ) || $processor->has_class( 'wp-block-embed-wordpress-tv' ) ) {
$preconnect_hrefs[] = 'https://video.wordpress.com';
$preconnect_hrefs[] = 'https://public-api.wordpress.com';
$preconnect_hrefs[] = 'https://videos.files.wordpress.com';
$preconnect_hrefs[] = 'https://v0.wordpress.com'; // This does not appear to be a load-balanced domain since v1.wordpress.com is not valid.
} elseif ( $processor->has_class( 'wp-block-embed-instagram' ) ) {
$preconnect_hrefs[] = 'https://www.instagram.com';
$preconnect_hrefs[] = 'https://static.cdninstagram.com';
$preconnect_hrefs[] = 'https://scontent.cdninstagram.com';
} elseif ( $processor->has_class( 'wp-block-embed-tiktok' ) ) {
$preconnect_hrefs[] = 'https://www.tiktok.com';
// Note: The other domains used for TikTok embeds include https://lf16-tiktok-web.tiktokcdn-us.com,
// https://lf16-cdn-tos.tiktokcdn-us.com, and https://lf16-tiktok-common.tiktokcdn-us.com among others
// which either appear to be geo-targeted ('-us') _or_ load-balanced ('lf16'). So these are not added
// to the preconnected hosts.
} elseif ( $processor->has_class( 'wp-block-embed-amazon' ) ) {
$preconnect_hrefs[] = 'https://read.amazon.com';
$preconnect_hrefs[] = 'https://m.media-amazon.com';
} elseif ( $processor->has_class( 'wp-block-embed-soundcloud' ) ) {
$preconnect_hrefs[] = 'https://w.soundcloud.com';
$preconnect_hrefs[] = 'https://widget.sndcdn.com';
// Note: There is also https://i1.sndcdn.com which is for the album art, but the '1' indicates it may be geotargeted/load-balanced.
} elseif ( $processor->has_class( 'wp-block-embed-pinterest' ) ) {
$preconnect_hrefs[] = 'https://assets.pinterest.com';
$preconnect_hrefs[] = 'https://widgets.pinterest.com';
$preconnect_hrefs[] = 'https://i.pinimg.com';
}
foreach ( $preconnect_hrefs as $preconnect_href ) {
$context->link_collection->add_link(
array(
'rel' => 'preconnect',
'href' => $preconnect_href,
)
);
}
} elseif ( embed_optimizer_update_markup( $processor, false ) && ! $this->added_lazy_script ) {
$processor->append_body_html( wp_get_inline_script_tag( embed_optimizer_get_lazy_load_script(), array( 'type' => 'module' ) ) );
$this->added_lazy_script = true;
}
return true;
}

@westonruter
Copy link
Member Author

So basically, if a tag visitor accesses $context->url_metrics_group_collection then it should return true.

@westonruter
Copy link
Member Author

So that's why I mentioned this:

It could also be possible to automatically detect whether or not a tag visitor depends on URL metrics by detecting whether or not it accessed the $context->url_metrics_group_collection. If so, then this would be a signal that indeed the tag should be tracked in the URL metrics, and the return value of the tag visitor could be eliminated entirely. However, this could be a bit too sophisticated.

So we could detect whether a tag visitor accesses $context->url_metrics_group_collection, or we could have an explicit method on the $context object like record_element(), track_tag(), observe(), etc. which would do this explicitly.

@joemcgill
Copy link
Member

I like the idea of making this behavior explicit.

I don't fully understand the intended semantics of the current API, but I wonder if we need to modify the API for registering visitors to include an signal (e.g., property, arg, option, etc) whether it uses URL metrics? That way we could separate the return value indicating whether the visitor optimization was performed from whether the tag needs to be tracked for user metrics.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
[Plugin] Embed Optimizer Issues for the Embed Optimizer plugin (formerly Auto Sizes) [Plugin] Enhanced Responsive Images Issues for the Enhanced Responsive Images plugin [Plugin] Image Prioritizer Issues for the Image Prioritizer plugin (dependent on Optimization Detective) [Plugin] Optimization Detective Issues for the Optimization Detective plugin [Type] Enhancement A suggestion for improvement of an existing feature
3 participants