HTML to Atom: title element has encoded HTML #629

Closed
gRegorLove opened this issue Nov 9, 2023 · 4 comments


@gRegorLove

Minor, but I noticed with this feed that one of my notes has encoded HTML getting into the <title>:

<title>I like this CSS image reset after watching Kevin Powell’s walkthrough. &lt;...</title>

The original post uses e-content, and I'm guessing Granary is using the parsed html attribute for the title. A possible solution might be to use the value attribute instead (for posts that don't have a name, of course):

    "content": [
        {
            "html": "<p>I like this CSS image reset after watching <a href=\"https://www.youtube.com/watch?v=345V2MU3E_w\">Kevin Powell&#x2019;s walkthrough</a>.</p>\n\n<p>Also intrigued by the post he linked, &#x201C;<a class=\"h-cite\" href=\"https://csswizardry.com/2023/09/the-ultimate-lqip-lcp-technique/\">The Ultimate Low-Quality Image Placeholder Technique</a>.&#x201D;</p>",
            "value": "I like this CSS image reset after watching Kevin Powell\u2019s walkthrough.\nAlso intrigued by the post he linked, \u201cThe Ultimate Low-Quality Image Placeholder Technique.\u201d",
            "lang": "en"
        }
    ],
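The idea above, preferring the plain-text value over html when a post has no name, could look roughly like the following. This is only a sketch, assuming mf2-style parsed properties where each property is a list; `plain_title` and `max_len` are hypothetical names for illustration, not part of Granary's API.

```python
def plain_title(props, max_len=60):
    """Hypothetical helper: choose a plain-text title from mf2-style properties."""
    # Prefer an explicit name when the post has one.
    names = props.get("name") or []
    if names:
        return names[0]

    # Otherwise fall back to the content's plain-text "value", never its "html".
    contents = props.get("content") or []
    if not contents:
        return ""
    first = contents[0]
    text = first.get("value", "") if isinstance(first, dict) else str(first)

    # Collapse newlines and runs of whitespace, then truncate.
    text = " ".join(text.split())
    if len(text) > max_len:
        text = text[:max_len].rstrip() + "..."
    return text
```

Because only the value string is ever truncated, a cut can't land in the middle of a tag.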
@gRegorLove
Author

Oh, I just realized it's not the first YouTube <a href> that's getting encoded, but some later HTML element, so I'm not sure. Let me know if something is off with my HTML that's causing it.

@snarfed
Owner

snarfed commented Nov 9, 2023

Huh! Thanks for the nudge. This is an odd one! It doesn't reproduce locally for me at all, with either the /stream/ feed or the specific post; both have <title>I like this CSS image reset after watching Kevin Powell’s walkthrough. </title>, no &lt;.... I do see it on prod https://granary.io/ with both though. Hrm.

@snarfed
Owner

snarfed commented Jan 22, 2024

Looked at this again, and I'm now able to reproduce it locally. The output is still a bit different; I'm guessing that's because I'm using a different HTML parser locally vs in prod.

I suspect we're generating title from HTML content, then ellipsizing, and we end up with just the opening < of a tag, which we then entity-encode.
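To illustrate the suspected failure mode, here is a minimal standalone sketch using only the Python standard library. It is not Granary's actual code: `ellipsize`, `naive_title`, and `safer_title` are hypothetical names, and the "safer" variant (strip tags first, then truncate) is just one possible approach, not necessarily the fix that was applied.

```python
import html
from html.parser import HTMLParser


def ellipsize(text, max_len):
    """Truncate to max_len characters and append an ellipsis if anything was cut."""
    if len(text) <= max_len:
        return text
    return text[:max_len] + "..."


def naive_title(content_html, max_len):
    # Suspected bug: truncate the raw HTML, then entity-encode the result.
    # If the cut lands just after a tag's opening "<", it becomes "&lt;".
    return html.escape(ellipsize(content_html, max_len), quote=False)


class _TextExtractor(HTMLParser):
    """Collects only text nodes, dropping all tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def safer_title(content_html, max_len):
    # Alternative: reduce to plain text first, then truncate, then encode.
    parser = _TextExtractor()
    parser.feed(content_html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return html.escape(ellipsize(text, max_len), quote=False)


content = "Hello world. <p>More</p>"
print(naive_title(content, 14))  # Hello world. &lt;...
print(safer_title(content, 14))  # Hello world. M...
```

With the cut at 14 characters, the naive version keeps the stray `<` and encodes it, which matches the shape of the reported `&lt;...` title.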

@snarfed
Owner

snarfed commented Jan 22, 2024

Fixed!
