
Hashtags with extended alphabet characters aren't recognized as hashtags, AP=>Bluesky #1131

Open
MS-potilas opened this issue Jun 13, 2024 · 2 comments


@MS-potilas

AP Hashtags containing extended alphabet characters, like ä (a with dots) and ö (o with dots), aren't recognized as hashtags. They show as text in Bluesky.

Example:
https://mementomori.social/@rolle/112586679114646311
https://bsky.app/profile/rolle.mementomori.social.ap.brid.gy/post/3kuikyelvzdc2

Here #Äänestäminen was not recognized as a hashtag.

@snarfed snarfed added the now label Jun 27, 2024
@snarfed
Owner

snarfed commented Jun 27, 2024

Huh, this turned out to be more interesting than I thought. Mastodon's AS2 JSON for this post removes the umlauts from those characters in the tag objects. It renders them in content and in the UI:

[Screenshot: the post as rendered in the Mastodon UI]

...but the AS2 tag has "name" : "#aanestaminen", no umlauts. Full object below.

Interestingly, if you click on the #Äänestäminen hashtag chip in the UI, it goes to the hashtag page, https://mementomori.social/tags/%C3%84%C3%A4nest%C3%A4minen , which has the umlauts. They're only for show, though; evidently they're not in the underlying hashtag index. If you remove them from that URL to get https://mementomori.social/tags/Aanestaminen , it renders the hashtag without them but shows the same results.

{
   "type" : "Note",
   "id" : "https://mementomori.social/users/rolle/statuses/112586679114646311",
   "url" : "https://mementomori.social/@rolle/112586679114646311",
   "attributedTo" : "https://mementomori.social/users/rolle",
   "content" : "<p>Muista käydä äänestämässä! Klo 20 asti aikaa. On tyhmää olla vaikuttamatta, kun siihen demokratiassa on mahdollisuus. Kaikille maailmassa ei tällaista suoda.</p><p><a href=\"https://mementomori.social/tags/Eurovaalit2024\" class=\"mention hashtag\" rel=\"tag\">#<span>Eurovaalit2024</span></a> <a href=\"https://mementomori.social/tags/Eurovaalit\" class=\"mention hashtag\" rel=\"tag\">#<span>Eurovaalit</span></a> <a href=\"https://mementomori.social/tags/%C3%84%C3%A4nest%C3%A4minen\" class=\"mention hashtag\" rel=\"tag\">#<span>Äänestäminen</span></a> <a href=\"https://mementomori.social/tags/Politiikka\" class=\"mention hashtag\" rel=\"tag\">#<span>Politiikka</span></a></p>",
   "tag" : [
      {
         "href" : "https://mementomori.social/tags/eurovaalit2024",
         "name" : "#eurovaalit2024",
         "type" : "Hashtag"
      },
      {
         "href" : "https://mementomori.social/tags/eurovaalit",
         "name" : "#eurovaalit",
         "type" : "Hashtag"
      },
      {
         "href" : "https://mementomori.social/tags/aanestaminen",
         "name" : "#aanestaminen",
         "type" : "Hashtag"
      },
      {
         "href" : "https://mementomori.social/tags/politiikka",
         "name" : "#politiikka",
         "type" : "Hashtag"
      }
   ]
}
@snarfed
Owner

snarfed commented Jun 27, 2024

I actually like this. It seems clever and a good UX idea, but it's definitely more difficult to translate. Bluesky uses index-based facets for hashtags and other rich text, but Mastodon's AS2 tags don't have indices, so we have to search for each tag's name in the content. That doesn't work in this case because the name is the normalized text, e.g. #aanestaminen, which doesn't have the umlauts.
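To make the failure concrete, here is a minimal sketch of the search-by-name approach. It is not Bridgy Fed's actual code: `find_tag_facet` is a hypothetical helper, the plain-text `text` is a hand-written stand-in for the post's content with the HTML stripped, and the facet shape (UTF-8 byte offsets, tag stored without the leading `#`) follows my understanding of Bluesky's rich text facets.

```python
def find_tag_facet(text, tag_name):
    """Hypothetical helper: build a Bluesky-style facet for tag_name, or None.

    Searches case-insensitively because the AS2 tag names are lowercased.
    Assumes lowercasing does not change the string's length, which holds
    for this example but not for every Unicode string.
    """
    start = text.lower().find(tag_name.lower())
    if start == -1:
        return None
    end = start + len(tag_name)
    # Facet indices count UTF-8 bytes, not characters.
    return {
        "index": {
            "byteStart": len(text[:start].encode("utf-8")),
            "byteEnd": len(text[:end].encode("utf-8")),
        },
        "features": [
            {"$type": "app.bsky.richtext.facet#tag", "tag": tag_name.lstrip("#")}
        ],
    }


# Plain-text stand-in for the hashtag line of the post.
text = "#Eurovaalit2024 #Eurovaalit #Äänestäminen #Politiikka"

found = find_tag_facet(text, "#politiikka")      # name appears in the text
missing = find_tag_facet(text, "#aanestaminen")  # normalized name does not

print(found)
print(missing)
```

The lookup for `#politiikka` succeeds, but `#aanestaminen` never appears in the text, so no facet is produced and the hashtag falls through as plain text.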

I could do something Mastodon-specific: parse content as HTML and search for class="hashtag" or rel="tag". I'd still have to map the umlaut text there to the plain Latin text in tag.name, though, and that's a proprietary special case that I'd rather avoid. Or I could ignore tags entirely and only look at the parsed HTML, but that's even more proprietary. Hrm.
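For reference, a rough sketch of what the Mastodon-specific option might look like, using only the Python standard library. `HashtagLinkParser` and `fold` are hypothetical names, and the folding step (NFKD decomposition, dropping combining marks, lowercasing) is only an assumption about how Mastodon normalizes tag names; it reproduces `#aanestaminen` for this post but may not match Mastodon's rules in general.

```python
import unicodedata
from html.parser import HTMLParser


class HashtagLinkParser(HTMLParser):
    """Collects the visible text of every <a rel="tag"> link."""

    def __init__(self):
        super().__init__()
        self.hashtags = []
        self._in_tag_link = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        rel = dict(attrs).get("rel") or ""
        if tag == "a" and "tag" in rel.split():
            self._in_tag_link = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_tag_link:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_tag_link:
            self.hashtags.append("".join(self._buffer))
            self._in_tag_link = False


def fold(text):
    """Assumed normalization: strip combining marks, then lowercase."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).lower()


# Abbreviated content from the post above.
content = (
    '<p><a href="https://mementomori.social/tags/Eurovaalit" '
    'class="mention hashtag" rel="tag">#<span>Eurovaalit</span></a> '
    '<a href="https://mementomori.social/tags/%C3%84%C3%A4nest%C3%A4minen" '
    'class="mention hashtag" rel="tag">#<span>Äänestäminen</span></a></p>'
)
tag_names = ["#eurovaalit", "#aanestaminen"]  # from the AS2 tag objects

parser = HashtagLinkParser()
parser.feed(content)

# Map each AS2 tag name to the text actually displayed in content.
displayed = {fold(h): h for h in parser.hashtags}
matches = {name: displayed.get(name) for name in tag_names}
print(matches)
```

With the displayed text in hand, the facet indices could then be computed from where that text sits in the plain-text content, rather than from tag.name.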

@snarfed snarfed removed the now label Jun 27, 2024