rel-urls Parsing Issues #50

Open
jgarber623 opened this issue May 12, 2020 · 3 comments

@jgarber623
Member

jgarber623 commented May 12, 2020

Section 1.4 of the microformats2 parsing specification outlines how to parse link elements (<a>, <link>, etc.) for rel values and defines the JSON output structure.

The rels structure is reasonably straightforward and maps one-to-one with matched elements:

<a rel="author" href="http://example.com/a">author a</a>
<a rel="author" href="http://example.com/b">author b</a>
<a rel="in-reply-to" href="http://example.com/1">post 1</a>
<a rel="in-reply-to" href="http://example.com/2">post 2</a>
<a rel="alternate home"
   href="http://example.com/fr"
   media="handheld"
   hreflang="fr">French mobile homepage</a>

…results in…

{
  "rels": { 
    "author": [ "http://example.com/a", "http://example.com/b" ],
    "in-reply-to": [ "http://example.com/1", "http://example.com/2" ],
    "alternate": [ "http://example.com/fr" ],
    "home": [ "http://example.com/fr" ]
  }
}

The parsing rules break down slightly when compiling results for the rel-urls structure. For each unique URL, the resulting JSON hash should include a rels key whose value is an array of the rel values found across all matched link elements with that URL. The spec also defines rules for parsing various attributes (hreflang, media, title, and type) and the node's text value. These extended attributes are specified as strings (not arrays), so only one value per URL survives, resulting in data loss and a seemingly inconsistent parsing pattern.

Parser Results

Parser developers have implemented this feature with differing results.

Given the markup:

<link rel="me" href="https://sixtwothree.org">

<a rel="me" href="https://sixtwothree.org">Jason Garber</a>
<a rel="home" href="https://sixtwothree.org">Go back home</a>

…the parsers produce differing JSON output.

Go

{
  "items": [],
  "rels": {
    "home": ["https://sixtwothree.org"],
    "me": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "rels": ["me"]
    }
  }
}

PHP

{
  "items": [],
  "rels": {
    "me": ["https://sixtwothree.org"],
    "home": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "text": "Jason Garber",
      "rels": ["home", "me"]
    }
  }
}

Python

{
  "items": [],
  "rels": {
    "me": ["https://sixtwothree.org"],
    "home": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "text": "",
      "rels": ["home", "me"]
    }
  }
}

Ruby

{
  "items": [],
  "rels": {
    "me": ["https://sixtwothree.org"],
    "home": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "rels": ["home"],
      "text": "Jason Garber"
    }
  }
}

Note: The Node parser on microformats.io appears to be offline.

So…

The test suite's rel tests appear to conform to the spec as it's written today. What I'd like help sorting out is what seems like an arbitrary (or at least undocumented) decision to aggregate only rel attribute values in the rel-urls result structure. The extended attributes are, per the spec, worth capturing, but not worth capturing as arrays. That seems strange.

Can someone shed some light on the subject and/or can we update the spec to be more clear or to change behavior?

Edit 1: #39 is tangentially related to this, as well.

Edit 2: #32 is also related to this.

@gRegorLove
Member

gRegorLove commented May 12, 2020

Here's the previous discussion and resolution.

My reading of the current spec is that this is correct for rel-urls:

{
  "items": [],
  "rels": {
    "me": ["https://sixtwothree.org"],
    "home": ["https://sixtwothree.org"]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "text": "Jason Garber",
      "rels": ["home", "me"]
    }
  }
}

Parsing the first link adds me to the rels; parsing the second adds the text property; parsing the third adds home to the rels.

Edit: Just noticed that this loses the text value of the third link, since text is already set by the second one. Hm.
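
The walk-through above can be sketched in Python. This is a minimal illustration of the "first value wins" reading, not any parser's actual implementation: the function name build_rel_urls and the pre-parsed link dicts are hypothetical, and it assumes a link element contributes no text value (the Python and JavaScript parser outputs in this thread suggest not every parser makes that assumption).

```python
# Hypothetical sketch of the "first value wins" reading of the rel-urls rules.
# Links are given as pre-parsed dicts; a real parser would extract them from HTML.
EXTENDED_ATTRS = ("hreflang", "media", "title", "type")


def build_rel_urls(links):
    rel_urls = {}
    for link in links:
        entry = rel_urls.setdefault(link["href"], {})
        # Extended attributes are plain strings: only the first value is kept.
        for attr in EXTENDED_ATTRS:
            if attr in link and attr not in entry:
                entry[attr] = link[attr]
        # Assumption: an element without text (e.g. <link>) sets no "text" key.
        if link.get("text") and "text" not in entry:
            entry["text"] = link["text"]
        # rel values, by contrast, are merged across every matching element.
        merged = set(entry.get("rels", [])) | set(link["rel"].split())
        entry["rels"] = sorted(merged)
    return rel_urls


links = [
    {"rel": "me", "href": "https://sixtwothree.org"},  # the <link> element
    {"rel": "me", "href": "https://sixtwothree.org", "text": "Jason Garber"},
    {"rel": "home", "href": "https://sixtwothree.org", "text": "Go back home"},
]

result = build_rel_urls(links)
print(result)
```

Under these assumptions the single rel-urls entry ends up with rels ["home", "me"] and text "Jason Garber"; the third link's "Go back home" never appears in the output.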

@jgarber623
Member Author

Tagging @kevinmarks and @sknebel on this one.

Building on something Kevin mentioned in chat, say you're viewing a blog post in a Web browser and the page advertises alternate versions available at the same URL but with responses dictated by the incoming request's Accept header:

<link rel="alternate" href="https://sixtwothree.org/posts/877-days" type="application/json">
<link rel="alternate" href="https://sixtwothree.org/posts/877-days" type="text/markdown">

The above example is a modified version of some markup I have on my own website. Both versions are curl-able by issuing the following commands:

curl -H 'Accept: application/json' https://sixtwothree.org/posts/877-days
curl -H 'Accept: text/markdown' https://sixtwothree.org/posts/877-days

With the aforementioned parsers on microformats.io, you'd miss out on the text/markdown alternate version because the type key in the rel-urls structure is a simple string, not an aggregate array of matched values.

The same would be true of hreflang, media, etc. but the use case for that data is a little less obvious to me.
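
To make the data loss concrete, here is a small Python sketch that contrasts the string-valued type key with an array-valued alternative. Both functions (first_wins and aggregate) are hypothetical names used only for illustration, and the array-valued output is not part of the current spec; it merely shows what aggregating would preserve.

```python
# Hypothetical comparison: string-valued "type" (first value wins) versus an
# array-valued alternative that is NOT part of the current spec.
URL = "https://sixtwothree.org/posts/877-days"

links = [
    {"rel": "alternate", "href": URL, "type": "application/json"},
    {"rel": "alternate", "href": URL, "type": "text/markdown"},
]


def first_wins(links):
    """Keep "type" as a string; later values for the same URL are dropped."""
    rel_urls = {}
    for link in links:
        entry = rel_urls.setdefault(link["href"], {"rels": []})
        if "type" in link and "type" not in entry:
            entry["type"] = link["type"]
        entry["rels"] = sorted(set(entry["rels"]) | set(link["rel"].split()))
    return rel_urls


def aggregate(links):
    """Collect every distinct "type" value for a URL into an array."""
    rel_urls = {}
    for link in links:
        entry = rel_urls.setdefault(link["href"], {"rels": [], "type": []})
        if "type" in link and link["type"] not in entry["type"]:
            entry["type"].append(link["type"])
        entry["rels"] = sorted(set(entry["rels"]) | set(link["rel"].split()))
    return rel_urls


string_result = first_wins(links)
array_result = aggregate(links)

print(string_result[URL]["type"])  # application/json
print(array_result[URL]["type"])   # ['application/json', 'text/markdown']
```

With the string form, a consumer looking for a text/markdown alternate has no way to discover it from rel-urls alone; with the array form, both media types remain visible.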

@aimee-gm
Member

@jgarber623 thanks for raising this. I too found this ambiguous while implementing a parser.

The output of https://aimee-gm.github.io/microformats-parser/ (a JavaScript parser) is:

{
  "rels": {
    "me": [
      "https://sixtwothree.org"
    ],
    "home": [
      "https://sixtwothree.org"
    ]
  },
  "rel-urls": {
    "https://sixtwothree.org": {
      "rels": [
        "me",
        "home"
      ],
      "text": ""
    }
  },
  "items": []
}

I also agree with @gRegorLove that this should have a non-empty string text value.
