isKnown state #20

AramZS · 2020-09-10T17:16:48Z

This builds off of the conversation around the isLoggedIn proposal by @johnwilander at https://github.com/privacycg/is-logged-in

This is a rough sketch, but I think we would like a more private way to recognize a user as a repeat visitor to a site without invading their privacy by requiring them to identify themselves.

I will explain with the use case I am most familiar with, paywalls. This is one case, but I do not think it is the only case. It is a distinct case from the one I see being discussed at privacycg/is-logged-in#9 which is a reaction to a logged in state. The isKnown state, as I am currently proposing it, would not require a user to log in.

Many publishers use paywalls, which are important to their profitability. The way this works right now is that a site sets a cookie that records the number of visits; when that number exceeds the number we wish to allow for free, we trigger a paywall and block access to content.
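
To make the current approach concrete, here is a minimal sketch of cookie-based metering on the server side. The cookie name `meter`, the allowance of 5 free views, and the 30-day lifetime are all assumptions for illustration, not any particular publisher's or vendor's implementation.

```python
from http.cookies import SimpleCookie
from typing import Tuple

FREE_VIEWS = 5  # hypothetical free-article allowance


def meter_visit(cookie_header: str) -> Tuple[bool, str]:
    """Return (show_paywall, Set-Cookie value) for one pageview."""
    jar = SimpleCookie()
    jar.load(cookie_header)
    # "meter" is a hypothetical cookie name; a missing or malformed
    # value is treated as zero previous visits.
    try:
        visits = int(jar["meter"].value) if "meter" in jar else 0
    except ValueError:
        visits = 0
    visits += 1

    out = SimpleCookie()
    out["meter"] = str(visits)
    out["meter"]["max-age"] = 30 * 24 * 3600  # keep the count for 30 days
    out["meter"]["path"] = "/"
    return visits > FREE_VIEWS, out["meter"].OutputString()
```

Nothing in this pattern stops the site from storing an identifier in the cookie instead of a bare count.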

This process suffers from potentially the same issues that maintaining a logged in state does (which is why the isLoggedIn is a useful reference):

It is often maintained via a fingerprinting process (in the same way that sites attempt to retain the user's login outside of the cookie by profiling and recording information about their device) which is privacy invasive.
It is dependent on cookies or storage in such a way that it could potentially leak additional data about the users to third parties.

It also poses an additional threat to privacy: maintaining a connection with the user outside of login requires tracking the user without their explicit consent (in UI terms; I am not diving into the legal side of this right now), which opens up problems that don't gel well with a future privacy-first web.

What I would like to have instead is a very simple non-invasive way to handle this, one not dependent on cookies or storage. This is important because, as we are already starting to see in anticipation of the post-cookie world, sites that feel they do not have some ability to meter access to content will push users to annoying UI on their first visit, asking them to log in to read even the first page they arrive on.

In a best-case scenario I'd imagine the isKnown state to be composed of two parts.

The first property is a single number which can be either incremented or reset one time per pageview by the site the user has landed on. This will prevent the site from trying to use the value for tracking or fingerprinting, but allow the site to own its definition of "known user" in a way consistent with how it is currently being done in the wild.

The second property is a single number that represents the period of hours (not seconds, not milliseconds, to avoid fingerprinting) since the last Known access.
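
A rough model of the two properties described above, as the browser might hold them for a single site. Every name here (`IsKnownState`, `begin_pageview`, `increment`, `reset`) is hypothetical. The sketch also assumes that any write records the access time and that a reset restarts the count at 1 rather than 0.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class IsKnownState:
    """Hypothetical per-site state held by the browser (illustrative names)."""

    count: int = 0
    last_access_s: Optional[float] = None  # stored internally in seconds
    _written: bool = field(default=False, init=False, repr=False)

    def begin_pageview(self) -> None:
        # A new pageview re-arms the single allowed write.
        self._written = False

    def hours_since_last_access(self, now_s: float) -> Optional[int]:
        # Only whole hours are exposed, never seconds or milliseconds.
        if self.last_access_s is None:
            return None
        return int((now_s - self.last_access_s) // 3600)

    def _write(self, new_count: int, now_s: float) -> bool:
        if self._written:
            return False  # a second write in the same pageview is refused
        self.count = new_count
        self.last_access_s = now_s
        self._written = True
        return True

    def increment(self, now_s: float) -> bool:
        return self._write(self.count + 1, now_s)

    def reset(self, now_s: float) -> bool:
        # Assumption: a reset restarts the count at 1, counting this visit.
        return self._write(1, now_s)
```

Refusing the second write is what keeps a page from driving the counter to an arbitrary value and using it as an identifier.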

Some potential flows:

New users:

User arrives at a site
Site sees isKnown.count == 0
User's isKnown is incremented
User's isKnown.count == 1

Returning users (short time period):

User arrives at a site for the second time
Site sees isKnown.count > 0
Site sees isKnown.time < 720
This means it is the second time the user has visited in less than 30 days
User's isKnown is incremented
User's isKnown.count == 2

Returning users (long time period):

User arrives at a site for the second time
Site sees isKnown.count > 0
Site sees isKnown.time >= 720
This means it has been 30 days or more since the last visit
User's isKnown is reset
User's isKnown.count == 1
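
The three flows above can be sketched as site-side decision logic. The function names are hypothetical, and the 720-hour window is simply the 30-day threshold used in the flows (30 × 24 = 720). As in the flows, a reset is assumed to leave the count at 1.

```python
from typing import Optional

WINDOW_HOURS = 720  # 30 days * 24 hours, the threshold used in the flows


def next_action(count: int, hours: Optional[int]) -> str:
    """Choose the single write the site makes for this pageview."""
    if count == 0:
        return "increment"  # new user
    if hours is not None and hours >= WINDOW_HOURS:
        return "reset"  # returning user, long period
    return "increment"  # returning user, short period


def count_after(count: int, action: str) -> int:
    """Counter value after the chosen action is applied."""
    return 1 if action == "reset" else count + 1
```

For example, a visitor with a count of 1 who returns after 100 hours is incremented to 2, while the same visitor returning after 720 hours or more is reset to 1.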

By having a way to measure access reliably while not relying on privacy invasive methods like fingerprinting we can better support an open web and legitimate publishers who would like to maintain users' privacy while also remaining profitable.

We've discussed this in calls a few times and I keep promising to write something up, but generally have been low on time to do so, so I'll keep this brief and try to hash it out via Q&A to see if it makes sense to advance this idea further.

gffletch · 2020-09-11T15:14:46Z

For publishers with pay-walls I can see where this would be extremely helpful. However, from a security perspective for identity providers this isn't sufficient, as the identity provider needs to know that a specific identity (user) has used this browser to log in in the past. The identity data can be encrypted so that only the identity provider can access the information, and the data can be written on the fully qualified identity provider domain so as to not leak to other sites within the root domain.

jkarlin · 2020-09-11T16:00:26Z

Paywalls are an interesting case. I think the challenge for them is that I'd imagine sites with paywalls would want said paywall counters to carry over to incognito/private mode as well. Is that right?

Edit: removing bikeshedding comment as it's non-productive at this point.

samuelweiler · 2020-09-11T19:50:02Z

Why not just do this with a non-identifying cookie or pair of cookies? You write of a 'post-cookie world', but I'm still seeing interest in having some (perhaps limited) persistence for first-party cookies. Is this solving a problem that doesn't need new primitives to solve?

AramZS · 2020-09-14T13:41:02Z

> For publishers with pay-walls I can see where this would be extremely helpful. However, from a security perspective for identity providers this isn't sufficient, as the identity provider needs to know that a specific identity (user) has used this browser to log in in the past. The identity data can be encrypted so that only the identity provider can access the information, and the data can be written on the fully qualified identity provider domain so as to not leak to other sites within the root domain.

@gffletch I agree, that isn't the use case I intend this for. That use case is being discussed in privacycg/is-logged-in#9. This is not a case intended to be used to identify the user, but instead an alternative used to understand patterns of access without requiring a lot of knowledge about the user to be stored.

AramZS · 2020-09-14T13:43:55Z

> Paywalls are an interesting case. I think the challenge for them is that I'd imagine sites with paywalls would want said paywall counters to carry over to incognito/private mode as well. Is that right?
>
> Edit: removing bikeshedding comment as it's non-productive at this point.

@jkarlin I mean... haha yeah, we'd love to have the paywall counters carry over to incognito/private mode! There are a lot of publishers out there now who actually block on incognito mode when they can detect it (which is infrequently). If an increased level of anonymity would allow browsers to create a cross-standard/incognito counter to understand a user as a returning visitor within X time (and no other data about them), that would be amazing. I'm having a hard time imagining users being ok with that though?

AramZS · 2020-09-14T14:04:53Z

> Why not just do this with a non-identifying cookie or pair of cookies? You write of a 'post-cookie world', but I'm still seeing interest in having some (perhaps limited) persistence for first-party cookies. Is this solving a problem that doesn't need new primitives to solve?

@samuelweiler A lot of paywall management is done with vendors whose code runs as third-party scripts, something that seems very likely to end up blocked as we increase browser privacy. Beyond that, what forces a cookie to be non-identifying? If the goal is to give users an increasingly private experience over time with something like isLoggedIn (which is also a case that could be stored in a persistent cookie), then I think we might need this as well?

melanierichards · 2020-09-16T16:01:14Z

(My comment pertains to API shape + bits of entropy vs an opinion on the need for this proposed API, so feel free to defer until later; just wanted to get my thoughts on paper, so to speak.)

> It is often maintained via a fingerprinting process (in the same way that sites attempt to retain the user's login outside of the cookie by profiling and recording information about their device) which is privacy invasive.

In the interest of avoiding fingerprinting behaviors, I wonder if it's possible to further minimize the data this proposal exposes and still address the motivating use case. Reading back a counter and a measure of recency could be pretty helpful in building up the uniqueness of a given user.

Instead of reading back a counter:

The site specifies a maximum count
The UA stores an internal counter and the maximum value
The site tells the UA when to increment the counter
In order to determine whether the user has hit the paywall limit, the site could read back a Boolean value (did the user hit the limit or not?)

We'd probably need to design this such that the site isn't setting unique maximum counts, or incrementing the max count continuously.

The time value could be similar. Instead of reading back a number of hours:

The site could set an IsKnown timeout
The UA could reset the internal IsKnown counter when the timeout is reached
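
A minimal sketch of this Boolean-readback variant from the UA's side. The class and method names are hypothetical. It assumes the timeout runs from the most recent increment and that "hit the limit" means the counter has reached the maximum. It does not address the concern about sites choosing unique or continuously changing maximum counts.

```python
from typing import Optional


class BooleanMeter:
    """Hypothetical UA-internal meter; the site never reads the count."""

    def __init__(self, max_count: int, timeout_hours: float) -> None:
        self._max = max_count
        self._timeout = timeout_hours
        self._count = 0
        self._last_h: Optional[float] = None  # time of last increment, in hours

    def _expire(self, now_h: float) -> None:
        # The UA resets the internal counter once the timeout is reached.
        if self._last_h is not None and now_h - self._last_h >= self._timeout:
            self._count = 0
            self._last_h = None

    def increment(self, now_h: float) -> None:
        # Called when the site tells the UA to count a visit.
        self._expire(now_h)
        self._count += 1
        self._last_h = now_h

    def limit_reached(self, now_h: float) -> bool:
        # The only value exposed to the site: a single Boolean.
        self._expire(now_h)
        return self._count >= self._max
```

With `BooleanMeter(2, 720)`, two increments make `limit_reached` return true, and it returns false again once 720 hours have passed since the last increment.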

arthuredelstein · 2020-09-17T07:02:27Z

This counter seems easily spoofable -- the UA just needs to reset the counter to zero and the user can read more free articles, right? So publishers will still have an incentive to fingerprint the user.

AramZS · 2020-09-17T14:42:55Z

> The site specifies a maximum count
> The UA stores an internal counter and the maximum value
> The site tells the UA when to increment the counter
> In order to determine whether the user has hit the paywall limit, the site could read back a Boolean value (did the user hit the limit or not?)

The more I think about it, the more sense this makes. I agree: avoiding any exposure of the counter's numeric value is a more private design.

> The UA could reset the internal IsKnown counter when the timeout is reached

This would seem to me to be the simplest approach!

And I agree we would have to think about the design of how the counter is handled in order to avoid a situation where it becomes a fingerprinting vector.

AramZS · 2020-09-17T14:45:50Z

> This counter seems easily spoofable -- the UA just needs to reset the counter to zero and the user can read more free articles, right? So publishers will still have an incentive to fingerprint the user.

Users already clear cookies, switch browsers, go incognito. Generally the efficacy of trying to fingerprint the user is going down now and is going to continue to go down. I think that trying to design a methodology that removes the capability of a limited number of motivated people to avoid pay/reg walls is pretty much impossible. However, clean, non-identifying recognition of a returning viewer is a potentially useful signal and could be used for a lot more than just paywalls; sites can create other incentives as well. Further, we could examine this as a signal to potentially unlock API features in a limited way, in the same way isLoggedIn does, but that is a larger conversation.

AramZS · 2020-09-17T14:49:27Z

An additional factor we should likely discuss is scope: I would prefer to avoid a situation where multiple entities on a page could create different values for isKnown, potentially creating a way to identify a user through that method. But I can also see the argument for embedded entities wanting their own isKnown value to handle their own actions in cases like video or embedded articles, etc.

Not sure what the answer is yet, but wanted to mark this as an issue to discuss if we wish to progress.
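
To illustrate the identification risk, here is a small sketch of how cooperating embedded parties could combine their separate values into one identifier. It assumes each party can read only a single Boolean for itself; the party names are hypothetical placeholders.

```python
from typing import Dict, Iterable


def combined_id(flags: Dict[str, bool], parties: Iterable[str]) -> int:
    """Pack one Boolean per embedded party into a single integer."""
    value = 0
    for party in sorted(parties):
        # Each cooperating party contributes one bit; unset flags count as 0.
        value = (value << 1) | int(flags.get(party, False))
    return value


# Hypothetical embedded parties, each holding its own isKnown value.
parties = ["a.example", "b.example", "c.example"]
flags = {"a.example": True, "b.example": False, "c.example": True}
user_id = combined_id(flags, parties)
```

With k cooperating parties this distinguishes up to 2**k users, which is why a single value scoped to the top-level site is the safer default.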

hober · 2022-03-14T18:29:06Z

Hi @AramZS, are you still interested in pursuing this? @johnwilander, I wonder if this could be rolled up into the Login Status API?

johnwilander mentioned this issue Sep 10, 2020

Support "Remember me" functionality privacycg/is-logged-in#9

Open

AramZS added the agenda+ Request to add this issue to the agenda of our next telcon or F2F label Sep 11, 2020

TanviHacks added the agenda+F2F Request to add this issue or PR to the agenda for our upcoming F2F. label Sep 11, 2020

TanviHacks removed the agenda+ Request to add this issue to the agenda of our next telcon or F2F label Sep 11, 2020

AramZS mentioned this issue Sep 15, 2020

High-level trust signal (requestInstall / requestTrust)? #21

Open

hober removed the agenda+F2F Request to add this issue or PR to the agenda for our upcoming F2F. label Jan 11, 2021
