Jump to content

Talk:Third-party resources policy: Difference between revisions

From Meta, a Wikimedia project coordination wiki
Latest comment: 1 year ago by Bawolff in topic This doesn't read like a policy
Content deleted Content added
m →‎This doesn't read like a policy: talk pages! How do they work?
Line 80: Line 80:
::*Similarly i think "third-party" is confusing here because the context and threat model is subtley different from how most of the industry talks about such things. Not to mention the number of parties involved and their relationships.
::*Similarly i think "third-party" is confusing here because the context and threat model is subtley different from how most of the industry talks about such things. Not to mention the number of parties involved and their relationships.
::*regarding wmcs - i dont think that data is very meaningful by itself. Clearly there is an apetite to access external resources, but just from the data we don't know in what context the external api was called, what consent was obtained. More importantly we don't know what expectations the users have or how that varries between different groups, including vulnerable groups. Something being popular is just one part of the analysis, we should also analyze what the risks are, which are acceptable, which can be mitigated, etc.
::*regarding wmcs - i dont think that data is very meaningful by itself. Clearly there is an apetite to access external resources, but just from the data we don't know in what context the external api was called, what consent was obtained. More importantly we don't know what expectations the users have or how that varries between different groups, including vulnerable groups. Something being popular is just one part of the analysis, we should also analyze what the risks are, which are acceptable, which can be mitigated, etc.
::*re xss. To a lay person stealing session token sounds like some obscure piece of metadata, better to focus on actual consequences (e.g. turn your account into a vandal bot). But i would also talk about how for certain vulnerable people leaking the wrong person's ip to the wrong people could in the extreme case result in people dying. Privacy needs are extremely person specific.
::*re xss. To a lay person stealing session token sounds like some obscure piece of metadata, better to focus on actual consequences (e.g. turn your account into a vandal bot). But i would also talk about how for certain vulnerable people leaking the wrong person's ip to the wrong people could in the extreme case result in people dying. Privacy needs are extremely person specific. [[User:Bawolff|Bawolff]] ([[User talk:Bawolff|talk]]) 20:50, 5 June 2023 (UTC)


== Exemptions ==
== Exemptions ==

Revision as of 20:50, 5 June 2023

05 June 2023: Start of the policy conversation

Hello, feedback regarding the Third-party resources policy is welcome below this message. Feel free to use the initial questions below as a starting point for the conversation or bring your own. Thank you!

On behalf of the Wikimedia Foundation’s Security team — Samuel (WMF) (talk) 00:50, 5 June 2023 (UTC)Reply

Are risks sufficiently explained and relevant?

A security-wise less educated user can probably not understand the scope of the risks from these explanations alone (it is e.g. not obvious what you can do through harvested cookies and session tokens). I suggest adding links to somewhere where the concepts can be discussed a bit more in depth (perhaps a tailored set of security pages). It could also be worth pointing out that anything the owner of the script can do can be done by an attacker who succeeds in bypassing their defences – a benevolent owner may still use naïvely coded building blocks for their own scripts or otherwise neglect security.

The section on user privacy and safety does not differentiate between what a tool does and what it could do when interacting with a third-party resource. I think the section should concentrate on information that leaks regardless on how the tool is coded, as that's the (implicit) rationale for the proposal. Other leakage can be mentioned, but as this may include anything that the third party could do on a direct connection, it doesn't need to be thoroughly covered here. The latter theme could be covered on a page on WWW safety.

LPfi (talk) 14:25, 5 June 2023 (UTC)Reply

Hi @LPfi: and thanks for these suggestions. I am taking good note of your point about including explanatory resources for less security-savvy and illustrating how code used "naïvely" can lead to security issues. On the other hand, the portion about leakage is not that clear to me. When you suggest that the section should emphasize "information that leaks regardless on how the tool is coded", do you mean that the content should describe what personal information could be leaked or are you suggesting something else? — Samuel (WMF) (talk) 17:41, 5 June 2023 (UTC)Reply

Do the definitions and required precautions make sense?

It seems the proposed policy would require all interaction with third-party resources to be done through interfaces offered by WMF production servers. For it not to cause disruption, these need to be offered. My understanding is that OSM tiles are nowadays fetched and cached by these. Are there other resources needing similar treatment?

For the restrictions to make sense, WMF needs to guarantee privacy when using production servers or possible external resources needed e.g. for participating in elections and surveys. I am afraid WMF shouldn't trust third parties in such matters, but instead guarantee that no sensitive information is sent to third parties.

LPfi (talk) 14:25, 5 June 2023 (UTC)Reply

@LPfi: I hear your point and I agree that, generally speaking, safeguards should be in place to ensure the privacy of users. Regarding OSM, I'd like to note that the scope of this policy is purposely limited to User Scripts and Gadgets that load third-party scripts. Unless I am wrong, OpenStreeMap does not fall within those categories. — Samuel (WMF) (talk) 19:09, 5 June 2023 (UTC)Reply

How do you think the policy should be enforced?

When would it make sense to start enforcing the policy?

  • What would that even mean? If you delete the fluff, all this policy says is "Gadgets and user scripts should not load third-party resources". There are no requirements. Maybe this is just a language misunderstanding and you don't mean should in the rfc 2119 sense, but regardless it doesn't seem like there is anything to enforce here. Bawolff (talk) 16:39, 5 June 2023 (UTC)Reply
  • @Bawolff: "should" here is meant as a requirement rather than a suggestion. So, I understand how this may be confusing. As asked a bit below, I'm definitely open to considering language tweaks to make it sound more forceful. With respect to enforcement, there are at least a few things worth considering. How do we make sure all the gadgets and user scripts currently involved in privacy issues comply with the policy requirements? What would enforcement look like (eg: page blanking, CSP automatic block, etc)? Should there be a grace period before enforcement? Should enforcement be done in a phased way, with most critical gadgets and scripts having a longer grace period etc? Those are some questions that could be explored. As a side note, I don't believe that referring to most of the draft policy as "fluff" brings any improvement to the content. Instead, it is just dismissive to the efforts put in proposing something to be improved :). — Samuel (WMF) (talk) 19:37, 5 June 2023 (UTC)Reply

Should WMCS-hosted resources be considered third-party resources?

I assume that some of the resources are essential tools for part of the community, while coding tools isn't restricted to highly trusted users. For the proposed policy to make sense, tools that aren't scrutinised and controlled by trusted users must be regarded as third-party ones, while regarding all the tools as untrusted would cause severe disruption. –LPfi (talk) 14:25, 5 June 2023 (UTC)Reply

I think you need to draw a distinction between tools that are on a separate website (that you choose to go to) and tools that get embedded into some sort of user script. The former is 99% of tools and the privacy analysis between the two situations are quite different. Bawolff (talk) 17:19, 5 June 2023 (UTC)Reply

Examples of what this will end

Would be good to clarify what features this policy will end. For example we have the option for relief maps on WikiVoyage which notifies users that to activate it will share details with an external service.[1] I imagine this policy would end that? These relief maps are excellent especially for hiking.

Additionally we have a great deal of functionality on the wmcloud which could than become less usable, which IMO would be unfortunate. Doc James (talk · contribs · email) 10:57, 5 June 2023 (UTC)Reply

Yes, I'm confused on how OpenStreetMap falls into the "Risks" category. SHB2000 (talk | contribs) 11:48, 5 June 2023 (UTC)Reply
Hello @Doc James and SHB2000: and thanks for joining the conversation. The scope of the policy is purposely limited to User Scripts and Gadgets that load third-party scripts. It is my understanding that the Wikivoyage and OpenStreeMap examples you mentioned do not fall within those categories but I am glad to be proven wrong. That being said, it's worth noting that gadgets and scripts sharing user information with third-parties has always been a practice in conflict with both the Terms of Use and Privacy Policy (see Purpose section of the draft policy).
With respect to Wikimedia Cloud Services-hosted resources, you are raising a valid concern. Recent data-driven observations have surfaced that many of those resources conflict with the Wikimedia Privacy policy and Terms of use but are also helpful to various editing activities. This is why this policy conversation also explores the question of whether some WMCS-hosted resources should still be considered "third-parties" (see WMCS section above). The aim there is to evaluate avenues to enable some of those important editing-related resources while providing decent security and privacy guarantees to end-users. Any thoughts you may have in that direction are obviously welcome.
Samuel (WMF) (talk) 15:20, 5 June 2023 (UTC)Reply
i think it is a mistake to only include user scripts and gadgets here (presumably you are also including common.js which is neither a user script nor a gadget). Rules for thee but not for me type policies tend to trigger resentment and the privacy concerns don't change depending on who is writing the code. Bawolff (talk) 16:27, 5 June 2023 (UTC)Reply

This is a step backwards on our strategic goals

I'm concerned with this proposal, and I have some reasons. Graphs have been broken for more than a month now, and it doesn't seem that they are going to recover, losing opportunities to show things in a more interesting way. Some efforts we have made to show Ourworldindata maps and graphs are also stopped at "security review". Some years ago, we launched (and invested lot of money on) a system that would read the articles in some languages but it never succeeded because of the same reason.

And I would accept that having third-party resources is not the best option if we had product team working on those features. But this isn't happening and doesn't seem it will happen in the future. Some years ago we were obsolete. Now we are paleolithic, a relic for archaeologists of the Internet. Meanwhile, other platforms are advancing by the day on new features. Every day we are further from our 2030 strategic goal. Theklan (talk) 11:18, 5 June 2023 (UTC)Reply

I don't know what the article reading thing was, but security was hardly the only reason ourworldindata didn't happen, just the first hurdle it had to face. Besides technical concerns, the elephant in the room is that it was very unclear if enough users wanted it to make it worth it. Bawolff (talk) 17:00, 5 June 2023 (UTC)Reply
Let alone who was going to do the amount of work to make it viable within the WMF setting (ddos strengthening, privacy isolation, security, parsoid and VE support, cache consistency and purging). We are further away from 2030 goals, but this is not something unexpected. I and many others have been warning from the start of that whole process that we were already overstretched and that implementing the 2030 goals would require at least tripling the tech department, if not more. This was ignored for a long time and we have had several crisis with code maintenance as of late that clearly surfaced how fragile many elements were. —TheDJ (talkcontribs) 19:41, 5 June 2023 (UTC)Reply

This doesn't read like a policy

Policies are rules that people are obliged to follow or operating procedures. This just has some suggestions and best practises. If the document has no rules it is a guideline not a policy. I think this makes the intended purpose of this document confused.

Other comments:

  • i think the way this is using the term third party is confusing and misleading. The word third party makes people view this through the wrong lense. Often in other documents that refers to the authorship of the resource not the hosting provider (consider also external resources that are proxied or locally made resources hosted on github). I would suggest using the word "externally hosted" instead.
    • Additionally, this desperately needs to clarify where non-production resources (toolforge) falls (and any other things hosted by wmf that are not covered by the main privacy policy). Normally these would be considered external since the normal privacy policy doesn't apply.
    • it should maybe clarify if wmf can use such resources if contractual obligations are in place to ensure the privacy policy is followed.
  • it should clarify if the user is allowed to consent to such services (say if enabling a gadget) and under what circumstances consent is allowed as an escape hatch (my 2 cents is it should be acceptable in optionally enabled gadgets/user scripts but not in anything global enabled. But there is already an exception at wikivoyage)
  • i would suggest replacing the term end-user with user. End-user sounds a bit dehumanizing, but maybe that is just me.
  • this should probably emphasize not just explicit persinal info but incidental collection. E.g. when the external resource doesnt collect anything but their hosting provider logs ip addresses, etc.
  • re xss. This probably underplays the risks of xss a bit. However at the same time it is way too close to implying that xss is the only privacy risk here, which i think is misleading.
  • i think the "consider alternatives" section should be removed. I dont think it fits with this document; how to make a user script belongs elsewhere. Second it sort of implies that once you have considered alternatives you can use third party resources if no first party stuff does what you need. Third - realistically it is going to be pretty rare that that there is a local resource doing the exact same thing, especially when it comes to proprietary apis.

Bawolff (talk) 16:23, 5 June 2023 (UTC)Reply

I'd like to second the first point. In my reading, despite being labelled a "policy", this document does not require anyone to do anything, it only offers suggestions. It needs to be rephrased or renamed, depending on what was intended, before we can really discuss its contents. Matma Rex (talk) 18:29, 5 June 2023 (UTC)Reply
Hi @Bawolff and Matma Rex: and thanks for joining the discussion. I'd like to address some of the points shared while continuing to give the others some thoughts.
  • While I get the part about the text not reading like a policy, it is my understanding that the Required precautions section contains firm requirements such as "Gadgets and user scripts should not load third-party resources". Do you have any suggestions to tweak the language in a more assertive way perhaps?
  • I am not sure I get the portion about underplaying XSS. Do you think those risks should be emphasized, explained differently?
  • The "third-party" term was chosen to align more with the terminology used across the industry, especially within the Third-party risk management ecosystem.
  • WMCS-hosted resources are currently considered third-party resources. That being said, recent data-driven observations make it worth asking if and when some of those resources should be considered differently, hence the open questions at the top about WMCS. — Samuel (WMF) (talk) 18:55, 5 June 2023 (UTC)Reply
I suggest "must" instead of "should" and "could" if these are indeed requirements. Matma Rex (talk) 19:24, 5 June 2023 (UTC)Reply
Yeah, "avoid" is weak. It's a recommendation. It's a form of feedback from the Security team to the community, not a matter that requires feedback in return. ToBeFree (talk) 19:56, 5 June 2023 (UTC)Reply
  • re "aligning with industry" - i think that is the major problem here. Most of the parts of this policy that don't make sense here would make total sense as a part of a corporate policy designed to mitigate risk of third party components and supply chain attacks. The weaker language would make total sense in a corporate context where the assumption is that everyone has the same goals, people get fired if they play too fast and loose with doing what they "should" be doing, but exceptions still happen and there is a management chain to accept risk when things deviate from their ideal form. However that is not the context we are in, a top down policy from wmf is more like a law. If it doesn't set out specific binding requirements nobody will listen.
  • Similarly i think "third-party" is confusing here because the context and threat model is subtley different from how most of the industry talks about such things. Not to mention the number of parties involved and their relationships.
  • regarding wmcs - i dont think that data is very meaningful by itself. Clearly there is an apetite to access external resources, but just from the data we don't know in what context the external api was called, what consent was obtained. More importantly we don't know what expectations the users have or how that varries between different groups, including vulnerable groups. Something being popular is just one part of the analysis, we should also analyze what the risks are, which are acceptable, which can be mitigated, etc.
  • re xss. To a lay person stealing session token sounds like some obscure piece of metadata, better to focus on actual consequences (e.g. turn your account into a vandal bot). But i would also talk about how for certain vulnerable people leaking the wrong person's ip to the wrong people could in the extreme case result in people dying. Privacy needs are extremely person specific. Bawolff (talk) 20:50, 5 June 2023 (UTC)Reply

Exemptions

I think there should be some exemptions:

Personal exemptions

There should be an exemption for user's personal script files (e.g. Special:MyPage/vector-2022.css), these are self-managed and only apply to yourself. — xaosflux Talk 16:25, 5 June 2023 (UTC)Reply

Perhaps you get an on/off to accept these in your prefs... — xaosflux Talk 18:54, 5 June 2023 (UTC)Reply
The problem is that users use eachothers css/js files.. and only one of those users needs to have sysop, interface admin or checkuser rights to create complete chaos. —TheDJ (talkcontribs) 19:43, 5 June 2023 (UTC)Reply
which is probably a good argument to not conflate non-security privacy issues (non-xss) and security (xss) together in policies. Users can opt in to lower privacy and only affect themselves. They can't opt in to low security without affecting others. Bawolff (talk) 19:49, 5 June 2023 (UTC)Reply
Users use other people's browser extensions too, but in both cases they are choosing to do that themselves, to themselves. — xaosflux Talk 19:59, 5 June 2023 (UTC)Reply
Regarding users loading other user's scripts: that doesn't require a third party to cause a potential problem; if you load someone else's script you are subject to anything they put in there, the bad code can be local (as this discussion is about third party resources, not about prohibiting all local scripts as well). — xaosflux Talk 20:04, 5 June 2023 (UTC)Reply

Project exemptions

There should be a process for a project to determine if a TPR may be acceptable (such as by publishing a gadget), as per-user opt-in only. Some projects innovate with external resources such a mapping or translation services that there are no sufficient internal replacements for. — xaosflux Talk 16:25, 5 June 2023 (UTC)Reply

In a world with SUL it is very difficult to effectively isolate these sorts of things per-project Bawolff (talk) 17:01, 5 June 2023 (UTC)Reply
@Bawolff there are no global gadgets, so if these are opt-in only, you'd have to opt in at each project. Agree that a default-on gadget would affect unknowing users. — xaosflux Talk 18:50, 5 June 2023 (UTC)Reply
if each individual user is opting in, i think that is workable. For default enabled though, there is enough cross-wiki integration in wikimedia (not to mention everything being cors linked) i would worry that the user's privacy could potentially be compromised in some circumstances even if they never intentionally use the project in question. I guess it would matter a lot on specifics. Bawolff (talk) 18:58, 5 June 2023 (UTC)Reply
@Bawolff 100%! I'll clarify that above. — xaosflux Talk 19:02, 5 June 2023 (UTC)Reply

Proxy servers

Currently, https://fontcdn.toolforge.org/ is a proxy server for the Google fonts server. Would anonymizing proxy servers such as these fall under the category of third-party resources that could not be used? isaacl (talk) 17:33, 5 June 2023 (UTC)Reply

i don't know what this policy's take on it is, but the historical answer is that it is not allowed e.g. see discussion at https://phabricator.wikimedia.org/T209998 Bawolff (talk) 17:43, 5 June 2023 (UTC)Reply

Localhost

Please do not use CSP headers to block http://localhost and http://127.0.0.1; this "external" site is under the user's control and is useful for script development. Suffusion of Yellow (talk) 18:17, 5 June 2023 (UTC)Reply

New wiki for code incompatible with the CC-BY-SA?

Any thought of creating a new SUL-connected wiki, e.g. https://code.wikimedia.org/, not subject to the CC-BY-SA (but still requiring some sort of free license) where we can store GPL-"contaminated" user scripts? Then there would be no need to choose between reinventing the wheel, and storing such code on an "third-party" site. Suffusion of Yellow (talk) 18:33, 5 June 2023 (UTC)Reply

Personally i think we should have a specific gerrit (or gitlab) repo for this, not subject to the normal CR requirements (that is then automatically deployed to the site). MediaWiki isn't a fun source code management tool. Bawolff (talk) 18:38, 5 June 2023 (UTC)Reply