
Tor hidden service for WMF websites
Open, Needs Triage, Public

Assigned To
None
Authored By
tstarling
Jun 19 2017, 4:54 AM

Description

@CristianCantoro has proposed introducing a Tor hidden service for reading and editing Wikipedia. This task is for the technical details of such a gateway.

Service or integrated

The basic implementation options are:

  1. Make a frontend or proxy which rewrites URLs and makes any necessary skin modifications to indicate to the user that they are on the .onion site.
  2. Reconfigure or hook MediaWiki to make it generate the right HTML in the first place.

EOTK is an example of option 1: it uses ~500 lines of nginx configuration and embedded Lua to do fairly naïve URL rewriting. It doesn't attempt to properly parse the JS, CSS and HTML that it rewrites.

MobileFrontend shows approximately what option 2 would take. It uses a BeforePageRedirect hook to modify 30x responses. Because it knows a little about MediaWiki, it avoids much of the HTML rewriting that EOTK attempts: MediaWiki uses host-relative URLs in internal links and in CSS and JS references, so as long as the path structure is the same, there is no need to rewrite them. T156847 is a proposal to make MediaWiki aware of the domain it is being viewed under, which would reduce the need for these assumptions and hacks.
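
As a rough illustration of the hook approach, here is a minimal sketch of redirect rewriting, assuming a hypothetical canonical-host-to-onion-host map; detection of whether the request actually arrived via the gateway is elided, and this is not MobileFrontend's actual code:

```php
<?php
// Hypothetical sketch of option 2, not MobileFrontend's actual code.
// Rewrites the target of 30x responses so that a reader browsing via
// the hidden service is not redirected back to the canonical domain.
// Detecting whether the request actually arrived via the gateway
// (e.g. from a frontend-set header) is elided here.
$wgHooks['BeforePageRedirect'][] = function ( $out, &$redirect, &$code ) {
	// Hypothetical config: canonical host => onion host.
	$map = [ 'en.wikipedia.org' => 'en.wikipediaxxxxxxx.onion' ];
	$host = parse_url( $redirect, PHP_URL_HOST );
	if ( is_string( $host ) && isset( $map[$host] ) ) {
		// Only the host part needs rewriting; internal links are
		// host-relative, so paths can pass through untouched.
		$redirect = preg_replace(
			'!^(https?://)' . preg_quote( $host, '!' ) . '!',
			'${1}' . $map[$host],
			$redirect
		);
	}
	return true;
};
```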

The old secure.wikimedia.org gateway was along the lines of option 2, but with a single domain name and path rewriting. It reconfigured MediaWiki on startup and fragmented the parser cache.

The meta wiki page suggests iaproxy and @csteipp's mediawiki-proxy as possible off-the-shelf service-based implementations.

The hostname and path

It would be easy to use scallion to brute-force the first 9 characters of the key hash and obtain wikipediaXXXXXXX.onion, where XXXXXXX is 7 random characters. It is not so trivial to brute-force 11 or 12 characters, hundreds of times over, in order to include the language code in the second-level domain. However, it is reportedly possible to have subdomains of .onion domains. This is not mentioned in the Tor design paper, which proposes a different interpretation of the third-level domain label, but it appears to be common practice: the Tor client apparently strips out the third-level label when establishing the circuit, and the browser then sends the full hostname in the Host header as normal.

So we can have en.wikipediaXXXXXXX.onion/wiki/Foo, or we can have wikipediaXXXXXXX.onion/en/wiki/Foo if we allow path rewriting similar to what secure.wikimedia.org did.
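
To make the two layouts concrete, here is a hedged sketch of the routing decision a gateway would make; the onion address is a placeholder and the helper function is hypothetical. Note that with the subdomain layout no path rewriting is needed at all:

```php
<?php
// Hypothetical routing sketch; wikipediaxxxxxxx.onion is a placeholder.
// Given the Host header and path sent by the browser, derive the
// canonical backend host, rewriting the path only in the
// secure.wikimedia.org-style layout.
function resolveBackendHost( string $host, string &$path ): string {
	$onion = 'wikipediaxxxxxxx.onion';
	// Layout 1: en.wikipediaxxxxxxx.onion/wiki/Foo
	// (no path rewriting needed at all)
	if ( preg_match( '/^([a-z-]+)\.' . preg_quote( $onion, '/' ) . '$/', $host, $m ) ) {
		return $m[1] . '.wikipedia.org';
	}
	// Layout 2: wikipediaxxxxxxx.onion/en/wiki/Foo
	if ( $host === $onion && preg_match( '!^/([a-z-]+)(/.*)!', $path, $m ) ) {
		$path = $m[2]; // strip the language prefix from the path
		return $m[1] . '.wikipedia.org';
	}
	return 'www.wikipedia.org'; // fallback for the bare onion host
}
```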

Abuse control considerations

It's proposed that it won't be possible to edit via Tor unless logged in. So we won't have the issue of MW attributing hidden service edits to an internal IP address. There will always be a username for attribution. However, the CheckUser extension may need modification to tag users who are using the hidden service in a human-readable way.

Event Timeline

Hi,

Thanks Tim for filing this bug report. Two further considerations:

  1. Based on my understanding, iaproxy uses regular expressions (preg_replace) to replace the original domain (called $TargetDomain) with the onion address (called $RequestDomain), so I would say it shares some of EOTK's limitations (e.g. an address inside a <nowiki> tag should not be replaced, but would be; see the sketch after this list).
  2. As additional references for the handling of third-level subdomains of .onion addresses, I would point out these tweets by Alec Muffett (who worked for Facebook and set up an onion service there): « [...] Subdomains just "work" - the Tor daemon ignores them during resolution while the browser passes them along. Result is you host all the subdomains on a single onion address, but the browser is happy. Thus ".onion" can be like ".fr" or any TLD.» (source: 1, 2)
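
To illustrate point 1, a tiny self-contained example of the limitation; the variable names follow the iaproxy description above and the onion address is a placeholder:

```php
<?php
// Blind regex rewriting in the style of iaproxy/EOTK replaces the
// domain everywhere, including where it is page *content* rather
// than a link -- e.g. a URL that the wikitext wrapped in <nowiki>
// and that is therefore rendered as plain text.
$TargetDomain  = 'en.wikipedia.org';          // original domain
$RequestDomain = 'en.wikipediaxxxxxxx.onion'; // placeholder onion address

$html = '<a href="https://en.wikipedia.org/wiki/Foo">Foo</a> '
	. 'Example URL: https://en.wikipedia.org/wiki/Bar';

// Both occurrences are rewritten; a MediaWiki-aware implementation
// would rewrite only the href.
echo preg_replace(
	'/' . preg_quote( $TargetDomain, '/' ) . '/',
	$RequestDomain,
	$html
);
```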

Thanks Tim, that's so constructive :)

Yes, subdomains work. There is even a special exception in the CAB Forum requirements allowing wildcard EV certificates specifically for .onion addresses (EV certificates are required for .onion, and EV wildcards are otherwise forbidden).

Brute-forced domains have pros and cons. Unless we manage to find an otherwise readable domain (like Facebook's facebookcorewwwi), we run into the danger of people learning to read "wikipedia" and ignore the second half of the domain, allowing someone to just generate a wikipediaYYYYYYY with scallion and attempt to impersonate us (HTTPS with EV will help, though). Perhaps it'd be better to just generate domains with a "wiki" prefix instead of the full "wikipedia" one?

I ran scallion for a few days (intermittently) on my GTX 1050 to see what we get in terms of readable domains. I ran it with a lot of alternative prefixes, since that is efficient: each extra prefix is just another entry in a hash table. I now have private keys for the following domain names:

commonsgmd75obfi.onion
mediawikic3gkygc.onion
mediawikiirq6eof.onion
mediawikijdlfmn7.onion
mediawikimgpvk4y.onion
mediawikiuh7oc4p.onion
mediawikix4ijaw7.onion
mediawikiygpwxtk.onion
wcommons2uyz6p2l.onion
wcommons2xdidcu4.onion
wcommons6gyni3as.onion
wcommonsbobq4tya.onion
wcommonscp3r4dyg.onion
wcommonsh47kr36s.onion
wcommonshmghqock.onion
wcommonslwt62r45.onion
wcommonss23vruxj.onion
wcommonsvqv6juqh.onion
wikibooks3yn3vfq.onion
wikibookskfy45vj.onion
wikibookssstq4lb.onion
wikibooksumkljfd.onion
wikibookszrbincc.onion
wikidatadfvyudn3.onion
wikidatagin3spem.onion
wikidatajlxulo7u.onion
wikidatakkeetdni.onion
wikidatapchoavkv.onion
wikidatapvtrnqti.onion
wikidatax66u46ja.onion
wikimedia7nv6u2q.onion
wikimediacyweboe.onion
wikimediafeeetr3.onion
wikimediamcwr67v.onion
wikinews7p7n4scw.onion
wikinewsbfvlikru.onion
wikinewsejev3dox.onion
wikinewset7hsapx.onion
wikinewsflwrd66w.onion
wikinewsio45a4om.onion
wikinewso6k5qvo6.onion
wikinewsp7mgo2qz.onion
wikinewsrkmlpyxk.onion
wikipediabsorqnj.onion
wikipediajjy3h7k.onion
wikipediakkazmkr.onion
wikipediam5nhyml.onion
wikiquot4b2oxyy6.onion
wikiquotbown5pad.onion
wikiquote3jggpgp.onion
wikiquote45y4y3f.onion
wikiquoteddppsli.onion
wikiquoteogzknaf.onion
wikiquotes7xsyle.onion
wikiquotjsfhqsx6.onion
wikiquotugcs7qi6.onion
wikiquotvn5sbl7h.onion
wikiquotw3q3ctbv.onion
wikiquotx6tdpqdu.onion
wikisourcu6smbxu.onion
wikisourcy2h6guo.onion
wikiversiatzsvei.onion
wikiversimmdfz5j.onion
wikiversioysugda.onion
wikiversiqvl3c5w.onion
wikivoyag2zhdk7d.onion
wikivoyag4auvpbp.onion
wikivoyag4ox7px4.onion
wikivoyag4rzkh4m.onion
wikivoyagpjl6r37.onion
wikivoyagqxy76il.onion
wikivoyagwhmze75.onion
wikivoyagz7cgcfl.onion
wikivoyagzkvxf7r.onion
wikivoyge5yhosqh.onion
wikivoyge7qykfdy.onion
wikivoygebahdedy.onion
wikivoygekku2bhw.onion
wikivoygernfo4cz.onion
wikivoygexm7vue5.onion
wikivrstye5puzq7.onion
wikivrstyi652zj2.onion
wikivrstyjgnvvu3.onion
wikivrstyjjeklwu.onion
wikivrstyoiyrycs.onion
wikivrstyt5lvcht.onion
wiktionardvczkqh.onion
wiktionarjwrhhkg.onion
wiktionarqt4zwv3.onion
wiktionarugivr4k.onion
wiktionarxxph6vy.onion
wiktionryfyf35ex.onion
wiktionrylumdmqa.onion
wiktionrynvwvnq3.onion
wiktionrypsiyqgu.onion
wiktionrysxk3xvf.onion
wiktionryuvk5gpc.onion
wmcommonsa5gr5pw.onion
wmcommonsnzd7xjr.onion
wmcommonss2alhlu.onion
wmcommonst6udbzp.onion
wmcommonsthyx4fx.onion

See what you think. The wikimedia*.onion domains could be used with a two-level subdomain, e.g. en.wikipedia.wikimediafeeetr3.onion.

  • wikipediabsorqnj.onion -- almost manages to spell "absorb"
  • wikimediacyweboe.onion -- obviously a web server sponsored by Wikimedia Wales
  • wikimediafeeetr3.onion -- feature misspelt with leet final letter

Hi! I'm Alec, I write EOTK. I've improved it a lot over the summer, culminating in a deployment at the New York Times: https://open.nytimes.com/https-open-nytimes-com-the-new-york-times-as-a-tor-onion-service-e0d0b67b7482

I've set up a read-only Onion site for Wikipedia & related properties, and have posted details of how to access it at the following Twitter thread: https://twitter.com/AlecMuffett/status/933735934272704512

Let me know what you think.

edit: have also annotated: https://meta.wikimedia.org/wiki/Grants_talk:IdeaLab/A_Tor_Onion_Service_for_Wikipedia#Demo_Onion_Site.2C_Temporarily_Available

edit2: I don't know whether it would be proper for me to edit: https://meta.wikimedia.org/wiki/Grants:IdeaLab/A_Tor_Onion_Service_for_Wikipedia

As discussed before, I think it is worth exploring whether we could add Orbot traffic channeling to our mobile apps: as an option (clearly marked during setup) if performance is an issue, and by default if possible.

Facebook offers Orbot channeling, why shouldn't we?

That's T163747: Add support for Tor or other proxy support to the Wikipedia Android App.

How is this progressing? Meanwhile, Tor has moved to new, longer (v3) addresses, and Riseup just uses the standard ones, so I think we should follow their lead and do the same: https://riseup.net/en/security/network-security/tor

Is anyone gonna write a grant application on this before the deadline in March?

Additional context on why we need this.

In the last few days there have been reports that the Russian government is requiring all users to install a government-issued CA certificate. This means that in principle it could impersonate any HTTPS website: Bugzilla Bug #1758773.

Some thoughts I've been brainstorming:

  • For read traffic, onion service requests should go through the Varnish/frontend caches, to maximize caching and performance as well as for DoS prevention
  • For write traffic, it should be blocked via TorBlock, so we probably need to set some custom header on the request to flag it as coming over the onion service (see the sketch after this list).
    • If someone has IPBE and is allowed to use Tor, then we still need some IP address to log for CheckUser. Maybe we can set some special reserved IP address, which flags CU to just show the IP as "Tor network". Would that be useful even for normal Tor users?
  • This seems like something we should be able to deploy using k8s. It really just needs tor (via the Debian package), the torrc config, and the private key for the hidden service (provisioned via private puppet); a minimal torrc sketch also follows this list
  • Before officially announcing, we should discuss with Tor Project folks if they'd support adding human readable "Onion Names" for us, so you could go to http://en.wikipedia.tor.onion and it sends you to the correct very long onion service domain. https://securedrop.org/news/introducing-onion-names-securedrop/ describes how it works for SecureDrop (currently the only thing using this).
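
A hedged sketch of the write-traffic flagging idea above. The X-Onion-Service header name, the helper functions, and the use of 192.0.2.1 (a TEST-NET-1 documentation address) as the "Tor network" pseudo-IP are all assumptions, not existing TorBlock or CheckUser behaviour:

```php
<?php
// Hypothetical glue code, not existing TorBlock/CheckUser behaviour.
// Assumes the onion-service frontend sets an X-Onion-Service header,
// and that every other ingress path strips that header so it cannot
// be spoofed by ordinary clients.
function isOnionRequest( WebRequest $request ): bool {
	return $request->getHeader( 'X-Onion-Service' ) !== false;
}

// For CheckUser: rather than logging the gateway's internal IP,
// record a reserved placeholder that the CU interface could render
// as "Tor network". 192.0.2.1 (TEST-NET-1) is purely a stand-in;
// the actual address would need discussion.
function checkUserIpFor( WebRequest $request ): string {
	return isOnionRequest( $request ) ? '192.0.2.1' : $request->getIP();
}
```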
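
And for the k8s bullet, a minimal torrc sketch, assuming the container proxies to a local frontend; the directory path is a placeholder:

```
# Minimal torrc sketch; the directory path is a placeholder. The
# hidden-service private key lives inside HiddenServiceDir and would
# be provisioned via private puppet as described above.
HiddenServiceDir /var/lib/tor/wmf_onion/
HiddenServiceVersion 3
HiddenServicePort 80 127.0.0.1:80
HiddenServicePort 443 127.0.0.1:443
```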

Might be a good idea to do this in stages fwiw — read access first, see how that pans out, given that the primary use case for this is people reading Wikipedia? Plus read access seems Easy and Straightforward™ in comparison to the other bits

IMO it's important we continue to portray Wikipedia as a thing you can edit (despite all the hurdles for Tor users...), so if this is an official service I don't think we should skip editing support, even temporarily.

One other point I forgot to mention earlier: we'll need something to rewrite URLs. Originally I had the idea of creating a MediaWiki extension to do that, but now I think it might be simpler to use something like eotk (https://github.com/alecmuffett/eotk).