Hi. We're building The Nose (https://thenose.cc), a safe haven for training data that can't be taken down with DMCA. Since this involves copyright infringement, strong anonymity is a requirement.
Tails isn't an option because, as others have mentioned, there have been Tor Browser exploits that reveal the IP address of the Tails user. While this is unlikely in our case, it's important to approach security from first principles with threat modeling. An attack from the FBI may seem unlikely today, but both Silk Road and one of its successors were taken down by mistakes they made when setting up their site. Learning from history, if you're not careful early, you're in for a surprise later.
Case in point: When I started Whonix Workstation to post this comment, the Whonix Gateway VM failed to boot. So when I tried to start Tor Browser and go to https://news.ycombinator.com, all I saw was a connection error. This kind of layered defense is essential if you're serious about staying out of jail.
Realistically, you'll likely dox yourself through some other means: sending Bitcoin to your pseudonym from your real identity, admitting to someone you know that you control your pseudonym (this work gets lonely, so this is a real temptation), or even accidentally signing off an email with "Thanks, [your real name]". And once you make a single mistake, you can never recover.
Day-to-day browsing is a pain. I use a VNC client to remote into our server, which is running a desktop environment with a regular browser. That way you can use apps (Gmail, Discord, etc.) from outside the Tor network. But since you're tunneling through Tor, this is painfully slow. You'll likely want to type out long messages in Whonix, then copy-paste into your remote session. Each keystroke can sometimes take a full second to appear when animations are heavy.
Transferring large amounts of data is also painful. If you try to start Litecoin Core on Whonix, you'll need to sync more than 30 GB, which can take a very long time.
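To put a rough number on "a very long time", here is a back-of-the-envelope sketch. The throughput values are assumptions picked for illustration, not measurements of Tor or Whonix, and `transfer_hours` is a hypothetical helper name.

```python
def transfer_hours(size_gb: float, throughput_mbit_s: float) -> float:
    """Hours needed to move size_gb gigabytes at a steady throughput_mbit_s megabits/s."""
    # Decimal units: 1 GB = 1000 MB, 1 byte = 8 bits, so 1 GB = 8000 megabits.
    size_megabits = size_gb * 8 * 1000
    seconds = size_megabits / throughput_mbit_s
    return seconds / 3600


if __name__ == "__main__":
    # Assumed sustained rates, for illustration only.
    for rate in (1.0, 2.0, 5.0):
        print(f"{rate:.0f} Mbit/s: {transfer_hours(30, rate):.1f} hours for 30 GB")
```

This ignores protocol overhead, stalls, and block verification time, so treat the result as a lower bound on the wait.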
Patience is your weapon. You have all the time in the world not to make a mistake, and moments to make a fatal one. Think carefully about everything you do.
Stylometry scares me. AI can help here: run an assistant locally, and ask it to reword everything you write. You can't use ChatGPT for this, most obviously because OpenAI retains a history of everything you submit, but also because they require a real phone number to sign up. And you can't get a real number through any means I've found so far.
Payment is also a pain. I'm hoping to ask the community to donate Vanilla gift cards so that I can sign up for Tarsnap or spin up a droplet.
By applying the discipline normally found in aeronautics, I think it's possible to do this safely. But you'll still be risking jail time, and the intersection of people who want to do something for altruistic reasons and are willing to risk prison is pretty small. I'll be documenting everything I do so that you can learn from my example, or perhaps from my mistakes.
I like the way you describe your process. As the person who made the stylometry thing that made the rounds a while back, I would say the best thing you can do on that front is to either get a "paraphraser" like ChatGPT/translators or just write less. Also, there's a site called smspva.com and a lot of sites like it where you can rent "real" phone numbers and they take every payment method under the sun. Depending on the country a phone number to receive an OpenAI confirmation code is about $0.50, most less popular services are like $0.10-$0.20.
llama.cpp runs Llama 2 7B on common hardware like a MacBook Pro. Haven't tried it yet on my RTX 3070 (Mobile) but there's no reason why it shouldn't work.
A 7B LLM has a huge quantity of knowledge about the world. You don't need that just to reword sentences. You can use a translation model with English input and English output, or another Text2Text model such as one for textual style transfer. A purpose-built model for rewording into a fixed style different from the input could easily be 10M parameters or fewer (that's already big enough for translating between two languages, after all), but you can readily find models in the 100M range for text style transfer.
Are you currently hosted on Shinjiru? I'm thinking about using them as a reverse proxy in front of a site that might suffer false DMCA attacks. I don't want my web host to ban me just because they can't deal with the hassle, so I'm thinking about proxying all the requests.
What does Shinjiru do if they receive a DMCA notice?
When I ran a huge private torrent tracker I paid a decent chunk to get a host that ignored every single request of any type that they received.
I think if you're interfacing with your server without going through Whonix, you're asking for trouble. Not only do you need to pay for the server using BTC that can't be traced back to your identity, but anything that touches the server (such as the server you're proxying with) needs to take the same precautions, which means no DigitalOcean, unless you can somehow pay them without that also being tied to your identity.
If you're not actually worried that DMCA people will follow through on their threat to sue you, or you really want to risk losing your property in the event of a lawsuit, then perhaps this might work.
Feel free to email me for more advice or to keep in touch. Your project sounds interesting.
I wrote up our security procedures here: https://news.ycombinator.com/item?id=37346620