

(WARNING: I make no promises here; my P2P software is vaporware until I get the details worked out. I don’t want anyone thinking there’s something coming until there actually IS something coming.)

First Generation: In the beginning, there was Napster. Napster was the first user-friendly MP3 sharing program. Sure, songs and media were shared via IRC and FTP sites before Napster, but Napster made it extremely easy to share music with other people. The biggest problem with Napster was that the Napster servers ran everything: they maintained a master index of files and a list of the users sharing those files, and they connected users to each other to perform the actual transfer. When record labels got angry, they could easily point to Napster’s centralized catalog and say “there’s no reason you can’t block our songs from being downloaded, because you control the entire process!”

Second Generation: Ahh, yes…Morpheus, Grokster, LimeWire, and the infamous Kazaa. These networks dropped the central index by sending searches directly from one computer to many other computers. In theory, this removed centralization and made the networks difficult to shut down. Unfortunately, some centralization remained: someone had to tell each computer which other computers were on the network in the first place. The central file index was gone, but the networks still relied heavily on a parent company’s servers to operate. Some of these networks are still around today using alternative servers, but most are defunct, having been displaced by the third generation. Well, that and the fact that at least some of these networks had gaping security holes that were easily exploited to render them useless. It was easy as pie to flood the FastTrack network that powered Kazaa and Morpheus with corrupt data.
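One widely reported FastTrack weakness was its UUHash file identifier, which hashed only part of each file, so poisoners could serve corrupted copies that still “matched” the real thing. The sketch below is a simplified model of that weakness, not the actual UUHash algorithm: the 300 KiB cutoff and the use of SHA-256 are illustrative assumptions. A file that keeps the hashed prefix but replaces everything after it passes the partial check, while a hash over the whole file catches it.

```python
import hashlib

SAMPLE_BYTES = 300 * 1024  # hypothetical cutoff: the toy hash reads only the first 300 KiB


def partial_hash(data):
    """Toy partial hash that ignores everything after SAMPLE_BYTES (NOT real UUHash)."""
    return hashlib.sha256(data[:SAMPLE_BYTES]).hexdigest()


def full_hash(data):
    """Hash every byte of the file."""
    return hashlib.sha256(data).hexdigest()


original = bytes(range(256)) * 8192  # 2 MiB stand-in for a media file
# Poisoned copy: same size and same first 300 KiB, zero bytes (garbage) afterwards.
poisoned = original[:SAMPLE_BYTES] + bytes(len(original) - SAMPLE_BYTES)

print("partial hashes match:", partial_hash(original) == partial_hash(poisoned))
print("full hashes match:   ", full_hash(original) == full_hash(poisoned))
```

Hashing the whole file (or, as BitTorrent does, every piece) closes this particular hole, though it cannot tell you whether a perfectly intact file actually contains what its name claims.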

Third Generation: Simply put, BitTorrent and eMule. These systems are hybrids; they operate both through servers (in BitTorrent they’re called trackers) and through a fully decentralized second network known as a DHT (distributed hash table, NOT dihydrotestosterone, for you chemistry nuts.) Multiple servers are available, so there is much less centralization involved, and the DHT doesn’t go through “servers” at all: computers find each other through other computers, in what is known as the DHT “overlay network.” Completely open BitTorrent trackers also exist, and they can be freely added to existing torrents so that one tracker’s failure doesn’t kill the torrent.
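For readers wondering how computers “find each other through other computers”: in a Kademlia-style DHT (the family used by eMule’s Kad network and BitTorrent’s DHT), nodes and content keys share one 160-bit ID space, and the nodes responsible for a key are the ones whose IDs are closest to it under XOR distance. A real lookup repeatedly asks the closest nodes it knows about for even closer ones; this minimal sketch only shows the distance ordering over a hypothetical set of random node IDs, with k=8 assumed as the result size.

```python
import random

ID_BITS = 160  # Kademlia and the BitTorrent DHT use 160-bit identifiers


def xor_distance(a, b):
    """Kademlia's distance metric: bitwise XOR of two IDs, read as an integer."""
    return a ^ b


def closest_nodes(target, known_nodes, k=8):
    """Return the k known node IDs closest to target under the XOR metric."""
    return sorted(known_nodes, key=lambda node: xor_distance(node, target))[:k]


rng = random.Random(2012)
nodes = [rng.getrandbits(ID_BITS) for _ in range(1000)]  # hypothetical network members
infohash = rng.getrandbits(ID_BITS)  # the key being looked up (e.g. a torrent's infohash)

for node in closest_nodes(infohash, nodes, k=3):
    print(f"{node:040x}  distance is {xor_distance(node, infohash).bit_length()} bits long")
```

No server is involved at any point: any node that knows a few others can walk toward the key, which is exactly why the same walk works for someone scanning the DHT for peers.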

However, one thing hasn’t changed since Napster: computers still communicate with each other directly, immediately revealing the IP addresses of the uploader and downloader to each other. Furthermore, the way these networks’ servers operate means that hostile parties such as the RIAA, the MPAA, porn production companies, etc. can simply connect to a server, request a list of peers for an allegedly infringing file of interest, and receive a big batch of IP addresses that have that file. Even if the servers didn’t make it so easy, it takes only a little more effort to scan the DHT network for peers with that file, so eliminating the servers wouldn’t fix the issue. This is how content owners gather lists of IP addresses to threaten and sometimes drag into court.
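To make the “big batch of IP addresses” concrete: BitTorrent trackers commonly answer with a “compact” peer list (BEP 23), where each peer is just 6 bytes, a 4-byte IPv4 address followed by a 2-byte big-endian port. The minimal decoder below uses a hypothetical sample payload with documentation-range addresses. Note what the list does not contain: any record that data was ever transferred.

```python
import ipaddress
import struct


def parse_compact_peers(blob):
    """Decode a BEP 23 compact peer list into (ip, port) pairs."""
    if len(blob) % 6:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        # "!4sH": network byte order, 4 raw address bytes, then an unsigned 16-bit port
        ip_raw, port = struct.unpack_from("!4sH", blob, offset)
        peers.append((str(ipaddress.IPv4Address(ip_raw)), port))
    return peers


# Hypothetical tracker payload listing two peers.
sample = bytes([192, 0, 2, 10, 0x1A, 0xE1, 198, 51, 100, 7, 0xC8, 0xD5])
print(parse_compact_peers(sample))
```

Anyone who asks the tracker gets the same IP:port pairs, whether those peers are seeding, barely started, or (as the experiments later on this page show) a client that never transfers anything at all.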

Generation 3.5: MUTE file sharing. I’ve labeled this “generation 3.5” because it never gained enough momentum to grow, and because it still suffers from many of the security issues that have plagued P2P sharing since the beginning. My solution to the IP address revelation problem is more complicated than MUTE’s, but the essential idea is the same: pass data to peers who then pass it along to their own peers, without including the originating IP address. MUTE had the breakthrough idea that largely kills the IP address problem, but it seems that all the effort went into designing the routing scheme and algorithm, while other logistical flaws were put on the back burner.
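The core idea of relaying without an origin IP can be illustrated with a toy model. This is not MUTE’s actual algorithm (MUTE uses a more elaborate ant-inspired routing scheme); it is a minimal sketch assuming a fixed chain of hypothetical nodes. The request carries only a random virtual address, each node remembers just the neighbor a message came from, and the reply retraces those per-hop entries. The node holding the file only ever sees its immediate neighbor, and no neighbor can tell whether the previous hop originated the request or merely relayed it.

```python
import secrets


class Node:
    def __init__(self, ip):
        self.ip = ip
        # message id -> neighbor it arrived from; the only "return address" a node keeps
        self.came_from = {}


def relay(path, message):
    """Pass a message hop by hop; each node records only its immediate neighbor."""
    observed = []
    for prev, node in zip(path, path[1:]):
        node.came_from[message["id"]] = prev
        observed.append((node.ip, prev.ip))
    return observed


def reply_path(path, message_id):
    """Route a reply back from the last node using only per-hop came_from entries."""
    node = path[-1]
    hops = [node.ip]
    while message_id in node.came_from:
        node = node.came_from[message_id]
        hops.append(node.ip)
    return hops


# Hypothetical chain: 10.0.0.1 is the requester, 10.0.0.5 holds the file.
chain = [Node(f"10.0.0.{i}") for i in range(1, 6)]
request = {
    "id": secrets.token_hex(8),
    "from": "virtual-" + secrets.token_hex(4),  # random virtual address, not an IP
    "query": "example.iso",
}
for receiver, neighbor in relay(chain, request):
    print(f"{receiver} only sees {neighbor} (requester claims to be {request['from']})")
print("reply route:", " -> ".join(reply_path(chain, request["id"])))
```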

The most serious of the flaws MUTE left unaddressed are the various forms of poisoning: index poisoning, where bogus index results come back, sometimes in quantities large enough to make locating the intended data extremely difficult and frustrating; and file poisoning, where the “bogus” index results return real files that do not have the expected content. In the days of the FastTrack network, this became very common, the worst example being MP3 files that contained the first 20 seconds of a song looped repeatedly and cut off at the original song’s track length, meaning that a cursory listen to the beginning of the MP3 would “pass the test” even though the MP3 was not actually what was wanted.

More Gen3-esque Software: Perfect Dark and Freenet. These programs have routing constructs similar to MUTE’s, and they use encrypted caches on the hard drives of the network’s users as their “storage.” The only way to retrieve a file is to request it by its “key.” These networks add deniability to the storage of the data, since there’s no way for the user to know what’s in the encrypted data store. Unfortunately, these programs also have issues: Freenet is designed to work like the Web rather than to share large files, and it tends to be fairly slow and/or unreliable for that purpose (unpopular content in particular slows down and eventually just vanishes). Perfect Dark uses a DHT, so it is no more secure for uploaders and downloaders than any other DHT implementation. Some Perfect Dark users have been arrested in Japan for uploading popular television series, proving that Perfect Dark does not protect anonymity in any meaningful way.
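Retrieval “by key” from an encrypted store can be sketched as follows, loosely modeled on the idea behind Freenet’s content hash keys: the decryption key is derived from the content, the network locates the block by a hash of the ciphertext, and the node storing it holds only ciphertext it cannot read. This is a toy under stated assumptions: the SHAKE-256 keystream is an illustrative stand-in, not vetted cryptography and not Freenet’s real format, and `insert`/`fetch` are hypothetical names.

```python
import hashlib


def _keystream(key, length):
    # Toy keystream derived from SHAKE-256; illustrative only, NOT real encryption.
    return hashlib.shake_256(key).digest(length)


def _xor(data, stream):
    return bytes(a ^ b for a, b in zip(data, stream))


def insert(store, plaintext):
    """Encrypt with a content-derived key; store ciphertext under the hash of the ciphertext."""
    key = hashlib.sha256(plaintext).digest()            # decryption key, never stored on the node
    ciphertext = _xor(plaintext, _keystream(key, len(plaintext)))
    locator = hashlib.sha256(ciphertext).hexdigest()    # what the network routes requests on
    store[locator] = ciphertext                         # the storing node sees only ciphertext
    return locator, key


def fetch(store, locator, key):
    """Retrieve by locator, check integrity, then decrypt with the requester's key."""
    ciphertext = store[locator]
    if hashlib.sha256(ciphertext).hexdigest() != locator:
        raise ValueError("stored block does not match its locator")
    plaintext = _xor(ciphertext, _keystream(key, len(ciphertext)))
    if hashlib.sha256(plaintext).digest() != key:
        raise ValueError("decryption key does not match the content")
    return plaintext


store = {}  # stands in for one node's encrypted cache
locator, key = insert(store, b"some shared file contents")
print(locator[:16], fetch(store, locator, key))
```

The two hash checks also double as integrity checks: a node that tampers with a stored block produces data that no longer matches its locator.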

The next generation of file sharing programs has to fix the IP address issue completely, while also combating other major security problems (such as poisoning and denial-of-service attacks) that previous peer-to-peer file sharing programs have not adequately addressed.

Don’t get too excited, but here’s where I’m going with this: I am hesitant to announce vaporware, but given the amount of interest in my posts about copyright infringement notices, and my own casual interest in the chilling effect of copyright trolling on the free exchange of information and ideas, I have been working out the details of a fourth-generation file sharing protocol that solves almost all of the issues surrounding file sharing’s general lack of anonymity and its ease of censorship through lawsuits and settlement demands/threats.

I thought about how to fix the problems with torrents and DHT systems such as Kademlia. The solutions that came to mind seemed obvious, but the practical designs I began to come up with were full of glaring holes. Tracing an uploader or downloader by IP address is the obvious problem with all current systems, as the lawsuits and settlement demands clearly show. When I solved it, I thought I was a genius and wondered why no one else had come up with the same solution…until I found programs like MUTE that work in a similar fashion. I thought about the problem in more depth and realized that my perfect little system for making IP addresses untraceable was merely the tip of the iceberg. DoS attacks, index and file poisoning, hash collisions, plausible deniability, man-in-the-middle attacks, and “Sybil attacks” are just some of the problems that have to be solved, and I think I’ve answered most (if not all) of them.

At some point, I’ll need help testing and implementing this, taking it cross-platform, and getting the word out about it once it’s confirmed to work as expected and stress tested in the real world. For now, I’m writing this to let my readers and the Internet at large know that the problem is being worked on. I look forward to the day that copyright trolls are, in a technical sense, neutered.

Here’s to my ideal P2P file sharing vaporware. When it’s more than an idea on paper, I’ll make a new post and link to it here. Stay tuned, everyone; this will be interesting.


UPDATE: I’m working out the details of a next-gen P2P file sharing program that should fix up most of the problems with P2P file sharing today, including the IP address targeting issue that spawned this article in the first place. I also found an Ars Technica article on why IP addresses aren’t enough to find file swappers.

COMMENTS ARE WELCOME AND ENCOURAGED.

I have always wondered how it is possible to prove in a copyright infringement case that peer-to-peer file sharers and Internet file locker downloaders are individually responsible for what they’re accused of, short of a confession by the person being targeted. I think it’s about time I laid out my reasoning here. Feel free to post comments poking holes in it. (Comments are moderated, by the way…people seem to wonder why they don’t appear immediately, so please don’t double post.)

An IP address is not a computer, and a computer is not a person. Ultimately, you must sue a person, not a computer and not an IP address. That’s obvious.

Using evidence to put a specific person behind a keyboard is nearly impossible. Let’s use an analogy: instead of proving that infringement occurred, we’ll discuss proving that I posted this article. How can you prove that I am at my computer right now, posting to this blog? You almost certainly can’t. You know it’s being posted, sure, but the challenges that can be mounted against proving the identity of the poster are quite intimidating:

  • How do you know which of my devices I’m supposedly using? You might say “by the IP address it’s posted from” but if it was posted from the static IP of my business, it could mean that someone in my business gained access to my account, or that someone broke in and used my already-logged-in account on an unlocked computer, or any number of other possibilities.
  • Even if you can point to a device, how do you know that I was in control of the device at the time that the post was uploaded from it?
  • Assume you can prove that I was using a device exclusively at the time of posting and that the post came from that exact device. How do you know that malicious software didn’t do it? How do you prove that I took the actions at the keyboard that posted the content, and not something else that might have been on my computer?
  • Assume you proved all of the above, plus in a forensic examination of the hard drive of my system, you could find no evidence that malicious software of any type was present. It’s just as possible that an infection was present in RAM that does not write itself to the hard drive (thus only working until system shutdown). The instant I shut off the computer and provide it to your computer forensic investigator to comply with your discovery subpoena, it would be wiped out, leaving no trace. This isn’t necessarily likely since most malicious software authors want it to persist across reboots, but it is very possible and such an infection would be nearly impossible to make antivirus signatures for or analyze due to the fact that all traces of it are lost at reboot or power down. (There are possible ways to catch it, but they’re very difficult and likely also beyond the skill sets of most casual computer programmers, including myself.)

Keep in mind, none of this proves that I posted this blog post. All of it (except, to some extent, the point about an infection that lives only in RAM) merely establishes that I was capable of doing so. It’s the digital equivalent of proving that a murder suspect was holding a gun: the tool is present, but that doesn’t show they aimed it and pulled the trigger. Once you somehow meet the burden of proving everything above, how do you prove that a person pulled the trigger and downloaded a copyrighted file? I can only think of one way: show that the computer in question actually downloaded the file over the Internet. That is only possible if the ISP logs every packet going in and out of your computer, or if the copyright holder uploads the file to you. In the latter case, you’d have a solid argument that they gave you permission by offering the file in the first place, which is almost certainly why no copyright trolls can show traffic logs of this nature. ISPs cannot possibly archive every packet that travels across the Internet (imagine trying to archive everything that flies over a single 10 gigabit network connection; unless you have a storage device that can write more than a gigabyte per second and has millions of gigabytes free, it isn’t happening.)
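To put numbers on the packet-archiving point: 10 gigabits per second is 1.25 gigabytes per second. The sketch below assumes a single link running at full capacity around the clock, counts one direction only, and uses decimal units (1 GB = 10^9 bytes); real links are rarely saturated, but an ISP also carries many such links.

```python
LINK_GBPS = 10  # link speed from the text: 10 gigabits per second

bytes_per_second = LINK_GBPS * 1e9 / 8   # 8 bits per byte
per_day = bytes_per_second * 86_400      # seconds in a day
per_year = per_day * 365

print(f"{bytes_per_second / 1e9:.2f} GB per second")
print(f"{per_day / 1e12:.0f} TB per day")
print(f"{per_year / 1e9:,.0f} GB per year")
```

That works out to roughly 108 TB per day, or about 39 million GB per year, for one fully loaded link.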

I just don’t see how anyone proves definitively that someone was responsible for something over the Internet without the targeted person spilling too much information. What do you think?

[Added 2012-12-11] In most file sharing software, files are distributed in pieces that are much smaller than the whole file. Even if you could prove with absolute certainty that someone joined a network and started swapping partial pieces of a file back and forth (which we have established is extremely unlikely when going by an IP address alone), several arguments about this distribution method weaken the case of anyone attempting to prosecute:

  • Pieces of a file are almost universally useless on their own: In the vast majority of cases, without the first piece of a file, which contains header information laying out the file’s format, the other pieces are completely useless and might as well be random noise. One could argue that an unusable collection of pieces cannot be considered infringement, because (depending on the file format) missing the header data, the end-of-file data, and/or the intermediate data needed to connect pieces makes it impossible for the computer to reproduce a copyrighted work, or any portion of one, from the incomplete file. Video streams in particular encode “key frames” every few seconds, and between key frames the only data stored is what changed from one frame to the next; damaged or missing data for a single frame can therefore render every frame up to the next key frame, potentially hundreds of them, useless.
  • Did you verify the file data solely from the uploader you’re prosecuting? Most peer-to-peer file sharing networks are designed so that a download is massively multi-sourced across many users with low upstream bandwidth. It is very unlikely that any given downloader will acquire the entire file from a single uploader, particularly in the case of large files such as feature-length DVD movie rips: even if an uploader sends at 90 KB/sec (not unusual for a decent DSL package), a typical 702MB (CD-length) DVD rip would require 133 minutes of that uploader sending data exclusively to the downloader at full upload speed. In practice, client throttling, possible ISP throttling, multiple simultaneous uploads, and other factors push a typical home DSL connection’s contribution to a peer swarm closer to 5-10 KB/sec (based on my own experiments monitoring individual peer bandwidth while downloading torrents of Linux install DVDs, most peers appear to contribute 10 KB/sec or less at a time.) The chances of obtaining a whole file from a single uploader are very slim, and it could be argued that if the copyright holder didn’t download the entire file exclusively from the targeted uploader, then they are prosecuting that uploader based on file data supplied by other people. That is like someone handing the rights holder two pieces of a puzzle that looks like random dots until it is completely assembled, the rights holder collecting the rest of the pieces from 50 other people, and then prosecuting every one of those people for distributing the entire infringing puzzle, based on an image that only appears once the puzzle is fully assembled, when each of them obviously had only a fraction that can’t even be viewed without the other pieces. A case that doesn’t verify that the person transmitted a complete, usable copy of the infringing file is not convincing when individual pieces, without all the other pieces they depend on, are effectively random noise.
  • Most peers in a P2P file sharing network don’t even have the entire file to offer for upload in the first place. If the person in question doesn’t have the entire file, they aren’t in possession of the copyrighted work. For the reasons outlined above, a partial file is effectively useless; without verifying that the target is “seeding” (has 100% of the file’s pieces and offers all of them for download), the prosecuting party cannot state in a court of law that the target possesses the copyrighted work without committing perjury.
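Two of the points above can be checked with a little arithmetic and a little protocol detail. The first part of this sketch reproduces the transfer-time estimate for a 702 MB file (assuming 1 MB = 1024 KB, which is what yields the 133-minute figure). The second part decodes a BitTorrent “bitfield,” the message in which a peer advertises which pieces it has (the high bit of the first byte is piece 0), to decide whether that peer claims to be a seed. The 10-piece torrent and both peers are hypothetical, and a bitfield is only what a peer claims, not proof of what it sent to anyone.

```python
# Part 1: how long one home uploader would need to send a whole file by itself.
FILE_MB = 702  # CD-length rip, as in the text


def transfer_minutes(file_mb, rate_kb_per_s):
    """Minutes to send file_mb megabytes at rate_kb_per_s (1 MB = 1024 KB)."""
    return file_mb * 1024 / rate_kb_per_s / 60


for rate in (90, 10, 5):
    minutes = transfer_minutes(FILE_MB, rate)
    print(f"{rate:>2} KB/s: {minutes:6.0f} min (~{minutes / 60:.1f} h)")


# Part 2: does a peer's advertised bitfield say it has every piece?
def pieces_held(bitfield, num_pieces):
    """Decode a bitfield: the high bit of the first byte is piece 0."""
    if len(bitfield) != (num_pieces + 7) // 8:
        raise ValueError("bitfield length does not match piece count")
    return [bool(bitfield[i // 8] & (0x80 >> (i % 8))) for i in range(num_pieces)]


def is_seed(bitfield, num_pieces):
    """A seed advertises 100% of the pieces."""
    return all(pieces_held(bitfield, num_pieces))


# Hypothetical 10-piece torrent: peer A claims every piece, peer B claims pieces 0-2 and 9.
peers = {"A": bytes([0xFF, 0xC0]), "B": bytes([0xE0, 0x40])}
for name, bitfield in peers.items():
    held = sum(pieces_held(bitfield, 10))
    print(f"peer {name}: {held}/10 pieces, seed={is_seed(bitfield, 10)}")
```

At 90 KB/s this reproduces the 133-minute figure; at the more realistic 5 to 10 KB/s, the same file would take roughly 20 to 40 hours from a single peer.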

I’m interested in any comments on this subject, or any points that I might have left out.

HUGE FAT WARNING: I AM NOT A LAWYER. If you need legal advice, GET A REAL LAWYER.

I have a dedicated site for my guide on what to do if you receive a DMCA complaint or copyright infringement notice/settlement “offer” threat from your ISP.

Update 5, 2012-12-06: I’m working out the details of a next-gen P2P file sharing program that should fix up most of the problems with P2P file sharing today, including the IP address issue.

Update 4, 2012-10-18: Added a rambling post containing my thoughts on why it’s impossible to prove that individuals infringed over the Internet without their own confession to doing so.

Update 3, 2011-11-02: Added a new post with an analysis and the actual text of one of these notices.

Update 2, 2011-11-02: My little site at http://copyright-infringement-notice.com/ has been massively updated, including a guide for people who are panicking and feel a need to do immediate damage control.

Update: This is one of the most popular pages on my entire blog now…so, I’m now running a small website that provides information about copyright infringement notices. Check it out at http://copyright-infringement-notice.com/ and give me additional ideas, suggestions, or information to make it better!

I generally keep myself aware of what’s going on in the peer-to-peer file sharing scene, particularly because the case law it generates changes the nature of copyright law in this country, and as someone who writes software, I need to know about such changes. Additionally, because I download a good number of legitimate files from BitTorrent trackers (e.g. Linux distribution CD images), I want to know what I’m stepping in. Over time I’ve noticed a very disturbing trend that concerned me enough to finally write a whole blog post:

“Copyright cops” who threaten users of BitTorrent trackers frivolously pursue anyone whose IP address appears on their radar, and their evidence would not stand up to even the most trivial review.

That’s right, companies such as BayTSP, Copyright Enforcement Group, U.S. Copyright Group, and other paid agents of large media companies are bringing claims against torrent users without even collecting evidence of infringement. For example, researchers at the University of Washington were able to trigger a DMCA copyright infringement cease-and-desist notice to the university’s technical department. The copyright cops caught the user at this UW IP address RED-HANDED, INFRINGING ON THEIR COPYRIGHT!

The IP address being accused of BitTorrent-based copyright infringement belonged to a network printer.

No, I’m not kidding. The recording/movie/television industry copyright “enforcement” corporations accused the university’s network printer of stealing movies. That’s how easy it is to be wrongly accused. But there’s more: another experiment from 2007 used a specially written BitTorrent client that neither downloaded nor uploaded any material; it only connected to a tracker and added itself to peer lists. This client, which was designed to be incapable of actually infringing copyrights, generated copyright infringement notices from BayTSP even though infringement was simply not possible with that application!

I find this absolutely ridiculous, particularly because of the nature of these notices. Many of them are also legal threats. Regardless of innocence or guilt, any lawsuit filed against you costs money to handle, and if it’s so easy for these automated copyright scanning processes both to target the wrong person entirely AND to target people who didn’t provably upload or download any file data at all, that doesn’t bode well for any of the parties involved. It’s fairly obvious that the “copyright cop” companies base their claims of infringement solely on the contents of BitTorrent trackers’ peer lists. They don’t actually download the entire file from you and keep logs showing that they did so as evidence that you infringed their copyright; they merely see your address in a particular list and send off the notice.

Study 1:  http://dmca.cs.washington.edu/

Study 2:  http://bmaurer.blogspot.com/2007/02/big-media-dmca-notices-guilty-until.html

TechDirt article on this topic:  http://www.techdirt.com/articles/20100401/0846028831.shtml

What’s even more outrageous to me is that these companies’ own advertising is unethical right off the bat. They resort to legal threats and mass lawsuits against “infringing parties,” but they pitch their services to content owners and rights holders like this: “Monetize copyright infringement! We can bring you income from a surprising source: people who download your content illegally!” It’s not even about doing the right thing; it’s about the bottom line, which means they have no reason to care about innocent people being caught in the dragnet.

Despite the risk of a lawsuit, if you happen to receive a DMCA copyright infringement notice which is forwarded by your ISP, either by email or regular mail, here’s my advice:

  1. DO NOT EVER CLICK ON ANYTHING IN AN EMAIL, VISIT ANY WEBSITE IN A LETTER OR POSTCARD, OR OTHERWISE REPLY OR MAKE CONTACT IN ANY WAY WHATSOEVER! You run a plethora of risks if you respond in any way, even indirectly such as by visiting the “copyright cops” website out of curiosity.  They can fingerprint your computer, you may be implicitly admitting guilt even if you’re innocent, you could hand them personal information such as your full name by accident…the list goes on.  DON’T DO IT.
  2. Read the studies above, as well as any other relevant material you find online such as articles on p2pnet.net, just in case anything happens.  If you end up in a bad situation, you need to be able to educate your lawyer on how their infringement detection tactics are grossly flawed.  Be prepared, JUST IN CASE.
  3. If you really did infringe on someone’s copyright, do the right thing. That means disposing of the things you’ve downloaded and putting yourself in a position where you’re less likely to end up with more infringement notices. That doesn’t mean admitting guilt. Don’t ever admit guilt in any way; just delete the downloads, stop downloading stuff you shouldn’t be, and shut up about the whole thing. Admitting ANYTHING is just plain begging for a lawsuit.
  4. If you’re truly paranoid, back up your data, zero out your hard drive using something like the Tritech Service System (running “dd if=/dev/zero of=/dev/sda” from a Linux boot environment will do it on almost any computer out there, as long as /dev/sda is the drive you mean to wipe), and reinstall clean so there’s no evidence left behind. If you get in a legal fight and your computer gets subpoenaed for discovery, you can’t do this, but there’s nothing stopping you from doing as you please with your hard drive before receiving a subpoena.
  5. Most ISPs won’t kick you off their service for this.  Don’t respond to the ISP unless you receive direct threats from them.  If your ISP threatens to disconnect your service, use the information in the experiments above to explain to them that these people are making claims for which they have no real proof, and that you are not infringing on anyone’s copyrights.  Remember that the ISP has no reason to boot you unless you’re a very egregious media thief, and if that’s the case you probably can’t read this by now anyway.

As a creator of copyrighted works, I can’t condone the piracy of copyrighted material, but I also feel that the major media industry corporations have gone way too far with their “sue them all” tactics.  If someone pirated my creation and I found out, I wouldn’t threaten them or demand a settlement payment so quickly; I’d ask them to do the right thing and just pay up for it if they liked it (or toss it if they didn’t and tell me why so I could make it better.)

Don’t steal stuff, but don’t let big companies steal from you for something you didn’t do either.

It would be nice to hear from a real copyright lawyer on this issue. Feel free to comment, especially if you’re a lawyer. I don’t post email addresses, so your comment will be as anonymous as the name you choose to post under.
