A co-founder of the social-networking news aggregation site Reddit has been charged with stealing data, rather than with distributing it, and faces up to 35 years in prison and a $1 million fine.
Aaron Schwartz, 24, was indicted today on charges of data theft over allegations he spent months using MIT's wireless and wired networks to download more than 4 million scientific and academic articles distributed by JSTOR – a commercial service that charges universities as much as $50,000 per year for access to the information.
To get the articles, Schwartz allegedly played digital cat-and-mouse with security for both JSTOR and MIT to hide his equipment, storage devices and physical presence. After being chased around the wireless networks for weeks, he allegedly broke into a wiring closet to get a physical connection, which gave his computers more reliable access but made the whole process riskier for him.
He was arrested Jan. 6 and indicted today for computer fraud, wire fraud and damage to computer systems during the commission of a crime. A statement from JSTOR said the company retrieved all its data and is cooperating with the U.S. Attorney's office that is prosecuting the case.
Schwartz has not yet said anything publicly about the charges or his opinion of JSTOR's commercial service.
In 2007, though, according to his personal site, Schwartz founded the nonprofit Open Library, a site designed to be to books what Wikipedia is to concepts. Its goal is to have a web page for every book ever published, offering information contributed by members of the public about the book's background, contents, point of view and availability.
According to the indictment, Schwartz was trying to download as much of JSTOR's database of research and review articles as possible, then distribute them for free.
Rather than use his own legitimate access to JSTOR through his job as a fellow at the Harvard Ethics Center on Institutional Corruption, he tried to avoid detection by downloading the articles through guest accounts at MIT, which is right down the road from Harvard and offers free wireless access to guests who register with the network.
According to the indictment, Schwartz bought an Acer laptop specifically to serve as a download agent and loaded it with an application called Keepgrabbing.py, which automated the download of content from JSTOR and got around JSTOR security designed to prevent mass downloads.
On Sept. 25, 2010, the indictment says, Schwartz drove to MIT, logged in as a guest to MIT's campus-wide wireless data network under the name "Gary Host," and began downloading an "extraordinary" volume of files from JSTOR.
JSTOR pays science and academic publishers for the right to distribute full-text and graphic versions of their published content, primarily to universities and research-oriented companies. Subscriptions to JSTOR can run to $50,000 per year for a university. JSTOR pays part of the proceeds back to the publications and authors and keeps the rest.
Within hours both JSTOR and MIT noticed someone hoovering up the content faster than any legitimate user would, and cut off access by blocking the IP address for "Gary Host's" laptop – which used the computer name "Ghost Laptop."
On Sept. 26, the indictment charges, Schwartz went back to MIT, logged in using a different IP address, and turned the data faucet back on.
This time JSTOR blocked not only Ghost Laptop's IP, but a range of IP addresses in the same neighborhood, cutting off a chunk of MIT from access to JSTOR data for three days.
MIT, warned by JSTOR that someone was trying to download its whole database, cut Schwartz off by banning the MAC address on Ghost Laptop from getting an IP address on MIT's network and began a physical search for Ghost Laptop and its owner in the area of campus assigned the IP addresses Schwartz had used.
That kept Schwartz offline until Sept. 28, when he allegedly spoofed the MAC address of another machine, re-registered Ghost Laptop and got a new IP address outside the range that had been blocked earlier.
On Oct. 28 Schwartz showed up again at MIT with a "Ghost Macbook" to which he registered the name "Grace Host."
On Oct. 29, the indictment charges, Schwartz used both ghosts simultaneously to request and download "an extraordinary volume of articles from JSTOR. The pace was so fast that it brought down some of JSTOR's servers."
JSTOR cut off all of MIT from access to its data for days.
That didn't stop Schwartz, who kept coming back in November and December to download more than two million articles, "more than one hundred times the number of downloads during the same period by all the legitimate MIT JSTOR users combined," according to the Attorney General's note to the Grand Jury.
He couldn't keep up the pace if he had to keep switching MAC and IP addresses, though. He solved the problem by walking into MIT's Building 16, breaking into a wiring closet, and plugging directly into the network through Ethernet ports he found there.
He hid the laptops and a series of external hard drives under a box to keep searchers from finding them, but still had to sneak in to swap out the external hard drives when they were full.
To avoid being identified by security cameras he held his bicycle helmet up to his face and peered through the air slits.
During one of his hard drive recoveries he was spotted by MIT Police, who asked him for identification. Instead, Schwartz allegedly ran, carrying a USB drive with a version of Keepgrabbing.py, the automated download program that had been vacuuming up JSTOR's database.
He has not yet commented on either the indictment or his plans for defense.