In 2018, Laura Ghianda posted an image of the Venus of Willendorf on Facebook. Facebook flagged and removed the image, deeming the artwork “inappropriate content,” and the decision stood despite four attempts to appeal it. In October 2021, in response to Instagram’s similarly heavy-handed content moderation practices, Vienna’s tourist board announced it was starting an OnlyFans account where visitors could view sexually suggestive artworks from the collections of several prominent Viennese museums.
As museums digitize and disseminate their collections through networked platforms, curatorial departments face off against precariously employed content moderators and pornography detection algorithms to negotiate cultural, moral, and artistic distinctions — between art and pornography, original and copy — in real time.
But in this new image economy, much remains unchanged. Wealthy institutions continue to generate enormous social and economic capital by displaying, classifying, and quantifying images of bodies. As the power to determine the bounds of appropriate expression shifts from cultural gatekeepers to libertarian technocrats, whose gaze — colonial, computerized, or some hybrid of the two — is encoded?
These images sit uncomfortably between photograph and data. I found them by searching GitHub’s ‘nudity detection’ tag, in an open source repository called NudeNet: Neural Nets for Nudity Classification, Detection and Selective Censoring.
Tucked away at the end of the repository’s README, following instructions in several programming languages for using the algorithm to classify, detect, and selectively censor nudity on one’s own machine or website, the algorithm’s author had placed a link to a .zip file containing 20,000 images. The images, bedapudi6788 wrote, were a small fraction of “the auto-labelled data” he “used to train the base Detector.”
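The README’s instructions amount to only a few lines of Python. The sketch below follows the class and method names the repository documents; they have shifted between releases, and the file path is a placeholder, so treat this as an illustration rather than current documentation.

```python
# A minimal sketch of the usage the NudeNet README describes.
# Class and method names follow the repository's documentation and may
# differ across versions; "photo.jpg" is a placeholder path.
from nudenet import NudeClassifier, NudeDetector

classifier = NudeClassifier()
print(classifier.classify("photo.jpg"))
# Expected to return per-image 'safe'/'unsafe' probabilities,
# e.g. {'photo.jpg': {'safe': 0.02, 'unsafe': 0.98}}

detector = NudeDetector()
print(detector.detect("photo.jpg"))
# Expected to return a list of detections, each with a label,
# a confidence score, and bounding-box coordinates.

# "Selective censoring": cover the detected regions and write the
# result to a new file.
detector.censor("photo.jpg", out_path="photo_censored.jpg")
```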
I clicked the link and downloaded the file, and there, in a folder titled detector_auto_generated_data, was 696.3 MB of pornography and two spreadsheets. One spreadsheet consists of a simple key to the algorithm’s ‘classes,’ coded 0–5: exposed_belly, exposed_buttocks, exposed_breast_f, exposed_genitalia_f, exposed_breast_m, exposed_genitalia_m. The other spreadsheet contains 35,502 lines, one or more for each of the 20,000 images, with information about which category of nudity was detected and its precise coordinates.
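To make that structure concrete, here is a sketch of how the two spreadsheets might be read together in Python. The class key follows the 0–5 coding listed above; the annotation file’s name and column order are my assumptions, since the archive itself does not document them.

```python
import csv
from collections import Counter

# The class key from the first spreadsheet, coded 0-5 in the order
# listed above (an assumption about how codes map to labels).
CLASSES = {
    0: "exposed_belly",
    1: "exposed_buttocks",
    2: "exposed_breast_f",
    3: "exposed_genitalia_f",
    4: "exposed_breast_m",
    5: "exposed_genitalia_m",
}

# Hypothetical layout for the second spreadsheet: one row per detection,
# holding a filename, a class code, and bounding-box coordinates
# (x_min, y_min, x_max, y_max). The real file may name or order its
# columns differently.
counts = Counter()
with open("detector_auto_generated_data/annotations.csv", newline="") as f:
    for row in csv.reader(f):
        filename, class_id = row[0], int(row[1])
        x_min, y_min, x_max, y_max = (float(v) for v in row[2:6])
        counts[CLASSES[class_id]] += 1

# One tally per class across the 35,502 annotation lines.
for label, n in counts.most_common():
    print(f"{label}: {n}")
```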
Pornography is notoriously difficult to define. Justice Potter Stewart’s infamous phrase, “I know it when I see it,” uttered in the 1964 Supreme Court case Jacobellis v. Ohio, remains more or less the state of the art. “Any explicit sexual matter with the purpose of eliciting arousal” is the vague definition often cited in academic papers where computer scientists detail the design and training of their pornography-detecting algorithms — when they bother to define it at all.
In a literature review of papers describing methods for automatically detecting internet porn, researchers found that 84% simply “did not define pornography,” and, among those that did, “no studies gave the same definition.” They attributed this to the “difficulties in developing a universal definition” of pornography, voicing hope that “a better understanding” will lead to “more consistent and standardized ways of measuring these issues.”
Although the review was published in 2010, the hoped-for insights have not emerged in over a decade of intensive research and widespread public use. A paper published in 2020 admits that the “currently available method to detect nude image[s] is still crude,” while another concedes that “the term ‘pornographic’ itself is ambiguous.” In a 2014 paper on the evocatively titled “Skin Sheriff” algorithm, the authors, in a brief aside, lament that determining whether an image is pornographic “is not always possible. Even for humans it can be a subjective decision.” Nevertheless, the paper goes on to describe in precise detail the algorithmic process their “sheriff” uses to definitively classify an image into one of two binary categories: pornographic or non-pornographic.
These images are stolen. bedapudi6788, the algorithm’s author, doesn’t mention where the images used to train NudeNet come from; their source is simply taken for granted. Almost every computer vision algorithm — the technology underlying applications built for tasks like recognizing faces, detecting emotions, and reading license plates — is trained on a massive dataset of images scraped, without consent, from the internet.
On GitHub, it’s easy to find repositories with names like NSFW Data Scraper and NSFW data source URLs. The latter, described as “lists of URLs that will help you download NSFW images” for “building big enough dataset to train robust NSFW classification model,” boasts that “after downloading and cleaning it's possible to have ~ 500GB or in other words ~ 1 300 000 of NSFW images.”
As I scroll through the detector_auto_generated_data folder, I see photographs of thousands of people. Some of them likely posted their images on Tube sites and Reddit forums to advertise their paid content. Some may have shared their photographs freely, as expressions of their sexuality. Many of the images were undoubtedly stolen: reposted from subscription sites by anonymous fans, perhaps, or from phones and hard drives by blackmailers or abusive exes.
No matter how these photographs got here, it’s safe to say that the people laboring and loving in them would not consent to their use in training NudeNet, an algorithm built expressly to purge the internet of images like them.
There is a direct line from the colonial archive to the machine learning data set. From phrenology to sexology, scholars have traced the integral role photographic archives and bodily measurement played in colonial constructions of race and gender, intelligence and morality, human and something less-than. Today, once again, scientists are using technologies of vision and quantification to transform bodies into data — and using that data to classify, predict, discipline, and erase.
The people captured in NudeNet’s training data are not enslaved. But in their stolen images they are rendered inhuman. Their bodies are painstakingly measured, labeled, and sorted into detailed, hierarchical taxonomies.
NudeNet and other algorithms like it learn from these bodies-that-are-data how to flatten complex gradients into binaries — pornographic or non-pornographic, male-presenting nipple or female-presenting nipple, acceptable or censored — whose enforcement brings material consequences. As always, the most marginalized are most harmed, online and off.
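The flattening itself is mechanically trivial. The sketch below is schematic rather than any particular system’s code: a model emits a continuous score, and a threshold, chosen as a matter of policy, collapses it into one of two labels.

```python
def flatten_to_binary(unsafe_score: float, threshold: float = 0.5) -> str:
    """Collapse a continuous model score into one of two labels.

    Schematic only: the threshold is a policy choice, not a property
    of the image. Nothing in the score says where 'pornographic' begins.
    """
    return "pornographic" if unsafe_score >= threshold else "non-pornographic"


# The same photograph can fall on either side of the line,
# depending on where the threshold is drawn.
print(flatten_to_binary(0.51))                 # pornographic
print(flatten_to_binary(0.51, threshold=0.6))  # non-pornographic
```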
Machines only know what we teach them. In the archive of millions and millions of images that constitute their training, what are they learning to remember?
These images are not mine to use. Their existence, in this context, is a violation. The privacy, labor, and agency of the people captured in them have been so thoroughly denied that their consent, or lack thereof, never even occurred to the researchers who used their intimate photographs to train an algorithm.
To the algorithm and its designers, the people in these stolen images matter only in aggregate, each body consumed and reconstituted as one data point among millions. Although they are painfully exposed, they were never meant to be seen individually. As I scroll and scroll, I want to acknowledge and address the people captured in the 20,000 images that make up a vanishingly small slice of NudeNet’s training data.
How do you show something that shouldn’t exist? How do you show something that’s not yours to show? How do you hold the archive accountable without reinscribing the violence that produced it?
For decades, artists, activists, and scholars have asked these questions as they confront and untangle the violations of the colonial archive. Their clearest answers — repatriation and reparations — may be impossible in the machine learning archive, filled with digital objects that are endlessly copied and circulated. But other, less material, practices may offer answers.
Stephanie Syjuco, a contemporary Filipina artist, works with archives of anthropological photographs from the Philippines. She uses her body and a range of formal methods to intervene in them by shielding the subjects from the camera’s extractive gaze. In a recent work, Shutter/Release, Syjuco uses a Photoshop tool called the “healing brush” to digitally remove the subjects from old prison mugshots.
In the now-“healed” images, the documents, along with spectral traces of their inhabitants, remain. But the people have been, in Syjuco’s words, “liberated” from their colonial and carceral environments.
In NSFW Venus, I apply Syjuco’s healing brush strategy to these images that are not mine to use. I draw my finger across the trackpad, moving my cursor over some of the people captured in NudeNet’s archive, erasing them from view. Photoshop’s healing brush also uses a computer vision algorithm. The algorithm reads the pixels surrounding the areas I’ve covered, and, drawing on its memories of the millions of images it’s seen, guesses which pixels should fill the void where a person was.
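Photoshop’s tool is proprietary, but an open-source analogue gives a sense of the gesture. The sketch below uses OpenCV’s classical inpainting, which fills a masked region purely from the pixels around it; the image path and mask coordinates are placeholders.

```python
import cv2
import numpy as np

# Classical inpainting: estimate the masked pixels from the pixels
# around them. (An analogue, not Photoshop's own algorithm; learned
# inpainting models additionally draw on their training images.)
image = cv2.imread("frame.png")             # placeholder image path
mask = np.zeros(image.shape[:2], dtype=np.uint8)
cv2.circle(mask, (220, 180), 60, 255, -1)   # placeholder region traced over a figure

healed = cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite("healed.png", healed)
```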
But, as Syjuco implicitly asks, can the harms of the algorithmic archive be healed?