Many organizations (official or grassroots) have objectives which exceed their capacity, i.e., they have fewer resources than they think they need. To better place limited resources, or to improve processes generally, some of these organizations have taken to collecting data about their objectives and use of resources. For a drought management agency in the Horn of Africa, this might be the location of agropastoral communities and their access to water. For a school district in Michigan, this might be test scores or (better yet) teacher attendance. Documenting historical data, and the changes linked to actions taken, grounds in reality our understanding of whether a goal (equal representation, access to resources, etc.) is being reached. Data, like all things, is political. What data is collected, how it is collected, where it is stored, to whom it is visible, and who gets to act on it can re-centralize power or become mechanisms of accountability and community empowerment.
This post explores how police departments have been collecting data about the location and types of arrests made as a way to track how much crime is happening in a certain place, and so to place their limited resources (cops and their weapons) more accurately (to their eyes). But of course their data has to do with arrests, not crime, and their definition of crime is still based on enforcement of law. This use of force, already untenable, can be seen by some as “unbiased” when based on data. Here we explore why this is not only inaccurate but will further embed systemic racial bias, while maintaining that data collection and subsequent action can be a useful thing when led by the communities themselves. We specifically address large sets of data against which algorithms can be run, and how we can make choices to maximize the benefit and mitigate the damage of these operations while transitioning from the world we’re in to the world we want.
I anticipate the audience for this blog is more acutely aware of things like state-sponsored surveillance, malware used by abusers to further control others, or circumvention tools than the usual crowd. But there is more to the technology and abilities of networks than just these components. Let’s talk about the data that networks generate, the algorithms by which that data is navigated, and how data is acted upon. One end of the arbitrary spectrum of action is enforcement – an external party exerting force in order to maintain the rule of law. The other end is data-driven introspection – an individual or group of people generating data for tracking changes within their own control. This article explores how to understand and increase the likelihood of just actions taken based on data and algorithms.
The Double-Edge of the Master’s Blade
My favorite patch says love the machine, hate the factory. I can love aspects of current life but hate how we came to have those things. And the thing about data, about algorithms, is that we have an ongoing chance to make the same (or similar) everyday choices in very different ways. We have Open Referral as a way to maintain access to human services without centralized authority. We have pockets of groups in conflict zones who decided not to war with each other, not because of pacifism, but because they didn’t get sucked into the rhetoric of the conflict. The practices of those communities are being shared and replicated by other groups through data-driven introspection.
One group in particular embodies just how convoluted data and algorithms can be. The Human Rights Data Analysis Group (HRDAG) has used data to help bring the perpetrators of war crimes to justice, to bring closure for families in Colombia (only available in Spanish at the time of this writing), and to help Amnesty International understand detention centers in Syria.[1]
But in addition to using data analysis, HRDAG has recently called into question how data analysis is done, and who is doing it. Their most recent report, To Predict and Serve, is about predictive policing in the US. Its thesis is that the biases in law enforcement are being encoded into the data informing choices about where to focus future policing. In systems thinking, we refer to this as a “positive feedback loop”: the output of something is likely to cause more of that output to happen.[2] This is likely both to increase policing in already over-policed neighborhoods and to remove the link of accountability to a human making choices, pointing instead at “hard data” and “unbiased algorithms.”
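To make that loop concrete, here is a toy sketch in Python. It is not HRDAG’s model and not any vendor’s algorithm: the neighborhood names, the numbers, and the rule of sending every patrol to wherever the records are highest are all assumptions I made up for illustration.

```python
def simulate_feedback(recorded, true_rate, patrols_per_round, rounds):
    """Toy feedback loop: patrols follow past records, records follow patrols.

    All inputs are hypothetical. Returns the recorded counts after each round.
    """
    recorded = dict(recorded)  # copy so the caller's data is left untouched
    history = []
    for _ in range(rounds):
        # "Prediction" (assumed rule): patrol wherever the records are highest.
        target = max(recorded, key=recorded.get)
        # Only the patrolled neighborhood generates new records, even though
        # the true rate is identical everywhere.
        recorded[target] += true_rate[target] * patrols_per_round
        history.append(dict(recorded))
    return history


# Hypothetical starting point: same true rate, unequal historical records.
past_records = {"north": 60, "south": 40}
true_rate = {"north": 0.5, "south": 0.5}

history = simulate_feedback(past_records, true_rate, patrols_per_round=20, rounds=10)
final = history[-1]
north_share = final["north"] / sum(final.values())
print(f"north's share of records: 60% at the start, {north_share:.0%} after 10 rounds")
```

Under these made-up numbers, north’s share of the records climbs from 60% to 80% while south’s count never moves, even though the underlying rate is the same in both places. The only thing that differs is where the patrols were sent.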
The turn of phrase used in data science for getting biased outputs from unbiased algorithms is “garbage in, garbage out.” It means that if biases, gaps, or other issues exist in a dataset, any analysis performed on it is trash. Whatever algorithms are selected to run on that data will produce biased results too. Ergo, if we assume law enforcement to date has generally been biased,[3] use of that historical data to inform future choices will also be biased.
Who Creates the Data?
But how could we know if the data police departments are using for predictive policing is garbage or not? It’s difficult to have ground truth (directly observed reality against which data can be checked) for anything, but it was especially difficult for HRDAG to understand ground truth[4] related to crime in historically targeted neighborhoods, where people are not likely to report crimes for Legit Reasons.[5]
Their research indicates that the data police departments are using to inform their predictive policing models is indeed based on past policing data, which means all the biases (intentional or otherwise) of enforcement are simply being transmogrified into this new system. HRDAG points out that “police records do not measure crime. They measure some complex interaction between criminality, policing strategy, and community-police relations.” It’s important to point out that even their report is misaligned with the types of data PredPol (the group they studied) was aiming to provide for police to take informed action.
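HRDAG’s point that records measure an interaction rather than crime itself can be sketched in a few lines of Python. This is my own simplification, not their method: I assume an incident lands in the records if a patrol happens to see it or a resident reports it, that the two are independent, and every number below is invented.

```python
def recorded_incidents(true_incidents, patrol_coverage, reporting_rate):
    """Expected number of incidents that end up in police records.

    Assumption (mine, for illustration): an incident is recorded if a patrol
    observes it OR a resident reports it, and the two are independent.
    """
    p_recorded = 1 - (1 - patrol_coverage) * (1 - reporting_rate)
    return true_incidents * p_recorded


# Hypothetical neighborhoods with the SAME number of true incidents.
neighborhoods = {
    "heavily_policed": {"true_incidents": 100, "patrol_coverage": 0.6, "reporting_rate": 0.1},
    "lightly_policed": {"true_incidents": 100, "patrol_coverage": 0.1, "reporting_rate": 0.3},
}

records = {name: recorded_incidents(**params) for name, params in neighborhoods.items()}
for name, count in records.items():
    print(f"{name}: 100 true incidents, about {count:.0f} on record")
```

With these invented inputs the heavily policed neighborhood shows about 64 incidents on record and the lightly policed one about 37, despite identical underlying behavior. The gap comes entirely from policing strategy and from how willing people are to call the police.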
The problem here isn’t just about the generation of data; the dividing line, arbitrary as it is, is how the data is acted upon. What if data were gathered for self-improvement (for an individual or for a community), rather than for external intervention? Communities might become self-improving through data-driven introspection. If we strip away the external enforcement aspect, and instead of “crime” think of behavior many might not want in their communities (sexual assault or other exertions of power), data could be useful. Kate Crawford calls the data that communities choose to collect and examine for themselves “small data.” The work of those groups of people opting out of conflict, referenced in the section The Double-Edge of the Master’s Blade? The systemization of their patterns, which has allowed others to follow suit, is small data. The group decides what is important to them to track as indicators of overall trajectory: maintenance of a fence, quality of a shared water source, how safe vulnerable members of their community feel. Having that data immediately available allows discussion, choices, and actions which can improve the group’s circumstance.
In light of this positive generation and use of data (but also of bringing genocidal fucks to justice), should we be using these data tools to tell our stories and to fight our fights, or should we be disavowing the data and algorithms as well as the police? One of the things that’s always been in conflict for me about anarchism is that its limitation of harm (no institutional violence) happens by avoiding institutional legibility, de facto limiting our ability to do large acts of collaboration (space, medicine) because we are not employing the scientific method on ourselves.
The issue is not that of data, or of algorithms, but (as always) imbalances of power and control.
Wielding Power through Encoding
We can take everyday actions and work with our communities to determine if and how we’d like to collect data on ourselves, how to keep that data safe, what sorts of algorithms we’ll run against the data, and what sorts of actions are okay to take. Let’s take this back out of the dataset and into the algorithms applied against the data. We use our own algorithms/heuristics, whether in software or mentally, to organize at network scale against nonconsensual power structures. Defining our patterns and algorithms for examination and iteration increases the risk of being tracked. Once explicit, the same sorts of algorithms we use could instead be applied to identifying organizers and participants of actions. But institutions are also selective in who they choose to “see,” as found in issues of facial recognition software failing to see people of color. And while that might seem super useful when wanting to act in stealthy ways, it further embeds and abstracts issues of inequality.
When the touchless sink fails to see me, I think “I’m invisible to machines!” Which will come in handy when Skynet becomes self-aware.
— Jordan Bunker (@TensorFlux) April 8, 2013
Joy Buolamwini is critiquing and making alternatives to these algorithmic biases with her Algorithmic Justice League. This proactive approach accounts for the flaws of our current world while also aiming towards one we’d rather be in. I’d like to follow her lead – if we want to prevent “garbage in, garbage out” while we build tools to help ourselves organize at scale without State intervention, we need to be paying attention to what data we’re gathering and who is at the table. And when it’s the State collecting the data, we should consider how to strategically feed parts of it garbage as one of the tools in our toolbox. The difficulty will lie not only in making these choices, but also in maintaining integrity in our own self-reflection.
- Data-backed group collaboration which is aware of (and combats) inequalities represented in that data
- How most big data projects disempower folk, how to combat that
- An institutionalist viewpoint on data science
- Queer data
- Data scientists, humanitarians, etc. working together to define “responsible” generation, use, storage, destruction, etc. of data
- Academic practitioners working on community representation, empowerment, and technology
- Academics studying the overlap of data and society
1. There’s a whole conversation in here about baseline human rights and the enforcement of that baseline. One of the (many) reasons I’m a terrible anarchist is that I like the externalization of societal values into something as clear and malleable as law.
2. See also: Capitalism.
3. Spoiler: it has been.
4. Their methods for creating a non-biased dataset without having this real-world information but also without the garbage existing data went way over my head, to be honest. It seems to make as much sense as anything?
5. Why call the cops when they’re more likely to hurt someone?