Saturday, May 13, 2023

Thoughts on Wiggins and Jones, How Data Happened

This morning I finished reading Chris Wiggins and Matthew L. Jones, How Data Happened: A History from the Age of Reason to the Age of Algorithms (Norton, 2023), based on a course the two teach at Columbia University. It does one of the things a good history should do: it makes clear the contingent nature of the present, shows how things could have turned out differently but for specific events or decisions, and argues that we have more choices than we think.

It also did something that great histories do: it made me see a subject in a new light. In the past, I have read books and articles about how data has been used in one fashion or another: how it was abused by "race science," how data processing was used to facilitate the Holocaust, how governments have amassed huge archives of often incompatible and sometimes irretrievable data, how data facilitates surveillance, and much else. This book focuses instead on the parallel evolution of our understanding of data and of statistics over the last quarter-millennium or so in Europe, America, and India.

Wiggins and Jones move through the early history of these subjects and explore how they were shaped by concerns about race and eugenics. So much of early statistics reflected its founders' beliefs about race and their fears of the possible decline of the "white race." "Moral panics can create new sciences," they note (p. 35). It was also in this era that conflicts arose between those who wanted statistics grounded more rigorously in mathematics and those concerned with the applied problems of engineering, government, and business.

The middle of the book covers the effects of World War II, the Cold War, and the growth of what we loosely call AI. The conflict between the two sides (mathematical and applied) continued. For a time, it appeared that the mathematical statisticians had won the field, and the early history of AI was shaped accordingly. It was more complex than that, of course. There were smaller conflicts within the larger ones, more nuanced differences over what was thought important, and real differences in world views. The various factions have won and lost a number of skirmishes, and those results shape the present hype, cautions, and battles over generative AI and ethics.

One point the authors make in the middle of the book (pp. 123-124) is that Alan Turing, whose name is bandied about so often in these discussions, had a "capacious vision of intelligence" drawn from the human and the animal world and including much more than logic and reason. The (mostly) men who developed the field of AI after him, concerned with building bombs, making money, breaking codes, or more theoretical problems, narrowed that vision to calculation, data processing, and logic. Put another way, we have been bequeathed impoverished visions and expectations of AI.

The final section concerns how financial, social, political, and ethical factors have shaped the world of data that now surrounds and penetrates our every moment. This is where Wiggins and Jones really bring forward the contingency of the present and future. Their backgrounds are important here. Jones is a professor of History at Columbia University; his previous books have been about the Scientific Revolution and the early history of calculating machines. Wiggins is an associate professor of Applied Mathematics but, perhaps more important for the insights it affords him in the final chapters, is also chief data scientist for the New York Times. The authors understand that the way we handle data, AI, ethics, privacy, and related issues is going to have an outsized importance in the future. They are concerned with the forces and structures that produced this situation and with how those can be changed.

As I read these chapters, I began to understand more about the conflicts between different AI factions that have become so prominent and vehement over the past two years, especially since the release of ChatGPT. These conflicts go back to the beginning of AI, but they also reflect a divide over the ethics and governance of AI: between those who would try to encode ethics and governance into algorithms, reducing them to rules, and those who understand them in terms of larger human and political contexts.

They packed a lot into just over 300 pages and did it in a readable way. It is a good read, and there is so much more to their topic. This book whetted my appetite for more.
