4 – From sources to data to databases

Historians have accumulated extensive experience with the quantification of serial historical sources such as registers. Quantitative approaches to history and its derivatives came into increasing use beginning in the 1950s and reached the height of their popularity in the 1970s. The idea of a quantifiable past, with its promise of reproducibility and thereby adherence to the scientific method, seemed within reach. It is, however, considerably more difficult to extract quantifiable data from narratives. In this session you will get hands-on experience with the extraction of network data from a first-person narrative by a Jewish survivor of the Holocaust, based on methods developed in Qualitative Data Analysis.

— make sure that you bring a laptop to this session —

Required Readings

 

In class

Code matrix for relations:

[image: code matrix for relations]

Code matrix for attributes:

[image: code matrix for attributes]

 

Here are the links to the spreadsheets you will use to enter the data: Group 1 / Group 2

Here is the shortened version of the codebook.

Start on page 15 of Ralph Neumann’s text. Read through the page carefully, then begin coding the relations he describes.
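To see where this coding leads, here is a minimal sketch of how relation codes entered in a spreadsheet might later be read into network form. The file name and the column names (source, target, relation) are assumptions for illustration, not the actual structure of the class spreadsheets:

    # Minimal sketch: load coded relations (exported from the spreadsheet
    # as CSV) into a network. File and column names are assumed.
    import csv
    import networkx as nx

    G = nx.Graph()
    with open("group1_relations.csv", newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            # Each coded relation becomes an edge between two persons,
            # with the relation code kept as an edge attribute.
            G.add_edge(row["source"], row["target"], relation=row["relation"])

    print(G.number_of_nodes(), "persons,", G.number_of_edges(), "relations")

An edge list like this is the simplest network “database”: each row records one relation, and attribute codes can later be joined to the persons as node attributes.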

 

Tools of the day (Patrick)

4 thoughts on “4 – From sources to data to databases”

  1. 1. All the readings this week deal with concerns about historians’ use of big data. However, none of the authors provides a clear definition of big data. What constitutes “big data”? What makes “big data” different from other types of data?
    2. Erickson gives a good example of how graduate students can construct their own personal databases, take notes, and reshuffle them again and again to build a historical account. Her methods enabled her to see larger patterns in her sources. However, one thing we should be concerned about is the ecological fallacy. How can we explain patterns in the data and connect them with individual cases?
    3. Theibault’s essay offers a concise overview of how historians have dealt with visualization over time. According to him, digital projects have rapidly expanded the capacity for online interactive visualization. While these projects are innovative in various ways, I don’t think their users differ greatly from the two groups Theibault identifies as “the readers of quantitative histories of the era” (“those who read the text and assumed the charts and graphs confirmed what was said there and those who read the charts and graphs while paying scant regard to the text”). This is largely because I do not see, in these digital projects, places where people can share the findings from their interactions. Do you agree? What do these projects suggest to us as users and potential evaluators?
    4. Theibault also mentions that a potential deficiency of network visualization is that it requires “mathematical principles that have little relation to lived human lives.” Human lives contain contingency and irrationality that do not always yield rational solutions. Are there any digital works that deal with these problems and offer insightful solutions?

  2. 1. Erickson demonstrates the utility of research databases, but also emphasizes how her original categories had the potential to restrict the way she conceptualized her archival research. How might we organize information so as to retain the connectivity between ideas, questions, and sources that cut across traditional categories such as “race” and “gender”?

    2. How might we encourage flexibility in our workflow as our research questions and interests evolve and change?

    3. In conversations about big data, there seem to be two major conflicting approaches to method. Some would make use of the large number of sources now available to develop “comprehensive” projects, à la Dan Cohen. Others would use tools like text mining to sift through vast amounts of material to find appropriate sources for close reading. How might we think about these approaches with regard to the various visualization projects discussed in Theibault?

    4. How might we visualize questions about daily life or individual experiences?

  3. 1. Design elements are important in the visual representation of historical information. Is it possible that, through their design, these representations are subject to some of the same generalizations made about historical facts? Even when they represent true facts, do the visual elements in themselves make a certain argument?

    2. As Owens illustrates, the potential of digital data is still largely misunderstood, and researchers often shy away from data that is challenging to collect. How can we create stronger incentives to research the historically important contexts of ephemeral digital content?

    3. Our cultural ideologies often affect how we organize information. Since archives themselves are often subject to this effect, how can additional organizational metadata developed by historical researchers be incorporated into existing collections to improve the quality and scope of research materials?

    4. There is a push for greater use of quantitative analysis in certain research areas. Will disregarding more subjective interpretations have long-term effects on the way researchers analyze historical data?

  4. 1. Ansley Erickson discusses the advantages of organizing notes into relational databases, arguing that they allow more flexible categorization and recategorization as well as better search functionality. However, might not the underlying structure of this new database impose limits on note-taking and interpretation, just as the system of notecards (after all, another form of “database”) does? What might those limits be?

    2. Erickson’s notecards are a valuable distillation of the many disparate sources she found in her research, and they surely represent a source of scholarship in their own right. After she publishes the work derived from those cards (presumably a printed monograph), do you think it would be appropriate or valuable for her to share the contents of those cards with fellow researchers? Or should this process of forming ideas and categories from sources always remain a solitary act, to be discarded or filed away after one’s research is complete?

    3. Theibault says that some “difficulties in interpreting innovative visualizations … are caused by a simple lack of familiarity with them,” which might be “overcome by building more such sites.” It seems curious that he notes that many visualizations are poorly thought out or obtuse, yet suggests that the solution is to produce more of them. Last week I asked about the role of outside experts in digital history projects. Might information visualization specialists be one such type of advantageous expert? Further, Theibault mentions that the “information design” literature focuses mostly on journalism and commerce-oriented audiences. How can we change this?

    4. Trevor Owens refers to a sample application of processing texts known as N+7, in which every noun is replaced with the one located 7 entries ahead in a dictionary (a rough sketch of the procedure follows below). In one of the examples he links to (“Hacking the Accident”), Mark Sample explains that this method is “an attempt to divest the creative expression of two hobgoblins that haunt the modern age: the myth of the muse-touched creative genius and the equally debilitating reductivism of the Freudian unconscious.” This would seem to be an example of a changed “frame of understanding for a particular set of data,” as Owens puts it. Is it a useful one? Is it a compelling reason to make something processable? And what other good reasons are there for making texts processable, beyond statistical analysis or decadent philosophizing?
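Since the N+7 procedure is simple to state, a minimal sketch of it follows. The tiny word list and the crude “known word = noun” shortcut are placeholder assumptions; a faithful implementation would use a full dictionary and part-of-speech tagging:

    # Sketch of the Oulipo N+7 procedure: replace each noun with the
    # word seven entries later in an alphabetized dictionary.
    # The word list below and the "known word = noun" shortcut are
    # placeholder assumptions, not part of the actual method.
    lexicon = sorted(["archive", "book", "card", "data", "graph",
                      "history", "map", "note", "source", "table",
                      "text", "tool", "war", "word", "world"])

    def n_plus_7(word):
        """Return the word 7 entries ahead of `word`, wrapping around."""
        if word not in lexicon:      # not a noun we recognize: leave it
            return word
        i = lexicon.index(word)
        return lexicon[(i + 7) % len(lexicon)]

    sentence = "the book is a source of history"
    print(" ".join(n_plus_7(w) for w in sentence.split()))
    # -> "the source is a archive of war" (with this toy lexicon)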
