Big Data
Lisa Jevbratt, Christina McPhee and Andrea Polli
Brett Stalbaum, editor

Abstract

Artists confront the problems of data density and range in the aesthetic of the sublime. Together with an introduction by Brett Stalbaum, these essays by Lisa Jevbratt, Andrea Polli and Christina McPhee were first published in print for YLEM Journal, Volume 24 Number 6, May-June 2004 (McPhee) and Volume 24 Number 8, July-August 2004 (Jevbratt & Polli), at the suggestion of Loren Means. The YLEM Journal is the bimonthly publication of YLEM, a twenty-three-year-old organization dedicated to the nexus of art, science, & technology. For more information on joining YLEM and to view the YLEM Journal online, visit www.ylem.org.

Sense of Place and Sonic Topologies: Towards a Telemimetic Sublime in the Data Landscape
Christina McPhee

illustrations: digital print series on fujiflex, each 19 x 66 inches, ©Christina McPhee 2003-2004

Background
slipstreamkonza.2

Global warming implicates the increasing atmospheric level of carbon as a primary agent. Nonetheless, the total worldwide carbon budget, which takes into account all known petrochemical usage on an annual basis, shows that terrestrial systems must be absorbing more carbon than we realize. According to the carbon budget mathematical models, carbon concentrations should be increasing faster than they actually are. The hypothesis is that the carbon flux patterns of selected microsystems worldwide may reveal conditions under which more carbon is being absorbed than is being released. On and near Konza Prairie, diurnal and annual data have been collected since 1997 as "eddy correlation" or "eddy covariance" flux measurements. From two of the sites, located on the Rannels Ranch next to the Konza field station, a wireless network carries the live data online for collection and analysis. Jay Ham, PhD, agronomist and climatologist, conducts research into carbon flux dynamics relative to models of climate change at Kansas State University. He is the scientific partner for the present project, Slipstreamkonza. Slipstreamkonza addresses the aesthetics of digital data expression of land as a breathing ecosystem. The time-based data stream of carbon flux is interpreted as a rhythmic, virtual expression of sound and image in net-based and spatial installation.

i net and Gaia
Imagine interpolated virtual and actual spaces that thrive and decay, die and live in a riparian zone watered by pervasive computing. As a neural territory or intelligent topology, the net acts as if alive. As a place of continuous ruin and simultaneous regeneration, the networked space of electronic communications is re-presenting itself. A semiotic model may offer us the net as a subjective topology, a synaptic process-space. This space is not silent. Semiotically, it ‘voices’ itself. A model of the net as a live voice finds some echo in analogy to the Gaia hypothesis on the nature of the physical landscape. As life, Gaia persistently self-represents, or emits information about herself [1]. This is an old idea in new dress. “Day by day pours forth speech,” declares the Psalmist. In semiotic terms, a landscape of voice or self-expressive phenomena, as actual, real information, is both a data landscape and a sonic topology. Where and what is this place? What is the sense of place in the data landscape?

slipstreamkonza.3

ii topology and telemimesis
You might picture the structure, or topology, of data streams, whether in the electronic or in the natural ecosystem, as an invisible domain that persists over and through discontinuities. The leap across the breaks, or breakdowns, can be expressed musically by means of formal structures of recursion and feedback loops, as in classic cybernetic theory, but also as in Baroque fugue structures. I imagine recursion and flow, between natural data and human/machine, an interpolated, mutual consciousness.

A topology is a word (logos) of a place (topos). A hypothesis about what constitutes this ‘word’ or voice of a place on the level of artistic process is aesthetic in nature and intent. Aesthetically, such a place may be explored as a process of telemimesis. “Telemimesis” joins tele (vectors across distance in space, as if space is actually layered time) with mimesis, in the Platonic sense of figuration of a prescient or hidden motif.

iii sublimity + entropy
As a visual artist, one may turn a gaze to what cannot be ‘seen’. Here we move into a zone of the sublime. Sublimity refers to that which is below, beyond or immanent relative to an ontological or cognitive threshold. I assume that there is a way of expressing this indeterminate zone, or invisible condition, in both the realms of the physical, cultural landscape and in the interior, “behind the screen” landscape of the net.

As an ecosystem, the data landscape may be described as continually subject to entropy, following the second law of thermodynamics. Life itself may be thought of as arising, like a phoenix from ashes, as an articulate resistance to entropy. A continuous dialectic between entropy and the architectural self-structuring process of life means that homeostasis is predicated on breakdown, or ruin. The data stream is not always continuous. Scientific instrumentation for measurement and transmission of physical data may fail. Anomalies of landscape data are not always explicable based on known models. Humans struggle with the limitations of their bodies, including fatigue, inattention, illness and mortality. A telemimetic aesthetic of the sense of place in the data landscape accommodates breakdown of the ‘language’ of information streams.

slipstreamkonza.4

iv synaptic recursions
Imagine recursion and flow, between natural data and human/machine, an interpolated, mutual consciousness. [2] The place of flow is sonically expressive, or so the artist hopes. A possibility is that the human synaptic pathway performs as a layer of dynamic connotation. Like a trace, or vector, over and through the data landscape, the synaptic layer is human. Maybe we primates collaborate in a system of connotation that is never fully seen, heard or actualized: existing in time, the system is grasped in spatial fragments. Or, maybe, between ‘natural’ morphologies in the brain/mind and the remediation of landscape as ‘big data’, there is a space, or place, of telemimesis. It is mimetic in that it represents itself relative to a precessive content (landscape data) and does so at a distance (tele) from itself.

slipstreamkonza.5

Recursion – could it be a metaphor of breathing, in and out, between the inside of the landscape-body and the outside? Within Gaia, imagine an emission of data, and a gathering-in of data. Global climatology attempts to study climate change by collecting millions of data samples of diurnal carbon absorption and release on the tall grass prairie, an ecologically critical environment in North America and elsewhere. The prairie is implicated within the phenomenon of global warming in ways that are not clearly seen at present. Slipstreamkonza is an art/science collaborative project that addresses the aesthetics of digital data expression of land as a breathing ecosystem and as a manifestation of climate change. Slipstreamkonza uses the time-based data stream of carbon flux as a basis for a sonic and telemimetic installation. Like the ancient Greek hero Orpheus, this project attempts to make music from “Hades”, in other words, the invisible domain of data, like a voice from a hidden subject. At the sequence of delays, or layers in time, the ‘sense of place’ seems to be in a feedback loop. Things move in and out of a condition of being nameable, or heard.

slipstreamkonza.6

vi konza and telemimesis

Konza is the Osage term for “south wind.” Like breath on a mirror, konza is an evanescent imprint of an invisible dynamic. Prairies worldwide capture and release carbon in a waveform breath. At the threshold of the exchange between atmosphere and surface is the life of the planet; the Konza prairie is a site that can be interpreted aesthetically in terms of a telemimetic topology using sonic forms.

Slipstreamkonza can’t be about conventions of scientific visualization, since the researchers already have many ways of doing this in order to better understand the data. Yet the encounter between the human response and the landscape's self-expression as data is intriguing as a paradox of technology and/in nature. I am interested in the relationship, or dynamic, between the data and the human imagination. Data goes live as a dialectic or interface between paratopic, polyphonic, and polychromatic volumes. Think about interpolation and superimposition, like montage, as virtual and physical spaces, using layers of content that are expressions of hidden data through a semi-permeable membrane, or data transport mode. Maybe time becomes metabolic: it gives rise to a productive structure, composed of intelligent units, or affective artifacts in continuous movement and states of disclosure.

slipstreamkonza.7

vii conclusion, or another beginning
A moving sense of place gathers its momentum and definition on the fly, like a continuous improvisation that is not entirely responsive to human use and reflection. A poetics of that place, both virtual and physical, in the mixed volumes of fluid media, might give rise to a polyphonic design strategy, where arching shifts between recursion and sonority, darkness and density, transparency and light, processional and volume act as responsive, interactive structures in multimedia installation. Like a fold or complex cut in the fabric of the data landscape, sonic topologies become a conceptual practice realm in contemporary art.

slipstreamkonza.8



Notes

[1] Geri Wittig has looked at the Gaia hypothesis relative to the discourse on landscape data, holism and science, and includes a brief, helpful bibliography on this topic, at <http://www.c5corp.com/research/complexsystem.shtml>.

[2] Brett Stalbaum asserts that “data's role in the instantiation of the actual may be a matter of virtual informatic interrelations (or external relations between data sets), forming their own consensual domains that heretofore have not yet been observed as such, but which potentially inflect the operation of actual systems via informational transfer between neighboring systems of interrelations.” (http://www.noemalab.com/sections/ideas/ideas_articles/stalbaum_landscape_art.html)

Christina McPhee

 
Atmospherics/Weather Works:
Artistic Sonification of Meteorological Data

Andrea Polli
www.andreapolli.com

PROUD music of the storm!
Blast that careers so free, whistling across the prairies!
Strong hum of forest tree-tops! Wind of the mountains!
Personified dim shapes! you hidden orchestras!
You serenades of phantoms, with instruments alert,
Blending, with Nature's rhythmus, all the tongues of nations

Excerpt from Walt Whitman’s “Proud Music of the Storm” [1]

1 Introduction
For over ten years, I have been creating art works that translate numerical data to sound, from algorithmic compositions modeling chaos to live improvisation using video analysis systems. Areas of particular interest to my research have been modeling human methods of improvisation in interactive computer systems and using data sonification to illustrate complex information. Visualization is the interpretation of scientific data through the visual image, and likewise sonification interprets data through sound. Sonifications can help scientific researchers understand data in a different way.

Since 2001, I have been working on the sonification of meteorological data in collaboration with Dr. Glenn Van Knowe at MESO, Mesoscale Environmental Simulations and Operations <http://www.meso.com>, a leading firm in the development and application of atmospheric and other geophysical models for research and real-time applications. MESO works with the Mesoscale Atmospheric Simulation System (MASS) to create a highly detailed simulation of the weather based on terrain, initial conditions, and other factors. The atmospheric data sets produced by MESO are extremely detailed, and although MESO has a variety of visualization tools to interpret the data, much of the data represented is not visual in nature (temperature and atmospheric pressure, for example). Through the project we wanted to learn what would happen if the data were interpreted sonically. In April 2003, we completed a series of multi-channel sonifications of two historical storms, a tropical hurricane and a winter snowstorm, at five elevations as part of a storm sonification project called Atmospherics/Weather Works.

The Atmospherics/Weather Works project has three primary goals: the development of a software system for the creation of sonifications based on meteorological and other data to be used in performances and installations, live and recorded musical performances, and a web site for the presentation and distribution of the recordings and software. [2]

The first public installation of the project was in April, 2003 at Engine 27 <http://www.engine27.org>, a non-profit organization devoted to the research, creation and dissemination of multi-channel sound works in New York City. A 16-channel sound installation spatially re-creates two historic storms that devastated the New York/Long Island area first through data, then through sound. The resulting turbulent and evocative compositions allowed listeners to experience geographically scaled events on a human scale and gain a deeper understanding of some of the more unpredictable complex rhythms and melodies of nature.

Why is scientific data so often presented as visual information and much less often presented as sound? One reason might have to do with time. A still visual image can be scanned over time, allowing a viewer to study various aspects of an image. A soundscape or piece of music, although it is also temporal, cannot be examined in detail without the destructive process of stopping, selecting, and replaying various parts. Aspects of the visual image are also easily defined by viewers. Specific colors and shapes can be described and understood more often than specific notes or musical phrases. Specific sounds also can have a level of ambiguity. Although some sounds are easily identified (like a barking dog or a cat's meow, for example), the source of other sounds is not quite as clear. If noise or an echo interacts with a sound, it is like looking at a visual image wearing glasses that are heavily fogged, making recognition more difficult.

However, unlike a still visual image, music and soundscapes are inherently narrative. For example, as I listen to footsteps and voices outside my apartment door, I can determine that two people are walking up the stairs of my apartment building. I can determine approximately what floor they are on and even gather a little information about their relationship (are they a couple? a mother and child? have they been recently arguing or laughing?) In a visual image, a photograph of a family for example, unless the emotional states of the subjects are highly exaggerated, an observer is likely to encounter a certain amount of ambiguity in determining the relationships between the subjects.

Can an enhanced narrative and emotional content enhance the understanding of meteorological data? Some meteorologists call themselves 'storm hunters'. They travel far and wide at considerable physical risk in order to experience a hurricane or tornado. Is it because the physical and emotional exhilaration enhances the scientist's understanding of the storm? The storm hunters would most certainly answer in the affirmative. They experience the sound, scale, and physical properties of the storm as well as its direct effect on the environment. A storm experienced only through visualization, whether animated or static, does not convey this visceral information. Scientists must use their imagination to create a mental image of a storm's potential devastation. A sonic experience of a storm can benefit communities beyond the meteorologist's lab. If a scientist is alerted by a visceral experience that a storm is likely to cause destruction, communities may be more quickly notified to prepare a proper response to the storm.

Our work represents a part of a growing movement in data sonification research. In 1997, The Sonification Report was prepared for the National Science Foundation by members of the International Community for Auditory Display (ICAD).[3] This report provides an overview of the current status of sonification research and proposes a research agenda. Most significantly to us as interdisciplinary collaborators, the report stressed the need for interdisciplinary research and interaction. Our project is well-suited to sonification according to the findings of ICAD. The data sets produced by MASS are extremely large and complex, and although there are a variety of visualization tools in use to interpret the data, much of the data represented is not visual in nature (temperature and atmospheric pressure for example). The data represented often portrays complex changes over time, an aspect of data particularly suited for sonification.

My personal interest in data sonification is in the artistic creation of new languages of data interpretation. As individuals and groups are faced with the interpretation of more and more large data sets, a language or series of languages for communicating this mass of data needs to evolve. Data interpreted as sound can communicate emotional content, and I am particularly interested in the sonification of data related to the atmosphere and the weather because of the long history of the weather used as a metaphor for emotion in the arts.

2 Project Planning
The project began when I met Dr. Van Knowe in the summer of 2001 at the first meeting of Bridges, an International Consortium on Collaboration in Art and Technology, a joint project of The USC Annenberg Center for Communication & The Banff Centre for the Arts New Media Institute [4]. Dr. Van Knowe had joined MESO as a Senior Research Scientist after 24 years as a meteorologist for the Air Force. He was Chief of Meteorology at Rome Lab in New York, where he directed the meteorological aspects of all research, and was chief of the modeling and simulation development branch for the Air Force's Combat Climatology Center (AFCCC) at Scott AFB, IL.

Dr. Van Knowe and I brainstormed at that meeting and then continued to communicate via email and telephone to develop a project plan. After developing a proposal and being invited to participate in one of the first spatialized sound production residencies at Engine 27 to create a storm sonification, we met at MESO to plan the project. We wanted to create a spatial sonification of one or more storms that occurred in the New York area in the recent past in the hopes that some members of the audience would remember the specific storms.

Dr. Van Knowe and Dr. John Zack of MESO suggested we try to create a sonification of a major winter snowstorm that in 1979 was not foreseen by the existing meteorological models and inspired years of research and development into improving the models. The "President's Day Snowstorm" initially formed as a weak wave of surface low pressure on a front in the Gulf of Mexico on 18 February 1979. Since this storm was not predicted by the existing meteorological models of the time, a large amount of data on this storm was available.

Later, Dr. Van Knowe found a strong tropical hurricane, Hurricane Bob, that passed through the same coastal region. We decided to attempt to sonify two storms that have a very different physical structure to see if the sonifications would yield insight into the nature of these two different types of storms.

3 Modeling the Storms for Spatialized Sound
Since the Engine 27 space has a very specific and unusual 16-channel speaker arrangement, we decided to map each speaker to a specific point in space proportional to the area spanning from Northern Florida to Northern New York State and from the Eastern tip of Massachusetts to Western New Jersey with New York City situated near the center. Simulated point data was to be modeled for an area of approximately 1000km. This area was mapped to the size and shape of the Engine 27 space. (see figure 1)
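The proportional mapping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the regular 4 x 4 speaker layout, the room dimensions, and the function names are hypothetical (the actual Engine 27 arrangement was unusual and is not reproduced here), and the 1000km figure is read as the extent of the modeled area in each direction.

```python
def room_to_model(x, y, room_width, room_depth, extent_km=1000.0):
    """Map a speaker position (metres from one corner of the room) to
    kilometre offsets (east, north) from the centre of the modeled area."""
    east_km = (x / room_width - 0.5) * extent_km
    north_km = (y / room_depth - 0.5) * extent_km
    return east_km, north_km


def speaker_grid(cols, rows, room_width, room_depth):
    """Hypothetical layout: one speaker at the centre of each grid cell."""
    return [
        ((c + 0.5) * room_width / cols, (r + 0.5) * room_depth / rows)
        for r in range(rows)
        for c in range(cols)
    ]


# Assumed room of 8 m x 20 m holding 16 speakers.
ROOM_WIDTH, ROOM_DEPTH = 8.0, 20.0
speakers = speaker_grid(4, 4, ROOM_WIDTH, ROOM_DEPTH)
model_points = [room_to_model(x, y, ROOM_WIDTH, ROOM_DEPTH) for x, y in speakers]
```

A listener standing at the centre of the room then stands, proportionally, at the centre of the modeled region, and each speaker plays the data simulated for its own point.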

The kind of model output needed for sonification was very different than the output formats already in use by MESO for visualization. Dr. Van Knowe and his colleagues use the Mesoscale Atmospheric Simulation System (MASS) to create a highly detailed simulation of the weather based on terrain, initial conditions, and other factors. MASS takes real data inputs from satellite or surface readings and couples the information with global and regional models. There are several MASS output file formats: 3D array files, 2D horizontal (x-y plane) files, 2D vertical cross sections (x-z plane), 1D x,y simulated point observations, and 1D vertical profile (x,z) simulated point atmospheric soundings.

Our project required files of individual variables output for each geographical point at regular temporal intervals. Dr. Van Knowe and Dr. Kenneth Waight of MESO created a custom piece of software to output the data in this format. Kenneth T. Waight joined MESO in October 1987 after completing his Ph.D. in atmospheric science at the University of Wyoming. His first three years at MESO were spent on a project funded by the NASA Marshall Space Flight Center. Dr. Waight relocated to MESO's Troy, New York office in 1990 to assist in the development of MESO's real-time operational mesoscale modeling system.

Dr. Van Knowe then created a complete model of each storm at 5 points of elevation: sea level, approximately 8500 feet, approximately 18,000 feet, approximately 35,000 feet, and approximately 60,000 feet (or, the top of the atmosphere). Each variable was output every three minutes for a 24 hour period of the greatest storm activity. The model grid resolution was 10km. Nine variables were modeled at this stage, but only six variables were used in the final sound compositions: atmospheric pressure, water vapor, relative humidity, dew point, temperature, and total wind speed.

4 Creating the Sonifications
After the storms were modeled and the data output, we were left with 720 data files of 481 values each and the daunting task of translating these numbers into sound. Engine 27 master programmer Matthew Ostrowski joined us at this stage and he and I worked at the Engine 27 space for a period of about four weeks creating a system for reading and translating the files to spatialized sound using Max/MSP.
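The two figures above can be checked with simple arithmetic. The text does not say how the 720 files break down, so the reading below is an assumption: one file per elevation, modeled variable, and speaker point for a single storm, with each file holding one value per three-minute step including both endpoints of the 24 hour period.

```python
HOURS = 24
STEP_MINUTES = 3
ELEVATIONS = 5
MODELED_VARIABLES = 9
SPEAKER_POINTS = 16

# One value every three minutes, counting both the first and the last instant.
samples_per_file = HOURS * 60 // STEP_MINUTES + 1

# Assumed breakdown: one file per (elevation, variable, point) for one storm.
files_per_storm = ELEVATIONS * MODELED_VARIABLES * SPEAKER_POINTS

print(samples_per_file, files_per_storm)  # 481 720
```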

We decided to create a composition of each day’s storm activity in full at each of the five elevations. We started by simply and directly mapping each variable to the pitch of a sound sample of a distinct timbre. We somewhat arbitrarily used long tones for temperature and pressure related variables and percussive tones for water related variables. The bank of sound samples used included vocal sounds, sounds created by wind instruments, and environmental sounds including the sounds created by various insects. The resulting sound compositions were interesting, but listeners found it difficult to hear the changes in each individual variable.
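A direct mapping of a variable to the pitch of a sound sample might look like the following Python sketch. The actual system was built in Max/MSP; the function names, the linear mapping, and the two-octave range here are illustrative assumptions, not a description of that patch.

```python
def normalize(value, lo, hi):
    """Scale value into [0, 1] relative to the range [lo, hi], clamping outliers."""
    if hi == lo:
        return 0.0
    return min(1.0, max(0.0, (value - lo) / (hi - lo)))


def playback_rate(value, lo, hi, semitone_range=24.0):
    """Map a data value to a sample playback rate.

    The middle of the data range plays the sample at its original pitch;
    the extremes transpose it down or up by half of semitone_range.
    """
    semitones = (normalize(value, lo, hi) - 0.5) * semitone_range
    return 2.0 ** (semitones / 12.0)


# Hypothetical temperature readings in degrees Celsius over a 0 to 10 range.
rates = [playback_rate(t, 0.0, 10.0) for t in (0.0, 5.0, 10.0)]
```

With a range of 24 semitones, the lowest reading plays the sample an octave down and the highest an octave up.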

We then decided to map the total wind speed to the amplitude of the sound. Directly mapping loudness to wind speed for every speaker (every geographic point) created a dramatic spatialization effect. The fastest wind speeds, representing the greatest storm activity, created the most sonic activity and excitement.
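As a minimal sketch of this loudness mapping, each speaker can be given a gain proportional to the wind speed at its geographic point for the current time step. The function name and the linear gain curve are assumptions made for illustration.

```python
def speaker_gains(wind_speeds, max_speed):
    """Return one linear gain in [0, 1] per speaker for a single time step."""
    if max_speed <= 0:
        raise ValueError("max_speed must be positive")
    return [min(1.0, max(0.0, w / max_speed)) for w in wind_speeds]


# Hypothetical wind speeds (metres per second) at three speaker points.
gains = speaker_gains([0.0, 25.0, 50.0], max_speed=50.0)
```

As the storm moves across the modeled area, the loudest speakers follow the strongest winds around the room.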

However, the combination of timbres was still overwhelming to the listener, limiting the listener’s ability to make sense of the data. At this point, we decided not to limit the number of variables presented through the sonification for the sake of the public presentation. Had we been creating the sonifications for research only, at this stage we might have brought Dr. Van Knowe and his colleagues into the space to listen to and compare and contrast sound compositions created by single variables. However, there was a deadline for a public presentation of the work to a general audience and aesthetically we felt that the single variable compositions lacked the fullness necessary to engage a general audience expecting to hear a musical composition.

The first aesthetic choice was to translate the atmospheric pressure data to a very low frequency sound. In doing so, listeners lost the ability to hear a detailed melody line describing the pressure changes, but gained a visceral sense of the storm.

Then, we began experimenting with using some of the variables as filter variables for sound samples representing other variables. Some of the variables in the model were highly coupled or inversely related to other variables. We created a band-pass filter that filtered a sound representing temperature with dew point values and filtered water vapor with relative humidity values. We found at this point that we needed to choose sounds with a wide spectrum in order to hear the filtering most effectively. White noise has the widest spectrum, and selecting ‘noisy’ sound samples proved the most effective in communicating the data and also was the most effective aesthetically due to the variation in the resulting sounds.
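One way to picture a variable acting as a filter on another is the Python sketch below, in which a data value sets the centre frequency of a band-pass filter applied to white noise. It is an assumption-laden illustration: the text does not say which filter parameter the data controlled, and the biquad coefficients follow the commonly used Audio EQ Cookbook form rather than the Max/MSP objects used in the installation. All names and ranges are hypothetical.

```python
import math
import random


def bandpass_coefficients(center_hz, q, sample_rate):
    """Normalized biquad band-pass coefficients with unit gain at center_hz."""
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def bandpass(samples, center_hz, q=5.0, sample_rate=44100):
    """Filter a list of samples with a single biquad band-pass section."""
    (b0, b1, b2), (a1, a2) = bandpass_coefficients(center_hz, q, sample_rate)
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def center_frequency(value, lo, hi, f_min=200.0, f_max=5000.0):
    """Map a data value exponentially onto a centre frequency in hertz."""
    t = min(1.0, max(0.0, (value - lo) / (hi - lo)))
    return f_min * (f_max / f_min) ** t


random.seed(0)
noise = [random.uniform(-1.0, 1.0) for _ in range(4410)]  # 0.1 s of white noise
dew_point = 12.0  # hypothetical reading in degrees Celsius
filtered = bandpass(noise, center_frequency(dew_point, -10.0, 30.0))
```

Because white noise contains energy at every frequency, any movement of the centre frequency is audible, which is consistent with the finding that 'noisy' samples worked best.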

The scaling of the data for sonification presented particular challenges. Although the overall wind speeds varied with elevation levels, we decided to use global scaling for wind speed. This created the effect of the compositions building and receding in intensity. However, using global scaling for variables such as temperature mapped to pitch or water vapor mapped to a band pass filter proved to be much less dramatic than creating a scaling system for each elevation level of each storm, since the variables differed widely between levels.
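The difference between global scaling and per-level scaling can be seen in a small Python sketch. The temperature values and level names below are invented for illustration and are not taken from the storm models.

```python
def value_range(values):
    """Return (minimum, maximum) of a sequence of numbers."""
    return min(values), max(values)


def scale(values, lo, hi):
    """Linearly rescale values so that lo maps to 0 and hi maps to 1."""
    span = hi - lo
    return [0.0 if span == 0 else (v - lo) / span for v in values]


# Hypothetical temperatures (degrees Celsius) at two elevation levels.
levels = {
    "sea_level": [2.0, 4.0, 6.0],
    "35000_ft": [-56.0, -54.0, -52.0],
}

# Global scaling: one range shared by every level.
all_values = [v for series in levels.values() for v in series]
g_lo, g_hi = value_range(all_values)
global_scaled = {name: scale(s, g_lo, g_hi) for name, s in levels.items()}

# Per-level scaling: each level uses its own range.
local_scaled = {name: scale(s, *value_range(s)) for name, s in levels.items()}
```

Under global scaling the sea-level series occupies only a narrow band near the top of the output range, so a pitch or filter mapping barely moves; under per-level scaling the same series spans the full range from 0 to 1.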

Finally, since the sonifications were to be performed in the format of a spatialized sound installation, we developed a daily schedule in which various compositions present the data sets at the five elevations, moving from ground level to the top of the atmosphere. In the installation, each storm was performed for approximately half an hour six times each day. A storm consisted of six compositions of approximately five minutes each: five presenting all variables at a single elevation, and one combining elevations based on the heights of the speakers. These compositions were marked by a number of ringing bell sounds, marking time and elevation like the ringing of church bells.

5 Conclusion
The final compositions were well received by both the general and the scientific audiences. Visitors to the installation particularly enjoyed remembering where they were during Hurricane Bob and the President’s Day snowstorm while listening to the sonifications. Some audience members found a metaphorical meaning in the series of rising elevations, finding the compositions nearer to the ground to be more visceral while those compositions representing activity closer to the top of the atmosphere were felt to be more ethereal and spiritual.

Dr. Van Knowe was particularly intrigued by the spatialization of the sound, and was interested in how the wave patterns of the storms were moving in space. The sonifications reinforced some known aspects of the particular storms. The winter storm was more intense near the top of the atmosphere while the hurricane’s fastest wind speeds occurred at lower elevations. This change in intensity was communicated very clearly through the varying degrees of loudness of the compositions. The patterns of movement of the tropical hurricane were known to be more chaotic than the winter storm, and the resulting compositions also reinforced this concept. Most listeners found that they could understand more the more they listened to the compositions, and there was an overall consensus that the work opens up doors for more research both in science and the arts.

Andrea Polli

References
[1] Murphy, Francis, ed. “Proud Music of the Storm” from Walt Whitman: The Complete Poems. New York: Viking Press; Reprint edition, 1990.
[2] Polli, Andrea, and Van Knowe, Glenn. Atmospherics/Weather Works: The Sonification of Meteorological Data. 2003. /studio/atmospherics
[3] Kramer, Gregory, et al. The Sonification Report: Status of the Field and Research Agenda. 1997. Available from http://www.icad.org/websiteV2.0/references/nsf.html
[4] The USC Annenberg Center for Communication & The Banff Centre for the Arts New Media Institute. Bridges: International Consortium on Collaboration in Art and Technology. 2001.


Software Development
Platforms for Large Datasets: Artists at the API


Brett Stalbaum
C5 Corporation
www.c5corp.com

In 1998, C5 had a problem; two problems actually.
We had organized that year as a business without a model to do a data collection and analysis project at SIGGRAPH 98, called the Remote Control Surveillance Probe project. The impetus for the founding of C5 was to see what kinds of business opportunities were available to a collaborative group of artists and theorists already working for many years with information as our primary medium. The expertise of C5 members was brought under one umbrella to tackle problems in domains relative to our collective experience, which includes autopoietic theory, artificial intelligence, information systems design and programming, public relations, emergent behavioral systems, semiotics, literary criticism, military studies, library science, and fine art.


Shortly after organizing, we were invited by Steve Dietz of the Walker Art Center in Minneapolis to do a net.art project related to a work by C5's president Joel Slayton, “Not to See a Thing”. The project had been exhibited as part of the 1997-98 "Alternating Currents: American Art in the Age of Technology" exhibition at the San Jose Museum of Art, in collaboration with the Whitney Museum of American Art. The “Not to See a Thing” project collected about 10 gigabytes of information about audience participation with the work during the time it was installed in the SJMA. What Steve Dietz was interested in was how we might hybridize the “Not to See a Thing” data with the infrastructure of the Internet itself to create a net.art project. This in essence created our two problems.


On the one hand we had a fairly large, but still manageable set of biometric data from Slayton's installation, which we had to mingle with the tremendous infrastructure of the Internet itself. And of course we had to find a way to make the manifestation of that data mingling visible/navigable to the user. Thus the first problem was related to the size of the datasets, and the need to develop a strategy for exploring them and exposing something about them. The second problem was that we were faced with two large sets of data that were superficially unrelated to one another. Our efforts culminated in the “16 Sessions” project, and the realization of the C5 IP database that Lisa Jevbratt developed to facilitate the mingling between the “Not to See a Thing” data and IP space. This paper focuses on the strategies that emerged from these projects and how they inform the matter of how artists can and should contribute solutions to these kinds of problems.

I'll begin with the scale problem, because it is the less interesting of the two and the solution is more obvious. The question is "How do you create a context in which information artists with different experiences and different sets of IT skills can participate in the exploration of and experimentation with large data sets?" We believe it is important to create a context that is amenable to both collaboration and independent endeavor at a variety of interface levels. Technically this requires the development of multiple interfaces to the data which are congruent with the experience of the various groups of people who will be working with it. To ensure this, whenever possible, artists should be involved with or completely responsible for the development of the various interfaces. Given that artists today are also computer programmers, database administrators, information architects, engineers, and theorists, it’s important that the data to be worked with be arranged for maximum access: access that ranges from the raw data (files or database interface) all the way through standard user interfaces that highly mediate access to the data through end visualizations at the presentation layer. In between these extremes, artists should have access to all of the APIs and middleware layers, and preferably be responsible for the development of these layers. Working on “16 Sessions”, and in subsequent software projects such as “SoftSub”, C5 had in place people with experience in all of these layers of software development, and importantly experience working with each other, so the process was relatively smooth. Of course, this is not the situation with larger sets of institutionally collected data, where the standards, data formats, and APIs can often be quite obtuse.

Different challenges exist with the emergence of large collections of public data such as those available from the United States Geological Survey, NASA, NOAA, and the Human Genome Project. These challenges arise not only from the technical sophistication and tremendous size of the data, but also from the need to strategize appropriate interfaces that allow users of very diverse backgrounds to participate in consuming the data and generating new knowledge from it. C5 has been active in this area. For example, the C5 Landscape database is a relational database, Perl API and set of sample interfaces designed specifically to help users create their own programs that can easily access, analyze and display information about the shape of the earth. The database is designed to eliminate much of the complexity in acquisition, database interface, processing and imaging common in the manipulation of geo-data, so that artists have a manageable platform on which to write their own software and perform mapping experiments. Artists using the software can work with the database at various levels of technical sophistication. These levels range from a web-based GUI through to direct SQL, Perl DBI and Java JDBC programming techniques. An API also provides a variety of features and capabilities through easy-to-use Perl modules. There are of course many projects that incorporate the idea of artists working with data at all levels. Notable are Lisa Jevbratt's “Mapping the Web Infome” and Rhizome's “alt.interface” projects. Rhizome's “alt.interface” project involves exposing (to artists) the database API of the Rhizome website and its large text object collection, such that they can create alternative interfaces.
Jevbratt's web crawling project is especially notable because of the way that she worked with the invited artists, both creating an interface for the 'alternatively' technical artists involved and working at the database and API levels with many of the artists to collaboratively implement features they suggested. It is appropriate for artists to be involved in the development of the public API's and application layer interfaces through which the public at large will have access to large data, because in many cases artists working collaboratively already have experience in working out the inherent interface issues that are involved in making data available to 'technically diverse' or even non-technical users. Artists in both new media academia and fine art practice have been involved in this kind of work for many years.

The second issue is a deeper one, involving how artists have contributed and can contribute to dealing with inter-relations between very different datasets, as well as unexplored intra-relations within single large datasets of considerable complexity. The exploration of large datasets is one of the most provocative and interesting issues for artists today because of the explosion in the number of such datasets being made available to the public.

Why? Artists as cultural workers have always sought to contribute to the state of our knowledge near the edges of human understanding. Among the new cultural problems we face today are the problems of big data. And lest you assume that this is exclusively the domain of computer science, the large datasets of today present new kinds of problems that computers and networks are not traditionally used to solve, and that perhaps even the traditional use of computers and networks cannot solve. The familiar notion of the "information processing life cycle" is the basis of contemporary data processing. This very colonial notion holds that data must be processed into useful information: you normally start by considering the output you want and the available input, and then determine the algorithm that will take your raw and untreated data and turn it into a manageable, cognizable, useful thing we call information. The entire field of Data Mining and Knowledge Management as we know it today is predicated on the pre-existence of semantic models that allow data to be algorithmically mined for meaning. This is the basic philosophy and approach to data and information, and it is of course profoundly successful, but its application reaches severe limitations in dealing with contemporary data and the new kinds of problems it presents.

For example, traditional problem solving is not at all applicable to the situation C5 faced with “16 Sessions”. We had two very different data sets, and although we had some preconceptions of what they meant, we had no idea how they were related or if they were related, and no clear idea of what kind of question to ask. Neither set of data was collected with a protocol that was designed to facilitate the type of endeavor that we were charged with performing. Again, standard information processing techniques are not useful for all problems, especially when you do not have a question, when you have a poorly formed question, or when the dataset itself is not entirely understood or contains information potentials that were unplanned at the time it was collected. Data may have non-transparent semantics, or may be so complicated that you do not know where to begin to search, or it may take on new roles as new needs emerge after the data is collected. These issues are of course also related to the problem of what questions to ask. When you don't understand your data, you will naturally have poorly formed questions about it.

Why is this an important problem? The answer is that ever more data is being collected in various endeavors about which we don't know what questions to ask. For example, the Human Genome Project has sequenced and published the entire human genome, but that tremendous data set is largely unexplored, in part because scientists have not sought the answers to questions not yet raised. While this may seem tautologically obvious, it is simultaneously a tremendous and real problem. As put by Lisa Jevbratt, the process of exploring genomic data can be "described as that of a group of people in a dark room fumbling around not knowing what is in the room, how the room looks or what they are looking for." Genomic data is not unique in this respect. There are, for example, vast datasets available from the United States and other governments regarding all kinds of interesting things that we don't yet fully understand, or that we think we understand but which have behavior and relations that have been overlooked. And artists, who do not always participate in the scientific method, may well make discoveries or observations in their aesthetic and conceptual pursuits with such data that lead to such questions, even if the artists are participating as blind probe heads in data space.

The exploration of such data, I argue, is among the most productive and culturally useful positions from which to perform as an artist in the 21st century. Being faced with large sets of data without a map or a clearly defined problem is one of the most interesting and provocative problem types we face in an era where our ability to collect data outpaces our ability to generate knowledge from it. Asking questions and exploring spaces in poorly defined problem domains consisting of huge datasets is a natural, useful, and potentially highly productive cultural role for artists to play.

C5's approach to these types of problems is to explore the application of autopoiesis as a conceptual framework for understanding the behavior of data and information. Autopoiesis takes place in systems that differentiate themselves from other systems on a continual basis through operational closure, and that produce and replace their own components in the process of interaction with their environment (structural coupling). This interaction occurs via a membrane containing the organization of the unity in question, thus allowing distinction between it and its environment. A basic question for any analysis of the autopoietic potentials of data involves distinguishing a membrane, or interface, where operational closure (inside) and structural coupling with an environment (outside) are expressed. It is in patterns of structural coupling that relations between complex data can be analyzed. If you can find a membrane, you have revealed a relation between or within data sets. To find membranes, you need to mingle data. For example, there are contemporary explorations within the social sciences demonstrating that information about the landscape (for example drainage, land cover, or topography) can reveal insights when mingled with historical data. C5 views these types of data processing explorations as very interesting instances of structural coupling between data sets, even those as superficially different as geological and historical data.
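As a purely mechanical illustration of what "mingling" two superficially different data sets can mean, the following Python sketch joins a landscape attribute and a list of historical records on a shared spatial key and then tallies the result. It is not C5's method and makes no claim to model autopoiesis; the cell identifiers, land-cover classes, event labels, and function names are hypothetical toy values.

```python
from collections import Counter

# Hypothetical toy data. Both sets are keyed by an invented grid-cell id.
LAND_COVER = {"c1": "wetland", "c2": "upland", "c3": "wetland", "c4": "upland"}
HISTORICAL_EVENTS = [
    ("c1", "event-1"),
    ("c3", "event-2"),
    ("c3", "event-3"),
    ("c4", "event-4"),
]


def mingle(land_cover, events):
    """Join the two data sets on their shared cell id."""
    return [
        (cell, event, land_cover[cell])
        for cell, event in events
        if cell in land_cover  # skip events that fall outside the mapped cells
    ]


def events_per_cover(mingled):
    """Count events per land-cover class in the joined data."""
    return Counter(cover for _cell, _event, cover in mingled)


if __name__ == "__main__":
    counts = events_per_cover(mingle(LAND_COVER, HISTORICAL_EVENTS))
    for cover, count in sorted(counts.items()):
        print(cover, count)
```

An uneven tally proves nothing by itself, but it is the kind of pattern that can prompt a question nobody had thought to ask of either data set alone.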

Most of C5's approach to autopoietic frameworks for the understanding of large data has been developed by Joel Slayton and Geri Wittig. Perhaps the key idea that emerges from their work is the notion of a composibility of relations, in that composibility indicates the potential for autopoietic membranes existing as data relations via third order structural coupling in a coded environment. This allows for the analysis of data sets where the semantic relationships are uncertain. In a sense, this idea can be described as the search for algorithms in which superficially different data sets might be shown to couple based on their subject-less form through inherent sans-semantic or pre-semantic models. The aim is to seek these relations specifically to flag the potential for the presence of immanent, unplanned, or otherwise unrecognized semantics flowing from mingled relations, thus revealing something about the ontology of the sets that produces new knowledge about them. It is unlikely that there is a universal algorithm for this (such as a universal visualization system for all data), but if there is, it is likely to be accidentally discovered by researchers searching for inter-relations between data sets. Obviously, artists should be involved in this endeavor.

This is only one approach, undertaken by a small self-funded organization that believes a very particular theoretical framework can be expressed in coded relations that deliver their own answers. To explore this, we of course need a lot of data. It is important that science organizations create the circumstances that will allow a diversity of independently theorized approaches to emerge based on public interest in and public access to the data. It is the casting of large sets of scientific data into the realm of artists, and indeed the public at large, that will allow a multitude of self-organized modes of discovery to develop.

Brett Stalbaum

NOTES
First published as: Software Development Platforms for Large Datasets: Artists at the API, Leonardo Electronic Almanac, volume 11, number 5, May 2003, ISSN 1071-4391. This essay began as speaking notes for a talk of the same title delivered at a rhizome.org event sponsored by Qbox in San Francisco CA, April 26th 2002. http://rhizome.org/events/rhizome_sf_apr.php3

http://www.c5corp.com/projects/rcsp/index.shtml

http://www.c5corp.com/walker/gateway.html

http://www.c5corp.com/projects/16sessions/index.shtml

The internet protocol is the numerical addressing scheme used to identify devices on the internet.

This later became the technical basis for 1:1, http://www.c5corp.com/projects/1to1/index.shtml

API is the acronym for application programming interface, which is a group of public functions that exist in a library of code that other programmers can make use of to implement their own code. Artists should design API's as well as use them.

http://www.c5corp.com/softsub/index.shtml

A good example of this is the Spatial Data Transfer Standard. According to computer scientist Gregg Townsend, "The adoption of SDTS was a giant step backwards. While previous DEM files could be read by relatively simple programs, SDTS file are difficult to read even with the help of a large external library." http://www.cs.arizona.edu/topovista/sdts2dem/

http://cadre.sjsu.edu/~gis

http://dma.sjsu.edu/jevbratt/lifelike/

http://rhizome.org/interface/

For example, see http://fisher.lib.virginia.edu/projects/salem/ The GIS of "Salem Village in 1692" is part of an electronic Research Archive of primary source materials related to the Salem witch trials of 1692.

Wittig, Geri, Expansive Order Situated and Distributed Knowledge Production in Network Space, <http://www.c5corp.com/research/situated_distributed.shtml>

Slayton, Joel and Wittig, Geri Ontology of Organization as System, Switch - the new media journal of the CADRE digital media laboratory, Fall 1999, Vol 5 Num 3, http://switch.sjsu.edu/web/v5n3/F-1.html

http://cse.ssl.berkeley.edu/nvo/nvo.htm
