Image-Based Information Access    
 

Summary

There is an astonishing amount of information on the web and it is constantly increasing. To avoid being overwhelmed by the volume of information available and confused by its uneven quality, people need assistance in efficiently finding task-relevant information and in effectively managing complex dynamic information collections.
Current interfaces primarily employ textual representations for accessing and organizing personal information collections. Access is either via taxonomies or via queries to search engines, and results are typically organized as lists or hierarchies of web page titles. Given the ability of images to assist memory and the common exploitation of space in everyday problem solving to simplify choice, perception, and mental computation, it is surprising that so little use is made of images and spatial organization to aid information access and organization. In this project we are examining how spatial and temporal organization of images can serve as effective interface components for the design of personal information environments. We have developed a flexible multiscale software system (Dynapad) and an extensible set of region tools to provide various forms of subtask-specific support in local regions of collection-management workspaces. Our software development efforts are driven by ethnographic and experimental studies of spatial and temporal strategies for image-based access and organization of information.

People spend substantial time maintaining personal collections of varied types of digital information: photos, video clips, web bookmarks, email archives, professional documents, and other files. Our ongoing research goal is both to understand the cognitive strategies people use in managing such collections in a visual workspace and to build a versatile infrastructure of tools to support those strategies. Although diverse content types are often supported by different applications, our premise is that the same basic cognitive strategies likely underlie the activities of exploring and organizing any collection. This may be why spatial arrangements of elements in piles have proven to be such a natural and effective mechanism for managing physical desktops. Considerable research has explored how people make use of space to organize information. Likewise, the utility of image-based and time-based workspaces has drawn increasing interest. Our work continues these themes. When using spatial workspaces, people typically allocate regions of space to play specialized roles within broader activities. For example, piles can be used to categorize items or to function as reservoirs of items yet to be examined. Our work shows that the affordances of physical piles can be dissociated and selectively engineered in digital environments.

While our research explores spatial tools for managing many types of digital information, our objective in this project is to explore tools to assist in managing personal collections of digital photos and iconic representations of PDF documents such as journal articles. A central goal is to demonstrate how a generalization of the “pile” metaphor can serve as a foundation for new types of tools that enrich regions of a collection-management workspace with local task-specific behaviors. Throughout our research effort we have drawn on ethnographic observations of people using developing versions of our tools to organize their own collections of digital photos and research documents.

Design and Implementation of Dynapad Multiscale Interface and Visualization Software Infrastructure
Dynapad is the third generation of our multiscale interface and visualization software. The Dynapad software infrastructure supports our exploration of spatial tools to assist with managing personal information collections.

One impact of Dynapad is to allow us to begin to generalize a notion of spatially-located physics — to develop an infrastructure of regional tools whose physics both automate the creation of microstructure and guide the management of macrostructure at multiple scales, to create variegated, interactive, task-specific workspaces. But we also realize that no software, however clever, will be able to anticipate the full variety of arrangements and strategies people employ in such reflective and opportunistic interactions. A tool to support, rather than dominate, such activity must afford the user both the authority to override its initiatives and the expressiveness to employ a variety of strategies. The unlocking and reediting of a portrait-collage as discussed below is one example. In short, any automation or physics must be adequately humble.

Region Tools for Managing Personal Information Collections
Our primary research objective is to explore and generalize the notion of a “pile” as the foundation for a versatile suite of region tools that provide unobtrusive assistance for organizational and other sensemaking activities. Our multiscale piles are provided with a cognitively convivial physics to self-adjust their size as elements are added or deleted, to allow one to naturally zoom to "ideal" levels to interact with individual members or specific piles, and to assist navigation in rich multiscale workspaces. While this and other structure-preserving affordances are useful, the most important impact of our effort has been the development of proactive behaviors to assist the user in creating meaningful structure. We have developed a range of tools. For example, a collection of digital photos can be organized along a timeline to show when they were taken, and a collection of iconic representations of PDF files can be organized in terms of the dates the PDFs were created or the dates they entered one's personal collection of papers. We also support linked brushing not just between instances of the same object (to help see that the same object appears in multiple piles), but also between objects related in various ways: for example, files in the same directory, photos taken the same day, papers by the same author, and papers linked by citation relations. Our software architecture allows any source of metadata to be represented in this way.
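
The metadata-driven linked brushing described above can be illustrated with a minimal sketch. This is not Dynapad's implementation; the function names (build_brush_index, brushed_items), the dictionary-based item records, and the single-valued metadata fields are all assumptions made for illustration. The idea is simply that any metadata key can define a relation: items that share a value for that key are highlighted together.

```python
from collections import defaultdict


def build_brush_index(items, key):
    """Group item ids by their value for one metadata key.

    `items` maps an item id to a dict of metadata (hypothetical layout).
    Items that lack the key are left out of the index.
    """
    index = defaultdict(set)
    for item_id, metadata in items.items():
        value = metadata.get(key)
        if value is not None:
            index[value].add(item_id)
    return index


def brushed_items(items, index, key, item_id):
    """Return the other items to highlight when `item_id` is brushed."""
    value = items[item_id].get(key)
    if value is None:
        return set()
    # Everything sharing the value, minus the brushed item itself.
    return index.get(value, set()) - {item_id}


# Hypothetical collection: two papers by the same author, one by another.
collection = {
    "paper_a": {"author": "Author X", "directory": "reading"},
    "paper_b": {"author": "Author X", "directory": "archive"},
    "paper_c": {"author": "Author Y", "directory": "reading"},
}
by_author = build_brush_index(collection, "author")
by_directory = build_brush_index(collection, "directory")
```

With these structures, brushing paper_a under the author relation would highlight paper_b, while brushing it under the directory relation would highlight paper_c. Multi-valued fields (such as several authors per paper) and directed relations (such as citations) would need a richer index than this sketch provides.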

Dynamic Multiscale Iconic Representations
Our research emphasizes the value of visual access to information. One content type, photographs, is already visual and is represented in a Dynapad workspace as thumbnails. Other types of content, however, are more challenging to convert to a graphical form. We have adapted Dynapad to support graphical representations of PDF documents, typically focusing on collections of research papers downloaded from the web. Because images from these papers can be effective retrieval cues, we extract and collage them into “portraits” or “enriched thumbnails” of the documents. One of the figures below shows sample portraits of a paper and many of its references. In Dynapad, thumbnails are automatically replaced by high-resolution versions when users zoom into them. In addition, other applications can be accessed via the images (e.g., image editing programs for digital photos and PDF viewers for files associated with the portraits).

Currently, the algorithm we use to generate document portraits is relatively simple: we automatically extract all component images, sort them by file size (which reflects both image size and complexity, and therefore salience), and arrange the top few over a background image of the document’s cover page. We are exploring more sophisticated strategies as well. Of course, no algorithm can always guess correctly what will be the most effective portrait for a given paper; Dynapad uses an evolving set of heuristics to make a best guess, but lets the user edit any portrait-collage. When a portrait is unlocked for editing, all component images may be moved and resized (although they are forced to stay contiguous with the background image). When locked, the collage is cropped to the boundaries of the background, and unused images are “stored” out of sight and may be accessed later if the collage is again unlocked. We are experimenting with including customizable text fragments in the portraits, for example paper title words, keywords, or author names that can be automatically extracted from PDFs. In addition, we are exploring dynamic portraits, whose appearance changes in different contexts (e.g., showing a slideshow of the contained images as the mouse hovers over a portrait) or at different viewing scales (e.g., using “semantic zooming” to show only the largest image when zoomed out or a representative sampling when zoomed in). Finally, in new work to integrate Dynapad with the Stanford Diver system for annotating audio-video source material, we have prototyped portrait-collages for video clips in which heuristically chosen keyframes serve roles similar to images from a PDF file.
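
The selection step of the portrait heuristic, together with the semantic-zooming variant mentioned above, can be sketched as follows. This is an illustrative sketch only, not Dynapad's code: the ExtractedImage record, the function names, the default of three images, and the zoom threshold are all assumptions. Image extraction from the PDF and layout over the cover page are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExtractedImage:
    """One image pulled from a document (hypothetical record)."""
    name: str
    size_bytes: int  # file size, used here as a rough proxy for salience


def choose_portrait_images(images, top_k=3):
    """Rank images by file size, largest first, and keep the top few.

    The sort is stable, so images of equal size keep their original order.
    """
    ranked = sorted(images, key=lambda im: im.size_bytes, reverse=True)
    return ranked[:top_k]


def images_for_zoom(selected, zoom, threshold=1.0):
    """Semantic zooming: show only the largest image when zoomed out,
    and the full selection once the zoom factor reaches the threshold.

    `selected` is assumed to be ordered largest first.
    """
    if not selected:
        return []
    if zoom < threshold:
        return selected[:1]
    return list(selected)


# Hypothetical images extracted from one paper.
extracted = [
    ExtractedImage("diagram", 40_000),
    ExtractedImage("photo", 250_000),
    ExtractedImage("logo", 3_000),
    ExtractedImage("plot", 90_000),
]
portrait = choose_portrait_images(extracted, top_k=3)
```

Here the portrait would use the photo, the plot, and the diagram, in that order, and drop the logo. Because file size is only a proxy for salience, a real tool would still need the manual unlock-and-edit step described above to correct poor guesses.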