Open Syllabus 2.0 | Co-assignment Galaxy

A zoomable, searchable visualization of the top ~160k works in the Open Syllabus dataset (everything with 20+ assignments). Treating each syllabus as a grouping that signals a relationship among the readings assigned for the course, we build a weighted co-assignment graph that captures the number of times each pair of works appears together in a course. The graph is then embedded with node2vec (128d), reduced to 50d with PCA, and projected to 2d with UMAP, which gives the final layout shown here. The 30,000 most-assigned texts are loaded in bulk into the client and rendered via WebGL with Pixi.js; as the viewport zooms in to higher levels of magnification, more data is dynamically queried via Elasticsearch geo queries. Covered in Fast Company, EdSurge, Open Culture, and Boing Boing; cited in The Washington Post.
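The edge weights that feed the embedding step can be sketched in a few lines. This is a minimal illustration, not the project's code: the work IDs are hypothetical, and the function names `filter_by_assignments` and `co_assignment_counts` are made up for this example. The node2vec, PCA, and UMAP stages are not shown.

```python
from collections import Counter
from itertools import combinations

def filter_by_assignments(syllabi, min_count=20):
    """Drop works assigned on fewer than `min_count` syllabi."""
    counts = Counter(w for works in syllabi for w in set(works))
    keep = {w for w, n in counts.items() if n >= min_count}
    return [[w for w in works if w in keep] for works in syllabi]

def co_assignment_counts(syllabi):
    """Weight each unordered pair of works by how many syllabi assign both."""
    edges = Counter()
    for works in syllabi:
        # Deduplicate within a syllabus and sort so each pair has one canonical key.
        for a, b in combinations(sorted(set(works)), 2):
            edges[(a, b)] += 1
    return edges

# Hypothetical work IDs, for illustration only.
syllabi = [
    ["republic", "politics", "leviathan"],
    ["republic", "politics"],
    ["leviathan", "republic"],
]
edges = co_assignment_counts(filter_by_assignments(syllabi, min_count=1))
print(edges[("politics", "republic")])  # 2
```

Counting pairs per syllabus grows quadratically with reading-list length. That is manageable for typical course syllabi, but a very long bibliography would dominate the edge count unless it were capped or down-weighted.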

Launch the project Read the post

Open Syllabus 2.0 | Citation extraction pipeline

A NER and entity-linking pipeline that identifies references to books and articles in college course syllabi. This is similar to traditional citation-extraction projects that match structured bibliographic strings (eg, in scientific papers); but with syllabi, we have to account for a large amount of fuzziness and inconsistency, since syllabi are messy documents with essentially no standardization in terms of how texts are referenced.

We take a two-step approach. First, starting with a bibliographic database of ~65 million books and articles, we surface a set of candidate matches based on a raw keyword match of tokens from the title of the work and the author’s last name. (We index the title and author token sequences in a space-optimized trie, implemented in Rust, which makes it possible to match all possible references with a single linear scan through the tokens in a document.) This produces a large set of matches: places where the title and author of a text appear in close proximity in a document. In many cases, this is enough to identify a work. For example, if the tokens “attention is all you need” and “vaswani” appear in close proximity, this is almost certainly a reference to the paper. The difficulty, though, is that large bibliographic databases also contain works (generally books) with short titles made of relatively frequent tokens, where the author name is also fairly common. Eg, Aristotle’s “Politics”: if “politics” and “aristotle” appear within ~10 tokens, it might be a reference to the Politics, but it might also just be an incidental co-occurrence of the words in regular prose. These works produce millions of false-positive matches, which then need to be pruned.

To do this accurately, we need to incorporate contextual information from the document. We apply a validation model that extracts features from the document contexts before, between, and after the raw title and author keyword sequences, and predicts whether the match is a legitimate text reference. The model is implemented in PyTorch (LSTMs over character and word embeddings), trained on ~12k hand-labeled examples, and reaches ~90% accuracy.
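Below is a minimal sketch of how the three context windows might be cut out of a document before being fed to a model. It covers only the windowing. The LSTM feature extractor is not shown, and the function name `match_contexts` and the `width=5` window size are hypothetical choices.

```python
def match_contexts(tokens, title_span, author_idx, width=5):
    """Split a raw title/author match into before / between / after windows.

    title_span is (start, end) with the title at tokens[start:end];
    author_idx is the position of the author's last name."""
    t_start, t_end = title_span
    if author_idx < t_start:  # author precedes the title
        lo, hi = author_idx, t_end
        between = tokens[author_idx + 1:t_start]
    else:  # author follows the title
        lo, hi = t_start, author_idx + 1
        between = tokens[t_end:author_idx]
    return {
        "before": tokens[max(0, lo - width):lo],
        "between": between,
        "after": tokens[hi:hi + width],
    }

tokens = "week 3 vaswani et al attention is all you need read politics and aristotle".split()
print(match_contexts(tokens, (11, 12), 13))
# {'before': ['is', 'all', 'you', 'need', 'read'], 'between': ['and'], 'after': []}
```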

Read the post

“Headlines as Networked Language: A study of content and audience across 73 million links on Twitter” (Master’s thesis)

A study of the news ecosystem from the standpoint of textual discriminability, modeled as a source prediction task – classifiers are presented with the headline from a news article, stripped of context, and trained to predict which outlet produced it (NYT, Fox, CNN, Breitbart, etc). By analyzing the geometries of the learned representations, it becomes possible to survey the differences in content and style across media brands with a high level of granularity, and to track movement in the latent space over time, as outlets evolve into new configurations at the level of topic, semantics, and syntax.
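To make the source prediction task concrete, here is a simple bag-of-words multinomial naive Bayes baseline. This is a swapped-in technique, not the thesis's model, which learned representations whose geometry could then be analyzed. The headlines and the outlet labels `outlet_a` and `outlet_b` are hypothetical toy data.

```python
import math
from collections import Counter, defaultdict

class HeadlineSourceNB:
    """Multinomial naive Bayes over lowercase headline tokens, add-one smoothing."""

    def fit(self, headlines, outlets):
        self.doc_counts = Counter(outlets)
        self.word_counts = defaultdict(Counter)
        for text, outlet in zip(headlines, outlets):
            self.word_counts[outlet].update(text.lower().split())
        self.vocab_size = len({w for c in self.word_counts.values() for w in c})
        self.totals = {o: sum(c.values()) for o, c in self.word_counts.items()}
        return self

    def scores(self, headline):
        """Log-probability (up to a shared constant) of each outlet given the headline."""
        n = sum(self.doc_counts.values())
        out = {}
        for outlet, counts in self.word_counts.items():
            s = math.log(self.doc_counts[outlet] / n)
            for w in headline.lower().split():
                s += math.log((counts[w] + 1) / (self.totals[outlet] + self.vocab_size))
            out[outlet] = s
        return out

    def predict(self, headline):
        s = self.scores(headline)
        return max(s, key=s.get)

# Hypothetical toy headlines and outlet labels, for illustration only.
model = HeadlineSourceNB().fit(
    ["senate passes budget bill", "senate debates budget",
     "star wins award at gala", "gala award night recap"],
    ["outlet_a", "outlet_a", "outlet_b", "outlet_b"],
)
print(model.predict("budget vote in senate"))  # outlet_a
```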

Read the post

Open Syllabus Project | Explorer app

A web application that makes it possible to explore the texts that are assigned in a corpus of 1M+ college course syllabi. We launched this with an op-ed in the NYT Sunday Review in early 2016, and the project has since been featured in Nature, Time, The Washington Post, FiveThirtyEight, Lifehacker, Business Insider, MarketWatch, QZ, Der Spiegel, The Chronicle of Higher Education, WGBH, WNYC, EdSurge, The Stanford Daily, and elsewhere.

Launch the project Read the post

Open Syllabus Project | Text graph

The ~10k highest-degree nodes in a 1M-node, 10M-edge network built from the text assignment data extracted from the Open Syllabus Project corpus. Each node represents a book or article; edges connect nodes that are assigned together on the same syllabus. The layout was computed in Gephi, and then a 50,000 x 50,000 pixel render of the graph was generated with the Python bindings for GraphicsMagick. This image was sliced into a Zoomify tile pyramid, and Leaflet is used to render the tiles in the browser.
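The size of the tile pyramid for the 50,000-pixel render can be worked out directly. The sketch below assumes 256-pixel tiles, Zoomify's usual default. It builds each coarser tier by halving the image and rounding up, until the whole image fits in a single tile. The function name is hypothetical. Zoomify also groups tiles into TileGroup folders on disk, which is not modeled here.

```python
import math

def zoomify_tiers(width, height, tile_size=256):
    """Return (width, height, tiles_x, tiles_y) per tier, smallest tier first."""
    tiers = []
    w, h = width, height
    while True:
        tiers.append((w, h, math.ceil(w / tile_size), math.ceil(h / tile_size)))
        if w <= tile_size and h <= tile_size:
            break
        # Each coarser tier halves the image, rounding up so no pixels are lost.
        w, h = (w + 1) // 2, (h + 1) // 2
    tiers.reverse()
    return tiers

tiers = zoomify_tiers(50_000, 50_000)
print(len(tiers))                               # 9
print(sum(tx * ty for _, _, tx, ty in tiers))   # 51285
```

Under these assumptions the full-resolution tier alone accounts for 196 x 196 = 38,416 of the roughly 51k tiles.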

Launch the project Read the post

EarthXray

Earth Xray is a mobile VR website that renders a 3d “x-ray” of world geography as it exists in real-world, physical space below your feet. It’s sort of like one of those astronomy apps that traces out an atlas of constellations when you hold the phone up to the sky. But, flipped upside-down – instead of stars in the sky, Earth Xray draws a regular atlas of political geography as it exists “underground,” hidden by the surface of the earth as it curves away below the horizon.
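The geometry behind pointing "through" the Earth is compact. Below is a minimal sketch, assuming a spherical Earth. The horizontal direction to a distant place is the initial great-circle bearing. By the tangent-chord angle theorem, the straight line through the planet dips below the local horizon by half the central angle between viewer and target. The function name is hypothetical, and the post does not say this is how the site computes its projection.

```python
import math

def underground_direction(lat1, lon1, lat2, lon2):
    """Azimuth (degrees clockwise from north) and depression angle (degrees
    below the horizon) of the straight line through a spherical Earth from
    the viewer at (lat1, lon1) to a surface point at (lat2, lon2)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    # Central angle between the two points (haversine formula).
    a = math.sin((p2 - p1) / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    central = 2 * math.asin(math.sqrt(min(1.0, a)))
    # Initial great-circle bearing gives the horizontal direction to face.
    y = math.sin(dlon) * math.cos(p2)
    x = math.cos(p1) * math.sin(p2) - math.sin(p1) * math.cos(p2) * math.cos(dlon)
    azimuth = math.degrees(math.atan2(y, x)) % 360
    # Tangent-chord angle: the chord dips below the horizon by half the arc.
    depression = math.degrees(central) / 2
    return azimuth, depression

print(underground_direction(0, 0, 0, 90))  # roughly (90.0, 45.0): face east, look 45 degrees down
```

One consequence is that a point on the opposite side of the globe sits straight down at 90 degrees. Anything nearby sits just below the horizon.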

Launch the project Read the post

Visualizing the Humanist list

A visualization of the 27-year-old, 12-million-word “Humanist” listserv, an email discussion group started by Willard McCarty at the University of Toronto in 1987. Using a modified version of Textplot designed to capture high-level changes in word usage in large corpora over time, I was able to generate a horizontal network layout that teases out a kind of visual intellectual history of the list, a high-level semantic shift between 1987 (mainframe, microcomputer, vax, modem, bitnet, diskette, telnet, hypercard) and 2014 (twitter, facebook, gmail, ipad, wordpress, blogspot, digitalhumanities).

Launch the project Read the post

Textplot: (Mental) maps of texts

Textplot is a little program that converts a text document into a network of terms that models the patterns by which different words “distribute” inside the text – terms that tend to show up in the same places get linked together with strong ties. Then, when the networks are passed through a force-directed layout algorithm, you end up with a little mental geography of the document – a 2D layout that surfaces the underlying topic structure of the text. For example, in the layout for War and Peace, “war” sits on the left, “peace” on the right, and “history” on top.
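Here is a simplified sketch of the "terms that show up in the same places" idea. It is not Textplot's code. It describes each term by a coarse histogram of where it occurs, scores pairs of terms with 1 minus the Bray-Curtis distance, and links each term to its top-k most similar terms. The real program smooths term positions more carefully, so treat the histogram as a stand-in. The bin count, k, and the function names are hypothetical, and the force-directed layout step is left to a graph tool.

```python
from collections import defaultdict

def position_profiles(tokens, bins=10):
    """Map each term to a normalized histogram of where it occurs in the text."""
    n = len(tokens)
    hists = defaultdict(lambda: [0.0] * bins)
    for i, tok in enumerate(tokens):
        hists[tok][min(bins - 1, i * bins // n)] += 1
    return {t: [c / sum(h) for c in h] for t, h in hists.items()}

def similarity(p, q):
    """1 - Bray-Curtis distance; 1.0 means identical distributions."""
    return 1 - sum(abs(a - b) for a, b in zip(p, q)) / sum(a + b for a, b in zip(p, q))

def term_graph(tokens, terms, bins=10, k=2):
    """Link each term to its k most similar terms, weighted by similarity."""
    profiles = position_profiles(tokens, bins)
    edges = {}
    for t in terms:
        ranked = sorted(((similarity(profiles[t], profiles[u]), u) for u in terms if u != t),
                        reverse=True)
        for score, u in ranked[:k]:
            edges[tuple(sorted((t, u)))] = score
    return edges

# Toy text: the first half is about war, the second about peace.
tokens = ("war battle war battle cannon " * 2 + "peace home peace family home " * 2).split()
print(term_graph(tokens, ["war", "battle", "peace", "home"], bins=2, k=1))
# {('battle', 'war'): 1.0, ('home', 'peace'): 1.0}
```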

Launch the project Read the post

Exquisite Haiku

Exquisite Haiku is an experimental web application written with Node.js and Socket.io that makes it possible for groups of people to work together in real-time to write haiku over the course of 30-40 minutes. Loosely inspired by the Exquisite Corpse (a classic Surrealist parlor game from the 1920s), I was curious to see whether I could create a social algorithm that would produce “authorless” texts – poems that make sense, but can’t be traced back to any individual person.

Read the post

Neighborhoods of San Francisco

An interactive map of neighborhoods in San Francisco, traced out with Neatline’s vector annotation tools on top of the lovely Stamen “Toner” layer. I liked how nearby neighborhoods often share parts of their names (eg, Pacific/Lower Pacific/Presidio → Heights, Outer/Central/Inner → Richmond, Outer/Central/Inner → Sunset), and played around with ways of minimizing the total number of words on the map by daisy-chaining together the shared terms with little schematic arrows.
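The word savings from daisy-chaining can be counted mechanically. This toy sketch applies a simple "group by final word" rule to the examples above. The map itself was traced by hand in Neatline, so the rule and the function names are only illustrations.

```python
from collections import defaultdict

def chain_shared_terms(names):
    """Group neighborhood names by their final word, e.g. 'Heights'."""
    groups = defaultdict(list)
    for name in names:
        *prefix, last = name.split()
        groups[last].append(" ".join(prefix))
    return dict(groups)

def word_counts(names):
    """(words if every name is written out, words if each shared term is drawn once)."""
    full = sum(len(n.split()) for n in names)
    groups = chain_shared_terms(names)
    chained = len(groups) + sum(len(p.split()) for ps in groups.values() for p in ps)
    return full, chained

names = ["Pacific Heights", "Lower Pacific Heights", "Presidio Heights",
         "Outer Richmond", "Central Richmond", "Inner Richmond",
         "Outer Sunset", "Central Sunset", "Inner Sunset"]
print(chain_shared_terms(names)["Heights"])  # ['Pacific', 'Lower Pacific', 'Presidio']
print(word_counts(names))                    # (19, 13)
```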

Launch the project Read the post

Napoleon in Russia

A Neatline-powered edition of Charles Minard’s famous infographic showing the gradual deterioration of the French army over the course of Napoleon’s 1812 invasion of Russia. Minard does a brilliant job of showing just how much of the French force was lost, but I realized that I didn’t really have an intuitive sense of the scale of the whole thing – how long it took, how much distance was covered, etc. So, I sketched in a big “ruler” annotation along the top of the map that marks off the 540-mile distance between Moscow and the Russian border. Then, on the right side of the screen, I added an interactive chart that plots the total size of the army over the course of the 6-month span from June to December, with each data point corresponding to one of the rectangular segments in Minard’s map. Hover on the map to focus the crosshairs, and click the chart to focus the map.

Launch the project Read the post

Project Gemini over Baja California

An interactive edition of two photographs of the southern tip of Baja California taken from aboard the Gemini 5 and 11 missions in the 1960s, plastered on top of modern satellite imagery of the same location. I was really interested in the difference in perspective between the two sets of imagery – the oblique, angled, historically-situated perspective in the Gemini shots vs. the flat-on, depthless, and (seemingly) timeless view from the satellite. To play with this, I sketched in a huge “ruler” that measures out the distance between the camera and the city of La Paz in the distance, and used the “Perspective Grid” tool in Illustrator to render a second version of the shape as if it had been floating in front of the lens in 1965. Then, I used Neatline’s SVG-import workflow to convert the Illustrator documents into spatial coordinates and position them on the map.
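Illustrator's Perspective Grid was used here by hand. The math it approximates is a pinhole projection, sketched below for points on the ground plane seen by a camera above it and pitched downward. The camera height, pitch, focal length, and function name are hypothetical parameters for illustration, not values recovered from the Gemini photographs. As points recede, their image height converges on the horizon line at `focal * tan(pitch)`. That convergence is the oblique foreshortening that the flat satellite view lacks.

```python
import math

def project_ground_point(x, y, height, pitch_deg, focal=1.0):
    """Project a ground-plane point (x units ahead, y units to the right) into
    the image of a pinhole camera `height` units up, pitched down by pitch_deg.
    Returns (u, v) image coordinates with v positive upward, or None if the
    point is behind the camera."""
    a = math.radians(pitch_deg)
    dx, dy, dz = x, y, -height                      # ray from camera to the point
    depth = dx * math.cos(a) - dz * math.sin(a)     # component along the view axis
    if depth <= 0:
        return None
    u = focal * dy / depth
    v = focal * (dx * math.sin(a) + dz * math.cos(a)) / depth
    return u, v

# Level camera one unit up: points twice as far away sit half as far below the horizon.
print(project_ground_point(2, 0, 1, 0))  # (0.0, -0.5)
print(project_ground_point(4, 0, 1, 0))  # (0.0, -0.25)
```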

Launch the project Read the post

The (Digital) Gettysburg Address

An interactive edition of the “Nicolay copy” of the Gettysburg Address, which was the first draft of the speech and possibly the actual reading copy that Lincoln held during the address. I wanted to experiment with ways to make it easy to quickly move back and forth between the handwritten words in the manuscript and the transcription, and added little visual guidelines that connect the two instantiations of each word in the two viewports when the cursor hovers on either version of the text. Scroll down for trivia about the history of the manuscript.

Launch the project Read the post

Neatline Timelapse

Right before releasing the first version of Neatline 2.0, I spent an afternoon beta testing a release candidate by plotting out my 2007 thru-hike of the Appalachian Trail. I recorded the screen for about 45 minutes while I outlined the trail on top of georectified maps from the Appalachian Trail Conservancy, then compressed the recording down to 90 seconds (roughly a 30x speedup) and set the whole thing to Chopin.

Read the post

Stage S-II of the Saturn V

I love space. One time, during a Wikipedia binge, I found this beautiful shot of the second stage of the Saturn V rocket being hoisted onto a test stand at the Mississippi Test Facility (now NASA’s Stennis Space Center) in 1967, which drips with that eerie, Cold War sense of hugeness. Click on the spans in the post to focus on corresponding sections in the image, or click the shapes on the image to scroll the text.

Launch the project Read the post

A poem, typeset inside its own title

An experimental reading interface that I built for a little poem I wrote a couple years ago called “Polyphemus” – each of the 94 words in the poem is positioned inside of the letters in the title, and the poem can be “traversed” by dragging the slider at the bottom of the screen or using the arrow keys to increment through the words.

Launch the project Read the post

“By the pricking of my thumbs”

A single line from Shakespeare’s Macbeth, laid out inside of a Neatline exhibit with each successive word geometrically enveloped inside of the last – read the line by zooming “inward” or “downward” towards the end of the line. This was one of those fun cases where the interaction design ended up having an unintended resonance with the content – it feels like descending towards hell, or something, which fits the line well. I’ve spent a lot of time recently thinking about experimental reading interfaces, and this was an early experiment with mixing poetry with the deep zooming capabilities of digital mapping technologies.
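The nesting has a simple cost model, shown in the sketch below. It assumes each word is drawn a fixed factor smaller than the one enclosing it, and that each map zoom level doubles the scale, as in tiled web maps. Under those assumptions, every word costs the same number of zoom levels. The shrink factor of 8 and the function name are hypothetical, not the exhibit's actual settings. Because map libraries cap the usable zoom range, the ratio trades roomier gaps for each nested word against how many words fit.

```python
import math

def word_zoom_levels(words, shrink=8.0, start_zoom=0.0):
    """Zoom level at which each nested word appears at the first word's size.

    Assumes each word is drawn `shrink` times smaller than the one enclosing it
    and that each zoom level doubles the map scale."""
    step = math.log2(shrink)
    return [(word, start_zoom + i * step) for i, word in enumerate(words)]

levels = word_zoom_levels("By the pricking of my thumbs".split())
print(levels[-1])  # ('thumbs', 15.0)
```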

Launch the project Read the post

“A Coat”

Another Neatline-powered typesetting project: a lovely little poem called “A Coat,” in which Yeats renounces what he considers to be the stylistic affectation of his work in the 1890s and vows to write with a more “naked” simplicity. This time, I wanted to invert the descending motion of the line from Macbeth – the first line of the poem is the “smallest” or “deepest,” and each successive line is positioned below the last, at a much larger scale, so that the previous line notches into a gap formed by the letter glyphs. Read the poem by scrolling down to zoom out.

Launch the project Read the post

“The Song of Wandering Aengus”

A typesetting of the last three lines of “The Song of Wandering Aengus.” This time, each successive line is embedded inside the dot on top of an “i” on the previous line. Use the mouse wheel or zooming buttons to move forward (downward, really) towards the final line.

Launch the project Read the post

Mapping the Catalogue of Ships

“Mapping the Catalogue of Ships” is a collaboration between Courtney Evans, Ben Jasnow, and Jenny Clay in the UVa classics department and the Scholars’ Lab to apply modern GIS techniques to the index of place names in Book 2 of the Iliad, where Homer lists each of the contingents in the Greek army that sailed to Troy. Working in collaboration with Jeremy Boggs and Wayne Graham, I developed the extensions for Neatline that make it possible to connect individual words in the Greek text with locations on the map (an early version of the code that turned into Neatline Text) and designed the Neatline exhibit that displays the results. Courtney and Ben’s paper won the prestigious Fortier Prize at DH 2013 in Lincoln, Nebraska, given to the best paper by a young scholar.

Launch the project Read the post

UVa Campus Map

A simple little exhibit, built as a Neatline feature demo, that plots out buildings and landmarks on the University of Virginia campus. Click on the vector annotations to pull up little snippets of background information about each location.

Launch the project Read the post

Pic d’Anie via Lescun

An interactive gallery showing a series of photographs that I took in 2008 while climbing Pic d’Anie, a small but beautiful mountain on the border between France and Spain. Instead of just plotting out the locations of the pictures, I wanted to find a schematic way of showing the direction and range of the view, so, for each dot, I drew a line in the direction that the camera was pointed, scaled to represent the distance of the terrain in the background.
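If the view lines were computed rather than drawn by hand, each line's endpoint would be the standard great-circle destination point. That is the point reached by travelling a given distance from the photo location along the camera's bearing, assuming a spherical Earth with a mean radius of 6,371 km. The function name is hypothetical, and the post does not say whether the lines were computed this way.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres

def view_line_end(lat, lon, bearing_deg, distance_m):
    """Endpoint of a line drawn from a photo location along the camera bearing
    for distance_m metres on a spherical Earth (great-circle destination)."""
    p1, l1 = math.radians(lat), math.radians(lon)
    b = math.radians(bearing_deg)
    d = distance_m / EARTH_RADIUS_M  # angular distance in radians
    p2 = math.asin(math.sin(p1) * math.cos(d) + math.cos(p1) * math.sin(d) * math.cos(b))
    l2 = l1 + math.atan2(math.sin(b) * math.sin(d) * math.cos(p1),
                         math.cos(d) - math.sin(p1) * math.sin(p2))
    # Normalize longitude into [-180, 180).
    return math.degrees(p2), (math.degrees(l2) + 540) % 360 - 180

# Travelling a quarter of the way around the equator, due east, from (0, 0).
print(view_line_end(0, 0, 90, math.pi / 2 * EARTH_RADIUS_M))  # approximately (0.0, 90.0)
```

For a mountain photo, the distance would be something like the range to the background ridge in metres. At that scale, a flat-plane approximation would give nearly the same endpoint.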

Launch the project Read the post