Recently I created two different interactive apps visualizing and exploring the Titles of the U.S. Code. You can browse the text of Title 35 (Patent) and Title 17 (Copyright) in a visually interesting manner. Click on the photos below to use them.
I have created a new experimental app for visualizing and exploring U.S. Law using a force-directed graph. You can click on the picture above to launch it. This force-directed visualization is intended more to be visually interesting than to serve as a full-fledged U.S. law navigation tool.
Explore the Copyright Code or the Patent Code
This app allows you to explore two titles of the US Code.
Title 17 – The Copyright Code
Title 35 – The Patent Code
Hub and Spoke Representation of US Code Hierarchy
The chart uses a “hub and spoke” layout to represent the hierarchy of a given Title of the U.S. Code, such as Title 35. The center circle represents a “parent” portion of the code, one that has sub-portions under it (e.g. Chapter 10). The circles around the edge represent the “children” that belong to that parent (Section 100, Section 101, Section 103, ...).
You can click on an outer circle to open up the “children” parts that reside under it. The selected circle then becomes the new “parent”, and its “children” are displayed around it. Circles that have “children” are drawn with a thick grey border.
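The click-to-drill-down behavior can be modeled as a walk over a tree. The following is a minimal Python sketch, not the app's actual code. The `Node` class, the `spokes` and `click` helpers, and the abbreviated Title 35 data are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """One portion of a Title (Part, Chapter, Section, ...)."""
    label: str
    children: list[Node] = field(default_factory=list)

    def has_children(self) -> bool:
        # Circles with children are the ones drawn with a thick grey border.
        return bool(self.children)


def spokes(hub: Node) -> list[str]:
    """Labels of the outer circles arranged around the current hub."""
    return [child.label for child in hub.children]


def click(hub: Node, label: str) -> Node:
    """Return the new hub after clicking the outer circle named `label`."""
    for child in hub.children:
        if child.label == label and child.has_children():
            return child  # the clicked circle becomes the new "parent"
    return hub  # leaf circles (or unknown labels) leave the view unchanged


# Abbreviated, hand-built slice of Title 35 (illustrative data only).
chapter_10 = Node("Chapter 10", [Node("Section 100"), Node("Section 101"), Node("Section 102")])
title_35 = Node("Title 35", [Node("Part I"), Node("Part II", [chapter_10])])

hub = click(title_35, "Part II")
hub = click(hub, "Chapter 10")
print(hub.label, spokes(hub))
```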
Force Directed Graph
The app uses a “force directed graph” engine to display the Titles of the U.S. Code. A force-directed layout positions items by simulating physical forces between them, such as attraction, repulsion, and gravity, much as a physics simulation models interacting particles. Because the layout simulates these forces, the various parts of the display tend to drift and move around somewhat randomly.
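To show where that movement comes from, here is a toy force-directed layout in plain Python. It is a simple spring-embedder written for illustration, not d3's implementation, and every constant (`repulsion`, `spring`, `gravity`, `damping`, `max_step`) is an assumption. Each node repels every other node, links pull connected nodes toward a preferred distance, and a weak gravity pulls everything toward the center. Because the nodes start at random positions, the final picture differs from run to run unless the random seed is fixed.

```python
import math
import random


def layout(nodes, links, steps=300, seed=None, repulsion=0.5, spring=0.05,
           rest_length=1.0, gravity=0.02, damping=0.85, max_step=0.1):
    """Toy force-directed layout; every constant here is an illustrative guess."""
    rng = random.Random(seed)  # seed=None: a different random start on every run
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    vel = {n: [0.0, 0.0] for n in nodes}

    def push(force, a, b, f, dx, dy, d):
        # Add f along (dx, dy) to node a and the opposite force to node b.
        force[a][0] += f * dx / d
        force[a][1] += f * dy / d
        force[b][0] -= f * dx / d
        force[b][1] -= f * dy / d

    for _ in range(steps):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):  # every pair repels, like charged particles
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) + 1e-9
                push(force, a, b, repulsion / (d * d), dx, dy, d)
        for a, b in links:  # links behave like springs with a preferred length
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            d = math.hypot(dx, dy) + 1e-9
            push(force, a, b, spring * (d - rest_length), dx, dy, d)
        for n in nodes:
            for k in (0, 1):  # gravity pulls everything toward the origin
                force[n][k] -= gravity * pos[n][k]
                vel[n][k] = (vel[n][k] + force[n][k]) * damping
            speed = math.hypot(vel[n][0], vel[n][1])
            if speed > max_step:  # cap movement per tick to keep the simulation stable
                vel[n] = [v * max_step / speed for v in vel[n]]
            pos[n][0] += vel[n][0]
            pos[n][1] += vel[n][1]
    return {n: (p[0], p[1]) for n, p in pos.items()}


nodes = ["Chapter 10", "Section 100", "Section 101", "Section 102"]
links = [("Chapter 10", s) for s in nodes[1:]]
print(layout(nodes, links, seed=1))  # same seed, same picture
print(layout(nodes, links))          # unseeded: positions differ from run to run
```

Real engines such as d3's use faster approximations for the all-pairs repulsion, but the basic loop of computing forces, updating velocities, and moving nodes is the same idea.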
Probability Tree Diagram
This post discusses an Interactive Conditional Probability Tree Diagram that I created, and explains how and why to build one.
Conditional Probability and Probability Trees
I include some basic probability theory in a Problem Solving Course that I teach to law students. Probabilistic reasoning can be a useful skill for law students to learn, because attorneys are often called upon to make decisions under uncertainty.
In teaching my students about Conditional Probability, it is often helpful to create a Conditional Probability Tree diagram like the one pictured above. Probability Tree diagrams can help the students visualize the branching structure of conditional probability.
The diagram automatically computes the relevant conditional probabilities given the input data. It also allows you to change the input probabilities and recompute.
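The arithmetic behind such a tree is short. The following Python sketch is not the code behind the interactive diagram. `A` and `B` are placeholder events, and the inputs 0.3, 0.8, and 0.1 are made-up numbers. The sketch multiplies probabilities along each branch to get the leaf (joint) probabilities, adds the leaves where B occurs to get P(B), and applies Bayes' rule to get P(A | B).

```python
def probability_tree(p_a, p_b_given_a, p_b_given_not_a):
    """Leaf probabilities of a two-stage tree (A or not A, then B or not B)."""
    for p in (p_a, p_b_given_a, p_b_given_not_a):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    p_not_a = 1.0 - p_a
    leaves = {
        # Multiply along each branch: P(A and B) = P(A) * P(B | A), and so on.
        ("A", "B"): p_a * p_b_given_a,
        ("A", "not B"): p_a * (1.0 - p_b_given_a),
        ("not A", "B"): p_not_a * p_b_given_not_a,
        ("not A", "not B"): p_not_a * (1.0 - p_b_given_not_a),
    }
    p_b = leaves[("A", "B")] + leaves[("not A", "B")]  # law of total probability
    p_a_given_b = leaves[("A", "B")] / p_b if p_b > 0 else None  # Bayes' rule
    return leaves, p_b, p_a_given_b


# Changing any input and calling the function again "recomputes" the tree.
leaves, p_b, p_a_given_b = probability_tree(0.3, 0.8, 0.1)
for branch, p in leaves.items():
    print(" then ".join(branch), round(p, 2))
print("P(B) =", round(p_b, 2), " P(A | B) =", round(p_a_given_b, 3))
```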
The explicit structuring of U.S. law allows for more computational analysis and visualization of the law, such as this experimental demonstration app for navigating and visualizing the Titles of the U.S. Code that I recently created.
This post discusses what it means for U.S. law to be structured and why this enables more data analysis and visualization.
Structured Law and Computer Analysis
Around 2013, the U.S. government released the United States Code in XML (Extensible Markup Language) format. Releasing the laws as XML means that federal law has now been given an explicit structure that computers can read.
To see why explicitly structuring the law in “machine-readable” form allows for more advanced computer analysis, let’s first examine the concept of explicit computer-readable structure and what this has to do with law.
The Structure of the United States Code
Most of the laws that Congress passes are collected in the U.S. Code, a large compilation of federal statutory law organized into numbered Titles, for example:
Title 15 – Commerce and Trade
...
Title 26 – Internal Revenue Code
...
Title 35 – Patent Law
Loosely speaking, each “Title” corresponds to a different topical area of lawmaking. For instance, Title 35 contains most of the patent laws, and Title 20 contains many of the education laws. (Some Titles are a hodgepodge of unrelated topics housed under one heading, e.g. Title 15 – Commerce and Trade, and the laws regulating some topics are spread across multiple Titles.) Still, the fact that laws are loosely grouped by topic within Titles is one form of overall structure.
Title Hierarchy: Parts -> Chapters -> Sections
Each Title, in turn, has its own internal structure and hierarchy and is divided into smaller units at several levels. A typical Title is divided into
Chapters, Sub-Chapters, Parts
Sections, Sub-Sections, Paragraphs
and so on. Title 35 (the Patent Code) places its provisions at different points in this hierarchy; for instance, the laws on the patentability of inventions are in Chapter 10.
Title 35 – Patents
  Part I – United States Patent and Trademark Office
    Chapter 1 – Establishment, Officers and Employees
    Chapter 2 – Proceedings in the Patent and Trademark Office
      § 21. Filing Date and Day for Taking Action
      ...
  Part II – Patentability of Inventions and Grant of Patents
    ...
    Chapter 10 – Patentability of Inventions
      § 100. Definitions
      § 101. Inventions Patentable
Where is the law that tells us what types of inventions are patentable? It is Section 101, “Inventions patentable”. Within the overall hierarchy, it is located at Title 35 – Part II – Chapter 10 – Section 101.
Title 35 -> Part II -> Chapter 10 -> Section 101
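A person reads this path straight off the outline. A program needs the hierarchy as data first. The following is a minimal Python sketch that assumes a few entries of the outline have been typed in by hand as nested dictionaries. Both the `TITLE_35` data and the `find_path` helper are hypothetical.

```python
# Abbreviated, hand-typed slice of the Title 35 outline (illustrative only).
TITLE_35 = {
    "Title 35": {
        "Part I": {
            "Chapter 1": {},
            "Chapter 2": {"§ 21": {}},
        },
        "Part II": {
            "Chapter 10": {"§ 100": {}, "§ 101": {}},
        },
    }
}


def find_path(tree, target, path=()):
    """Depth-first search returning the chain of labels from the root to target."""
    for label, children in tree.items():
        here = path + (label,)
        if label == target:
            return list(here)
        found = find_path(children, target, here)
        if found:
            return found
    return None  # target does not appear anywhere below this point


print(" -> ".join(find_path(TITLE_35, "§ 101")))
```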
The text of Section 101 reads:
SECTION §101 – Inventions patentable
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Plain Text Law: Unstructured Text
The section just presented is an example of what might be called an “unstructured” (really “semi-structured”, but henceforth “plain text”) version of the law. A “plain text” version of the law is the law as we normally see it written: in ordinary sentences designed for people (as opposed to computers) to read.
I use the phrase “designed for people to read” to emphasize a point: computers are likely to find laws written in plain text, like the one above, difficult to read. “Plain text” can be contrasted with “machine-readable” text, like the example below.
<section number="101">
  <sectionText>
    Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter
  </sectionText>
</section>
Computers prefer text that is rigidly structured and precisely labeled in this way. Such text is “structured” (and machine-readable) because a computer, following rigid rules, can methodically go through it and unambiguously identify each part. In the example above, the legal language sits within <sectionText>, and the computer knows exactly where that language begins and where it ends.
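To make "following rigid rules" concrete, Python's standard-library ElementTree parser can pull the section number and its text out of the simplified snippet above without any guesswork. The `number` attribute and the `<sectionText>` tag come from that illustrative snippet, not from the official schema.

```python
import xml.etree.ElementTree as ET

snippet = """
<section number="101">
  <sectionText>
    Whoever invents or discovers any new and useful process, machine,
    manufacture, or composition of matter
  </sectionText>
</section>
""".strip()

section = ET.fromstring(snippet)
number = section.get("number")  # value of the number="..." attribute
# findtext returns everything between <sectionText> and </sectionText>;
# split/join collapses the line breaks and indentation into single spaces.
text = " ".join(section.findtext("sectionText").split())
print(number, "->", text)
```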
Plain Text Law: Implicit Structure
A typical law written as plain text does have a structure, but that structure is implicit. The structure includes which legal text goes with which section (e.g. do the words “Whoever invents a new..” belong to Section 101 or Section 102?) and the hierarchy (which parts sit under which other parts, e.g. does Section 101 belong under Chapter 10 or Chapter 11?). Let’s look at why the structure of a plain-text law is implicit, and therefore difficult for computers to read.
TITLE 35 - PATENTS
  Part II
    Chapter 10
      SECTION §101 - Inventions patentable
      Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
If you’re an attorney, you might be thinking, “There is an obvious structure in the law above – it is divided up into chapters, sections, etc. I can see that plainly.” An attorney looking at a printout of Title 35 could see that it is divided into 5 “Parts”, that each “Part” contains multiple “Chapters”, that each “Chapter” in turn contains “Sections”, and so on. However, nothing in a printout of Title 35 explicitly tells us about the hierarchy and organization of sections. There is nothing in the document that says, “A ‘Part’ is above a ‘Chapter’, and a ‘Chapter’ is above a ‘Section’.”
Rather, that organization is implicit in the way the text is displayed and labeled according to legal conventions. Attorneys learn to parse this hierarchy by relying on common conventions about how law is labeled and structured, and on general legal knowledge. From training and experience, they understand that federal law has a structure in which “Chapters” come below “Parts”, and “Sections” are contained within “Chapters”.
Visual Cues and Implicit Structure of Law
In looking at the law, we also rely on visual cues to show which portions are sub-parts of other portions. When the law is printed, each new level is often indented by several more spaces to make the hierarchy apparent, and emphasis such as bold type is sometimes used.
Additionally, we rely on visual cues to distinguish the different elements (e.g. headings vs. the text of the law) and to see where one element ends and another begins. For instance, looking at the plain-text printing of Section 101 above, we understand that the heading of the section is “Inventions patentable”, and that the heading ends with the word “patentable”, where the bold text ends. Thanks to bolding and spacing, we understand that the text of the section begins with “Whoever invents…”. The change in formatting and spacing indicates visually where the heading ends and the content begins.
Unstructured Law: Difficult For Computers
The implicit structure in “plain text” sentences like the law above is obvious for people to see. To a computer, however, this implicit structure is typically difficult to understand unambiguously. A computer cannot reliably interpret the same implicit cues (spacing, headings) that humans easily rely on to separate the law into its components and subcomponents.
In general, computers are not as good as people at understanding arbitrary visual cues, like bold type and spacing, that mark off the various parts. A computer might, for instance, not recognize where the heading “Inventions patentable” ends and the content of the law, “Whoever invents…”, begins. It might accidentally read the whole paragraph as one entity: “Inventions patentable Whoever invents…”
While in principle you can program a computer to make educated guesses about the structure based on formatting and spacing, the computer is liable to make errors in “parsing” (reading) the law and its structure if the formatting changes even slightly.
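The following Python sketch illustrates such an educated guess. The rule is a hypothetical heuristic chosen for illustration: the heading is the rest of the first line after "§ <number>.", and the body is everything after the first line break. It works on a neatly formatted printout. When the line break is lost, it silently produces the merged "Inventions patentable Whoever invents…" reading described above.

```python
import re


def guess_parts(printed):
    """Hypothetical heuristic: the heading is the rest of the first line after
    '§ <number>.', and the body is everything after the first line break."""
    first_line, _, rest = printed.partition("\n")
    match = re.match(r"§\s*(\d+)\.\s*(.*)", first_line)
    if match is None:
        return None  # the rule does not recognize this text at all
    number, heading = match.groups()
    return {"number": number, "heading": heading.strip(), "body": " ".join(rest.split())}


neat = "§ 101. Inventions patentable\nWhoever invents or discovers any new and useful process"
flattened = "§ 101. Inventions patentable Whoever invents or discovers any new and useful process"

print(guess_parts(neat))       # heading and body come apart as intended
print(guess_parts(flattened))  # line break lost: heading swallows the body
```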
In sum, when the law is printed as plain text, as it has traditionally been for hundreds of years, even very basic computer tasks, such as separating a Title into its different parts and sub-parts (headings, content, chapters, etc.), are comparatively difficult to do with a high level of accuracy.
Even a simple task such as reading the plain-text law and counting the number of Sections in Title 35, which is easy for a person, would be prone to errors for a computer.
US Code – Released as XML
In 2013, the U.S. House of Representatives released the Titles of the U.S. Code as structured data in XML format. (Previously, the Cornell Legal Information Institute had also released an unofficial XML version of federal law.)
Because the law is now marked up in XML, Section 101 of the Patent Code now looks like this:
<section style="-uslm-lc:I80" id="id223e3b13-a7cf-11e4-a0e4-817d0c170cd7" identifier="/us/usc/t35/s101">
  <num value="101">§ 101.</num>
  <heading> Inventions patentable</heading>
  <content>
    <p style="-uslm-lc:I11" class="indent0">Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.</p>
  </content>
</section>
Computer Friendly Law
This version of the law is much less human-friendly to read, but much more computer friendly to read. Computers excel when there are precise, unambiguous rules to follow.
The XML version of the U.S. Code makes the structure and hierarchy of the law explicit in a way that a computer can read. For instance, rather than guessing where the text of Section 101 begins and ends based on bolding and spacing, we are told explicitly by the <section> tags. The text of Section 101 is everything between the labels
<section> and </section>
The U.S. government took the time to label the exact start and end of every single section, part, etc. of every law in the U.S. Code.
This means that a computer no longer has to approximate based upon visual cues or spacing to determine the start or end of the section. The end result is that a computer can unambiguously and accurately extract the text of any section, subsection, chapter, etc in any US Title.
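As an illustration, the following Python sketch uses the standard-library ElementTree parser to read the Section 101 markup shown earlier (with the style and id attributes omitted for brevity) and pull out its number, heading, and text by following the tags. One assumption to flag: the official files also declare an XML namespace, so lookups on them would need namespace-qualified tag names, which this namespace-free sketch skips.

```python
import xml.etree.ElementTree as ET

# Section 101 markup from above, trimmed of style/id attributes (illustrative).
USLM_SECTION = """<section identifier="/us/usc/t35/s101">
  <num value="101">§ 101.</num>
  <heading> Inventions patentable</heading>
  <content>
    <p class="indent0">Whoever invents or discovers any new and useful process,
    machine, manufacture, or composition of matter, or any new and useful
    improvement thereof, may obtain a patent therefor, subject to the conditions
    and requirements of this title.</p>
  </content>
</section>"""

section = ET.fromstring(USLM_SECTION)
number = section.find("num").get("value")
heading = section.findtext("heading").strip()
# itertext() yields every piece of text between <content> and </content>.
body = " ".join("".join(section.find("content").itertext()).split())
print(section.get("identifier"), number, heading)
print(body)
```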
Extracting the Hierarchy
Additionally, the hierarchy of parts within each US Title has been made explicit. For instance, Title 35 in .xml looks something like this:
<title><num value="35">Title 35—</num>
  <part><num value="II">PART II—</num>
    <chapter><num value="10">CHAPTER 10—</num>
      <section><num value="101">§ 101.</num></section>
    </chapter>
  </part>
</title>
This structure means that the computer does not have to guess about the hierarchy (e.g. which Part contains which Chapter) based on visual cues and indentation. Rather, the <title> element for Title 35 explicitly contains Part II within its tags.
Placing Part II inside the title tags <title></title> indicates that Part II is below the Title in the law hierarchy. Similarly, Chapter 10
<chapter><num value="10">CHAPTER 10—</num>
has been explicitly placed within Part II’s opening and closing tags <part> </part>:
<part> <num value="II">PART II—</num>
This indicates that Chapter 10 is contained within Part II, and so on. By placing one portion within the tags of another, the markup explicitly defines the hierarchy in a way that the computer can read.
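Because the nesting of the tags is the hierarchy, a short recursive walk recovers every path from the Title down to each section. It also turns the section count from earlier into an exact count of `<section>` elements. The following is a Python sketch over a simplified, namespace-free title, part, chapter, and section nesting like the one above. The `LEVELS` set and `walk` helper are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Simplified, namespace-free nesting (illustrative, not the official file).
TITLE_XML = """<title><num value="35">Title 35</num>
  <part><num value="II">PART II</num>
    <chapter><num value="10">CHAPTER 10</num>
      <section><num value="100">§ 100.</num></section>
      <section><num value="101">§ 101.</num></section>
    </chapter>
  </part>
</title>"""

LEVELS = {"title", "part", "chapter", "section"}  # structural tags to follow


def walk(element, path=()):
    """Yield (path, element) for each structural element, parents before children."""
    if element.tag in LEVELS:
        path = path + (f"{element.tag} {element.find('num').get('value')}",)
        yield path, element
    for child in element:  # children inside the tags sit lower in the hierarchy
        yield from walk(child, path)


root = ET.fromstring(TITLE_XML)
for path, _ in walk(root):
    print(" > ".join(path))

# Counting sections is exact: just count the <section> elements.
section_count = sum(1 for _ in root.iter("section"))
print(section_count, "sections")
```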
The upshot is that computers can now precisely read or “parse” the structure (but not the meaning) of the U.S. Code. Because of this, we can begin to create interesting visualizations and apps, like the U.S. Code explorer, that were not easy to build in the era of “plain-text” law.
In a follow-up post, I will explain more about parsing the U.S. Code in XML and creating visualizations and apps based on it.
Visualizing the US Code: Law Explorer
I have created a new demonstration application for visualizing and browsing the US Code – the US Code Explorer (beta) (pictured above). Click on the link or photo to see it in action.
The app is meant as an experiment in visualizing and interacting with the U.S. Code now that the federal government has marked it up in XML.
I selected Title 35 (Patent Law) as my example.
There is also a second version with three Titles of the U.S. Code: Title 35 (Patents), Title 17 (Copyright), and Title 20 (Education). Because of its size, the second version takes a bit longer to load.
The look and presentation of the visualization parallel the visual style that I use when I present the law to my students in Patent Law and Introduction to Intellectual Property. In class, the visualizations are static PowerPoint slides; this is a more interactive version.
Please note – this is merely a beta version of this visualization. Neither the computer code, nor the US code, have been thoroughly tested. Please do not rely on this app for the law as there may be errors or omissions.
I will explain what I did in more depth in a follow-up post. In short, I wrote a parser in Python to read through the U.S. Code XML files and extract the hierarchy of each Title. I then exported that structure in JSON format.
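The following is a rough sketch of that XML-to-JSON step in Python. It is not the actual parser used for the app. The `LEVELS` tuple, the `Chapter 10: Heading` label format, and the sample XML are assumptions. The output uses nested `{"name": ..., "children": [...]}` objects, the shape that d3's hierarchical layouts, including the collapsible tree, read by default.

```python
import json
import xml.etree.ElementTree as ET

LEVELS = ("title", "part", "chapter", "section")  # assumed levels of interest

SAMPLE = """<title><num value="35">Title 35</num><heading>Patents</heading>
  <part><num value="II">PART II</num>
    <heading>Patentability of Inventions and Grant of Patents</heading>
    <chapter><num value="10">CHAPTER 10</num><heading>Patentability of Inventions</heading>
      <section><num value="100">§ 100.</num><heading>Definitions</heading></section>
      <section><num value="101">§ 101.</num><heading> Inventions patentable</heading></section>
    </chapter>
  </part>
</title>"""


def to_tree(element):
    """Convert one structural element into a {"name", "children"} dictionary."""
    name = f"{element.tag.capitalize()} {element.find('num').get('value')}"
    heading = element.findtext("heading", default="").strip()
    if heading:
        name += f": {heading}"
    node = {"name": name}
    children = [to_tree(child) for child in element if child.tag in LEVELS]
    if children:  # leaves carry no "children" key, so d3 treats them as end points
        node["children"] = children
    return node


tree = to_tree(ET.fromstring(SAMPLE))
print(json.dumps(tree, indent=2, ensure_ascii=False))
```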
Finally, I used the amazing d3 data visualization framework to create the visualization. Here, I borrowed heavily from Mike Bostock’s d3 collapsible hierarchical tree, employing a modified version of it.
This is the first in a series of data visualization experiments on the U.S. Code that I will build using the d3 framework. The projects will be collected here.