{"id":414,"date":"2015-04-09T20:07:34","date_gmt":"2015-04-10T02:07:34","guid":{"rendered":"http:\/\/harrysurden.com\/wordpress\/?p=414"},"modified":"2015-04-09T20:07:34","modified_gmt":"2015-04-10T02:07:34","slug":"structuring-us-law-part-1","status":"publish","type":"post","link":"https:\/\/www.harrysurden.com\/wordpress\/archives\/414","title":{"rendered":"Structuring US Law: Part 1"},"content":{"rendered":"<div style=\"font-size: 11pt;\">\n<div style=\"font-size: 11pt;\">The <a href=\"http:\/\/uscode.house.gov\/\">U.S. Code<\/a> &#8211; (the primary collection of Federal Statutory Law) &#8211; has become structured. It always had an <em>implicit <\/em>structure. However, since 2013 it has had an <em>explicit<\/em>, <a href=\"http:\/\/en.wikipedia.org\/wiki\/Machine-readable_data\">machine-readable<\/a> structure.<\/div><p>&nbsp;<\/p>\n<div style=\"font-size: 11pt;\">\n<div id=\"attachment_416\" style=\"width: 568px\" class=\"wp-caption aligncenter\"><a href=\"http:\/\/harrysurden.com\/projects\/visual\/USCode_D3\/US_Code_Tree_d3_1g.html\" target=\"_blank\"><img aria-describedby=\"caption-attachment-416\" decoding=\"async\" loading=\"lazy\" class=\"wp-image-416\" src=\"http:\/\/harrysurden.com\/wordpress\/wp-content\/uploads\/2015\/03\/US-Code-Explorer-Screen-shot.png\" alt=\"US Code Explorer Screen shot\" width=\"558\" height=\"309\" srcset=\"https:\/\/www.harrysurden.com\/wordpress\/wp-content\/uploads\/2015\/03\/US-Code-Explorer-Screen-shot.png 1016w, https:\/\/www.harrysurden.com\/wordpress\/wp-content\/uploads\/2015\/03\/US-Code-Explorer-Screen-shot-300x166.png 300w, https:\/\/www.harrysurden.com\/wordpress\/wp-content\/uploads\/2015\/03\/US-Code-Explorer-Screen-shot-620x344.png 620w\" sizes=\"(max-width: 558px) 100vw, 558px\" \/><\/a><\/p>\n<p id=\"caption-attachment-416\" class=\"wp-caption-text\">US Code Explorer &#8211; Click to Open Explorer<\/p>\n<\/div>\n<p style=\"font-size: 11pt;\">The explicit structuring of the U.S. 
law allows for increased computational analysis and visualization of the law, like <a href=\"http:\/\/harrysurden.com\/projects\/visual\/USCode_D3\/US_Code_Tree_d3_1g.html\" target=\"_blank\">this experimental demonstration app for navigating and visualizing the Titles of the U.S. Code<\/a> that I recently created.<\/p>\n<p style=\"font-size: 11pt;\">This post will discuss what it means for US law to be structured and why this enables increased data analysis and visualization.<\/p>\n<h3 style=\"font-size: 11pt;\">Structured Law and Computer Analysis<\/h3>\n<p style=\"font-size: 11pt;\">Around 2013, the U.S. government released the United States Code <a href=\"http:\/\/uscode.house.gov\/download\/download.shtml\">in XML (Extensible Markup Language) format<\/a>. Releasing the laws in &#8220;.xml&#8221; means that the federal laws have now been given an explicit structure that can be read by computers.<\/p>\n<p style=\"font-size: 11pt;\">To see why explicitly structuring the law in &#8220;machine-readable&#8221; form allows for more advanced computer analysis, let&#8217;s first examine the concept of explicit computer-readable structure and what this has to do with law.<\/p>\n<h3 style=\"font-size: 11pt;\">The Structure of the United States Code<\/h3>\n<p style=\"font-size: 11pt;\">Most of the laws that the US Congress passes are collected in the US Code, which is a large compilation of federal statutory law.<\/p>\n<p style=\"font-size: 11pt;\">The <a href=\"http:\/\/uscode.house.gov\/\">US Code has a structure<\/a>. At the highest structural level, the Federal Laws are divided into over <a href=\"http:\/\/uscode.house.gov\/\">50 &#8220;Titles&#8221;<\/a>.<\/p>\n<pre style=\"font-size: 11pt;\">Title 15 - Commerce and Trade\r\n.. \r\nTitle 26 - Internal Revenue Code\r\n..\r\nTitle 35 - Patents<\/pre>\n<p style=\"font-size: 11pt;\">Loosely speaking, each &#8220;Title&#8221; corresponds to a different topical area for lawmaking. 
For instance, Title 35 contains most of the Patent Laws, and Title 20 contains many of the Education Laws. (Note that some Titles are a hodgepodge of unrelated topics housed under one document &#8211; e.g. Title 15 &#8211; Commerce and Trade, and the laws regulating some topics are found across multiple titles). However, the fact that laws are loosely placed by topic within Titles is one form of overall structure.<\/p>\n<h3 style=\"font-size: 11pt;\">Title Hierarchy: Parts -&gt; Chapters -&gt; Sections<\/h3>\n<p style=\"font-size: 11pt;\">Each Title, in turn, has its own structure and hierarchy. Every title is divided into smaller parts and sections at different levels. A typical structure of a title of the US Code will have it divided into<\/p>\n<p style=\"font-size: 11pt;\"><strong>Chapters, Sub-Chapters, Parts<\/strong><\/p>\n<p style=\"font-size: 11pt;\"><strong>Sections, Sub-Sections, Paragraphs<\/strong><\/p>\n<p style=\"font-size: 11pt;\">and so on. For instance, Title 35 &#8211; the Patent Code &#8211; has multiple patent laws located in different parts of the overall hierarchy. The laws related to the Patentability of Inventions, for example, are in Chapter 10.<\/p>\n<pre style=\"font-size: 11pt;\">Title 35 - Patents\r\n  Part 1 - United States Patent And Trademark Office\r\n     CHAPTER 1\u2014 Establishment, Officers And Employees\r\n     CHAPTER 2\u2014 Proceedings In The Patent And Trademark Office\r\n        \u00a7\u202f21. Filing Date And Day For Taking Action\r\n  ....\r\n  Part 2 - Patentability Of Inventions And Grant Of Patents\r\n  ... \r\n     CHAPTER 10\u2014 Patentability Of Inventions\r\n        \u00a7\u202f100. Definitions\r\n        \u00a7\u202f101. Inventions Patentable<\/pre>\n<p style=\"font-size: 11pt;\">Where is the law that tells us what types of inventions are patentable? That is located in <a href=\"https:\/\/www.law.cornell.edu\/uscode\/text\/35\/101\">Section 101 &#8211; &#8220;Inventions Patentable&#8221;. 
<\/a>Within the overall hierarchy of Title 35, it&#8217;s located in <a href=\"https:\/\/www.law.cornell.edu\/uscode\/text\/35\/101\">Title 35 &#8211; Part 2 &#8211; Chapter 10 &#8211; Section 101<\/a>.<\/p>\n<pre style=\"font-size: 11pt;\">Title 35\r\n  Part 2\r\n    Chapter 10\r\n      Section 101<\/pre>\n<p style=\"font-size: 11pt;\">And the text of Section 101 is:<\/p>\n<div class=\"quote1\" style=\"padding-left: 30px;\">TITLE 35 &#8211; PATENTS<br \/>\nSECTION \u00a7101 &#8211; Inventions patentable Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.<\/div>\n<h3 style=\"font-size: 11pt;\">Plain Text Law: Unstructured Text<\/h3>\n<p style=\"font-size: 11pt;\">The section just presented is an example of what might be called an <a href=\"http:\/\/en.wikipedia.org\/wiki\/Plain_text\"><strong>&#8220;unstructured&#8221;<\/strong><\/a> (really &#8220;semi-structured&#8221;, but henceforth &#8220;plain text&#8221;) version of the law. A &#8220;plain text&#8221; version of the law is the law as we normally see it written &#8211; in ordinary sentences designed for people to read (as opposed to computers).<\/p>\n<div class=\"quote1\" style=\"padding-left: 30px;\">SECTION \u00a7101 Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter&#8230;<\/div>\n<p style=\"font-size: 11pt;\">I used the phrase &#8220;designed for <em>people<\/em> to read&#8221; to emphasize a point: such a plain text sentence may not be easy for <em><strong>computers <\/strong>to read. <\/em>Computers are likely to find laws written in plain text &#8211; like the one above &#8211; difficult to read. 
&#8220;Plain text&#8221; can be contrasted against <em>&#8220;<\/em><a href=\"http:\/\/en.wikipedia.org\/wiki\/Machine-readable_data\"><em>machine-readable&#8221; text<\/em><\/a>, like the example below.<\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">\r\n&lt;section number=&quot;101&quot;&gt;\r\n &lt;sectionText&gt;\r\n   Whoever invents or discovers any new and useful process,\r\n  machine, manufacture, or composition of matter\r\n &lt;\/sectionText&gt;\r\n&lt;\/section&gt;\r\n<\/pre>\n<p style=\"font-size: 11pt;\">Computers prefer text to be rigidly structured and precisely labeled in this way. Such text is &#8220;structured&#8221; (and machine-readable) because a computer can, following rigid rules, methodically go through and unambiguously identify each part. In the example above, there is legal language within &lt;sectionText&gt;, and the computer knows exactly where the &lt;sectionText&gt; language begins and where it ends.<\/p>\n<h3 style=\"font-size: 11pt;\">Plain Text Law: Implicit Structure<\/h3>\n<p style=\"font-size: 11pt;\">A typical law written as plain text does have a structure, but that structure is <em>implicit<\/em>. The structure includes what legal text goes with what section (i.e. do the words &#8220;<em>Whoever invents a new..&#8221;<\/em> go with Section 101 or Section 102), and the hierarchy (i.e. What parts are under what other parts &#8211; does Section 101 belong under Chapter 10 or Chapter 11). 
Let&#8217;s see why the structure of a plain-text law is implicit and therefore difficult for computers to read.<\/p>\n<pre>TITLE 35 - PATENTS - Part II - Chapter 10\r\nSECTION \u00a7101 - Inventions patentable Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.<\/pre>\n<p style=\"font-size: 11pt;\">If you&#8217;re an attorney, you might be thinking, &#8220;There is an obvious structure in the law above &#8211; it is divided up into chapters, sections, etc. I can see that plainly.&#8221; (If an attorney were to look at a printout of Title 35, she could see that it is divided into 5 &#8220;Parts&#8221;, that each &#8220;Part&#8221; contains multiple &#8220;Chapters&#8221;, that each &#8220;Chapter&#8221; in turn contains &#8220;Sections&#8221;, etc.) However, nothing in a printout of Title 35 explicitly tells us about the hierarchy and organization of sections. There is nothing explicitly in the document that says, &#8220;A &#8216;Part&#8217; is above a &#8216;Chapter&#8217;, and a &#8216;Chapter&#8217; is above a &#8216;Section&#8217;.&#8221;<\/p>\n<p style=\"font-size: 11pt;\">Rather, that organization is implicit in the way the text is displayed and labeled given legal conventions. Attorneys learn to parse this hierarchy by relying upon common conventions about how law is labeled and structured, and general legal knowledge. 
Attorneys &#8211; from training and experience &#8211; understand that in federal law there is a structure, that &#8220;Chapters&#8221; come below &#8220;Parts&#8221; in the hierarchy, and that &#8220;Sections&#8221; are contained within &#8220;Chapters&#8221;.<\/p>\n<h3 style=\"font-size: 11pt;\">Visual Cues and Implicit Structure of Law<\/h3>\n<p style=\"font-size: 11pt;\">In looking at the law, we also rely upon visual cues to show what portions are sub-parts of other portions. Often, when the law is printed, it is indented by several spaces at each new level in order to make the hierarchy apparent, and sometimes emphasis such as bold is used.<\/p>\n<p style=\"font-size: 11pt;\"><a href=\"http:\/\/harrysurden.com\/wordpress\/wp-content\/uploads\/2015\/04\/Screen-Shot-2015-04-09-at-11.46.42-AM.png\"><img decoding=\"async\" loading=\"lazy\" class=\"aligncenter wp-image-499 size-full\" src=\"http:\/\/harrysurden.com\/wordpress\/wp-content\/uploads\/2015\/04\/Screen-Shot-2015-04-09-at-11.46.42-AM.png\" alt=\"Screen Shot 2015-04-09 at 11.46.42 AM\" width=\"441\" height=\"145\" srcset=\"https:\/\/www.harrysurden.com\/wordpress\/wp-content\/uploads\/2015\/04\/Screen-Shot-2015-04-09-at-11.46.42-AM.png 441w, https:\/\/www.harrysurden.com\/wordpress\/wp-content\/uploads\/2015\/04\/Screen-Shot-2015-04-09-at-11.46.42-AM-300x99.png 300w\" sizes=\"(max-width: 441px) 100vw, 441px\" \/><\/a><\/p>\n<p style=\"font-size: 11pt;\">Additionally, we rely on visual cues to understand the different elements (e.g. headings vs. text of the law), and where one element begins and another ends. For instance, in looking at the above plain text printing of Section 101, we understand that the heading of the section is &#8220;<strong>Inventions Patentable&#8221;<\/strong>, and that the heading ends with the word &#8220;<strong>Patentable&#8221; <\/strong> where the bolded text ends. 
Thanks to bolding and spacing, we understand that the text of the section begins with &#8220;Whoever invents&#8230;&#8221; The change in formatting and spacing indicates visually where the heading ends and the content begins.<\/p>\n<h3 style=\"font-size: 11pt;\">Unstructured Law: Difficult For Computers<\/h3>\n<p style=\"font-size: 11pt;\">The implicit structure in &#8220;plain text&#8221; sentences &#8211; like the law above &#8211; is obvious for people to see. However, to a computer, this implicit structure is typically difficult to unambiguously understand. A computer would not be able to understand (without accuracy issues) the same implicit cues (spacing, headings) that humans easily rely upon to separate out the law into its components and subcomponents.<\/p>\n<p style=\"font-size: 11pt;\">In general, computers are not as good as people at understanding arbitrary visual cues &#8211; like bolding and spacing &#8211; that indicate the various parts. A computer might, for instance, not understand where the heading &#8220;Inventions Patentable&#8221; ends, and the content of the law &#8220;Whoever invents&#8221; begins. A computer might accidentally read the whole paragraph as one entity, &#8220;Inventions Patentable Whoever invents..&#8221;<\/p>\n<p style=\"font-size: 11pt;\">While in principle you can program a computer to make educated guesses about the structure based upon the formatting and spacing, the computer is liable to make errors in &#8220;parsing&#8221; or reading the law and its structure if there are even minor changes in that formatting.<\/p>\n<p style=\"font-size: 11pt;\">In sum, when the law is printed as plain text &#8211; as it has traditionally been printed for hundreds of years &#8211; very basic computer tasks &#8211; such as separating out a Title into its different parts and sub-parts (e.g. 
Headings, content, chapters, etc.), are comparatively difficult to do with a high level of accuracy.<\/p>\n<p style=\"font-size: 11pt;\">Even a simple task that merely involves reading the plain text law and counting the number of Sections in Title 35 &#8211; an easy task for a person &#8211; would risk errors when done by a computer.<\/p>\n<h3 style=\"font-size: 11pt;\">US Code &#8211; Released as XML<\/h3>\n<p style=\"font-size: 11pt;\">In 2013, the U.S. House of Representatives released the titles of the U.S. Code as structured data in xml format. (Previously the <a href=\"https:\/\/www.law.cornell.edu\/\">Cornell Legal Information Institute<\/a> had released an unofficial xml version of the federal law as well.)<\/p>\n<p style=\"font-size: 11pt;\">The fact that the law is now marked up in .xml means that Section 101 of the Patent Code now looks like this:<\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">\r\n\r\n&lt;section style=&quot;-uslm-lc:I80&quot;\r\nid=&quot;id223e3b13-a7cf-11e4-a0e4-817d0c170cd7&quot;\r\nidentifier=&quot;\/us\/usc\/t35\/s101&quot;&gt;\r\n\r\n&lt;num value=&quot;101&quot;&gt;\u00a7\u202f101.&lt;\/num&gt;\r\n\r\n&lt;heading&gt; Inventions patentable&lt;\/heading&gt;\r\n&lt;content&gt;\r\n&lt;p style=&quot;-uslm-lc:I11&quot; class=&quot;indent0&quot;&gt;\r\n\r\nWhoever invents or discovers any new and useful process,\r\nmachine, manufacture, or composition of matter,\r\nor any new and useful improvement thereof,\r\nmay obtain a patent therefor, subject to the\r\nconditions and requirements of this title.\r\n\r\n&lt;\/p&gt;\r\n&lt;\/content&gt;\r\n&lt;\/section&gt;\r\n<\/pre>\n<h3>Computer Friendly Law<\/h3><p>This version of the law is much less human-friendly to read, but much more computer-friendly to read. 
\u00a0Computers excel when there are\u00a0precise, unambiguous rules to follow.<\/p><p>The .xml version of the U.S. Code makes the structure and hierarchy of the law explicit in a way that a computer can be told to read. For instance, rather than guessing about where the text of Section 101 begins and ends based upon bolding and spacing, we have been told explicitly thanks to the <strong style=\"font-size: 11pt;\">&lt;section&gt;<\/strong> tags. The text of Section 101 is everything between the labels<\/p><p><strong>&lt;section&gt; <\/strong>and<strong> &lt;\/section&gt;<\/strong><\/p><p><span style=\"font-size: 14.6666669845581px; line-height: 22px;\">The<\/span> US Government took the time to label the <em>exact<\/em> start and end of every single section, part, etc. of every law in the U.S. Code.<\/p><p>This means that a\u00a0computer no longer has to approximate based upon visual cues or spacing to determine the start or end of a section. The end result is that a computer can unambiguously and accurately extract the text of any section, subsection, chapter, etc. in any US Title.<\/p>\n<h3>Extracting the Hierarchy<\/h3><p>Additionally, the hierarchy of parts within each\u00a0US Title has\u00a0been made explicit. For instance, Title 35 in .xml looks something like this:<\/p>\n<pre class=\"brush: xml; title: ; notranslate\" title=\"\">\r\n\r\n&lt;title&gt;&lt;num value=&quot;35&quot;&gt;Title 35\u2014&lt;\/num&gt;\r\n\r\n &lt;part&gt;\u00a0&lt;num value=&quot;II&quot;&gt;PART II\u2014&lt;\/num&gt;\r\n\r\n   &lt;chapter&gt;&lt;num value=&quot;2&quot;&gt;CHAPTER 2\u2014&lt;\/num&gt;\r\n\r\n    &lt;section&gt;&lt;num value=&quot;101&quot;&gt;\u00a7\u202f101.&lt;\/num&gt;\r\n    &lt;\/section&gt;\r\n\r\n   &lt;\/chapter&gt;\r\n\r\n  &lt;\/part&gt;\r\n\r\n&lt;\/title&gt;\r\n\r\n<\/pre>\n<\/div><p>This structure means that the computer does not have to guess about the hierarchy (e.g. 
what Part contains what Chapter) in the law based upon visual cues and indenting. Rather, &#8220;Title 35&#8221; explicitly contains Parts I through V within its tags:<br \/>\n&lt;Title 35&gt;<br \/>\n&nbsp;&lt;Part I&gt;<br \/>\n&nbsp;&lt;Part II&gt;<br \/>\n&nbsp;&nbsp;&lt;Chapter 2&gt;<br \/>\n&nbsp;&lt;Part III&gt;<br \/>\n&nbsp;&lt;Part IV&gt;<br \/>\n&nbsp;&lt;Part V&gt;<\/p><p>&lt;\/Title&gt;<\/p><p><span style=\"font-size: 11pt; line-height: 1.5;\">Including Part II inside\u00a0the Title tags &lt;title&gt;&lt;\/title&gt; indicates that Part II is below &#8220;Title&#8221; in the law hierarchy. Similarly, Chapter 2<\/span><\/p>\n<pre>   &lt;chapter&gt;&lt;num value=\"2\"&gt;CHAPTER 2\u2014&lt;\/num&gt;<\/pre><p>has explicitly been placed within Part II&#8217;s opening and closing tags &lt;part&gt; &lt;\/part&gt;.<\/p>\n<pre><strong>  &lt;part&gt; &lt;num value=\"II\"&gt;PART II\u2014&lt;\/num&gt;<\/strong><\/pre><p>This indicates\u00a0that Chapter 2 is contained within Part II, and so on. By explicitly placing one portion <em>within<\/em> the tags of the other portion, you are explicitly defining the hierarchy in a way that the computer can read.<\/p><p>The upshot is that computers can now precisely read or &#8220;parse&#8221; the structure (but not the meaning) of the U.S. Code. Because of this, we can begin to create interesting visualizations and apps like <a href=\"http:\/\/harrysurden.com\/projects\/visual\/USCode_D3\/US_Code_Tree_d3_1g.html\">the U.S. Code explorer<\/a> that were not previously easy to create in the era of &#8220;plain-text&#8221; law.<\/p><p>In a follow-up post, I will explain more about parsing the U.S. Code in .xml and creating visualizations and apps based upon that.<\/p>\n<\/div>\n","protected":false},"excerpt":{"rendered":"<p>The U.S. Code &#8211; (the primary collection of Federal Statutory Law) &#8211; has become structured. It always had an implicit structure. 
However, since 2013 it has had an explicit, machine-readable structure.&nbsp; The explicit structuring of the U.S. law allows for increased computational analysis and visualization of the law, like this experimental demonstration app for navigating [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":[],"categories":[7,13],"tags":[],"_links":{"self":[{"href":"https:\/\/www.harrysurden.com\/wordpress\/wp-json\/wp\/v2\/posts\/414"}],"collection":[{"href":"https:\/\/www.harrysurden.com\/wordpress\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.harrysurden.com\/wordpress\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.harrysurden.com\/wordpress\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.harrysurden.com\/wordpress\/wp-json\/wp\/v2\/comments?post=414"}],"version-history":[{"count":10,"href":"https:\/\/www.harrysurden.com\/wordpress\/wp-json\/wp\/v2\/posts\/414\/revisions"}],"predecessor-version":[{"id":575,"href":"https:\/\/www.harrysurden.com\/wordpress\/wp-json\/wp\/v2\/posts\/414\/revisions\/575"}],"wp:attachment":[{"href":"https:\/\/www.harrysurden.com\/wordpress\/wp-json\/wp\/v2\/media?parent=414"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.harrysurden.com\/wordpress\/wp-json\/wp\/v2\/categories?post=414"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.harrysurden.com\/wordpress\/wp-json\/wp\/v2\/tags?post=414"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}