Generate Continuity Graphs

Another task you can accomplish with patent_client is automatic generation of continuity graphs, like this one:

[Figure: continuity graph]

To do this, we’re going to use patent_client in conjunction with pandas, networkx and graphviz. The graphviz package is a Python interface to the Graphviz graph visualization software. We can use it to generate figures that illustrate the relationships between patent applications.

Let’s begin with the standard imports, and indicate one of the applications in the family of interest. Here, we’ve picked a patent application owned by Tesla as an example:

[1]:
import pandas as pd
from patent_client import USApplication

target = '15384723'

Collect the Data

To collect all the data we need for our graph, we’re going to need to visit the target application and every application related to it. We do that with an iterative algorithm that keeps track of:

  • Applications it needs to retrieve

  • Applications it has already retrieved

  • Parent / Child data

  • A list of missing cases

Starting with the target application, the algorithm retrieves a case, records all the relationship data in the relevant lists, and adds a reference to the application itself to the applications list. With the data recorded, it then checks whether any of the cases related to the current one haven’t been visited yet. Any that have not been visited are added to the “to_visit” list. This process continues until the “to_visit” list is empty. The result is:

  • applications -> USApplication objects for the target and every related case

  • parent_entries -> Relationship objects for all parents of all applications

  • child_entries -> Relationship objects for all children of all applications

  • orphans -> application numbers for applications that were not found (typically an unpublished continuation application)

[5]:
visited = []          # application numbers already processed
to_visit = [target]   # application numbers still to fetch

applications = []
parent_entries = []
child_entries = []
orphans = []

while to_visit:
    # Process the lowest application number first so the order is deterministic
    to_visit.sort()
    app_no = to_visit.pop(0)
    try:
        app = USApplication.objects.get(app_no)
    except Exception as e:
        # No record could be retrieved: note it as an orphan and move on
        print(e, app_no)
        visited.append(app_no)
        orphans.append(app_no)
        continue
    applications.append(app)

    parent_entries += app.parent_continuity
    child_entries += app.child_continuity
    related_app_nos = [a.parent_appl_id for a in app.parent_continuity] + [a.child_appl_id for a in app.child_continuity]

    # Queue any related case that is neither processed nor already queued
    for related_no in related_app_nos:
        if related_no not in visited and related_no not in to_visit:
            to_visit.append(related_no)
    visited.append(app_no)
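To see the worklist pattern in isolation, here is a minimal sketch that runs without any network access. The TOY_FAMILY dictionary and the crawl function are hypothetical stand-ins invented for this illustration (they are not part of patent_client); each known application number simply maps to its lists of parents and children.

```python
from collections import deque

# Hypothetical stand-in for the remote lookup: each known application number
# maps to (parents, children). "C" is referenced but deliberately has no record.
TOY_FAMILY = {
    "A": (["B"], []),
    "B": ([], ["A", "C"]),
}

def crawl(start, lookup):
    """Visit every application reachable from `start` via parent/child links."""
    to_visit = deque([start])
    seen = {start}  # everything ever queued, so nothing is queued twice
    found, orphans = [], []
    while to_visit:
        app_no = to_visit.popleft()
        if app_no not in lookup:
            orphans.append(app_no)  # referenced by a relative, but no record
            continue
        found.append(app_no)
        parents, children = lookup[app_no]
        for related in parents + children:
            if related not in seen:
                seen.add(related)
                to_visit.append(related)
    return found, orphans

found, orphans = crawl("A", TOY_FAMILY)
print(found, orphans)
```

Here a single set replaces the separate “visited” and “to_visit” membership checks, which is one possible simplification rather than a required change.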

For every application, the parent records state the relationship type explicitly (continuation, divisional, etc.), so we’re only going to work with the parent entry data, which we can convert to a pandas DataFrame:

[6]:
cont_df = pd.DataFrame.from_records(r.to_dict() for r in parent_entries)
cont_df.head()
[6]:
   child_appl_id  parent_app_filing_date  parent_app_status  parent_appl_id  relationship
0  15384723       2015-09-23              Patented           14862609        is a Continuation of
1  15384723       2014-05-19              Patented           14281679        is a Continuation of
2  15384723       2013-04-19              Patented           13866214        is a Continuation of
3  15384723       2010-05-18              Patented           12782413        is a Division of
4  15384723       2009-01-29              Abandoned          12322218        is a Continuation in part of

Filter the Relationships

Now we have a problem. If we just graph this data, we get a total jumble. A lot of this parent data is duplicated. For example, assume we have 3 applications in the following relationship:

Application A - is a CON of - Application B - is a CON of - Application C

From the data we’ve pulled, we’ll see three separate records:

child   relationship   parent
A       is a CON of    B
B       is a CON of    C
A       is a CON of    C

But we only want two records. We don’t need the record saying that A is a CON of C, because the graph will already show that through A being a CON of B, which is a CON of C. So we need to remove the redundant records. Fortunately, NetworkX can come to the rescue. In graph theory, this operation is called a “transitive reduction”, and NetworkX implements it. So let’s do that!

[26]:
import networkx as nx
from networkx.algorithms import transitive_reduction

# Build a directed graph with edges pointing from parent to child
graph = nx.from_pandas_edgelist(cont_df, source="parent_appl_id", target="child_appl_id", edge_attr="relationship", create_using=nx.DiGraph)
# The reduced graph does not carry over the edge attributes
simple_graph = transitive_reduction(graph)
simple_df = nx.to_pandas_edgelist(simple_graph, source="parent_appl_id", target="child_appl_id")
# Join back against the original records to recover the relationship data
new_cont_df = (simple_df.set_index(['parent_appl_id', 'child_appl_id'])
               .join(
                   cont_df.set_index(['parent_appl_id', 'child_appl_id']),
                   how="left")
               .reset_index()
              )
new_cont_df
[26]:
   parent_appl_id  child_appl_id  parent_app_filing_date  parent_app_status  relationship
0  14862609        15384723       2015-09-23              Patented           is a Continuation of
1  14281679        14862609       2014-05-19              Patented           is a Continuation of
2  13866214        14281679       2013-04-19              Patented           is a Continuation of
3  12782413        13866214       2010-05-18              Patented           is a Division of
4  12322218        12378790       2009-01-29              Abandoned          is a Continuation of
5  12322218        12782413       2009-01-29              Abandoned          is a Continuation in part of
6  12380427        12782413       2009-02-26              Patented           is a Continuation in part of
7  12380427        12381846       2009-02-26              Patented           is a Continuation of
8  12380427        12381853       2009-02-26              Patented           is a Continuation of
9  12378790        12586493       2009-02-19              Abandoned          is a Division of
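To make the effect of the reduction concrete, here is a small pure-Python sketch applied to the A / B / C example from earlier. The reduce_edges function is a hypothetical helper written for this illustration, not NetworkX’s implementation, and it assumes the edges form a directed acyclic graph.

```python
def reduce_edges(edges):
    """Drop every edge (u, v) for which a longer path from u to v exists.

    Assumes `edges` describes a directed acyclic graph.
    """
    successors = {}
    for parent, child in edges:
        successors.setdefault(parent, set()).add(child)

    def reachable(start):
        # Depth-first search collecting every node reachable from `start`.
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            for nxt in successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen

    kept = []
    for parent, child in edges:
        # Redundant if another child of `parent` already leads to `child`.
        redundant = any(
            child in reachable(other)
            for other in successors[parent]
            if other != child
        )
        if not redundant:
            kept.append((parent, child))
    return kept

# Edges point from parent to child: A is a CON of B, B is a CON of C,
# and A is (redundantly) also recorded as a CON of C.
edges = [("B", "A"), ("C", "B"), ("C", "A")]
reduced = reduce_edges(edges)
print(reduced)
```

Only the direct parent/child links survive; the C to A edge is dropped because the path C to B to A already implies it.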

Make the Relationships Look Nice

Now we’re going to modify the new_cont_df DataFrame to make it look nicer. We will simplify each of the relationship types into neat 2-3 letter codes.

[27]:
def assign_identifiers(string):
    string = string.lower()
    if ('continuation-in-part' in string
        or 'continuation in part' in string):
        return 'CIP'
    elif 'continuation' in string:
        return 'CON'
    elif 'division' in string:
        return 'DIV'
    elif 'provisional' in string:
        return 'PRV'
    elif 'stage' in string:
        return 'NS'

new_cont_df['relationship'] = new_cont_df['relationship'].apply(assign_identifiers)
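As a quick sanity check of the mapping, here is a table-driven sketch of the same idea that runs on its own. The to_code name and the CODES table are hypothetical, and the last three sample strings are illustrative assumptions rather than values taken from the data above.

```python
# Ordered (substring, code) pairs. The CIP entries must come before CON,
# because "continuation in part" also contains "continuation".
CODES = [
    ("continuation-in-part", "CIP"),
    ("continuation in part", "CIP"),
    ("continuation", "CON"),
    ("division", "DIV"),
    ("provisional", "PRV"),
    ("stage", "NS"),
]

def to_code(relationship):
    """Return the short code for a relationship string, or None if unmatched."""
    text = relationship.lower()
    for needle, code in CODES:
        if needle in text:
            return code
    return None  # same result as falling off the end of an if/elif chain

samples = [
    "is a Continuation of",
    "is a Division of",
    "is a Continuation in part of",
    "Claims Priority from Provisional Application",  # illustrative assumption
    "is a National Stage Entry of",                  # illustrative assumption
    "is a Reissue of",                               # illustrative, unmatched
]
codes = [to_code(s) for s in samples]
print(codes)
```

Note that an unrecognised relationship yields None, so it may be worth checking the relationship column for missing values before drawing the graph.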

Generate the Graph

Now for the heavy lifting. This is going to proceed in two stages:

  1. We’re going to add in node objects for each application, with text that’s descriptive of the case

  2. We’re going to add edge objects for each relationship, with the relationship code

The result can then be displayed in the notebook by evaluating the graph object (dot) as the last expression of a cell. Alternatively, calling dot.render() makes Graphviz save the image to a file on disk.

[29]:
from graphviz import Digraph
import textwrap
from titlecase import titlecase

# Create the Graph Object
dot = Digraph(format='svg', node_attr={'shape': 'rectangle', 'style': 'filled'})

# Stage 1 - Add the Nodes
for app in applications:
    # Wrap long titles so each node stays at most 30 characters wide
    title = '\n'.join(textwrap.wrap(titlecase(app.patent_title), 30))
    string = f"U.S. App. {app.appl_id}\n{title}\n(Filed {app.app_filing_date})"
    color = 'orange'

    # Add patent information for issued patents
    if app.patent_number:
        string = f'U.S. Pat. {app.patent_number}\n{title}\n(Filed {app.app_filing_date}, Issued {app.patent_issue_date})'
        color = 'greenyellow'

    # Add status information for published, unissued applications
    elif app.app_early_pub_number:
        status_text = app.app_status.split('-')[0].replace('Mailed', '').replace('Filed', '').replace(',', '\n').strip()
        status = f"{status_text} ({app.app_status_date})"
        string = string + f'\n{status}'
        if 'abandon' in app.app_status.lower():
            color = 'red'
        else:
            color = 'lightblue'

    dot.node(app.appl_id, string, fillcolor=color)

# Stage 2 - Add the Edges
for _, connection in new_cont_df.iterrows():
    dot.edge(connection['parent_appl_id'], connection['child_appl_id'], connection['relationship'], splines='ortho')

dot  # or call dot.render() to save the image to disk
[29]:
../_images/examples_5_-_Generate_Continuity_Graph_11_0.svg
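The node text is built with plain string handling, so it can be tried out without Graphviz or any patent data. The sketch below uses only the standard library; the node_label and clean_status helpers, the title, the date, and the status strings are all made-up placeholders for illustration, and the titlecase step from the cell above is left out.

```python
import textwrap

def node_label(appl_id, title, filed, width=30):
    """Build a multi-line node label, wrapping the title to `width` characters."""
    wrapped = "\n".join(textwrap.wrap(title, width))
    return f"U.S. App. {appl_id}\n{wrapped}\n(Filed {filed})"

def clean_status(status):
    """Shorten a status string the same way as the cell above."""
    return (status.split("-")[0]
            .replace("Mailed", "")
            .replace("Filed", "")
            .replace(",", "\n")
            .strip())

# Placeholder values, not real application data
label = node_label(
    "15384723",
    "A Hypothetical Title That Is Long Enough To Need Wrapping",
    "2000-01-01",
)
print(label)

short_a = clean_status("Abandoned  --  Failure to Respond to an Office Action")
short_b = clean_status("Non Final Action Mailed")
print(short_a, "|", short_b)
```

Wrapping at a fixed width keeps every node roughly the same size, which tends to make the rendered graph easier to read.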