Generate Continuity Graphs

Another task you can accomplish with patent_client is automatic generation of continuity graphs, like this one:

[Figure: continuity_graph]

To do this, we’re going to use patent_client in conjunction with pandas, networkx and graphviz. The graphviz package is a Python interface to the Graphviz graph visualization software. We can use it to generate figures that illustrate the relationships between patent applications.

Let’s begin with the standard imports, and indicate one of the applications in the family of interest. Here, we’ve picked a patent application owned by Tesla as an example:

[1]:

import pandas as pd
from patent_client import USApplication

target = '15384723'

Collect the Data

To collect all the data we need for our graph, we’re going to need to iterate through the application, and every application related to it. We do that through an iterative algorithm that keeps a list of:

Applications it needs to retrieve
Applications it has already retrieved
Parent / Child data
A list of missing cases

Starting with the target application, the algorithm retrieves a case, appends its relationship records to the parent and child lists, and adds the application itself to the applications list. It then checks whether any of the cases related to the current one have not been visited yet, and adds those to the “to_visit” list. This process continues until the “to_visit” list is empty. The result is:

applications -> USApplication objects for the target and every related case
parent_entries -> Relationship objects for all parents of all applications
child_entries -> Relationship objects for all children of all applications
orphans -> application numbers for applications that were not found (typically an unpublished continuation application)

[5]:

visited = list()
to_visit=[target,]

applications = list()
parent_entries = list()
child_entries = list()
orphans = list()

while to_visit:
    # Always process the lowest remaining application number next
    to_visit = sorted(to_visit)
    app_no = to_visit.pop(0)
    visited.append(app_no)
    try:
        app = USApplication.objects.get(app_no)
    except Exception as e:
        # Lookup failed (e.g. unpublished case): record it as an orphan
        print(e, app_no)
        orphans.append(app_no)
        continue
    applications.append(app)

    parent_entries += app.parent_continuity
    child_entries += app.child_continuity
    related_app_nos = [a.parent_appl_id for a in app.parent_continuity] + [a.child_appl_id for a in app.child_continuity]

    # Queue every related application we have not seen yet
    for related_no in related_app_nos:
        if related_no not in visited and related_no not in to_visit:
            to_visit.append(related_no)
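
The traversal logic can also be tried without any network access. The following is a minimal sketch, assuming a hypothetical in-memory dictionary (FAKE_FAMILY) in place of USApplication lookups; the helper name crawl_family and the toy data are illustrative only and are not part of patent_client. It uses a FIFO queue and a set instead of re-sorting a list, so the visiting order may differ from the cell above.

```python
from collections import deque

# Hypothetical stand-in for the continuity data: each application number maps
# to the application numbers of its parents and children.
FAKE_FAMILY = {
    "A": {"parents": ["B", "C"], "children": []},
    "B": {"parents": ["C"], "children": ["A"]},
    "C": {"parents": [], "children": ["A", "B", "D"]},
    # "D" is deliberately missing, like an unpublished continuation
}


def crawl_family(target, lookup):
    """Visit every application reachable from `target` through parent/child links."""
    queue = deque([target])
    seen = {target}  # everything already queued or visited
    found, orphans = [], []

    while queue:
        app_no = queue.popleft()
        record = lookup.get(app_no)
        if record is None:  # no data available for this application
            orphans.append(app_no)
            continue
        found.append(app_no)
        for related in record["parents"] + record["children"]:
            if related not in seen:
                seen.add(related)
                queue.append(related)
    return found, orphans


found, orphans = crawl_family("A", FAKE_FAMILY)
print(found)    # ['A', 'B', 'C']
print(orphans)  # ['D']
```

Tracking seen numbers in a set keeps each membership check cheap, which may matter for large families.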

For every application, the parent records state the relationship explicitly (continuation, divisional, etc.), so we’re only going to work with the parent entry data, which we can convert to a pandas DataFrame:

[6]:

cont_df = pd.DataFrame.from_records(r.to_dict() for r in parent_entries)
cont_df.head()

[6]:

	child_appl_id	parent_app_filing_date	parent_app_status	parent_appl_id	relationship
0	15384723	2015-09-23	Patented	14862609	is a Continuation of
1	15384723	2014-05-19	Patented	14281679	is a Continuation of
2	15384723	2013-04-19	Patented	13866214	is a Continuation of
3	15384723	2010-05-18	Patented	12782413	is a Division of
4	15384723	2009-01-29	Abandoned	12322218	is a Continuation in part of

Filter the Relationships

Now we have a problem. If we just graph this data, we get a total jumble. A lot of this parent data is duplicated. For example, assume we have 3 applications in the following relationship:

Application A - is a CON of - Application B - is a CON of - Application C

From the data we’ve pulled, we’ll see three separate records:

child	relationship	parent
A	is a CON of	B
B	is a CON of	C
A	is a CON of	C

But we only want two records. We don’t need the record saying that A is a CON of C, because the graph already shows it: A is a CON of B, which is a CON of C. So we need to remove the redundant records. Fortunately, NetworkX can come to the rescue. In graph theory, this operation is called a “transitive reduction”, and NetworkX implements it. So let’s do that!

[26]:

import networkx as nx
from networkx.algorithms import transitive_reduction

graph = nx.from_pandas_edgelist(cont_df, source="parent_appl_id", target="child_appl_id", edge_attr="relationship", create_using=nx.DiGraph)
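# transitive_reduction returns a graph without edge attributes,
# so the relationship data is joined back in from cont_df below
simple_graph = transitive_reduction(graph)
simple_df = nx.to_pandas_edgelist(simple_graph, source="parent_appl_id", target="child_appl_id")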
new_cont_df = (simple_df.set_index(['parent_appl_id', 'child_appl_id'])
               .join(
                   cont_df.set_index(['parent_appl_id', 'child_appl_id']),
                   how="left")
               .reset_index()
              )
new_cont_df

[26]:

	parent_appl_id	child_appl_id	parent_app_filing_date	parent_app_status	relationship
0	14862609	15384723	2015-09-23	Patented	is a Continuation of
1	14281679	14862609	2014-05-19	Patented	is a Continuation of
2	13866214	14281679	2013-04-19	Patented	is a Continuation of
3	12782413	13866214	2010-05-18	Patented	is a Division of
4	12322218	12378790	2009-01-29	Abandoned	is a Continuation of
5	12322218	12782413	2009-01-29	Abandoned	is a Continuation in part of
6	12380427	12782413	2009-02-26	Patented	is a Continuation in part of
7	12380427	12381846	2009-02-26	Patented	is a Continuation of
8	12380427	12381853	2009-02-26	Patented	is a Continuation of
9	12378790	12586493	2009-02-19	Abandoned	is a Division of
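
To see the reduction in isolation, the three-application example (A, B, C) can be worked through by hand. The following is a pure-Python sketch for intuition only; the function name transitive_reduction_edges is hypothetical, and the notebook itself relies on NetworkX's transitive_reduction rather than this code. It assumes the input is a directed acyclic graph.

```python
def transitive_reduction_edges(edges):
    """Return the edges of a DAG that are not implied by a longer path."""
    successors = {}
    for parent, child in edges:
        successors.setdefault(parent, set()).add(child)

    def reachable(start, goal):
        # Depth-first search from `start`, looking for `goal`
        stack, seen = [start], set()
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(successors.get(node, ()))
        return False

    kept = []
    for parent, child in edges:
        # Drop parent -> child when another child of `parent` already leads to `child`
        detour = any(
            reachable(other, child)
            for other in successors[parent]
            if other != child
        )
        if not detour:
            kept.append((parent, child))
    return kept


# Edges point from parent to child, as in the graph built from cont_df
edges = [("B", "A"), ("C", "B"), ("C", "A")]
print(transitive_reduction_edges(edges))  # [('B', 'A'), ('C', 'B')]
```

The direct C to A edge is dropped because the path C to B to A already implies it.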

Make the Relationships Look Nice

Now we’re going to modify the new_cont_df DataFrame to make it look nicer. We will simplify each of the relationship types into neat 2-3 letter codes.

[27]:

def assign_identifiers(string):
    string = string.lower()
    if ('continuation-in-part' in string
        or 'continuation in part' in string):
        return 'CIP'
    elif 'continuation' in string:
        return 'CON'
    elif 'division' in string:
        return 'DIV'
    elif 'provisional' in string:
        return 'PRV'
    elif 'stage' in string:
        return 'NS'

new_cont_df['relationship'] = new_cont_df['relationship'].apply(assign_identifiers)
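
Note that assign_identifiers returns None when no keyword matches. As a hedged alternative, the sketch below uses an ordered lookup table with an explicit fallback; the names RELATIONSHIP_CODES and relationship_code, and the "?" default, are hypothetical and not part of the original notebook.

```python
# Ordered so that the more specific "continuation in part" is tested
# before the plain "continuation" substring it contains.
RELATIONSHIP_CODES = [
    ("continuation-in-part", "CIP"),
    ("continuation in part", "CIP"),
    ("continuation", "CON"),
    ("division", "DIV"),
    ("provisional", "PRV"),
    ("stage", "NS"),
]


def relationship_code(description, default="?"):
    """Hypothetical variant of assign_identifiers with an explicit fallback."""
    text = description.lower()
    for keyword, code in RELATIONSHIP_CODES:
        if keyword in text:
            return code
    return default  # unrecognized relationship text


print(relationship_code("is a Continuation in part of"))  # CIP
print(relationship_code("is a Division of"))              # DIV
print(relationship_code("some other relationship"))       # ?
```

A visible placeholder on an edge label may be easier to spot in the final graph than a missing label.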

Generate the Graph

Now for the heavy lifting. This is going to proceed in two stages:

We’re going to add in node objects for each application, with text that’s descriptive of the case
We’re going to add edge objects for each relationship, with the relationship code

The result can then be displayed in the notebook by simply evaluating the Digraph object (dot) at the end of a cell. Alternatively, call dot.render() and Graphviz will save a file containing the image to disk.

[29]:

from graphviz import Digraph
import textwrap
from titlecase import titlecase

# Create the Graph Object
dot = Digraph(format='svg', node_attr={'shape': 'rectangle', 'style': 'filled'})

# Stage 1 - Add the Nodes
for app in applications:
    title = '\n'.join(textwrap.wrap(titlecase(app.patent_title), 30))
    string = 'U.S. App. ' + app.appl_id + f"\n{title}\n(Filed {app.app_filing_date})"
    color = 'orange'

    # Add patent information for issued patents
    if app.patent_number:
        string = f'U.S. Pat. {app.patent_number}\n{title}\n(Filed {app.app_filing_date}, Issued {app.patent_issue_date})'
        color = 'greenyellow'

    elif app.app_early_pub_number:
        status_text = app.app_status.split('-')[0].replace('Mailed', '').replace('Filed', '').replace(',', '\n').strip()
        status = f"{status_text} ({app.app_status_date})"
        string = string + f'\n{status}'
        if 'abandon' in app.app_status.lower():
            color = 'red'
        else:
            color = 'lightblue'

    dot.node(app.appl_id, string, fillcolor=color)

# Stage 2 - add the Edges
for _, connection in new_cont_df.iterrows():
    dot.edge(connection['parent_appl_id'], connection['child_appl_id'], connection['relationship'], splines='ortho')

dot  # or call dot.render() to save the image to disk

[29]:

../_images/examples_5_-_Generate_Continuity_Graph_11_0.svg
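
The node labels in Stage 1 depend on textwrap.wrap to keep long titles narrow. The sketch below shows that step on its own, assuming a hypothetical helper node_label and placeholder values (the application number, title and date are made up for illustration). It skips the titlecase step used in the notebook.

```python
import textwrap


def node_label(appl_id, title, filing_date, width=30):
    """Build a multi-line node label similar to the one used in Stage 1."""
    wrapped_title = "\n".join(textwrap.wrap(title, width))
    return f"U.S. App. {appl_id}\n{wrapped_title}\n(Filed {filing_date})"


# Placeholder values, not real application data
title = "An Example Title That Is Long Enough To Wrap"
label = node_label("00000000", title, "2020-01-01")
print(label)
```

Each wrapped title line stays within the chosen width, which keeps the rectangles in the rendered graph a similar size.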
