Generate Continuity Graphs#
Another task you can accomplish with patent_client is automatic generation of continuity graphs, like this one:
To do this, we’re going to use patent_client in conjunction with pandas, networkx and pygraphviz. PyGraphviz is a set of python bindings to the GraphViz data graph visualization library. We can use it to generate figures that illustrate the relationships between patent applications.
Let’s begin with the standard imports, and indicate one of the applications in the family of interest. Here, we’ve picked a patent application owned by Tesla as an example:
[1]:
import pandas as pd
from patent_client import USApplication
target = '15384723'
Collect the Data#
To collect all the data we need for our graph, we’re going to need to iterate through the application, and every application related to it. We do that through an iterative algorithm that keeps a list of:
Applications it needs to retreive
Applications it has already retreived
Parent / Child data
A list of missing cases
Starting with the target patent, the algorithm retreives a case, records all the relationship data to the relevant lists, and adds a reference to the application itself to the applications list. With the data recorded, it then determines if any of the cases related to the current one haven’t been visted yet. If they have not been visited, it adds them to the “to_visit” list. This process continues until the “to_visit” list is empty. The result is:
applications -> USApplication objects for the target and every related case
parent_entries -> Relationship objects for all parents of all applications
child_entries -> Relationship objects for all children of all applications
orphans -> application numbers for applications that were not found (typically an unpublished continuation application)
[5]:
visited = list()
to_visit=[target,]
applications = list()
parent_entries = list()
child_entries = list()
orphans = list()
while to_visit:
to_visit = list(sorted(to_visit))
app_no = to_visit.pop(0)
try:
app = USApplication.objects.get(app_no)
except Exception as e:
print(e, app_no)
visited.append(app_no)
orphans.append(app_no)
continue
applications.append(app)
parent_entries += app.parent_continuity
child_entries += app.child_continuity
related_app_nos = [a.parent_appl_id for a in app.parent_continuity] + [a.child_appl_id for a in app.child_continuity]
for app in related_app_nos:
if (app not in visited and app not in to_visit):
to_visit.append(app)
visited.append(app_no)
For every application, the parent records indicate a clear relationship - e.g. continuation / divisional / etc, so we’re only going to work with the parent entry data, which we can convert to a Pandas dataframe as:
[6]:
cont_df = pd.DataFrame.from_records(r.to_dict() for r in parent_entries)
cont_df.head()
[6]:
child_appl_id | parent_app_filing_date | parent_app_status | parent_appl_id | relationship | |
---|---|---|---|---|---|
0 | 15384723 | 2015-09-23 | Patented | 14862609 | is a Continuation of |
1 | 15384723 | 2014-05-19 | Patented | 14281679 | is a Continuation of |
2 | 15384723 | 2013-04-19 | Patented | 13866214 | is a Continuation of |
3 | 15384723 | 2010-05-18 | Patented | 12782413 | is a Division of |
4 | 15384723 | 2009-01-29 | Abandoned | 12322218 | is a Continuation in part of |
Filter the Relationships#
Now we have a problem. If we just graph this data, we get a total jumble. A lot of this parent data is duplicated. For example, assume we have 3 applications in the following relationship:
Application A - is a CON of - Application B - is a CON of - Application C
From the data we’ve pulled, we’ll see three separate records:
child |
relationship |
parent |
---|---|---|
A |
is a CON of |
B |
B |
is a CON of |
C |
A |
is a CON of |
C |
But we only want two records. We don’t care if A is a CON of C, because when we produce the graph, that will be shown by the fact that A is a CON of B, which is a CON of C. So we need to simplify the records. We use another algorithm. Fortunately, NetworkX can come to the rescue. In graph theory, this operation is called a “transitive reduction”, which NetworkX implements. So let’s do that!
[26]:
import networkx as nx
from networkx.algorithms import transitive_reduction
graph = nx.from_pandas_edgelist(cont_df, source="parent_appl_id", target="child_appl_id", edge_attr="relationship", create_using=nx.DiGraph)
simple_graph = transitive_reduction(graph)
simple_df = nx.to_pandas_edgelist(simple_graph, source="parent_appl_id", target="child_appl_id", edge_key="relationship")
new_cont_df = (simple_df.set_index(['parent_appl_id', 'child_appl_id'])
.join(
cont_df.set_index(['parent_appl_id', 'child_appl_id']),
how="left")
.reset_index()
)
new_cont_df
[26]:
parent_appl_id | child_appl_id | parent_app_filing_date | parent_app_status | relationship | |
---|---|---|---|---|---|
0 | 14862609 | 15384723 | 2015-09-23 | Patented | is a Continuation of |
1 | 14281679 | 14862609 | 2014-05-19 | Patented | is a Continuation of |
2 | 13866214 | 14281679 | 2013-04-19 | Patented | is a Continuation of |
3 | 12782413 | 13866214 | 2010-05-18 | Patented | is a Division of |
4 | 12322218 | 12378790 | 2009-01-29 | Abandoned | is a Continuation of |
5 | 12322218 | 12782413 | 2009-01-29 | Abandoned | is a Continuation in part of |
6 | 12380427 | 12782413 | 2009-02-26 | Patented | is a Continuation in part of |
7 | 12380427 | 12381846 | 2009-02-26 | Patented | is a Continuation of |
8 | 12380427 | 12381853 | 2009-02-26 | Patented | is a Continuation of |
9 | 12378790 | 12586493 | 2009-02-19 | Abandoned | is a Division of |
Make the Relationships Look Nice#
Now we’re going to modify the filtered_cont_df DataFrame to make it look nicer. We will simplify each of the relationship types into neat 2-3 letter codes.
[27]:
def assign_identifiers(string):
string = string.lower()
if ('continuation-in-part' in string
or 'continuation in part' in string):
return 'CIP'
elif 'continuation' in string:
return 'CON'
elif 'division' in string:
return 'DIV'
elif 'provisional' in string:
return 'PRV'
elif 'stage' in string:
return 'NS'
new_cont_df['relationship'] = new_cont_df['relationship'].apply(assign_identifiers)
Generate the Graph#
Now for the heavy-lifting. This is going to proceed in two stages.
We’re going to add in
node
objects for each application, with text that’s descriptive of the caseWe’re going to add
edge
objects for each relationship, with the relationship code
The result can then be displayed in the notebook by simply calling the Graph object (dot), or by calling dot.render() GraphViz will save a file to disk containing the image.
[29]:
from graphviz import Digraph
import textwrap
from titlecase import titlecase
from collections import OrderedDict
import datetime
# Create the Graph Object
dot = Digraph(format='svg', node_attr={'shape': 'rectangle', 'style': 'filled'})
# Stage 1 - Add the Nodes
for app in applications:
title = '\n'.join(textwrap.wrap(titlecase(app.patent_title), 30))
string = 'U.S. App. ' + app.appl_id + f"\n{title}\n(Filed {app.app_filing_date})"
color = 'orange'
# Add patent information for issued patents
if app.patent_number:
string = f'U.S. Pat. {app.patent_number}\n{title}\n(Filed {app.app_filing_date}, Issued {app.patent_issue_date})'
color = 'greenyellow'
elif app.app_early_pub_number:
status_text = app.app_status.split('-')[0].replace('Mailed', '').replace('Filed', '').replace(',', '\n').strip()
status = f"{status_text} ({app.app_status_date})"
string = string + f'\n{status}'
if 'abandon' in app.app_status.lower():
color = 'red'
else:
color = 'lightblue'
dot.node(app.appl_id, string, fillcolor=color)
# Stage 2 - add the Edges
for i, connection in new_cont_df.iterrows():
dot.edge(connection['parent_appl_id'], connection['child_appl_id'], connection['relationship'], splines='ortho')
dot#.render()
[29]:
[ ]: