Developer Overview
The goal of this project is to provide easy-to-use access to public patent data through a simple API.
The general idea is to implement a subset of the Django QuerySet API functionality for accessing the various sources of patent data. This is a form of the Active Record pattern. To achieve this, the “record” is a Pydantic model, and it has a “manager” that is located at `.objects`.
Basic Structure
The basic structure of a Patent Client API wrapper looks like this:

```
some_api/
    __init__.py
    api.py
    model.py
    manager.py
```
The `model.py` file should contain Pydantic v2 models representing the output of the API, using `patent_client.util.pydantic_util.BaseModel` instead of `pydantic.BaseModel`.

The `api.py` file should contain at least one class with methods that call the various API functions. The exact structure of `api.py` does not matter, but each method should return instances of the Pydantic models defined in `model.py`.

The `manager.py` file contains a subclass of `patent_client.util.manager.Manager` that wraps the API classes in `api.py` and implements the manager protocol described below.
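As a rough illustration of how `model.py` and `api.py` fit together, here is a minimal, dependency-free sketch. It assumes a dataclass in place of the Pydantic `BaseModel` and a hypothetical `StubSession` with a made-up `get_json` method in place of `PatentClientSession`; the names `Application` and `ApplicationApi` and the URL are illustrative only.

```python
import asyncio
from dataclasses import dataclass


# model.py (simplified): in patent_client this would subclass
# patent_client.util.pydantic_util.BaseModel instead of being a dataclass.
@dataclass
class Application:
    appl_id: str
    title: str


class StubSession:
    """Hypothetical stand-in for PatentClientSession that returns canned JSON."""

    async def get_json(self, url: str) -> dict:
        await asyncio.sleep(0)  # where a real session would await an HTTP request
        return {"appl_id": url.rsplit("/", 1)[-1], "title": "Example title"}


# api.py: async methods call the API and return model instances.
class ApplicationApi:
    def __init__(self, session: StubSession):
        self.session = session

    async def get_application(self, appl_id: str) -> Application:
        data = await self.session.get_json(f"https://example.com/applications/{appl_id}")
        return Application(**data)


app = asyncio.run(ApplicationApi(StubSession()).get_application("16123456"))
```

The key point is the return type: callers of the API class get model instances, never raw JSON.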
Other files can also be included in the API folder to support other functions. Common ones include:

- `session.py`: if any extensive customization of the base `PatentClientSession` is necessary, it goes here.
- `convert.py`: if any data conversion is necessary from the API output to the Pydantic input, put that here.
- `query.py`: if complex logic is necessary to convert input to the manager’s `.filter` method, put that here.

Each of these is discussed in more detail below.
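For `convert.py`, a common need is renaming the camelCase keys that many APIs return into the snake_case field names a Pydantic model expects. The sketch below is a hypothetical helper, not part of patent_client; Pydantic’s own `alias_generator` configuration may be a simpler alternative in many cases.

```python
import re
from typing import Any

# Matches the empty position before each capital letter (except at the start).
_CAMEL_BOUNDARY = re.compile(r"(?<!^)(?=[A-Z])")


def to_snake_case(name: str) -> str:
    """Convert a camelCase key such as 'patentNumber' to 'patent_number'.

    Note: runs of capitals such as 'URL' become 'u_r_l' with this simple rule.
    """
    return _CAMEL_BOUNDARY.sub("_", name).lower()


def convert_keys(data: Any) -> Any:
    """Recursively rename dict keys from camelCase to snake_case."""
    if isinstance(data, dict):
        return {to_snake_case(key): convert_keys(value) for key, value in data.items()}
    if isinstance(data, list):
        return [convert_keys(item) for item in data]
    return data
```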
API & PatentClientSession
The `api.py` file should use an instance of `patent_client.session.PatentClientSession` to access the API, using only async methods. `PatentClientSession` is a subclass of `hishel.AsyncCacheClient`, which is itself a subclass of `httpx.AsyncClient`. Patent Client uses httpx exclusively instead of the more popular `requests` library because (1) an increasing number of APIs require HTTP/2, which `requests` does not support, and (2) httpx supports `asyncio`. That said, if you’re coming from a `requests` background, fear not! The httpx interface is nearly (but not entirely) identical to that of `requests`.
Models
Models are Pydantic models that subclass `patent_client.util.pydantic_util.BaseModel`. This special version of `BaseModel` automatically detects the corresponding manager (discussed below) and adds some convenience functions:

- `Model.objects` holds a manager that would retrieve every `Model` in the data source.
- `Model` instances support a `.to_dict()` method to convert them to a dictionary, and a `.to_pandas()` method to convert them to a Pandas series.
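To show the kind of hook that can make `Model.objects` and `.to_dict()` appear automatically, here is a hypothetical sketch built on `__init_subclass__` and dataclasses. It is not patent_client’s implementation: `RecordBase` and `SimpleManager` are illustrative names, and a real Pydantic model would use `model_dump()` rather than `dataclasses.asdict`.

```python
from dataclasses import asdict, dataclass


class SimpleManager:
    """Hypothetical blank manager that only remembers which model it serves."""

    def __init__(self, model: type):
        self.model = model


class RecordBase:
    """Hypothetical base class sketching what an enhanced BaseModel might do."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Attach a blank manager to every subclass, available as `Model.objects`.
        cls.objects = SimpleManager(cls)

    def to_dict(self) -> dict:
        # Dataclass version; a Pydantic model would call self.model_dump() instead.
        return asdict(self)


@dataclass
class Patent(RecordBase):
    patent_number: str
    title: str
```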
Models can use any Pydantic features, such as computed fields for additional properties. Models may also include:

- Relationships: properties that traverse a relationship to a related model.
- Downloaders: methods that download some sort of content related to the model.
Relationships
You can create properties of a model that link to another model using `patent_client.util.related.get_model`. With `get_model`, you can dynamically retrieve another model and then make an active record call on that model. `get_model` is preferred over importing the model directly because it reduces the risk of circular imports.
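Example:

```python
from patent_client.util.pydantic_util import BaseModel
from patent_client.util.related import get_model

class USApplication(BaseModel):
    patent_number: str
    ...

    @property
    def patent(self):
        # Look up the Patent model lazily to avoid a circular import
        return get_model("patent_client.Patent").objects.get(self.patent_number)
```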
In that example, if you have a USApplication instance, you can get the corresponding patent at USApplication.patent.
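The benefit of `get_model` comes from deferring the lookup until the property is accessed, after all modules have finished importing. A minimal sketch of string-based lazy resolution with `importlib` follows; `get_model_sketch` is a hypothetical name, and patent_client’s actual `get_model` may resolve names differently.

```python
import importlib


def get_model_sketch(path: str):
    """Resolve a dotted 'package.ClassName' string to the named class at call time.

    Hypothetical approximation of get_model: because the import only happens
    when this is called (e.g. inside a property), two model modules can refer
    to each other without creating a circular import at load time.
    """
    module_name, _, attr = path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, attr)
```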
Downloaders
Some models have downloads related to them, like Assignment PDFs or Patent and Publication documents. Downloaders should:

- Be initially implemented as an asynchronous `.adownload` method that uses the `session.adownload` method on the related session.
- Have a companion `.download` method that simply wraps `.adownload` using `patent_client.util.asyncio_util.run_sync`.
- Return a `pathlib.Path` object pointing to the downloaded file.
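Putting the three rules together, a downloader might look roughly like this. The session, its `adownload(url, path)` signature, the `AssignmentDocument` model, and the URL are all hypothetical stand-ins, and `run_sync` is approximated here with `asyncio.run`.

```python
import asyncio
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


def run_sync(coro):
    """Approximation of patent_client.util.asyncio_util.run_sync.

    asyncio.run() fails inside an already-running event loop; the real
    helper may handle that case differently.
    """
    return asyncio.run(coro)


class StubSession:
    """Hypothetical session; the real session.adownload signature may differ."""

    async def adownload(self, url: str, path: Path) -> Path:
        await asyncio.sleep(0)  # where a real session would stream the response
        path.write_bytes(f"contents of {url}".encode())
        return path


session = StubSession()


@dataclass
class AssignmentDocument:  # hypothetical model; would subclass BaseModel
    reel_frame: str

    async def adownload(self, path: Optional[Path] = None) -> Path:
        # Default to a file named after the record in the current directory.
        path = path or Path(f"{self.reel_frame}.pdf")
        url = f"https://example.com/assignments/{self.reel_frame}.pdf"
        return await session.adownload(url, path)

    def download(self, path: Optional[Path] = None) -> Path:
        # Synchronous companion that just runs the async implementation.
        return run_sync(self.adownload(path))
```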
Managers
When a filtering, ordering, or values method is called on a Manager, it returns a new Manager object that combines the arguments of the old manager with the new arguments. In this way, any given Manager is immutable. The key data in the Manager is held in a `ManagerConfig` object stored at `Manager.config`.
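The immutability described above can be sketched with a frozen dataclass and `dataclasses.replace`: each call builds a new config and wraps it in a new manager, leaving the original untouched. `SketchManager` and the `ManagerConfig` fields (`filters`, `order_by`) are assumptions for illustration, not patent_client’s actual definitions.

```python
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class ManagerConfig:
    """Hypothetical config; the real ManagerConfig fields may differ."""

    filters: Tuple[Tuple[str, object], ...] = ()
    order_by: Tuple[str, ...] = ()


class SketchManager:
    def __init__(self, config: ManagerConfig = ManagerConfig()):
        self.config = config

    def filter(self, **kwargs) -> "SketchManager":
        # Combine the old and new arguments in a *new* config and manager.
        filters = self.config.filters + tuple(kwargs.items())
        return type(self)(replace(self.config, filters=filters))

    def order_by(self, *fields: str) -> "SketchManager":
        return type(self)(replace(self.config, order_by=self.config.order_by + fields))
```

Because the config is frozen, a base manager can be shared freely and refined in different directions without one query leaking into another.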
Managers require subclassing `patent_client.util.manager.Manager` and defining these methods:

- `Manager._aget_results`: should return an `AsyncIterator` over the model results, based on the contents of the `ManagerConfig` object at `Manager.config`.
- `Manager.alen`: should be an async method that returns the number of results the manager will retrieve, based on the contents of the `ManagerConfig` object.
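A minimal sketch of a manager that satisfies this protocol follows. It uses a stand-in base class, a plain dict in place of `ManagerConfig`, and an in-memory list in place of a real API; `BaseManager`, `PatentManager`, `RECORDS`, the `limit` key, and `collect` are all hypothetical.

```python
import asyncio
from typing import AsyncIterator, List


class BaseManager:
    """Hypothetical stand-in for patent_client.util.manager.Manager."""

    def __init__(self, config: dict):
        self.config = config  # plays the role of ManagerConfig


# Fake data source standing in for a remote API.
RECORDS: List[dict] = [{"patent_number": str(n)} for n in range(10_000_000, 10_000_005)]


class PatentManager(BaseManager):
    async def _aget_results(self) -> AsyncIterator[dict]:
        # Yield results lazily, honoring a hypothetical "limit" key in the config.
        limit = self.config.get("limit")
        for i, record in enumerate(RECORDS):
            if limit is not None and i >= limit:
                break
            await asyncio.sleep(0)  # where a real manager would await a page of API results
            yield record

    async def alen(self) -> int:
        # A real manager would usually ask the API for a total count instead.
        limit = self.config.get("limit")
        return len(RECORDS) if limit is None else min(limit, len(RECORDS))


async def collect(manager: PatentManager) -> List[dict]:
    # Drain the async iterator into a list.
    return [record async for record in manager._aget_results()]
```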
Manager Discovery
A base, blank manager (one that would return all records) is attached to searchable models as `Model.objects`. This happens automatically when a model is defined in a `model.py` module and there is a corresponding manager, named after the model with a `Manager` suffix, in a `manager.py` module alongside it. For example:
`model.py`:

```python
from patent_client.util.pydantic_util import BaseModel


class Model(BaseModel):
    ...  # some fields
```

`manager.py`:

```python
from patent_client.util.manager import Manager


class ModelManager(Manager):
    ...  # an implementation
```
No additional configuration is needed. If the API is particularly complex, such that `model` and `manager` are packages rather than modules, this still works as long as the manager is imported in the `__init__.py` of the `manager` package. For example:
`model/submodel.py`:

```python
from patent_client.util.pydantic_util import BaseModel


class SubModel(BaseModel):
    ...  # some fields
```

`manager/submanager.py`:

```python
from patent_client.util.manager import Manager


class SubModelManager(Manager):
    ...  # an implementation
```
The layout above does not work unless you also have this:
`manager/__init__.py`:

```python
from .submanager import SubModelManager
```
Alternatively, you can explicitly define the location of a manager by setting `__manager__` to a dotted-path string on the model:
`model.py`:

```python
from patent_client.util.pydantic_util import BaseModel


class Model(BaseModel):
    __manager__ = "patent_client.manager.ModelManager"
```
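To illustrate why the `manager/__init__.py` import matters, here is a hypothetical sketch of how discovery could resolve a manager: it honors `__manager__` first, then swaps the `model` component of the module path for `manager` and looks up `<ModelName>Manager` there. This is an assumption about the mechanism, not patent_client’s actual code.

```python
import importlib


def resolve_manager(model_cls):
    """Hypothetical sketch of manager discovery (not patent_client's actual code)."""
    # 1. An explicit dotted path in __manager__ takes precedence.
    explicit = getattr(model_cls, "__manager__", None)
    if explicit:
        module_name, _, class_name = explicit.rpartition(".")
        return getattr(importlib.import_module(module_name), class_name)

    # 2. Otherwise swap the "model" part of the module path for "manager":
    #    some_api.model          -> some_api.manager
    #    some_api.model.submodel -> some_api.manager  (so the package must export it)
    parts = model_cls.__module__.split(".")
    if "model" not in parts:
        return None
    idx = parts.index("model")
    manager_module = importlib.import_module(".".join(parts[:idx] + ["manager"]))

    # 3. Look for a class named "<ModelName>Manager".
    return getattr(manager_module, f"{model_cls.__name__}Manager", None)
```

Under this scheme, a manager that lives only in `manager/submanager.py` is invisible until `manager/__init__.py` re-exports it, which matches the rule above.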
Relationships
In some circumstances, it would be nice to get information related to a model, even if it resides on another system supported by patent_client. Relationships are how you get there. For example, if you retrieve a `PtabTrial` object from the PTAB API, it has an attribute `.us_application` that returns a `USApplication` object from the PEDS API.

The `.util` package also has two functions that make this possible: `one_to_one` and `one_to_many`. Both functions work the same way: they take a first, positional argument, which is a string locating the other model, and then a single keyword argument, where the keyword is the filter criterion on the related model and the value is the name of the attribute on the current model whose value is used in the filter.

The only difference between the two functions is that `one_to_one` calls `objects.get`, returning a single object, while `one_to_many` calls `objects.filter`, returning a manager of the related objects. For example, we can use these to link trials and documents as below:
```python
class PtabTrial(Model):
    ...
    documents = one_to_many('patent_client.PtabDocument', trial_number='trial_number')
    ...


class PtabDocument(Model):
    ...
    trial = one_to_one('patent_client.PtabTrial', trial_number='trial_number')
    ...
```
Once these relationships are in place, we can move from one record to the other seamlessly:
```python
>>> from patent_client import PtabProceeding
>>> a = PtabProceeding.objects.get('IPR2017-00001')  # doctest: +SKIP
>>> a.documents[0]  # doctest: +ELLIPSIS
PtabDocument(...)
>>> a.documents[0].proceeding  # doctest: +ELLIPSIS
PtabProceeding(...)
```
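For a self-contained picture of the mechanism, the sketch below implements `one_to_one` and `one_to_many` as descriptors over a hypothetical in-memory `ListManager`. It assumes a plain `REGISTRY` dict keyed by short names instead of dotted import strings, and its `filter` returns a list rather than a manager; it is an illustration, not patent_client’s implementation.

```python
from dataclasses import dataclass
from typing import Dict

# Hypothetical registry standing in for resolving 'patent_client.X' strings.
REGISTRY: Dict[str, type] = {}


class ListManager:
    """Hypothetical in-memory manager; filter() returns a list, not a manager."""

    def __init__(self, records: list):
        self.records = records

    def filter(self, **criteria) -> list:
        return [r for r in self.records
                if all(getattr(r, k) == v for k, v in criteria.items())]

    def get(self, **criteria):
        (match,) = self.filter(**criteria)  # ValueError unless exactly one match
        return match


class _Relation:
    """Descriptor that queries the related model when the attribute is read."""

    def __init__(self, target: str, method: str, **criteria: str):
        if len(criteria) != 1:
            raise TypeError("expected exactly one keyword argument")
        [(self.filter_key, self.attr_name)] = criteria.items()
        self.target, self.method = target, method

    def __get__(self, instance, owner):
        if instance is None:  # accessed on the class itself
            return self
        manager = REGISTRY[self.target].objects
        value = getattr(instance, self.attr_name)
        return getattr(manager, self.method)(**{self.filter_key: value})


def one_to_one(target: str, **criteria: str) -> _Relation:
    return _Relation(target, "get", **criteria)  # single related object


def one_to_many(target: str, **criteria: str) -> _Relation:
    return _Relation(target, "filter", **criteria)  # all related objects


@dataclass
class PtabTrial:
    trial_number: str
    documents = one_to_many("PtabDocument", trial_number="trial_number")


@dataclass
class PtabDocument:
    document_id: int
    trial_number: str
    trial = one_to_one("PtabTrial", trial_number="trial_number")


REGISTRY.update(PtabTrial=PtabTrial, PtabDocument=PtabDocument)
PtabTrial.objects = ListManager([PtabTrial("IPR2017-00001")])
PtabDocument.objects = ListManager(
    [PtabDocument(1, "IPR2017-00001"), PtabDocument(2, "IPR2017-00001")]
)
```

Because the lookup happens in `__get__`, the related model is only resolved when the attribute is read, which is what lets two models point at each other.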