Developer Overview#

The goal of this project is to provide easy-to-use access to public patent data through a simple API. The general idea is to implement a subset of the Django QuerySet API for accessing the various sources of patent data. This is a form of the Active Record pattern. To achieve this, the “record” is a Pydantic model, and it has a “manager” that is located at .objects.

Basic Structure#

The basic structure of a Patent Client API wrapper looks like this:

  • some_api

    • __init__.py

    • api.py

    • model.py

    • manager.py

The model.py file should contain Pydantic v2 models representing the output of the API, using patent_client.util.pydantic_util.BaseModel instead of pydantic.BaseModel. The api.py file should contain at least one class with methods that call various API functions. The actual structure of the api.py file does not matter, but each method should return instances of the Pydantic models defined in model.py. The manager.py file contains a subclass of patent_client.util.manager.Manager that wraps the API classes in api.py and implements the manager protocol described below.

Other files can also be included in the API folder to support other functions. Common ones include:

  • session.py - if any extensive customization of the base PatentClientSession is necessary, it goes here.

  • convert.py - if any data conversion is necessary from the API output to the Pydantic input, put that here.

  • query.py - if complex logic is necessary to convert input to the manager’s .filter method, put that here.

Each of these is discussed in more detail below.

API & PatentClientSession#

The api.py file should use an instance of patent_client.session.PatentClientSession to call the API, using only async methods. PatentClientSession is a subclass of hishel.AsyncCacheClient, which is itself a subclass of httpx.AsyncClient. Patent Client uses httpx exclusively over the more popular requests library because (1) an increasing number of APIs require HTTP/2, which requests does not support, and (2) httpx supports asyncio. That said, if you’re coming from a requests background, fear not! The httpx interface is nearly (but not entirely) identical to requests.
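The shape of an api.py class can be sketched as follows. This is only an illustration under stated assumptions: FakeSession and its get_json method are hypothetical stand-ins for PatentClientSession, the URL is a placeholder, and a dataclass takes the place of a Pydantic model so the example runs with the standard library alone.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Application:
    """Stand-in for a Pydantic model that would live in model.py."""

    number: str
    title: str


class FakeSession:
    """Hypothetical stand-in for PatentClientSession; returns canned JSON."""

    async def get_json(self, url: str, params: dict) -> dict:
        await asyncio.sleep(0)  # simulate awaiting network I/O
        return {"number": params["number"], "title": "Widget"}


class ApplicationApi:
    """Each method awaits the session and returns model instances."""

    def __init__(self, session: FakeSession) -> None:
        self.session = session

    async def get_application(self, number: str) -> Application:
        data = await self.session.get_json(
            "https://example.com/applications", {"number": number}
        )
        return Application(**data)


app = asyncio.run(ApplicationApi(FakeSession()).get_application("16123456"))
```

The point of the layout is that callers never see raw responses: the API class owns the HTTP details and hands back typed records.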

Models#

Models are Pydantic Models that subclass patent_client.util.pydantic_util.BaseModel. This special version of BaseModel automatically detects the corresponding manager (discussed below) and adds some convenience functions. When used:

  • Model.objects holds a manager that would retrieve every Model in the data source.

  • The Model supports a .to_dict() method to convert it to a dictionary, and a .to_pandas() method to convert it to a pandas Series.

Models can use any Pydantic features, such as computed fields for additional properties. Models may also include:

  • Relationships - that traverse a relationship to a related model.

  • Downloaders - that download some sort of content related to the model.

Relationships#

You can create properties of a Model that link to another model using patent_client.util.related.get_model. With get_model, you can dynamically retrieve another model, and then use an active record call on that model. get_model is preferred over importing the model directly to reduce the risk of circular imports.

Example:

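from patent_client.util.pydantic_util import BaseModel
from patent_client.util.related import get_model

class USApplication(BaseModel):
    patent_number: str
    ...
    @property
    def patent(self):
        # Resolve the Patent model lazily to avoid a circular import
        return get_model("patent_client.Patent").objects.get(self.patent_number)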

In that example, if you have a USApplication instance, you can get the corresponding patent through its .patent property.

Downloaders#

Some models have downloads related to them, like Assignment PDFs or Patent and Publication documents. Downloaders should:

  • Be initially implemented as an asynchronous .adownload method that uses the session.adownload method on the related session.

  • Have a companion .download method that simply aliases .adownload using patent_client.util.asyncio_util.run_sync.

  • Return a pathlib.Path object to the downloaded file.
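The three requirements above can be sketched with the standard library alone. The names here are assumptions made for illustration: FakeSession is a hypothetical stand-in for the real session (whose adownload signature is not shown in this document), Assignment is a toy record, and asyncio.run stands in for patent_client.util.asyncio_util.run_sync.

```python
import asyncio
import tempfile
from pathlib import Path


class FakeSession:
    """Hypothetical stand-in for a session with an async adownload method."""

    async def adownload(self, url: str, path: Path) -> Path:
        # A real session would stream the HTTP response body to disk here.
        await asyncio.sleep(0)
        path.write_bytes(f"content of {url}".encode())
        return path


class Assignment:
    """Toy record with a downloadable document (not the real model)."""

    def __init__(self, url: str, session: FakeSession) -> None:
        self.url = url
        self._session = session

    async def adownload(self, directory: Path) -> Path:
        # The asynchronous method is the primary implementation.
        target = directory / "assignment.pdf"
        return await self._session.adownload(self.url, target)

    def download(self, directory: Path) -> Path:
        # Synchronous companion; asyncio.run is an assumption standing in
        # for the library's run_sync helper.
        return asyncio.run(self.adownload(directory))


with tempfile.TemporaryDirectory() as tmp:
    saved = Assignment("https://example.com/a.pdf", FakeSession()).download(Path(tmp))
    saved_name = saved.name
    saved_text = saved.read_text()
```

Keeping the logic in the async method and deriving the sync one from it means there is only one implementation to maintain.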

Managers#

When filtering, ordering, or values methods are called on a Manager, it returns a new Manager object that combines the arguments of the old manager with the new arguments. In this way, any given Manager is immutable. The key data in a Manager is held in a ManagerConfig object stored at Manager.config. Writing a manager requires subclassing patent_client.util.manager.Manager and defining these methods:

Manager._aget_results

This method should return an AsyncIterator across the model results, based on the contents of the ManagerConfig object at Manager.config.

Manager.alen

This method should be an async method that returns the number of results to be retrieved by the manager, based on the contents of the ManagerConfig object.
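To make the manager protocol concrete, here is a toy version built only on the standard library. It is a sketch, not the library's implementation: ToyManager does not subclass the real Manager, the ManagerConfig field names are assumptions, and RECORDS is an in-memory list standing in for a remote API.

```python
import asyncio
from dataclasses import dataclass, field, replace
from typing import AsyncIterator, Optional


@dataclass(frozen=True)
class ManagerConfig:
    """Toy stand-in for the real ManagerConfig; field names are assumptions."""

    filter: dict = field(default_factory=dict)
    order_by: tuple = ()


# Stand-in for a remote data source.
RECORDS = [
    {"number": "1", "status": "open"},
    {"number": "2", "status": "closed"},
    {"number": "3", "status": "open"},
]


class ToyManager:
    def __init__(self, config: Optional[ManagerConfig] = None) -> None:
        self.config = config or ManagerConfig()

    def filter(self, **kwargs) -> "ToyManager":
        # Merge old and new criteria into a fresh config; self is untouched.
        merged = {**self.config.filter, **kwargs}
        return ToyManager(replace(self.config, filter=merged))

    def _matches(self, record: dict) -> bool:
        return all(record.get(k) == v for k, v in self.config.filter.items())

    async def _aget_results(self) -> AsyncIterator[dict]:
        # A real manager would page through API responses here.
        for record in RECORDS:
            if self._matches(record):
                yield record

    async def alen(self) -> int:
        # A real manager would usually read a total count from the API.
        return sum(1 for record in RECORDS if self._matches(record))


async def collect(manager: ToyManager) -> list:
    return [record async for record in manager._aget_results()]


base = ToyManager()
open_only = base.filter(status="open")
open_numbers = [r["number"] for r in asyncio.run(collect(open_only))]
open_count = asyncio.run(open_only.alen())
```

Because filter builds a new object, base can be reused or refined further without any risk that an earlier query changes underneath it.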

Manager Discovery#

A base, blank manager (one that would return all records) is attached to searchable models as Model.objects. This is done automatically when a model is defined in a model.py module and there is a corresponding manager in a manager.py file. For example:

model.py

from patent_client.util.pydantic_util import BaseModel
class Model(BaseModel):
    # Some fields
    ...

manager.py

from patent_client.util.manager import Manager
class ModelManager(Manager):
    # an implementation
    ...

No additional configuration is needed. If the API is particularly complex, such that model and manager are packages and not modules, this still works as long as the manager is imported in the __init__.py of the manager package. For example:

model/submodel.py

from patent_client.util.pydantic_util import BaseModel
class SubModel(BaseModel):
    # Some fields
    ...

manager/submanager.py

from patent_client.util.manager import Manager
class SubModelManager(Manager):
    # an implementation
    ...

This arrangement does not work unless the manager is also imported in the manager package's __init__.py:

manager/__init__.py

from .submanager import SubModelManager

Alternatively, you can explicitly define the location of a manager by setting __manager__ on the model to a dotted-path string:

model.py

from patent_client.util.pydantic_util import BaseModel
class Model(BaseModel):
    __manager__ = "patent_client.manager.ModelManager"
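A dotted-path string like this can be turned into an object with importlib. The sketch below shows one plausible way to do it and is not taken from the library's source; resolve_dotted_path and LazyManagerModel are hypothetical names, and a standard-library class is used as the target so the example runs without patent_client installed.

```python
import importlib


def resolve_dotted_path(path: str):
    """Import 'package.module.Name' and return the object called Name."""
    module_path, _, attr_name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"{path!r} is not a dotted path")
    module = importlib.import_module(module_path)
    return getattr(module, attr_name)


class LazyManagerModel:
    # A stdlib class is the target here purely so the sketch is runnable.
    __manager__ = "collections.OrderedDict"

    @classmethod
    def manager_class(cls):
        # Resolution happens on first use, not when this module is imported.
        return resolve_dotted_path(cls.__manager__)


resolved = LazyManagerModel.manager_class()
```

Deferring the import until the manager is first needed is also what lets a model and its manager refer to each other without a circular import.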

Relationships#

In some circumstances, it would be nice to get information related to a model class, even if it resides on another system supported by patent_client. Relationships are how you get there. For example, if you retrieve a PtabTrial object from the PTAB API, it has an attribute .us_application that will return a USApplication object from the PEDS API.

The .util package also has two functions that make this possible: one_to_one and one_to_many. Both functions work the same way. They take a first argument, which is a string locating the other object, and then a keyword argument, where the keyword is a filter criterion on the related model and the value is the name of an attribute on the current model to use as the filter value.

The only difference between the two functions is that one_to_one calls objects.get, returning a single object, while one_to_many calls objects.filter and returns a manager of the related objects. For example, we can use these to link the Trials and Documents as below:

class PtabTrial(Model):
    ...
    documents = one_to_many('patent_client.PtabDocument', trial_number='trial_number')
    ...

class PtabDocument(Model):
    ...
    trial = one_to_one('patent_client.PtabTrial', trial_number='trial_number')
    ...

Once these relationships are in place, we can move from one record to the other seamlessly:

>>> from patent_client import PtabProceeding
>>> a = PtabProceeding.objects.get('IPR2017-00001') # doctest: +SKIP

>>> a.documents[0] # doctest: +ELLIPSIS
PtabDocument(...)

>>> a.documents[0].proceeding # doctest: +ELLIPSIS
PtabProceeding(...)
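For readers curious how helpers of this kind can work, the following is a self-contained sketch built on plain properties. It is an assumption-laden illustration rather than the library's code: REGISTRY replaces the dotted-path lookup, ToyManager is an in-memory manager, Trial and Document are toy classes, and this get takes keyword criteria.

```python
class ToyManager:
    """Minimal in-memory manager offering get() and filter()."""

    def __init__(self, records):
        self._records = list(records)

    def filter(self, **criteria):
        matches = [
            r for r in self._records
            if all(getattr(r, k) == v for k, v in criteria.items())
        ]
        return ToyManager(matches)

    def get(self, **criteria):
        matches = self.filter(**criteria)._records
        if len(matches) != 1:
            raise LookupError(f"expected 1 record, found {len(matches)}")
        return matches[0]

    def __getitem__(self, index):
        return self._records[index]

    def __len__(self):
        return len(self._records)


REGISTRY = {}  # model name -> manager; stands in for the dotted-path lookup


def _related(method_name, model_name, **mapping):
    # mapping is {filter keyword on the other model: attribute on this model}
    def getter(self):
        criteria = {key: getattr(self, attr) for key, attr in mapping.items()}
        return getattr(REGISTRY[model_name], method_name)(**criteria)
    return property(getter)


def one_to_one(model_name, **mapping):
    return _related("get", model_name, **mapping)


def one_to_many(model_name, **mapping):
    return _related("filter", model_name, **mapping)


class Trial:
    documents = one_to_many("Document", trial_number="trial_number")

    def __init__(self, trial_number):
        self.trial_number = trial_number


class Document:
    trial = one_to_one("Trial", trial_number="trial_number")

    def __init__(self, trial_number, title):
        self.trial_number = trial_number
        self.title = title


trial = Trial("IPR2017-00001")
REGISTRY["Trial"] = ToyManager([trial, Trial("IPR2017-00002")])
REGISTRY["Document"] = ToyManager([
    Document("IPR2017-00001", "Petition"),
    Document("IPR2017-00001", "Decision"),
    Document("IPR2017-00002", "Petition"),
])
```

Here trial.documents yields a manager over the two matching documents, and each document's .trial leads back to the same Trial instance.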