Developer Overview#

The goal of this project is to provide easy-to-use access to public patent data through a simple API. The general idea is to implement a subset of the Django QuerySet API functionality for accessing the various sources of patent data. This is a form of the Active Record pattern. To achieve this, the “record” is a Pydantic model, and it has a “manager” that is located at .objects.

Basic Structure#

The basic structure of a Patent Client API wrapper looks like this:

  • patent_client

    • _async

      • provider

        • some_api

          • __init__.py

          • api.py

          • model.py

          • manager.py

All wrappers should live in the _async subfolder and make use of async/await syntax. The synchronous versions of the wrapper are generated by the script unasync.py in the parallel _sync folder.
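
To illustrate the idea of generating synchronous code from the async source, here is a minimal sketch of a text-rewriting pass. The replacement table and the to_sync function are hypothetical and only show the general technique; the project's actual unasync.py may work differently.

```python
import re

# Hypothetical replacement rules; the real unasync.py may use a different set.
REPLACEMENTS = [
    (r"\basync def\b", "def"),
    (r"\basync with\b", "with"),
    (r"\basync for\b", "for"),
    (r"\bawait\s+", ""),
    (r"\b_async\b", "_sync"),
]


def to_sync(source: str) -> str:
    """Rewrite async source text into its synchronous equivalent."""
    for pattern, replacement in REPLACEMENTS:
        source = re.sub(pattern, replacement, source)
    return source


ASYNC_SOURCE = '''from patent_client._async.http_client import PatentClientSession

async def fetch(session, url):
    response = await session.get(url)
    return response
'''

sync_source = to_sync(ASYNC_SOURCE)
# The result must still be valid Python; compile() checks syntax without running it.
compile(sync_source, "<sync>", "exec")
```

Because the sync code is derived mechanically, changes should be made in the _async tree and the generator re-run, rather than editing the generated files by hand.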

The model.py file should contain Pydantic v2 models representing the output of the API, using patent_client.util.pydantic_util.BaseModel instead of pydantic.BaseModel.

The api.py file should contain at least one class with methods that call various API functions. The actual structure of the api.py file does not matter, but each method should return instances of the Pydantic models defined in model.py.
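
As a rough sketch of that contract, the example below shows an API class whose async method returns model instances. Everything in it is a hypothetical stand-in so that it runs with the standard library alone: Assignment is a dataclass in place of a Pydantic model, and FakeSession with its get_json method takes the place of the real session.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Assignment:
    """Hypothetical stand-in for a Pydantic model defined in model.py."""

    id: str
    assignee: str


class FakeSession:
    """Hypothetical stand-in for the HTTP session; returns canned JSON."""

    async def get_json(self, url: str, params: dict) -> dict:
        return {"results": [{"id": "1", "assignee": params["q"]}]}


class AssignmentApi:
    """Each method calls one API function and returns model instances."""

    def __init__(self, session):
        self.session = session

    async def search(self, query: str) -> list:
        data = await self.session.get_json(
            "https://example.invalid/search", {"q": query}
        )
        # Convert raw API output into model instances before returning.
        return [Assignment(**row) for row in data["results"]]


results = asyncio.run(AssignmentApi(FakeSession()).search("Acme"))
```

Keeping the conversion to models inside the API class means the manager never has to handle raw response data.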

The manager.py file contains a subclass of patent_client.util.manager.Manager and serves as a wrapper over the API classes in api.py that implements the manager protocol below.

Other files can also be included in the API folder to support other functions. Common ones include:

  • session.py - if any extensive customization of the base PatentClientSession is necessary, it goes here.

  • convert.py - if any data conversion is necessary from the API output to the Pydantic input, put that here.

  • query.py - if complex logic is necessary to convert input to the manager’s .filter method, put that here.

Each of these is discussed in more detail below.

API & PatentClientSession#

The api.py file should use an instance of patent_client._async.http_client.PatentClientSession to access the methods of the API using only async methods. The PatentClientSession is a subclass of hishel.AsyncCacheClient, which is itself a subclass of httpx.AsyncClient. Patent Client uses this exclusively over the more popular requests library because (1) an increasing number of APIs require the use of HTTP/2, which is not supported by requests, and (2) httpx has support for asyncio. That said, if you’re coming from a requests background, fear not! The httpx interface is nearly (but not entirely) identical to requests.

Models#

Models are Pydantic Models that subclass patent_client.util.pydantic_util.BaseModel. This special version of BaseModel automatically detects the corresponding manager (discussed below) and adds some convenience functions. When used:

  • Model.objects holds a manager that would retrieve every Model in the data source.

  • The Model supports a .to_dict() method to convert it to a dictionary, and a .to_pandas() method to convert it to a Pandas series.

Models can use any Pydantic features, such as computed fields for additional properties. Models may also include:

  • Relationships - that traverse a relationship to a related model.

  • Downloaders - that download some sort of content related to the model.

Relationships#

You can create properties of a Model that link to another model using patent_client.util.related.get_model. With get_model, you can dynamically retrieve another model, and then use an active record call on that model. get_model is preferred over importing the model directly to reduce the risk of circular imports.

Example:

class USApplication(BaseModel):
    patent_number: str
    ...

    @async_property
    async def patent(self):
        # Look the model up by name at call time to avoid a circular import
        return await self._get_model("patent_client.Patent").objects.get(self.patent_number)

In this example, if you have a USApplication instance, you can get the corresponding patent from that instance's .patent property.
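
To show why a lookup by dotted name helps with circular imports, here is a minimal sketch of resolving a class from a string at call time. The function get_model_sketch is hypothetical and is not the library's implementation; the point is only that the import happens when the property is used, not when the module is loaded.

```python
import importlib


def get_model_sketch(dotted_path: str):
    """Resolve 'package.module.ClassName' to the class object, lazily."""
    module_path, _, class_name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, class_name)


# A standard-library class is used here so the sketch runs anywhere.
Fraction = get_model_sketch("fractions.Fraction")
half = Fraction(1, 2)
```

Since nothing is imported until the function is called, two model modules can refer to each other by name without either importing the other at load time.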

Downloaders#

Some models have downloads related to them, like Assignment PDFs or Patent and Publication documents. Downloaders should:

  • Be initially implemented as an asynchronous .download method that uses the session.download method on the related session.

  • Return a pathlib.Path object to the downloaded file.
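
The two requirements above can be sketched as follows. FakeSession, its download signature, the Assignment class, and the URL are all hypothetical stand-ins chosen so the example runs without network access; the real session's download method may take different arguments.

```python
import asyncio
import tempfile
from pathlib import Path


class FakeSession:
    """Hypothetical stand-in for the session; writes canned bytes to disk."""

    async def download(self, url: str, path: Path) -> Path:
        path.write_bytes(b"%PDF-1.4 placeholder")
        return path


class Assignment:
    """Hypothetical model with a related downloadable document."""

    def __init__(self, id: str, session):
        self.id = id
        self.session = session

    async def download(self, directory: Path) -> Path:
        # Delegate the transfer to the session and return the resulting Path.
        target = Path(directory) / f"{self.id}.pdf"
        url = f"https://example.invalid/{self.id}.pdf"
        return await self.session.download(url, target)


with tempfile.TemporaryDirectory() as tmp:
    saved = asyncio.run(Assignment("12345", FakeSession()).download(Path(tmp)))
    # Capture the facts while the temporary directory still exists.
    saved_name, saved_exists = saved.name, saved.exists()
```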

Managers#

When a filtering, ordering, or values method is called on a Manager, it returns a new Manager object that combines the arguments of the old manager with the new arguments. In this way, any given Manager is immutable. The key data in the Manager is in a ManagerConfig object stored at Manager.config. Managers require subclassing patent_client.util.manager.AsyncManager and defining these methods:

Manager._get_results

This method should return an AsyncIterator across the model results, based on the contents of the ManagerConfig object at Manager.config.

Manager.count

This method should be an async method that returns the number of results to be retrieved by the manager, based on the contents of the ManagerConfig object.
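
The protocol described in this section can be sketched with the standard library alone. ConfigSketch, ManagerSketch, and the in-memory RECORDS list below are hypothetical stand-ins for ManagerConfig, a Manager subclass, and a remote data source; a real manager would call the API classes instead of filtering a list.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import AsyncIterator


@dataclass(frozen=True)
class ConfigSketch:
    """Hypothetical stand-in for ManagerConfig: immutable query state."""

    filters: tuple = ()
    order_by: tuple = ()


RECORDS = [
    {"number": "3", "status": "pending"},
    {"number": "1", "status": "granted"},
    {"number": "2", "status": "granted"},
]


class ManagerSketch:
    def __init__(self, config: ConfigSketch = ConfigSketch()):
        self.config = config

    def filter(self, **kwargs) -> "ManagerSketch":
        # Return a new manager; the existing one is never modified.
        merged = self.config.filters + tuple(kwargs.items())
        return ManagerSketch(replace(self.config, filters=merged))

    def order_by(self, *fields: str) -> "ManagerSketch":
        merged = self.config.order_by + fields
        return ManagerSketch(replace(self.config, order_by=merged))

    def _matching(self) -> list:
        rows = [r for r in RECORDS if all(r[k] == v for k, v in self.config.filters)]
        for key in reversed(self.config.order_by):
            rows.sort(key=lambda r: r[key])
        return rows

    async def _get_results(self) -> AsyncIterator[dict]:
        # Yield results one at a time, driven only by self.config.
        for row in self._matching():
            yield row

    async def count(self) -> int:
        return len(self._matching())


async def main():
    everything = ManagerSketch()
    granted = everything.filter(status="granted").order_by("number")
    numbers = [row["number"] async for row in granted._get_results()]
    return await everything.count(), await granted.count(), numbers


total, granted_count, granted_numbers = asyncio.run(main())
```

Note that calling filter on the blank manager leaves it unchanged, which is why it can still report the full count afterwards.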

Manager Discovery#

A base, blank manager (that would return all records) is attached to searchable models as Model.objects. This is done automatically when a model is placed in a model.py module and there is a corresponding manager in a manager.py file. For example:

model.py

from patent_client.util.pydantic_util import BaseModel

class Model(BaseModel):
    # Some fields
    ...

manager.py

from patent_client.util.manager import Manager

class ModelManager(Manager):
    # an implementation
    ...

No additional configuration is needed. If the API is particularly complex, such that model and manager are packages and not modules, this still works as long as the manager is listed in the __init__.py of the manager package. For example:

model/submodel.py

from patent_client.util.pydantic_util import BaseModel

class SubModel(BaseModel):
    # Some fields
    ...

manager/submanager.py

from patent_client.util.manager import Manager

class SubModelManager(Manager):
    # an implementation
    ...

This does not work unless you also have the following:

manager/__init__.py

from .submanager import SubModelManager

Alternatively, you can expressly define the location of a manager with a string at __manager__:

model.py

from patent_client.util.pydantic_util import BaseModel
class Model(BaseModel):
    __manager__ = "patent_client.manager.ModelManager"
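
One plausible reading of the discovery rules above is sketched below. The function manager_path_sketch and the naming convention it encodes (swap the model module for manager, append Manager to the class name) are assumptions inferred from the examples in this section, not the library's actual lookup code.

```python
from typing import Optional


def manager_path_sketch(
    model_module: str, model_name: str, explicit: Optional[str] = None
) -> str:
    """Guess the dotted path of the manager for a model (assumed convention)."""
    if explicit is not None:
        # An explicit __manager__ string always wins.
        return explicit
    parts = model_module.split(".")
    # Everything before the "model" segment is the API package.
    package = ".".join(parts[: parts.index("model")])
    return f"{package}.manager.{model_name}Manager"


simple = manager_path_sketch("patent_client._async.provider.some_api.model", "Model")
nested = manager_path_sketch("pkg.model.submodel", "SubModel")
override = manager_path_sketch(
    "pkg.model", "Model", explicit="patent_client.manager.ModelManager"
)
```

Under this assumed convention, a model in model/submodel.py resolves to a name in the top level of the manager package, which would explain why the manager has to be imported in manager/__init__.py.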