Developer Overview#
The goal of this project is to provide easy-to-use access to public patent data through a simple API.
The general idea is to implement a subset of the Django QuerySet API for accessing the various sources of patent data. This is a form of the Active Record pattern. To achieve this, the “record” is a Pydantic model, and it has a “manager” located at .objects.
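As a rough illustration of the pattern described above, the following sketch attaches a manager to a record class at .objects. It is not the library's implementation: a frozen dataclass stands in for the Pydantic model, InMemoryManager and its sample data are hypothetical, and everything is synchronous for brevity, whereas the real wrappers are async.

```python
from dataclasses import dataclass
from typing import ClassVar


class InMemoryManager:
    """Hypothetical manager that looks records up in a plain list."""

    def __init__(self, records):
        self._records = list(records)

    def filter(self, **criteria):
        # Return a new manager narrowed to records matching every criterion.
        matches = [
            r for r in self._records
            if all(getattr(r, key) == value for key, value in criteria.items())
        ]
        return InMemoryManager(matches)

    def get(self, **criteria):
        # Like Django's .get(): exactly one match, otherwise an error.
        matches = self.filter(**criteria)._records
        if len(matches) != 1:
            raise LookupError(f"expected 1 record, found {len(matches)}")
        return matches[0]

    def __iter__(self):
        return iter(self._records)


@dataclass(frozen=True)
class Patent:
    number: str
    title: str

    # The manager lives on the class, not on individual records.
    objects: ClassVar[InMemoryManager]


# Sample data, invented for this sketch.
Patent.objects = InMemoryManager([
    Patent("1000001", "Widget"),
    Patent("1000002", "Gadget"),
])
```

With this in place, Patent.objects.get(number="1000002") returns a single record, and Patent.objects.filter(title="Widget") returns a narrower manager that can be iterated.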
Basic Structure#
The basic structure of a Patent Client API wrapper looks like this:
patent_client
    _async
        provider
            some_api
                __init__.py
                api.py
                model.py
                manager.py
All wrappers should live in the _async subfolder and use async/await syntax. The synchronous versions of the wrappers are generated by the script unasync.py and placed in the parallel _sync folder.
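The project's unasync.py is not reproduced here, so the following is only a toy sketch of the general idea behind such a script: rewrite async source text into a synchronous equivalent by substituting tokens. The replacement table and the to_sync name are assumptions made for illustration, and the real script may work quite differently.

```python
import re

# Hypothetical token map; the project's real unasync.py may use different rules.
REPLACEMENTS = [
    (r"\basync def\b", "def"),
    (r"\basync for\b", "for"),
    (r"\basync with\b", "with"),
    (r"\bawait\s+", ""),
    (r"\b_async\b", "_sync"),  # point imports at the generated package
]


def to_sync(source: str) -> str:
    """Rewrite async source text into a synchronous equivalent."""
    for pattern, replacement in REPLACEMENTS:
        source = re.sub(pattern, replacement, source)
    return source
```

A purely textual rewrite like this is only safe when the async code follows strict conventions, which is one reason to keep all hand-written wrappers in a single _async tree.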
The model.py file should contain Pydantic v2 models representing the output of the API, using patent_client.util.pydantic_util.BaseModel instead of pydantic.BaseModel.
The api.py file should contain at least one class with methods that call the various API functions. The internal structure of api.py does not matter, but each method should return instances of the Pydantic models defined in model.py.
The manager.py file contains a subclass of patent_client.util.manager.Manager that serves as a wrapper over the API classes in api.py and implements the manager protocol described below.
Other files can also be included in the API folder to support other functions. Common ones include:
session.py - if any extensive customization of the base PatentClientSession is necessary, it goes here.
convert.py - if any data conversion is necessary from the API output to the Pydantic input, put that here.
query.py - if complex logic is necessary to convert input to the manager’s .filter method, put that here.
Each of these is discussed in more detail below.
API & PatentClientSession#
The api.py file should use an instance of patent_client._async.http_client.PatentClientSession to access the API, using only async methods. PatentClientSession is a subclass of hishel.AsyncCacheClient, which is itself a subclass of httpx.AsyncClient. Patent Client uses httpx exclusively over the more popular requests library because (1) an increasing number of APIs require HTTP/2, which requests does not support, and (2) httpx supports asyncio. That said, if you’re coming from a requests background, fear not! The httpx interface is nearly (but not entirely) identical to that of requests.
Models#
Models are Pydantic models that subclass patent_client.util.pydantic_util.BaseModel. This special version of BaseModel automatically detects the corresponding manager (discussed below) and adds some convenience functions. When used:
The Model.objects attribute holds a manager that would retrieve every Model in the data source.
The Model supports a .to_dict() method to convert it to a dictionary, and a .to_pandas() method to convert it to a Pandas Series.
Models can use any Pydantic features, such as computed fields for additional properties. Models may also include:
Relationships - that traverse a relationship to a related model.
Downloaders - that download some sort of content related to the model.
Relationships#
You can create properties of a Model that link to another model using patent_client.util.related.get_model. With get_model, you can dynamically retrieve another model and then use an active record call on that model. get_model is preferred over importing the model directly because it reduces the risk of circular imports.
Example:
class USApplication(BaseModel):
    patent_number: str
    ...

    @async_property
    def patent(self):
        return self._get_model("patent_client.Patent").objects.get(self.patent_number)
In that example, if you have a USApplication instance, you can get the corresponding patent at USApplication.patent.
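To show why a string-based lookup avoids circular imports, here is a minimal sketch of resolving a dotted path at call time with importlib. This is an assumption about the general technique, not the body of the library's get_model; the function below simply shares its name for readability.

```python
import importlib


def get_model(dotted_path: str):
    """Import and return the object named by a 'package.module.Name' string."""
    module_path, _, name = dotted_path.rpartition(".")
    # The import happens only when the function is called, not when the
    # module that defines the relationship is first loaded.
    module = importlib.import_module(module_path)
    return getattr(module, name)


# Demonstrated with a standard-library class so the sketch runs anywhere.
OrderedDict = get_model("collections.OrderedDict")
```

Because the target module is imported lazily, two model modules can refer to each other by string without either one importing the other at load time.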
Downloaders#
Some models have downloads related to them, like Assignment PDFs or Patent and Publication documents. Downloaders should:
Be initially implemented as an asynchronous .download method that uses the session.download method on the related session.
Return a pathlib.Path object pointing to the downloaded file.
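The two requirements above can be sketched as follows. FakeSession is a stand-in that writes fixed bytes instead of performing a request, and its download(url, path) signature, the Assignment fields, and the URL are all assumptions for illustration; the real session.download may take different arguments.

```python
import asyncio
import tempfile
from dataclasses import dataclass
from pathlib import Path


class FakeSession:
    """Stand-in for the real session; writes fixed bytes instead of fetching a URL."""

    async def download(self, url: str, path: Path) -> Path:
        await asyncio.sleep(0)  # yield control, as real network I/O would
        path.write_bytes(b"%PDF-1.4 placeholder")
        return path


@dataclass
class Assignment:
    id: str
    session: FakeSession

    async def download(self, directory: Path) -> Path:
        # Delegate the transfer to the session and hand back the file's Path.
        url = f"https://example.invalid/assignments/{self.id}.pdf"  # hypothetical URL
        return await self.session.download(url, directory / f"{self.id}.pdf")
```

Returning a pathlib.Path rather than raw bytes lets callers open, move, or inspect the file without knowing how it was fetched.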
Managers#
When a filtering, ordering, or values method is called on a Manager, it returns a new Manager object that combines the old manager's arguments with the new ones. In this way, any given Manager is immutable. The key data in the Manager is held in a ManagerConfig object stored at Manager.config.
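A minimal sketch of this copy-on-modify behaviour is shown below. The field names on ManagerConfig (filter, order_by) are assumptions chosen for illustration, and the Manager class here is a simplified stand-in rather than the library's base class.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass(frozen=True)
class ManagerConfig:
    # Field names are assumptions made for this sketch.
    filter: dict = field(default_factory=dict)
    order_by: tuple = ()


class Manager:
    def __init__(self, config: Optional[ManagerConfig] = None):
        self.config = config if config is not None else ManagerConfig()

    def filter(self, **kwargs) -> "Manager":
        # Merge old and new criteria into a fresh config; self is untouched.
        merged = {**self.config.filter, **kwargs}
        return type(self)(replace(self.config, filter=merged))

    def order_by(self, *fields: str) -> "Manager":
        combined = self.config.order_by + fields
        return type(self)(replace(self.config, order_by=combined))
```

Because every call returns a new object, a base manager such as Model.objects can be shared freely: chaining .filter(...).order_by(...) on it never changes what the base manager would return.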
Managers require subclassing patent_client.util.manager.AsyncManager and defining these methods:
Manager._get_results
This method should return an AsyncIterator over the model results, based on the contents of the ManagerConfig object at Manager.config.
Manager.count
This method should be an async method that returns the number of results to be retrieved by the manager, based on the contents of the ManagerConfig object.
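The two required methods might look roughly like this for a paginated API. The sketch does not subclass the real AsyncManager: FakeApi, its fetch_page and total methods, and the limit attribute are hypothetical, and rows are plain dictionaries where a real manager would yield model instances driven by Manager.config.

```python
import asyncio
from typing import AsyncIterator, Optional


class FakeApi:
    """Stand-in for an api.py class; serves rows from memory in pages."""

    ROWS = [{"number": str(n)} for n in range(1, 6)]

    async def fetch_page(self, page: int, size: int = 2) -> list:
        await asyncio.sleep(0)
        return self.ROWS[page * size:(page + 1) * size]

    async def total(self) -> int:
        await asyncio.sleep(0)
        return len(self.ROWS)


class PatentManager:
    def __init__(self, api: FakeApi, limit: Optional[int] = None):
        self.api = api
        self.limit = limit  # a real manager would read this from its config

    async def _get_results(self) -> AsyncIterator[dict]:
        # Walk pages until the API runs dry or the limit is reached.
        page, yielded = 0, 0
        while True:
            rows = await self.api.fetch_page(page)
            if not rows:
                return
            for row in rows:
                if self.limit is not None and yielded >= self.limit:
                    return
                yield row
                yielded += 1
            page += 1

    async def count(self) -> int:
        total = await self.api.total()
        return total if self.limit is None else min(total, self.limit)


async def collect(manager: PatentManager) -> list:
    """Drain the async iterator into a list."""
    return [row async for row in manager._get_results()]
```

Writing _get_results as an async generator means pages are fetched only as the caller consumes results, so a small limit avoids requesting pages that will never be read.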
Manager Discovery#
A base, blank manager (one that would return all records) is attached to searchable models as Model.objects. This is done automatically when a model is placed in a model.py module and there is a corresponding manager in a manager.py module. For example:
model.py
from patent_client.util.pydantic_util import BaseModel

class Model(BaseModel):
    # Some fields
    ...
manager.py
from patent_client.util.manager import Manager

class ModelManager(Manager):
    # an implementation
    ...
No additional configuration is needed. If the API is particularly complex, such that model and manager are packages rather than modules, this still works as long as the manager is imported in the __init__.py of the manager package. For example:
model/submodel.py
from patent_client.util.pydantic_util import BaseModel

class SubModel(BaseModel):
    # Some fields
    ...
manager/submanager.py
from patent_client.util.manager import Manager

class SubModelManager(Manager):
    # an implementation
    ...
This does not work unless you also have this:
manager/__init__.py
from .submanager import SubModelManager
Alternatively, you can explicitly define the location of a manager with a dotted-path string at __manager__:
model.py
from patent_client.util.pydantic_util import BaseModel

class Model(BaseModel):
    __manager__ = "patent_client.manager.ModelManager"
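The discovery rules in this section can be approximated by a small path-resolution function. This is a sketch under stated assumptions, not the library's code: it assumes the manager class is named after the model with a Manager suffix (as in the examples above) and that it is importable from the sibling manager module or package. The name manager_path_for is hypothetical.

```python
def manager_path_for(model_cls) -> str:
    """Return the dotted path where a model's manager is presumed to live."""
    # An explicit __manager__ string always wins.
    explicit = getattr(model_cls, "__manager__", None)
    if explicit:
        return explicit

    parts = model_cls.__module__.split(".")
    if "model" not in parts:
        raise ValueError(f"{model_cls.__module__!r} is not inside a model module")

    # Replace the last 'model' component (and anything nested below it)
    # with 'manager', then append the conventional class name.
    idx = len(parts) - 1 - parts[::-1].index("model")
    return ".".join(parts[:idx] + ["manager", f"{model_cls.__name__}Manager"])
```

Under these assumptions, a SubModel defined in pkg.model.submodel resolves to pkg.manager.SubModelManager, which is why the manager must be importable from manager/__init__.py when manager is a package.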