USPTO Bulk Data Storage System

API URL: https://developer.uspto.gov/api-catalog/bdss

Basic Usage

This API supports several of the BDSS API endpoints. General database lookups (i.e., using filter / limit / offset / order_by) are not available; only the following methods are supported:

  • Product.objects.filter_latest() - Provides an iterator of Product objects from the /products/all/latest endpoint.

  • Product.objects.filter_by_name(name) - Provides an iterator of Product objects that contain the text provided as name.

  • Product.objects.get_by_short_name(short_name) - Gets a Product object corresponding to the BDSS short name.

  • File.objects.filter_by_short_name(short_name, from_date, to_date) - Returns an iterator of File objects corresponding to the short name, starting with from_date and ending with to_date, inclusive.

The primary endpoint to be used here is File.objects.filter_by_short_name, which returns all files associated with a short name from the table below. You can optionally include from_date and to_date; if omitted, they default to the complete range of the related product.
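As a minimal sketch of the call described above, the helper below forwards only the date bounds that were actually supplied, so any omitted bound falls back to the library's own default. The helper name `list_files` and its `file_manager` parameter are hypothetical; `file_manager` is expected to be `File.objects`. The accepted date type is not stated here, so the ISO-format strings in the usage comment are an assumption, as is passing the dates by keyword.

```python
from typing import Any, Iterator, Optional


def list_files(
    file_manager: Any,
    short_name: str,
    from_date: Optional[str] = None,
    to_date: Optional[str] = None,
) -> Iterator[Any]:
    """Return an iterator of File objects for a BDSS short name.

    `file_manager` is expected to be `File.objects`. A date bound is only
    passed along when it was given, so the defaults apply otherwise.
    """
    kwargs = {}
    if from_date is not None:
        kwargs["from_date"] = from_date
    if to_date is not None:
        kwargs["to_date"] = to_date
    return file_manager.filter_by_short_name(short_name, **kwargs)


# Hypothetical usage (the date format is an assumption):
# files = list_files(File.objects, "PTGRXML", from_date="2023-01-01", to_date="2023-01-31")
```

Taking the manager as a parameter keeps the helper independent of how File is imported in a given installation.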

Once you have a File object, you can call either File.download to download the file, or await File.adownload to download it asynchronously.
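The two download styles might be used as in the sketch below. The helper names `download_all` and `adownload_all` and the `limit` parameter are hypothetical. Calling `download()` and `adownload()` without arguments is an assumption, and whatever those methods return is passed through unchanged.

```python
import asyncio
from typing import Any, Iterable, List


def download_all(files: Iterable[Any]) -> List[Any]:
    """Download each file in turn with the blocking `download` method."""
    return [f.download() for f in files]


async def adownload_all(files: Iterable[Any], limit: int = 4) -> List[Any]:
    """Download files concurrently with `adownload`, at most `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def fetch(f: Any) -> Any:
        # The semaphore caps how many downloads are in flight at once.
        async with semaphore:
            return await f.adownload()

    # gather preserves the input order in its results.
    return await asyncio.gather(*(fetch(f) for f in files))
```

Several of the products below are multiple gigabytes per weekly file, so capping concurrency is likely a sensible default; the right limit depends on bandwidth and disk space.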

If you want to view metadata related to the product, then use the Product endpoints.
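A small sketch of the Product lookups follows. The helper names and the `product_manager` parameter are hypothetical; `product_manager` is expected to be `Product.objects`. Because the attributes of a Product object are not listed here, the helpers return the objects themselves rather than reading specific fields.

```python
from itertools import islice
from typing import Any, List


def preview_products(product_manager: Any, name: str, limit: int = 5) -> List[Any]:
    """Return at most `limit` Product objects whose name contains `name`."""
    # filter_by_name yields an iterator, so islice avoids consuming all of it.
    return list(islice(product_manager.filter_by_name(name), limit))


def lookup_product(product_manager: Any, short_name: str) -> Any:
    """Fetch the single Product for a BDSS short name, e.g. "PTGRXML"."""
    return product_manager.get_by_short_name(short_name)
```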

Supported Products

Short Name

Title

Product Description

PTOFFACT

Patent application Office actions data (stata (.dta) and MS Excel (.csv))

Contains detailed information on 4.4 million Office actions mailed from 2008 through June 2017 for 2.2 million publicly viewable patent applications. The data are sourced from the text of Non-Final Rejection and Final Rejection Office actions issued by patent examiners to applicants during the patent examination process. The data files include information on grounds for rejection raised, the claims in question, and pertinent prior art.

PTMNFEE2

Patent maintenance fee events and description files

Contains recorded maintenance fee events for patents granted from September 1, 1981 to present. Each new weekly file (Tuesday) is cumulative with a file format of ASCII.

PTAPOATH

Patent and patent application Oath Signature data (JSON and PNG)

Contains a research dataset of images of signatures extracted from patent inventor oath documents used for validation of micro entity certifications ranging from SEP 1998 to SEP 2022. It includes 883,811 documents and oath document signature images, totals 40.5 GB, and is broken into 8 zip files for the Patent Application Series Codes 12 to 17, 29, and 35. Each of these zip files contains folders for each application number in a given series. The application folders contain the oath document identifier that includes the image(s) of the signature(s) as PNG, and a JSON file that contains the application number, the inventor name(s), and the confidence level of the signature extraction algorithm.

MCFAPPL

MCF patent application (patent application sequence)

Current U.S. classification information for all patent applications (non-provisional utility and plant) published by the USPTO from March 15, 2001 to present. Approx. 450 main divisions of technology, called classifications/classes, broken into approx. 150,000 subdivisions, called subclassifications/subclasses. Provided in published patent application number sequence with the current U.S. original classification/subclassification and any cross-reference classification/subclassifications with the format of ASCII text.

CPCMCPT

CPC MCF for U.S. patent grants

Contains Cooperative Patent Classification (CPC) classification information for all Utility patent grants issued by the U.S. Patent and Trademark Office (USPTO) from 1790 to present. It is available as XML with schemas or text monthly (usually by the 15th of the month).

CPCMCAPP

CPC MCF for U.S. patent applications

Contains Cooperative Patent Classification (CPC) classification information for all Utility patent applications published by the U.S. Patent and Trademark Office (USPTO) from March 15, 2001 to present. It is available as XML with schemas or text monthly (usually by the 15th of the month).

PASDL

Patent assignment daily XML (front file)

Contains daily (front file) patent assignment text (no drawings/images) for CY2023 derived from patent assignment recordations made at the USPTO. The file format is eXtensible Markup Language (XML) in accordance with the Patent Assignment Daily XML (PADX) Version 0.3 Document Type Definition (DTD).

PASYR

Patent assignment annual XML (backfile)

Contains (backfile - August 1980 - 2022) patent assignment text (no drawings/images) derived from patent assignment recordations made at the USPTO. The file format is eXtensible Markup Language (XML) in accordance with the Patent Assignment Daily XML (PADX) Version 0.3 Document Type Definition (DTD).

ECORSEXC

Patent assignment economics data (stata (.dta) and MS excel (.csv))

Contains detailed information on roughly 9.6 million patent assignments and other transactions recorded at the USPTO since 1970 and involving roughly 16.5 million patents and patent applications. For more information: http://www.uspto.gov/learning-and-resources/electronic-data-products/data

PTGRDT

Patent grant data/XML

Contains (JAN 2002 - present) the full text, images/drawings, and complex work units (tables, mathematical expressions, chemical structures, and genetic sequence data) of each patent grant (excludes reexaminations) issued weekly (Tuesdays). The file format is eXtensible Markup Language (XML) in accordance with the Patent Grant Version 4.5 International Common Element (ICE) Document Type Definition (DTD).

PTGRDSGM

Patent grant data/SGML

Contains the full text, images/drawings, and complex work units (tables, mathematical expressions, chemical structures, and genetic sequence data) of each patent grant (excludes reexaminations) issued weekly (Tuesdays) in CY2001. The file format is Standard Generalized Markup Language (SGML) in accordance with the U.S. Patent Grant Version 2.4 Document Type Definition (DTD).

PTBLXML

Patent grant bibliographic data/XML

Contains (JAN 2002 - present) the bibliographic text (front page) of each patent grant issued weekly (Tuesdays) (excludes images/drawings and reexaminations). The file format is eXtensible Markup Language (XML) in accordance with the Patent Grant International Common Element (ICE) Document Type Definition (DTD).

PTBLAPS

Patent grant bibliographic data/APS

Contains (JAN 1976 - DEC 2001) the bibliographic text (front page) of each patent grant issued weekly (Tuesdays) (excludes images/drawings and reexaminations). The file format is a subset of the Green Book, ASCII text.

PTBLSGM

Patent grant bibliographic data/SGML

Contains (JAN 2001 - DEC 2001) the bibliographic text (front page) of each patent grant issued weekly (Tuesdays) in CY2001 (excludes images/drawings and reexaminations). The file format is Standard Generalized Markup Language (SGML) in accordance with the U.S. Patent Grant Version 2.4 Document Type Definition (DTD).

PTGRXML

Patent grant full text data/XML

Contains (JAN 2002 - present) the full text of each patent grant issued weekly (Tuesdays) (excludes images/drawings and reexaminations). The file format is eXtensible Markup Language (XML) in accordance with the Patent Grant International Common Element (ICE) Document Type Definition (DTD).

PTGRAPS

Patent grant full text data/APS

Contains (JAN 1976 - DEC 2001) the full text of each patent grant issued weekly (Tuesdays) (excludes images/drawings and reexaminations). The file format is Green Book, ASCII text and includes tables and “in-line” mathematical equations.

PTGRSGM

Patent grant full text data/SGML

Contains (JAN 2001 - DEC 2001) the full text of each patent grant issued weekly (Tuesdays) in CY2001 (excludes images/drawings and reexaminations). The file format is Standard Generalized Markup Language (SGML) in accordance with the U.S. Patent Grant Version 2.4 Document Type Definition (DTD).

PTGRMP2

Patent grant multi-page PDF images

Contains the images of each patent grant issued weekly (Tuesdays) from July 31, 1790 to present in Portable Document Format (PDF) created from the Patent Grant Single-Page TIFF Images. Also included are older grants that have new Certificates-of-Correction (C-of-C) and rescanned images of older patent grants. Each weekly file contains approx. 6,000 patent grants. Approx. 9 GB (compressed) per week.

SPTIFF

Patent grant single-page TIFF images

Contains the images of each patent grant issued weekly (Tuesdays) from July 31, 1790 to present in Tagged Image File Format (TIFF) Revision 6.0 with CCITT Group 4 Compression (single-page TIFFs). Includes a separate weekly Certificates-of-Correction (C-of-C) file and a daily Certificates file.

PTADJ

Patent term adjustments

The Patent Term Adjustment (PTA) statute at issue is 35 USC 154(b)(2)(C), which provides for a deduction from any PTA award “equal to the period of time during which the applicant failed to engage in reasonable efforts to conclude prosecution of the application.” The statute also expressly delegates to the USPTO the authority to “prescribe regulations establishing the circumstances that constitute a failure of an applicant to engage in reasonable efforts to conclude processing or examination of an application.”

REPLACM

Replacement images

These documents replace the original data disseminated by the Electronic Information Products Division (EIPD). For more information on the data, contact ipd@uspto.gov.

COFC

Certificates of Correction

The Certificates-of-Correction (C-of-C) File (2010 - present) contains a listing of all Certificates of Correction for US patent documents and lists the following fields: Document Number; Publication Date; and Kind of Document Code (always “X6”). Layout information for the Certificates of Correction File is provided, and additional information regarding Certificates of Correction can be found at https://www.uspto.gov/patents-getting-started/general-information-concerning-patents#heading-26

REEXAM

Reexaminations

These historical statistics are data for requests for reexamination filed since 7/1/1981 (for ex parte) or since 11/29/1999 (for inter partes). They are updated weekly.

CERTS

Certificates

Certificates include post-issuance documents, e.g., ex parte and inter partes reexamination documents. These were weekly and have been daily since October 2012.

RESCANS

Rescans

Bnnnnn.tar: Rescans of miscellaneous documents from miscellaneous years (1790 - present). The filename relates to the Digital Linear Tape (DLT) cartridge.

PTLITIG

Patent Litigation data (stata (.dta) and MS Excel (.csv))

Contains detailed U.S. District Courts patent litigation data on 74,623 unique court cases filed during the period 1963 - 2016. All content was collected from Public Access to Court Electronic Records (PACER) and RECAP. The final output datasets, provided in five different files, include information on the litigating parties involved and their attorneys; the cause of action; the court location; important dates in the litigation history; and descriptions of all documents submitted in a given case, covering over 5 million document-level entries from the docket reports.

APPDT

Patent application data/XML

Contains (MAR 15, 2001 - present) the full text, images/drawings, and complex work units (tables, mathematical expressions, chemical structures, and genetic sequence data) of each patent application (non-provisional utility and plant) published weekly (Thursdays). The file format is eXtensible Markup Language (XML) in accordance with the Patent Application Version 4.4 International Common Element (ICE) Document Type Definition (DTD).

APPBLXML

Patent application bibliographic data/XML

Contains (MAR 15, 2001 - present) the bibliographic text (front page) of each patent application (non-provisional utility and plant) published weekly (Thursdays) (excludes images/drawings). The file format is eXtensible Markup Language (XML) in accordance with the Patent Application Version 4.4 International Common Element (ICE) Document Type Definition (DTD).

APPXML

Patent application full text data/XML

Contains (MAR 15, 2001 - present) the full text of each patent application (non-provisional utility and plant) published weekly (Thursdays) (excludes images/drawings). The file format is eXtensible Markup Language (XML) in accordance with the Patent Application Version 4.4 International Common Element (ICE) Document Type Definition (DTD).

APPMP2

Patent application multi-page PDF images

Contains the images of each patent application (non-provisional utility and plant) published weekly (Thursdays) from March 15, 2001 to present in Portable Document Format (PDF) created from the Patent Application Single-Page TIFF Images. Each weekly file contains approx. 6,000 patent applications. Approx. 11 GB (compressed) per week.

APPSP2

Patent application single-page TIFF images

Contains the images of each patent application (non-provisional utility and plant) published weekly (Thursdays) from March 15, 2001 to present in Tagged Image File Format (TIFF) Revision 6.0 with CCITT Group 4 Compression (single-page TIFFs). Each weekly file contains approximately 6,000 published patent applications.

HISTEXC

Historical patent data files (stata (.dta) and MS excel (.csv))

Contains four research datasets containing time series and micro-level data by National Bureau of Economic Research (NBER) technology sub-category on applications, grants, and in-force patents spanning two centuries of innovation. For more information: https://www.uspto.gov/learning-and-resources/ip-policy/economic-research/research-datasets

HISTMST

Historical masterfile

The historical_masterfile contains micro-level application, NBER sub-category, and prosecution data on 2.2 million patent applications filed from 1981 to 2015 and 8.9 million patents issued through 2014.

OUTMN

Monthly

The monthly file contains a monthly count of applications, issued patents, and in-force patents by application status, disposal type (abandoned, issued, or pending), and NBER sub-category from 1981 to 2015.

OUTMND

Monthly disposal

The monthly_disposal dataset contains counts of applications by disposal type for each monthly application cohort by NBER sub-category from 1981 to 2015.

ORDERS

Orders

This is one of three Orders intermediate files used to generate the four datasets; these intermediate files are also available for download.

ORDCLS

Orders class

This is one of three Orders intermediate files used to generate the four datasets; these intermediate files are also available for download.

ORDSBCL

Orders sub-class

This is one of three Orders intermediate files used to generate the four datasets; these intermediate files are also available for download.

ECOPAIR

Patent examination research dataset (stata (.dta) and MS excel (.csv))

Contains detailed information on more than 12.5 million publicly viewable patent applications filed with the USPTO along with more than 1 million PCT applications through June 2022. The data files include information on each application’s characteristics, prosecution history, continuation history, claims of foreign priority, patent term adjustment history, publication history, and correspondence address information.

GZLST

Patent official gazettes listing

Published each Tuesday, the Patent Official Gazette contains bibliographic (front page) information, a representative claim, and a drawing (if applicable) of each patent grant issued that week. Includes U.S. Patent and Trademark Office (USPTO) Notices which provide important information and changes in rules concerning both patents and trademarks.

PTAPPCLM

Patent and patent application Claims data (Stata (.dta) and MS Excel (.csv))

Contains detailed information on claims from U.S. patents granted between January 1976 and December 2014 and U.S. patent applications published between March 15, 2001 and December 2014. The dataset is derived from the Patent Grant Full Text and Patent Application Full Text bulk data files. The Office of Chief Economist (OCE) applied a Python algorithm to identify individual claims as well as the dependency relationship between claims. From the parsed claims text, OCE created six data files containing individually-parsed claims, claim-level statistics, and document-level statistics, including newly-developed measures of patent scope.

ECOPATAI

The Artificial Intelligence Patent Dataset (Stata (.dta) and MS Excel (.tsv))

Contains Artificial Intelligence Patent Landscape data classifying 13,244,037 granted patents and PGPubs published from 1976 through 2020 in eight AI component technologies using state-of-the-art machine-learning-based models.

MOONSHOT

Cancer Moonshot data (MS excel (.csv))

This curated dataset consists of 269,353 patent documents (published patent applications and granted patents) spanning the 1976 to 2016 period and is intended to help identify promising R&D on the horizon in diagnostics, therapeutics, data analytics, and model biological systems.

TRASECO

Trademark assignment economics data (stata (.dta) and MS excel (.csv))

Contains detailed information on 1.19 million assignments and other transactions recorded at the USPTO between 1952 and 2022 and involving 2.16 million unique trademark properties. For more information: https://www.uspto.gov/learning-and-resources/ip-policy/economic-research/research-datasets

TRTDXFAP

Trademark daily XML file (TDXF) applications

Trademark Applications: Pending and registered trademark text data (no images) to include word mark, serial number, registration number, filing date, registration date, goods and services, classification number(s), status code(s), design search code(s), pseudo mark(s) in CY2022. The file format is eXtensible Markup Language (XML) in accordance with the U.S. Trademark Applications Version 2.0 Document Type Definition (DTD).

TRTYRAP

Trademark annual XML applications

Contains (backfile) pending and registered trademark text data (no images) to include word mark, serial number, registration number, filing date, registration date, goods and services, classification number(s), status code(s), design search code(s), pseudo mark(s) from (APR 1884 - DEC 2022). The file format is eXtensible Markup Language (XML) in accordance with the U.S. Trademark Applications Version 2.0 Document Type Definition (DTD).

TTABTDXF

Trademark daily XML file (TDXF) trademark trial and appeal board (TTAB)

Trademark Trial and Appeal Board (TTAB): TTAB text data (no images) in CY2023. The file format is eXtensible Markup Language (XML) in accordance with the U.S. Trademark Trial and Appeal Board Version 1.0 Document Type Definition (DTD).

TTABYR

Trademark annual XML TTAB

Contains (backfile) Trademark Trial and Appeal Board text data (OCT 2, 1951 - DEC 31, 2022). The file format is eXtensible Markup Language (XML) in accordance with the U.S. Trademark Trial and Appeal Board Version 1.0 Document Type Definition (DTD).

TRTDXFAG

Trademark daily XML file (TDXF) assignments

Trademark Assignments: Assignment text data (no images) in CY2023. The file format is eXtensible Markup Language (XML) in accordance with the U.S. Trademark Assignments Version 0.4 Document Type Definition (DTD).

TRTYRAG

Trademark annual XML assignments

Contains (backfile) Trademark Assignment text data (JAN 3, 1955 - DEC 31, 2022). The file format is eXtensible Markup Language (XML) in accordance with the U.S. Trademark Assignments Version 0.4 Document Type Definition (DTD).

TR24BOX

Trademark 24 hour box and supplemental

The USPTO Trademark Daily Application Image 24 Hour Box (XML/TIFF/JPEG) consists of two datasets: (A) 24 Hour Box file, hryymmdd.zip & (B) 24 Hour Box Supplemental file, hrsyymmdda.zip, with optional files, hrsyymmddb.zip, hrsyymmddc.zip, etc. (A) The 24 Hour Box zipfile contains images of daily Trademark applications in either JPG format (black and white, grayscale, or color) or TIFF format (black and white) processed through the Trademark Image Capture and Retrieval System (TICRS) and viewable using any standard image viewer. (B) The 24 Hour Box Supplemental zipfile(s) contain Trademark cropped TIFF (black and white) or cropped JPG (color) image files from paper submissions of Trademark applications.

TRCFECO2

Trademark case file economics data (stata (.dta) and MS excel (.csv))

Contains detailed information on 11.6 million trademark applications filed with or registrations issued by the USPTO between 1870 and February 2022. For more information: https://www.uspto.gov/learning-and-resources/ip-policy/economic-research/research-datasets
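Several products in the table cover the same kind of data for different periods and in different formats. As a rough sketch, the grant full-text products can be selected by issue year as shown below. The lookup table and function name are hypothetical; the year ranges are copied from the PTGRAPS, PTGRSGM, and PTGRXML descriptions above, where both PTGRAPS and PTGRSGM list 2001.

```python
from typing import List, Optional, Tuple

# (short name, first year, last year or None for "present"),
# taken from the product descriptions in the table above.
GRANT_FULL_TEXT: List[Tuple[str, int, Optional[int]]] = [
    ("PTGRAPS", 1976, 2001),  # Green Book, ASCII text
    ("PTGRSGM", 2001, 2001),  # SGML
    ("PTGRXML", 2002, None),  # XML, 2002 to present
]


def grant_full_text_products(year: int) -> List[str]:
    """Return the short names whose stated coverage includes `year`."""
    return [
        name
        for name, first, last in GRANT_FULL_TEXT
        if first <= year and (last is None or year <= last)
    ]
```

Each returned short name can then be passed to File.objects.filter_by_short_name, optionally with from_date and to_date to narrow the range.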