Citations from patents to other patents have frequently been employed in studies of innovation, but these citations have many limitations. By contrast, citations from patents to non-patent materials, especially scientific articles, promise to be more useful but are much harder to identify because they appear in patent documents as unstructured text. We present methods for automatically linking patents to scientific papers from 1800 to 2018 and share the results publicly. We also characterize the performance of our algorithms and present ROC curves so that researchers can select data according to their sensitivity to false positives versus false negatives. We hope that publicly available patent citations to science will fuel research on innovation, knowledge diffusion, technology commercialization, and other topics. Download at https://zenodo.org/record/3238722.
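The trade-off that the ROC curves describe can be illustrated with a minimal Python sketch. It rests on two hypothetical assumptions:

- Each candidate patent-to-paper link carries a numeric confidence score.
- A hand-checked validation sample labels each candidate as a true or false match.

The names `roc_points` and `pick_threshold` are illustrative and are not part of the released data. The sketch computes the false-positive and true-positive rates at every score cutoff. It then picks the cutoff that keeps the most true matches within a chosen false-positive budget.

```python
from typing import List, Optional, Tuple

# Each point is (threshold, false_positive_rate, true_positive_rate).
RocPoint = Tuple[float, float, float]


def roc_points(scores: List[float], labels: List[bool]) -> List[RocPoint]:
    """ROC points for keeping every candidate link whose score >= threshold.

    Hypothetical inputs: `scores` are confidence scores for candidate
    patent-to-paper links; `labels` mark hand-checked true matches.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("validation sample needs both true and false matches")
    points = []
    # Sweep cutoffs from strictest (highest score) to loosest.
    for threshold in sorted(set(scores), reverse=True):
        kept = [label for score, label in zip(scores, labels) if score >= threshold]
        true_pos = sum(kept)
        false_pos = len(kept) - true_pos
        points.append((threshold, false_pos / negatives, true_pos / positives))
    return points


def pick_threshold(points: List[RocPoint], max_fpr: float) -> Optional[float]:
    """Strictest cutoff that reaches the highest recall with FPR <= max_fpr."""
    eligible = [p for p in points if p[1] <= max_fpr]
    if not eligible:
        return None
    # max() returns the first maximum, and points run strictest-first.
    return max(eligible, key=lambda p: p[2])[0]


if __name__ == "__main__":
    scores = [9, 8, 7, 5, 3, 2]
    labels = [True, True, False, True, False, False]
    points = roc_points(scores, labels)
    print(pick_threshold(points, max_fpr=0.0))  # 8
    print(pick_threshold(points, max_fpr=0.5))  # 5
```

A researcher who is more concerned about false positives than false negatives would pass a smaller `max_fpr`, accepting fewer but cleaner links.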
Journal Commercial Impact Factor
Journals are commonly ranked based on Impact Factor, calculated for year t as the number of times articles from years t-1 and t-2 were cited during year t, divided by the number of articles published during years t-1 and t-2. We introduce a complementary measure of commercial impact by counting citations from patents instead of from papers, using the data from Marx & Fuegi (2019). Download at https://github.com/mattmarx/jcif.
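The Impact Factor formula carries over directly when the citing documents are patents rather than papers. The Python sketch below is a minimal illustration, not the code in the linked repository. It assumes two hypothetical inputs:

- A mapping from each article ID to its journal and publication year.
- A list of (patent year, cited article ID) pairs.

How a patent's year is defined (for example, filing or grant year) is left as an assumption.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def commercial_impact_factor(
    article_years: Dict[str, Tuple[str, int]],
    patent_citations: Iterable[Tuple[int, str]],
    year: int,
) -> Dict[str, float]:
    """Impact-Factor-style ratio per journal for `year`, counting patent citations.

    Hypothetical inputs: article_years maps article_id -> (journal, pub_year);
    patent_citations yields (patent_year, cited_article_id) pairs.
    """
    window = {year - 1, year - 2}
    # Denominator: articles each journal published in years t-1 and t-2.
    published = Counter(j for j, y in article_years.values() if y in window)
    # Numerator: citations made in year t by patents to those same articles.
    cited: Counter = Counter()
    for patent_year, article_id in patent_citations:
        if patent_year != year or article_id not in article_years:
            continue
        journal, pub_year = article_years[article_id]
        if pub_year in window:
            cited[journal] += 1
    # Journals with no articles in the window have no defined ratio.
    return {j: cited[j] / n for j, n in published.items()}


if __name__ == "__main__":
    articles = {"a1": ("J1", 2016), "a2": ("J1", 2017),
                "a3": ("J1", 2015), "b1": ("J2", 2017)}
    citations = [(2018, "a1"), (2018, "a1"), (2018, "a2"), (2018, "a3"),
                 (2017, "a2"), (2018, "b1")]
    print(commercial_impact_factor(articles, citations, 2018))
    # {'J1': 1.5, 'J2': 1.0}
```

The example omits the citation to `a3` because the article falls outside the two-year window. It also omits the 2017 citation because it was made in a different year.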
"Hubs" of commercial R&D
In Bikard & Marx (2019), we find that academic discoveries made in proximity to "hubs" of commercial R&D in the same field are much more likely to be built upon by firms (as measured by citations from their patents, as calculated in Marx & Fuegi, 2019). This dataset defines those hubs of commercial R&D, listing for each USPTO subclass the latitude and longitude coordinates that act as the centroids of that subclass's hubs. Download at https://archive.org/details/hubstopost_201903.
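One way to use these centroids is to measure the great-circle (haversine) distance from a discovery's location to the nearest hub centroid for the relevant subclass. The minimal Python sketch below assumes that the hub file has been loaded into a dictionary keyed by subclass, with a list of (latitude, longitude) centroids per subclass. That layout, the illustrative key and coordinates, and the function names are all hypothetical. The sketch does not reproduce the proximity definition used in Bikard & Marx (2019).

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

# Hypothetical layout: USPTO subclass -> list of (lat, lon) hub centroids.
HubTable = Dict[str, List[Tuple[float, float]]]


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = math.radians(lat2 - lat1)
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding error for near-antipodal points.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def nearest_hub_km(hubs: HubTable, subclass: str,
                   lat: float, lon: float) -> Optional[float]:
    """Distance to the closest hub centroid for `subclass`, or None if it has none."""
    centroids = hubs.get(subclass, [])
    if not centroids:
        return None
    return min(haversine_km(lat, lon, h_lat, h_lon) for h_lat, h_lon in centroids)


if __name__ == "__main__":
    # Illustrative subclass key and coordinates, not taken from the dataset.
    hubs: HubTable = {"435/6": [(42.36, -71.06), (37.77, -122.42)]}
    # Distance from a hypothetical lab location to its nearest hub.
    print(round(nearest_hub_km(hubs, "435/6", 42.37, -71.11), 1))
```

Whether a given distance counts as "in proximity" is a separate modeling choice that this sketch leaves open.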
Government-funded research increasingly fuels innovation
Replication code and data, including for our Science article: a) the nature of reliance on government support for all patents, 1926-2017; b) counts of novel words per patent; c) OCRed patents from 1926-1975, including non-patent references; d) code for classifying patents as government-owned or as acknowledging direct government support; and e) CPC classifications for USPTO patents, 1926-2017. Download at https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/DKESRC.