Università Commerciale Luigi Bocconi
Faculty of Economics



Economics and Management of International Markets
and New Technologies (CLEMIT-LS)

 


Academic year 2005-2006

 

 

The link between science and technology:
exploring the network of inventors
and scientific authors
in the semiconductor industry

 

 

 

 

Supervisor

Prof. Stefano Breschi                                                                  

 

Second reader

Prof. Fabio Montobbio

 



Christian Catalini – 996868



 

Index

Introduction

Chapter I

1.0 Science and technology: waiting for gatekeepers

1.1 Tracking knowledge spillovers

1.2 Knowledge spillovers: going local

1.3 Mobility and the “invisible hand”

1.4 Social proximity

Chapter II

2.0 Patents and non-patent literature

2.1 Science intensity and dynamism: semiconductors

2.2 Matching NPL citations and ISI titles

2.3 Identifying highly cited articles

2.4 Building the network of inventors

Chapter III

3.0 Real gatekeepers?

3.1 Degree centrality

3.2 Betweenness centrality

Chapter IV

4.0 Who produces and who exploits science?

Chapter V

5.0 Patents and NPL co-location: the tip of an iceberg

Chapter VI

6.0 Down to the waterline

6.1 Social proximity in numbers

6.2 Drawing a control sample

6.3 Geographical proximity with geocoding

6.4 Social proximity, reloaded

Chapter VII

7.0 Wrapping up

 

 

 

 


Introduction

 

The interaction between science and technology (S&T) is a complex and heterogeneous process. While it is clear enough that science contributes to the development of technology, the mechanism behind this relationship remains largely unexplored. Economists have tried to model the link between S&T with varying degrees of success, using different approaches and levels of focus.

The main stylized fact that can be inferred from Romer’s model of endogenous growth (1990) is the role played by knowledge spillovers: since knowledge is considered a public good, its creation implies a spillover. Science generates a positive externality, regarded as crucial[1] for long-run macroeconomic growth (Grossman and Helpman, 1991). At the same time the non-rival, public-good nature of knowledge poses incentive and contractual problems[2]: inefficiency (Romer, 1990) and the risk of market failure require public intervention for the provision of basic research and human capital. In this framework, science acts on the supply side of the linear model of creation, transfer and diffusion of innovation (Kline and Rosenberg, 1986). Technical knowledge is seen as non-excludable information and is supposed to flow between different economic actors and places.

Finding a way to track and quantify knowledge flows is therefore a key step in verifying the link between science, technology and economic growth. Krugman’s assumption (1991) that these kinds of flows are invisible and leave no “paper trail” is questioned by Jaffe et al. (1993), who first used citations to patents as a measure of spillovers. Jaffe’s experiment and its improvements (Thompson and Fox-Kean, 2005) show that knowledge flows are not only observable, but also local in terms of both technological and geographical proximity. Co-localization increases the probability of information exchange, since spillovers are actually bounded in space. Citations between patents, even if useful for the generic identification of a flow, contain little information on the relationship between S&T, as patents are mainly an input indicator of the innovative activity carried out by private firms. Although universities have increased their patenting activity, their share is still marginal and is more likely to represent the results of applied research than of basic science.

Patents also contain information which has remained relatively unexploited so far, i.e. citations to non-patent literature (NPL[3]). These actually represent a more credible means of tracing knowledge flow between S&T than citations among patents. One of the original aspects of this research is exactly that of testing the extent of this interaction and checking if it is local in terms of technological, geographical and social proximity. While measuring the first two types of distance requires heavy parsing and data mining procedures, introducing the concept of social distance adds further complexity and methodological issues.

Patent documents and cited scientific articles are used here to build a comprehensive social network of the authors and inventors in the semiconductor industry. The networks of discovery and invention are then compared in order to analyze the mechanisms underlying the diffusion of new information. The aim is to test if individuals who are both scientific authors and inventors act as technological gatekeepers, reconciling the different incentive schemes and interests of S&T. If authors-inventors are indeed influential people in the network, they are likely to control and bridge most of the knowledge flows: by studying them, we should be able to shed some light on the actual functioning of the link between the two communities of S&T.

This microeconomic approach would also allow us to further analyze the interplay between geographical and social distance, in order to estimate the relative effects of social ties[4] and geography on the probability of a citation from a patent to a scientific article. If linked individuals are able to exchange complex and tacit knowledge even without being co-located in the same place, then social networks become an effective diffusion vehicle for new knowledge.

“Open Science” and “Proprietary technology” could actually connect through the inherent qualities of gatekeepers, a common vocabulary and the underlying social network, which together make a “community of experts” a tangible reality.

The work is organized as follows. The first chapter presents a brief overview of the relevant literature. The second chapter lays out the methodological framework and the data collection, processing and elaboration steps, then introduces an analysis of the network of inventors. The third tests the relative positions of authors-inventors, cited and highly cited scientific authors in terms of degree centrality and betweenness. The fourth looks in detail at the citation paths (patent-NPL) between different countries, focusing on the role of European science in worldwide patents. The fifth tests whether knowledge flows between science (NPL articles) and technology (citing patents) are indeed localized, first at the country level and then at the US state level. The sixth chapter deals with the network of authors-inventors: data on actual citations are used together with a control sample to test the role of social and geographical proximity between citing-cited couples. The seventh chapter concludes.



1.0 Science and technology: waiting for gatekeepers

 

The history of technical change offers an insight into the relationship between science and technology. According to Mokyr (2002), the turning point was the Industrial Enlightenment and the diffusion of what he defines as “useful arts”. The exploration and cataloguing of artisan practices and the writing of the first encyclopaedias and technical publications reduced the cost of accessing knowledge. This process of rationalization and codification introduced the scientific and experimental method to the fabricants: science, defined in a broader sense as general knowledge, descended from the laurels of the savants and started to interact with the less esteemed, but no less valuable, useful knowledge. The feedback between techniques and theory led to the development of new technical languages, standards (like weights and measures) and incentives to support innovation. The institutionalization of R&D came only much later, when German chemical firms and the first major US corporations of the 20th century started to innovate systematically. Individual inventors, who had until then determined the irregular pace of technological progress, were replaced by large and organized R&D labs. Freeman (1994) goes even further, identifying the turning point in the Manhattan Project, with its frightful impact on the Second World War, and in the Big Science projects of the post-war period. Science had persuaded even the sceptical that a strong belief in material progress, R&D and human reason could ensure endless prosperity.

At that point, the dynamics of S&T intertwined, while the two communities behind them started to interact and exchange information. Different evolutionary logics (Gittelman and Kogut, 2001), incentive structures and appropriability regimes still separate “Open Science” from “Proprietary technology”. Dasgupta and David (1994) theorize a “New Economics of Science”, starting from the assumption that S&T do not differ in terms of output: codified, non-rival information can be a substitute for tacit and excludable knowledge. Codification is always an expensive process, even if the costs associated with it are quite heterogeneous across fields. Following Polanyi’s analogy (1966), what is brought into focus (codified) and what remains in the background (tacit) depends on incentives.

Scientific and industrial research are very different in their reward mechanisms, disclosure rules and epistemic cultures: the networks of discovery and invention are still separate, even if distinctive signs of convergence can be traced in fields where the S&T interaction is particularly evident. Like a “pair of dancers”, S&T are moving to the same music, but each one follows its own steps (Toynbee, 1963 and De Solla Price, 1965).

Scientists share their results in order to gather feedback and credit: this cumulative development minimizes the duplication of effort (Nelson, 1959) and fosters diffusion. The public interest in disclosure is combined with the individual search for recognition and prestige. Merton (1957), who introduced the concept of science as an institution, identifies the priority-based credit mechanism, the rule of sharing and the continuous reliability check (e.g. through established empirical procedures) as some of the distinctive characteristics of the scientific community.

 

"...essentially, scholarship is a conspiracy to pool the capabilities of many men, and science is an even more radical conspiracy that structures this pooling so that the totality of this sort of knowledge can grow more rapidly than any individual can move by himself" (De Solla Price, 1971)

 

Proprietary technology must deal instead with the commercial interests of its sponsors and is usually shared only inside the boundaries of the firm or the project. This intramural nature is often defended by postponing part of the codification to the last phases, through active monitoring, secrecy or contractual obligations. The decision to patent an invention is taken with great care and involves a careful estimate of costs and benefits. Science is usually costly for the firm (Stern, 1999), but it is a necessary evil to motivate and attract scientific researchers. Basic research implies an expensive externality, as the positive outcomes can never be entirely appropriated within the boundaries of the firm that financed it and some degree of spillover to competitors will be inevitable.

Rosenberg (1990) shows that firms need to share at least part of their research in order to enhance their absorptive capability (Cohen & Levinthal, 1990), monitor competitors and evaluate their own applied results. Basic research keeps the firm plugged into the scientific network and sustains its reputation. Furthermore science acts as a map (Fleming and Sorenson, 2004), allowing more than a local search for a local maximum, while technology provides a focusing device for scientific research (Rosenberg, 1982).

The interaction between S&T is multifaceted: the linear model used by Vannevar Bush (1945) to influence Roosevelt’s technology policy is a naive description of today’s reality. Kline and Rosenberg (1986) come closer, as their chain-linked model adds feedback between the different phases of innovation. The creation and diffusion of knowledge cannot be separated from the social network underpinning them. The links between the various institutions and economic actors affect transaction costs and the flow of information, rendering trivial any economic analysis that ignores the social embeddedness of human behaviour. Science is moving from a personal, field-based and geographically localized activity (Calero et al., 2006) to a collective and cross-organizational one. At the same time, technology requires the ability to combine and reconfigure knowledge from different sources, thus giving rise to partnership opportunities and university-industry interactions. The resulting structure is network-embedded (Verbeek, 2003) and makes the study of the individuals who act as bridges between the two different cultures worthwhile.

Technological gatekeepers (Katz and Tushman, 1982) funnel selected information and contacts, reconciling diverging incentives and interests. As Gittelman and Kogut (2001) clearly state: “Science and inventions do not follow the same selection logics, but scientists produce both”. In the third chapter we will explicitly test this hypothesis, using social network analysis as a tool.

A competing explanation has been proposed by Sorenson and Fleming (2004), who argue that networks are only an imperfect vehicle for the transmission of knowledge, since they are bound in physical and social space. The cost of maintaining a network tie makes network diffusion slow and imperfect. Scientific publication, on the other hand, increases the spatial reach of spillovers by permitting a broadcast propagation of information. To prove their hypothesis, the authors look at citations received by patents that reference non-patent literature (NPL) in their application. Patents based on NPL receive on average 4.86 citations (five-year window), compared to a mean of 3.57 for the others.

Sorenson’s assumption is that since NPL provides a codified base of accessible knowledge, it is more likely that someone else will become aware of it and improve on the same ground (making a citation to the originating patent more likely). Since the publication of background information widens diffusion, patents based on NPL receive citations from more distant inventors. Distance is understood here in a broad sense: geographical, social and technological.

Sorenson, Singh and Fleming (2005) add a test to discriminate between the effects of publication and social proximity: their results assign a crucial role to science in the diffusion of knowledge, as if it were possible to access technical knowledge ignoring geography or localized social networks.



1.1 Tracking knowledge spillovers

 

Science stimulates knowledge flows, but what is the exact propagation mechanism? Romer’s model of endogenous growth (1990) is based on the assumption that knowledge spillovers exist and are a remarkable determinant of growth[5]. Are these spillovers bounded in space or do they require a social network to support diffusion?

Jaffe (1989) finds a positive relation between university R&D (federal research funding) and patenting by neighbouring firms, which seems to confirm the presence of spillovers. It is the first attempt to adapt the knowledge production function approach of Griliches (1979) in a way which takes geography into account. These results are later questioned in Krugman’s “Geography and Trade”:

 

“knowledge flows, by contrast, are invisible; they leave no paper trail by which they may be measured and tracked, and there is nothing to prevent the theorist from assuming anything about them that she likes.” (Krugman, 1991)

 

Krugman was specifically targeting one of the core concepts of the “New Industrial Geography” (NIG)[6], the “Marshallian externalities of the third kind”[7] or knowledge spillovers: firms located close to relevant knowledge sources are able to innovate faster than the others, since they benefit from this special kind of knowledge externalities bounded in space.

Looking at the complexity of the innovation process it becomes obvious how co-location improves the exploration process (March, 1991): it reduces search costs, facilitates coordination, allows the transfer of tacit and sticky knowledge (Von Hippel, 1994), enables learning by doing and by interacting (Feldman, 1994).

The overall effect of co-location is a tangible reduction in uncertainty (Arrow, 1962), confirmed by the economic literature on small firms and start-ups. Large firms rely more on self-sufficiency, are usually able to pursue an exploitation strategy and profit from a routinized regime (Winter, 1984). Therefore location seems more relevant to small innovative firms, typically more focused and applied in their research, than to large ones (basic research is on average more codified and exchangeable over long distances).

Acs and Audretsch (1990) show that in terms of R&D spending, small firms are more productive in their innovative process. Even if the relationship between size and innovation is far from linear (Cohen and Levin, 1989) and successful innovation usually changes the size of a firm, evidence proves that small firms are indeed more tied into regional networks. Almeida and Kogut (1997), when dealing with the semiconductor industry, find higher levels of localization in their start-up sample. Acs, Audretsch and Feldman (1994) link all these factors, inferring that small firms are more productive because they rely on knowledge spillovers from local university R&D.

 

 

1.2 Knowledge spillovers: going local

 

A direct reply to Krugman comes in 1993 from Jaffe, Trajtenberg and Henderson (JTH). The authors start from the basic assumption that “knowledge flows do sometimes leave a paper trail, in the form of citations in patents”. Their analysis therefore focuses explicitly on the particular subset of spillovers which leave an official trace[8]. Their aim is to test at the same time the existence and the localization of spillovers, using forward citations to two cohorts of originating patents (1975, 1980) as evidence. The underlying assumption is that proximity favours knowledge flow, so that knowledge can be partly considered a local public good. The main technical problem JTH face when trying to measure the degree of spillover localization is to define a baseline, a reference value that accounts for the existing agglomeration of production. Even if “the ability to receive spillovers is probably one reason for this pre-existing concentration of activity”, JTH estimate what they define as a “conservative measure” of localization: they draw a control sample of patents as close as possible to the originating ones (in application date and technological proximity) and compare the co-localization of each group of patents (originating, controls) with the forward citations to the originating group[9]. JTH’s experiment finds that knowledge spillovers are indeed highly localized at the country, state and SMSA[10] level, even if the effect fades slowly over time.
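
The comparison at the heart of this kind of test can be illustrated with a minimal sketch. It assumes that the citation pairs and the baseline control pairs have already been built and that each patent has been assigned to a single region; the function name, the variable names and the toy data are hypothetical and are not taken from JTH.

```python
def match_rate(pairs):
    """Return the share of (region_a, region_b) pairs located in the same region."""
    if not pairs:
        raise ValueError("no pairs to compare")
    return sum(1 for region_a, region_b in pairs if region_a == region_b) / len(pairs)


# Hypothetical toy data: each tuple holds the regions of the two patents in a pair.
# citation_pairs: couples linked by an actual citation.
# control_pairs: baseline couples built from the control sample.
citation_pairs = [("CA", "CA"), ("CA", "NY"), ("TX", "TX"), ("MA", "MA")]
control_pairs = [("CA", "NY"), ("CA", "CA"), ("TX", "MA"), ("MA", "NY")]

citation_rate = match_rate(citation_pairs)
control_rate = match_rate(control_pairs)

# A positive difference suggests localization beyond the existing
# agglomeration of production captured by the controls.
excess_localization = citation_rate - control_rate
```

In an actual replication the difference between the two rates would have to be tested for statistical significance, separately at each geographical level (country, state, SMSA).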

Thompson and Fox-Kean (2005) identified a spurious component in the localization estimates provided by JTH and improved their methodology[11]. These changes generate a much finer coupling between originating patents and controls, ensuring that the sample group acts in fact as a credible baseline for evaluating the existing concentration of production. Thompson’s controls are more “similar” to the originating patents. The results show that only national borders act as a barrier to knowledge flow: localization at the state and metropolitan level loses its strength. There is no equivalent research effort for the co-localization effect between scientific articles and relative citing patents: in the fifth chapter we will adapt JTH’s methodology to measure the role of geography in the knowledge flows between S&T.

 

 

1.3 Mobility and the “invisible hand”

 

Thompson and Fox-Kean’s conclusion introduces another possible explanation for knowledge flows: labour mobility. Localization at the state and metropolitan area level (above all in the US) is likely to be influenced and reduced by the relocation of inventors. Intranational mobility is a relevant shift factor for knowledge diffusion. According to this view, geography would matter simply because of its effects on the labour market. The unit of analysis should then become the knowledge worker (scientist or inventor), a carrier of complex, systemic, cumulative and often unobservable knowledge (tacit, as defined by Winter, 1987).

The issue of appropriability is entirely reversed: it is now the worker who can leverage her skills in order to profit from her talent. Furthermore if this holds, pure spillovers disappear: the assumed externality is completely absorbed by the labour market and becomes a pecuniary one. Implications for policy are clear-cut: science parks, incentives for attracting innovative firms, direct public intervention lose part of their role if the externality is already internalized and private investment optimal. A broader institutional framework, capable of attracting star scientists is required to promote innovation and growth. Furthermore a national innovation system (Nelson, 1993) must be able to channel talent towards productive and entrepreneurial activities, as defined by Murphy, Shleifer and Vishny (1990).

Zucker et al. (1994, 1998, 2001) provide convincing evidence about mobility by tracking co-authorships and links between university star scientists and new biotechnology enterprises (NBEs). The kind of knowledge mastered by star scientists in a still emerging field is characterized by natural excludability. The simple idea that being located close to a major university is sufficient for gaining access to research outcomes is completely flawed. Again there is no spillover: the labour market internalizes the externality, reducing it to a rent. The authors explicitly test the positive effects of star scientists on new firm entry and growth, the development of new products and their introduction into the market. They divide their sample into:

 

· affiliated stars, those who publish with an NBE as affiliation;

· linked stars, those who have co-authorship relations with NBE researchers, but are able to maintain their university link (proof of their high quality and contractual power);

· unlinked stars, who work only for a university.

 

The most influential stars are the linked ones, proving that no indiscriminate spillover takes place: unlinked scientists do not have any positive effect on the performance of NBEs. Of course, since bright scientists prefer to maintain their university affiliation (for passion, prestige or access to the scientific network) and avoid relocation costs, geography does have an influence on innovation. It must be taken into account that Zucker analyses a case of breakthrough technology, where the natural excludability of knowledge is undoubted: as technology moves into a more mature stage of development, the economic impact of star scientists is likely to decline, together with their wages (inducing a probable reallocation to other projects).

Almeida and Kogut (1999), studying the localization of highly cited patents in the semiconductor industry, add an insight into the relationship between mobility and localization of knowledge. Institutions that favour intraregional mobility have a pivotal role in the transfer of knowledge, since these kinds of flows are naturally embedded in regional labour markets. Localized knowledge spillovers are not linked to a particular technology and differ widely between regions. Silicon Valley is the most prominent example of how the mobility of engineers, supported by social institutions, can diffuse knowledge and induce growth.

Mobility can escape local boundaries, as an epistemic community is able to survive the end of co-localization. The time spent building a common “vocabulary” becomes crucial when a knowledge-worker relocates, even to another country. Agrawal, Cockburn and McHale (2003) apply JTH’s methodology and also control for previous co-location between inventors of cited and citing patents. The empirical evidence demonstrates that having worked together in the past has an influence on the probability of a knowledge flow (citation). Mobility in space and between assignees introduces a further level of analysis: the role of social capital. According to Breschi and Lissoni (2003): “knowledge flows, as evidenced by patent citations, are strongly localized to the extent that labour mobility and network ties also are”.

 

 

1.4 Social proximity

 

Granovetter (2005) studies the link between social structure and economic outcomes. Social networks define the path and bandwidth of knowledge transmission. Network actors can implement forms of reward and punishment by enhancing their social monitoring role. Trust, which takes time and effort to build, acts as a commitment device: the risk of being excluded from a component and of losing access to references, information or markets is often a credible one.

Worker mobility between different firms and regions creates denser and more connected networks, bridging components that were previously isolated. Networks of inventors can be built by treating co-invention of a given patent as a link between each pair of its inventors: this has been shown to represent a real tie between the people involved (Fleming, 2003 and 2004). The equivalent discovery network is created by taking into account the co-authorship relations between scientists. Some structural differences, due to the type of communities behind them, make the networks of inventors and authors dissimilar. The former is usually less connected, since most of the information exchange is intramural: “organizational boundaries serve as informational envelopes within which valuable information characterized by natural excludability is much more likely to be diffused than outside the organization” (Zucker et al., 1994). Corporate inventors move frequently between employers only in regions where there is a flexible and developed labour market (e.g. Silicon Valley) and where firms do not have the contractual power to control flows and spin-offs. Mobile inventors have a central role, as they drastically reduce the distance (in terms of “handshakes”) between distinct groups. They find themselves in between all the knowledge flows pertaining to the groups they bridge[12]. Other inventors have a relatively high number of co-inventors (links)[13] as they have signed a large number of patents. They are usually chief researchers or senior scientists (Balconi, Breschi and Lissoni 2004).
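
A minimal sketch can make the construction of such a network concrete. It assumes that inventor names have already been cleaned and disambiguated, which is a demanding step in practice; the function names, the patent identifiers and the toy data are hypothetical.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_coinventor_network(patents):
    """Map each inventor to the set of people sharing at least one patent with them."""
    network = defaultdict(set)
    for inventors in patents.values():
        for name in inventors:
            network[name]  # touching the key registers solo inventors as isolated nodes
        for a, b in combinations(set(inventors), 2):
            network[a].add(b)
            network[b].add(a)
    return dict(network)


def handshakes(network, source, target):
    """Geodesic distance between two inventors, or None if no path connects them."""
    if source not in network or target not in network:
        return None
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:  # breadth-first search visits nodes in order of distance
        node, dist = queue.popleft()
        for neighbour in network[node]:
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical toy data: Bob signs patents with two otherwise separate teams.
patents = {
    "P1": ["Ann", "Bob"],
    "P2": ["Bob", "Cleo"],
    "P3": ["Dan"],
}
network = build_coinventor_network(patents)
```

In this toy network Bob plays the bridging role described above: Ann and Cleo never signed a patent together, yet `handshakes(network, "Ann", "Cleo")` finds a path through him, while Dan remains an isolate. The number of links of an inventor is simply `len(network[name])`.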

On the other hand, scientific authors are divided into tightly connected epistemic communities. Their network resembles the typical “small world”[14] structure (Watts and Strogatz, 1998): a few scientists act as hubs, connecting many nodes and funnelling most of the information. The productivity of scientists (Lotka, 1928) and inventors (Shockley, 1957) is particularly skewed. Milgram (1967) named outliers “sociometric superstars”: they are the result of the organizing principle of preferential attachment (Newman, 2004 and Furukawa, 2006), also known in sociology as the “Matthew effect” (Merton, 1968). Highly connected nodes are more likely to increase their connectivity faster than less connected ones, in a sort of positive feedback that resembles the externalities of network goods (Shapiro, 1999).

Jasjit Singh (2003), after building a social proximity graph of co-inventor teams, studies the effect of social distance[15] on the probability of a citation between patents over a nine-year period (1986-1995). In teams that have a close social link, the effect of geographic co-localization on the probability of a knowledge flow becomes less relevant. In this framework, localized knowledge spillovers are explained and confined by interpersonal networks. Singh (2005) compares this effect with knowledge diffusion within a firm, finding again that once social distance has been accounted for, neither geographical nor firm boundaries have a strong effect on the probability of a patent citation[16]. Singh considers these results to be a fairly conservative estimate, as co-inventorship captures only a small fraction of all the possible ties between teams. Balconi et al. (2004), by showing that academic inventors have a central position in inventor networks (both in terms of degree centrality and betweenness), provide an additional hint: sectors characterized by a close science-technology interaction should be analysed by bridging the networks of authors and inventors. Singh concludes that “geography matters for knowledge diffusion, at least in part because interpersonal networks tend to be regional in nature” (p. 768, 2005).
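
One way to turn the idea of social distance between two teams into a number is sketched below. The measure used here, the smallest number of co-invention links separating any member of one team from any member of the other, is an assumption made for illustration and should not be read as Singh’s exact definition; the function name and the toy network are hypothetical, and the network is assumed to have been built from earlier patents only.

```python
from collections import deque


def team_distance(network, team_a, team_b):
    """Smallest number of links between any member of team_a and any member of team_b.

    Returns 0 when the teams share an inventor and None when no path connects them.
    """
    targets = set(team_b)
    seen = set(team_a)
    # Multi-source breadth-first search: every member of team_a starts at distance 0.
    queue = deque((member, 0) for member in seen)
    while queue:
        node, dist = queue.popleft()
        if node in targets:
            return dist
        for neighbour in network.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical co-invention network, stored as inventor -> set of co-inventors.
network = {
    "Ann": {"Bob"},
    "Bob": {"Ann", "Cleo"},
    "Cleo": {"Bob"},
    "Dan": set(),
    "Eve": set(),
}

shared_inventor = team_distance(network, ["Ann", "Bob"], ["Bob", "Dan"])
indirect_link = team_distance(network, ["Ann"], ["Cleo", "Eve"])
no_link = team_distance(network, ["Dan"], ["Eve"])
```

A distance of zero corresponds to an inventor who appears in both teams, while `None` marks teams that sit in different components of the network.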

The “social network hypothesis”[17] is further tested by Breschi and Lissoni (2003): they add social proximity measures to JTH’s experiment, after studying the evolution of the Italian inventors network (1978-1995). The outcome suggests that labour mobility (perfect link[18]), ceteris paribus, has an effect on the probability of citation 20 times stronger than an indirect link. Knowledge flows are bounded by the social network that allows them to diffuse. Another paper by the two authors (2004) investigates the relation with a more detailed measure of social distance and finds out that spatial proximity is not a prerequisite for knowledge transfer: social networks can help overcome geographic distance and act as diffusion vehicles. Indirect social chains can act as conduits for technical information, even if the final inventor is unaware of the source of the knowledge to which she has been exposed[19].

One of the research targets of this work is to check the exact role of social and geographical proximity in an area of close interaction between science and technology: the semiconductor industry. The existing literature deals with knowledge flows as revealed by citations between patents: Sorenson et al. (2005) investigate the role of science by looking at citations received by patents that build on NPL. This link is too remote and indirect, in particular because we can exploit directly the information that makes a patent similar to the scientific articles it cites. The patent-NPL relationship is a fairer indicator of the underlying interplay between S&T and might offer, together with social network analysis, an effective way of opening what is still, to a great extent, a “black box”.

 

 


2.0 Patents and non-patent literature


“In this desert of data, patent statistics loom up as a mirage of wonderful plenitude and objectivity” (Griliches, 1990)

 

Patents have been increasingly used as a source for relational data, as they include all the necessary information for building a comprehensive network of inventors, their affiliations and localization. Forward patent citations have been widely used as a proxy for the economic value and quality of a patent (Trajtenberg, 1990), or to track knowledge flows across space. Scientific articles and co-authorship relations are commonly used in bibliometrics to build indicators and statistics, to track international and multidisciplinary research (Narin et al., 1991) and to study social networks (Newman, 2004). There are some similarities between patents and peer-reviewed scientific articles: both require some degree of novelty, are an output of research, disclose relevant information and codify part of their background knowledge. The aim and the extent of the review process are, however, different (for a review: Meyer et al., 2004).

In order to study the interaction between science and technology, citations to scientific articles coming from patent documents offer a unique opportunity. The search for “prior art” by patent offices is usually of high quality and provides a starting point for identifying the codified knowledge on which an innovation builds.

There are relevant differences between US and European patent documents, due to dissimilar examination procedures and requirements. US patent law requires an inventor to disclose all relevant documents for the “prior art” and patentability search (duty of candour). Failure to comply can interfere with the granting procedure, therefore inventors provide extensive lists of references. The US Patent Office (USPO) does a broad “documentary search”, has lower patentability requirements and has a strong home bias in the search for “prior art”. The European Patent Office (EPO) does a “patentability search” instead, uses fairly standard examination procedures and is known for a quite low miss rate for novelty destroying subject matter (Wartburg, 2005). EPO patents are unlikely to cite all relevant prior work, as the patent office guidelines clearly require the examiner to include only the most relevant document when there are various articles of similar content. We decided to choose EPO patents, as they return a lower number of NPL citations per patent, but more selected and relevant ones.

 

 

2.1 Science intensity and dynamism: semiconductors

 

If the objective is to investigate the knowledge flows between science and technology, focusing on a single, self-contained field reduces noise and increases the probability of finding a coherent subset of social links and knowledge flows. If there is an area where scientists and inventors are likely to behave in a similar way and exchange relevant knowledge, it will probably be exactly where science matters most for patenting activity. Collins and Wyatt (1988) show that patents cite NPL when they belong to active, rapidly developing fields.

The semiconductor field shows a growth rate (in terms of patenting) of 4,3% at EPO, where it comes after ICT (11%), biotech (9%), drugs and medical technology (8%). Most of the increase derives from a surge in European patenting, which has caught up with US and Japanese rivals. This should be taken into account when dealing with the “European paradox” thesis.

 

 

Figure 2: EPO patenting trends in the field of semiconductors (IPC subclass H01L) between 1990 and 2001.

If we look at USPO patent data (1990-2003), growth rate for semiconductors is the highest (14%): Kortum and Lerner (1999) and later Hall and Ziedonis (2001) have shown that this patenting rush is due to strategic reasons. After Polaroid’s successful infringement suit against Kodak (1986), the aggressive IPR enforcement policy of Texas Instruments[20] and institutional changes that brought about stronger protection[21], semiconductor firms increased patenting exponentially. The role of patents changed from simple protection and source of revenues (licensing royalties) to legal bargaining chips. Cross-licensing became a common practice: no single firm was able to produce without incurring the risk of infringing neighbouring rights, as innovative activity in the field is highly cumulative and standard-dependent. The cost of interrupting a manufacturing process once started was simply economically unbearable. Large firms built huge patent portfolios and surrounded their core technologies with defensive patenting (blocking, fencing and surrounding: for a review, see Verspagen, 2004). Small design firms used patents to attract venture capital investments and support their entry into specialized market niches. Furthermore, in a sector where technology advances rapidly, the cost of disclosing information (patenting) is more than compensated by first mover advantages. What remains to be quantified is if the benefits from specialization and the creation of a market for knowledge (Arora and Gambardella, 1994) exceeded the costs of these “bargaining chips”. Looking at the history of the semiconductor industry and its incredible dynamism it is possible to answer in a positive way, even if we don’t have a point of reference for the comparison.

Going back to NPL, it is important to stress that 7% of all IPC[22] subclasses account for 80% of overall NPL citations (INCENTIM, 2003): the S&T interaction is local in terms of technology subclasses and highly skewed. Science intensity (Van Looy et al., 2003), defined as the number of NPL citations per 100 patents, can be used as a proxy for the strength of the S&T linkage. This second test confirms the hypothesis: the semiconductor field is active and strongly science based (second only to biotech). More than 20% of patents have more than one link to NPL.

 

 

 

 

Table 1: Science intensity across fields (EPO patents)

 

Technology fields | % of all patents citing scientific ISI articles | Average science intensity (P-NPL[23]) | Average science intensity (all patents)
Biotechnology | 78.2 | 382.1 | 298.9
Semiconductors | 21.8 | 181.3 | 39.5
Information Technology | 19.9 | 145.9 | 29
Telecommunications | 19.6 | 142 | 27.8
Control Technology | 17.8 | 209 | 37.2
Optics | 16.2 | 194.1 | 31.4
Medical Technology | 5 | 162 | 8.1
Environmental technology | 4.2 | 135.6 | 5.7
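The two intensity columns of Table 1 are linked by simple arithmetic: patents without NPL citations contribute nothing, so the intensity over all patents should equal the intensity among citing patents scaled by the share of citing patents. The following Python sketch checks this on three rows copied from the table; the function names are illustrative and not part of the original analysis.

```python
# Science intensity: NPL citations per 100 patents (Van Looy et al., 2003).
# Function names are illustrative; the figures are copied from Table 1.

def science_intensity(npl_citations, patents):
    """Number of NPL citations per 100 patents."""
    return 100.0 * npl_citations / patents

def intensity_all_patents(share_citing_pct, intensity_citing):
    """Intensity over all patents, given the share (%) of patents that cite
    ISI articles and the intensity measured on citing patents only."""
    # Non-citing patents add zero citations, so the citing-only
    # intensity is simply scaled down by the citing share.
    return share_citing_pct / 100.0 * intensity_citing

# field: (% of patents citing ISI articles, intensity among citing patents)
table1_rows = {
    "Biotechnology": (78.2, 382.1),
    "Semiconductors": (21.8, 181.3),
    "Optics": (16.2, 194.1),
}

for field, (share, citing) in table1_rows.items():
    print(field, round(intensity_all_patents(share, citing), 1))
```

For semiconductors the product is about 39.5, in line with the last column of the table; small differences on other rows would be expected from rounding of the published figures.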

 

To understand which subjects match these citations we can look at the four most cited ones using ISI-SCI data on journals. For semiconductors they are: applied physics (42,6% of total share), electrical and electronic engineering (21,1%), condensed matter physics (7,6%) and multidisciplinary materials science (6,4%). The distribution across journals is also concentrated: “Applied Physics Letters” accounts for 19,4%, followed by its Japanese companion, the “Japanese Journal of applied physics” (9%), the “IEEE Transactions on electronic devices” (6,9%) and the “Journal of the electrochemical society” (5,7%). This confirms the idea of a sector that relies heavily on basic science (physics) and on the capacity to transform discovery and invention into market innovation.

 

 

2.2 Matching NPL citations and ISI titles

 

In the previous sections we have seen why we decided to rely on data coming from EPO patents in the semiconductor subclass (H01L) in order to investigate the S&T link.

Our final dataset contained 38,761 inventors, coming out from 22,475 single patents registered between 1990 and 2003. The total number of NPL citations considered was 5797[24], corresponding to 4481 single ISI articles (based on unique UT code[25]), 328 journals and 12,023 cited authors.

The first technical problem to solve was the parsing of the NPL citations from each patent document. For the analysis, the EPO-CESPRI[26] database was used, which contains all EPO applications between 1990 and 2003. From a first selection of 15,702 potential NPL references coming from H01L patents, a selection of 8879 citations to scientific papers published in journals covered by ISI-Thomson Science Citation Index (SCI) was taken. These citations were then matched[27] with the cited ISI journals dataset[28], in order to obtain a final dataset linking NPL references and article information[29] as provided by ISI-SCI.

 

 

2.3 Identifying highly cited articles

 

        The number of citations received by an article is usually an approximate measure of its impact, inherent quality or degree of novelty. Authors who receive more citations are therefore likely to be more influential members of our social network (we will test this hypothesis in Chapter 3). As we are looking at scientific publications cited in patents, it is useful to identify those articles that can be defined as “highly cited” in this particular context. What the scientist may consider to be a major article will not necessarily contain relevant information for the inventor and vice versa.

Gittelman and Kogut (2001) found a negative correlation between highly cited patents (usually considered high-impact innovations) and highly cited publications. Even if technology benefits from spillovers from science, the two different “evolutionary logics” seem to select different types of knowledge. The rule of preferential attachment (or “Matthew effect”) is less pronounced in the inventor community, which is more focused on the market opportunities behind a given discovery. Excellent science seems to influence negatively the ability of a firm to produce high impact patents.

To identify what can be considered a highly cited article inside the NPL sample, we had to count citations received from patents inside a given time frame. We tested three different windows: 3, 5 and 7 years after the publication date of the cited article. Since the last application date (APDT) in our sample was from 2003, in order to avoid any truncation bias we selected only articles that had the full window to be cited as NPL. The results suggest that the distribution of citations is highly skewed and similar across different sectors (see Figure 1). After comparing the three solutions we chose a 5-year window, as it is a fair compromise between an excessive truncation (7 years) and a time which is too short for citation (3 years).

 

 

 

 


Figure 1: distribution of articles according to the number of citations received by patents. The value axis indicates the percentage of articles kept by cutting the distribution at the given number of citations per single NPL. The sectors correspond to the following 4-digits IPC subclasses: C12Q and G01N33 (biotech), H01L (semiconductors), H04L (transmission of digital information), G10L and G06T (speech recognition), H01S (lasers).

 

       

The graph shows a regular[30] pattern across all these science-intensive sectors: considering as highly cited those articles that receive at least 3 citations from different patents leads to a sample of between 4,4% and 7% of the overall population. Most of the articles receive just one citation: the share ranges from 70% of total NPLs for transmission of digital information to 87% for biotech. Articles with more than 5 citations in the semiconductor field are just 0,74% of all cited ones. From now on we will consider as highly cited those articles that receive at least 3 citations within 5 years of their publication. According to this criterion, in the semiconductor field 11% of authors (1334) and 7% of articles (340) belonged to the “highly cited” class.
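The selection rule described in this section can be sketched in Python as follows. This is only an illustration under stated assumptions: dates are reduced to calendar years, the window is taken to include the fifth year after publication, and the data layout and function name are hypothetical rather than those of the original database.

```python
from collections import defaultdict

WINDOW_YEARS = 5              # window chosen in the text
MIN_CITATIONS = 3             # threshold for "highly cited"
LAST_APPLICATION_YEAR = 2003  # last application date (APDT) in the sample

def highly_cited(articles, citations):
    """articles: {article_id: publication_year}
    citations: iterable of (patent_id, article_id, application_year).
    Returns the set of article ids cited by at least MIN_CITATIONS
    different patents inside the window."""
    citing_patents = defaultdict(set)
    for patent_id, article_id, app_year in citations:
        pub_year = articles.get(article_id)
        if pub_year is None:
            continue
        # Truncation control: skip articles that did not have the
        # full window available before the end of the sample.
        if pub_year + WINDOW_YEARS > LAST_APPLICATION_YEAR:
            continue
        if pub_year <= app_year <= pub_year + WINDOW_YEARS:
            citing_patents[article_id].add(patent_id)  # distinct patents only
    return {a for a, pats in citing_patents.items() if len(pats) >= MIN_CITATIONS}

# Toy data (hypothetical identifiers).
articles = {"A": 1995, "B": 1995, "C": 2001}
citations = [
    ("P1", "A", 1996), ("P2", "A", 1998), ("P3", "A", 2000),
    ("P1", "B", 1996), ("P1", "B", 1997), ("P2", "B", 2002),
    ("P4", "C", 2001), ("P5", "C", 2002), ("P6", "C", 2003),
]
print(highly_cited(articles, citations))
```

In the toy data only article A qualifies: B is cited twice by the same patent and once outside the window, while C is dropped because its window extends past 2003.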

 

 

2.4 Building the network of inventors

       

Inventors who are also authors of articles cited by other patents[31] should act as gatekeepers, as conduits between S&T. Since their publications have been cited by other inventors from the same technological subclass, it is credible that their scientific research has an intrinsic and direct economic value for technology. In these people, Gittelman and Kogut (2001) observe the singular ability to identify the elements of scientific knowledge which could have an economic impact and to apply them to industrial research. This process requires translation, in terms of vocabulary, reference schemes and incentives. By building the network of invention and localizing authors-inventors, we wanted to explore their social function and to control for any structural difference between them and the rest of the population.

We created the network structure[32] using patent data: each co-inventorship was considered on the graph as a link, a tie between two nodes (individuals). The same table was used to calculate the degree centrality[33] of all inventors. Using Pajek[34], we were also able to determine the betweenness centrality[35] for all the nodes belonging to the largest component[36] of the network: 4433 inventors, with an average of 7.8 ties each. The largest component represents only 12.7% of the total population[37], which confirms the highly fragmented nature of inventor networks (see Table 2). In a typical “small world” case, it would encompass up to 80-90% of the total population. As we will see in section 6.1, introducing the ties that derive from co-authorship relations drastically reduces the average distance inside the network and increases the dimension of the largest component. We should therefore consider this representation as incomplete, as it only considers a small part (co-inventorship) of all the effective links between the nodes.
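To make the construction concrete, the sketch below builds a co-inventorship graph from a toy patent table, then derives degree centrality and the connected components. It is a plain-Python illustration, not the Pajek workflow used for the actual analysis; the patent identifiers and inventor names are invented.

```python
from collections import deque
from itertools import combinations

def build_network(patents):
    """patents: {patent_id: [inventor, ...]} -> {inventor: set(co-inventors)}."""
    adj = {}
    for inventors in patents.values():
        team = sorted(set(inventors))
        for inv in team:
            adj.setdefault(inv, set())   # isolates are kept as nodes
        for a, b in combinations(team, 2):
            adj[a].add(b)                # each co-inventorship is one tie
            adj[b].add(a)
    return adj

def degree_centrality(adj):
    """Number of distinct co-inventors of each node."""
    return {node: len(neigh) for node, neigh in adj.items()}

def components(adj):
    """Connected components, largest first (breadth-first search)."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, queue = set(), deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for neigh in adj[node]:
                if neigh not in seen:
                    seen.add(neigh)
                    queue.append(neigh)
        result.append(comp)
    return sorted(result, key=len, reverse=True)

# Hypothetical toy data.
patents = {
    "EP1": ["Rossi", "Bianchi", "Verdi"],
    "EP2": ["Verdi", "Neri"],
    "EP3": ["Smith", "Jones"],
    "EP4": ["Solo"],
}
adj = build_network(patents)
comps = components(adj)
print(degree_centrality(adj)["Verdi"], [len(c) for c in comps])
print("share of largest component:", len(comps[0]) / len(adj))
```

The share of nodes in the largest component, computed in the last line, is the quantity reported as 12.7% for the real network.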

 

Table 2: inventors in the first components and their relative share

Component | Inventors | % of total
1st | 4433 | 12.7%
2nd | 419 | 1.2%
3rd | 341 | 0.9%
4th | 237 | 0.7%
5th | 217 | 0.6%

Table 3: relative weight of dyads, triads etc. on the overall network

Number of nodes | Count | % of total
2 | 2256 | 35%
3 | 1546 | 24%
4 | 942 | 15%
5 | 516 | 8%

 

 

If we examine the largest component we can see that it is fairly international, even if the US accounts for 59% of the total number of inventors. Fragmentation between components does not appear to preclude geographical dispersion inside.

 

Figure 3: country distribution of the inventors inside the largest component

We can further describe this component by looking at the inventors’ assignees: at first sight, three players seem to exist but we should note that Lucent Technologies was spun off from AT&T in 1996, while Agere Systems is a spin-off of Lucent (2000). This substantiates the strong “intramural” nature of the component.

 

Figure 4: assignees of the inventors inside the largest component

The situation does not change if we look at other major components: the second is entirely localized in Japan and is made up of different divisions of Hitachi, the third represents ST-Microelectronics and Italy.

        To spot authors-inventors we ran a match between the names of semiconductor inventors and authors cited as NPL. As a condition we used the surname and first three initial letters of the names (when available) as extracted from the full name[38]. The results were then manually checked and returned 1,626 matches (297 are highly cited authors-inventors). The graph of the largest component (AT&T and its spin-offs) shows that all the inventors are tightly connected with the rest of their community, even if some peripheral groups exist.
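The matching condition can be illustrated with a short Python sketch. Several details are assumptions made for the example and are not taken from the original procedure: names are assumed to be stored as "Surname, Given names", comparison is case-insensitive, and when one side has fewer than three letters of the given name only the letters available on both sides are compared. Like the original match, it only produces candidates that still need a manual check.

```python
def match_key(full_name):
    """Split 'Surname, Given names' into (SURNAME, first three letters of the name)."""
    surname, _, given = full_name.partition(",")
    return surname.strip().upper(), given.strip().upper()[:3]

def names_match(inventor_name, author_name):
    """True when surnames coincide and the available name letters agree."""
    surname_i, given_i = match_key(inventor_name)
    surname_a, given_a = match_key(author_name)
    if surname_i != surname_a:
        return False
    n = min(len(given_i), len(given_a))  # letters available on both sides
    return given_i[:n] == given_a[:n]

def candidate_pairs(inventors, authors):
    """All (inventor, author) pairs that satisfy the condition."""
    return [(i, a) for i in inventors for a in authors if names_match(i, a)]

# Hypothetical names.
inventors = ["Rossi, Mario", "Bianchi, Anna"]
authors = ["ROSSI, MAR", "Rossi, Luca", "Bianchi, A"]
print(candidate_pairs(inventors, authors))
```

A rule this loose also pairs homonyms such as "Rossi, Mario" and "Rossi, Marco", which is why a manual check of the results remains necessary.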

 

Figure 5: graph of the largest component (4,433 inventors, over 30,000 ties). Cited authors-inventors are green, highly-cited ones are yellow.

Which firms are also engaged in scientific research and publication? To answer this question we simply have to check where our authors-inventors come from (Table 4). Even if 37,6% of the sample is highly fragmented (assignees with less than 1% of the total number of inventors), we can clearly see the impact of high-tech, large innovative firms: both AT&T and IBM have more than twice as many gatekeepers as universities that patent in the same field. Even if we consider the bias generated by their higher propensity to patent, we cannot ignore the scientific attitude of these firms. They are able to manage basic and applied research, relying on huge R&D budgets and internal development. Their inclination to patent goes hand in hand with the ability to publish in peer-reviewed scientific journals[39] and stay plugged into the scientific network. S&T interact profitably, at least in these examples. The European share is quite low (below 7%), compared to an overall share of 24% of inventors, due in part to a different industrial structure (smaller firms).

 

Table 4: assignees of authors-inventors

Assignee | % of total authors-inventors population
AT & T | 11.2
IBM | 9.6
LUCENT TECHNOLOGIES | 5.1
TEXAS INSTRUMENTS | 4.5
PHILIPS ELECTRONICS | 3.7
Other Universities | 3.4
EASTMAN KODAK | 3.1
MOTOROLA | 2.9
APPLIED MATERIALS | 2.9
XEROX | 2.5
HEWLETT-PACKARD | 2.4
GENERAL ELECTRIC | 1.9
SIEMENS | 1.7
SHARP | 1.5
ENERGY CONVERSION DEVICES | 1.4
Princeton University | 1.3
SGS-THOMSON MICROELECTRONICS | 1.1
AGILENT TECHNOLOGIES | 1.1
SAMSUNG ELECTRONICS | 1.0
Others (<1%) | 37.6

 

The highly cited authors-inventors are a few leading researchers. It is therefore useful to compare their localization (figure 7) with that of the overall population (figure 6). The links between S&T are strongest in the US (77% of the highly cited sample is American, compared to a share of 31% of total population); we can highlight a good performance of Great Britain, the Netherlands and Korea. Germany appears to be one of the worst-off: a global share of 10% of inventors translates into a 0,9% slice of the highly cited.

 

Figure 6: country distribution of all inventors in the semiconductor field (H01L)



Figure 7: country distribution of highly cited authors-inventors[40]

3.0 Real gatekeepers?

 

        In this chapter we will verify if authors-inventors are indeed more highly connected (hypothesis I) and influential (hypothesis II) individuals in our network. Gatekeepers are usually more valuable people, as they offer access to other groups and relevant information. If authors-inventors really bridge the two communities of S&T, they should be less likely to end up as isolates and should be found inside the largest components more frequently (in relative terms) than simple inventors. We could actually expect them to be the glue that holds the bigger components together. 

This is exactly what table 5 reveals: the percentage of authors-inventors in the 10 largest components is almost double, while the effect is reversed in the smaller groups. These values moreover exclude isolated inventors, about 9% of the total population, who are all simple inventors.

 

Table 5: Comparison between inventors and authors-inventors (components)

Component/s | % of tot. Inventors | % of tot. Authors-inventors | delta
1st | 12.4 | 18.5 | 6.1
2nd | 0.8 | 10.2 | 9.5
10 largest | 18.2 | 32.4 | 14.2

10 smallest | 63.8 | 49 | -14.8
2-nodes | 12.9 | 9.6 | -3.3
3-nodes | 13.4 | 10.3 | -3.1
4-nodes | 11.0 | 7.0 | -4.0

 

Social network analysis offers some powerful tools for evaluating the functional role of a node inside a graph. We have already introduced degree centrality and betweenness (see 2.4), now we will compare these values to check if authors-inventors differ systematically from other inventors. We used a T-test to evaluate the difference between the means of the two populations relative to the variability of their scores.[41] In each case, SAS also calculates a folded F-test[42] for equality of variances, in order to identify which type of T-test is more appropriate (Pooled[43] in case of equal variances, Satterthwaite for unequal[44]).

 

 

3.1 Degree centrality

 

Hypothesis I: Scientific research and publishing make authors-inventors more “valuable”, so they should be able to build more connections: degree centrality should be higher than the average.

 

A first T-test compares the degree centrality of inventors and cited authors-inventors. The statistics for the two populations are reported in table 6.

 

Table 6: degree centrality statistics for cited authors-inventors (1) and simple inventors (0).

 

Group | Cited | N | Maximum | Mean | Std Dev | Std Err
Inventors | 0 | 33331 | 72 | 3.9319 | 3.3374 | 0.0183
Authors-inventors | 1 | 1626 | 71 | 4.4071 | 4.4242 | 0.1097
Difference (0-1) | | | | -0.475 | 3.3956 | 0.0862

 

Both groups have some relevant outliers (a maximum of 72 ties for a single individual): they are likely to be chief researchers (Balconi et al., 2004), who sign a large number of patents. Before proceeding with the T-Test we found out with a folded F-Test[45] that the variances of the two groups were unequal.

 

Table 7: T-Test for degree centrality (inventors vs cited)

 

Method | Variances | DF | t Value | Pr > |t|
Satterthwaite | Unequal | 1716 | -4.27 | <.0001

 

The t value is negative[46] and highly significant (Pr > |t| < 0.0001): this confirms that, on average, cited authors-inventors have more ties (higher degree centrality) than simple inventors.
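As a cross-check, the unequal-variance statistic can be recomputed from the summary statistics in Table 6 alone. The Python sketch below uses the standard Welch formula with Satterthwaite degrees of freedom and the folded F ratio (larger variance over smaller); it is an independent illustration, not the SAS procedure itself, and the function names are arbitrary.

```python
import math

def folded_f(sd_a, sd_b):
    """Folded F statistic: larger sample variance divided by the smaller."""
    var_a, var_b = sd_a ** 2, sd_b ** 2
    return max(var_a, var_b) / min(var_a, var_b)

def welch_t(mean0, sd0, n0, mean1, sd1, n1):
    """t statistic and Satterthwaite degrees of freedom (unequal variances)."""
    v0, v1 = sd0 ** 2 / n0, sd1 ** 2 / n1   # squared standard errors
    t = (mean0 - mean1) / math.sqrt(v0 + v1)
    df = (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))
    return t, df

# Table 6: group 0 = simple inventors, group 1 = cited authors-inventors.
t, df = welch_t(3.9319, 3.3374, 33331, 4.4071, 4.4242, 1626)
print(round(t, 2), round(df))
print(round(folded_f(3.3374, 4.4242), 2))
```

With these inputs the sketch returns t of about -4.27 on roughly 1716 degrees of freedom, matching Table 7. The p-value would additionally require the t distribution, which is outside the standard library used here.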

As a check we repeated the same procedure comparing inventors to highly cited authors-inventors (as defined in section 2.3). This second test requires that the authors-inventors involved have published at least one article cited by 3 or more different patents. Statistics for the groups are reported in table 8.

 

Table 8: degree centrality statistics for highly cited authors-inventors (1) and simple inventors (0).

 

Group | HC | N | Maximum | Mean | Std Dev | Std Err
Inventors | 0 | 34660 | 72 | 3.9464 | 3.3835 | 0.0182
HC Authors-inventors | 1 | 297 | 34 | 4.8418 | 4.6374 | 0.2691
Difference (0-1) | | | | -0.895 | 3.3961 | 0.1979

 

Even if the highly cited group does not have comparable outliers (the maximum number of ties is 34), there is still a negative difference between the means that favours the centrality of authors-inventors.

 

Table 9: T-Test for degree centrality (inventors vs highly cited)

 

Method | Variances | DF | t Value | Pr > |t|
Satterthwaite | Unequal[47] | 299 | -3.32 | 0.001

 

The t value is again negative and highly significant. We can safely affirm that our first hypothesis is confirmed again: highly cited authors-inventors have more ties than simple inventors and therefore have access to more groups and sources of knowledge.

 

 

 

3.2 Betweenness centrality

 

Hypothesis II: authors-inventors act as gatekeepers, funnelling most of the knowledge flows that span the various groups: betweenness centrality should be higher than the average.

 

We start directly with the more restrictive case, the one that compares inventors to cited authors-inventors. As the betweenness measure can only be calculated within a component, we chose to run the test on the largest one (4433 nodes, 301 are cited authors-inventors): if gatekeepers have a crucial role in funnelling knowledge flows we can expect this effect to be more relevant inside a larger, more heterogeneous component.

 

Table 10: betweenness statistics for cited authors-inventors (1) and simple inventors (0).

 

Group | Cited | N | Maximum | Mean | Std Dev | Std Err
Inventors | 0 | 4132 | 0.2517 | 0.0023 | 0.0115 | 0.0002
Authors-inventors | 1 | 301 | 0.0963 | 0.004 | 0.0129 | 0.0007
Difference (0-1) | | | | -0.002 | 0.0116 | 0.0007

 

Table 11: T-Test for betweenness (inventors vs cited)

 

Method | Variances | DF | t Value | Pr > |t|
Satterthwaite | Unequal[48] | 336 | -2.13 | 0.0342

 

Even if the inventors count some outliers (maximum betweenness 0.25), the mean for authors-inventors is almost double. The folded F-Test rejects equality of variances: the Satterthwaite T-Test returns a negative value for t, significant at the 5% level (Pr > |t| = 0.0342).

        Hypothesis II is confirmed: authors-inventors are on average more influential people in the network. They can control the flow of information between most others: if they were artificially removed from the network, most of the largest components would probably split into smaller ones, disconnecting entire groups of people. If S&T interact it is likely that authors-inventors are a key part of the process.
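The intuition behind betweenness can be illustrated on a toy graph in which a single node bridges two otherwise separate groups. The Python sketch below implements Brandes' algorithm for an undirected, unweighted graph and normalises the scores by the number of node pairs; it is a didactic stand-in for the Pajek computation, and the graph is invented.

```python
from collections import deque

def betweenness(adj):
    """Normalised betweenness centrality (Brandes' algorithm).
    adj: {node: set(neighbours)} for an undirected, unweighted graph."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from s
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:                    # back-propagate dependencies
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    n = len(adj)
    # Each undirected pair is visited from both ends, hence (n-1)(n-2).
    scale = 1.0 / ((n - 1) * (n - 2)) if n > 2 else 0.0
    return {v: c * scale for v, c in score.items()}

# Two triangles joined only through the hypothetical gatekeeper "G".
edges = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "G"),
         ("G", "D"), ("D", "E"), ("D", "F"), ("E", "F")]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

bc = betweenness(adj)
print(max(bc, key=bc.get), round(bc["G"], 3))
```

In this example G lies on every shortest path between the two triangles, so it obtains the highest score, and removing it would split the graph into two components.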


 


4.0 Who produces and who exploits science?

 

Looking at the interface between S&T can shed some light on the nature of the “European paradox”: does European science lag behind its American counterpart, is its industry unable to absorb and apply its scientific output or is it a problem of organizational structures (e.g. transfer groups)?

Dosi et al. (2005) present detailed evidence against the supposed paradox. Europe’s superiority in science cannot be taken for granted and indeed the US performs better under many research output indicators (published papers adjusted for population, share of highly cited authors and publications, Nobel prize winners). Gross domestic expenditure in R&D[49] as % of GDP for the EU-15 (1.93 in 2000) is still far below that of Japan (2.98) or the US (2.69). Private investment is even lower: 1.26% of GDP, compared to 2.03% in the US.

Two fields where Europe seems to be catching up successfully are physical sciences and engineering: as the semiconductor industry relies heavily on basic and applied research in physics, we should find some support for this trend by looking at Europe’s share in NPL citations. As we have seen from patenting statistics (figure 2), Europe’s role in this field has been increasing since 1994: did the industry rely on foreign science or on its own innovation system? By looking at NPL, we focus precisely on research that plays a role in applied research or, at least, invention.

 

 

 

 

 

 

Figure 8: Country distribution of all scientific articles cited by patents (H01L).

Figure 8 shows that Europe holds a satisfactory share (22%) of citations, even if we have to consider that the Japanese one is probably underestimated because of language barriers in the search process for “prior art”. Historically the semiconductor industry has always been dominated by US firms, while Japan has followed with alternating fortunes. An additional home bias, as we are using data from EPO patents, is probably partially inflating EU results. To obtain a more impartial outlook, we can turn to relative shares by country of origin of the citing patent. Which countries produce and which countries exploit scientific research?

 

Figure 9: Users (category axis) and suppliers of scientific research (NPL cited by patents in the field of semiconductors).

European patents rely in almost the same proportion on domestic and US science. The home bias (Narin et al., 1997) is clearly visible for all three patenting countries when figures 8 and 9 are compared. US leadership is undoubted: both Japan and Europe depend on American scientific research, while the US is almost self-sufficient. This result can be read the other way round if we consider that the followers seem able to profit from the leader’s research efforts. The European percentages (39% of NPLs are domestic, 38% from the US) seem to back up Cohen and Levinthal’s (1990) theory of absorptive capacity: almost every single NPL citation borrowed from the outside is “paid for” with a domestic one. These results are similar to those found by Verbeek et al. (2003) for the information technology field, but differ for the biotechnology one where, according to the authors, EU-articles find their way into US patents more often than the opposite.

The only conclusion that can be drawn is that knowledge flows from science to technology actually cross the boundaries of their originating countries and reach foreign patents. At first sight, the role of publication and scientific disclosure seems to overcome the limits of geography and social networks. The remaining chapters will try to quantify this effect exactly and separate it from other measures of proximity. Where do these citations to articles really come from? Are followers simply trying to imitate and absorb the leader’s output or is there a more subtle explanation, one that accounts for the real distance that separates an inventor from her colleague abroad?


 


5.0 Patents and NPL co-location: the tip of an iceberg

 

        The aim of this chapter is to test whether knowledge flows from science to technology are geographically localized. In a connected economy, where ICT permits a fast, cheap transfer of large chunks of data, how much do local specificities matter? When an inventor develops her concept, is she more influenced by codified information available in scientific journals or by the knowledge she obtains through a direct chain of social contacts? How much does the national system of innovation influence her sources?

        JTH’s methodology (1993) can be adapted to NPL citations: are patents more likely to cite scientific articles published in the same country (region) or is geography irrelevant to this special kind of dialogue between S&T? Do national (local) boundaries limit the diffusion of scientific information? If the probability of a citation depends only on the quality of the codified content of an article, it should not be bounded in space. Observed co-localization between patents and NPL citations should then be justified only by a pre-existing agglomeration of production. It is possible to control for this effect by building a set of control patents and using them as a baseline. The level of co-localization between originating patents and NPLs can then be compared with that between controls and NPLs: if the originating patent appears to be more co-localized than the controls, then citations to scientific articles are indeed geographically localized with their citing patents.

        To ensure that the control patents actually “do their job” of representing base concentration, it is vital to select ones which are as close as possible to the originals (Thompson, 2005). We created all possible couples between semiconductor patents applied for in the same year that did not cite the same NPL and then ran a stratified sampling procedure (by patent and by year) to extract one control for each originating patent. As we are working only with patents from the same 4-digit subclass (H01L), controls are by design similar from a technological point of view. The next step was to create two contingency tables, one for the country match between patents and NPLs, the other for controls and NPLs. From these tables, we divided the data into co-localized or not, first at country level and then, on US data only, at state level. To test whether patents were more co-localized than controls and therefore that knowledge spillovers between S&T exist and are geographically bounded, we used odds ratios. An odds ratio[50] is a way of measuring effect size[51] (the relation) between two binary probability variables[52]. A value greater than 1 suggests that the event is more likely to happen in the first sample (citing patents in our case).
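The control-drawing step can be sketched in Python as follows. This is a simplified illustration of the idea, not the stratified sampling routine actually used: for each originating patent it picks at random one other patent from the same application year that shares no NPL citation with it. The data layout, identifiers and fixed seed are assumptions made for the example.

```python
import random

def draw_controls(patents, seed=0):
    """patents: {patent_id: (application_year, set_of_cited_npl_ids)}
    Returns {patent_id: control_patent_id}; patents without an eligible
    control are left out."""
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    by_year = {}
    for pid, (year, _) in patents.items():
        by_year.setdefault(year, []).append(pid)
    controls = {}
    for pid, (year, npls) in patents.items():
        # Eligible controls: same year, different patent, no shared NPL.
        eligible = [c for c in by_year[year]
                    if c != pid and not (patents[c][1] & npls)]
        if eligible:
            controls[pid] = rng.choice(sorted(eligible))
    return controls

# Hypothetical toy data.
patents = {
    "P1": (1995, {"n1"}),
    "P2": (1995, {"n1", "n2"}),
    "P3": (1995, {"n3"}),
    "P4": (1996, {"n4"}),
}
controls = draw_controls(patents)
print(controls)
```

Because all patents come from the same subclass (H01L), no technological matching is needed in the sketch; matching on year and excluding shared NPLs are the only constraints.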

 

Table 12: Co-localization percentages for citing and control patents

 

Level | Citing | Controls | Odds Ratio | 95% Wald confidence limits
Country | 37,9% | 7,1% | 7.87 | 7.085-8.759
State (only US) | 20,2% | 7,8% | 2.99 | 2.757-3.256

 

        The results prove that citations to NPL display quite strong co-localization with their citing patents. The effect is stronger at country level, as evidenced also by Thompson (2005): a patent is almost 8 times more likely to cite an NPL from the same country once the existing agglomeration of production has been accounted for (using controls as a reference). Country boundaries are a tangible obstacle to knowledge spillovers, which makes Sorenson’s theory (2005) about the role of codified knowledge and publication in the indiscriminate diffusion of information less likely. At US state level, a citation is 3 times more likely to come from the same area: geographical regions still play a central role in the innovation process, as they influence the “accessibility” of scientific knowledge. Geography matters, even if we don’t yet know what kind of causal relation links geographical and social proximity.
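For readers who want to reproduce this kind of figure, the Python sketch below computes an odds ratio with Wald confidence limits from a 2x2 table, using the textbook formula based on the standard error of the log odds ratio. The cell counts in the example are hypothetical, since the underlying counts are not reported here; only the second helper uses the percentages of Table 12.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """2x2 table: a = citing patents co-localised, b = citing not co-localised,
    c = controls co-localised, d = controls not co-localised.
    Returns (odds ratio, lower limit, upper limit) at about 95% confidence."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

def odds_ratio_from_shares(p_citing, p_control):
    """Odds ratio from two co-localisation shares (no confidence limits)."""
    return (p_citing / (1 - p_citing)) / (p_control / (1 - p_control))

# Hypothetical counts, for illustration only.
print(odds_ratio_wald(20, 80, 10, 90))

# Shares from Table 12 (country level): 37,9% citing vs 7,1% controls.
print(round(odds_ratio_from_shares(0.379, 0.071), 2))
```

Applied to the rounded country-level percentages, the second helper gives a value close to 8, in the neighbourhood of the reported 7.87; the gap is presumably due to rounding of the published shares.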

        Our US state co-localization percentage (20,2%) is confirmed by similar results found by Hicks et al. (2001). It is also possible to compare the actual number of in-state NPL citations with an expected value that takes into consideration the relative amount of industrially relevant public R&D expenditure. Total US R&D expenditure was $265 billion in 2000, of which $247 billion within states: the first six states (California, Michigan, New York, New Jersey, Massachusetts, and Illinois) accounted for more than half the overall investment, while California alone represented almost 20% of the total.[53] As the distribution is highly skewed, when comparing the actual in-state citations we have to control for the share of R&D that goes to that state. Hicks suggests using received NPL citations as a proxy for public R&D relevant to industry. The expected number of in-state citations can then be calculated as the product of:

 

·        a state’s share on overall US incoming citations (from US patents to NPLs)

and

·        the total number of NPL citations made in the patents of that state

 

The ratio between actual and expected citations is an explicit test of the local nature of the S&T interaction: patents cite scientific articles coming from the same state more frequently, even if we control for the existing concentration of R&D expenditure. As citations to articles could arise even without any contact between inventors and authors (codified, available knowledge), this test is even more reliable: NPL citations are likely to be only the tip of an iceberg, as they capture only very formal and selected links between local science and technology. Investigating the social ties of inventors and authors means looking at exactly what lies below the waterline.
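The expected value and the resulting ratio can be written out in a few lines of Python. The figures in the example are invented purely to show the arithmetic, and the function and argument names are illustrative.

```python
def expected_in_state(state_incoming, us_incoming_total, state_outgoing):
    """Expected in-state citations = state's share of all US incoming
    citations (US patents -> NPLs) x NPL citations made by that state's patents."""
    return state_incoming * state_outgoing / us_incoming_total

def in_state_ratio(actual_in_state, state_incoming, us_incoming_total, state_outgoing):
    """Actual over expected in-state citations; values above 1 point to
    localisation beyond what the state's research weight would predict."""
    expected = expected_in_state(state_incoming, us_incoming_total, state_outgoing)
    return actual_in_state / expected

# Hypothetical state: its articles receive 200 of 1,000 US citations,
# its patents make 300 NPL citations, 90 of which go to in-state articles.
print(expected_in_state(200, 1000, 300))
print(in_state_ratio(90, 200, 1000, 300))
```

In this invented case the state would be expected to attract 60 of its own citations and actually attracts 90, a ratio of 1.5.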

 

Figure 10: US in-state co-location effect between patents and cited NPLs.

 


6.0 Down to the waterline[54]

 

        The previous chapter left us with evidence about the co-localization between citing patents and cited scientific articles. Without any further investigation we could conclude that science and technology interact locally, where particular conditions create the right incentives for the diffusion of knowledge. Innovation is a cumulative and collective effort, so it is likely that local specificities encourage a positive feedback mechanism, reinforcing established positions. Bresnahan, Gambardella and Saxenian (2001) separate the effects that sustain a going cluster (agglomeration effects, spillovers, increasing social returns) from those that create the conditions for its formation. All the “old-economy” inputs they list have a connection to human capital: firm-building capabilities, which are usually part of a broader institutional framework (North, 1990), managerial skills, skilled labour, access to markets. Individuals and their social connections have a key role in bridging knowledge and enabling access to these inputs.

Science-based sectors (Pavitt, 1984) require by definition access to frontier scientific research, but we know from Narin (1991) that this process is becoming increasingly international. Leading institutions in the US collaborate with their European and Japanese counterparts, supported by a rediscovered, ICT-enhanced proximity: the social one. Complex exchanges of partially codified knowledge are possible because tacitness is heavily dependent on the characteristics of the sender and receiver of the message (Nonaka, 1994). Scientists were the first to develop a vocabulary, a set of common epistemological rules for the creation and exploitation of knowledge. Inventors likewise are a “community of experts” (Breschi and Lissoni, 2004), which speaks a common language and is able to innovate incrementally on its members’ efforts. What we are going to test in this chapter is the exact effect of geographical proximity, once social proximity has been accounted for.

 

 

6.1 Social proximity in numbers

 

        One of the first problems to solve before testing the “social network hypothesis” was how to construct a comprehensive network of the inventors and authors in the semiconductor industry. In section 2.4 we derived the invention network from patent data, and later added the information on authors-inventors. However, if we want a reliable estimate of the social distance between any two individuals (author or inventor), we have to include the layer with all the ties between scientific authors (the network of discovery). Since considering all the co-authorship relations across fields is prohibitive and generates a huge number of false positives (due to homonyms), we decided to limit the scope of our author network to people who have published articles cited in patents in the field[55]. This is a reasonable trade-off between ignoring the discovery network entirely and introducing excessive noise in the estimates.

The basis for the “double-layer” network was: 28,998 individual inventors, coming from 20,155 patents applied for between 1978 and 2003; 7,967 individual authors, matching 3,263 ISI articles (1975-2003); and 967 authors-inventors[56], acting as flyovers between the two levels of invention and discovery.

 

 

To quantify the effect of co-authorship links, we can compare the structure of the network before and after their introduction. Isolates represent 44% of the inventor network[57], but account for only 26% of the joined networks. Co-authorship specifically alters the upper tail of the distribution, making the network closer to a “small world”. The largest component, which accounts for only 15.8% of the inventors’ network, grows to 45.5% of the whole network, absorbing all the bigger components of the previous network. The second largest component, which in the inventor network (INV) is 33% of the size of the first, shrinks to 0.7% of it in the joined network (INV&AUT). This outcome would probably be even more evident if we had used all co-authorship relations and not only those arising from NPL citations. Figure 11 shows the growth of the largest component: new nodes are added and others are drawn from previously isolated components.
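The before/after comparison of component sizes can be sketched as follows. This is only an illustrative Python example on a toy network (the node labels and ties are invented), using a simple union-find structure; it is not the procedure or software used for our figures.

```python
from collections import Counter


def component_sizes(nodes, edges):
    """Return the sizes of the connected components of an undirected
    graph, largest first, using a union-find structure."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    return sorted(Counter(find(n) for n in nodes).values(), reverse=True)


# Toy data (hypothetical): i* are inventors, a* are authors;
# i2 and i3 are authors-inventors who also appear in co-authorship ties.
inventors = ["i1", "i2", "i3", "i4", "i5"]
authors = ["a1", "a2"]
co_invention = [("i1", "i2"), ("i3", "i4")]
co_authorship = [("i2", "a1"), ("a1", "i3")]

inv_sizes = component_sizes(inventors, co_invention)
joined_sizes = component_sizes(inventors + authors, co_invention + co_authorship)

print("INV:", inv_sizes)          # two dyads and one isolate
print("INV&AUT:", joined_sizes)   # the dyads merge through author a1
```

In the toy example a single author linking two authors-inventors is enough to merge two separate teams, which is the same mechanism that inflates the largest component in the joined network.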

 

 

 

Table 13: Structural change in the first components after the introduction of co-authorship ties (reference year 2000). INV&AUT refers to the “double layer” network, INV to the network built considering only co-inventorship.

Component   Nodes (INV&AUT)              Nodes (INV)   Share (INV&AUT)   Share (INV)
1st         12055 (6976 are inventors)   3094          45.5%             15.8%
2nd         94                           1044          0.35%             5.3%
3rd         67                           363           0.25%             1.9%
4th         65                           218           0.25%             1.1%
5th         62                           183           0.23%             0.9%

 

 

 

 

 

 

 

Figure 11: Evolution of the largest components considering co-inventorship relations (green) and also co-authorship ties (blue).

 

Authors seem to have less influence on the lower part of the distribution (except for isolates): adding them does not increase the presence of small groups in the total population, and the shares remain practically unchanged.

 

Table 14: Structural change in the lower part of the distribution: dyads, triads… (reference year 2000)

Number of nodes   Count (INV&AUT)   Count (INV)   Share (INV&AUT)   Share (INV)
2                 1647              1603          43.5%             44.2%
3                 908               843           24.0%             23.3%
4                 457               434           12.1%             12.0%
5                 240               223           6.3%              6.2%

 

Compared to the network analysed in section 2.4, we now have a much clearer outlook on the role of science in the field of semiconductors.  Looking inside the most radically altered part of the net, the largest component, we can search for country differences, so as to estimate their relative advantage in innovation-related science (figure 12).

 

Figure 12: Country distribution of the largest component (reference year 2000)

 

Even if the results are not directly comparable[58], the US slightly increases its share, while Germany drops from 23% to 11%. There are two possible explanations: either Germany is lagging behind in terms of scientific research, or its quota in the field has been growing in recent years (so that taking older patents penalizes its share of inventors). The largest component (12,055 nodes in the year 2000) is divided into 6,976 inventors (57.9%) and 5,079 authors (42.1%). Almost all the authors-inventors (88% of the total, or 853 individuals) are found in the largest component, which strongly supports their role as gatekeepers between S&T.

 

 

6.2 Drawing a control sample

 

The social network hypothesis (Singh, 2003; Breschi and Lissoni, 2004) can be tested by estimating a citation function Pr(P,A), which specifies the probability that a patent P cites a given scientific article A, and by testing the influence of social proximity on this probability after controlling for geographical and other factors that could affect Pr.

The citation function has a logistic functional form[59], but cannot be correctly estimated from a random sample, as citations are very rare events in the overall population of all possible patent-article pairs. As Sorenson (2005) points out: “logistic regression yields biased estimates when the proportion of positive outcomes in the sample does not match the proportion in the population”. Furthermore, using all possible pairs for the logistic regression instead of a sample is practically unworkable, since the data matrix would be huge[60].

Since the cases of citation (y=1) are more informative and relevant for the regression, the strategy is to retain all of them, while sampling a smaller proportion of no-citation pairs (choice-based sampling procedure). As the stratification intervenes on the dependent variable, it is necessary to proceed with the estimate by using a weighted exogenous sampling maximum-likelihood (WESML) estimator.

The WESML alters the logistic maximum likelihood function by weighting each observation by the number of elements it represents in the overall population: a weight of 1 is assigned to all the citation cases (as we retain them all), while controls receive as weight the inverse of the sampling probability of pairs with that particular combination of years (year of the citing patent, year of the cited article). In our case, for each citation (y=1) we selected two “control” pairs, and repeated the sampling procedure for each cohort of originating patents (from 1990 to 2000).
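The weighting scheme can be sketched in a few lines of Python. This is a simplified illustration and not the estimation code behind our tables: the function names, the single-covariate example and the population counts are assumptions introduced only to show how the weights enter the log-likelihood.

```python
import math


def wesml_weight(is_citation, n_population_controls, n_sampled_controls):
    """Weight of one observation: 1 for citations (all retained), and the
    inverse sampling probability for controls drawn from a given
    (patent year, article year) stratum."""
    if is_citation:
        return 1.0
    return n_population_controls / n_sampled_controls


def weighted_log_likelihood(beta, rows):
    """Weighted logistic log-likelihood.
    rows: iterable of (y, x, w), with x a list of covariates
    (include 1.0 for the intercept) and w the WESML weight."""
    total = 0.0
    for y, x, w in rows:
        z = sum(b * xi for b, xi in zip(beta, x))
        p = 1.0 / (1.0 + math.exp(-z))
        total += w * (y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total


# Hypothetical stratum: 1000 non-citing pairs exist, 2 were sampled
w_control = wesml_weight(False, 1000, 2)
w_citation = wesml_weight(True, 1000, 2)

rows = [(1, [1.0], w_citation), (0, [1.0], w_control), (0, [1.0], w_control)]
print(w_citation, w_control)
print(weighted_log_likelihood([0.0], rows))
```

Maximising this weighted function over beta, instead of the unweighted one, is what corrects for the over-representation of citations in the sample.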

The final dataset contained 1,987 citations and 4,224 non-citations (years for the cited articles starting from 1985, as defined by our citation window), covering 1,620 distinct articles and 4,366 unique patents.

The next step was to calculate the values of our explanatory variables, geographical and social distance, for each patent-NPL pair.

 

 

6.3 Geographical proximity with geocoding

 

If we want to test the relative roles of geography and social proximity, we need a precise estimate of the physical distance between any two individuals in our network. Dummies for regional co-localization are an acceptable start, but we can actually do better by fully exploiting the information available in our dataset. By parsing the street addresses of all authors and inventors and geocoding[61] them, we were able to associate each individual with a set of latitude and longitude coordinates. From a technical point of view, the process was handled separately for worldwide and US inventors. In the first case, we used a combination of city name, region or province (NUTS2[62] and NUTS3) and country, while for the US we were able to retrieve all the 5-digit ZIP codes[63]. The retrieval process was based on PHP and JavaScript code we wrote as an interface to the freely available Google Maps API[64] (Figures 13 and 14).

 

Figure 13: a screenshot of our Google Maps implementation (the numbers inside each balloon represent authors or inventors located in any one place).

 

Figure 14: a second screenshot (the lines connecting two different points represent co-authorship or co-inventorship relations).

 

        The next step was to compute the distance between each author and each inventor for every patent-NPL pair (cases and controls). Using the latitude and longitude data, we applied the Haversine formula[65], which, compared to the spherical law of cosines, is particularly suitable for numerical computation even at small distances (Sinnott, “Sky and Telescope”, 1984).
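For reference, the Haversine computation can be sketched in Python as below. The Earth radius of 6371 km and the helper that summarises all inventor-author pairs are assumptions made for the example; the coordinates are purely illustrative.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    # Clamp to 1.0 to guard against rounding just above the asin domain
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def min_and_mean_distance_km(inventor_coords, author_coords):
    """Minimum and average distance over all inventor-author pairs
    of one patent-NPL pair; coordinates are (lat, lon) tuples."""
    distances = [haversine_km(ilat, ilon, alat, alon)
                 for ilat, ilon in inventor_coords
                 for alat, alon in author_coords]
    return min(distances), sum(distances) / len(distances)


# Illustrative coordinates: one inventor, two authors
inventors = [(0.0, 0.0)]
authors = [(0.0, 0.0), (90.0, 0.0)]
print(min_and_mean_distance_km(inventors, authors))
```

The minimum and the average over all pairs are the two summaries that a regression on geographical distance can then take in logarithmic form.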

 

 

6.4 Social proximity, reloaded

 

        Social distance changes over time, as new teams are formed for a project or scientific collaboration. Individuals who were unconnected or only indirectly connected can grow closer as their careers evolve. If we want to measure exactly the social distance between citing patents and cited NPLs, we must refer every time to the network existing at the time the paper trail was created. Hence we cannot simply calculate social proximity once, but must create a network for every year in the dataset: for each patent-NPL pair at time t (where t refers to the application date of the citing patent, e.g. 1999), we have to use the corresponding network of inventors and authors at time t-1 (e.g. 1998). We computed social distance using SAS 9 and Moody’s add-on SPAN.
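The logic of the year-by-year computation can be illustrated with a small Python sketch. It is not the SAS/SPAN code we actually ran: the function names and the toy ties are hypothetical, and the example assumes that the network at t-1 accumulates all ties formed up to that year.

```python
from collections import defaultdict, deque


def build_network(ties, up_to_year):
    """Adjacency sets built from (person_a, person_b, year) ties,
    keeping only ties formed in or before up_to_year (assumption:
    ties accumulate and never expire)."""
    adjacency = defaultdict(set)
    for a, b, year in ties:
        if year <= up_to_year:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def social_distance(adjacency, inventors, authors):
    """Minimum geodesic distance between any inventor of the patent and
    any author of the article (multi-source breadth-first search).
    Returns None when the two groups are not connected."""
    targets = set(authors)
    seen = set(inventors)
    queue = deque((person, 0) for person in seen)
    while queue:
        person, dist = queue.popleft()
        if person in targets:
            return dist
        for neighbour in adjacency.get(person, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Toy ties (hypothetical): co-invention or co-authorship with year
ties = [("A", "B", 1995), ("B", "C", 1998), ("C", "D", 1999)]

# Patent applied for in t = 1999: use the network as of t-1 = 1998
net_1998 = build_network(ties, 1998)
print(social_distance(net_1998, ["A"], ["C"]))  # two steps via B
print(social_distance(net_1998, ["A"], ["D"]))  # not yet connected
```

The same pair of individuals can therefore be unconnected for a 1999 patent and at distance 3 for a 2000 patent, which is why the network has to be rebuilt for every year.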

Estimating the effects of geography and social distance on the citation probability function is the final step of our research. We ran a series of logistic regressions, starting from a very simple model that accounts only for geography and moving to one that introduces our measures of social proximity. As a value for geographical distance we tried both the logarithm of the average and of the minimum distance between all pairs of inventors and authors for a given citing patent-NPL link. To all regressions we added a set of fixed effects for the year of the citing patents and the time lag between citing patent and cited NPL. Since focal patents might enter the regression more than once (if they cite more than one NPL), we report robust standard errors with clustering on the citing patent. We also added a control for the type of journal cited (from very basic to very applied research, on a scale from 1 to 4), using CHI’s classification of scientific journals[66].

Table 15 reports estimation results for the first case: geographical distance (glog) has a negative effect on the probability of a knowledge flow (citation).

 

Table 15: the probability of a citation decreases with geographical distance. The table reports odds-ratios, not Logit coefficients. The estimation includes fixed effects for the year of citing patents and time lag (lag).

 

Variable   Odds ratio     Robust Std. Err. (adjusted for 4366 clusters in citing)
glog       0.7798528 a    0.0139077
lag        0.9936209 ns   0.0167381

a significant at the 1% level; ns not significant

Number of observations   6211
Log-likelihood           -46.469358
Pseudo-R2                0.0285

 

 

To introduce social distance we created a complete set of mutually exclusive dummy variables:

 

·        dd0: takes value 1 in case of personal self-citation (author-inventor) -> minimum geodesic distance between patent and article is 0;

·        dd1: at least one author and one inventor have previously worked together (past collaborator), either on a patent or a scientific article -> minimum geodesic distance is 1;

·        dd2: inventors and authors have at least one common collaborator, which reduces the geodesic distance to 2 “handshakes”;

·        dd3: geodesic distance is 3;

·        dd6: geodesic distance is either 4, 5 or 6;

·        ddc: connected, but at a distance larger than 6;

·        dnc: no social link, all inventors and authors belong to different components of the network.
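The mapping from a geodesic distance to this set of mutually exclusive dummies can be sketched as follows. The function is a hypothetical illustration in Python, where an unconnected pair is represented by None.

```python
DUMMY_NAMES = ["dd0", "dd1", "dd2", "dd3", "dd6", "ddc", "dnc"]


def distance_dummies(geodesic_distance):
    """Return the seven mutually exclusive dummies for one patent-NPL
    pair. geodesic_distance is the minimum number of steps between any
    inventor and any author, or None if they are not connected."""
    if geodesic_distance is None:
        active = "dnc"                      # different components
    elif geodesic_distance <= 3:
        active = f"dd{geodesic_distance}"   # dd0, dd1, dd2 or dd3
    elif geodesic_distance <= 6:
        active = "dd6"                      # distance 4, 5 or 6
    else:
        active = "ddc"                      # connected, but beyond 6
    return {name: int(name == active) for name in DUMMY_NAMES}


print(distance_dummies(0))     # personal self-citation
print(distance_dummies(5))     # falls in the dd6 class
print(distance_dummies(None))  # no social link
```

Exactly one dummy equals 1 for every pair, so one class has to be left out of a regression as the reference category.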

 

As we can see in Table 16, the existence of a tie is associated with a higher probability of knowledge flow, with the probability sharply decreasing as the geodesic distance increases. Once social proximity has been accounted for, the negative effect of geographical distance weakens[67] (the odds ratio rises from 0.78 to 0.84).

Collaborative networks are a way of overcoming geographic distance and gaining access to relevant knowledge flows between S&T. Ceteris paribus, a past collaboration raises the odds of a knowledge flow more than 16 times. Indirect social links can also play a role: having a common acquaintance (dd2) raises the odds of acquiring useful knowledge more than 4 times; at geodesic distance 3 the odds are 2.5 times higher, while even a fairly long path (between 4 and 6 degrees of separation) still carries a 46% premium.
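Since the tables report odds ratios rather than logit coefficients, a short Python sketch may help in reading them. The odds ratios below are those of Table 16; the helper functions are illustrative names introduced here.

```python
import math


def logit_coefficient(odds_ratio):
    """Logit coefficient corresponding to an odds ratio."""
    return math.log(odds_ratio)


def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds relative to the reference class."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratios for the social distance dummies reported in Table 16
table16 = {"dd1": 16.78185, "dd2": 4.269552, "dd3": 2.520375, "dd6": 1.463143}

for name, odds_ratio in table16.items():
    print(name,
          round(logit_coefficient(odds_ratio), 3),
          round(percent_change_in_odds(odds_ratio), 1))
```

An odds ratio of 1.463 for dd6 corresponds to the 46% premium quoted in the text, while values below one (as for glog) indicate a reduction in the odds.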

 

Table 16: Social proximity is highly significant and reduces the negative effect of geographical distance. The table reports odds-ratios, not Logit coefficients. The estimation includes fixed effects for the year of citing patents, time lag (lag) and type of journal cited (L), which turns out to be not significant.

 

 

Variable   Odds ratio     Robust Std. Err. (adjusted for 4366 clusters in citing)
glog       0.836214 a     0.0152984
dd0        275.1944 a     164.894
dd1        16.78185 a     9.859315
dd2        4.269552 a     1.30857
dd3        2.520375 a     0.6084489
dd6        1.463143 b     0.2831672
lag        1.009529 ns    0.0207985
L          0.958468 ns    0.042366

a significant at the 1% level; b significant at the 5% level; ns not significant

Number of observations   6211
Log-likelihood           -43.473071
Pseudo-R2                0.0911

 

Table 16 tells us that geography matters for knowledge diffusion, but social networks can help overcome its boundaries. We did not find any relation between the type of journal cited (from basic to applied research) and knowledge flows: this further restricts Sorenson’s hypothesis on the role of public disclosure and codified scientific knowledge in the indiscriminate diffusion of innovation. We will now look again at the interplay between geography and knowledge flows by creating 4 new dummy variables:

 

·        usus:  takes value 1 if both the citing patent and the cited article are assigned to the US[68];

·        eueu: same as the previous dummy, for Europe;

·        useu: is equal to 1 where a US patent cites a European article;

·        euus: just the opposite, EU patent citing US article.
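The construction of these four dummies can be sketched in Python as follows. The region labels ("US", "EU" and any other value for the rest of the world) and the function name are assumptions made for the illustration.

```python
def region_dummies(patent_region, article_region):
    """Return the four co-location dummies for one patent-article pair.
    Regions are labelled "US", "EU" or anything else (e.g. "JP")."""
    pair = (patent_region, article_region)
    return {
        "usus": int(pair == ("US", "US")),
        "eueu": int(pair == ("EU", "EU")),
        "useu": int(pair == ("US", "EU")),  # US patent, European article
        "euus": int(pair == ("EU", "US")),  # EU patent, US article
    }


print(region_dummies("US", "US"))
print(region_dummies("EU", "US"))
print(region_dummies("JP", "US"))  # all four dummies are 0
```

Pairs involving any other country leave all four dummies at zero and therefore act as the reference group.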

 

As expected, the first estimation results show a positive effect of co-location: an American patent is 37% more likely to cite a scientific article written in the same country, while for Europe the mark-up is 43%. The reverse holds for cross-citations, which are less likely to appear (odds ratio lower than one), particularly in the case of a US invention citing European science (Table 17).

 

Tables 17-18: Co-location and home bias comparison for the US and the EU. The first table does not include measures of social proximity. The tables report odds-ratios, not Logit coefficients. Both estimations include fixed effects for the year of citing patents and time lag (lag).

 

Variable   Odds ratio     Robust Std. Err. (adjusted for 4366 clusters in citing)
usus       1.376774 a     0.1199584
eueu       1.436757 a     0.1568459
euus       0.791294 b     0.0805618
useu       0.739480 a     0.0830658
lag        0.992782 ns    0.0154311

a significant at the 1% level; b significant at the 2% level; ns not significant

Number of observations   6211
Log-likelihood           -47.409265
Pseudo-R2                0.0088

 

 

Variable   Odds ratio     Robust Std. Err. (adjusted for 4366 clusters in citing)
usus       0.862342 ns    0.0828987
eueu       1.404109 a     0.1643845
useu       0.834504 ns    0.0864882
euus       0.688375 a     0.0792875
dd0        498.1717 a     292.3483
dd1        28.45921 a     15.60164
dd2        5.31456 a      1.232905
dd3        2.90833 a      0.5096658
dd6        1.54277 a      0.1642231
lag        0.99846 ns     0.0215568

a significant at the 1% level; ns not significant

Number of observations   6211
Log-likelihood           -43.807041
Pseudo-R2                0.0841

Controlling for social proximity leaves the premium for EU patents citing EU articles almost unchanged and significant, and decreases, ceteris paribus, the probability that European patents cite American articles. On the other hand, the odds ratio of the usus dummy falls below one and is no longer significant: there is no tangible effect of being co-located in the United States once we account for social distance, while a prior collaboration makes a knowledge flow 28 times more likely. A social network is indeed a very efficient diffusion vehicle for knowledge, particularly in the US. Adding social distance also renders the useu dummy insignificant: if we control for social ties, the home-bias effect disappears. In the US, the link between S&T relies heavily on social proximity, making country boundaries less relevant.

        Table 19 shows once more that connectedness leads to a greater probability of citation: citing pairs are more often closely tied (dd0-dd3) and citations are less likely to arise between unconnected groups. Connected groups are also geographically closer (Table 20): geography matters, because it makes the creation of social ties more likely.

 

 

Table 19: Percentage distribution of groups at different social distances across controls and citations.

 

Social distance                Citations (n=1987)   Controls (n=4224)
Author-inventor (dd0)          11%                  0%
Past collaborator (dd1)        2%                   1%
Common collaborator (dd2)      3%                   1%
Collaborator with ties (dd3)   5%                   2%
Six degrees (dd6)              11%                  10%
Indirect social link (ddc)     4%                   5%
No social link (dnc)           64%                  81%

 

 

 

 

Table 20: Interplay between geography and social proximity.

 

Social distance                Average distance (km)
Author-inventor (dd0)          248
Past collaborator (dd1)        1321
Common collaborator (dd2)      2736
Collaborator with ties (dd3)   3103
Six degrees (dd6)              3981
Indirect social link (ddc)     4483
No social link (dnc)           4814

 



7.0 Wrapping up

 

        The dynamics of science, technology and growth are naturally intertwined. To understand the mechanisms behind each of these interactions, we have to focus on the innovation process and the way new knowledge diffuses throughout the economy.

Patent citation analysis has already offered relevant insights into the nature and localization of knowledge flows. The aim of this research was to exploit a relatively unexplored source of data: scientific articles cited in patents from the semiconductor industry. The outcomes show that NPL citations are a promising and reliable research trajectory.

A first major result we obtained was the snapshot of author/inventors: in an area of close interaction between “Open Science” and “Proprietary technology”, we found that the exchange of information actually takes place at the level of these individual researchers. As knowledge workers, author/inventors assume a central role in the funnelling of information between groups with diverging interests and incentives. Acting as gatekeepers, they are able to reconcile the different “evolutionary logics” of S&T, since they know at least part of both worlds. Without them, the diffusion of innovation would be slower and less efficient.

Social network analysis has proved to be a decisive tool, since it allows us to look into the black box of S&T and discover regularities and exceptions in the “community of experts”. Identifying influential people in a network of discovery and invention can provide some valuable policy hints: remaining plugged-in and gaining access to scarce resources is as crucial a task for firms as for countries.

In this context, the mobility of inventors and scientific authors between regions, countries and organizations is a fundamental shift factor for knowledge spillovers. Local institutions that favour cross-fertilization between S&T or a developed labour market actually support innovation.

A natural extension of our current results would be to consider the institutional affiliations of authors-inventors: which types of firms or universities have an active role in the diffusion of knowledge between S&T? What kind of organizational structures favour innovation and the exploitation of scientific research?

Inside these organizations, are authors-inventors able to outclass their colleagues or does their role as gatekeepers absorb most of their resources? Is higher centrality the result of opportunistic behaviour or of a serious involvement in applied research?

Furthermore, our analysis has shown that neither scientific publication nor geographic co-location alone is sufficient for diffusion: tacit and excludable knowledge travels through direct and indirect social contact chains using the common vocabulary that characterizes each epistemic community. Social networks determine most of the observed patterns of knowledge diffusion and help overcome the negative effect of geographic distance.

Geography, as emphasized by an extensive literature, clearly matters, but institutions, by creating the right incentives for inventors and scientists, are able to bypass its natural constraints.  Results also indicate that the US relies more heavily on social proximity than the EU in the diffusion of new knowledge. If the interaction between S&T is enhanced by gatekeepers and social ties, part of the European backwardness could also be due to a less connected research area. Adding measures of organizational proximity could improve our analysis, as it would allow us to discriminate not only at the country, but also at the institutional level.

The interpolation of science and technology is a complex task, but also a very promising one.

 

 

References


Acs Z.J., Audretsch D.B., Feldman M.P. (1994), “R&D spillovers and recipient firm size”, Review of Economics and Statistics, 76(2): 336-340.

 

Acs, Z.J. and Audretsch, D.B. (1990), “Innovation and Small Firms”, Cambridge, MA: MIT Press.

 

Aghion, P. & Howitt, P. (1990) "A Model Of Growth Through Creative Destruction," DELTA Working Papers 90-12, DELTA (Ecole normale supérieure)

 

Agrawal A.K., Cockburn I.M., McHale J. (2003), “Gone But Not Forgotten: Labor Flows, Knowledge Spillovers, and Enduring Social Capital”, NBER Working Paper 9950

 

Almeida P., Kogut B. (1999), “Localisation of knowledge and the mobility of engineers in regional networks”, Management Science, 45(7): 905-917.

 

Almeida, P., Kogut, B. (1997) “The Exploration of Technological Diversity and Geographic Localization in Innovation: Start-Up Firms in the Semiconductor Industry”, Small Business Economics, Volume 9, Number 1, pp. 21-31(12)

 

Arora, A. and Gambardella, A., (1994). "The changing technology of technological change: general and abstract knowledge and the division of innovative labour," Research Policy, Elsevier, vol. 23(5), pages 523-532, September.

 

Arrow K.J. (1962),“Economic welfare and the allocation of resources for invention”, in R.R. Nelson (ed.), The Rate and Direction for Inventive Activity. Economic and Social Factors, Princeton University Press, Princeton.

 

Balconi M., Breschi S., Lissoni F. (2004), “Networks of inventors and the location of academic research: An exploration of Italian data”, Research Policy 33(1): 127-45.

 

Breschi S., Lissoni F. (2004), “Knowledge networks from patent data: Methodological issues and research targets”, Cespri WP n. 150.

 

Breschi S., Lissoni F. (2003), “Mobility and social networks: Localised knowledge spillovers revisited”, Cespri WP n. 142.

 

Breschi S., Lissoni F. (2001), “Localised knowledge spillovers and local innovation systems: A critical survey”, Industrial and Corporate Change, 10(4), 975-1005.

 

Bresnahan T.F, Gambardella A., Saxenian, A. (2001), "'Old Economy' Inputs for 'New Economy' Outcomes: Cluster Formation in the New Silicon Valleys," Industrial and Corporate Change, 10(4): 835-60.

 

Bush, V. (1945) “Science The Endless Frontier”, a Report to the President by the