Università Commerciale Luigi Bocconi
Faculty of Economics
Economics and Management of International Markets and New Technologies (CLEMIT-LS)
Academic year 2005-2006
The link between science and technology:
exploring the network of inventors and scientific authors
in the semiconductor industry
Supervisor
Prof. Stefano Breschi
Second reader
Prof. Fabio Montobbio
Christian Catalini – 996868
Index
Introduction
Chapter I
1.0 Science and technology: waiting for gatekeepers
1.1 Tracking knowledge spillovers
1.2 Knowledge spillovers: going local
1.3 Mobility and the “invisible hand”
1.4 Social proximity
Chapter II
2.0 Patents and non-patent literature
2.1 Science intensity and dynamism: semiconductors
2.2 Matching NPL citations and ISI titles
2.3 Identifying highly cited articles
2.4 Building the network of inventors
Chapter III
3.0 Real gatekeepers?
3.1 Degree centrality
3.2 Betweenness centrality
Chapter IV
4.0 Who produces and who exploits science?
Chapter V
5.0 Patents and NPL co-location: the tip of an iceberg
Chapter VI
6.0 Down to the waterline
6.1 Social proximity in numbers
6.2 Drawing a control sample
6.3 Geographical proximity with geocoding
6.4 Social proximity, reloaded
Chapter VII
7.0 Wrapping up
Introduction
The interaction between science and
technology (S&T) is a complex and heterogeneous process. While it is clear
enough that science contributes to the development of technology, the mechanism
behind this relationship remains largely unexplored. Economists have tried to
model the S&T link with varying degrees of success, using different approaches
and levels of focus.
The main stylized fact that can be
inferred from Romer’s model of endogenous growth (1990) is the role played by
knowledge spillovers: since knowledge is considered a public good, its creation
implies a spillover. Science generates a positive externality, regarded as
crucial[1]
for long-run macroeconomic growth (Grossman and Helpman, 1991). At the same
time the non-rival, public-good nature of knowledge poses incentive and
contractual problems[2]:
inefficiency (Romer, 1990) and the risk of market failure require public
intervention for the provision of basic research and human capital. In this
framework, science acts on the supply side of the linear model of creation,
transfer and diffusion of innovation (Kline and Rosenberg, 1986). Technical knowledge
is seen as non-excludable information and is supposed to flow between different
economic actors and places.
Finding a way to track and quantify
knowledge flows is therefore a key step in verifying the link between science,
technology and economic growth. Krugman’s assumption (1991) that these kinds of
flows are invisible and leave no “paper trail” is questioned by Jaffe et al.
(1993), who first used citations to patents as a measure of spillovers. Jaffe’s
experiment and its refinements (Thompson, 2005) show that knowledge
flows are not only observable but also local, in terms of both technological
and geographical proximity. Co-localization increases the probability of information
exchange, since spillovers are actually bounded in space. Citations between
patents, even if useful for the generic identification of a flow, contain
little information on the relationship between S&T, as patents are mainly
an input indicator of the innovative activity carried out by private firms. Although
universities have increased their patenting activity, their share is still marginal
and is more likely to represent the results of applied research than of basic
science.
Patents
also contain information which has remained relatively unexploited so far, i.e.
citations to non-patent literature (NPL[3]).
These actually represent a more credible means of tracing knowledge flow
between S&T than citations among patents. One of the original aspects of
this research is exactly that of testing the extent of this interaction and
checking if it is local in terms of technological, geographical and social
proximity. While measuring the first two types of distance requires heavy
parsing and data mining procedures, introducing the concept of social distance
adds further complexity and methodological issues.
Patent documents and cited scientific
articles are used here to build a comprehensive social network of the authors
and inventors in the semiconductor industry. The networks of discovery and
invention are then compared in order to analyze the mechanisms underlying the
diffusion of new information. The aim is to test if individuals who are both
scientific authors and inventors act as technological gatekeepers, reconciling
the different incentive schemes and interests of S&T. If authors-inventors
are indeed influential people in the network, they are likely to control and
bridge most of the knowledge flows: by studying them, we should be able to shed
some light on the actual functioning of the link between the two communities of
S&T.
This microeconomic approach would also allow
us to further analyze the interplay between geographical and social distance,
in order to estimate the relative effects of social ties[4]
and geography on the probability of a citation from a patent to a scientific
article. If linked individuals are able to exchange complex and tacit knowledge
even without being co-located in the same place, then social networks become an
effective diffusion vehicle for new knowledge.
“Open Science” and “Proprietary
technology” may in fact be connected through the inherent qualities of gatekeepers,
a common vocabulary and the underlying social network, which together make a “community
of experts” a tangible reality.
The work is organized as follows: the
first chapter presents a brief overview of the relevant literature; the second
chapter lays out the methodological framework and the data collection, processing and
elaboration steps, then introduces an analysis of the network of inventors; the
third tests the relative positions of authors-inventors, cited and highly cited scientific authors in terms
of centrality and betweenness; the fourth looks in detail at the citation paths
(patent-NPL) between different countries, focusing on the role of European
science in worldwide patents; the fifth tests whether knowledge flows between
science (NPL articles) and technology (citing patents) are indeed localized,
first at the country level and then at the US
state level; the sixth chapter deals with the network of authors-inventors: data on actual citations are used together with a
control sample to test the role of social and geographical proximity between
citing-cited couples; the seventh chapter concludes.
1.0 Science and technology: waiting for gatekeepers
The history of technical change offers
an insight into
the relationship between science and
technology. According to Mokyr (2002), the turning point was the Industrial
Enlightenment and the diffusion of what he defines as “useful arts”. The
exploration and cataloguing of artisan practices, the writing of the first
encyclopaedias and technical publications reduced the cost of accessing
knowledge. This process of rationalization and codification introduced the
scientific and experimental method to the fabricants: science, defined in a
broader sense as general knowledge,
descended from the laurels of the savants and started to interact with the less
esteemed, but no less valuable, useful knowledge. The feedback between
techniques and theory led to the development of new
technical languages, standards (like weights and measures) and incentives to
support innovation. The institutionalization of R&D came only much later,
when German chemical firms and the first major
At that
point, the dynamics of S&T intertwined,
while the two communities behind them started to interact and exchange
information. Different evolutionary logics (Gittelman and Kogut, 2001),
incentive structures and appropriability regimes still separate “Open Science”
from “Proprietary technology”. Dasgupta and David (1994) theorize a “New
Economics of Science”, starting from the assumption that S&T do not differ
in terms of output: codified, non-rival information can be a substitute for tacit and excludable
knowledge. Codification is always an expensive process, even if the costs associated
with it are quite heterogeneous across fields. Following Polanyi’s analogy
(1966), what is brought into focus (codified) and what remains in the
background (tacit) depends on incentives.
Scientific and industrial research are very different in their reward mechanisms, disclosure rules and
epistemic cultures: the networks of discovery and invention are still separate,
even if distinctive signs of convergence can be traced in fields where the
S&T interaction is particularly evident. Like a “pair of dancers”, S&T
are moving to the same music, but each one follows its own steps (Toynbee, 1963
and De Solla Price, 1965).
Scientists share their results in order to gather feedback and credit: this cumulative development
minimizes the duplication of effort (Nelson, 1959) and fosters diffusion. The public
interest in disclosure is combined with the individual
search for recognition and prestige. Merton (1957), who introduced the concept
of science as an institution, counts the priority-based credit
mechanism, the rule of sharing and the continuous reliability check (e.g.
through established empirical procedures) among the distinctive characteristics of the scientific community.
"...essentially,
scholarship is a conspiracy to pool the capabilities of many men, and science
is an even more radical conspiracy that structures this pooling so that the
totality of this sort of knowledge can grow more rapidly than any individual
can move by himself" (De Solla Price, 1971)
Proprietary technology must deal
instead with the commercial interests of its sponsors and is usually shared
only inside the boundaries of the firm or the project. This intramural nature
is often defended by postponing part of the
codification to the last phases, through active monitoring, secrecy or
contractual obligations. The decision to patent an invention is taken with
great care and involves a careful estimate of costs and benefits. Science is
usually costly for the firm (Stern,
1999), but it is a necessary evil to motivate and attract scientific researchers.
Basic research implies an expensive externality, as the positive outcomes can
never be entirely appropriated within the boundaries of the firm that financed
it and some degree of spillover to competitors will be inevitable.
The interaction between S&T is multifaceted:
the linear model used by Vannevar Bush (1945) to influence
Technological gatekeepers (Katz and Tushman,
1982) funnel selected information and contacts, reconciling diverging
incentives and interests. As Gittelman and Kogut (2001) clearly state: “Science and inventions do not follow the
same selection logics, but scientists produce both”. In the third chapter
we will explicitly test this hypothesis, using social network analysis as a
tool.
A competing explanation has been
proposed by Sorenson and Fleming (2004), who argue that networks are only an
imperfect vehicle for the transmission of knowledge, since they are bound in
physical and social space. The cost of maintaining a network tie makes network
diffusion slow and imperfect. Scientific publication, on the other hand,
increases the spatial reach of spillovers by permitting a broadcast propagation
of information. To test their hypothesis, the authors look at citations
received by patents that reference non-patent literature (NPL) in their
application. Patents based on NPL receive on average 4.86 citations (5-year
window), compared to a mean of 3.57 for the others.
Sorenson’s assumption is that since NPL
provides a codified base of accessible knowledge, it is more likely that
someone else will become aware of it and improve on the same ground (making a
citation to the originating patent more likely). Since publication of
background information broadens diffusion, patents based on NPL receive
citations from more distant inventors. Distance is understood here in a broad
sense: geographically, socially and technologically.
Sorenson, Singh and Fleming (2005) add
a test to discriminate between the effects of publication and social proximity:
their results assign a crucial role to science in the diffusion of knowledge,
as if it were possible to access technical knowledge ignoring geography or localized
social networks.
1.1 Tracking knowledge spillovers
Science stimulates knowledge flows, but
what is the exact propagation mechanism? Romer’s model
of endogenous growth (1990) is based on the assumption that knowledge
spillovers exist and are a remarkable determinant of growth[5].
Are these spillovers bounded in space or do they require a social network to support
diffusion?
Jaffe (1989) finds a positive relation
between university R&D (federal research funding) and patenting by neighbouring
firms, which seems to confirm the presence of spillovers. It is the first attempt
to adapt the knowledge production function approach of Griliches (1979) in a way that takes geography into account. These results are later questioned in Krugman’s
“Geography and Trade”:
“knowledge flows, by contrast, are invisible; they leave no
paper trail by which they may be measured and tracked, and there is nothing to
prevent the theorist from assuming anything about them that she likes.”
(Krugman, 1991)
Krugman was specifically targeting one
of the core concepts of the “New Industrial Geography” (NIG)[6],
the “Marshallian externalities of the third kind”[7]
or knowledge spillovers: firms located close to relevant knowledge sources are
able to innovate faster than the others, since they benefit from this special
kind of knowledge externality, which is bounded in space.
Looking at the complexity of the
innovation process, it becomes clear how co-location improves the exploration process (March, 1991): it reduces
search costs, facilitates coordination, allows the transfer of tacit and sticky
knowledge (Von Hippel, 1994), and enables learning by doing and by interacting
(Feldman, 1994).
The overall effect of co-location is a tangible reduction in uncertainty (Arrow, 1962), confirmed by the
economic literature on small firms and start-ups. Large firms rely more on self-sufficiency,
are usually able to pursue an exploitation strategy and profit from a
routinized regime (Winter, 1984). Therefore location seems more relevant to small innovative firms, typically more focused and
applied in their research, than to large
ones (basic research is on average more codified and exchangeable over long distances).
Acs and Audretsch (1990) show that, relative to their R&D
spending, small firms are more
productive in their innovative process. Even if the relationship between size
and innovation is far from linear (Cohen and Levin, 1989) and successful
innovation usually changes the size of a firm, the evidence indicates that small firms
are indeed more tied into regional networks. Almeida and Kogut (1997), when
dealing with the semiconductor industry, find higher levels of localization in
their start-up sample. Acs, Audretsch and Feldman (1994) link all these
factors, inferring that small firms are more productive because they rely on
knowledge spillovers from local university R&D.
1.2 Knowledge spillovers: going local
A direct reply to Krugman comes in 1993
from Jaffe, Trajtenberg and Henderson (JTH). The authors start from the basic
assumption that “knowledge flows do
sometimes leave a paper trail, in the form of citations in patents”. Their
analysis focuses therefore explicitly on the particular subset of spillovers
which leave an official trace[8].
Their aim is to test at the same time the existence and the localization of
spillovers, using forward citations to two cohorts of originating patents
(1975, 1980) as evidence. The underlying assumption is that proximity favours
knowledge flow, so that knowledge can be partly considered a local public good. The main technical problem JTH face
when trying to measure the degree of spillover localization is to define a
baseline, a reference value that accounts for the existing agglomeration of
production. Even if “the ability to
receive spillovers is probably one reason for this pre-existing concentration
of activity”, JTH estimate what they define as a “conservative measure” of
localization: they draw a control sample of patents as close as possible to the
originating ones (in application date and
technological proximity) and compare the co-localization of each group of
patents (originating, controls) with the forward citations to the originating
group[9].
JTH’s experiment successfully proves that knowledge spillovers are indeed
highly localized at the country, state and SMSA[10]
level, even if the effect fades slowly over time.
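The comparison at the heart of this test can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `colocation_rate`, the country codes and the toy data are all hypothetical, and the matching of controls is taken as given rather than implemented. Following the description above, the share of citing patents co-located with the originating patent is compared with the share co-located with its control.

```python
def colocation_rate(reference_location, citing_locations):
    """Share of citing patents located in the same place as the reference patent."""
    if not citing_locations:
        return 0.0
    same = sum(1 for loc in citing_locations if loc == reference_location)
    return same / len(citing_locations)


# Toy data, invented purely for illustration (not taken from JTH or this work).
originating_location = "US"   # location of the originating patent
control_location = "JP"       # location of its matched control (assumed given)
citing_locations = ["US", "US", "JP", "US"]  # patents citing the originating one

# Observed localization: how often citations come from the originating location.
observed = colocation_rate(originating_location, citing_locations)

# Baseline: how often the same citations fall in the control's location, which
# proxies for the pre-existing geographic concentration of activity in the field.
baseline = colocation_rate(control_location, citing_locations)

# A positive gap is read as localization beyond what agglomeration alone implies.
excess_localization = observed - baseline
```

In an actual test the two rates would be computed over many originating-control pairs and compared with a test of proportions; the single pair above only shows the logic.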
Thompson and Fox-Kean (2005) identified
a spurious component in the localization estimates provided by JTH and improved
their methodology[11].
These changes generate a much finer
coupling between originating patents and controls, ensuring that the sample group acts in fact as
a credible baseline for evaluating the existing concentration of production.
Thompson’s controls are more “similar” to the originating patents. The results
show that only national borders act as a barrier to knowledge flow: localization at the state and metropolitan level loses
its strength. There is no equivalent research effort on the co-localization
effect between scientific articles and the patents that cite them: in the fifth
chapter we will adapt JTH’s methodology to measure the role of geography in the
knowledge flows between S&T.
1.3 Mobility and the “invisible hand”
Thompson and Fox-Kean’s conclusion
introduces another possible explanation for knowledge flows: labour mobility. Localization
at the state and metropolitan area
level (above all in the
The issue of appropriability is
entirely reversed:
it is now the worker who can leverage her skills in order to profit from her talent. Furthermore, if this holds, pure spillovers
disappear: the assumed externality is completely absorbed by the labour market
and becomes a pecuniary one. Implications for policy are clear-cut: science parks,
incentives for attracting innovative firms and direct public intervention lose
part of their role if the externality is already internalized and private
investment optimal. A broader institutional framework, capable of attracting
star scientists, is required to promote innovation and growth. Furthermore a
national innovation system (Nelson, 1993) must be able to channel talent
towards productive and entrepreneurial activities, as defined by Murphy,
Shleifer and Vishny (1990).
Zucker et al. (1994, 1998, 2001) provide convincing evidence about mobility, by tracking co-authorships
and links between university star scientists and new biotechnology enterprises
(NBEs). The kind of knowledge mastered by star scientists in a still emerging
field is characterized by natural excludability. The simple idea that being located close to a
major university is sufficient for gaining access
to research outcomes is therefore flawed. Again there is no spillover: the
labour market internalizes the externality, reducing it to a rent. The authors
explicitly test the positive effects of star scientists on new firm entry and
growth, the development of new products and their introduction into the market. They divide their sample into:
· affiliated stars, those who publish with an NBE as affiliation;
· linked stars, those who have co-authorship relations with NBE researchers, but are able to maintain their university link (proof of their high quality and contractual power);
· unlinked stars, who work only for a university.
The most influential stars are the
linked ones, proving that no indiscriminate spillover takes place: unlinked
scientists do not have any positive effect on the performance of NBEs. Of
course, since bright scientists prefer to maintain their university affiliation
(for passion, prestige or access to the scientific network) and avoid
relocation costs, geography does have an influence on
innovation. It must be taken into account that Zucker analyses a case of breakthrough technology, where the natural excludability of knowledge is undoubted: as technology
moves into a more mature stage of development, the economic impact of star
scientists is likely to decline, together with their wages (inducing a probable
reallocation to other projects).
Almeida and Kogut (1999), studying the
localization of highly cited patents in the semiconductor industry, add an
insight into the relationship between mobility and
localization of knowledge. Institutions that favour intraregional mobility have
a pivotal role in the transfer of knowledge, since these kinds of flows are
naturally embedded in regional labour markets. Localized knowledge spillovers
are not linked to a particular technology and differ widely between regions.
Mobility can escape local boundaries,
as an epistemic community is able to survive the end of co-localization. The
time spent building a common “vocabulary” becomes crucial when a knowledge-worker relocates, even to another country. Agrawal, Cockburn and McHale (2003) apply JTH’s
methodology and also control for previous co-location between inventors of cited
and citing patents. The empirical evidence demonstrates that having worked together in the past has an
influence on the probability of a knowledge flow (citation). Mobility in space
and between assignees introduces a further level of analysis: the role of
social capital. According to Breschi and Lissoni (2003): “knowledge flows, as evidenced by patent citations, are strongly
localized to the extent that labour mobility and network ties also are”.
1.4 Social proximity
Granovetter (2005) studies the link
between social structure and economic outcomes. Social networks define the path
and bandwidth of knowledge transmission. Network actors can implement forms of
reward and
punishment by enhancing their social
monitoring role. Trust, which takes time and effort to build, acts as a commitment device: the risk of being
excluded from a component and of losing
access to references, information or markets is often a credible one.
Worker mobility between different firms and regions
creates more connected and dense networks, bridging components that were previously isolated. Networks of inventors can be built by treating
co-invention of a given patent as a link: this has been shown to represent a real tie
between the people involved (Fleming, 2003 and 2004). The equivalent discovery network is created by taking into account the co-authorship
relations between scientists. Some structural differences, due to the type of
communities behind them, make the networks of inventors and authors dissimilar. The
former is usually less connected,
since most of the information exchange is intramural: “organizational boundaries serve as informational envelopes within which
valuable information characterized by natural excludability is much more likely
to be diffused than outside the organization” (Zucker et al., 1994). Corporate
inventors move frequently between employers only in regions where there is a
flexible and developed labour market (e.g.
On the
other hand, scientific authors are
divided into tightly connected epistemic communities. Their network resembles
the typical “small world”[14]
structure (Watts and Strogatz, 1998): a few scientists act as hubs, connecting many nodes and funnelling
most of the information. The productivity of scientists (Lotka, 1928) and inventors (Shockley, 1957) is particularly skewed. Milgram (1967) named outliers “sociometric
superstars”: they are the result of the organizing principle of preferential
attachment (Newman, 2004 and Furukawa, 2006), also known in sociology as
the “Matthew effect” (Merton, 1968). Highly connected nodes are more likely to
increase their connectivity faster than less connected ones, in a sort of
positive feedback that resembles the externalities of network goods (Shapiro,
1999).
Jasjit Singh (2003), after building a
social proximity graph of co-inventor teams, studies the effect of social distance[15]
on the probability of a citation between patents over a nine-year period (1986-1995). In teams that have a close social link, the effect of
geographic co-localization on the probability of a knowledge flow becomes less relevant.
In this framework, localized knowledge spillovers are explained and confined by
interpersonal networks. Singh (2005) compares this effect with knowledge diffusion within a firm, proving again
that once social distance has been accounted for, neither geographical nor firm boundaries have a strong effect on the
probability of a patent citation[16].
Singh considers these results to be a fairly conservative estimate, as co-inventorship captures
only a small fraction of all the possible
ties between teams. Balconi et al. (2004), by showing that academic inventors
have a central position in inventor networks (both in
terms of degree centrality and betweenness), provide an additional hint:
sectors characterized by a close science-technology interaction should be analysed by bridging the networks of authors and inventors.
Singh concludes that “geography matters
for knowledge diffusion, at least in part because interpersonal networks tend
to be regional in nature” (p. 768, 2005).
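As a rough illustration of the two centrality measures mentioned above, the following Python sketch builds a small co-invention/co-authorship network and computes degree centrality and betweenness on it. Everything in it is an assumption made for the example: the team lists, the node labels (with "G" standing for a hypothetical author-inventor) and the function names are invented, and betweenness is computed with Brandes' algorithm for unweighted graphs, which is one standard choice rather than necessarily the procedure used in the studies cited.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_network(teams):
    """Undirected graph: two people are linked if they share a patent or a paper."""
    graph = defaultdict(set)
    for team in teams:
        for person in team:
            graph[person]  # register the node even if the team has one member
        for a, b in combinations(sorted(set(team)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def degree_centrality(graph):
    """Number of direct ties, normalized by the maximum possible (n - 1)."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}


def betweenness_centrality(graph):
    """Unnormalized betweenness via Brandes' algorithm (unweighted, undirected)."""
    score = dict.fromkeys(graph, 0.0)
    for source in graph:
        stack, queue = [], deque([source])
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)   # number of shortest paths from source
        sigma[source] = 1
        dist = {source: 0}
        while queue:                      # breadth-first search from source
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:                      # accumulate dependencies backwards
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    # Each undirected shortest path was counted once from each endpoint.
    return {v: s / 2 for v, s in score.items()}


# Toy teams, invented for illustration: one patent team and two paper teams.
teams = [["A", "B", "G"], ["G", "C"], ["C", "D"]]
network = build_network(teams)
degree = degree_centrality(network)
betweenness = betweenness_centrality(network)
```

In this toy network the hypothetical author-inventor G has both the highest degree and the highest betweenness, which is the pattern a gatekeeper role would be expected to produce.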
The “social network hypothesis”[17]
is further tested by Breschi and Lissoni (2003): they add social proximity
measures to JTH’s experiment, after studying the evolution of the network of Italian
inventors (1978-1995). The outcome suggests that
labour mobility (perfect link[18]),
ceteris paribus, has an effect on the
probability of citation 20 times stronger than an indirect link. Knowledge
flows are bounded by the social network that allows them to diffuse. Another
paper by the two authors (2004) investigates the relation with a more detailed measure of social distance and
finds that spatial proximity is not a prerequisite for knowledge transfer:
social networks can help overcome geographic distance and act as diffusion
vehicles. Indirect social chains can act as conduits for technical information,
even if the final inventor is unaware of the source of the knowledge to which she has been
exposed[19].
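One common way to operationalize social distance between a citing and a cited team is the length of the shortest chain of co-invention ties connecting any member of one team to any member of the other. The Python sketch below assumes that definition; it is not necessarily the exact measure used in the papers discussed above, and the graph, the team lists and the function names are all hypothetical.

```python
from collections import deque


def geodesic(graph, start, goal):
    """Length of the shortest path between two people, or None if unconnected."""
    if start == goal:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, steps = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == goal:
                return steps + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, steps + 1))
    return None


def social_distance(graph, citing_team, cited_team):
    """Minimum geodesic distance between any member of the two teams."""
    best = None
    for a in citing_team:
        for b in cited_team:
            d = geodesic(graph, a, b)
            if d is not None and (best is None or d < best):
                best = d
    return best


# Toy co-invention graph, invented for illustration: A-B-C form a chain,
# while D-E sit in a separate component.
co_inventors = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
    "D": {"E"},
    "E": {"D"},
}

same_person = social_distance(co_inventors, ["A", "B"], ["B"])   # shared member
indirect = social_distance(co_inventors, ["A"], ["C"])           # linked via B
unconnected = social_distance(co_inventors, ["A"], ["D"])        # no social path
```

Under this definition a distance of 0 means that the same person appears on both teams, small positive values indicate an indirect chain of acquaintances, and `None` marks pairs with no social path at all.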
One of the research targets of this
work is to check the exact role of social and
geographical proximity in an area of close interaction between science and
technology: the semiconductor industry. The existing
literature deals with knowledge flows as revealed by citations between patents:
Sorenson et al. (2005) investigate the role of science by looking at citations
received by patents that build on NPL. This link is too remote and indirect, in
particular because we can exploit directly the information that makes a patent
similar to the scientific articles it cites. The patent-NPL relationship is a
fairer indicator of the underlying interplay between S&T and might offer,
together with social network analysis, an effective way of opening what is
still, to a great extent, a “black box”.
2.0 Patents and non-patent literature
“In this
desert of data, patent statistics loom up as a mirage of wonderful plenitude
and objectivity” (Griliches, 1990)
Patents have been increasingly used as
a source for relational data, as they include all the necessary information for building a comprehensive network of inventors, their affiliations and
localization. Forward patent citations have been widely used as a proxy for the economic value and quality of a patent
(Trajtenberg, 1990), or to track knowledge flows across space. Scientific
articles and co-authorship relations are commonly used in bibliometrics to
build indicators and statistics, to track international and multidisciplinary
research (Narin et al., 1991), to study social networks (Newman, 2004). There
are some similarities between patents and peer-reviewed scientific articles:
both require some degree of novelty, are an output of research, disclose relevant
information and codify part of their background knowledge. The aim and the
extent of the review process are, however, different (for a review: Meyer et al.,
2004).
In order to study the interaction
between science and technology, citations to scientific articles coming from
patent documents offer a unique opportunity. The search for “prior art” by patent offices is usually
of high quality and provides a starting point for identifying the codified knowledge on which an innovation
builds.
There are relevant differences between
US and European patent documents, due to dissimilar examination procedures and
requirements.
2.1 Science intensity and dynamism: semiconductors
If the objective is to investigate the knowledge flows between science and
technology, focusing on a single, self-contained
field reduces noise and increases the probability of finding a coherent subset of social links and knowledge
flows. If there is an area where scientists and inventors are likely to behave
in a similar way and exchange relevant knowledge, it will probably be exactly where science matters most for
patenting activity. Collins and Wyatt (1988) show that patents cite NPL when
they belong to active, rapidly developing fields.
The semiconductor field shows a growth
rate (in terms of patenting) of 4.3%
at the EPO, where it comes after ICT (11%), biotech (9%), drugs and medical
technology (8%). Most of the increase derives from a surge in European
patenting, which has caught up with its US and
Japanese rivals. This should be taken into account when dealing with the “European paradox” thesis.
Figure 2: EPO patenting trends in the field of semiconductors (IPC subclass H01L) between 1990 and 2001.

If we look at USPTO patent data
(1990-2003), the growth rate for semiconductors is the highest (14%): Kortum and
Lerner (1999) and later Hall and Ziedonis (2001) have shown that this patenting
rush is due to strategic reasons. After Polaroid’s successful infringement suit
against Kodak (1986), the aggressive IPR enforcement policy of Texas
Instruments[20]
and institutional changes that brought about stronger
protection[21],
semiconductor firms increased patenting exponentially. The role of patents
changed from simple protection and source of revenues (licensing royalties) to
legal bargaining chips. Cross-licensing became a common practice: no single
firm was able to produce without incurring the risk of infringing neighbouring
rights, as innovative activity in the field is highly cumulative and
standard-dependent. The cost of interrupting a manufacturing process once started
was simply economically unbearable. Large firms built huge patent portfolios
and surrounded their core technologies with defensive patenting (blocking,
fencing and surrounding: for a review, see
Verspagen, 2004). Small design firms used patents to attract venture capital
investments and support their entry into specialized market niches. Furthermore, in a sector where technology advances rapidly, the cost of disclosing information (patenting) is
more than compensated by first-mover advantages. What remains to be quantified
is whether the benefits from specialization and the creation of a market for
knowledge (Arora and Gambardella, 1994) exceeded the costs of these “bargaining
chips”. The history of the semiconductor industry and its remarkable
dynamism suggest a positive answer, even though we lack a
point of reference for the comparison.
Going back to NPL, it is important to
stress that 7% of all IPC[22] subclasses account for 80% of overall NPL
citations (INCENTIM, 2003): the S&T interaction is local in terms of
technology subclasses and highly skewed. Science intensity (Van Looy et al.,
2003), defined as the number of NPL citations per 100 patents, can be used as a
proxy for the strength of the S&T linkage. This second test confirms the
hypothesis: the semiconductor field is
active and strongly science-based (second only to biotech). More than
20% of patents have more than one link to NPL.
Table 1: Science intensity across fields (EPO patents)

| Technology fields | % of all patents citing scientific ISI articles | Average science intensity (P-NPL[23]) | Average science intensity (all patents) |
|---|---|---|---|
| Biotechnology | 78.2 | 382.1 | 298.9 |
| Semiconductors | 21.8 | 181.3 | 39.5 |
| Information Technology | 19.9 | 145.9 | 29 |
| Telecommunications | 19.6 | 142 | 27.8 |
| Control Technology | 17.8 | 209 | 37.2 |
| Optics | 16.2 | 194.1 | 31.4 |
| Medical Technology | 5 | 162 | 8.1 |
| Environmental technology | 4.2 | 135.6 | 5.7 |
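The three numeric columns of Table 1 appear to be linked by a simple identity: the science intensity over all patents equals the share of patents citing ISI articles multiplied by the science intensity of the citing patents alone. The short sketch below checks this reading against the figures in the table. The interpretation of the P-NPL column as "citations per 100 citing patents" is our assumption, and the function name is purely illustrative.

```python
# Figures copied from Table 1:
# (% of patents citing ISI articles, intensity among citing patents, intensity over all patents)
TABLE_1 = {
    "Biotechnology": (78.2, 382.1, 298.9),
    "Semiconductors": (21.8, 181.3, 39.5),
    "Information Technology": (19.9, 145.9, 29.0),
    "Telecommunications": (19.6, 142.0, 27.8),
    "Control Technology": (17.8, 209.0, 37.2),
    "Optics": (16.2, 194.1, 31.4),
    "Medical Technology": (5.0, 162.0, 8.1),
    "Environmental technology": (4.2, 135.6, 5.7),
}


def overall_science_intensity(share_citing_pct, intensity_citing):
    """NPL citations per 100 patents when only a share of the patents cite NPL.

    Assumes intensity_citing is measured per 100 citing patents.
    """
    return share_citing_pct / 100.0 * intensity_citing


for field, (share, citing, reported) in TABLE_1.items():
    implied = overall_science_intensity(share, citing)
    print(f"{field:26s} implied {implied:6.1f}   reported {reported:6.1f}")
```

Under this assumption the implied and reported values agree to within rounding for every field, which suggests that semiconductors rank second mainly because of the share of patents that cite science, not because each citing patent cites more articles.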
To understand which subjects match these citations
we can look at the four most cited ones using ISI-SCI data on journals. For
semiconductors they are: applied physics (42.6% of total share), electrical and
electronic engineering (21.1%), condensed matter physics (7.6%) and
multidisciplinary materials science (6.4%). The distribution across
journals is also concentrated: “Applied Physics Letters” accounts for 19.4%,
followed by its Japanese companion, the “Japanese Journal of
Applied Physics” (9%), the “IEEE Transactions on Electron Devices” (6.9%) and
the “Journal of the Electrochemical Society” (5.7%). This confirms the idea of a sector that relies heavily on basic
science (physics) and on the capacity to transform discovery and invention into
market innovation.
2.2 Matching NPL citations and ISI titles
In the
previous sections we have seen why we decided to rely on data coming from EPO
patents in the semiconductor subclass (H01L) in order to investigate the
S&T link.
Our final dataset contained 38,761
inventors, drawn from 22,475 distinct patents registered between 1990 and
2003. The total number of NPL citations considered was 5,797[24],
corresponding to 4,481 distinct ISI articles (based on unique UT code[25]),
328 journals and 12,023 cited authors.
The first technical problem to solve was
the parsing of the NPL citations from each patent document. For the analysis we used the EPO-CESPRI[26] database,
which contains all EPO applications
between 1990 and 2003. From an initial set of 15,702 potential NPL
references in H01L patents, we retained 8,879 citations to
scientific papers published in journals covered by the ISI-Thomson Science Citation Index (SCI). These citations were then matched[27]
with the cited ISI journals dataset[28],
in order to obtain a final dataset linking NPL references and article
information[29]
as provided by ISI-SCI.
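To give an idea of what such a matching step involves, the sketch below compares the journal field of an NPL reference with a list of journal titles after normalizing case and punctuation. It is only an illustration under our own assumptions: the journal string is assumed to be already isolated from the reference, the similarity cutoff is arbitrary, and the function names are hypothetical. It does not reproduce the procedure actually used for this dataset.

```python
import difflib
import re


def normalize(title):
    """Lower-case a title and replace punctuation and repeated spaces."""
    cleaned = re.sub(r"[^a-z0-9 ]", " ", title.lower())
    return re.sub(r"\s+", " ", cleaned).strip()


def match_journal(npl_journal, journal_titles, cutoff=0.85):
    """Return the closest journal title for a raw NPL journal string, or None.

    cutoff is an arbitrary similarity threshold (an assumption of this sketch).
    """
    lookup = {normalize(title): title for title in journal_titles}
    hits = difflib.get_close_matches(
        normalize(npl_journal), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


journals = ["Applied Physics Letters", "Japanese Journal of Applied Physics"]
print(match_journal("APPLIED PHYSICS LETTERS.", journals))
print(match_journal("Nature", journals))
```

Abbreviated titles would need an additional lookup table of abbreviations, which is why any automatic match of this kind still calls for a manual check.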
2.3 Identifying highly cited articles
The number of citations received by an article is commonly used as an approximate measure of its impact, inherent quality or degree of novelty. Authors who receive more citations are therefore likely to be more influential members of our social network (we will test this hypothesis in Chapter 3). As we are looking at scientific publications cited in patents, it is useful to identify those articles that can be defined as “highly cited” in this particular context. What a scientist considers a major article will not necessarily contain relevant information for an inventor, and vice versa.
Gittelman and Kogut (2001) found a
negative correlation between highly cited patents (usually considered
high-impact innovations) and highly cited publications. Even if technology
benefits from spillovers from science, the two different “evolutionary logics”
seem to select different types of knowledge. The rule of preferential
attachment (or “Matthew effect”) is
less pronounced in the inventor community, which is more focused on the market
opportunities behind a given discovery. Excellent science seems to negatively affect a firm’s ability to produce high-impact
patents.
To identify what can be considered a highly cited article inside the NPL sample, we had
to count citations received from patents inside a given time frame. We tested
three different windows: 3, 5 and 7 years after the publication date of the cited
article. Since the last application date (APDT) in our sample was from 2003, in
order to avoid any truncation bias we selected only articles that had the full
window to be cited as NPL. The results suggest that the distribution of citations is highly skewed
and similar across different sectors (see Figure
1). After comparing the three solutions we
chose a 5-year window, as it is a fair
compromise between an excessive truncation (7 years) and a time which is too
short for citation (3 years).
Figure
1: distribution of articles according to the number of citations
received by patents. The value axis indicates the percentage of articles kept
by cutting the distribution at the given number of citations per single NPL. The
sectors correspond to the following 4-digits IPC subclasses: C12Q and G01N33
(biotech), H01L (semiconductors), H04L (transmission of digital information),
G10L and G06T (speech recognition), H01S (lasers).

The graph shows a regular[30]
pattern across all these science-intensive sectors: considering as highly cited
those articles that receive at least 3 citations from different patents leads to a sample of between
4.4% and 7% of the overall population. Most articles receive just one
citation: from 70% of total NPLs for transmission of digital
information to 87% for biotech. Articles with more than 5 citations in the
semiconductor field are just 0.74% of all cited ones. From now on we will consider
as highly cited those articles that receive at least 3 citations within 5 years of their publication. According to this
criterion, in the semiconductor field 11% of authors (1334) and 7% of articles
(340) belonged to the “highly cited” class.
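The selection rule described in this section can be sketched as follows. This is an illustration on invented data: the identifiers are hypothetical, and treating both ends of the 5-year window as inclusive is our assumption.

```python
from collections import defaultdict

LAST_APPLICATION_YEAR = 2003  # last application date (APDT) in the sample
WINDOW = 5                    # citation window in years after publication
THRESHOLD = 3                 # minimum number of different citing patents


def highly_cited(citations, pub_year):
    """Return the articles cited by at least THRESHOLD different patents
    within WINDOW years of publication.

    citations: iterable of (patent_id, article_id, application_year)
    pub_year:  dict article_id -> publication year
    Articles whose window extends beyond the sample are skipped,
    to avoid truncation bias.
    """
    citing_patents = defaultdict(set)
    for patent, article, app_year in citations:
        year = pub_year[article]
        if year + WINDOW > LAST_APPLICATION_YEAR:
            continue  # the article did not have the full window to be cited
        if year <= app_year <= year + WINDOW:
            citing_patents[article].add(patent)
    return {a for a, p in citing_patents.items() if len(p) >= THRESHOLD}


# Invented example: A is cited by three patents in time, B by one patent
# (twice) plus one late citation, C is too recent to have a full window.
pub_year = {"A": 1995, "B": 1995, "C": 2001}
citations = [
    ("P1", "A", 1996), ("P2", "A", 1998), ("P3", "A", 2000),
    ("P1", "B", 1996), ("P1", "B", 1996), ("P4", "B", 2002),
    ("P5", "C", 2002), ("P6", "C", 2002), ("P7", "C", 2003),
]
result = highly_cited(citations, pub_year)
print(result)
```

Counting sets of patents instead of raw citations ensures that repeated references from the same patent are counted only once.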
2.4 Building the network of inventors
Inventors who are also authors of articles cited by other patents[31]
should act as gatekeepers, as conduits between S&T. Since their
publications have been cited by other inventors from the same technological subclass, it is plausible that their scientific
research has an intrinsic and direct economic value for technology. In these people, Gittelman and Kogut (2001) observe the singular ability to
identify the elements of scientific knowledge which
could have an economic impact and to apply them to industrial research. This process requires translation,
in terms of vocabulary, reference schemes and incentives. By building the
network of invention and locating authors-inventors within it, we wanted to explore their social function and to check for any structural difference between them and the
rest of the population.
We created the network structure[32]
using patent data: each co-inventorship was considered on the graph as a link,
a tie between two nodes (individuals). The same table was used to calculate the
degree centrality[33]
of all inventors. Using Pajek[34], we were also able
to determine the betweenness centrality[35]
for all the nodes belonging to the largest
component[36]
of the network: 4433 inventors, with an average of 7.8 ties each. The largest
component represents only 12.7% of the total population[37],
which confirms the highly fragmented nature of inventor
networks (see Table 2). In a typical “small world”
case, it would encompass up to 80-90%
of the total population. As we will see in section 6.1, introducing the ties that derive from co-authorship relations
drastically reduces the average distance inside the
network and increases the size of the largest component. We should therefore consider
this representation as incomplete, as it only captures a small part (co-inventorship) of all the actual
links between the nodes.
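The construction just described can be sketched in a few lines. The example is a toy illustration with invented patents and names, not the Pajek workflow used for the actual analysis: every pair of co-inventors on a patent becomes a tie, degree centrality is the number of distinct ties of a node, and components are found with a breadth-first search.

```python
from collections import defaultdict, deque
from itertools import combinations


def build_network(patents):
    """patents: dict patent_id -> list of inventor names.
    Returns an adjacency dict in which each co-inventorship is one tie."""
    adj = defaultdict(set)
    for inventors in patents.values():
        for name in inventors:
            adj[name]  # make sure sole inventors appear as isolated nodes
        for a, b in combinations(set(inventors), 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def components(adj):
    """Connected components, largest first."""
    seen, found = set(), []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), []
        while queue:
            node = queue.popleft()
            members.append(node)
            for neighbour in adj[node] - seen:
                seen.add(neighbour)
                queue.append(neighbour)
        found.append(members)
    return sorted(found, key=len, reverse=True)


# Invented data: four patents, seven inventors.
patents = {
    "EP1": ["Ann", "Bob", "Cy"],
    "EP2": ["Cy", "Dee"],
    "EP3": ["Eve", "Fay"],
    "EP4": ["Gus"],
}
network = build_network(patents)
degree = {name: len(ties) for name, ties in network.items()}
comps = components(network)
largest_share = len(comps[0]) / len(network)
print(degree, [len(c) for c in comps], round(largest_share, 3))
```

In this toy case the largest component holds four of the seven inventors; in the real network the corresponding share is the 12.7% reported above.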
Table 2: inventors in the first components and their relative share

| Component | Inventors | % of total |
|---|---|---|
| 1st | 4433 | 12.7% |
| 2nd | 419 | 1.2% |
| 3rd | 341 | 0.9% |
| 4th | 237 | 0.7% |
| 5th | 217 | 0.6% |

Table 3: relative weight of dyads, triads etc. on the overall network

| Number of nodes | Count | % of total |
|---|---|---|
| 2 | 2256 | 35% |
| 3 | 1546 | 24% |
| 4 | 942 | 15% |
| 5 | 516 | 8% |
If we examine
the
largest component we can see that it
is fairly international, even if the
Figure 3: country distribution of the inventors inside the largest component

We can further describe this component by looking
at the inventors’ assignees: at first sight, three players seem to exist, but we should note
that Lucent Technologies was spun off from AT&T in 1996, while Agere Systems
is a spin-off of Lucent (2000). This substantiates the strong “intramural”
nature of the component.
Figure 4: assignees of the
inventors inside the largest component

The situation does not change if we look at other
major components: the second is entirely localized in
To spot authors-inventors
we matched the names of semiconductor inventors against those of authors cited
as NPL. The matching criterion was the surname
plus the first three letters of the given name (when available), as extracted from
the full name[38]. The results were then checked manually, yielding 1,626 matches (297 of them
highly cited authors-inventors). The graph
of the largest component (AT&T and its spin-offs) shows that all the inventors are tightly connected with the rest of
their community, even if some peripheral groups exist.
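A matching rule of this kind can be sketched as below. The sketch assumes that names are stored as "SURNAME, Given names" and that a shorter given-name prefix (for instance a single initial) is compatible with a longer one; both are our assumptions, and all names are invented. The output is a list of candidate matches that would still need the manual check mentioned above.

```python
from collections import defaultdict


def name_key(full_name):
    """Surname and first three letters of the given name, upper-cased.
    Assumes the format 'SURNAME, Given names' (an assumption of this sketch)."""
    surname, _, given = full_name.partition(",")
    return surname.strip().upper(), given.strip().upper()[:3]


def candidate_matches(inventors, authors):
    """Pairs (inventor, author) with the same surname and compatible prefixes."""
    by_surname = defaultdict(list)
    for author in authors:
        surname, prefix = name_key(author)
        by_surname[surname].append((prefix, author))
    matches = []
    for inventor in inventors:
        surname, prefix = name_key(inventor)
        for other_prefix, author in by_surname.get(surname, []):
            # "when available": a shorter prefix matches any longer one
            if prefix.startswith(other_prefix) or other_prefix.startswith(prefix):
                matches.append((inventor, author))
    return matches


inventors = ["Rossi, Maria", "Smith, John A."]
authors = ["ROSSI, MAR", "Smith, J", "Smith, Jane", "Rossi, Paolo"]
pairs = candidate_matches(inventors, authors)
print(pairs)
```

The rule is deliberately permissive: homonyms such as two different "Smith, J" would both be returned, which is exactly the kind of false positive that manual checking is meant to remove.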
Figure 5: graph of the largest
component (4,433 inventors, over 30,000 ties). Cited authors-inventors are green, highly-cited ones are yellow.

Which firms are also engaged in scientific research and publication? To
answer this question we simply have to check where our authors-inventors come from (Table
4). Even if 37.6% of the sample is highly fragmented (assignees with less
than 1% of the total number of inventors), we can clearly see the impact of
large, innovative high-tech firms: both AT&T and IBM have more than twice
as many gatekeepers as
universities that patent in the same field.
Even if
we allow for the bias generated by their higher propensity to patent, we cannot ignore the scientific
orientation of these firms. They are able
to manage basic and applied research, relying on huge R&D budgets and
internal development. Their inclination to patent goes hand in hand with the ability to
publish in peer-reviewed scientific
journals[39]
and stay plugged into the scientific network. S&T interact profitably, at
least in these examples. The European share is quite low (below 7%), compared to an overall
share of 24% of inventors, due in part to a different
industrial structure (smaller firms).
Table 4: assignees of authors-inventors

| Assignee | % of total authors-inventors population |
|---|---|
| AT&T | 11.2 |
| IBM | 9.6 |
| LUCENT TECHNOLOGIES | 5.1 |
| | 4.5 |
| PHILIPS ELECTRONICS | 3.7 |
| Other Universities | 3.4 |
| EASTMAN KODAK | 3.1 |
| MOTOROLA | 2.9 |
| APPLIED MATERIALS | 2.9 |
| XEROX | 2.5 |
| HEWLETT-PACKARD | 2.4 |
| GENERAL ELECTRIC | 1.9 |
| SIEMENS | 1.7 |
| SHARP | 1.5 |
| ENERGY CONVERSION DEVICES | 1.4 |
| | 1.3 |
| SGS-THOMSON MICROELECTRONICS | 1.1 |
| AGILENT TECHNOLOGIES | 1.1 |
| SAMSUNG ELECTRONICS | 1.0 |
| Others (<1%) | 37.6 |
The highly cited authors-inventors are a few leading researchers. It is therefore useful to compare their localization (figure 7) with that of the overall
population (figure 6). The links between S&T are strongest in the
Figure 6: country
distribution of all inventors in the semiconductor field (H01L)

Figure 7: country
distribution of highly cited authors-inventors[40]

3.0 Real gatekeepers?
In this chapter we
will verify if authors-inventors are indeed more highly connected (hypothesis
I) and influential (hypothesis II) individuals in our network.
Gatekeepers are usually more valuable people, as they offer access to other
groups and relevant information. If authors-inventors really bridge the two
communities of S&T, they should be less likely to end up as isolates and
should be found inside the largest components more frequently (in relative
terms) than simple inventors. We could actually expect them to be the glue that
holds the bigger components together.
This is exactly what table 5 reveals: the percentage of authors-inventors in the 10 largest components is almost double,
while the effect is reversed in the smaller groups. These values moreover
exclude isolated inventors, about 9% of the total population, who are all
simple inventors.
Table 5: Comparison between inventors and authors-inventors (components)

| Component/s | % of tot. Inventors | % of tot. Authors-inventors | delta |
|---|---|---|---|
| 1st | 12.4 | 18.5 | 6.1 |
| 2nd | 0.8 | 10.2 | 9.5 |
| 10 largest | 18.2 | 32.4 | 14.2 |
| 10 smallest | 63.8 | 49 | -14.8 |
| 2-nodes | 12.9 | 9.6 | -3.3 |
| 3-nodes | 13.4 | 10.3 | -3.1 |
| 4-nodes | 11.0 | 7.0 | -4.0 |
Social network analysis offers some powerful tools for evaluating the functional role of a node inside a graph. We have already introduced degree centrality and
betweenness (see 2.4); we now
compare these values to check whether authors-inventors differ systematically from other inventors. We used a T-test to evaluate the difference between the means of the
two populations relative to the variability of their scores.[41]
In each case, SAS also calculates
a folded F-test[42]
for equality of variances, in order to identify which type of T-test is more
appropriate (Pooled[43]
in case of equal variances, Satterthwaite
for unequal[44]).
3.1 Degree centrality
Hypothesis
I: scientific research and
publishing make authors-inventors more
“valuable”, so they should be able to build more connections: their degree centrality
should be higher than average.
A first T-test compares the degree centrality of inventors and cited
authors-inventors. The statistics for the two populations are
reported in table 6.
Table 6: degree centrality statistics for cited authors-inventors (1) and simple inventors (0).

| | Cited | N | Maximum | Mean | Std Dev | Std Err |
|---|---|---|---|---|---|---|
| Inventors | 0 | 33331 | 72 | 3.9319 | 3.3374 | 0.0183 |
| Authors-inventors | 1 | 1626 | 71 | 4.4071 | 4.4242 | 0.1097 |
| Difference (0-1) | | | | -0.475 | 3.3956 | 0.0862 |
Both groups have some relevant outliers
(a maximum of 72 ties for a single individual): they are likely to be chief
researchers (Balconi et al., 2004), who sign a large number of patents. Before
proceeding with the T-test, we verified with a folded F-test[45]
that the variances of the two groups were unequal.
Table 7: T-Test for degree centrality (inventors vs cited)

| Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Satterthwaite | Unequal | 1716 | -4.27 | <.0001 |
The t value
is negative[46]
and highly significant (Pr > |t| < 0.0001):
this confirms that, on average, cited authors-inventors
have
more ties (higher degree centrality) than simple inventors.
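As a cross-check, the Satterthwaite statistics can be recomputed from the summary figures of Table 6 alone. The sketch below is not the SAS procedure used in the analysis; it simply applies the standard unequal-variance formulas, with function names of our own choosing, and also computes the folded F statistic as the ratio of the larger to the smaller sample variance.

```python
import math


def satterthwaite_t(mean0, sd0, n0, mean1, sd1, n1):
    """t statistic and Satterthwaite degrees of freedom (unequal variances)."""
    v0, v1 = sd0 ** 2 / n0, sd1 ** 2 / n1
    t = (mean0 - mean1) / math.sqrt(v0 + v1)
    df = (v0 + v1) ** 2 / (v0 ** 2 / (n0 - 1) + v1 ** 2 / (n1 - 1))
    return t, df


def folded_f(sd0, sd1):
    """Folded F statistic: larger sample variance over the smaller one."""
    return max(sd0, sd1) ** 2 / min(sd0, sd1) ** 2


# Summary statistics from Table 6: simple inventors (0) vs cited authors-inventors (1)
t, df = satterthwaite_t(3.9319, 3.3374, 33331, 4.4071, 4.4242, 1626)
f = folded_f(3.3374, 4.4242)
print(round(t, 2), round(df), round(f, 2))
```

The recomputed values round to t = -4.27 with 1716 degrees of freedom, in line with Table 7. The sign is negative only because the difference is taken as inventors minus authors-inventors.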
As a check we repeated the same
procedure comparing inventors to highly cited authors-inventors (as defined in section 2.3). This second test requires that the
authors-inventors involved have published at least one article cited by 3 or
more different patents. Statistics for the groups are reported in table 8.
Table 8: degree centrality statistics for highly cited authors-inventors (1) and simple inventors (0).

| | HC | N | Maximum | Mean | Std Dev | Std Err |
|---|---|---|---|---|---|---|
| Inventors | 0 | 34660 | 72 | 3.9464 | 3.3835 | 0.0182 |
| HC Authors-inventors | 1 | 297 | 34 | 4.8418 | 4.6374 | 0.2691 |
| Difference (0-1) | | | | -0.895 | 3.3961 | 0.1979 |
Even if the highly cited group does not have
comparable outliers
(the maximum number of ties is 34), there is still a negative difference between the means that favours the centrality of authors-inventors.
Table 9: T-Test for degree centrality (inventors vs highly cited)

| Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Satterthwaite | Unequal[47] | 299 | -3.32 | 0.001 |
The t value is again negative and highly significant. Our
first hypothesis is confirmed again: highly cited authors-inventors have more ties than simple inventors
and therefore have access to more groups and sources of knowledge.
3.2 Betweenness centrality
Hypothesis
II: authors-inventors act as gatekeepers, funnelling most of the
knowledge flows that span the various groups:
betweenness centrality should be higher than the average.
We start directly with the more
restrictive case, the one that compares inventors to cited authors-inventors. As the betweenness measure can only be calculated within a component, we chose to run the test on the largest one (4433 nodes,
301 are cited authors-inventors): if
gatekeepers have a crucial role in funnelling knowledge flows we can expect
this effect to be more relevant inside a larger, more heterogeneous component.
Table 10: betweenness statistics for cited authors-inventors (1) and simple inventors (0).

| | Cited | N | Maximum | Mean | Std Dev | Std Err |
|---|---|---|---|---|---|---|
| Inventors | 0 | 4132 | 0.2517 | 0.0023 | 0.0115 | 0.0002 |
| Authors-inventors | 1 | 301 | 0.0963 | 0.004 | 0.0129 | 0.0007 |
| Difference (0-1) | | | | -0.002 | 0.0116 | 0.0007 |
Table 11: T-Test for betweenness (inventors vs cited)

| Method | Variances | DF | t Value | Pr > \|t\| |
|---|---|---|---|---|
| Satterthwaite | Unequal[48] | 336 | -2.13 | 0.0342 |
Even if the inventors include some outliers (maximum
betweenness 0.25), the mean for authors-inventors is almost double. The folded F-test rejects equality of variances:
the Satterthwaite T-test returns a negative value for t, significant at the 5% level (p = 0.0342).
Hypothesis
II is confirmed: authors-inventors are on
average more influential people in the network. They can control the flow of
information between most others: if they were artificially removed from the
network, most of the largest components would probably split into smaller ones,
disconnecting entire groups of people. If S&T interact it is likely that authors-inventors are a key part of the process.
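To make the notion of betweenness concrete, the following sketch computes normalized betweenness centrality on a toy graph in which a single node connects two otherwise separate teams. It uses Brandes' algorithm for unweighted, undirected graphs. It is an illustration with invented nodes, not the Pajek computation used for the results above, and the normalization by (n-1)(n-2)/2 is our assumption about how values between 0 and 1 are obtained.

```python
from collections import deque


def betweenness(adj):
    """Normalized betweenness centrality (Brandes' algorithm).
    adj: dict node -> set of neighbours, undirected and unweighted."""
    score = dict.fromkeys(adj, 0.0)
    for source in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)   # number of shortest paths from source
        dist = dict.fromkeys(adj, -1)
        sigma[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        while stack:  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != source:
                score[w] += delta[w]
    n = len(adj)
    scale = (n - 1) * (n - 2)  # every unordered pair was counted twice above
    return {v: (s / scale if scale else 0.0) for v, s in score.items()}


# Toy graph: "g" is the only link between the teams {a, b} and {c, d}.
toy = {
    "a": {"b", "g"}, "b": {"a", "g"},
    "g": {"a", "b", "c", "d"},
    "c": {"g", "d"}, "d": {"g", "c"},
}
bc = betweenness(toy)
print(bc)
```

Here every shortest path between the two teams passes through "g", so it is the only node with a positive score; removing it would split the component in two, which is the situation described in the paragraph above.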
4.0 Who produces and who exploits science?
Looking at the interface between
S&T can shed some light on the nature of the “European paradox”: does European
science lag behind its American counterpart, is its industry unable to absorb and apply its
scientific output or is it a problem of organizational structures (e.g.
transfer groups)?
Dosi et al. (2005) present detailed
evidence against the supposed paradox. Europe’s superiority
in science cannot be taken for granted and indeed the
Two fields where
Figure 8: Country
distribution of all scientific articles cited by patents (H01L).

Figure 8 shows that Europe holds a satisfactory share (22%)
of citations, even if we have to consider that the Japanese one is probably underestimated
because
of language barriers in the search process for “prior art”. Historically the semiconductor
industry has always been dominated by US firms, while
Figure 9: Users (category
axis) and suppliers of scientific research (NPL cited by patents in the field
of semiconductors).

European patents rely in almost the same proportion on domestic and US
science. The home bias (Narin et al., 1997) is clearly visible for all three
patenting countries when figures 8 and
9 are compared.
The only conclusion that can be drawn
is that knowledge flows from science to technology actually cross the
boundaries of their originating countries and reach foreign patents. At first sight, the role of publication and
scientific disclosure seems to overcome
the limits of geography and social networks. The remaining chapters will try to
quantify this effect exactly and separate it from
other measures
of proximity. Where do these citations
to articles really come from? Are followers simply trying to imitate and absorb
the leader’s output or is there a more subtle explanation, one that accounts
for the real distance that separates an inventor from her colleague abroad?
5.0 Patents and NPL co-location: the tip of an iceberg
The
aim of this chapter is to test whether knowledge flows from science to
technology are geographically localized. In a connected economy, where ICT permits a fast,
cheap transfer of large chunks of
data, how much do local specificities matter? When an inventor develops her concept, is she more influenced by codified
information available in scientific journals or by the knowledge she obtains
through a direct chain of social contacts? How much does the national system of
innovation influence her sources?
JTH’s
methodology (1993) can be adapted to NPL citations: are patents more likely to
cite scientific articles published in the same country (region) or is geography
irrelevant to this special kind of dialogue between S&T? Do
national (local) boundaries limit the diffusion of scientific information? If
the probability of a citation depends only on the quality of the codified
content of an article, it should not be bounded in space. Observed co-localization
between patents and NPL citations should then be explained only by a pre-existing agglomeration of production. It is possible to control
for this effect by building a set of control
patents and using them as a baseline.
The level of co-localization between originating patents and NPLs can then be
compared with that between controls and NPLs: if the originating patents appear to be more co-localized than the controls,
then citations to scientific articles are indeed geographically localized with
their citing patents.
To ensure that the control
patents actually “do their job” of representing base concentration, it is vital to select ones which are as close as possible to the originals (Thompson,
2005). We created all possible pairs of semiconductor patents applied for in the same year that did not cite the same NPL and
then ran a stratified sampling procedure (by patent and by year) to extract one
control for each originating patent. As we are working only with patents from
the same 4-digit subclass (H01L), controls are by design similar from a
technological point of view. The next
step was to create two contingency tables, one for the country match between
patents and NPLs, the other for controls and NPLs. From these tables, we classified each pair as co-localized or not,
first at country level and then, on US data only, at state level. To test whether patents were more co-localized than
controls, and therefore whether knowledge spillovers between S&T exist and are geographically
bounded, we used odds ratios. An odds ratio[50]
is a way of
measuring effect size[51]
(the strength of the relation) between two binary variables[52].
A value greater than 1 suggests that
the event is more likely to happen in the first sample (citing patents in our
case).
Table 12: Co-localization percentages for citing and control patents

| | Citing | Controls | Odds Ratio | 95% Wald |
|---|---|---|---|---|
| Country | 37.9% | 7.1% | 7.87 | 7.085-8.759 |
| State (only US) | 20.2% | 7.8% | 2.99 | 2.757-3.256 |
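For readers less familiar with the statistic, the sketch below shows how an odds ratio and its Wald confidence interval are obtained from a 2x2 table of co-localized and non-co-localized pairs. The counts are hypothetical: they are chosen only to reproduce the state-level shares of Table 12 (20.2% and 7.8%), since the underlying cell counts are not reported here, so the resulting interval is not comparable with the one in the table.

```python
import math


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: co-localized / not co-localized pairs for citing patents
    c, d: co-localized / not co-localized pairs for control patents
    z:    normal quantile (1.96 for a 95% interval)
    """
    ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Hypothetical counts out of 1,000 pairs per group (20.2% vs 7.8% co-localized).
ratio, lower, upper = odds_ratio(202, 798, 78, 922)
print(round(ratio, 2), round(lower, 2), round(upper, 2))
```

With these shares the point estimate is close to the 2.99 reported for US states. The width of the interval depends on the real sample sizes, which is why only the point estimate should be read against Table 12.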
The
results show that citations to NPL display quite strong co-localization with their citing patents. The effect is stronger at country level, as
also found by Thompson (2005): the odds that a patent cites an NPL from the same country are almost 8 times higher once the
existing agglomeration of production has been accounted for (using controls as
a reference). Country boundaries are a tangible obstacle to knowledge
spillovers, which makes Sorenson’s theory (2005) about the role of codified
knowledge and publication in the indiscriminate diffusion of information less likely. At
Our
· a state’s share on overall
and
· the total number of NPL citations made in the patents of that state
The ratio between actual and expected
citations is an explicit test of the local nature of the S&T interaction:
patents cite scientific articles coming from the same state more frequently, even if we control
for the existing concentration of R&D expenditure. As citations to articles could arise even without any contact
between inventors and authors (codified, available knowledge), this test is even
more reliable: NPL citations are likely to be only the tip of an iceberg, as
they capture only very formal and selected links between local science and
technology. The social ties of inventors and authors are exactly what lies below the waterline.
Figure 10:

6.0 Down to the waterline
The previous chapter left us with evidence of co-localization between citing
patents and cited scientific articles. Without any further
investigation we could conclude that science and technology interact locally,
where particular conditions create the right incentives for the diffusion of
knowledge. Innovation is a cumulative and collective effort; it is therefore
likely that local specificities encourage a positive feedback mechanism,
reinforcing established positions. Bresnahan, Gambardella and Saxenian (2001)
separate the effects that sustain an existing cluster (agglomeration effects,
spillovers, increasing
social returns) from those that create the conditions for its
formation. All the “old-economy” inputs they list have a connection to human
capital: firm-building capabilities, which are usually part of a broader institutional framework
(North, 1990), managerial skills, skilled labour, access to markets. Individuals and their social connections have a
key role in bridging knowledge and enabling access to these inputs.
Science-based sectors (Pavitt, 1984) require by definition
access to frontier scientific research, but we know from Narin (1991) that this
process is becoming increasingly international. Leading institutions in the
6.1 Social proximity in numbers
One of the first problems to solve before testing
the “social network hypothesis” was how to construct a comprehensive network of the inventors and authors in the semiconductor
industry. In section 2.4 we derived the invention network
from patent data, and later added the
information on authors-inventors. However, if we want
a reliable estimate of the social
distance between any two individuals (author or inventor), we have to include
the layer with all the ties between scientific authors (network of discovery).
Since considering all the co-authorship relations across fields is prohibitive
and would create a huge list of false positives (due to homonyms), we
decided to limit the scope of our author
network to people who have published articles cited in
patents in the field[55].
This is a reasonable trade-off between ignoring the discovery network entirely and introducing excessive noise in the estimates. The basis
for the “double-layer” network was: 28,998
distinct inventors, from 20,155 patents applied for between 1978 and 2003;
7,967 individual authors, matching 3,263 ISI articles (1975-2003);
967 authors-inventors[56], acting as
bridges between the two levels of invention and discovery.
To quantify the effect of co-authorship
links, we can compare the structure of the network before and after their
introduction. Isolates represent 44% of the inventor
network[57],
but account for only 26% of the joined networks. Co-authorship specifically alters the upper tail of the distribution, making the
network closer to a “small world”. The largest component, which accounts for only 15.8% of the inventors’ network, grows to 45.5% of the whole net, absorbing all the
bigger components of the previous network. The second largest, which represents
33% of the first (INV), shrinks to 0.7% (INV&AUT). This outcome would probably be even more evident if we had used all
co-authorship relations and not only those arising from NPL citations. Figure 11 shows the growth of the
largest component: new nodes are added and others are drawn from previously
isolated components.
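The mechanism behind this structural change can be illustrated on a toy example: adding a second layer of ties merges components that were separate in the first layer. The sketch below uses a simple union-find structure to compute the share of the largest component before and after the co-authorship ties are added. All node names are invented, and nodes "i2" and "i3" play the role of authors-inventors.

```python
def largest_component_share(nodes, ties):
    """Share of nodes that belong to the largest connected component."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in ties:
        parent[find(a)] = find(b)
    sizes = {}
    for n in parent:
        root = find(n)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / len(parent)


inventors = ["i1", "i2", "i3", "i4", "i5", "i6"]
co_inventorship = [("i1", "i2"), ("i3", "i4")]
authors = ["a1", "a2"]
co_authorship = [("i2", "a1"), ("a1", "i3"), ("i5", "a2")]

share_inv = largest_component_share(inventors, co_inventorship)
share_both = largest_component_share(
    inventors + authors, co_inventorship + co_authorship
)
print(round(share_inv, 3), round(share_both, 3))
```

In the toy case the two inventor teams are joined through a common co-author, so the largest component grows from one third of the nodes to five eighths, while one isolate is absorbed into a dyad. The same logic, on a much larger scale, lies behind the figures reported above.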
Table 13: Structural change in the first components after the introduction of co-authorship ties (reference year 2000). INV&AUT refers to the “double layer” network, INV to the network built considering only co-inventorship.

| Component | Nodes (INV&AUT) | Nodes (INV) | Share (INV&AUT) | Share (INV) |
|---|---|---|---|---|
| 1st | 12055 | 3094 | 45.5% | 15.8% |
| 2nd | 94 | 1044 | 0.35% | 5.3% |
| 3rd | 67 | 363 | 0.25% | 1.9% |
| 4th | 65 | 218 | 0.25% | 1.1% |
| 5th | 62 | 183 | 0.23% | 0.9% |
Figure 11: Evolution of the
largest components considering co-inventorship relations (green) and also
co-authorship ties (blue).

Authors seem to have less influence on the lower part of the distribution (except for
isolates): adding them does not increase the presence of small groups in the total population, the shares remain practically
unchanged.
Table 14: Structural change in the lower part of the distribution: dyads, triads… (reference year 2000)

| Number of nodes | Count (INV&AUT) | Count (INV) | Share (INV&AUT) | Share (INV) |
|---|---|---|---|---|
| 2 | 1647 | 1603 | 43.5% | 44.2% |
| 3 | 908 | 843 | 24.0% | 23.3% |
| 4 | 457 | 434 | 12.1% | 12.0% |
| 5 | 240 | 223 | 6.3% | 6.2% |
Compared to the network analysed in section 2.4,
we now have a much clearer outlook on the role of science in the field of
semiconductors. Looking inside the most
radically altered part of the net, the largest component, we can search for
country differences, so as to estimate their
relative advantage in innovation-related science (figure 12).
Figure 12: Country
distribution of the largest component (reference year 2000)

Even if the results are not directly
comparable[58],
the
6.2 Drawing a control sample
The social network hypothesis (Singh,
2003; Breschi and Lissoni, 2004) can be tested by estimating a citation
function Pr(P,A), that specifies the
probability that a patent P cites a
given scientific article A and by
testing the influence of social proximity on this probability, after controlling for
geographical and other factors that
could affect Pr.
The citation function has a logistic
functional form[59],
but cannot be correctly estimated from
a random sample, as citations are very rare events in
the overall population of all possible patent-article pairs. As Sorenson
(2005) points out: “logistic regression
yields biased estimates when the proportion of positive outcomes in the sample
does not match the proportion in the population”. Furthermore, using all
possible pairs for the logistic regression instead of a sample is practically
unworkable, since the data matrix would be huge[60].
Since the cases of citation (y=1) are
more informative and relevant for the regression, the strategy is to retain all
of them, while sampling a smaller proportion of no-citation pairs (choice-based sampling procedure). As the
stratification is based on the dependent variable, it is necessary to proceed
with the estimate by using a weighted exogenous
sampling maximum-likelihood (WESML) estimator.
The WESML alters the logistic maximum
likelihood function by weighting each observation by the number of elements it
represents in the overall population: a weight of 1 is assigned to all the
citation cases (as we retain them all), while controls receive as weight the
inverse of the sampling probability of pairs with that particular combination
of years (year of the citing patent, year of the cited article). In our case
for each citation (y=1) we selected two “control” pairs, and repeated the
sampling procedure for each cohort of originating patents (from 1990 to 2000).
The final dataset contained 1,987
citations and 4,224 non-citations (years for the cited articles starting from
1985, as defined by our citation window), covering 1,620 single articles and
4,366 unique patents.
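The weighting scheme can be sketched as follows. This is a minimal illustration and not the estimation code used for the results: the stratum counts are invented, the function names are hypothetical, and the weighted log-likelihood is written out explicitly only to show where the weights enter.

```python
import math


def wesml_weights(sample, population_noncitations, sampled_noncitations):
    """Weights for a choice-based sample.

    sample: list of (y, stratum), with stratum = (patent year, article year)
    Citations (y = 1) are all retained and get weight 1; sampled
    non-citations get the inverse of their sampling probability,
    i.e. population count / sampled count within the stratum.
    """
    weights = []
    for y, stratum in sample:
        if y == 1:
            weights.append(1.0)
        else:
            weights.append(
                population_noncitations[stratum] / sampled_noncitations[stratum]
            )
    return weights


def weighted_log_likelihood(beta, X, y, weights):
    """Logistic log-likelihood with each observation multiplied by its weight."""
    total = 0.0
    for xi, yi, wi in zip(X, y, weights):
        z = sum(b * v for b, v in zip(beta, xi))
        p = 1.0 / (1.0 + math.exp(-z))
        total += wi * (yi * math.log(p) + (1 - yi) * math.log(1 - p))
    return total


# Invented stratum: 1 citation, 2 sampled controls out of 1,000 possible pairs.
stratum = (1999, 1996)
sample = [(1, stratum), (0, stratum), (0, stratum)]
weights = wesml_weights(sample, {stratum: 1000}, {stratum: 2})
ll = weighted_log_likelihood([0.0], [[1.0]] * 3, [1, 0, 0], weights)
print(weights, round(ll, 3))
```

Each control in this example stands for 500 non-citing pairs, so the weighted sample reproduces the rarity of citations in the population even though one third of the sampled observations are citations.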
The next
step was to calculate the values of our explanatory variables, geographical and
social distance, for each patent-NPL pair.
6.3 Geographical proximity with geocoding
If we want to test the relative roles of geography and social proximity we need a precise estimate for the physical distance
between any two individuals in our net. Dummies for regional co-localization
are an acceptable start, but we can actually do better by fully exploiting the
information available in our dataset. By parsing street addresses of all
authors and inventors and geocoding[61]
them, we were able to associate each individual with a set of latitude and longitude coordinates. From a technical point of
view, the process was handled separately for worldwide and US inventors. In the first
case, we used a combination of city name, region or
provinces (NUTS2[62]
and NUTS3) and country, while for the
Figure 13: a screenshot of
our Google Maps implementation (the numbers inside each balloon represent
authors or inventors located in any one place).

Figure 14: a second
screenshot (the lines connecting two different points represent co-authorship
or co-inventorship relations).

The next step was to compute the distance between each author and each inventor for every patent-NPL pair (cases and controls). Using the latitude and longitude data, we applied the Haversine formula[65], which, unlike the spherical law of cosines, remains well suited to numerical computation even at small distances (Sinnott, “Sky and Telescope”, 1984).
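The distance computation can be sketched as below. This is an illustrative version under stated assumptions, not the original implementation: the mean Earth radius of 6371 km, the function names, and the representation of each individual as a (latitude, longitude) tuple in decimal degrees are choices made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    # min() guards against rounding pushing the square root slightly above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def average_and_minimum_km(inventors, authors):
    """Average and minimum distance over all inventor-author pairs."""
    distances = [haversine_km(i[0], i[1], a[0], a[1])
                 for i in inventors for a in authors]
    return sum(distances) / len(distances), min(distances)
```

For a given patent-NPL pair, `average_and_minimum_km` would be called with the coordinates of the patent's inventors and of the article's authors.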
6.4 Social proximity, reloaded
Social distance changes over time, as new teams are formed for a project or scientific collaboration. Individuals who were unconnected, or only indirectly connected, can grow closer as their careers evolve. To measure exactly the social distance between citing patents and cited NPLs, we must refer each time to the network existing when the paper trail was created. Hence we cannot simply calculate social proximity once, but must build a network for every year in the dataset: for each patent-NPL pair at time t (where t refers to the application date of the citing patent, e.g. 1999), we have to use the corresponding network of inventors and authors at time t-1 (e.g. 1998). We computed social distance using SAS 9 and Moody’s add-on SPAN.
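The logic of the time-lagged measure can be sketched in a few lines. The actual computation was carried out in SAS with SPAN; the Python below is only an illustration. The input format (a list of (year, team) records), the function names, and the choice to accumulate all ties observed up to year t-1 are assumptions made for the example.

```python
from collections import deque

def build_network(collaborations, year):
    """Adjacency map built from all teams observed up to and including `year`.

    collaborations: iterable of (team_year, [person, ...]) records, one per
    patent or article (a hypothetical format).
    """
    graph = {}
    for team_year, team in collaborations:
        if team_year > year:
            continue  # ties formed later must not be used
        for person in team:
            graph.setdefault(person, set()).update(p for p in team if p != person)
    return graph

def min_geodesic_distance(graph, inventors, authors):
    """Shortest path length between any inventor and any author.

    Returns 0 for an author-inventor and None when no path exists.
    """
    targets = set(authors)
    start = set(inventors)
    if start & targets:
        return 0
    seen = set(start)
    queue = deque((person, 0) for person in start)
    while queue:  # breadth-first search from all inventors at once
        person, dist = queue.popleft()
        for neighbour in graph.get(person, ()):
            if neighbour in targets:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None
```

For a citing patent applied for in 1999, the distance would be read off `build_network(collaborations, 1998)`.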
Estimating the effects of geography and
social distance on the citation probability function is the final step of our
research. We ran a series of logistic regressions, starting from a very simple
model that accounts only for geography, to one that introduces our measures of
social proximity. As a measure of geographical distance we tried both the logarithm of the average and of the minimum distance
between all pairs of inventors and authors for a given citing patent-NPL
link. To all regressions we added a set of fixed effects for the year of the
citing patents and the time lag between citing patent and cited NPL. Since
focal patents might enter the regression more than once (if they cite more than
one NPL), we report robust standard errors with clustering on the citing
patent. We also added a control for the type of journal cited (from very basic
to very applied research, on a scale from 1 to 4), using CHI’s classification
of scientific journals[66].
Table
15 reports estimation results for
the first case: geographical distance (glog)
has a negative effect on the probability of a knowledge flow (citation).
Table 15: the probability
of a citation decreases with geographical distance. The table reports odds-ratios,
not Logit coefficients. The estimation includes fixed effects for the year of
citing patents and time lag (lag).
| | Odds ratios | Robust Std. Err. |
|---|---|---|
| glog | 0.7798528 a | 0.0139077 |
| lag | 0.9936209 ns | 0.0167381 |
| Number of observations | 6211 | |
| Log-likelihood | -46.469358 | |
| Pseudo-R2 | 0.0285 | |

a significant at the 1% level; ns not significant
To introduce social distance we created a complete set of mutually exclusive dummy variables:
· dd0: takes value 1 in case of personal self-citation (author-inventor) -> minimum geodesic distance between patent and article is 0;
· dd1: at least one author and one inventor have previously worked together (past collaborator), either on a patent or a scientific article -> minimum geodesic distance is 1;
· dd2: inventors and authors have at least one common collaborator, which reduces the geodesic distance to 2 “handshakes”;
· dd3: geodesic distance is 3;
· dd6: geodesic distance is either 4, 5 or 6;
· ddc: connected, but at a distance larger than 6;
· dnc: no social link, all inventors and authors belong to different components of the network.
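The coding of these dummies from a minimum geodesic distance can be sketched as follows. The function name and the convention of passing `None` for unconnected pairs are assumptions made for the example; only the variable names and thresholds come from the list above.

```python
def social_distance_dummies(distance):
    """Map a minimum geodesic distance to the seven mutually exclusive dummies.

    distance: non-negative integer, or None when inventors and authors
    belong to different components of the network.
    """
    dummies = dict.fromkeys(["dd0", "dd1", "dd2", "dd3", "dd6", "ddc", "dnc"], 0)
    if distance is None:
        key = "dnc"            # no social link
    elif distance <= 3:
        key = f"dd{distance}"  # dd0, dd1, dd2 or dd3
    elif distance <= 6:
        key = "dd6"            # distance 4, 5 or 6
    else:
        key = "ddc"            # connected, but further than 6
    dummies[key] = 1
    return dummies
```

Exactly one dummy equals 1 for every pair, which is what makes the set complete and mutually exclusive.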
As we can see in Table 16, the existence of a tie is associated with a higher
probability of knowledge flow, with the probability sharply decreasing as the
geodesic distance increases. Once social
proximity has been accounted for, the negative effect of geographical distance
falls[67]
(odds ratio increases from 0.78 to 0.84).
Collaborative networks are a way of overcoming geographic distance and gaining access to relevant knowledge flows between S&T. Ceteris paribus, a past collaboration raises the odds of a knowledge flow more than 16 times. Indirect social links can also play a role: having a common acquaintance (dd2) increases the odds of acquiring useful knowledge more than 4 times; at geodesic distance 3 the odds are 2.5 times higher, while even a fairly long path (between 4 and 6 degrees of separation) still carries a 46% premium.
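The reading of the odds ratios reported in Table 16 can be made explicit with a short sketch. The helper names are hypothetical, and the baseline probability passed to the second function is an arbitrary illustrative input, not an estimate from our data.

```python
def percent_change_in_odds(odds_ratio):
    """Percentage change in the odds implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0

def probability_after(baseline_probability, odds_ratio):
    """Probability obtained by multiplying the baseline odds by the odds ratio."""
    odds = odds_ratio * baseline_probability / (1.0 - baseline_probability)
    return odds / (1.0 + odds)

# Odds ratios taken from Table 16.
table_16 = {"dd1": 16.78185, "dd2": 4.269552, "dd3": 2.520375, "dd6": 1.463143}
for name, odds_ratio in table_16.items():
    print(name, round(percent_change_in_odds(odds_ratio)), "% change in odds")
```

An odds ratio multiplies the odds, not the probability itself; the two readings are close only when the baseline probability is small, which is why `probability_after` needs a baseline as input.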
Table 16: Social
proximity is highly significant and reduces the negative effect of geographical
distance. The table reports odds-ratios, not Logit coefficients. The estimation
includes fixed effects for the year of citing patents, time lag (lag) and type of journal cited (L),
which turns out to be not significant.
| | Odds ratios | Robust Std. Err. |
|---|---|---|
| glog | 0.836214 a | 0.0152984 |
| dd0 | 275.1944 a | 164.894 |
| dd1 | 16.78185 a | 9.859315 |
| dd2 | 4.269552 a | 1.30857 |
| dd3 | 2.520375 a | 0.6084489 |
| dd6 | 1.463143 b | 0.2831672 |
| lag | 1.009529 ns | 0.0207985 |
| L | 0.958468 ns | 0.042366 |
| Number of observations | 6211 | |
| Log-likelihood | -43.473071 | |
| Pseudo-R2 | 0.0911 | |

a significant at the 1% level; b significant at the 5% level
Table
16 tells us that geography matters for knowledge diffusion,
but social networks can help overcome its boundaries. We did not find any
relation between the type of journal cited (from basic to applied research) and
knowledge
flows: this further restricts Sorenson’s hypothesis on the role of public
disclosure and codified scientific knowledge in the indiscriminate diffusion of
innovation. We will now look again at
the interplay between geography and knowledge flows by creating 4 new dummy
variables:
· usus: takes value 1 if both the citing patent and the cited article are assigned to the
· eueu: same as the previous dummy, for
· useu: is equal to 1 where a
· euus: just the opposite, EU patent citing
As expected, the first estimation results
show a positive effect of co-location: an American patent is 37% more likely
to cite a scientific article written in the same country, while for
Table 17-18: Co-location and
home bias comparison for US and EU. The first table does not include measures of
social proximity. The tables report odds-ratios, not Logit coefficients. Both
estimations include fixed effects for the year of citing patents and time lag (lag).
| | Odds ratios | Robust Std. Err. (adjusted for 4366 clusters in citing) |
|---|---|---|
| usus | 1.376774 a | 0.1199584 |
| eueu | 1.436757 a | 0.1568459 |
| euus | 0.791294 b | 0.0805618 |
| useu | 0.739480 a | 0.0830658 |
| lag | 0.992782 ns | 0.0154311 |
| Number of observations | 6211 | |
| Log-likelihood | -47.409265 | |
| Pseudo-R2 | 0.0088 | |

a significant at the 1% level; b significant at the 2% level

| | Odds ratios | Robust Std. Err. (adjusted for 4366 clusters in citing) |
|---|---|---|
| usus | 0.862342 ns | 0.0828987 |
| eueu | 1.404109 a | 0.1643845 |
| useu | 0.834504 ns | 0.0864882 |
| euus | 0.688375 a | 0.0792875 |
| dd0 | 498.1717 a | 292.3483 |
| dd1 | 28.45921 a | 15.60164 |
| dd2 | 5.31456 a | 1.232905 |
| dd3 | 2.90833 a | 0.5096658 |
| dd6 | 1.54277 a | 0.1642231 |
| lag | 0.99846 ns | 0.0215568 |
| Number of observations | 6211 | |
| Log-likelihood | -43.807041 | |
| Pseudo-R2 | 0.0841 | |

a significant at the 1% level; ns not significant
Controlling for social proximity leaves
the premium for EU patents citing EU articles almost unchanged and significant,
and decreases, ceteris paribus, the cross-citation probability between European
patents and American ones. On the other hand,
the usus dummy has an odds ratio below 1 and is not significant: there is no
tangible effect of being co-located in the United States if we account for
social distance, while having a prior collaboration makes a knowledge flow 28
times more likely. A social network is indeed a very efficient diffusion vehicle for
knowledge, particularly in the
Table 19 shows once more that
connectedness leads to a greater probability of citation: citations have closer
ties (dd0-dd3) and are less likely to arise
between unconnected groups. Connected
groups are also geographically closer (Table 20): geography matters, because it
makes the creation of social ties more likely.
Table 19: Percentage
distribution of groups at different social distances across controls and citations.
| | Citations (n=1987) | Controls (n=4224) |
|---|---|---|
| Author-inventor (dd0) | 11% | 0% |
| Past collaborator (dd1) | 2% | 1% |
| Common collaborator (dd2) | 3% | 1% |
| Collaborator with ties (dd3) | 5% | 2% |
| Six degrees (dd6) | 11% | 10% |
| Indirect social link (ddc) | 4% | 5% |
| No social link (dnc) | 64% | 81% |
Table 20: Interplay
between geography and social proximity.
| | Average distance (km) |
|---|---|
| Author-inventor (dd0) | 248 |
| Past collaborator (dd1) | 1321 |
| Common collaborator (dd2) | 2736 |
| Collaborator with ties (dd3) | 3103 |
| Six degrees (dd6) | 3981 |
| Indirect social link (ddc) | 4483 |
| No social link (dnc) | 4814 |
7.0 Wrapping up
The
dynamics of science, technology and growth are naturally intertwined. To
understand the mechanisms behind each of these interactions, we have to focus
on the innovation process and the way new knowledge diffuses throughout the
economy.
Patent citation analysis has already
offered relevant insights into the nature and localization of knowledge flows.
The aim of this research was to exploit a relatively
unexplored source of data: scientific articles cited in patents from the semiconductor industry. The outcomes show that NPL citations are a promising
and reliable research trajectory.
A first major result we obtained was
the snapshot of author/inventors: in an area of close interaction between “Open
Science” and “Proprietary technology”, we found that the exchange of
information actually takes place at the level of these individual researchers.
As knowledge workers, author/inventors assume a central role in the funnelling
of information between groups with diverging interests and incentives. Acting
as gatekeepers, they are able to reconcile the different “evolutionary
logics” of S&T, since they know at least part of both worlds. Without them,
the diffusion of innovation would be slower and less efficient.
Social
network analysis has proved to be a decisive tool, since it allows us to look into
the black box of S&T and discover regularities and exceptions in the
“community of experts”. Identifying influential people in a network of
discovery and invention can provide some valuable policy hints: remaining
plugged-in and gaining access to scarce resources is as crucial a task for
firms as for countries.
In this context, the mobility of
inventors and scientific authors between regions, countries and organizations
is a fundamental shift factor for knowledge spillovers. Local institutions that
favour cross-fertilization between S&T or a developed labour market actually
support innovation.
A natural
extension of our current results would be to consider the institutional
affiliations of authors-inventors: which
types of firms or universities have an active role in the diffusion of
knowledge between S&T? What kind of organizational structures favour
innovation and the exploitation of scientific research?
Inside these organizations, are
authors-inventors able to outclass their colleagues or does their role as gatekeepers
absorb most of their resources? Is higher centrality the result of
opportunistic behaviour or of a serious involvement in applied research?
Furthermore,
our analysis has shown that neither scientific publication nor geographic
co-location alone is sufficient for diffusion: tacit and excludable knowledge travels through direct and indirect
social contact chains using the common vocabulary that characterizes each
epistemic community. Social networks determine most of the observed patterns of
knowledge diffusion and help overcome the negative effect of geographic
distance.
Geography, as emphasized by an
extensive literature, clearly matters, but institutions, by creating the right
incentives for inventors and scientists, are able to bypass its natural constraints.
Results also indicate that the
The interpolation of science and technology
is a complex task, but also a very promising one.
References
Acs Z.J., Audretsch D.B., Feldman M.P. (1994), “R&D spillovers and recipient firm size”,
Review of Economics and Statistics, 76(2): 336-340.
Acs Z.J., Audretsch D.B. (1990), “Innovation and Small Firms”,
Aghion, P. & Howitt, P. (1990) "A Model Of
Growth Through Creative Destruction," DELTA Working Papers 90-12, DELTA
(Ecole normale supérieure)
Agrawal A.K., Cockburn I.M., McHale J.
(2003), “Gone But Not Forgotten: Labor Flows, Knowledge Spillovers, and
Enduring Social Capital”, NBER Working Paper 9950
Almeida P., Kogut B. (1999), “Localisation of
knowledge and the mobility of engineers in regional networks”, Management
Science, 45(7): 905-917.
Almeida, P., Kogut, B. (1997) “The Exploration of
Technological Diversity and Geographic Localization in Innovation: Start-Up
Firms in the Semiconductor Industry”, Small Business Economics, Volume 9,
Number 1, pp. 21-31(12)
Arora, A. and Gambardella, A., (1994). "The changing technology of
technological change: general and abstract knowledge and the division of
innovative labour," Research Policy, Elsevier, vol. 23(5), pages 523-532,
September.
Arrow K.J. (1962), “Economic welfare and the allocation of
resources for invention”, in R.R. Nelson (ed.), The Rate and Direction of
Inventive Activity: Economic and Social Factors,
Balconi M., Breschi S., Lissoni F. (2004), “Networks of
inventors and the location of academic research: An exploration of Italian
data”, Research Policy 33(1): 127-45.
Breschi S., Lissoni F. (2004), “Knowledge networks
from patent data: Methodological issues and research targets”, Cespri WP n.
150.
Breschi S., Lissoni F. (2003), “Mobility and social
networks: Localised knowledge spillovers revisited”, Cespri WP n. 142.
Breschi S., Lissoni F. (2001), “Localised knowledge
spillovers and local innovation systems: A critical survey”, Industrial and
Corporate Change, 10(4), 975-1005.
Bresnahan T.F, Gambardella A., Saxenian, A.
(2001), "'Old Economy' Inputs for 'New Economy' Outcomes: Cluster
Formation in the New
Bush, V. (1945) “Science The Endless Frontier”, a Report to the President by the