--- Upon commencing on Wednesday, October 3, 2007 at 11:25 a.m.
MR. VICKERY: Welcome everybody. I assume everybody has come back from coffee.
My name is Graham Vickery. I work for the OECD Secretariat.
If you have got any questions, any problems, any issues to raise about
understanding the emcee who is in the other room, please contact us.
I would like now to hand over this session to Walter Stewart, who is going to be
our very able Chair, and I will give some small amount of directions about a
meeting we are going to be having over lunchtime at the end of this session.
So Walter, please.
MR. STEWART: Good morning, ladies and gentlemen.
It is my pleasure to welcome you to this session, Research 2.0: E-science and
the new modes of interaction in the scientific community.
We have four speakers. I will introduce them by name and title only.
You can read all the biographies on the conference website or in the small
booklet that is available on the table in the main hall.
I would ask you to note the description of the session in the overview you
received when you registered this morning.
There are three questions that we have asked our speakers to address.
After the presentations, there will be an opportunity for you, both those of you
in the room and those of you participating on the Web, to ask your questions.
I would also draw your attention to the translation devices -- perhaps I should
have done that first -- draw your attention to the translation devices.
We may well have -- the presentations will be in English. We may well have
questions though in French and so please feel free to ask your questions in
whatever language suits you.
With that, I am going to introduce our first panellist.
Our
first panellist is Andrew Herbert from Microsoft Research in
MR. HERBERT: Thank you.
Hopefully, my slides will appear momentarily in front of me.
So what I want to take as my theme is perhaps a little broader than just the
Web itself: it is a look at the impact that computing and computer science as a
whole are having on the other sciences, and some of the consequences of that.
So
the key, I think, is that the sciences ‑‑ and I use that quite
broadly, physical sciences, life sciences, engineering ‑‑ are all
increasingly relying on advanced ideas from computer science essentially to
reduce the time to scientific insight.
In
the past, perhaps we thought of science as being divided into theoretical
science, primarily the domain of mathematics, and experimental science, the
world of the test tube and the accelerator.
In
between those two now sits a third strand of science, which is computational
science, that is, the world of simulation, data mining, visualization, pattern
recognition, machine learning and many other techniques, and the advances in
those techniques are primarily coming from the computer science community.
And
indeed, I think one of the questions that needs to be addressed is how are we
going to produce people in the scientific community who have the right balance
of skills between the core scientific disciplines ‑‑ the
biologists, the chemists ‑‑ and who are able to work with the most
recent computer science techniques and indeed contribute to and advance those?
This
is a perennial problem. Many scientists
learn their computing in the first or the second year of their bachelor’s
programs. They lock into the operating
system and the programming language of that time and know a little bit about
bashing data in files. Using advanced
database techniques, using computational grids and so forth are all very new
and exciting things for them.
So
I think the key points are that the computers are enabling scientists to share
massive amounts of data. For some of the
sciences it is the massive amount of data.
If
you think of physics, when the Large Hadron Collider at CERN comes on stream,
that is going to be generating petabytes of data in which the physicists are
searching for very rare events.
In other disciplines like the life sciences, actually it is lots of very small
databases that have to be connected together as we are trying to join up
different parts of biological knowledge: two different problems, but both
equally complex and both dependent on networking large amounts of resource.
I think you will hear more from the other speakers about the way in which Web 2.0
technologies are being used by scientists to create virtual organizations,
linking scientists in different laboratories together, combining their
resources, whether they are computational resources, data resources, access to
facilities, in many new ways.
As
a consequence of that, it's revolutionizing the way we think about scientific
publication. If scientific work is
ongoing and being conducted through blogs and online experiments, online
meetings, why do we need conferences, why do we need printed journals? There are interesting questions in that online
world about the provenance of data, the tracking of data, the archiving of it,
ownership and very deep issues.
So for me, I think, these are the computing ingredients that come into the
picture. Through technologies like sensor networks, we can bring real-world data
into the theoretical models. We can link those to our computer models,
simulations and so forth. We can store huge amounts of persistent distributed
data, and so we can bring the experiment, the models and the data together
simultaneously by using computer techniques to automate scientific workflows.
Using the technologies of data mining, which have come out of the world of the
enterprise and business, finance and so forth, we are able to perhaps even think
about automating some of the aspects of generating scientific insights.
I
don't think scientists will go away.
Computers never succeeded in making people go away. But what computers have done is let people
focus on their core skills and competencies and the computers have done the
drudgery behind the scenes for us. They
have made us more productive and I think the same is happening in science.
What
people are good at is interpretation and insight.
I
want to give you one example of an area of research that colleagues in my
laboratory in
We have been working in an area called computational systems biology, and this
essentially is looking at the biology of cells and how they interact in
organisms. The approach has been to treat cells as if they were abstract
computers, which is where the computer scientist gets interested. And as
computer scientists, we have developed many tools to help us model computers and
model complex software systems: to do things like prove software is correct, to
understand how one computer relates to another, to decide if particular models
of computation are equivalent.
And
we are now starting to transfer those ideas to help the biologists who have
many of the same problems. And indeed to
a computer person -- and my background is in hardware and operating systems --
when a biologist explains how a cell works, it sounds as though it's three
little abstract machines connected together.
Cells
are all about membranes dividing the various parts of the cells, and cells
themselves. The membranes are about
confinement, storing things, and indeed the bulk transport of things around the
organism. Those are computing words.
There
is the protein machine driven by the amino acids, which is where metabolism
takes place. Food is consumed and turned
into energy. Things are propelled around
the system. Signals are processed. That very much feels like a processing
element, if you like.
And
then there are the genes, whose role in biology is clearly the regulatory
system to keep all the pieces working together, and those things signal to each
other. The genes are perhaps like
programs.
So
as a computer scientist, there are many of our words that we can bring to
describe the biological system of the cell and many of our techniques and
modelling complex systems that we can perhaps offer to the biologists to help
them develop fuller models of what is going on in their field.
The biologists have a challenge. Physics and chemistry advanced because of
mathematics. Once you could model the world of physics through applied
mathematics, you could make predictions from the mathematical models and then go
and verify them by experiment.
Chemistry
made huge strides when models of atomic structures, of molecular structures,
could be represented mathematically once we had the equation and other
mathematical tools.
Biologists
don't have that mathematical framework.
They are still fundamentally doing zoology and botany; collecting
things, squashing them, sticking them in albums, trying to deduce conclusions
by looking at what they have discovered.
And they have no way of writing it down apart from little cartoons and
writing statements in simple English.
So
perhaps some of the formality and notations that we have developed in computer
science can help them.
That's
the track down which some of my colleagues have been going. We have been taking ideas from abstract
models of software systems where we have essentially mathematical notations for
describing dynamic systems and how they interact.
The particular one that we use is called the pi-calculus. It's the notation that
people interested in theoretical computer science use to explain different
programming languages to each other and to understand what is really going on in
those languages. When the vendors are arguing about Java being better than C# or
the other way around, the theoretical guys can say no, that's all just syntax.
These are the fundamental ideas and the fundamental explanations.
In
those mathematical models, we often start by drawing simple graphical pictures
as a way of capturing knowledge. What we
have been able to do is give the biologists a formal graphical notation. There are some examples on the screen. Simple biological entities, arrows
representing ways in which they send outputs to each other or respond to
signals. And then in the names of those
things, we can capture their formal behaviour.
Once we have a formal graphical way of describing things, which is the
information capture part of the process, we can then transform that into
something which looks a little bit like a programming language. And once we have
that programming language, if it's something which is indeed truly
compositional, we can start describing circuits, or organisms in the biological
world, by combining those libraries together.
And
then with those programs, perhaps we can run simulations looking farther into
the future. Perhaps we can turn those
programs into essentially manufacturing steps to build organisms entirely as instructions and
look at their behaviour.
So
some particular work that we have been involved in has been looking to see if
we can use those models to actually explain real pieces of important
biology. One of our early results has
been looking at some parts of the human immune system.
On
the right-hand side of the screen, you can see the kind of pictures you would
see in a standard biology textbook describing the process by which a receptor
looks for hostile cells in the system, traps them, absorbs them, breaks them up
into components and then ejects them from the system.
Today
biologists do that by essentially drawing cartoons. The pictures are too small to take you
through all the details. But that's
essentially the level of formality.
How
can that picture explain to you the general concepts?
There
is no information in there about how long it takes that reaction to occur. There is no information in there to let you
think about how you might generalize that particular mechanism to tackle other
kinds of receptors, and so forth.
What
we have done, working with the biologists and listening to them and their
explanations as they unpick some of the biochemistry, is to turn those cartoon
diagrams into the things in the middle, which are the graphical representations
of the various parts of that immune system, and then from those generating the
programs, if you like, in our formal notation.
And then with those programs we can start to run simulations.
The
graphs on the left are the computer simulations showing how the concentrations
of various of the biochemical parts of the system change as the reactions take
place.
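
As an illustration of the kind of simulation described here, the following minimal Python sketch tracks how molecule counts change as reactions fire. It is not the speaker's pi-calculus tooling; it uses Gillespie's direct stochastic simulation method on a single reversible binding reaction, and the species names, rate constants and initial counts are all hypothetical.

```python
import random


def gillespie_binding(n_receptor, n_ligand, k_bind, k_unbind, t_end, seed=0):
    """Simulate R + L <-> C with Gillespie's direct method.

    All species and rate constants are hypothetical, chosen for illustration.
    Returns a list of (time, free_receptor, free_ligand, complex) tuples.
    """
    rng = random.Random(seed)
    t, r, l, c = 0.0, n_receptor, n_ligand, 0
    trace = [(t, r, l, c)]
    while True:
        a_bind = k_bind * r * l    # propensity of R + L -> C
        a_unbind = k_unbind * c    # propensity of C -> R + L
        a_total = a_bind + a_unbind
        if a_total == 0:
            break                  # no reaction can fire any more
        t += rng.expovariate(a_total)  # exponential waiting time to next event
        if t > t_end:
            break
        # Choose which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_bind:
            r, l, c = r - 1, l - 1, c + 1
        else:
            r, l, c = r + 1, l + 1, c - 1
        trace.append((t, r, l, c))
    return trace


if __name__ == "__main__":
    # Print every 20th event to show the counts drifting towards equilibrium.
    for t, r, l, c in gillespie_binding(50, 80, 0.01, 0.1, 10.0)[::20]:
        print(f"t={t:6.3f}  R={r:3d}  L={l:3d}  C={c:3d}")
```

Plotting the complex count against time from the returned trace would give a curve of the same general kind as the concentration graphs mentioned above, though for a far simpler system.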
We have made very good progress in that. We can simulate biological systems, and
the simulations match the behaviour of the real system. There are a number of
cases where we have actually helped the biologists explain what are some of the
key signals that are actually driving the process. Biological systems are
immensely complex, and understanding which are the key elements is a particular
difficulty for them.
So we have helped them understand their science better. And indeed we have made
some early steps, and others in the field are doing similar things, in building
custom biological structures at the gene and cell level that have a behaviour
that we want to impose.
We have built the biological equivalent of the computer multivibrator, the thing
that flashes off and on, and we have built a biological system that can do
that.
So
those are exciting early stage results.
The question is: Where is all this taking us if we look a long way forward?
First
of all, I think there are interesting opportunities here in modelling the
effect that drugs might have on the personal gene machine, and so
pharmaceutical companies are very interested in this line of research.
If
you combine that with some of the work that is going on in biosensing in the
field, that gives us perhaps the ability to be monitoring our own personal gene
machine in real time.
With some of the work going on in nano-materials, particularly nano-materials
based on engineering with DNA, we have the possibility of modelling the effect
of drugs on our system, modelling our own system, creating drugs that are
optimized for that individual system, and that leads us to a vision of personal
healthcare, something a guy called Leroy Hood has talked about a great deal.
Healthcare that is predictive, so we are responding to things before they become
a problem; that is preventative, that is, removing bad things from the system;
that is pre-emptive, striking before it is too late; and indeed in which you as
a person may participate. Because if all this is happening with software
technologies, the opportunity for you to be involved in negotiating with a
doctor is very important. So that is one direction it might go.
Another is thinking about engineering bioenergy systems and predictive modelling
of those. So that is just one area where I think computing is having a huge
effect on a particular science. There are several others that I have been
closely interested in and talking to people about. Understanding the human
brain: the human brain is not a computer, there is no notion of software in the
human brain. But certainly at the level of neurons and synapses a lot of what we
understand from machine learning or pattern matching seems to be what is going
on, and so that is helping us with some of our interpretation.
Using computers to model global epidemics. As Brits, we are quite concerned
about this. We have got two diseases rampaging through our country at the
moment, Bluetongue and Foot and Mouth. We have kind of stopped worrying about
SARS; that is in someone else’s backyard.
And indeed the work that is going on with the physicists, trying to understand
the origins, workings and indeed the ultimate demise of the universe: you can’t
experiment with the beginning or the end of the universe; that has got to be
done with computers.
So I have tried, I think, to open up the way in which computing is changing the
way science is done, accelerating the pace of science. If you want to follow up
in more detail on some of these things where I have whetted your appetite, with
colleagues we have published a report called 2020 Science, which tries to
address many of those things. You can download it and you are very welcome to do
so, and I would be happy to have a further discussion about it.
Thank you.
MR. STEWART: Thank you, Andrew.
I would commend that
report to you, it is truly excellent and not to be missed if you haven’t read
it.
I would now like to
introduce
One thing I am going to talk about in my brief few minutes here is how the
impact of these Web 2.0 technologies you will be hearing about will not only
affect how scientists do science but will also allow a greater community to
participate in those research and scientific activities. I think that is going
to have a very profound effect on scientific policy and other educational
policies and so forth.
So you have been hearing in all the talks that the Web 2.0 tools, mashups, blogs,
wikis, service-oriented architectures, are transforming all different walks of
life. You have been hearing about Business 2.0, enterprise, Battlefield 2.0; the
U.S. military has a major program using these technologies in a variety of
fields. Microsoft has just introduced a program called Telco 2.0, all sorts of
mashups for telephone companies and network services and so forth.
So these same tools which are transforming all sorts of walks of life are, as
you have heard from our speakers and I am sure will hear from following
speakers, going to transform science and research for scientists and researchers
themselves, but also for a much larger community.
And this has been labelled citizen science. It will allow a faster transfer of
knowledge. You know, as opposed to waiting for the papers and journals, we are
now seeing the transfer of science and knowledge coming so much quicker from
academia and the research community through blogs and wikis and so forth. And
that is now the major medium for new knowledge and new information being passed
around the world.
And it is also
democratizing science. Increasingly as
we see scientific data being digitized, therefore it becomes immediately more
accessible, assuming you solve the DRM issues.
And so it is not only accessible to all the scientists, but it is also
accessible to members of the public. And
the public can then take the same data and run their own models and do their
own analysis, and this is going to have a significant policy impact.
Let me just give you one simple example. You may have heard that a few weeks ago
a blogger had taken some of this CO2 data and discovered that in fact the
warmest period in Earth’s history in the last 100,000 years was not the last 10
years, which is the common assumption, but actually happened in the 1930s,
because he had done some analysis and comparative analysis and corrections and
so forth.
Now, this has been debated, but this is a good example of how one individual,
one blogger, can get access to this data and do a different analysis and
interpretation which, of course, has significant policy implications and so on
and so forth.
But now there are all sorts of activities by students and members of the public
involved in doing these types of things in astronomy, in high energy physics,
climate science and all sorts of things. And Intel, for example, just released a
product called Mashups for the Masses, which is a set of tools that really
enhances the capability for individuals to grab datasets from different areas,
mash them together and create new results and new interpretations of the
original datasets.
So you have heard of mashups mostly coming from the Google world of, you know,
taking geographical data and mashing it up with real estate data and crime data
and so forth, so you can get maps showing where the most houses are, where the
lowest crime rates are and so forth. But now people are using these mashups to
merge together different datasets from all sorts of different fields.
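
The mashup idea described here, joining two independently published datasets on a shared key, can be sketched in a few lines of Python. This is only an illustrative sketch: the neighbourhood names, prices and crime figures are invented, and the function names `mash_up` and `rank_by_safety` are hypothetical rather than part of any product mentioned in the talk.

```python
# Hypothetical datasets, as might be published by two unrelated sources.
LISTINGS = [
    {"neighbourhood": "Riverside", "median_price": 310_000},
    {"neighbourhood": "Hilltop", "median_price": 450_000},
    {"neighbourhood": "Old Town", "median_price": 275_000},
    {"neighbourhood": "Harbour", "median_price": 390_000},
]
CRIME = [
    {"neighbourhood": "Riverside", "incidents_per_1000": 12.5},
    {"neighbourhood": "Hilltop", "incidents_per_1000": 4.1},
    {"neighbourhood": "Old Town", "incidents_per_1000": 21.0},
    {"neighbourhood": "Lakeside", "incidents_per_1000": 6.3},
]


def mash_up(listings, crime):
    """Join the two datasets on neighbourhood, keeping areas present in both."""
    crime_by_area = {row["neighbourhood"]: row["incidents_per_1000"] for row in crime}
    merged = []
    for row in listings:
        area = row["neighbourhood"]
        if area in crime_by_area:
            merged.append({**row, "incidents_per_1000": crime_by_area[area]})
    return merged


def rank_by_safety(merged):
    """Order areas by crime rate, lowest first, breaking ties on price."""
    return sorted(merged, key=lambda r: (r["incidents_per_1000"], r["median_price"]))


if __name__ == "__main__":
    for row in rank_by_safety(mash_up(LISTINGS, CRIME)):
        print(row["neighbourhood"], row["median_price"], row["incidents_per_1000"])
```

Areas that appear in only one source are dropped by the join, which is one of the interpretation choices a person building such a mashup has to make.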
And so also in the past this computational science was largely restricted to
those who had big high performance computers and the big databases and storage
facilities to do this type of analysis. But with tools like EC2 and S3 from
Amazon and other companies, those types of resources are now available to the
average user or to students at a very low cost, so they can do this type of
computational science themselves: take these same datasets and run large models
using either peer-to-peer networks and so forth or the newest tools from Amazon
and other service providers.
So the key elements, of course, for participative Web science are the
distributed databases, instrumentation and computational facilities, and
extensive virtualization. What we are seeing is the ad hoc development of what
we call virtual organizations built around these types of structures, not only
between scientists themselves but between communities of users interested in
these very scientific activities and using workflows and mashups and so on and
so forth.
And so what is happening is this real democratization of science being made
available by these Web 2.0 tools. In the bioinformatics community, for example,
there is a group of researchers and members of the public who are developing a
whole bunch of mashup tools and service architectures using Amazon S3 and EC2 to
provide non-researchers with tools to do a lot of these bioinformatics analyses,
genome analyses and so on and so forth. And so these are the types of things
that are starting to happen out there at the grass-roots level rather than from
a formal research environment.
So here are some very quick examples of these types of activities. A big one is
of course crowd-sourcing. There is a group of researchers now who are using
crowd-sourcing tools to allow the large community of humans to really identify
new research techniques and new scientific evidence.
So, for example, there was a gold company here in Canada that put out a prize,
announcing to the Internet at large: our geologists think the gold vein is here;
we invite the community to analyze that same geophysical data and come up with
their own interpretation of where the best gold veins are. And surprise, the
community came up with better answers than the professional geophysicists.
And so now -- I
wanted to bring out my computer except it broke down -- there is now a research
community in the
Another good example is Project Neptune, which many of you may have heard of.
This was a joint U.S./Canadian project. Fortunately,
And of course, you can’t send a researcher to the ocean floor. This is all going
to be remotely accessible. And this data will not only be accessible to the
scientists who participate in this project, but it is designed from day one so
that this data will also be accessible to students and to the public at large.
And there is talk of virtual aquariums. We have already put down high-definition
TV cameras; you can watch these smokers, you can see the various biota that
exist around them, and it is available to anybody on the website right now to
look at this type of activity.
Another good example
is from
The Kyoto Agreement
is dead now, of course, but
But, again, this data is all going to be made available to the public, so the
public can also interpret this data. So, it’s not just going to be some high
priests of science who say yes or no, but also the same information will be
available to any community to re-interpret and re-examine this type of data.
Another
great example is the ALTA Cosmic Ray project.
This was started at the
Another one in
And, of course, the Sloan Digital Sky Survey. This is the late Jim Gray, who was
very instrumental behind this. Again, this is a site of astronomical data. Many
of these services were built by students. Again, it is available to scientists
and students and the public at large.
Now,
because of this type of service, most of the large supernovae are being
discovered by members of the public as opposed to professional
astronomers. So, by using various
techniques of scanning all the images and scanning the data, it’s the public
who are making these discoveries as opposed to professional astronomers.
So,
that’s just -- the one last one is the Faulkes telescope. This is an eccentric billionaire in
So,
that, I hope will stimulate some of your thinking of the potential of what
these Web 2.0 and participative technologies will enable, at least in the
scientific community, as well as many other walks of life.
Thank
you.
MR. STEWART: Thank you, Bill.
The next speaker will be Diana Rhoten, Program Director, Office of
Cyberinfrastructure, National Science Foundation.
Diana.
MS RHOTEN: So I’m going to talk a little bit about some of the learning and
knowledge production affordances of Web 2.0 for science. I just wanted to start
with this clip.
--- Video presentation
MS RHOTEN: So in addition to providing some interesting statistics about some of
the usage of Web 2.0 services, particularly by kids, it is important to note
that that clip, which is available online, was created by a high school teacher
and then edited by myself using tools available online, and then, using Creative
Commons licensing, I can present it to you mashed up, mixed up, re-mixed by me.
So, just to use some of the tools of Web 2.0 for the purposes of the
presentation.
So,
what’s the calculation for science?
Technological capacity is increasing; we’ve all heard that.
If
you multiply that by the fact that the generation of scientists coming into
science are coming through a digital society we have what’s potentially Science
2.0.
We
need to think about the next generation implications for science by thinking
about what the next generation expects from its use on-line and its
expectations about use.
What are some of the characteristics of Science 2.0? This is courtesy of Ian
Foster, whom some of you may know. It’s been adapted by me. We see that there
are changes in the nature and the size of scientific data. I’m not going to go
through each one of these. But there are changes in the unit and venue of
scientific communication.
As
Bill mentioned wikis, blogs, project websites, become very much the outlet for
both scientific data, scientific finding, scientific publications. But, we’re also moving beyond just
publications to simulations, visualizations, creating new databases. These are all new products that are coming
out of Science 2.0.
There are also changes in the location and the structure of the social aspect of
science. Science used to be done in co-located environments. The research centre
was very much -- at least in the
We’re
now really looking at distributed science.
We have scientists sitting in
We
also see venues of scientific interaction changing. We’ve gone from community co’s (inaudible) to
science gateways, campus and national grids, to science on the Internet.
We’ve
talked about science as a computation.
From computational science to science as computation.
We
also see this bleeding into all fields.
We’re moving from just the physical sciences to all the sciences,
including the social sciences as well as humanities.
So,
in Science 2.0 we really are looking at distributed knowledge production and
learning. And I’ve created here, this is
a chart borrowed from Dan Atkins. You
can see we’ve gone from same time-same place, to same time-different place,
different time-different place. Much of
the aspect of Science 2.0 is happening in a virtual environment.
I
wanted to talk to you a little bit about some of the virtual environments or
virtual exemplars of Science 2.0 that come out of NSF or are supported in part
by NSF.
So BIRN is the Biomedical Informatics Research Network. This is a geographically
distributed virtual community that shares resources, including instruments to
examine medical images and create diagnoses. This is an example of a closed
virtual environment in the sense that this is really still left to, to use
Bill's term, the high priests of science.
So it is a distributed network
but it's a very high level science. It's
limited to professional researchers.
Whereas if you look down at nanoHUB, the science gateway, this was created in
2001. This is a grid-computing-based but web-enabled portal that enables anyone
to access scientific tools, do research demonstrations and even run simulations.
So just by getting a user name, a log-in and authentication code, anyone,
including myself, including yourselves, can run simulations around anything
related to nanotechnology. They have a variety of different workshops, lectures,
curricula and simulation tools available.
Just a few stats on them.
In the last year they have had
over 25,000 users from 172 countries.
They have had 5,730 users run 230,000 simulations. So you are seeing a real draw to this portal.
Eighty research publications actually now cite nanoHUB.
Bill mentioned that what I have
there, the Sloan Virtual Observatory, the Sloan Digital Sky Survey, this is an
example, as he mentioned, of citizen science and crowd sourcing. Just to give you some stats on the use there,
200 million Web hits in the last five years; 930,000 distinct users versus
10,000 astronomers.
So again to the point of citizen
science and the democratization of science, you really see the general public
being drawn to these opportunities and committing their time and their energies
to participate in science.
In the bottom right-hand corner,
we have the example of Second Life. You
heard Jim talk about Second Life this morning.
Science is coming to Second Life
as education is coming to Second Life. There
are learning affordances within Second Life, as well as communication
affordances, in the sense that you can actually share, create objects, learn
about objects, recreate objects and manipulate objects in a 3D environment.
As of my last check, there are
approximately 160 universities now in Second Life.
Their activities range from
giving courses and running conferences or lectures to trying to actually create
new objects that can teach science in totally new ways.
There is a new area within Second Life also dedicated to science, called
SciLands. Nature Publishing has an island within SciLands. So we are seeing a
flood of activity there, which is interesting.