Ottawa, ON

 --- Upon commencing on Wednesday, October 3, 2007 at 11:25 a.m.

               MR. VICKERY:  Welcome everybody.  I assume everybody has come back from coffee.

               My name is Graham Vickery.  I work for the OECD Secretariat.

               If you have got any questions, any problems, any issues to raise about understanding the emcee who is in the other room, please contact us.

               I would like now to hand over this session to Walter Stewart, who is going to be our very able Chair, and I will give some small amount of directions about a meeting we are going to be having over lunchtime at the end of this session.

               So Walter, please.

               MR. STEWART:  Good morning, ladies and gentlemen.  Bonjour, mesdames et messieurs.

               It is my pleasure to welcome you to this session, Research 2.0: E-science and new modes of interaction in the scientific community.

               We have four speakers. I will introduce them by name and title only.

               You can read all the biographies on the conference website or in the small booklet that is available on the table in the main hall.

               Please note the description of the session in the overview you received when you registered this morning.

               There are three questions that we have asked our speakers to address.

               After the presentations, there will be an opportunity for you, both those of you in the room and those of you participating on the Web, to ask your questions.

               I would also draw your attention to the translation devices ‑‑ perhaps I should have done that first -- draw your attention to the translation devices. 

               We may well have -- the presentations will be in English.  We may well have questions though in French and so please feel free to ask your questions in whatever language suits you.

               With that, I am going to introduce our first panellist.

               Our first panellist is Andrew Herbert from Microsoft Research in Cambridge, U.K.

               MR. HERBERT:  Thank you.

               Hopefully, my slides will appear momentarily in front of me.

               So what I want to take as my theme is perhaps a little more broadly than just the Web itself but actually it is a look at the impact that computing and computer science as a whole is having on the other sciences and some of the consequences of that.

               So the key, I think, is that the sciences ‑‑ and I use that quite broadly, physical sciences, life sciences, engineering ‑‑ are all increasingly relying on advanced ideas from computer science essentially to reduce the time to scientific insight. 

               In the past, perhaps we thought of science as being divided into theoretical science, primarily the domain of mathematics, and experimental science, the world of the test tube and the accelerator.

               In between those two now sits a third strand of science, which is computational science, that is, the world of simulation, data mining, visualization, pattern recognition, machine learning and many other techniques, and the advances in those techniques are primarily coming from the computer science community.

               And indeed, I think one of the questions that needs to be addressed is how are we going to produce people in the scientific community who have the right balance of skills between the core scientific disciplines ‑‑ the biologists, the chemists ‑‑ and who are able to work with the most recent computer science techniques and indeed contribute to and advance those?

               This is a perennial problem.  Many scientists learn their computing in the first or the second year of their bachelor’s programs.  They lock into the operating system and the programming language of that time and know a little bit about bashing data in files.  Using advanced database techniques, using computational grids and so forth are all very new and exciting things for them.

               So I think the key points are that the computers are enabling scientists to share massive amounts of data.  For some of the sciences it is the massive amount of data.

               If you think of physics, when the Large Hadron Collider at CERN comes on stream, that is going to be generating petabytes of data in which the physicists are searching for very rare events.

               In other disciplines like the life sciences, actually it is lots of very small databases that have to be connected together as we are trying to join up different parts of biological knowledge:  two different problems but both equally complex and both dependent on networking large amounts of resource.

               I think you will hear more from the other speakers about the way in which Web 2.0 technologies are being used by scientists to create virtual organizations, linking scientists in different laboratories together, combining their resources, whether they are computation resources, data resources, access to facilities, in many new ways.

               As a consequence of that, it's revolutionizing the way we think about scientific publication.  If scientific work is ongoing and being conducted through blogs and online experiments, online meetings, why do we need conferences, why do we need printed journals?  There are interesting questions in that online world about the provenance of data, the tracking of data, the archiving of it, ownership and very deep issues.

               So for me, I think these are the computing ingredients that come into the picture.  Through technologies like sensor networks, we can bring real world data into the theoretical models.  We can link those to our computer models, simulations and so forth. We can store huge amounts of persistent distributed data and so we can bring the experiment, the models and the data together simultaneously by using computer techniques to automate scientific workflows.  Using the technologies of data mining, which have come out of the world of the enterprise and business, finance and so forth, we are able to perhaps even think about automating some of the aspects of generating scientific insights.

               I don't think scientists will go away. Computers never succeeded in making people go away.  But what computers have done is let people focus on their core skills and competencies and the computers have done the drudgery behind the scenes for us.  They have made us more productive and I think the same is happening in science.

               What people are good at is interpretation and insight.

               I want to give you one example of an area of research that colleagues in my laboratory in Cambridge are actively engaged in, which is a fascinating crossover between computer science and biology.

               We have been working in an area called computational systems biology, and this essentially is looking at the biology of cells and how they interact in organisms.  The approach has been to treat cells as if they were abstract computers, which is where the computer scientist gets interested.  And as computer scientists, we have developed many tools to help us model computers, model complex software systems to do things like prove software is correct, to understand how one computer relates to another, to decide if particular models of computation are equivalent.

               And we are now starting to transfer those ideas to help the biologists who have many of the same problems.  And indeed to a computer person -- and my background is in hardware and operating systems -- when a biologist explains how a cell works, it sounds as though it's three little abstract machines connected together.

               Cells are all about membranes dividing the various parts of the cells, and cells themselves.  The membranes are about confinement, storing things, and indeed the bulk transport of things around the organism.  Those are computing words.

               There is the protein machine driven by the amino acids, which is where metabolism takes place.  Food is consumed and turned into energy.  Things are propelled around the system.  Signals are processed.  That very much feels like a processing element, if you like.

               And then there are the genes, whose role in biology is clearly the regulatory system to keep all the pieces working together, and those things signal to each other.  The genes are perhaps like programs.

               So as a computer scientist, there are many of our words that we can bring to describe the biological system of the cell and many of our techniques for modelling complex systems that we can perhaps offer to the biologists to help them develop fuller models of what is going on in their field.

               The biologists have a challenge.  Physics and chemistry advanced because of mathematics. Once you could model the world of physics through applied mathematics, you could make predictions in the mathematical models and then go and verify them by experiment.

               Chemistry made huge strides when models of atomic structures, of molecular structures, could be represented mathematically once we had the equation and other mathematical tools.

               Biologists don't have that mathematical framework. They are still fundamentally doing zoology and botany; collecting things, squashing them, sticking them in albums, trying to deduce conclusions by looking at what they have discovered. And they have no way of writing it down apart from little cartoons and writing statements in simple English.

               So perhaps some of the formality and notations that we have developed in computer science can help them.

               That's the track down which some of my colleagues have been going.  We have been taking ideas from abstract models of software systems where we have essentially mathematical notations for describing dynamic systems and how they interact.

               The particular one that we use is called the pi calculus.  It's the notation that people interested in theoretical computer science use to explain different programming languages to each other and to understand what is really going on in those languages.  When the vendors are arguing about Java being better than C# or the other way around, the theoretical guys can say no, that's all just syntax.  These are the fundamental ideas and the fundamental explanations.

               In those mathematical models, we often start by drawing simple graphical pictures as a way of capturing knowledge.  What we have been able to do is give the biologists a formal graphical notation.  There are some examples on the screen.  Simple biological entities, arrows representing ways in which they send outputs to each other or respond to signals.  And then in the names of those things, we can capture their formal behaviour.

               Once we have a formal graphical way of describing things, which is the information capture part of the process, we can then transform that into something which looks a little bit like a programming language. And once we have that programming language, if it's something which is indeed truly compositional, we can start describing circuits or organisms, in the biological world, by combining those libraries together.
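
               Editorial illustration (not part of the spoken remarks): a minimal Python sketch of what "compositional" could mean here, assuming a model is simply a named collection of reactions and that composing two models means running their reactions side by side.  The `Reaction` and `Module` classes and the example parts are hypothetical; they are not the notation or the tools the speaker's group uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reaction:
    reactants: tuple  # species consumed
    products: tuple   # species produced
    rate: float       # illustrative rate constant


@dataclass(frozen=True)
class Module:
    name: str
    reactions: tuple

    def compose(self, other):
        """Parallel composition: the combined model contains both sets of reactions."""
        return Module(f"{self.name}|{other.name}", self.reactions + other.reactions)

    def species(self):
        """Collect every species mentioned by any reaction in this module."""
        names = set()
        for reaction in self.reactions:
            names.update(reaction.reactants)
            names.update(reaction.products)
        return names


# Two hypothetical library parts, each described on its own.
expression = Module("expression", (Reaction(("gene",), ("gene", "protein"), 1.0),))
degradation = Module("degradation", (Reaction(("protein",), (), 0.1),))

# A larger model is built purely by combining the parts.
circuit = expression.compose(degradation)
print(circuit.name, sorted(circuit.species()))
```

               The point of the sketch is only that each part is described once and larger systems are assembled without rewriting the parts.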

               And then with those programs, perhaps we can run simulations looking farther into the future.  Perhaps we can turn those programs into essentially manufacturing steps to  build organisms entirely as instructions and look at their behaviour.

               So some particular work that we have been involved in has been looking to see if we can use those models to actually explain real pieces of important biology.  One of our early results has been looking at some parts of the human immune system.

               On the right-hand side of the screen, you can see the kind of pictures you would see in a standard biology textbook describing the process by which a receptor looks for hostile cells in the system, traps them, absorbs them, breaks them up into components and then ejects them from the system.

               Today biologists do that by essentially drawing cartoons.  The pictures are too small to take you through all the details.  But that's essentially the level of formality.

               How can that picture explain to you the general concepts?

               There is no information in there about how long it takes that reaction to occur.  There is no information in there to let you think about how you might generalize that particular mechanism to tackle other kinds of receptors, and so forth.

               What we have done, working with the biologists and listening to them and their explanations as they unpick some of the biochemistry, is to turn those cartoon diagrams into the things in the middle, which are the graphical representations of the various parts of that immune system, and then from those generating the programs, if you like, in our formal notation. And then with those programs we can start to run simulations.

               The graphs on the left are the computer simulations showing how the concentrations of various of the biochemical parts of the system change as the reactions take place.
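
               Editorial illustration (not part of the spoken remarks): a minimal Python sketch of how a simulation can produce concentration-versus-time curves of the kind described, assuming a Gillespie-style stochastic simulation of a single hypothetical binding reaction R + L <-> RL.  The species, rate constants and initial counts are invented for illustration and do not come from the immune-system model discussed in the talk.

```python
import random


def gillespie_binding(r0, l0, k_bind, k_unbind, t_end, seed=0):
    """Simulate R + L <-> RL and return a list of (time, R, L, RL) tuples."""
    rng = random.Random(seed)
    t, r, l, rl = 0.0, r0, l0, 0
    trace = [(t, r, l, rl)]
    while True:
        # Propensity of each reaction given the current molecule counts.
        a_bind = k_bind * r * l
        a_unbind = k_unbind * rl
        a_total = a_bind + a_unbind
        if a_total == 0:
            break  # nothing can react any more
        # The waiting time to the next reaction is exponentially distributed.
        t += rng.expovariate(a_total)
        if t > t_end:
            break
        # Pick which reaction fires, in proportion to its propensity.
        if rng.random() * a_total < a_bind:
            r, l, rl = r - 1, l - 1, rl + 1
        else:
            r, l, rl = r + 1, l + 1, rl - 1
        trace.append((t, r, l, rl))
    return trace


trace = gillespie_binding(r0=100, l0=80, k_bind=0.01, k_unbind=0.1, t_end=5.0)
print(f"{len(trace) - 1} reaction events; final state {trace[-1][1:]}")
```

               Plotting the R, L and RL columns of `trace` against time gives curves of the same general kind as the graphs the speaker refers to.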

               We have made very good progress in that.  We can simulate biological systems, and the simulations match the behaviour of the real system. There are a number of cases where we have actually helped the biologists explain what are some of the key signals that are actually driving the process.  Biological systems are immensely complex, and understanding which are the key elements is a particular difficulty for them.

               So we have helped them understand their science better.  And indeed we have made some early steps, and others in the field are doing similar things, in building custom biological structures at the gene and cell level that have a behaviour that we want to impose.

               We have built the biological equivalent of the computer multivibrator, the thing that flashes off and on, and we have built a biological system that can do that.
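
               Editorial illustration (not part of the spoken remarks): one textbook way to get on-and-off behaviour in a model is a ring of three genes in which each protein represses the next.  The sketch below integrates that ring with a simple Euler step.  The ring design, the equations and all parameter values are assumptions for illustration; the talk does not say how the speaker's system was built.

```python
def simulate_ring_oscillator(alpha=20.0, hill=4, dt=0.01, t_end=100.0):
    """Euler-integrate three proteins that each repress the next one in a ring.

    dp_i/dt = alpha / (1 + p_prev**hill) - p_i
    Returns a list of (time, p0, p1, p2) samples.
    """
    p = [1.0, 2.0, 3.0]  # unequal start so the system moves off its steady state
    steps = round(t_end / dt)
    samples = [(0.0, *p)]
    for step in range(1, steps + 1):
        # Production of each protein is throttled by its repressor (the previous
        # protein in the ring; p[-1] wraps around); every protein decays at unit rate.
        rates = [alpha / (1.0 + p[i - 1] ** hill) - p[i] for i in range(3)]
        p = [p[i] + dt * rates[i] for i in range(3)]
        samples.append((step * dt, *p))
    return samples


samples = simulate_ring_oscillator()
late = [s[1] for s in samples if s[0] >= 50.0]
print(f"protein 0 swings between {min(late):.2f} and {max(late):.2f}")
```

               With these assumed parameters the steady state is unstable, so the three protein levels keep rising and falling in turn rather than settling down.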

               So those are exciting early stage results.

               The question is:  Where is all this taking us if we look a long way forward?

               First of all, I think there are interesting opportunities here in modelling the effect that drugs might have on the personal gene machine, and so pharmaceutical companies are very interested in this line of research.

               If you combine that with some of the work that is going on in biosensing in the field, that gives us perhaps the ability to be monitoring our own personal gene machine in real time.

               With some of the work going on in nano-materials, particularly nano-materials based on engineering with DNA, we have the possibility of modelling the effect of drugs on our system, modelling our own system, creating drugs that are optimized for that individual system and that leads us to a vision of personal healthcare, something a guy called Leroy Hood has talked about a great deal.

               Healthcare that is predictive, so we are responding to things before they become a problem; that is preventative, removing bad things from the system; that is pre-emptive, striking before it is too late; and indeed in which you as a person may participate.  Because if all this is happening with software technologies, the opportunity for you to be involved in the negotiation with a doctor is very important. So that is one direction it might go.

               Another is thinking about engineering bioenergy systems and predictive modelling of those.  So that is just one area where I think computing is having a huge effect on a particular science.  There are several others that I have been closely interested in and talking to people about.  One is understanding the human brain.  The human brain is not a computer, there is no notion of software in the human brain.  But certainly at the level of neurons and synapses a lot of what we understand from machine learning or pattern matching seems to be what is going on and so that is helping us with some of our interpretation.

               Using computers to model global epidemics.  As a Brit, we are quite concerned about this.  We have got two diseases rampaging our country at the moment, Bluetongue and Foot and Mouth.  We kind of stopped worrying about cells and that is in someone else’s backyard.

               And indeed the work that is going on with the physicists, trying to understand the origins, workings and indeed the ultimate demise of the universe: you can’t experiment with the beginning or the end of the universe, that has got to be done with computers.

               So I have tried, I think, to open up the way in which computing is changing the way science is done, accelerating the pace of science. If you want to follow up in more detail on some of these things where I have whetted your appetite, with colleagues we have published a report called 2020 Science, which is trying to address many of those things.  You can download it and you are very welcome to do so and I would be happy to have a further discussion about it.

               Thank you.

               MR. STEWART:  Thank you, Andrew.

               I would commend that report to you, it is truly excellent and not to be missed if you haven’t read it.

               I would now like to introduce Bill St. Arnaud, who is the Senior Director of Advanced Networks for CANARIE.

               MR. ST. ARNAUD:  Thank you, Walter.

               One thing I am going to talk about in my brief few minutes here is how these Web 2.0 technologies you will be hearing about will not only affect how scientists do science but will also allow a greater community to participate in those research and scientific activities.  I think that is going to have a very profound effect on scientific policy and other educational policies and so forth.

               So you have been hearing in all the talks that the Web 2.0 tools (mashups, blogs, wikis, service-oriented architectures) are transforming all different walks of life.  You have been hearing about Business 2.0, Enterprise 2.0 and Battlefield 2.0; the U.S. military has a major program on using these technologies in a variety of fields. Microsoft has just introduced a program called Telco 2.0, with all sorts of mashups for telephone companies and network services and so forth.

               So these same tools, which are transforming all sorts of walks of life, are going to transform science and research, as you have heard from our speakers and, I am sure, will hear from the following speakers, not only for scientists and researchers themselves but also for a much larger community.

               And this has been labelled citizen science.  It will allow a faster transfer of knowledge.  As opposed to waiting for the papers and journals, we are now seeing science and knowledge transferred much more quickly from academia and the research community through blogs and wikis and so forth.  That is now the major medium by which new knowledge and new information is being passed around the world.

               And it is also democratizing science.  Increasingly as we see scientific data being digitized, therefore it becomes immediately more accessible, assuming you solve the DRM issues. And so it is not only accessible to all the scientists, but it is also accessible to members of the public.  And the public can then take the same data and run their own models and do their own analysis, and this is going to have a significant policy impact. 

               Let me just give you one simple example.  You may have heard that, a few weeks ago, a blogger had taken some of this CO2 data and discovered that in fact the warmest period in Earth’s history in the last 100,000 years was not the last 10 years, which is the common assumption, but actually happened in the 1930s, because he had done some analysis and comparative analysis and corrections and so forth.

               Now, this has been debated but this is a good example of how one individual, one blogger can get access to this data and do a different analysis interpretation which, of course, has significant policy implications and so on and so forth.

               But now there are all sorts of activities by students and members of the public involved in doing these types of things in astronomy, in high energy physics, climate science and all sorts of things.  And Intel, for example, just released a product called Mashups for the Masses, which is a set of tools that really enhances the capability of individuals to grab datasets from different areas, mash them together and create new results and new interpretations of the original datasets.

               So you have heard of mashups mostly coming from the Google world of, you know, taking geographical data and mashing it up with real estate data and violence data and so forth, so you can get maps showing where the most houses are, where the lowest crime rates are and so forth.  But now people are using these mashups to merge together different datasets from all sorts of different fields.
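
               Editorial illustration (not part of the spoken remarks): at its simplest, a mashup is a join of two independently published datasets on a shared key.  The Python sketch below assumes two tiny made-up tables keyed by area name; the area names, prices and crime rates are placeholders, not real data, and `mash_up` and `affordable_and_safe` are hypothetical helper names.

```python
# Two independently published datasets (placeholder values, not real data).
house_prices = {"area_a": 410_000, "area_b": 320_000, "area_c": 230_000}
crime_rates = {"area_a": 3.1, "area_b": 1.8, "area_d": 2.0}


def mash_up(prices, crimes):
    """Join the two datasets on the area names they have in common."""
    shared = prices.keys() & crimes.keys()
    return {
        area: {"price": prices[area], "crime_rate": crimes[area]}
        for area in sorted(shared)
    }


def affordable_and_safe(merged, max_price, max_crime_rate):
    """Return the areas that meet both limits, cheapest first."""
    hits = [
        area
        for area, row in merged.items()
        if row["price"] <= max_price and row["crime_rate"] <= max_crime_rate
    ]
    return sorted(hits, key=lambda area: merged[area]["price"])


merged = mash_up(house_prices, crime_rates)
print(affordable_and_safe(merged, max_price=400_000, max_crime_rate=2.5))
```

               Areas that appear in only one dataset drop out of the join, which is often the first practical problem when merging data from different fields.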

               And so also in the past this computational science was largely restricted to those who had big high performance computers and the big databases and storage facilities to do this type of analysis.  But with tools like EC2 and S3 from Amazon and other companies those types of resources now are available to the average user or to students at a very low cost so they can do this type of computational science themselves, take these same datasets, run large models using either peer to peer networks and so forth or the newest tools like those from Amazon and other service providers.

               So the key elements of course for participative Web 2.0 science are the distributed databases, instrumentation and computational facilities, and extensive virtualization.  What we are seeing is these ad hoc developments of what we call virtual organizations built around these types of structures, not only between scientists themselves but between communities of users interested in these very scientific activities and using workflows and mashups and so on and so forth.

               And so what is happening is this real democratization of science being made possible by these Web 2.0 tools.  In the bioinformatics community, for example, there is a group of researchers and members of the public who are developing a whole bunch of mashup tools and service architectures using Amazon S3 and EC2 to provide non-researchers with tools to do a lot of these bioinformatics analyses, genome analyses and so on and so forth.  And so these are the types of things that are starting to happen out there at a grass-roots level rather than from a formal research environment.

               So here are some very quick examples of these types of activities. A big one is of course crowd-sourcing. There is a group of researchers now who are using crowd-sourcing tools to allow the large community of humans to really identify new research techniques and new scientific evidence.

               So, for example, there was a gold company here in Canada that put out a prize, sent out to the Internet at large, saying: our geologists think the gold vein is here, and we invite the community to analyze that same geophysical data and come up with their own interpretation of where the best gold veins are. And surprise, the community came up with better answers than the professional geophysicists.

               And so now -- I wanted to bring out my computer except it broke down -- there is now a research community in the United States dedicated to this, using crowd-sourcing, to use the large collective knowledge of the human population to identify these new trends and new ideas.

               Another good example is this Project Neptune many of you may have heard of.  This was a joint U.S./Canadian project.  Fortunately, Canada got its funding first and we are the first to deploy.  But this is a large undersea fibre network on the ocean floor off the west coast of Canada and the United States and this is now being deployed as we speak.  And this is going to have all sorts of undersea instruments, cameras, robotic devices, sensors on the ocean floor to measure all sorts of geophysical and oceanographic phenomena and so forth.

               And of course, you can’t send a researcher to the ocean floor. This is all going to be remotely accessible.  And this data will not only be accessible to the scientists who participate in this project, but it is designed from day one so that this data will also be accessible to students and to the public at large. And there is talk of virtual aquariums.  We have already put down high definition TV cameras; you can watch these smokers, you can see the various biota that exist around them, and it is available to anybody on the website right now to look at this type of activity.

               Another good example is from Canada, our forests are very important to us, as well as snow.  But this is a large distributed grid being built by our government research department linking up sensors on the forest floors, databases, satellite data and data from a variety of sources to measure the health of Canada’s forests.  But one of its primary objectives is to measure Canada’s compliance with the Kyoto Agreement.

               The Kyoto Agreement is dead now, of course, but Canada signed on under the assumption that our forests are big sinks for carbon dioxide.  But that was an assumption, we really do not know how well our forests absorb carbon dioxide and so we hope that the data from this will allow us to justify driving our SUV’s over the next ten years.

               But, again, this data is all going to be made available to the public, so the public can also interpret this data.  So, it’s not just going to be some high priests of science who say yes or no, but also the same information will be available to any community to re-interpret and re-examine this type of data.

               Another great example is the ALTA Cosmic Ray project. This was started at the University of Alberta with the high energy physics community.  It involves fifty schools now, or probably more, across North America, who are looking at very deep space high energy cosmic rays, and the students participate in this activity.  The data is gathered through web services and collected at the University of Alberta and the students are involved in the analysis and interpretation and so forth, to really understand the very cosmological origins of these very deep space high energy cosmic rays.  And it’s a great project for students to work with real science and scientists on trying to analyse and interpret this type of data.

               Another one in New York is the Meteo Grid project. This is to allow the democratization of weather forecasting, something that’s very important to a lot of people.  Today, weather forecasting is very much big, central computers, you know, that grind out surveys every four hours and so forth.  But now what they’re doing is producing metadata sets which are distributive (inaudible) very centres on using peer-to-peer networks and so forth, so schools and communities can do their own very localized forecasts on a much smaller grid than what is possible from these big central government sites.  So, again, it’s like this example of how data can now be migrated to various groups who can then use that data, mash it up with their own local information from their own local sensors, and come up with a very detailed forecast for their very specific area.

               And, of course, the Sloan Digital Sky Survey. The late Jim Gray was very instrumental behind this.  Again, this is a site of astronomical data.  Many of these services were built by students.  Again, it is available to scientists and students and the public at large.

               Now, because of this type of service, most of the large supernovae are being discovered by members of the public as opposed to professional astronomers.  So, by using various techniques of scanning all the images and scanning the data, it’s the public who are making these discoveries as opposed to professional astronomers.

               So, that’s just -- the one last one is the Faulkes telescope.  This is an eccentric billionaire in England who has funded this project.  He was told it was going to be a few million dollars and he believed the researchers.  It turned out to be close to one hundred million, I think.  Anyway, these are two telescopes he’s built, one in Hawaii and one in Australia, and they are professional telescopes used by professional astronomers.  But that information and data is also being made accessible to students and schools in England and a couple here in Canada.  And the beauty of it is, because when it’s nighttime in Hawaii of course it’s daytime for the schools in eastern Canada, at least, and in the UK, and the students -- there’s all sorts of activities.  The students work with astronomers looking at real data and phenomena with these telescopes. And, again, there is this type of collaborative ad hoc virtual organization that’s possible and the extension of science into a much larger community.

               So, that, I hope will stimulate some of your thinking of the potential of what these Web 2.0 and participative technologies will enable, at least in the scientific community, as well as many other walks of life. 

               Thank you.

               MR. STEWART:  Thank you, Bill.

               Our next speaker will be Diana Rhoten, Program Director, Office of Cyberinfrastructure, National Science Foundation.

               Diana.

               MS RHOTEN:  So I’m going to talk a little bit about some of the learning and knowledge production affordances of Web 2.0 for science.  I just wanted to start with this clip.

--- Video presentation

               MS RHOTEN: So in addition to providing some interesting statistics about some of the usage of Web 2.0 services, particularly by kids, it is important to note that that clip, which is available on-line, was created by a high school teacher and then it was edited by myself using tools available on-line.  And then, using Creative Commons licensing, I can present it to you mashed up, mixed up, re-mixed by me.  So, just to use some of the tools of Web 2.0 for the purposes of the presentation.

               So, what’s the calculation for science? Technological capacity is increasing; we’ve all heard that, Moore’s Law tells us that.  Scientific complexity is increasing -- we’ve heard that from Andrew -- requiring at the same time increased specialization and increased collaboration and integration.

               If you multiply that by the fact that the generation of scientists coming into science are coming through a digital society we have what’s potentially Science 2.0.

               We need to think about the next generation implications for science by thinking about what the next generation expects from its use on-line and its expectations about use.

               What are some of the characteristics of Science 2.0? This is courtesy of Ian Foster whom some of you may know.  It’s been adapted by me.  We see that there are changes in the nature and the size of scientific data.  I’m not going to go through each one of these. But there are changes in the unit and venue of scientific communication.

               As Bill mentioned wikis, blogs, project websites, become very much the outlet for both scientific data, scientific finding, scientific publications.  But, we’re also moving beyond just publications to simulations, visualizations, creating new databases.  These are all new products that are coming out of Science 2.0.

               It’s also changes in the location and the structure of the social aspect of science.  Science used to be done in co-located environments.  The research centre was very much -- at least in the United States -- the investment of NSF in the 1980s.

               We’re now really looking at distributed science. We have scientists sitting in Canada, sitting in China, sitting in Australia, working on the same problem.  This new form requires new social norms as well as organizational forms.

               We also see venues of scientific interaction changing.  We’ve gone from community co’s (inaudible) to science gateways, campus and national grids, to science on the Internet.

               We’ve talked about science as a computation. From computational science to science as computation.

               We also see this bleeding into all fields. We’re moving from just the physical sciences to all the sciences, including the social sciences as well as humanities.

               So, in Science 2.0 we really are looking at distributed knowledge production and learning.  And I’ve created here, this is a chart borrowed from Dan Atkins.  You can see we’ve gone from same time-same place, to same time-different place, different time-different place.   Much of the aspect of Science 2.0 is happening in a virtual environment. 

               I wanted to talk to you a little bit about some of the virtual environments or virtual exemplars of Science 2.0 that come out of NSF or are supported in part by NSF.  

               So BIRN is the Biomedical Informatics Research Network.  This is a geographically distributed virtual community that shares resources, including instruments to examine medical images and create diagnoses. This is an example of a closed virtual environment in the sense that this is really still left to, to use both terms, the high priestess of science.

               So it is a distributed network but it's a very high level science.  It's limited to professional researchers.

               Whereas if you look down at nanoHUB, the science gateway, this was created in 2001.  This is a grid-computing-based but web-enabled portal that enables anyone to access scientific tools, do research demonstrations and even run simulations.

               So just by getting a user name, a log-in and authentication code, anyone, including myself, including yourselves, can run simulations around anything related to nanotechnology.  They have a variety of different workshops, lectures, curricula and simulation tools available.

               Just a few stats on them.

               In the last year they have had over 25,000 users from 172 countries. They have had 5,730 users run 230,000 simulations.  So you are seeing a real draw to this portal.

               Eighty research publications actually now cite nanoHUB.

               Bill mentioned that what I have there, the Sloan Virtual Observatory, the Sloan Digital Sky Survey, this is an example, as he mentioned, of citizen science and crowd sourcing.  Just to give you some stats on the use there, 200 million Web hits in the last five years; 930,000 distinct users versus 10,000 astronomers.

               So again to the point of citizen science and the democratization of science, you really see the general public being drawn to these opportunities and committing their time and their energies to participate in science.

               In the bottom right-hand corner, we have the example of Second Life.  You heard Jim talk about Second Life this morning.

               Science is coming to Second Life as education is coming to Second Life.  There are learning affordances within Second Life, as well as communication affordances, in the sense that you can actually share, create objects, learn about objects, recreate objects and manipulate objects in a 3D environment.

               As of my last check, there are approximately 160 universities now in Second Life.

               Their activities range from giving courses and running conferences or lectures to trying to actually create new objects that can teach science in totally new ways.

               There is a new area within Second Life also dedicated to science, called SciLands.  Nature Publishing has an island within SciLands.  So we are seeing a flood of activity there, which is interesting.