After the institutionalisation of social network analysis (http://www.insna.org/) and its concrete implementation in online bibliometric tools with ISI, technologies that support qualitative analysis based on content analysis or grounded theory are now recognised as a domain in their own right (see the CAQDAS networking project http://caqdas.soc.surrey.ac.uk/). However, the qualitative analysis of large databases and the boom in electronic data are creating a new space for research, specifically with the aim of mapping political arenas in relation to specific issues (see the Govcom project http://www.govcom.org/) and of mapping controversies (see the Macospol webpage http://www.demoscience.org/resource...). Designing and engineering solutions for the visualisation of complex networks is an upcoming challenge (see M. Lima’s project http://www.visualcomplexity.com/vc/ or C. Gannon’s project at MIT http://caitlingannon.com/2007/09/21...). Moreover, the next step is not only about closing a technological gap for the social sciences; it also calls for an epistemic bargain between the social sciences, artificial intelligence and computer science, since research problems are becoming more complex as new data proliferate (see the ISC-PIF project http://iscpif.fr/tiki-index.php?pag...).
Within this context, the CorTexT initiative of IFRIS is a collaborative project that brings research dynamics and instrumentation dynamics together within a platform. The CorTexT project aims at assembling machines and software, skills and knowledge, methods and know-how in order to provide IFRIS researchers with specific tools, instruments, computing systems, a database repository and a methodology to define, retrieve, store, characterise and analyse corpora of texts. Studies of the dynamics of science and innovation in society face heterogeneous data, methods and purposes, since the goal is not only to measure but also to characterise the production, circulation and uses of knowledge in society. (1) The growing amount of sources and data made available through the Internet is establishing a huge and permanently renewed field of enquiry for textual analysis. (2) The traceability of scientific activities and innovation processes in society, thanks to databases of scientific articles, patents, scientific projects, expert recommendations, media coverage, regulation and blogs, represents a new source of data. (3) Researchers in the social sciences also produce texts through their own research practices (databases, interviews, surveys and archives), and these texts are a matter of interest as well. CorTexT is thus a technological platform that positions itself as a particular kind of digital laboratory centred on the exploitation and analysis of heterogeneous textual data. The platform is a physical place that brings together the computing tools, methodologies and skills needed to process such data, but it also has to be accessible over the network so that IFRIS researchers based elsewhere can work remotely.
Developing IT Solutions
At the heart of the CorTexT platform are the collaborations between its technological capacity and the research projects of the IFRIS community. These collaborations take three forms: upstream of a project, where CorTexT takes part in setting up the scientific problem by taking methodological and technical constraints into account; throughout the whole project, where the platform is a full partner with a methodological role and provides IT resources and interfaces; and alongside ongoing individual or collective projects in which methodological or technical support is needed occasionally.
The technological platform is also a centre for training and for the acquisition of new skills. CorTexT organises an annual meeting to present its activities to the IFRIS research community. Apart from this type of general training, CorTexT organises dedicated days on a single instrument, as well as workshops that bring together a panel of instruments and databases around a specific theme.
The platform also has to increase gradually the methodological skills of the team and the range of ready-made solutions offered to the IFRIS community. To this end, it is starting to monitor developments in CAQDAS, text computing tools, IT technology and visualisation projects.
Exploration and self-development
It is important to note that the platform is not reduced to a service capacity. It offers an IT capacity that is tied to research projects; starting from specific projects, CorTexT aims to turn treatments into routine procedures with specifications for users. CorTexT therefore also has to ensure a function of methodological and technological exploration for its own engineers.
Staff and collaboration
Project Engineer (Audrey Baneyx), IT Engineer (Philippe Breucker) and Scientific Director (Marc Barbier), with the temporary participation of IT Engineers for IFRIS (Lionel Villard) and external IT services, and with the collaboration of an IT Methodologist (Andrei Mogoutov) and of the IFRIS researchers particularly involved in the project at this stage. Two strategic collaborations are in progress, one with the MediaLab of SciencesPo and another with researchers of the Complex Systems Institute of Paris Ile-de-France.
Budget
Around 30 k Euros per year (salaries not included). IT resources: computers, servers, screen, software, video projector, video-conference system.
Governance
The platform is governed by a "Steering committee" that meets twice a year to follow up the objectives of the platform, review the balance of activities and discuss current and projected budgets. In relation to this committee, a network of IFRIS researchers has been established. Through an email list and a yearly meeting, its role is to give feedback and make proposals concerning the resources provided and the evolution of CorTexT.
Activities and outcomes
Developing tools to identify Actors and Authors
The identification of institutional "actors" is a key issue when working on large databases full of noise and ambiguities. Today, even within the framework of standardised databases (such as Web of Science), homonymy and the tracking of authors over time remain very delicate problems. We are developing an approach that follows cognitive networks (co-citations, co-occurrence of keywords), social networks (co-authors) and institutional information (places and institutions of affiliation) in order to differentiate homonymous authors and to follow the geographical and institutional movements of the central authors of a field or a speciality.
Developing tools for bibliometrics
Identifying bibliographical information correctly is an important issue in the social sciences and humanities (SHS), for both quantitative and qualitative data analysis. The idea is to allow SHS researchers to analyse their own scientific production from databases such as Web of Science or Google Scholar. The proposed solution is called CleanPop (http://cleanpop.ifris.net/).
Capturing data in the case of public controversy
Another important question concerns the retrieval of series of data and their capture into a framework, with a view to developing an array of tools that combine semantic analysis and social network analysis. The Dynamo project is currently being defined and will be developed in connection with the strengthening of data visualisation.
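As an illustration of the author-identification approach described above (cognitive, social and institutional signals used to separate homonymous authors), here is a minimal sketch. It is not the CorTexT implementation: the `Record` schema, the Jaccard overlap, the weights, the threshold and the single-link grouping are all assumptions chosen for readability.

```python
from dataclasses import dataclass, field
from itertools import combinations


@dataclass
class Record:
    """One publication signed with an ambiguous author name (hypothetical schema)."""
    rec_id: str
    coauthors: set = field(default_factory=set)      # social signal
    cited_refs: set = field(default_factory=set)     # cognitive signal (citations)
    keywords: set = field(default_factory=set)       # cognitive signal (keywords)
    institutions: set = field(default_factory=set)   # institutional signal


def jaccard(a, b):
    """Jaccard overlap of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Illustrative weights only; they are not calibrated on real data.
WEIGHTS = {"coauthors": 0.4, "cited_refs": 0.3, "keywords": 0.2, "institutions": 0.1}


def similarity(r1, r2):
    """Weighted sum of the per-signal overlaps between two records."""
    return sum(w * jaccard(getattr(r1, name), getattr(r2, name))
               for name, w in WEIGHTS.items())


def disambiguate(records, threshold=0.15):
    """Group records presumed to belong to the same person.

    Two records are linked when their similarity reaches the threshold;
    groups are the connected components (single-link, via union-find).
    """
    parent = {r.rec_id: r.rec_id for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for r1, r2 in combinations(records, 2):
        if similarity(r1, r2) >= threshold:
            parent[find(r1.rec_id)] = find(r2.rec_id)

    groups = {}
    for r in records:
        groups.setdefault(find(r.rec_id), []).append(r.rec_id)
    return sorted(sorted(g) for g in groups.values())
```

Following an author's institutional movements over time would additionally require publication years on each record, which this sketch leaves out.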
Developing tools for contextual boundary work of research domain
The "contextual" demarcation of a scientific issue or domain is a recurring question. The development of tools to study demarcation is meant to answer two questions. The first concerns the demarcation of an emerging field, such as nanotechnologies, or of a large transdisciplinary problem such as sustainable development. A first solution based on keyword clustering is currently proposed; it is called K-Words Lab.
In relation to a research project led by P. Laredo and B. Kahane, the engineers of CorTexT have responded with the K-Words Lab project, in order to visualise and analyse the dynamics of keyword clusters over time. A solution has been designed and implemented, based on the QTClust algorithm and on the development of an interface to profile queries and sets of parameters. It consists in converting data from a MySQL environment into GraphML and in generating maps and visual charts with Gephi; all of these are open technologies.
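The pipeline described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the K-Words Lab code: QTClust is assumed to refer to Quality Threshold clustering, the Jaccard distance between keywords is an arbitrary choice, the MySQL query is replaced by an in-memory dictionary, and the function names are hypothetical. The GraphML is written with the Python standard library so that the resulting string can be saved to a file and opened in Gephi.

```python
import xml.etree.ElementTree as ET
from itertools import combinations


def jaccard_distance(docs_a, docs_b):
    """Distance between two keywords, given the sets of documents using them."""
    union = docs_a | docs_b
    return 1.0 - len(docs_a & docs_b) / len(union) if union else 1.0


def diameter_with(cluster, candidate, dist):
    """Largest distance between the candidate and any current member."""
    return max(dist(candidate, member) for member in cluster)


def qt_clust(items, dist, max_diameter):
    """Quality Threshold clustering (items are assumed to be unique).

    For every remaining item, grow a candidate cluster by repeatedly adding
    the item that keeps the diameter smallest, stopping when the threshold
    would be exceeded. Keep the largest candidate, remove it, and repeat.
    """
    remaining = list(items)
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            cluster = [seed]
            pool = [x for x in remaining if x != seed]
            while pool:
                nearest = min(pool, key=lambda x: diameter_with(cluster, x, dist))
                if diameter_with(cluster, nearest, dist) > max_diameter:
                    break
                cluster.append(nearest)
                pool.remove(nearest)
            if len(cluster) > len(best):
                best = cluster
        clusters.append(best)
        remaining = [x for x in remaining if x not in best]
    return clusters


def to_graphml(clusters):
    """Serialise clusters as GraphML: one node per keyword, tagged with its
    cluster index, and one edge between each pair of keywords in a cluster."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    ET.SubElement(root, "key", {"id": "cluster", "for": "node",
                                "attr.name": "cluster", "attr.type": "int"})
    graph = ET.SubElement(root, "graph", edgedefault="undirected")
    for idx, cluster in enumerate(clusters):
        for keyword in cluster:
            node = ET.SubElement(graph, "node", id=keyword)
            ET.SubElement(node, "data", key="cluster").text = str(idx)
    for cluster in clusters:
        for a, b in combinations(cluster, 2):
            ET.SubElement(graph, "edge", source=a, target=b)
    return ET.tostring(root, encoding="unicode")


# Hypothetical stand-in for rows fetched from MySQL: keyword -> document ids.
occurrences = {
    "nanotube": {1, 2, 3},
    "graphene": {1, 2, 3, 4},
    "biodiversity": {7, 8},
    "ecosystem": {7, 8, 9},
}
clusters = qt_clust(list(occurrences),
                    lambda a, b: jaccard_distance(occurrences[a], occurrences[b]),
                    max_diameter=0.5)
graphml_text = to_graphml(clusters)
```

Quality Threshold clustering does not require the number of clusters in advance, only a maximum diameter, which suits exploratory work on keywords; its cost grows quickly with the number of items, so a real corpus would likely need a precomputed distance matrix.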
CorTexT currently offers some solutions and software to IFRIS researchers, but their online availability has to be improved. Members of the platform have published articles and communications based on the work carried out. The ENID conferences and the EGC community were targeted in 2009. CorTexT also takes part in two research projects of the French Research Agency and in research projects on indicators developed within IFRIS.