How to evaluate science, technology and innovation in a development context?

Monday, March 28, 2022 23 contributions

How to evaluate science, technology and innovation in a development context?

Dear EvalForward members,

I lead the Evaluation Function of CGIAR, a global research partnership tasked with delivering science and innovation that advance the transformation of food, land and water systems in a climate crisis. I am joined today by Nanae Yabuki of the Office of Evaluation of the Food and Agriculture Organization (FAO) of the United Nations, which leads international efforts to end hunger and achieve food security for all.

CGIAR and FAO disseminate their knowledge through knowledge products such as publications and databases. However, when it comes to evaluating the quality of the science, technology and/or innovation (STI) used in our knowledge efforts, this is not always straightforward.

Evaluating the quality of STI should cover both normative knowledge at headquarters and operational work in the field. It requires appropriate criteria and methodologies, such as the use of bibliometric analysis, interviews, surveys, case studies and synthesis.

Building on experiences in evaluating quality of science, and to better align with the Quality of Research for Development (QoR4D) and industry practice, the CGIAR evaluation function recently issued a technical note on the use of bibliometrics to measure and evaluate science quality, with a view to developing guidance on evaluating the quality of science.

Additionally, to this end, FAO’s Office of Evaluation has been designing a quality of science evaluation to better meet the needs of users in the evolving development context.

We would like to hear your perspectives on and experience of the following:

1. What do you think are the challenges in evaluating quality of science and research?

a. What evaluation criteria have you used or are best to evaluate interventions at the nexus of science, research, innovation and development? Why?
b. Could a designated quality of science (QoS) evaluation criterion help capture the scientific aspects used in research and development?

2. What are the methods and indicators that could work in the evaluation of science and research?

3. Have you seen monitoring, evaluation and learning (MEL) practices that could facilitate evaluations of science, technology and innovation?

Because of the specific expertise of the EvalForward community in applied monitoring and evaluation processes, we welcome your comments to help us learn together.

Background documentation:

Blog: How to evaluate science, technology, and innovation in a development context | Eval Forward
Technical note: Science-Metrix & CGIAR Advisory Services Secretariat Evaluation Function (2022). Bibliometric Analysis to Evaluate Quality of Science in the Context of One CGIAR. Technical Note. Rome.
Article: Rünzel, M., Sarfatti, P. & Negroustoueva (2021), Evaluating quality of science in CGIAR research programs: Use of bibliometrics

Looking forward to hearing from you.

Svetlana Negroustoueva
Evaluation lead, CGIAR Advisory Services Shared Secretariat

This discussion is now closed. Please contact info@evalforward.org for any further information.

Svetlana I Negroustoueva
Lead, Evaluation Function CGIAR
Italy
Dear colleagues,
We were very excited to hear from 23 participants from a range of backgrounds. The richness of discussion by a diverse range of experts, including non-evaluators, highlights an overall agreement on the importance of framing and use of context-specific evaluation criteria for contextualizing evaluations of science, technology, and innovation. Below please find a summary if points made, with an invitation to read actual contributions from authors if you missed.
Frame of reference
The following frameworks were introduced: Quality of Research for Development (QoR4D) frame of reference , the Research Excellence Framework (REF) and the RQ+ Assessment Framework
The main discussion touched upon the Quality of Research for Development (QoR4D) frame of reference elements, and directly or indirectly linked to the evaluation criteria: relevance, legitimacy, effectiveness and scientific credibility.
For Serdar Bayryyev and Lennart Raetzell, the context where products are used when determining their relevance was key. When assessing effectiveness (or quality) Serdar suggests to (1) assess the influence of the activities and the extent to which the science, innovation and research products have influenced the policies, approaches, or processes and (2) assess the degree of “networking”, i.e., the degree to which the researchers and scientific institutions have interacted with all relevant stakeholders. Lennart Raetzell shared her recent experience with a thematic impact evaluation for VLIR-UOS (https://www.vliruos.be) on pathways to research uptake mainly in the field of agriculture.
Nanay Yabuki and Serdar Bayryyev agreed about importance of assessing transformational nature to determine whether research activities cause truly transformational change, or at least trigger important policy discourse on moving towards such transformational change. Relevance of science, technology, and innovations (STI) is context specific, as it is how STI triggers a transformational change.
Sonal D Zaveri asserted that Southern researchers agree about the need for research to be relevant to topical concerns, to the users of research and to the communities where change is sought. Uptake and influence are as critically important as is the quality of research in a development context, hence evaluations must measure what is important for communities and people. Many evaluations are designed at distance and evaluation choices are privileged, due to power of expertise, position, or resources. It would be difficult to accept the quality of science bereft of any values related to human rights, inclusion and equity. Unless the findings are used and owned by people —and especially those with little voice in the program—, it cannot be claimed that evaluations have led to public good or used for the benefit of all peoples. On that topic, Richard Tinsley highlighted the importance of considering the final beneficiaries of the scientific results. He stressed a need to have clarity about CGIAR’s primary clients (the Host Country National Agriculture Research Systems (NARS for CGIAR), and the final beneficiaries (multitude of normally unnamed smallholder farmers), who are still one step removed from the CGIAR’s clients (NARS).
From a donor point of view, Raphael Nawrotzki finds that the sub-components of the QoR4D frame of reference are important to measure the quality of “doing science” rather than the results (output, outcome, impacts). Hence, the need to use the OECD DAC criteria of impact and efficiency to capture the “doing” (process) but also the “results” of the research for development enterprise. He asserted that the focus for a funder is on impact (what does the research achieve? What contribution does it make?) and, to a lesser extent, efficiency (how well were resources used, how much was achieved for amount spent?).
For Nanae Yabuki, using the scientific evidence for enhancing the impact of the interventions, reflected FAO’s mandate. For assessing these aspects, ‘utility’ of the research findings is more relevant than ‘significance’ of research findings. Hence the need to come up with appropriate criteria for each evaluation.
Norbert TCHOUAFFE, brought in Theory of Change as a tool to evaluate the impact of science-policy interface network on a particular society, based on five determinants (Scorecards on awareness, Know-how, Attitude, Participation, Self-evaluation).
Methods
The discussants agreed about the importance of using a mixed-method approach to combine both qualitative and quantitative indicators. According to Raphael Nawrotzki, mixed-method approach, is needed especially when evaluating relevance of the research questions and fairness of the process.
Quantitative methods: strengths and limitations
Among quantitative methods, the use of bibliometric analysis was mentioned for:
- evaluating science impact, i.e., impact within a scientific field, still measured best by the number of citations that an article or book chapter receives (Raphael Nawrotzki);
- assessing the legitimacy of the research findings and the credibility of the knowledge products (Nanae Yabuki);
- providing a good indication of the quality of science (QoS), since published papers have already passed a high-quality threshold as they have been peer-reviewed by experienced scientists (Jillian Lenne);
- providing an important overview of the efforts made, and the scientific outreach achieved (Paul Engel).
Thinking about QoS and innovation evaluation, Rachid Serraj illustrated the use of bibliometric and citation indices from the Web of Science (WoS).
Etienne Vignola-Gagné, co-author of the Technical Note, highlights the new and broader range of dimensions of bibliometrics indicators —namely cross-disciplinarity, gender equity, pre-printing as an open science practice, or the prevalence of complex multi-national collaborations— useful for assessing relevance and legitimacy. Some bibliometric indicators can also be used as process or even input indicators, instead of their traditional usage as output indicators of effectiveness. Bibliometrics can be used to monitor whether cross-disciplinary research programs are indeed contributing to increased disciplinary integration in daily research practice, considering that project teams and funders often underestimate the complexity of such research proposals.
For Valentina de Col bibliometrics (e.g., indexing of the Web of Science Core Collection, percentage of articles in Open Access, ranking of journals in quartiles, Altmetrics) were used on published journal articles, and Outcome Impact Case Reports (OICRs) to describe the contribution of CGIAR research to outcomes and impact. Raphael Nawrotzki suggested other related bibliometric indicators: (a) contribution to SDGs; (b) average of relative citation; (c) highly cited publications; (d) citation distribution index.
Keith Child and Serdar Bayryyev noted limitations of bibliometric analysis. For example, not all science, innovation and research products are included and properly recorded in the bibliographic databases, or not even published, hence not all products can be assessed. Furthermore, calculating average number of citations, also presents the basis for some biases: (1) overly exaggerated attention to a specific author; (2) some authors may also deliberately exclude certain reference materials from their publications. Raphael Nawrotzki noted limitation specifically associated with measuring scientific impact through bibliometrics: (1) Long time periods (it can take decades for results from investments in agricultural research to become visible; a robust measurement of science impact in terms of bibliometrics is only possible about 5 years after a research project or portfolio has been completed); (2) Altmetrics (it is difficult to combine bibliometrics and altmetrics to get a full picture of scientific impact); (3) Cost effectiveness (the fraction of support attributable to each funding source is not easily determined; computing cost-effectiveness measures comes with a host of limitations). Paul Engel extended the list of limitations of bibliometrics: it provided very little information on policy outreach, contextual relevance, sustainability, innovation and scaling of the contributions generated through research partnerships. Ola Ogunyinka asserted that the ultimate beneficiaries of the CGIAR (smallholder farmers and the national systems) are far removed (access, funds, weak structures etc) from the journals considered in the bibliometric analyses.
Overall, Jill Lenne and Raphael Nawrotzki agreed on the value of using altmetrics.
Graham Thiele suggested use of social network analysis (SNA) of publications, to explore who collaborates to these and what is their social and organizational context as a complement to bibliometric analysis, particularly for the legitimacy dimension. Valentina de Col used SNA and impact network analysis (INA) to investigate the research collaboration networks of two CGIAR research programs.
Finally, Graham Thiele warned against the risk of using state of the art methods and increased precision of bibliometric analysis (currently available and produced on a continuous basis) at the expense of missing the rounded picture that other studies —such as outcome case studies and impact studies— provide. This point is supported by Paul Engel, based on his experience evaluating Quality of Science in CGIAR research programs.
Guy Poppy introduced the Research Excellence Framework (REF), which alongside assessing research outputs, also evaluates impact case studies and the research environment producing a blended score with outputs having the biggest weighting but impact growing in weighting.
Qualitative methods: strengths and limitations
Using qualitative methods, along with bibliometrics and altmetrics, is essential for a broader picture when assessing quality of science.
Qualitative assessments can be done through interviews and/or surveys. With regards to measuring impact, Valeria Pesce highlighted that qualitative indicators are often based on either interviews or reports, and making sense of the narrative is not easy. She echoed post from Claudio Proietti, who introduced ImpresS.
Ibtissem Jouini highlights trustworthiness of evidence synthesis, yet challenged by the variety of evidence that can be found, the evaluation criteria, approaches, focus, contexts, etc.
Limitations of qualitative methods were also noted by Jillian Lenne and Keith Child –qualitative assessments require the evaluator to make subjective judgments.
Consideration for participatory approaches and methods were highlighted by Sonal D Zaveri: the difference between "access" and "participation", passing though the concept of power (hidden or explicit). Women as traditional bearers of local and indigenous knowledge find themselves cut off from the networked society, where information, communication, and knowledge are ‘tradeable goods’.
Valeria Pesce, Etienne Vignola-Gagné and Valentina de Col discussed actual tools and ways to address challenges of both qualitative and quantitative indicators: IT tools, that allow to (sometimes automatically) classify against selected concepts, identify patterns, word / concept frequency, clusters of concepts etc., using text mining and Machine Learning techniques, sometimes even starting directly from video and audio files.
For narrative analysis: ATLAS.ti, MAXQDA, NVivo- powerful narrative analysis; Cynefin Sensemaker and Sprockler for design and collection functionalities; NarraFirma - strong conceptual backbone, helping with the design of the narrative inquiry and supporting a participatory analysis process.
Conclusion and next steps:
Even without standardization of methods, efforts should be made to design the STI evaluations so that the evaluation results can be used, to the extent possible, at the institutional level, for instance for higher strategic and programmatic planning (Nanae Yabuki, FAO), but also at the level of those who are affected and impacted (Sonal D Zaveri, and others).
Valentina de Col highlighted the value of consolidating and adopting a standardised approach to measure quality of science (QoS) within an organization like CGIAR, to help measure better the outcomes, assess effectiveness, improve data quality, identify gaps, and aggregate data across CGIAR centres.
The worthiness of this discussion for learning is undeniable. In CGIAR, at CAS/Evaluation we have started developing the guidelines to operationalize quality of science evaluation criterion in the revised CGIAR Evaluation Policy. Let us know if you are interested in further engagement.
Referenced and Suggested readings
Alston, J., Pardey, P. G., & Rao, X. (2020) The payoff to investing in CGIAR research. SOAR Foundation. https://www.cgiar.org/annual-report/performance-report-2020/assessing-cgiars-return-on-investment/
Belcher, B. M., Rasmussen, K. E., Kemshaw, M. R., & Zornes, D. A. (2016). Defining and assessing research quality in a transdisciplinary context. Research Evaluation, 25(1), 1-17.
DOI: 10.1093/reseval/rvv025
Norbert F. Tchiadjé, Michel Tchotsoua, Mathias Fonteh, Martin Tchamba (2021). Ecological engineering to mitigate eutrophication in the flooding zone of River Nyong, Cameroon, Pages 613-633: https://link.springer.com/referenceworkentry/10.1007/978-3-030-57281-5_8
Chambers, R. (1997). Whose reality counts (Vol. 25). London: Intermediate technology publications.
Evans, I. (2021). Helping you know – and show – the ROI of the research you fund. Elsevier Connect. https://www.elsevier.com/connect/helping-you-know-and-show-the-roi-of-the-research-you-fun
Holderness, M., Howard, J., Jouini, I., Templeton, D., Iglesias, C., Molden, D., & Maxted, N. (2021). Synthesis of Learning from a Decade of CGIAR Research Programs. https://cas.cgiar.org/evaluation/publications/2021-Synthesis
IDRC (2017) Towards Research Excellence for Development: The Research Quality Plus Assessment Instrument. Ottawa, Canada. http://hdl.handle.net/10625/56528
Lebel, Jean and McLean, Robert. A Better Measure of research from the global south, Lancet, Vol 559 July 2018. A better measure of research from the global south (nature.com)
McClean R. K. D. and Sen K. (2019) Making a difference in the real world? A meta-analysis of the quality of use-oriented research using the Research Quality Plus approach. Research Evaluation 28: 123-135. https://doi.org/10.1093/reseval/rvy026
Ofir, Z., T. Schwandt, D. Colleen, and R. McLean (2016). RQ+ Research Quality Plus. A Holistic Approach to Evaluating Research. Ottawa: International Development Research Centre (IDRC). http://hdl.handle.net/10625/56528
Runzel M., Sarfatti P. and Negroustoueva S. (2021) Evaluating quality of science in CGIAR research programs: Use of bibliometrics. Outlook on Agriculture 50: 130-140. https://doi.org/10.1177/00307270211024271
Schneider, F., Buser, T., Keller, R., Tribaldos, T., & Rist, S. (2019). Research funding programmes aiming for societal transformations: Ten key stages. Science and Public Policy, 46(3), pp. 463–478. doi:10.1093/scipol/scy074.
Singh,S, Dubey,P, Rastogi,A and Vail,D (2013) Excellence in the context of use-inspired research: Perspectives of the global South Perspective.pdf (amaltas.asia)
Slafer G. and Savin R. (2020) Should the impact factor of the year of publication or the last available one be used when evaluating scientists? Spanish Journal of Agricultural Research 18: 10pgs. https://doi.org/10.5424/sjar/2020183-16399
Vliruous (2019). Thematic Evaluation of Departmental Projects: Creating the Conditions for Impact. https://cdn.webdoos.io/vliruos/753d44b984f65bbaf7959b28da064f22.pdf
Zaveri, Sonal (2019). “Making evaluation matter: Capturing multiple realities and voices for sustainable development” contributor to the journal World Development - Symposium on RCTs in Development and Poverty Alleviation. https://bit.ly/3wX5pg8
Zaveri, Sonal (2021) with Silvia Mulder and P Bilella, “To Be or Not to Be an Evaluator for Transformational Change: Perspectives from the Global South” in Transformational Evaluation: For the Global Crisis of our Times edited by Rob Van De Berg, Cristina Magro and Marie Helene Adrian 2021-IDEAS-book-Transformational-Evaluation.pdf (ideas-global.org)
Zaveri, Sonal. 2020. ‘Gender and Equity in Openness: Forgotten Spaces’. In Making Open Development Inclusive: Lessons from IDRC Research, edited by Matthew L. Smith, Ruhiya Kristine Seward, and Robin Mansell. Cambridge, Massachusetts: The MIT Press. https://bit.ly/2RFEMw5
- read on a separate page
Nanae Yabuki
Evaluation Officer FAO
Italy

Thank you for the invaluable insights. The discussion has confirmed that a mix of qualitative and quantitative methods would be necessary to evaluate the Quality of Science (QoS) in the development context.
The bibliometric analysis could be useful in assessing the legitimacy of the research findings and the credibility of the knowledge products. FAO undertakes applied research and produces knowledge products such as publications. Bibliometric analysis can help assess the quality of our knowledge products.
FAO undertakes not only knowledge work but also field activities. We use the scientific evidence for enhancing the impact of the interventions, solving specific development problems etc. For assessing these aspects, ‘utility’ of the research findings is more relevant than ‘significance’ of research findings. Relevance of Science Technology and Innovation (STI) is context specific. How STI triggers a transformational change is also context specific. Therefore, we need to come up with appropriate criteria for each evaluation.
Though the standardization of evaluation criteria may not be practical, it would be worth making efforts to design the STI evaluations comparable at the institutional level for system-wide learning.
Various development organizations use STI in their work and evaluate it. I think we can learn from each other’s experiences.
- read on a separate page
Norbert TCHOUAFFE TCHIADJE
Senior lecturer / Researcher Pan-African Institute for Development Cameroon
Cameroon

Thanks for the insights. The contribution to the society or impact is the target funders are looking for. So from my perspective translating objectives to impacts needs new transformative strategies like the theory of change which I have conceived as a concept entitled "Tchouaffe's theory of change: TToC" (Tchouaffé, 2021) as tool to evaluate the impact of science-policy interface network on a particular society, based on five determinants (Scorecards on awareness, Know-how, Attitude Participation, Self-evaluation).
Have a good day.
Norbert TCHOUAFFE PhD
Pan-African Institute for Development Cameroon

Feel free to consult my chapter ecological engineering to mitigate eutrophication in the flooding zone of River Nyong, Cameroon, Pages 613-633: https://link.springer.com/referenceworkentry/10.1007/978-3-030-57281-5_8
Of the Handbook of climate management, published by Springer nature, 2021
- read on a separate page
Valeria Pesce
Information management specialist / Partnership facilitator Food and Agriculture Organization of the United Nations / Global Forum on Agricultural Research and Innovation (GFAR)
Italy

Thank you for keeping the forum open longer than planned. I was reading all the comments with much interest, not daring to contribute both because I'm new to the Eval Forward community and because I'm not an experienced evaluator, especially of science, but more of a general project-level M&E / MEL practitioner.
I'm posting something only now at the last minute in reply to question 3 on MEL practices, and specifically regarding the measurement of impact (which has come up a lot in other posts, thanks Claudio Proietti for introducing ImpresS), where qualitative indicators are often based on either interviews or reports, and making sense of the narrative is not easy.
Not sure if, in terms of practices, IT tools are of interest, but I think in this type of measurement some IT tools can help a lot. Of course the quality of the evaluation depends on the way narrative questions are designed
and the type of analysis that is foreseen (classifications, keywords, structure of the story, metadata), but once the design is done, it is very handy to use tools that allow you to (sometimes automatically) classify against selected concepts, identify patterns, word / concept frequency, clusters of concepts etc., using text mining and Machine Learning techniques, in some cases even starting directly from video and audio files.
A few tools for narrative analysis I'm looking into are: ATLAS.ti, MAXQDA, NVivo. Other tools I'm checking, which do less powerful narrative analysis but also have design and collection functionalities, are
Cynefin Sensemaker and Sprockler. An interesting tool, with more basic functionalities but a strong conceptual backbone, helping with the design of the narrative inquiry and supporting a participatory analysis process, is NarraFirma.
(O.T.: I would actually be interested in exchanging views on these tools with other members of the community who've used them.)
- read on a separate page
Paul Engel
Knowledge Perspectives & Innovation
Netherlands

I very much agree with Graham [earlier comment below] on this. I did one of these CRP reviews and the outcome and impact studies were key pieces we obtained more information from on what actually happened. And there were too few of them. The bibliometric analysis provided an important overview of the efforts made, and the scientific outreach achieved. Amongst other things, it provided very little information on policy outreach, contextual relevance, sustainability, innovation and scaling of the contributions generated through research partnerships.
- read on a separate page
Valentina De Col
Agricultural Information System Officer The International Center for Agricultural Research in Dry Areas (ICARDA)
Germany

1. What do you think are the challenges in evaluating quality of science and research?
In the case of CGIAR, perhaps consolidating and adopting a standardised approach to measure QoS beyond the single appraisal by each CGIAR centre. This could help measure better the outcomes, assess effectiveness, improve data quality, identify gaps, and aggregate data across centres.
a. What evaluation criteria have you used or are best to evaluate interventions at the nexus of science, research, innovation and development? Why?
Within MEL, we have used in the past a combination of quantitative and qualitative methods. For instance, bibliometrics (e.g., indexing the Web of Science Core Collections, percentage of articles in Open Access, ranking of journals in quartiles, Altmetrics) for published journal articles and Outcome Impact Case Reports (OICRs) to describe the contribution of CGIAR research to outcomes and impact.
We have also recently embarked on a study that uses social network analysis (SNA) and impact network analysis (INA) to investigate the research collaboration networks of two CGIAR Research Programs (CRP). Networks are generated based on journal articles published over the course of four years and their metadata is used to explore aspects ranging from team structures to the evolution of collaborations between organisations.
In general, a mix of quantitative and qualitative methods could be the most useful strategy, allowing for a combination of different approaches and metrics to measure the impacts of interventions.
b. Could a designated quality of science (QoS) evaluation criterion help capture the scientific aspects used in research and development?
Not a single but a combination of different criteria might be the best. We have learned from the work of Rünzel, Sarfatti, Negroustoueva (2021) the usefulness of using relevance, scientific credibility, legitimacy, and effectiveness within the framework for evaluating the Quality of Research for development.
Rünzel, M., Sarfatti, P. & Negroustoueva (2021), Evaluating quality of science in CGIAR research programs: Use of bibliometrics
- read on a separate page
Jillian Lenne
Consultant Independent consultant
United Kingdom

The Outcome to Impact Case Reviews (OICRs) which were part of the 2020 CRP Reviews should be expanded as an integral part of future evaluations of initiatives of One CGIAR. They provide an efficient way of combining quantitative, bibliometric and qualitative assessments of quality of agricultural agricultural for development.
- read on a separate page
Ola Ogunyinka
Senior Research Fellow Natural Resources Institute, University of Greenwich
United Kingdom

A very well put together tool that will ensure the uniformity of methodology in the CRPs as the system moves forward towards actualising the One CGIAR.
My disquiet is that even if the QoS is adjudged good, what then? How does this translate into the very essence of the birth of the CGIAR; increased food production at the SHF level, improved nutrition, health and livelihood at the HH level. The ultimate beneficiaries of the CGIAR are the SHFs and the national systems. These are far removed (access, funds, weak structures etc) from the journals that count.
Hence in my view, this tool has to be combined with a strong outcome/impact case study process that benefits the science within the CGIAR on the one hand and its effect on SHFs and national systems and the larger society
We need to define/elaborate QoS/QoR in such a way to capture why the need for the science in the first place and its effect/outcome to the society at large. The MEL component of this should be strengthened to address these larger issues to ensure that while we assess the science, we do not lose sight of its ultimate goal.
- read on a separate page
Guy Poppy
Director and Professor University of Southampton
United Kingdom

I have participated in two Research Excellent framework exercises in the UK (2014 and 2021) which assess the research excellence of many diff disciplines and institutions. Alongside assessing research outputs, it also evaluates impact case studies and the research environment producing a blended score with outputs having the biggest weighting but impact growing in weighting from 2014-2021.
It is mainly through peer-review but does include bibliometrics (undertaken centrally and provided to you so all done in the same way). Bibliometrics can be "used" to enhance a score as opposed to bringing a score down.
Interestingly the same researchers, institutions score well across the board although one does get "pinnacles" in some specific research areas and/or higher impact etc. Thus, those scoring large amounts of 3 and 4 * (international excellence and world leading) often have high scoring impact and/or research environments too as you might expect.
The above benefits from a substantive period prior to REF of the RAE and lots of information known about baselining/normalising. For example, in ag/food/vet (my panel), we had information on typical citations etc for the sub-topic etc and thus could see if above/below the norm alongside our own peer-review.
More can be found here: https://www.ref.ac.uk for 2021 and here for the 2014 exercise https://www.ref.ac.uk/2014/ which has been analysed by many and there is quite a science on REF and informing best ways of doing it moving forward.
It has an incentive both financially and for reputation which means "its a big deal" for UK universities whom spend lots of time and money on it.
I do think the breadth of coverage and consideration of rigour, originality and novelty etc for outputs and reach/significance and impact for Impact case studies and then a range of metrics on research culture for research environment offers a broad view of excellence and the fact its judged for the topic and in relation to others undertaking work on that topic etc. means it is very thorough and resilient - however, it takes a lot of time and money to do it. There remains a debate as to how much could be done using algorithms and not peer-review in the more STEM subjects.
- read on a separate page
Claudio Proietti
Monitoring and evaluation advisor CIRAD
France

Dear All,
I would like to share the experience that Cirad, the French agricultural research and international cooperation organization working for the sustainable development of tropical and Mediterranean regions, has and the efforts that have been made in the latest years to strengthen an impact-oriented culture through the fostering of an evaluative thinking. I think this work resonates with the challenges and willingness of using evaluation to better steer science and technology development and contribute in innovation processes.
Cirad has already made a first effort of reconstructing what they call the "building a culture of impact" process and you may access more detailed information here: https://doi.org/10.1093/reseval/rvy033
I joined Cirad one year ago and, coming from the CGIAR, I was not surprised to see that a publicly accessible repository was in place and that a set of standardised bibliometric indicators was regularly monitored: https://indicateurs-publication.cirad.lodex.fr/
I learned that, being Cirad a public institution, it undergoes a regular (every four year) evaluation coordinated by a public independent authority, the Haut Conseil de l’évaluation de la recherche et de l’enseignement supérieur (Hcéres). There are three points that retained my attention when I first looked at the Hcéres evaluation framework and method. https://www.hceres.fr/fr/referentiels-devaluation
The first point is that before an external panel of high-level national and international experts performs the evaluation, there is an internal self-evaluation process that is performed by the Cirad itself.
The second point was that the evaluation framework clearly identifies the capacity of the organisation to orient and implement its strategy based on societal challenges and demands, the centrality of partnerships and the quality of science as key criteria.
The third point was the granularity of the evaluation that was performed both at the institutional level and for each of the research units.
Along with this externally committed evaluation, regular evaluations are also performed to assess the main research and development regional networks (https://www.cirad.fr/en/worldwide/platforms-in-partnership).
Here you may find the latest Hcéres evaluation report (2021) in French: https://www.cirad.fr/les-actualites-du-cirad/actualites/2021/evaluation-hceres-du-cirad-une-culture-de-l-impact-et-un-positionnement-strategique-salues
The institutional mechanism I briefly described, was enriched, starting about ten years ago, thanks to the scientific and methodological efforts led by a team of researchers and experts under the umbrella of the Impact of Research in the South (ImpresS) initiative. https://impress-impact-recherche.cirad.fr/
ImpresS has defined a set of methodological principles (i.e. reflexive learning, systemic perspective, case study analysis, contribution analysis, actor-centred and participatory approach, outcome-orientation, focus on capacity development and policy support processes) to develop and implement evaluative approaches and tools that are currently used to assess long term innovation trajectories (ImpresS ex post: https://www.cirad.fr/nos-activites-notre-impact/notre-impact) and to conceive new interventions (ImpresS ex ante: https://www.cirad.fr/les-actualites-du-cirad/actualites/2021/impress-contribution-de-la-recherche-aux-impacts-societaux). In the latest years, the portfolio of research for development projects coordinated by Cirad has steadily increased. A mechanism has been recently put in place to foster and improve the systematic use of outcome evaluations for adaptive management and learning, and build a consistent body of knowledge on research contribution to societal and environmental changes. This mechanism will provide funding and methodological support to teams that are implementing flagship and strategic interventions.
Here you may find the methodological documents and other publications related to the ImpresS work: https://impress-impact-recherche.cirad.fr/resources/impress-publications
Diversity of perspectives and approaches appears to me as a key factor for assessing important dimensions of the performances and contributions of a research for development organisation and disentangle, as best as possible, the complexity of interactions and feedback loops that characterize the majority of these interventions in development contexts. Nonetheless, the most comprehensive evaluation framework would not be enough if the governance and management bodies at different organizational levels and the organisation as a whole would not be willing and able to learn and adapt based on the use of the evaluation findings.
- read on a separate page
Rachid SERRAJ
Director of Strategy UM6P
Morocco

Dear All,
These are indeed interesting questions and exciting debate.
Thinking about QoS and innovation evaluation, in the specific context of the CGIAR, while I agree with most of what has already been said, I could not resist -out of curiosity- the temptation to visit Web of Science, to check bibliometric and citation indices of some of the great R&D pioneers, such as Norman Borlaug or MS Swaminathan.
These heroes of so-called green revolution have built the reputation of the CGIAR and delivered impact at scale on global food security etc. But surprisingly they did not really publish much. For instance, Borlaug just has an H-Index of 13 and 54 total Publications (Web-of-Science checked today); MSS has an HI of 15. What would be probably the scores of an average postdoctoral fellow nowadays! So, what does this tell us after all?
A lot has already been said about impact and tradeoffs of the GR, but as we browse through the old files of the pioneer scientists that have made IRRI, CIMMYT, etc. in the old days, we sense a great commitment and dedication to science for the benefit poor. The noble mission that incentivized brilliant scientists, mostly from the northern hemisphere, to trade their comfortable labs for dusty fields across Asia, Africa and LAM. We can also assume that most of these ‘activist volunteers’ were not really chasing citations or easy recognition, but they had very clear ideas of what must be done and why..
Sixty years after its inception, the CGIAR must find a way to revisit its history and look for inspiration on how to renew its mission and commitments to relevant and impactful science, beyond convoluted measurements and “bean counting”... All parameters of the QoR4D frame of reference are indeed important, and bibliometric parameters are also part of the QoS equation. But what is even more crucial is the scientific culture and ethical values that attract, maintain, and inspire new generations of brilliant international scientists to join the system. The CGIAR scientist need to be put in the center of research impact pathways, the profile should be revalued. On the other hand, ‘system bureaucracies’ should be kept at minimum, as support services to assist scientists in their mission. There is something to be done on the L of the MEL, going back to the basics.
Apologies for being blunt and probably ‘off-track’ but having worked for many years as a scientist at 3 CGIAR centers, successively ICRISAT, IRRI, and ICARDA and then at the science council secretariat (ISPC), this is somehow inspired by my modest experience and devotion for the CG.
Best,
RS
- read on a separate page
Bia Carneiro
Specialist – Social Research & Media, CGIAR FOCUS Climate Security Alliance of Bioversity International and CIAT
Portugal

There are several innovative tools to leverage on bibliometric data for insights beyond citation metrics, which can give more depth to the evaluation of science - as pointed in other contributions, social network analysis techniques can be applied to identify networks of scientific collaboration or of funding institutions on a given topic.
Furthermore, recognising that traditional and novel metrics (such as altmetrics) tend to focus on the significance of research within the scientific community and fail to unpack reach beyond researcher networks, we have recently published a paper in which we used a digital methods approach to evaluate the "soft" forms of influence of climate science in the policy space. Our study proposed a framework to assess the broader influence of the CGIAR Research Program on Climate Change, Agriculture and Food Security (CCAFS) through online media representations.
By considering online networks and narratives as evidence of influence, we repurposed publicly available digital artifacts to assess CCAFS' reach among stakeholder at various levels. We assessed the dynamics of information diffusion, interaction and discourse amplification as representations of how CCAFS supported policymaking at various levels of engagement, as opposed to simply examining ‘formal’ policy outputs reported in monitoring mechanisms, or academic impact metrics.
Focusing on the importance of translating climate science into actionable policy, Google Trends, hyperlink analysis, network analysis, and text mining were applied in an integrated framework to assess the centrality and influence of CCAFS in the climate science-policy interface. Our approach points to the possibility of leveraging on data-driven methods to assess interactions and estimate influence. As such, the development of comprehensive evaluation frameworks by climate research programs requires the establishment of indicators that capture influence from a holistic perspective, not only based on traditional indicators, but also in relation to messaging, visibility, knowledge exchange and engagement.
Reference:
Carneiro B., Resce G., Läderach, P., Schapendonk, F., Pacillo, G. (2022) What is the importance of climate research? An innovative web-based approach to assess the influence and reach of climate research programs, Environmental Science & Policy, 133, https://doi.org/10.1016/j.envsci.2022.03.018
- read on a separate page
Svetlana I Negroustoueva
Lead, Evaluation Function CGIAR
Italy

Dear all,
It is great to see such rich and insightful contributions. It seems there is a wide consensus on the importance of the use of mixed methods for evaluating science and relying on QoR4D frame of reference. I really enjoyed reading your opinions and what you found challenging during your experiences.
With an additional day for discussion (through tomorrow, April 13th) we still hope for additional contributions. The following may further insight and guide those who have not shared their views and experiences yet, especially outside of CGIAR context.
There is an interesting possible debate between what funders may find important to evaluate against the priorities for Southern researchers. My understanding is that funders are widely interested in results (outputs, outcomes, impacts) and that OECD DAC evaluation criteria “impact” and “efficiency” are of particular relevance in terms of accountability and transparency to demonstrate that tax-payer money is used wisely. However, Southern researchers prioritize the need for research to be relevant to topical concerns, to the users of research and to the communities where change is sought. The importance of the relevance dimension was highlighted in several contributions and is mostly related to the importance, significance, and usefulness of the research objectives, processes, and findings to the problem context and to society.
How could relevance be measured in a way that the needs of southern researchers and funders converge? When talking about impacts, Raphael Nawrotzki mentions that the impact within a scientific field is still measured best by the number of citations that an article or book chapter receives.
The question to ask from the audience is “How the impact within a scientific field can reflect the importance of a certain research project and its impact/contribution to the society where change is sought?”. There seems to be a consensus on importance of ‘relevance’ component and the relation between the research output and the original Theory of Change, and a process that was followed in its development. Can this be aligned to measuring ‘relevance’ in a way that can also be considered solid and credible to funders?
And last but not the least, what about practice “Have you seen monitoring, evaluation and learning (MEL) practices that could facilitate evaluations of science, technology and innovation?”
We look forward to more sharing to close off this discussion, and identifying opportunities for further engagement
Sincerely,
- read on a separate page
Richard Tinsley
Professor Emeritus Colorado State University
United States of America
Another interesting and important discussion topic. As I look at the comments to date, they remain mostly confined to the academic/research community and do not get to the final beneficiary of the scientific results. For the CIGAR this can be critical as the CIGAR is considerable isolated from the intended final beneficiaries. That is the primary clients for the CIGAR are the Host Country National Agriculture Research Systems (NARS), while the final beneficiaries are multitude of normally unnamed smallholder farmers, who are still one step removed from the CGIAR’s NARS clients. That would be the national agricultural extension program. These final beneficiaries usually have no direct access to the research results and cannot afford the referred technical journals mentioned as the primary product for the research.
Thus, while I think the CGIAR does an excellent job of basic research particularly regarding varietal improvement for many host countries, I do question how effectively the results can be utilized by the smallholder final beneficiaries. I think most of the varietal improvement in developing countries such as those in Sub Sahara Africa are facilitated by “collaborative” programs between the appropriate CGIAR center and the host NARS. However, the collaboration is supported by external funding to the CGIAR team to cover the operating costs. Thus, it becomes more a CGIAR effort than a fully collaborative effort. From a smallholder operational perspective, the varietal improvement program is the CGIAR most effective research intervention. The reason being it is a simple substitution for what smallholders are already doing, with limited, if any, additional labor required. There may be some substantial logistical requirements in getting newly released variety seed available to smallholder, and logistic can be a major hindrance in many host countries.
The problem comes when you get away from the varietal improvement effort and work with innovations with a labor or other higher operational requirement. Then the limits of small plot research, the basis of most agronomic analysis, becomes a problem. While small plot research does a tremendous job of determining the physical potential of a research innovations in the region it is undertaken and can produce high quality referred journal papers well appreciated by the academic/research community, it does not address the operational requirements, such as labor, necessary to extend the results across a smallholder community. It just assumes it is not a problem. However, labor can be highly limited in most smallholder communities as well as the dietary energy to fuel the labor. How often are the CGIAR’s non varietal improvement research innovations more labor intensive than what the smallholders are currently doing? The question is who within the CGIAR collaborative effort to assist smallholder farmers is responsible to:
- Determine the labor requirements to extend the small plot research result across the smallholder farm.
- If that labor is available to the smallholder producers, and
- If not available, what are the rational compromises farmers make in adjusting the high-quality research results to their limited operational capacity.
Does this fall into an administrative void between the agronomists or other applied bio-scientists and the social scientists assisting small holder communities? Until this is recognized and addressed will the high quality CGIAR center research receive limited acceptance by the smallholder farmers? It is very interesting that the Baker/Hurd Yield Gap analysis started at IRRI some 40 years ago has never address labor as major contributor to the yield gap analysis. I think it would make a major difference and explain much of the yield gap and low level of research acceptance for the CGIAR quality research results.
The problem is exasperated by limited dietary energy available to most smallholder farmers resulting in research & extension attempting to compel smallholder farmers to exert up to twice their available dietary calories. It is interesting that we acknowledge that smallholder farmers are poor and hungry but never factor that as a major hindrance to their ability to take advantage to the quality research being done for their benefit. It is equally interesting that there is very little hard data on the calories available to poor hungry smallholders, let alone how that compares to the 4000 kcal/day needed for a full day of agronomic fieldwork. What little is available typically shows smallholder farmers have access to only about 2000 – 2500 kcal/day, barely providing the basic metabolism requirements, leaving little energy for field work such as the 300 kcal/hr. needed for basic manual land preparation. The result is to prolong periods of crop establishment against the declining yield potential associated with delayed crop establishment until it is no longer possible to meet family food security needs. This will again severely limit the usefulness of the high-quality research coming from the CGIAR centers and the collaborating NARSs, for much of agronomic research the effectiveness is time sensitive.
Mention is made of the MEL evaluation process. This needs to be review with care to make certain it is true evaluation process that guides future projects to better serve the beneficiaries with more effectively programs and not a propaganda tool to cover-up and promote failed programs as often appear to be the case. The concern is in both the criteria being included or excluded in an MEL evaluation and use of aggregate analysis vs percentage analysis. For example, how often do MEL analysis for agronomic programs include timing of field operations that can be highly visible and would pick up the labor constrains mentioned above, and then guide programs to facilitate smallholder access to contract mechanization that would expedite crop establishment, improve timing, compliance with research recommendations, yields, and enhance family food security? I have never seen it included. Looking at USAID MEL effort on reliance on producer organizations to assist smallholder, when you do an aggregate analysis, you come up with some very impressive numbers that only measures the massiveness of the total program, while saying little or nothing about effectiveness or appreciation of the effort to the beneficiaries. However, if shift the same data to percent such as:
- Percent of potential beneficiaries actively participating,
- percent of the community market share,
- percent of side selling, or
- percent increase in family income,
the impact on the individuals and communities can be trivial. In this case the MEL will represent a little bit of monitoring, but no real evaluation, and the only learning is “how to deceive the underwriting taxpayers”. However, such an MEL analysis assures the continuation and entrenchment of programs the beneficiaries are avoiding like the black plague or perhaps in today’s context COVID-19. This would really be a major disservice the beneficiaries, while assuring the implementer future opportunities.
Please allow me to support the above concerns with some webpages from the Smallholder Agriculture website I manage.
https://smallholderagriculture.agsci.colostate.edu/
For operational limits and dietary energy balance:
https://webdoc.agsci.colostate.edu/smallholderagriculture/OperationalFeasibility.pdf
https://agsci.colostate.edu/smallholderagriculture/calorie-energy-balance-risk-averse-or-hunger-exhasution/
https://agsci.colostate.edu/smallholderagriculture/ethiopia-diet-analysis/
For MEL
https://agsci.colostate.edu/smallholderagriculture/mel-impressive-numbers-but-of-what-purpose-deceiving-the-tax-paying-public/
https://agsci.colostate.edu/smallholderagriculture/appeasement-reporting-in-development-projects-satisfying-donors-at-the-expense-of-beneficiaries/
https://agsci.colostate.edu/smallholderagriculture/perpetuating-cooperatives-deceptivedishonest-spin-reporting/
https://agsci.colostate.edu/smallholderagriculture/request-for-information-basic-business-parameters/
https://agsci.colostate.edu/smallholderagriculture/vulnerability-for-class-action-litigation-a-whistleblowers-brief/

Thank you
Dick Tinsley
Prof. Emeritus,
Soil & Crops Sciences
Colorado State University
- read on a separate page
Graham Thiele
International Potato Center
Peru

I was the Director of the CGIAR Research on Roots, Tubers and Bananas until December 2021 and participated in and benefited from two rounds of CGIAR Research Programmes (CRP) reviews. I believe its really essential to build a stronger evaluation culture in CGIAR and learn from what we do.
Echoing some earlier contributions below, for a balanced evaluation its vitally important that the CGIAR sustains its investment into outcome case studies and in particular in impact studies. The latter have declined significantly in recent years in CGIAR and are not produced ordinarily as a part of routine projects or end of intervention surveys but need specialized methods and their own dedicated funding. Otherwise there is a risk with the ideas in the technical note that we use state of the art methods and increased precision of bibliometric analysis which is what we have available and always continues to be produced but we miss the rounded picture that these other studies provide. Its reminiscent of the the anecdote about someone looking under the lamp post for something they dropped when actually it fell elsewhere because this is where the light shines most strongly.
Its also possible to use social network analysis of publications, as one tool for qualitative analysis, to explore who collaborates to publish and what is their social and organizational context as a complement to bibliometric analysis, particularly relevant to the legitimacy dimension. We have a publication on this in the pipeline with experts on social network analysis at University of Florida.
- read on a separate page
Lennart Raetzell
Manager Syspons GmbH
Germany

Dear all,
thank you very much for your valuable contributions. It is a pleasure to read them. I would also like to draw your attention to another aspect of the evaluation of science, technology and innovation in a development context, namely the uptake of successful research among the target group of the research. In our opinion, here at Syspons (www.syspons.com), this is one of the most important aspects as it actually is one of the pathways to reach developmental impact via research. In this regard, we were in the fortunate position to conduct a thematic impact evaluation for VLIR-UOS (https://www.vliruos.be) on pathways to research uptake mainly in the field of agriculture. In this regard, we were able to test and build a model for generating more research uptake in research projects.
I am happy to share the link and open up for further discussions: https://cdn.webdoos.io/vliruos/753d44b984f65bbaf7959b28da064f22.pdf
Best regards,
Lennart
- read on a separate page
Etienne Vignola-Gagné
Science-Metrix / Elsevier
Canada

As one of the Science-Metrix coauthors of the Technical Note listed in the background documentation for this discussion, I wanted to provide a bit more context about the general orientation that my coauthor, Christina Zdawczyk, and I gave to this framework for the deployment of bibliometric strategies as part of CGIAR QoS evaluations.
You may notice that the ubiquitous publication counts and citation impact indicators were afforded only a small portion of our attention in this Technical Note. One of our intentions with this note was to showcase how bibliometrics now offers indicators for a much broader range of dimensions, including cross-disciplinarity, gender equity, preprinting as an open science practice, or the prevalence of complex multi-national collaborations.
That is, there is (in our opinion, often untapped) potential in using bibliometrics as indicators of relevance and legitimacy. Simultaneously, some of the bibliometrics we have suggested can also be used as process or even input indicators, instead of their traditional usage as output indicators of effectiveness. For instance, bibliometrics can be used to monitor whether cross-disciplinary research programs are indeed contributing to increased disciplinary integration in daily research practice, considering that project teams and funders often underestimate the complexity of such research proposals. Moreover, dedicated support is often required for such projects, at levels that are seldom properly planned for (Schneider et al 2019). With the caveat that output publications to be monitored are only available later in the project life cycle, findings on cross-disciplinarity can help modulate and re-adjust research programs and associated support instruments on a mid-term timeline.
As you can see, our view is very much one of using program evaluation tools, including bibliometrics, to improve research and innovation governance and support mechanisms for project teams, rather than to rank performances.
Hope you enjoyed or will enjoy the read.
Etienne
Senior analyst, Science-Metrix (Elsevier)

Reference
Schneider, F., Buser, T., Keller, R., Tribaldos, T., & Rist, S. (2019). Research funding programmes aiming for societal transformations: Ten key stages. Science and Public Policy, 46(3), pp. 463–478. doi:10.1093/scipol/scy074.
- read on a separate page
Sonal D Zaveri
Founder and Coordinator Gender and Equity Network South Asia, GENSA with Community of Evaluators South Asia
India
I share a few of my observations and learnings from my over 30 years as an evaluator practitioner from the Global South. (1) These views also echo some of the entries, reflections and blogs already published in this thread. Although we, either as evaluators or commissioners of evaluations believe that the key evaluation questions should guide the choice of methods and approaches – in reality the evaluation choices (whether we explicitly state it or not) are privileged. The privilege may be because of power of expertise, position or resources. Robert Chambers asks pertinently about “who-whose” reality, who decides, and whose change (2) is more important than others. Answering any of these questions leads us to questioning who has more power to decide – the power could be visible or invisible or hidden . Though there is no question about the importance of quality of science, innovation and technology and the rigor in evaluations, one may want to also ask if this necessary condition is a sufficient one. When we frame the discussion about evaluation being a transformative process and the importance of accepting different world views, we also acknowledge the value of indigenous knowledge and the need for decolonization (3). Unfortunately, we often design evaluations at a distance and although we may use participatory methods and tools, unless the findings are used and owned by people and especially those with little voice, in the program, we cannot claim that evaluations have led to public good or used for the benefit of all peoples. Central to this argument is to what extent do values play in our evaluation? With our greater understanding of racism, gender inequities, and various social cleavages it would be difficult to accept the quality of science bereft of any values related to human rights, inclusion and equity.
Feminist thought recommends methods and tools that can nuance the intersectionality of vulnerability since lived experiences may vary dramatically in any intervention design depending on one’s standpoint and intersecting inequities. It is possible that an emphasis on QoS and innovation could address these concerns but one needs to be particularly vigilant about it and perhaps instead of asking “Are we doing things right?” we must ask, “Are we doing the right things?” A case in point is an example ( 4 ) from India where mango producers had lost 40% of their produce in transit, compelling scientists to introduce a suite of nanomaterials that could be sprayed on the fruit to extend its shelf life. The example indicates that there was a pressing societal challenge which was fast tracked, not necessarily losing the quality of the intervention but perhaps being unconventional in the timeliness and the manner the solutions were rolled out. Also, the solutions were context-specific and highlight that evaluation must measure what is important to measure for communities and peoples, rather than elegantly present evidence that may not be used (5). Southern researchers feel quite strongly about the need for research to be relevant (italics are mine) to topical concerns, to the users of research and to the communities where change is sought. (6) Uptake and influence are as critically important as is the quality of research in a development context.
Even in seemingly open forums such as the internet where one is theoretically free to participate without boundaries, research and related innovations have shown how in a seemingly free and open networked society, power is linked to who controls and makes use of communication pathways and its products. So, who decides what information and knowledge is produced, shared and used and by whom; and finally whose values are represented impact the nature of the knowledge (artefacts) produced. Access is not the same as participation. Access particularly or suggests the ability to make use.... refers to the ability to make use of the information and the resources provided. Also, One may have access to but have no control over resources, which means that participation is limited particularly in decision making It is likely that the ones who are silent and have the least privilege, may actually have the insight and knowledge that is valuable. Women as traditional bearers of local and indigenous knowledge find themselves cut off from the networked society, where information, communication, and knowledge are ‘tradeable goods’ ( 7).
In summary, if we are unable to address the underlying power asymmetries, then the rigour of our science, research and evaluations, though fulfilling an important purpose, will fall short to address the complex demands of our times, in finding solutions to intractable problems and in having our values firmly entrenched in social justice.
1. Zaveri, Sonal (2019), “Making evaluation matter: Capturing multiple realities and voices for sustainable development” contributor to the journal World Development - Symposium on RCTs in Development and Poverty Alleviation https://bit.ly/3wX5pg8
2. Chambers, R. (1997). Whose reality counts? Putting the first last. London: IT Publications. ------ (2017). Can we know better? Practical Action Publishing Ltd, Rugby
3. Zaveri, Sonal (2021) with Silvia Mulder and P Bilella, “To Be or Not to Be an Evaluator for Transformational Change: Perspectives from the Global South” in Transformational Evaluation: For the Global Crisis of our Times edited by Rob Van De Berg, Cristina Magro and Marie Helene Adrian https://ideas-global.org/wp-content/uploads/2021/07/2021-IDEAS-book-Tra…
4. Lebel, Jean and McLean, Robert. A Better Measure of research from the global south, Lancet, Vol 559 July 2018
5. Ofir, Z., T. Schwandt, D. Colleen, and R. McLean (2016). RQ+ Research Quality Plus. A Holistic Approach to Evaluating Research. Ottawa: International Development Research Centre (IDRC)
6. Singh,S, Dubey,P, Rastogi,A and Vail,D (2013) Excellence in the context of use-inspired research: Perspectives of the global South https://www.idrc.ca/sites/default/files/2021-04/Perspectives-of-the-glo…
7. Zaveri, Sonal. 2020. ‘Gender and Equity in Openness: Forgotten Spaces’. In Making Open Development Inclusive: Lessons from IDRC Research, edited by Matthew L. Smith, Ruhiya Kristine Seward, and Robin Mansell. Cambridge, Massachusetts: The MIT Press https://bit.ly/2RFEMw5
- read on a separate page
Raphael Nawrotzki
Advisor (M&E) Deutsche Gesellschaft für Internationale Zusammenarbeit (GIZ)
Germany

Dear Svetlana and the CAS Team,
We appreciate the Technical Note “Bibliometric Analysis to Evaluate Quality of Science in the Context of One CGIAR” (CAS Technical Note). We at the Fund International Agricultural Research (FIA) of GIZ have recently commissioned a Bibliometric Study by Science Metrix (see Evans, 2021) and welcome the increasing value that the CGIAR System is placing on a rigorous evaluation of science quality of their research. We view science quality as a necessary prerequisite on the way towards development impacts as innovations are first developed and tested CGIAR scientists. In the following paragraphs we would like to respond to your questions from a donor point of view.
1. What do you think are the challenges in evaluating quality of science and research?
As funders we expect our commissioned research projects to perform their work in line with the 4 key elements of Quality of Research for Development (QoR4D) including 1) Relevance, 2) Scientific Credibility, 3) Legitimacy, 4) Effectiveness. However, these criteria apply mostly to the process of “doing” research and science. However, we are also interested in results (outputs, outcomes, impacts). If we are talking about quality of science, we find bibliometric analyses very useful to determine the scientific impact of our funded work in relevant scientific fields. And the impact within a scientific field is still measured best by the number of citations that an article or book chapter receives. Within the field of science peer-reviewed publications (and the citations articles receive) are considered the “gold standard” in terms of impacts. Yet there are some challenges associated with measuring scientific impact:
Long time periods. A big challenge, which the 2020 SOAR Report (Alston et al. 2020) pointed out, is that “agricultural research is slow magic” – it can take decades for results from investments in agricultural research to become visible, but decision-makers need to demonstrate results quickly and assess project return on investment in the future to justify an increase in funding. As we also learned from the CAS Technical Note, a robust measurement of science impact in terms of bibliometrics is only possible about 5 years after a research project or portfolio has been completed. The peer-review cycle, individual journal requirements and citation status, often reflecting the readership amount, mean it takes time until the impact becomes evident, especially if the work is novel or highly innovative. The long time horizon poses challenges as we can’t use this information directly for programming.
Altmetrics. We fully understand that bibliometrics are an imperfect measure of true scientific impact. Some research can be highly influential and reaching a large audience via alternative channels including twitter, blogs, or as grey literature. This is captured in altmetrics but it is difficult to combine bibliometrics and altmetrics to get a full picture of scientific impact.
Cost effectiveness. As Science Metrix points out, and particularly for those papers with external co-authors, the fraction of support attributable to each funding source is not easily determined. Donors are often interested in “following the money,” but in measuring science quality, the direct investment is not so easily attributable to outputs and the longer-term impacts of applying and scaling the published scientific contributions ends up bringing the real “bang for the buck”. In our own science quality study (see Evans 2021), we also assessed efficiency in terms of cost-benefit. Specifically, we assessed the number of publications per million Euro invested. We then compared the cost-effectiveness of our funded research projects to those of comparable EU-financed projects and discovered that our research projects were more efficient. Yet computing these cost-effectiveness measures comes with a host of limitations. However, we were happy to see that such an indicator is also proposed as “Level 2 Priority Indicator” (E10) suggested in the CAS Technical Note.
1a. What evaluation criteria have you used or are best to evaluate interventions at the nexus of science, research, innovation and development? Why?
Impact Focus. In our evaluation of science quality, we focused particularly on the OECD DAC evaluation criteria “impact” (what does the research achieve? What contribution does it make?) and to a lesser extent “efficiency” (how well were resources used, how much was achieved for amount spent?). Both evaluation criteria, impact and efficiency are of particular relevance for us as funders in terms of accountability and transparency to demonstrate that tax-payer money is used wisely.
In our evaluation (see Evans 2021), Science Metrix focused mostly on bibliometric indicators, comparing publications of our funded projects, with those of other international agricultural research (outside the CGIAR system). Contribution to SDGs, based on key-word searches and content-analysis, were also part of the analysis, in order to capture the extent to which cross-cutting issues such as gender, human rights, sustainability, resilience as well as climate change mitigation and adaptation were addressed in the peer-reviewed publications. Most bibliometric indicators sought to assess the impact that publications have made, via indicators such as average of relative citations (ARC), highly cited publications (HCP), and citation distribution index (CDI).
1b. Could a designated quality of science (QoS) evaluation criterion help capture the scientific aspects used in research and development?
Yes, indeed a designated quality of science (QoS) evaluation criterion as outlined in the ISDC technical note “Quality of Research for Development in the CGIAR Context” may be highly appropriate to evaluate research within the CGIAR framework. Reflecting CGIAR’s comparative advantage and primary focus, science quality, and not only development indicators, often outside of the sphere of influence, and mandate, of research institutes, should be reflected in evaluations. The sub-components of the QoS evaluation criterion (Relevance, Scientific credibility, Legitimacy, Effectiveness) are important to measure the quality of “doing science”. Nevertheless, we would highlight that such a criterion should always be accompanied by an evaluation of the OECD DAC criteria of impact and efficiency to also capture not just the “doing” but also the “results” of the research for development enterprise.
2. What are the methods and indicators that could work in the evaluation of science and research?
QoS. Evaluating the quality of science (QoS) evaluation criterion and its sub-components (Relevance, Scientific credibility, Legitimacy, Effectiveness) requires a mixed methods approach. As was done in the CGIAR Research Programm (CRP) evaluations, the focus will be on inputs (composition of research staff, collaborations, money, management infrastructure, policies, etc.). Qualitative research methods will be most appropriate when it comes to evaluating how relevant a certain research question is or whether the research process is perceived as fair and ethical. This may require conducting interviews with key stakeholders, and/or carrying out surveys to gather feedback and insights into enabling conditions and barriers to effective and sustainable impact.
SI. In contrast, the evaluation of science impact (SI) will require the use of quantitative analysis using sophisticated bibliometric methods and measures as outlined by Science Metrix in the CAS Technical Note. We consider all all “Level 1 Priority Indicators” (CAS Technical Note, Table 6) as highly relevant science impact indicators that we hope will be computed when evaluating the scientific impact of the current round of OneCGIAR Initiatives.
3. Have you seen monitoring, evaluation and learning (MEL) practices that could facilitate evaluations of science, technology and innovation?
Ongoing monitoring of Quality of Science (QoS) as well as Science Impact (SI) will be difficult. From our perspective both criteria need to be assessed using an evaluation format (study at a certain point in time). Our own science quality study (see Evans 2021) is an example of how SI could be assessed using rigorous bibliometric methods and measures. However, the purpose of our science quality study was to investigate the reach and impact of our funded research work on the scientific field of developmental agriculture. The study served the purpose of accountability and transparency. We did not use the findings for the “L” (learning) dimension of MEL. A true qualitative or mixed methods QoS study would be a more natural fit when the goal is to derive lessons that can be used for adaptive management and steering purposes. The CGIAR Research Programme (CRP) evaluations provide a good example how results from an evaluation could be used to improve “doing science”.

Sincerely,
Raphael Nawrotzki and Hanna Ewell (M&E Unit, FIA, GIZ)

References:
Alston, J., Pardey, P. G., & Rao, X. (2020) The payoff to investing in CGIAR research. SOAR Foundation. https://www.cgiar.org/annual-report/performance-report-2020/assessing-cgiars-return-on-investment/
Evans, I. (2021). Helping you know – and show – the ROI of the research you fund. Elsevier Connect. https://www.elsevier.com/connect/helping-you-know-and-show-the-roi-of-the-research-you-fun
- read on a separate page
Ibtissem Jouini
Senior Evaluation Manager CGIAR
Italy

Bringing together evidence from evaluations which covered –in part or exclusively– the Quality of Science (QoS) can be a strong way to shape changes towards the development agenda. Evidence synthesis is a trustworthy method but certainly not simple to implement given the variety of evidence we can find, the evaluation criteria, approaches, focus, contexts, etc.
The QoS theme was one of the major topics covered by the analysis under the framework of the Synthesis of Learning from a Decade of CGIAR Research Programs (2021).[i] One challenge the synthesis team faced was to find the specific analytical framework that best reflected the variety of evidence from the two phases of CRP implementation: 2011-2016 and 2017-2019 and reach the synthesis objectives.
To further explain this, we had to find out how information had to be categorized and serve as a reference to indicate the focus, the scales, the concepts, and related terms and definitions based on the original objectives of the synthesis and mapping of the analyses forming the core basis of the 43-document corpus of the synthesis. The QoS related levels of inquiry were converted into two main questions and four subthemes.
The two main questions for the Quality of Science (QoS) and Quality of Research for Development (QoR4D) theme are:
1. How has QoS evolved between two CGIAR Research Programs (CRP) phases along three dimensions—inputs, outputs, and processes?
2. To what extent has QoS evolved along two of the four QoR4D elements—legitimacy and credibility?
The results were structured around four subthemes: (1) QoS: Research inputs (2) QoS: Quality of research outputs (3) QoS: Research management/process and, (4) QoR4D elements: legitimacy and credibility. Along these topics, a set of Cross-cutting themes were covered: gender, climate change/environment, capacity building, external partnerships and youth.
Four key issues were addressed in the analysis of findings: (1) patterns and trends between the two phases of CRPs related to the quality of science (QoS) and research for development, achievement of sustainable development outcomes, and management and governance; (2) systemwide issues affecting CRP achievements; (3) recommendations for the future orientation of CGIAR research and innovation; and (4) key evidence gaps and needs for future evaluations.
A narrative synthesis approach was used to summarize and analyze the Learning from a Decade of CGIAR Research Programs, employing secondary source data from 47 existing evaluations and reviews. External evaluations were systematically coded and analyzed using a standardized analytical framework. A bibliometric trend analysis was carried out for the QoS theme, and findings were triangulated against earlier syntheses and validated by members of the Independent Science for Development Council (ISDC), CRP leaders, and expert peer reviewers.

[i] Report: CAS Secretariat (CGIAR Advisory Services Shared Secretariat). (2021). Synthesis of Learning from a Decade of CGIAR Research Programs. Rome: CAS Secretariat Evaluation Function. https://cas.cgiar.org/evaluation/publications/2021-Synthesis
- read on a separate page
Paolo Sarfatti
Team Leader Nutrition Research Facility
Italy

Operationalisation of the Quality of Science criterion in the 2020 CGIAR Research Programmes (CRP) reviews
The 2020 CRP reviews focused on three criteria: Quality of Science, Effectiveness (in the OECD-DAC meaning), and Future Orientation/Sustainability.
For the assessment of the Quality of Science criterion, were adopted two elements: scientific credibility and legitimacy, which are two of the four elements constituting the Quality of Research for Development Framework (Qo4RD) by the Independent Science and Partnership Council (ISPC) in 2017 and successively refreshed by the Independent Science for Development Council (ISDC). The four elements which constitute the QoR4D framework being relevance, scientific credibility, legitimacy, and effectiveness, and todayform the basis for a common frame of reference across CGIAR.
Scientific credibility requires that research findings be robust and that sources of knowledge be dependable and sound. It includes a clear demonstration that data used are accurate, that the methods used to procure the data are fit for purpose, and that findings are clearly presented and logically interpreted. It recognizes the importance of good scientific practice, such as peer review.
In the 2020 CRP reviews the evaluation of scientific credibility covered outputs, where these consisted mainly of published results, germplasm, digital tools, and technical reports, as well as leadership, research staff, processes, and incentives for achieving and maintaining the high scientific credibility of those outputs. The quantitative bibliometric analysis was fully integrated with the qualitative analysis of these other elements.
Assessing scientific credibility also included, among other things, the track records of research teams, use of state-of-the-art research literature and methods, and novelty.
Legitimacy means that the research process is fair and ethical and perceived as such. This feature encompasses the ethical and fair representation of all involved and consideration of the interests and perspectives of intended users. It suggests transparency, sound management of potential conflicts of interest, recognition of the responsibilities that go with public funding, genuine involvement of partners in co-design, and recognition of partners’ contributions. Partnerships are built on trust and mutual commitment to delivery of agreed-upon outcomes.
In the 2020 CRP reviews, the evaluation of legitimacy focused on the analysis of how CRPs partnerships were effectively built and functioning on the basis of mutual understanding, trust, and commitment, with clear recognition of each one’s perspective, needs, role and contribution. Robust multi-stakeholder partnerships should be good indicators for assessing research legitimacy. Assessments of fairness and the ethical aspects of research implementation have been standard features of the reviews.
The 2020 CRP Reviews demonstrated that by adopting mixed-method approach, it is possible to evaluate quality of science using rigorous quantitative bibliometrics analysis in combination with the qualitative assessment of many other important elements of research for development initiatives.
- read on a separate page
Jillian Lenne
Consultant Independent consultant
United Kingdom

Evaluation of quality of research for development
Having been involved in reviewing and evaluating agricultural research for development projects and programs for several decades, I would like to share some observations.
Value of bibliometrics and almetrics
In spite of some of the negative coverage of bibliometrics in current literature, they have an important function in evaluating the quality of published agriculturel for development research papers. Published papers already have passed a high quality threshold as they have been peer-reviewed by experienced scientists. Most international journals have rejection rates of over 90% - only the highest quality papers are published. Bibliometrics provide a means to further assess quality through number of citations, journal Impact factor (IF), quartile ranking and h-indices of authors among other bibliometrics. Citations and h-indices reflect the quality of the published research within the scientific community. Altmetrics demonstrate interest in the paper among the authors’ peer group. The recent publication by Runzel et al (2021) clearly illustrates how combinations of bibliometrics and altmetrics can be successfully used to evaluate the quality of almost 5000 papers published by the CGIAR Research Programs during 2017-2020. The Technical Note – Bibliometric analysis to evaluate quality of science in the context of the One CGIAR greatly expands the number of potential bibliometrics that could be used to evaluate quality.
Are there alternatives to citations and IF? The giant scientific publishing companies such as Elsevier use citations and IFs to monitor the quality of their journals. Higher IF translates into higher sales of journal subscriptions. As such companies own most of the scientific journals, any alternatives would need to be endorsed by them – this is unlikely as they seem to be happy with the status quo. Currently there do not appear to be any recognized alternatives. A recent paper by Slafer and Savin (2020) notes that the quality of a journal (IF) as a proxy for the likely impact of a paper is acceptable when the focus of the evaluation is on recently published papers.
Importance of qualitative indicators
Qualitative indicators of research quality are just as important as bibliometrics and other quantitative indicators and should always be used alongside bibliometrics. The 2020 evaluations of the CGIAR Research Programs (https://cas.cgiar.org/publications) effectively used a range of qualitative indicators to evaluate inputs, processes and outputs under the umbrella of the Quality of Research for Development Framework using the assessment elements: relevance, credibility, legitimacy and effectiveness.
IDRC recently revised its quality of research assessment – firmly anchored in qualitative assessment – to more effectively assess quality in a development context (IDRC, 2017). Of interest is the move to use indicators that look at positioning for use. IDRC has successfully used the RQ+ Instrument to evaluate 170 research studies (McClean and Sen, 2019).
Subjectivity in qualitative evaluation cannot be eliminated but it can be reduced by employing a team of evaluators and by better defining the criteria, indicators and descriptions.
Scientists often raise the issue that they are most interested in the impact of their research rather than its qualitative assessment. Evaluation of effectiveness in the context of positioning for use allows assessment of potential impact through indicators such as stakeholder engagement, gender integration, networking and links with policy makers.
Integrating quantitative (including bibliometrics) and qualitative indicators
The on-going development and refining of quantitative and qualitative indicators provides the potential to integrate them to provide more comprehensive evaluation of quality of research for development. This is an exciting area for future evaluations.
References
IDRC (2017) Towards Research Excellence for Development: The Research Quality Plus Assessment Instrument. Ottawa, Canada. <https://www.idrc.ca/sites/default/files/sp/Documents%20EN/idrc_rq_asses…;
McClean R. K. D. and Sen K. (2019) Making a difference in the real world? A meta-analysis of the quality of use-oriented research using the Research Quality Plus approach. Research Evaluation 28: 123-135.
Runzel M., Sarfatti P. and Negroustoueva S. (2021) Evaluating quality of science in CGIAR research programs: Use of bibliometrics. Outlook on Agriculture 50: 130-140.
Slafer G. and Savin R. (2020) Should the impact factor of the year of publication or the last available one be used when evaluating scientists? Spanish Journal of Agricultural Research 18: 10pgs.
Jill Lenné
Editor in Chief, Outlook on Agriculture and Independent Consultant
- read on a separate page
Serdar Bayryyev
Senior Evaluation Officer FAO
Italy

The need to evaluate science, technology and innovation in a development context
The significance of science, innovation and research in supporting global efforts towards more sustainable and climate-friendly development is growing. There is an urgent need for relevant science, quality research and innovations that are ground-breaking, as the world is experiencing new and unprecedented challenges and crises. Evaluation of the quality of evaluation and research is essential in determining the usefulness of and effectiveness of science, innovation and research activities. Evaluation findings should help decision-makers in determining important priority areas for further investigation and facilitate the decision-making on allocating resources for future research activities.
Key limitations
The evaluation of science and research is, however, quite complicated, and facing numerous methodological challenges. For example, assessment of the relevance and significance of scientific or research products is mostly based on the use of bibliometric methods. This is a quantitative method that may in fact produce solid evidence-based findings, yet its use is constrained by major limitations. For example, not all science, innovation and research products are included and properly recorded in the bibliographic databases, or not even published, hence not all products can be assessed.
The bibliometric methods often are based on calculating average number of citations, which also presents the basis for some biases. For example, there is sometimes an overly exaggerated attention to a specific author, who is known for previous work in a specific field or is affiliated with institutions that have strong political or financial support. In terms of citations, some authors may also deliberately exclude certain reference materials from their publications. Henceforth, whenever the bibliometric data analysis is used, it should be used with caution and should be combined with the use of other methods for validity purposes.
The other major limitation is that in today’s complex world of science and innovation, there are various standards or criteria of assessing quality of research, science and innovation in various parts of the world, and various parts of science and innovation.
Assessment of science and research products may also be biased due to the differences in political affiliation, beliefs, culturally or religiously-based perceptions of those who undertake these assessments or evaluations.
Key considerations
As this stream of evaluation function is still evolving, there are few key considerations that need to be taken into account in undertaking relevant evaluation, or in developing appropriate evaluation tools and methods.
Assessing relevance/significance of science and research.
Assessment of relevance or significance of science, innovation and research products need to take due consideration to the context in which these products are to be used. What works in one context may not be suited for the other, and what constitutes innovation and ground-breaking science varies substantially depending on the intended use or users.
Assessing effectiveness (or quality)
In assessing the effectiveness of the research and scientific analysis, the key is to assess “influence” of these activities, or extent to which the science, innovation and research products have influenced the policies, approaches or processes.
Assessing the degree of “networking”, i.e. the degree to which the researchers and scientific institutions have interacted with all relevant stakeholders, including those that may have had a “negative” or opposing stance to the subject research theme/topic.
Assessing Transformational nature
In today’s world, perhaps, the most important criteria for assessing the relevance, use and efficiency of science, innovation and research activities, is whether these activities cause truly transformational change, or at least trigger important policy discourse on moving towards such transformational change.
The above-written are suggestions for consideration, that are aimed at stimulating further feedback into this important discussion.
Kindest regards,

Serdar Bayryyev
Senior Evaluation Officer
- read on a separate page