How to evaluate science, technology and innovation in an R4D context? New guidelines offer some solutions

22 contributions

Dear colleagues,

The Evaluation Function of CGIAR would like to reopen last year’s discussion on How to evaluate science, technology and innovation in a development context? Contributions received last year were a key building block of the Evaluation Guidelines on Applying Quality of Research for Development Frame of Reference to Process and Performance Evaluations (with an FAQ tab)!

In February, we organized a workshop to usher in the launch of the beta version of the Evaluation Guidelines and to foster a common understanding, among evaluators and subject matter experts, of approaches and entry points for evaluating the quality of science (QoS) in CGIAR and in like-minded organizations such as FAO, GEF, UNEP and IDRC (see participants in Annex). The workshop allowed us to draw broader lessons from assessing and evaluating QoS, and to identify opportunities to roll out and monitor the use and uptake of the Guidelines in CGIAR and beyond. First in the series of reflections from the participants is a Q&A with Juha Uitto, Director of the GEF Independent Evaluation Office.

We would like to hear your reflections on the beta version of the Evaluation Guidelines (make sure to read the FAQs; the Spanish version will be available on 15 May):

  1. Do you think the Guidelines respond to the challenges of evaluating quality of science and research in process and performance evaluations?
  2. Are the four dimensions (Research Design, Inputs, Processes, and Outputs) clear and useful for breaking down evaluative inquiry? (see section 3.1)
  3. Would a designated quality of science (QoS) evaluation criterion capture the essence of research and development? (see section 3.1)
  4. Do you have experience of using other evaluation criteria to evaluate interventions at the nexus of science, research, innovation and development? Please describe and cite.
  5. What are additional data collection and analysis methods that should be elaborated for evaluating science and research in process and performance evaluations? (see textbox 3, figure 8 and tables 5, 6 and 8)
  6. How can CGIAR support the roll-out of the Guidelines with the evaluation community and like-minded organizations?

Many thanks in advance!


This discussion is now closed. Please contact us for any further information.
  • Thank you to all the contributors, both those new to the Guidelines document and those already familiar with it or with its supporting knowledge products. Here is a summary of the discussion, structured by core themes.

    Reflections on the Guidelines: content

    Generally, most participants agreed that the new Guidelines offer some solutions for evaluating quality of science in an R4D context. In particular, contributors used terms such as well-researched, useful, clear, adaptable and flexible. A couple of contributors emphasized the importance of flexibility, of seeking a middle ground, and of applying the Guidelines to other organizations. Another contributor praised the Guidelines for providing an interesting conceptual framework, a flexible guide, and a compendium of methods and questions that would also be useful in other evaluation contexts.

    The value of a designated evaluation criterion of Quality of Science 

    There was also consensus that the four QoS evaluation dimensions (design, input, process and output) were clear and useful, with well-defined indicators, especially when using a mixed-methods approach. One contributor noted that the dimensions capture a more exploratory, less standardized way of doing evaluations at the R4D nexus, enriching the depth of evaluative inquiry. Another contributor emphasized the building and leveraging of partnerships under the ‘processes’ dimension. A further contributor was excited about using the framework to design a bespoke evaluation system for her department. In addition, the three key evaluation questions recommended for evaluating QoS were considered appropriate for R4D projects.

    In the context of the ongoing evaluation of the CGIAR GENDER Platform, a contributor noted the usefulness of the Guidelines as a toolbox in an agricultural research for development (AR4D) context for situating QoS while assessing the key questions along five DAC evaluation criteria: relevance, effectiveness, efficiency, coherence, and sustainability. A key lesson from the evaluation team in applying the Guidelines was that they straddled both the evaluator's and the researcher's lenses, with subject matter experts helping to unpack the central evaluation questions mapped along the four QoS evaluation dimensions.

    Several contributors requested clarity on whether the Guidelines were useful for evaluating development projects. They were developed for evaluating R4D, in a context where co-designed research would be implemented in partnership with development stakeholders, who would then be in a position to scale innovations for development impact. While framed around R4D interventions, we consider the Guidelines flexible enough to be adapted for evaluating development projects with science or research elements: the four dimensions for evaluating QoS would allow scope to bring these out. A recent CGIAR workshop discussed the retroactive application of the Guidelines to evaluations of development interventions through two case studies: AVACLIM, a project implemented by FAO, and the Feed-the-Future AVCD-Kenya project led by ILRI. Both case studies showcased the wide applicability of the Guidelines.

    Several contributors also raised the importance of evaluating impact. While the scope of work of CGIAR's Independent Evaluation Function does not extend to evaluating impact, the Guidelines consider the possibility (see Figure 6) of assessing impact towards the SDGs and beyond. Likewise, in other contexts and organizations there may be wider windows for integrating a focus on impacts. The new Guidelines could be deployed three to five years after an intervention ends to assess progress in the uptake of technologies.

    Echoing the 2022 discussion, some contributions highlighted an inclusive or beneficiary focus in evaluations, namely an emphasis on communities, who may also be important stakeholders in research and innovation. In a development or R4D intervention, a stakeholder analysis permits identifying beneficiaries as key stakeholders, and using the ‘processes’ and ‘outputs’ dimensions would allow a nuanced account of their participation in, and benefits from, successful research and development activities.

    Facilitating Learning from Implementation and Uptake of the Guidelines

    Contributors raised issues related to the roll-out or use of the Guidelines, including:

    • Whether the single evaluation criterion of quality of science sufficiently captures the essence of research and development;
    • The usefulness of further clarifying the differences between process and performance evaluations;
    • The need to include assumptions, specifically those that have to hold for the outputs to be taken up by the client;
    • The importance of internal and external coherence;
    • The need to define appropriate inclusion and exclusion criteria when designing research evaluations;
    • The importance of defining the research context, which is given priority in the revised IDRC RQ+ framework.


    Several suggestions were made on how CGIAR can support the roll-out of the Guidelines with the evaluation community and like-minded organizations. Useful suggestions were also made about the need to build capacity to use the new Guidelines, including training sessions and workshops, online resources (webinars, collaborative platforms), mentoring partners, and piloting the Guidelines in case studies and upcoming evaluations. In particular, capacity development of relevant stakeholders to understand and use the Guidelines would support wider use and further engagement with the evaluation community.

    One contributor suggested conducting a meta-evaluation (perhaps a synthesis) of the usefulness of the Guidelines once CGIAR has used them to evaluate the portfolio of current projects. Notably, this is currently being done retrospectively with the previous portfolio of 12 major programs (implemented from 2012 to 2021), with improvements in the clarity and definition of outcomes. Further application of the Guidelines in process and performance evaluations across different contexts and portfolios will reveal more insights to further strengthen and refine this tool.

  • Thank you, Svetlana, for sharing the document and asking for our input. I have limited knowledge of CGIAR systems but I am sharing my observations based on my previous work on agriculture research and development programmes in the Asian context.

    Evaluating research and development in agriculture and associated natural resources management interventions is not straightforward. Hence, I commend the team's work in bringing together important themes in a concise and actionable form. I do, however, have some observations that might be useful for re-thinking how to make the Guidelines more inclusive and a better decision-making tool for stakeholders.

    I will limit myself to one question: ‘Do you think the Guidelines respond to the challenges of evaluating the quality of science and research in process and performance evaluations?’

    The Guidelines could focus more on systems perspectives and emergence:

    The document highlights the changing context for evaluation in CGIAR. It raises important issues related to the future of food security, with a mission to deliver science and innovation to transform food, land, and water systems in a climate crisis, and it also mentions transformative change. There is still room to integrate these important components into the actual conduct of evaluations.

    The Guidelines may need to go beyond technically driven evaluation towards inclusive, beneficiary-focused evaluation.

    The Guidelines mention some audiences and users (such as funders and implementing agencies), but there is little emphasis on communities, who may also be important stakeholders in research and innovation. There are many successful research and development activities (such as participatory plant breeding, participatory selection, and participatory technology prioritization) in which communities and farmers are key stakeholders. Their role seems somewhat missing from the Guidelines.

    The impact pathways of research and development interventions are long and unpredictable, so evaluation criteria and questions should embrace these aspects.

    Once research outputs are generated (and in some cases the research may not generate the expected outputs), the technology diffusion process can take a long time due to substantial development lags and the adoption process, which may delay the realization of the technology's impacts and benefits within the intervention period. This can also influence sustainability. Integrating these aspects could be a challenge in the research evaluation process.

  • The reflections are based on my experience as a co-Principal Investigator in the Interim Evaluation of Project REG-019-18, the Nudging for Good project.

    The project entails a research partnership between the International Food Policy Research Institute (IFPRI), Pennsylvania State University/Food and Agriculture Organization (FAO), the University of Ghana, the Thai Nguyen National Hospital, the Thai Nguyen University of Pharmacy and Medicine, and the National Institute of Nutrition in Viet Nam. This interdisciplinary team spans epidemiology, nutrition, economics, and machine learning, bringing together cutting-edge experience in Artificial Intelligence (AI) technology.

    The research partnership was founded on IFPRI's food systems experience, which has shown that the timely provision of information can effectively address the knowledge constraints that influence dietary choices. IFPRI also leads the research and takes on the responsibilities of data analysis and reporting of results. Pennsylvania State University/FAO was tasked with extending their existing AI platform with additional functionality for dietary assessments, including the capability to nudge adolescents towards improved dietary practices. The country teams (the University of Ghana, the Thai Nguyen National Hospital, the Thai Nguyen University of Pharmacy and Medicine, and the National Institute of Nutrition in Viet Nam) are responsible for the in-country validation and feasibility testing of the AI-based technology.

    The research entails developing, validating, and testing the feasibility of using AI-based technology that allows for accurate diagnostics of food intake. The research was based on the hypothesis that food consumption and diet-related behaviours will improve if adolescents are provided with tailored information that addresses their knowledge-related barriers to healthy food choices.

    Based on the nuances of these research partnerships, and the objectives of the evaluation, we adopted Relevance and Effectiveness from the OECD/DAC evaluation criteria and slightly redefined them to align with the Research Fairness Initiative (RFI). Why RFI?

    Lavery & IJsselmuiden (2018) and other scholars highlighted the fact that structural disparities like unequal access to research funding among researchers and research institutions and differences in institutional capacity capable of supporting research partnerships shape the ethical character of research, presenting significant challenges to fair and equitable research partnerships between high-income countries (HICs) and low and middle-income countries (LMICs). 

    In response to these challenges, the Research Fairness Initiative (RFI) was created and pilot-tested with leading research institutions around the world to develop research and innovation system capacities in LMIC institutions through research collaboration and partnerships with HIC institutions (COHRED, 2018c). As a reporting system and learning platform, the RFI increases understanding and sharing of innovations and best practices, while improving the fairness, efficiency, and impact of research collaborations with institutions in LMICs (COHRED, 2018c). The RFI is thus geared towards supporting improved management of research partnerships, creating standards for fairness and collaboration between institutions and research partners, and building stronger global research systems capable of supporting health, equity, and development in LMICs (COHRED, 2018a). Reporting on research fairness has also been positively associated with opportunities to measure the relationship between the quality of research partnerships and the impact of the research itself, thus creating a platform for program planning, design, management, and evaluation that could have a significant impact on the ethics and management of research programs (Lavery & IJsselmuiden, 2018).

    Lavery & IJsselmuiden (2018) emphasized that evaluative efforts of research fairness, therefore, need to clarify and articulate the factors influencing fairness in research partnerships, apply a methodology capable of operationalizing the concept of research fairness and through the collection of systematic empirical evidence, demonstrate how research partnerships add value for participating organizations.

    Based on the above premises, and reading through the CGIAR QoR4D Evaluation Guidelines, below are my reflections:  

    1. The three key evaluation questions recommended in the Guidelines are appropriate for evaluating the Quality of Science (QoS), based on my reflection on the evaluation questions we used to evaluate the Nudging for Good project.
    2. The four interlinked dimensions (Research Design, Inputs, Processes, and Outputs) are clear and useful, since they capture a more exploratory, less standardized way of doing academic evaluations: evaluative inquiry.
    3. Training and development, as well as closer engagement among the relevant stakeholders, could be an appropriate starting point for CGIAR to support the roll-out of the Guidelines.
  • Dear members/colleagues,

    My EvalForward contribution.

    Today the world’s politics and economy are driven by science and technology. The COVID-19 pandemic, the HIV epidemic, global market policies, internet governance and the like all require analytical tools for effective evaluation. So the question is: can monitoring and evaluation measure, and keep up with, the science and technology era?

    Innovation has brought the world to this level of development. Therefore, innovative methodologies and data science applied in science and technology should also be used for monitoring and evaluating the progress of programs and outcomes.

    Moreover, research should help evaluation become a more analytical tool for implementation strategies. Since evaluation is science-oriented, both because it uses research methodologies and because its purpose is to bring improvement, it should be able to generate innovative ideas and concepts from a global economy perspective.

    Therefore, the importance and roles of evaluation in science and technology and in programme implementation and outcomes should be emphasized in the guidelines.

    Thank you.


  • Oluchi Ezekannagha

    Programs Analyst, CGIAR System Office

    What are additional data collection and analysis methods that should be elaborated for evaluating science and research in process and performance evaluations?

    Beyond traditional methods such as surveys, interviews, and document review, newer approaches like data mining, sentiment analysis, social network analysis, and bibliometric analysis could be employed for more comprehensive and diversified insights.
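To make the last of those methods concrete, a bibliometric indicator pass can be sketched in a few lines of Python. The publication records and author names below are purely hypothetical; a real analysis would draw on a citation database such as Scopus or Web of Science.

```python
# Minimal sketch of bibliometric indicators over a (hypothetical) set
# of publication records: output quantity, citation impact, and
# co-authorship network connections.
from collections import Counter
from itertools import combinations

publications = [
    {"year": 2021, "citations": 12, "authors": ["A", "B"]},
    {"year": 2021, "citations": 3,  "authors": ["B", "C"]},
    {"year": 2022, "citations": 7,  "authors": ["A", "C", "D"]},
]

# Output quantity: publications per year.
per_year = Counter(p["year"] for p in publications)

# Impact: total and mean citations across the set.
total_citations = sum(p["citations"] for p in publications)
mean_citations = total_citations / len(publications)

# Network connections: how often each author pair co-publishes.
coauthor_edges = Counter()
for p in publications:
    for pair in combinations(sorted(p["authors"]), 2):
        coauthor_edges[pair] += 1

print(per_year[2021], total_citations, coauthor_edges[("A", "C")])  # → 2 22 1
```

The same edge counts could feed a network library for centrality or clustering measures, and quartile or field-normalized indicators would come from the database rather than being computed locally.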

    How can CGIAR support the roll-out of the Guidelines with the evaluation community and like-minded organizations?

    CGIAR could organize rolling webinars, workshops, partnerships between CGIAR and academic institutions, and training sessions to familiarize its own and like-minded organizations with the guidelines. Collaborative platforms could be set up for ongoing discussion, learning, and sharing of best practices. Furthermore, providing case studies and real-life examples of how these guidelines have been applied, and the results they have yielded, could help in their adoption.

    Best wishes


  • Thank you, Svetlana, for the opportunity to participate in this discussion. I respond to two of your questions below.

    Do you think the Guidelines respond to the challenges of evaluating quality of science and research in process and performance evaluations?

    The Guidelines appear to respond to the challenges of evaluating quality of science and research in process and performance evaluations through a flexible and well-researched framework. I am not sure if a single evaluation criterion captures the essence of research and development. I think the answer would be found in reflecting on its application in upcoming varied evaluative exercises at CGIAR, as well as reflection on previous organizational experience. This may involve identifying how it is interpreted in different contexts, and whether further development of recommended criteria may be considered for a possible second version of the Guidelines.

    How can CGIAR support the roll-out of the Guidelines with the evaluation community and like-minded organizations?

    I agree with others that workshops and/or training on the Guidelines could be a means for rolling out the Guidelines and engaging with the evaluation community. Emphasizing its flexibility and fostering reflection on its use in different organizational contexts would be productive.

    In line with my response to the first question above, I would suggest a meta-evaluative exercise be done when there has been more organizational experience in applying the Guidelines. There would be obvious value for CGIAR, possibly leading to an improved second version. It would also be of great value to the evaluation community, with CGIAR taking an important role in facilitating continued learning through the use of meta-evaluation -- what the evaluation theorist Michael Scriven has called both an important scientific and moral endeavor for the evaluation field.

    At Western Michigan University, we are engaged in a synthesis review on meta-evaluation practice over a 50-year period. We’ve come up with many examples of meta-evaluation of evaluation systems in different contexts. We assumed very little meta-evaluation was being done and were surprised to find there are plenty of interesting examples in both the grey and academic literature. Documenting such meta-evaluative work would further strengthen the Guidelines and its applicability as well as add significant value in continued engagement with the international evaluation community.

  • I participated in a workshop in Rome, Italy on February 27 and 28, 2023, which focused on CGIAR's Independent Advisory and Evaluation Service (IAES) and its new evaluation guidelines. These guidelines are based on the Quality of Research for Development (QoR4D) Frame of Reference of the CGIAR Independent Science for Development Council (ISDC). They provide a framework for assessing QoR4D within CGIAR and similar organizations, including criteria, dimensions, and methods.

    During the workshop, I shared my own experience in evaluating research and science while co-constructing the Agriculture, Water & Climate strategy. The bibliometric analysis for the AWC strategy followed a bottom-up approach in two phases. The first phase involved mapping existing AWC projects at UM6P (Université Mohammed VI Polytechnique). To validate the results, a triangulation process using survey findings and publication databases was conducted. However, integrating the project database with other data sources proved challenging. The second phase included benchmarking UM6P against international peers in Agriculture, Water & Climate fields using bibliometric indicators such as publication quantity, impact (citations, quartiles), and network connections. Since the university is relatively new, the analysis primarily focused on scientific publications.

    Before attending the workshop, I was unfamiliar with CGIAR, ISDC, and the Evaluation Function. However, the workshop inspired me to understand the evaluation framework better. I am now motivated to implement changes and develop a new methodology to measure research quality and impact for AWC initiatives.

    Looking ahead, two actions are proposed. First, adopting CGIAR criteria for research assessment is suggested as a new approach to measure research quality and impact. Second, assessing projects and their outputs is recommended to evaluate the effectiveness and outcomes of Agriculture, Water & Climate initiatives. These actions will also aid in designing our evaluation framework, considering the dimensions of QoS, such as the relevance and effectiveness of research programs/projects, the participation and transparency of partnerships, and the impact and sustainability of research program/project outputs.

  • Mariafernanda CATAÑO MORA

    Project Management Specialist and Grant Team Coordinator, International Maize and Wheat Improvement Center (CIMMYT), CGIAR

    After participating in the IAES workshop, I have now had the opportunity to review the guidelines thoroughly in both English and Spanish. I am very pleased with these guidelines; they provide a solid foundation at the operational level. When we initially started discussing the Quality of Research for Development framework, I was particularly concerned about how my role could contribute to this endeavor; it all seemed very high-level. But after reading this document, I can envision myself implementing and adopting these practices in my job.

    Overall, I am highly satisfied with the document; it was carefully crafted, well-translated, and resourceful. Excellent work!

    I enter this rich discussion from the point of view of my experience managing the ongoing evaluation of the CGIAR GENDER (Generating Evidence and New Directions for Equitable Results) Platform, which is being coordinated by IAES. From this vantage point, I explore questions 2, 3 and 1 in some detail, beginning with an overview of the evaluation context and the design of the evaluation, and capping with one key takeaway from applying the guidelines.


    The guidelines present four interlinked dimensions (research design, inputs, processes and outputs) which consider the many variables in the delivery and uptake of high-quality research, framed by the QoR4D frame of reference and the OECD DAC criteria. Their application is by no means linear. The ongoing GENDER Platform evaluation served as a test case. The evaluation aims to assess the Platform's progress, document lessons learned, and provide forward-looking recommendations as the Platform transitions to an expanded mandate as an impact platform.

    In answering the central evaluation questions, although the evaluation was not framed around an explicit “quality of science” (QoS) criterion, the guidelines were a useful toolbox in an agricultural research for development (AR4D) context for situating QoS while assessing the key questions along five DAC evaluation criteria: relevance, effectiveness, efficiency, coherence, and sustainability. The Platform evaluation, conducted by a multidisciplinary team led by an evaluator, integrated participatory, theory-driven, utilization-focused and feminist approaches and deployed mixed methods in data collection.

    By way of context, the GENDER Platform synthesizes and amplifies research, builds capacity, and sets directions to enable CGIAR to have an impact on gender equality, opportunities for youth, and social inclusion in agriculture and food systems. The Platform is organized around three interconnected modules (Evidence, Methods and Alliances). The guidelines were applied to the Evidence module, which aims to improve the quantity and quality of gender-related evidence.


    In terms of the evaluation design, in line with the inception report, the evaluation team developed sub-evaluation matrices that addressed the Platform modules' impact pathways and results frameworks. These sub-matrices fed into an overarching parent evaluation matrix. The matrices, the overarching matrix and other outputs were reviewed by a team of external peer reviewers, including some members of IAES's evaluation reference group, and by the Platform team to strengthen their validity. The reviews informed subsequent revisions of the documents.

    The four QoS dimensions have been integral in helping to evaluate the Evidence module: the four dimensions were mapped to the focal evaluation criteria. Subject matter experts who led the Evidence module assessment systematically applied them in a nested manner, based on the mapping they had conducted. Each of the Platform's three module assessments then fed into the overarching Platform evaluation in a synergistic manner.


    From this test case, one of several takeaways is that the convergence of different lenses is pivotal in applying the guidelines. The multidisciplinary evaluation team in this case benefited from both an evaluator's lens (the team was led by an evaluator) and a researcher's lens (subject matter experts who were gender researchers led the assessment of the Evidence module). In applying the guidelines, the evaluation team straddled both perspectives to unpack the central evaluation questions mapped along the four QoS dimensions. Although multidisciplinary evaluation teams may not always be feasible in some contexts, such multidisciplinarity may prove handy in applying the guidelines. However, it is essential that such teams invest sufficient time in capacity sharing and cross-learning to shorten the learning curve needed for the convergence required to effectively assess QoS, or to mainstream it along the standard OECD DAC criteria as was done in this case. The guidelines (and other derivative user-friendly products) can serve as a ready-to-use resource in both cases.

    High-quality research can be as challenging to assess as it is to deliver. Researchers, program managers, and other actors may also find the guidelines useful as a framing tool for thinking through evaluator perspectives at the formative and/or summative stages of the research or programming value chains for more targeted implementation and programming strategies. Application of the guidelines in process and performance evaluations across different contexts and portfolios will reveal insights to further strengthen and refine the tool.

    Finally, the GENDER Platform evaluation report and the Evidence module assessment, which detail the application of the guidelines, are soon to be released by IAES.

  • Dear Svetlana,

    With reference to the email dated 7 June 2023, and in response to the questions you raised on 10 May 2023 on the EvalForward platform, I am attaching my opinion in a Word file for your perusal.


    Parshuram Samal

    Ex-Principal Scientist, ICAR-NRRI, India

  • Dear all,

    I appreciate the CGIAR Evaluation Guidelines as a reference framework providing insights, tools and guidance on how to evaluate the quality of science, including in the context of development projects with scientific and research components. This is specifically my perspective as an evaluator of development projects that may include research components or the development of scientific tools to enhance project effectiveness in the agricultural sectors. I should preface that I have not analyzed the guidelines and related documents in depth and that I am external to CGIAR. However, I believe the guidelines are an important contribution.

    In the absence of similar guidelines to evaluate the quality of research and science, I realize that my past analysis was somewhat scattered across the 6 OECD/DAC criteria, even though encompassing most of the dimensions included in the guidelines. Under the criterion of relevance, for example, I analyzed the rationale and added value of the scientific products and the quality of their design; as well as the extent of “co-design” with local stakeholders, which the guidelines frames as “legitimacy”, within the QoS criterion. Under efficiency, I analyzed the adequacy of inputs of research, the timely delivery of research outputs, the internal synergies between research activities and other project components, and the cost-efficiency of the scientific products. Most of the analysis focused on the effectiveness, and usefulness of the scientific tools developed, and on potential sustainability of research results. It was more challenging to analyzescientific credibility”, in the absence of subject-matter experts within the evaluation team. This concept was analyzed mostly basing on stakeholders’ perceptions through qualitative data collection tools. Furthermore, scientific validation of research and scientific tools is unlikely to be achieved within the project common duration of 3 years. Therefore, evaluations may be conducted before scientific validation occurs. The guidelines’ four dimensions are clear enough and useful as a common thread for developing evaluation questions. I would only focus more on concepts such as “utility” of the scientific tools developed from the perspective of project final beneficiaries; “uptake” of scientific outputs delivered by stakeholders involved and “benefits” stemming from research and/or scientific tools developed. In the framework of development projects, scientific components are usually quite isolated from other project activities, with few internal synergies. 
In addition, uptake of scientific outputs and replication of results is often an issue. I think this is something to focus on clearly through appropriate evaluative questions. For example, the QoS evaluation questions (section 3.2) do not focus enough on these aspects. EQ3 addresses how research outputs contribute to advancing science, but not how they contribute to development objectives, how research findings are applied on the ground or in policy development, or what the impact of the outputs delivered is; in my opinion, these questions deserve increased focus and practical tools and guidance. Beyond this, which is based on my initial reading of the guidelines, I will pilot their application in upcoming evaluations where the evaluand includes research components. This will help fine-tune the operationalization of the guidelines through practical experience.

  • Many thanks for sharing the document and for the opportunity to comment. I looked at the guidelines from the perspective of an evaluator with limited knowledge of the CGIAR system embarking on a new evaluation.

    For someone in my position, the guidance provides accessible background on the CGIAR approach, definitions, and so on, along with useful links to other relevant materials. Overall, it offers an interesting conceptual framework in Ch 3, a flexible guide in Ch 4, and a compendium of methods and questions that would also be useful in other evaluation contexts.

    It’s challenging to design a streamlined approach to look at the impact of such a wide range of research outputs over extended timeframes. There is perhaps too much emphasis on uptake via formal academic publications. This is somewhat balanced by efforts to look at impacts in real time in systems research, and there are some excellent CGIAR examples of multidisciplinary studies in this regard. I didn’t see any reference to the use of evaluation case studies. These proved useful in grounding an earlier CGIAR programme evaluation I was involved with, and they captured pathways to development outcomes that may not be reflected in formal literature or project reporting.

  • I think the guidelines are a set of recommendations for evaluating projects or organisations. If CGIAR wants to help roll them out, they could perhaps promote them to their community and the organisations they support.

    • They could also organise training sessions to help people understand how to apply the guidelines in their day-to-day work.
    • They could create online resources, such as videos or guides, to help people better understand the guidelines and how to apply them.
    • They could also work with partners to develop tools and methodologies for evaluating projects or organisations using the guidelines
    • They could hold events to share examples of projects or organisations that have successfully applied the guidelines and to discuss lessons learned.
    • They could work with partners to develop mentoring programmes to help organisations apply the guidelines and improve over time.
    • Finally, they could organise events to promote the guidelines and create opportunities for stakeholders to meet and exchange ideas on how to apply them in their work.


    As an agricultural technician with extensive experience, I would suggest helping to set up resource people who can explain the guidelines to stakeholders and answer their questions. CGIAR could also contribute to the creation of resources that help people better understand the guidelines and their application, work with other organisations to develop tools and methodologies for assessing projects or organisations using the guidelines, and organise events to promote the guidelines and create opportunities for stakeholders to meet and exchange ideas on how to apply them in their work.

  • How can CGIAR support the roll-out of the Guidelines with the evaluation community and like-minded organizations?

    I believe that CGIAR can help like-minded organizations use the guidelines by emphasizing its best feature—flexibility.

    Flexibility is necessary. The guidelines were informed by the work of CGIAR, which is tremendously varied. A common evaluation design would not be appropriate for CGIAR. Neither would it be appropriate for most like-minded organizations.

    Flexibility is a middle ground. Instead of using a common evaluation design, each project might be evaluated with a one-off bespoke design. Often this is not practical: the cost and effort of individualization limit the number, scope, and timeliness of evaluations. A flexible structure is a practical middle ground. It suggests what like-minded organizations and their stakeholders value and provides a starting place when designing an evaluation.

    Flexibility serves other organizations. The very thing that makes the guidelines useful for CGIAR also makes them useful to other organizations. Organizations can adopt what is useful, then add and adapt whatever else meets their purposes and contexts.

    Perhaps CGIAR could offer workshops and online resources (including examples and case studies) that suggest how to select from, adapt, and add to its criteria. It would not only be a service to the larger community, but a learning opportunity for CGIAR and its evaluation efforts.

  • Thanks, Seda, for your important question. As the Guidelines state several times, they were informed by the International Development Research Centre (IDRC) RQ+ Assessment Instrument. Hence, some useful ideas and suggestions from a development organization are an integral part of the Guidelines.

    Perhaps the easiest way to answer your question is to use Table 7 on p. 19, “Qualitative data themes, indicators per Quality of Science dimension with assessment criteria”. This table was developed for evaluating CGIAR research for development projects, yet as far as I can see, most of the themes and indicators of quality in a science-based research for development project are just as relevant to evaluating quality in a development project. Under design, as an evaluator I would want to know whether the design was coherent and clear and whether the methodologies fit the planned interventions. Under inputs, I would look at the skill base and diversity of the project team, whether the funding available was sufficient to complete the project satisfactorily, and whether the capacity building was appropriate for planned activities and would be sufficient to sustain impact after the project finished. Under processes, my main questions would concern the recognition and inclusiveness of partnerships, whether roles and responsibilities were well defined, and whether there were any risks or negative consequences to be aware of. Finally, under outputs, I would be interested in whether the communication methods and tools were adequate, whether planned networking included engagement of appropriate and needed stakeholders, whether the project was sufficiently aware of whether the enabling environment was conducive to its success, whether links were being made with policy makers where relevant, and whether scaling readiness was part of stakeholder engagement.

    Section 4 of the Guidelines, on the key steps in evaluating quality of science in research for development, proposes methods that are also relevant to development projects. These include document review, interviews, focus group discussions, social network analysis, the Theory of Change, and the use of rubrics to reduce subjectivity when using qualitative indicators. The use of rubrics is a cornerstone of the IDRC RQ+ Assessment Instrument.

  • Do you think the Guidelines respond to the challenges of evaluating quality of science and research in process and performance evaluations?

    As an international evaluation expert, I am fortunate to evaluate a large range of projects and programs covering research (applied and non-experimental), development and humanitarian interventions. Over the past decade, I have had opportunities to employ various frameworks and guidelines to evaluate CGIAR projects and program proposals, especially with the World Agroforestry Centre (ICRAF) and the International Institute of Tropical Agriculture (IITA) in Central Africa (Cameroon and Congo). For example, when leading the final evaluation of the Sustainable Tree Crops Programme, Phase 2 (PAP2CP) managed by IITA-Cameroon, the team and I revised the OECD-DAC framework and criteria to include a science criterion addressing research dimensions such as inclusion and exclusion criteria.

    When designing high-quality research protocols for a science evaluation, establishing inclusion and exclusion criteria for study participants is a standard and required practice. Inclusion criteria define the key features of the target population that the evaluators will use to answer their research question (e.g. demographic and geographic characteristics of the targeted locations in the two regions of Cameroon). These are important for understanding the area of research and gaining better knowledge of the study population. Conversely, exclusion criteria cover features of potential study participants who meet the inclusion criteria but present additional characteristics that could interfere with the success of the evaluation or increase their risk of an unfavorable outcome (e.g. characteristics that make eligible individuals highly likely to be lost to follow-up, miss scheduled data collection appointments, provide inaccurate data, have comorbidities that could bias the results of the study, or be at increased risk of adverse events). These criteria can to some extent be considered part of the cross-cutting themes, but they are still not covered by the OECD-DAC evaluation criteria and framework, and can therefore become a challenge when evaluating the quality of science/research in a performance evaluation.

    Are four dimensions clear and useful to break down during evaluative inquiry (Research Design, Inputs, Processes, and Outputs)? 

    A thorough review of the four dimensions shows that they are clear and useful, especially when dealing with a mixed methods approach involving both quantitative and qualitative methods and adequate indicators. However, given that context and rationale are always the best drivers of objectivity for the research design and research processes, including the collection of reliable and valid data/evidence to support decision-making, it is very important that evaluators not only define appropriate inclusion and exclusion criteria when designing science research but also evaluate how those decisions will affect the external validity of the expected results. On the basis of these inclusion and exclusion criteria, we can then make a judgment regarding their impact on external validity. Making such judgments requires in-depth knowledge of the area of research (context and rationale), as well as of the direction in which each criterion could affect the external validity of the study (in addition to the four dimensions).

    Serge Eric

  • Dear Seda, what a great contribution, thanks. Proving the quality of science is important, yet insufficient. It, and the explanations around it, fall short for an organisation claiming its research programme is for development. The guidelines feel faint on this. As you say, and as I alluded to in my response, you want them to be a stronger, more compelling read.

  • I would like to complement the interventions below, and also the FAQ question about the quality of research vs. the development programme. As hinted by their title, the guidelines focus primarily on the scientific aspect. Of note here is the “outputs” dimension, which refers to the quality of research outputs and contributions to the advancement of science. I think the authors could more clearly identify how concrete development results related to a particular research area can be considered as well. This is still not clear to me in the other three dimensions. Could you perhaps point us to the relevant parts of the guidelines on this?

  • Do you think the Guidelines respond to the challenges of evaluating quality of science and research in process and performance evaluations?

    On February 27 and 28, 2023, I attended a workshop in Rome, Italy, about the new set of evaluation guidelines from CGIAR’s Independent Advisory and Evaluation Service (IAES). These build on the CGIAR Independent Science for Development Council (ISDC)’s Quality of Research for Development (QoR4D) Frame of Reference, and provide the framing, criteria, dimensions, and methods for assessing QoR4D, both within CGIAR and in other like-minded organizations. The hybrid online and in-person event was designed to help practitioners across and beyond the CGIAR system understand and apply the new guidelines in their own evaluative contexts.

    I found the workshop informative, resourceful and impactful. The main lessons learnt/takeaways for me from the workshop were:

    • Improved evaluation processes to assess the success and effectiveness of quality of science to provide evidence for policymaking;
    • The value of exchange of skills/experience between facilitators and participants for a particular evaluation project;
    • Sharing and documenting best practices, drawing on the knowledge and experience of IAES;
    • The development of working ‘standards’ or principles to ensure effective engagement with donors and relevant actors; and
    • Supporting and advocating for public funding opportunities and supporting government by identifying where we can build capacity for effective innovation support and how we can effectively monitor and evaluate publicly funded projects.

    One challenge remains: how to apply the QoR4D Frame of Reference when evaluating contributions to the SDGs.

    I will use my takeaways from the workshop in my next book chapter entitled Nature-based solution to preserve the wetlands along the critical zone of River Nyong as well as in my MSc lecture.

    Dr Norbert Tchouaffe
    Pan-African institute for Development

    1. Do you think the Guidelines respond to the challenges of evaluating quality of science and research in process and performance evaluations?

    Having been involved in evaluating CGIAR program and project proposals as well as program performance over the past decade, I have used an evolving range of frameworks and guidelines. For the 2015 Phase I CRP evaluations, we used a modified version of the OECD-DAC framework including the criteria relevance/coherence, effectiveness, impact and sustainability. The lack of a quality of science criterion in the OECD-DAC framework was addressed, but quality of science was evaluated without designated elements or dimensions. Partnerships were evaluated as cross-cutting, and the evaluation of governance and management was not directly linked to the evaluation of quality of science. For the 2020 Phase II CRP evaluative reviews, we used the QoR4D Frame of Reference with the elements relevance, credibility, legitimacy and effectiveness, together with three dimensions: inputs, processes and outputs. Quality of science was firmly anchored in the elements credibility and legitimacy, and all three dimensions had well-defined indicators. During the 2020 review process, the lack of a design dimension was highlighted with regard to its importance in evaluating coherence, methodological integrity and fitness, and the comparative advantage of CGIAR in addressing both global and regional problems.

    The beta version of the Evaluation Guidelines encapsulates all of these valuable lessons learnt from a decade of evaluations and, in this respect, it responds to the challenges of evaluating quality of science and research in process and performance evaluations. During its development, other evaluation frameworks and guidelines were also consulted to gain a greater understanding of evaluating both research and development activities. As a result, it is flexible and adaptable, and thus useful and usable by research for development organizations, research institutes and development agencies.

    Recently, the Evaluation Guidelines were used retrospectively to revisit the evaluative reviews of 2020 Phase II CRPs with a greater understanding of qualitative indicators in four dimensions. Application of the Guidelines provided greater clarity of the findings and enhanced the ability to synthesize important issues across the entire CRP portfolio.

    2. Are four dimensions clear and useful to break down during evaluative inquiry (Research Design, Inputs, Processes, and Outputs)? (see section 3.1)

    The four dimensions are clear and useful, especially if accompanied by designated criteria with well-defined indicators. They are amenable to a mixed methods evaluation approach using both quantitative and qualitative indicators. In addition, they provide the flexibility to use the Guidelines at different stages of the research cycle: from the proposal stage, where design, inputs and planned processes would be evaluated, to mid-term and project completion stages, where outputs become more important.

    3. Would a designated quality of science (QoS) evaluation criterion capture the essence of research and development (section 3.1)?

    From my own use of the quality of science criterion with intrinsic elements of credibility (robust research findings and sound sources of knowledge) and legitimacy (fair and ethical research processes and recognition of partners), it captures the essence of research and research for development. Whether it will capture the essence of development alone will depend on the importance of science to the development context.

  • Dear Svetlana,

    Hi and thanks for the opportunity to comment on the guidelines. I enjoyed reading them, yet only had time to respond to the first two questions.

    My responses come with a caveat: I do not have a research background. Yet I observed, during a time I worked with agricultural scientists, that the then-current preoccupation with assessing impact among the ultimate client group, as gauged by movements in the relative values of household assets, tended to mask the relative lack of information and interest about the capacity and capabilities of local R&D/extension systems before, during, and after investment periods. Their critical role in the process often got reduced to being treated as assumptions or risks to “good” scientific products or services.

    This made it difficult to link any sustainable impact among beneficiaries with information on institutional capacity at the time research products were being developed. It may also explain why believing in (hopelessly inflated) rate-of-return studies required a suspension of disbelief, thus compromising the prospects that efforts to assess the impact of research would make much difference among decision-makers.

    Moving on: my responses to two of your questions follow; I hope you find some of them interesting, useful even.

    1.    Do you think the Guidelines respond to the challenges of evaluating quality of science and research in process and performance evaluations?

    Responding to this question assumes/depends on knowing the challenges to which the guidelines refer. In this regard, Section 1.1 is a slightly misleading read given the title. Why?

    The narrative neither spells out how the context has changed nor, therefore, how and why these changes pose challenges to evaluating the Quality of Science. Rather, it describes CGIAR’s ambition for transformative change across system transformation - a tautology? - resilient agri-food systems, genetic innovation, and five unspecified SDGs. It concludes by explaining that, while CGIAR funders focus on development outcomes, the evaluation of CGIAR interventions must respond to both the QoR4D framework - research oriented to deliver development outcomes - and the OECD/DAC framework - development orientation.

    The reasons given for the insufficiency of the six OECD/DAC criteria in evaluating CGIAR’s core business - the unpredictable and risky nature of research and the long time it takes to witness outcomes - do not appear peculiar to CGIAR’s core business relative to other publicly funded development aid. Yes, it may take longer given the positioning of the CG system but, as we are all learning, operating environments are as inherently unpredictable as the results. Context matters. Results defy prediction; they emerge. Scientific research, what it offers, and its developmental effect are arguably not as different as the guidelines suggest. Regarding the evaluation of scientific research, the peculiarity lies in whom CGIAR employs and the need to ensure a high standard of science in what they do - its legitimacy and credibility. The thing is, it is not clear how these two elements, drawn from the QoR4D frame of reference, cover the peculiarities of CGIAR’s core business and so fill the gap left by the six OECD/DAC criteria. Or am I missing something?

    The differences between Process and Performance Evaluations are not discernible as defined at the beginning of Section 2.2. Indeed, they appear remarkably similar - so much so that I asked myself why have two when one would do. Process Evaluations read as summative self-assessments across CGIAR, and outcomes are in the scope of Performance Evaluations. Performance Evaluations read as more formative, yet repeat lines of inquiry similar to Process Evaluations - assessing organisational performance, operating models and processes: the organisational functioning, instruments, mechanisms and management practices, together with assessments of experience with CGIAR frameworks, policies, etc. There is no mention of assumptions - why, given the “unpredictable and risky nature of research”? Assumptions, by proxy, define the unknown, and for research managers and (timely) evaluations they should be afforded an importance no less than the results themselves. See below.

    The explanation of the differences between the Relevance and Effectiveness criteria as defined by OECD/DAC and by QoR4D in Table 2 is circumscribed. While the difference concerning Relevance explicitly answers the question “why CGIAR?”, that for Effectiveness is far too vague (to forecast and evaluate). What is so limiting about the reasons why CGIAR delivers knowledge, products, and services - to address a problem and contribute to innovative solutions - that they cannot be framed as objectives and/or results? Especially when the guidelines claim Performance Evaluations will be assessing these.

    2. Are four dimensions clear and useful to break down during evaluative inquiry (Research Design, Inputs, Processes, and Outputs)? (see section 3.1)

    This section provides a clear and useful explanation of the four interlinked dimensions – Research Design, Inputs, Processes, and Outputs in Figure 3 that are used to provide a balanced evaluation of the overall Quality of Science. 

    A few observations:

    “Thinking about Comparative Advantage during the project design process can potentially lead to mutually beneficial partnerships, increasing CGIAR’s effectiveness through specialization and redirecting scarce resources toward the System’s relative strength.”

    1)    With this in mind, and as mentioned earlier in section 2.3, it would be useful to explain how the research design includes proving, not asserting, that CGIAR holds a comparative advantage by going through the four-step process described in the above technical note - steps that generate evidence with which to claim CGIAR does or does not have a comparative advantage, leading to a go/no-go investment decision.

    2)    Table 3 is great in mapping the QoS’s four dimensions onto the six OECD/DAC criteria, and I especially liked the note below it on GDI. It remains unclear to me, however, why the Coherence criterion stops at inputs and limits its use to internal coherence. External coherence matters as much, if not more, especially concerning how well and to what extent the outputs complement, and are harmonised and coordinated with, the work of others and add value further along the process.

    3)    While acknowledging the centrality of high scientific credibility and legitimacy, it is of equal importance to also manage and coordinate processes to achieve and maintain the relevance of the outputs as judged by the client. 

    4)    I like the description of processes, especially the building and leveraging of partnerships.

    5)    The scope of enquiry for assessing the Quality of Science should also refer to the assumptions, specifically those that have to hold for the outputs to be taken up by the client organisation, be it a National Extension Service or someone else. Doing this should not be held in abeyance until an impact study or performance evaluation. I say this because, as mentioned earlier, the uncertainty and unpredictability associated with research lie as much in the process leading up to delivering outputs as in managing the assumption that movement along the impact pathway will continue once the outputs have been “delivered”. This must not be found out too late; checking assumptions early helps mitigate the risk of rejection. Scoring well on the Quality of Science criterion does not guarantee the product or service is accepted and used by the client, remembering that it is movement along the pathway, not the QoS alone, that motivates those who fund CGIAR.