RE: Artificial intelligence in the context of evaluation

Dear Muriel and Colleagues,

Thank you for the questions and insights. I would like to share some experiences from working on a large database (thousands of projects) from which a portfolio had to be extracted for an impact evaluation. The methodology used machine learning algorithms, a branch of AI.

The approach was two-fold: 1) a machine learning algorithm was developed by experts, and 2) a semi-manual keyword search was performed. In the first case, the portfolio turned out smaller than expected; the projects were highly relevant to the topic of interest, but the portfolio was too small to support robust statistics. In the second approach, the portfolio was much larger, but many projects had to be removed from the dataset because they were only marginally related to the topic of interest. Expert guidance was needed to define the keywords and refine the portfolio, and a programming expert was needed to develop a customized application. Subsequent activities proved very fruitful, using language-based processing of the project documents and of the evidence available on the web (web scraping, including social media).
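For readers curious about what these two approaches can look like in practice, below is a minimal Python sketch (not our actual application, which was custom-built): it contrasts a simple keyword filter with a small supervised text classifier (TF-IDF features with logistic regression, here via scikit-learn). All column names, keywords and labelled examples are made up for illustration.

```python
# A minimal, hypothetical sketch of the two portfolio-extraction approaches:
# (1) a semi-manual keyword filter and (2) a supervised text classifier.
# Column names, keywords and labels are illustrative only.

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical project database: one row per project, free-text description.
projects = pd.DataFrame({
    "project_id": [1, 2, 3, 4, 5],
    "description": [
        "Climate adaptation through improved water management",
        "Road construction engineering in rural districts",
        "Climate-smart agriculture training for smallholders",
        "Municipal waste collection and recycling services",
        "Renewable energy and climate mitigation planning",
    ],
})

# --- Approach 2: semi-manual keyword search (broad, needs manual cleaning) ---
keywords = ["climate", "adaptation", "mitigation"]  # defined with expert guidance
pattern = "|".join(keywords)
keyword_portfolio = projects[
    projects["description"].str.contains(pattern, case=False, regex=True)
]

# --- Approach 1: supervised classifier (precise, but needs labelled examples) ---
# A handful of expert-labelled descriptions (1 = in scope, 0 = out of scope).
train_texts = [
    "Climate adaptation plan for coastal communities",
    "Bridge construction and road maintenance",
    "Greenhouse gas mitigation in the energy sector",
    "Urban waste water treatment plant upgrade",
]
train_labels = [1, 0, 1, 0]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

projects["predicted_in_scope"] = model.predict(projects["description"])
ml_portfolio = projects[projects["predicted_in_scope"] == 1]

print("Keyword-based portfolio:", keyword_portfolio["project_id"].tolist())
print("Classifier-based portfolio:", ml_portfolio["project_id"].tolist())
```

In our case the pattern was similar to what this toy example suggests: the classifier produced the smaller, more precise portfolio, while the keyword filter produced the larger one that then needed manual cleaning.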

The following methodological challenges were observed:

  • Language bias - the approach proved more efficient where English dominates (in project reporting, media and other communications) and in countries where it is actively used in daily life. Semantic complexity, which can differ greatly across languages, requires different algorithms, some more sophisticated than others.
  • Project jargon - it can vary greatly from project to project, and some buzzwords are used interchangeably. Donors also word their agendas differently, which needs to be reflected in the design of the algorithms. A project may be classified as climate-related yet be far more focused on construction engineering, water, waste etc., which also affects how the machine handles the semantics (a small sketch of how this can be handled follows this list).
  • Availability of data on the web - data tends to be more abundant for recent projects than for older ones. It can also be uneven, depending on how much content each project produces and shares.
  • Black-box phenomenon - at some point evaluators may lose oversight of how the algorithms work, which can pose challenges to security and governance.
  • Database architecture - this should already be considered when datasets and databases are developed for reporting purposes during project implementation. The structure and content of a database, including errors such as typos, are of paramount importance for how efficiently AI can work with it.
  • Costs - since open-source software can pose security challenges, it may be worth investing in customized application development and in support from IT experts.
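
To make the jargon and typo points above more concrete, here is a small, purely illustrative Python sketch (again, not the application we built): it maps interchangeable donor wording to canonical keywords and uses fuzzy matching from the standard library to absorb typos in the database. All terms and thresholds are hypothetical.

```python
# A hypothetical sketch of two cleaning steps touched on in the list above:
# mapping interchangeable jargon to a canonical keyword and catching typos
# with fuzzy matching. Terms and the cutoff value are illustrative only.

import difflib
from typing import Optional

# Expert-defined synonym map: different donor/project wording -> canonical term.
synonyms = {
    "climate change adaptation": "adaptation",
    "cca": "adaptation",
    "ghg reduction": "mitigation",
    "low-carbon development": "mitigation",
}

canonical_terms = ["adaptation", "mitigation", "resilience"]

def normalise_term(raw: str) -> Optional[str]:
    """Return the canonical keyword for a raw term, tolerating typos."""
    term = raw.strip().lower()
    if term in synonyms:
        return synonyms[term]
    if term in canonical_terms:
        return term
    # Fuzzy match to absorb typos such as "adaptaton" or "mitigationn".
    close = difflib.get_close_matches(term, canonical_terms, n=1, cutoff=0.8)
    return close[0] if close else None

# Example: tags pulled from a hypothetical project record.
raw_tags = ["Climate change adaptation", "mitigationn", "water supply"]
print([normalise_term(t) for t in raw_tags])  # ['adaptation', 'mitigation', None]
```

In practice, such a synonym map cannot be written once and forgotten: it has to be built and maintained with thematic expert guidance, as noted above.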

To conclude, I found AI very useful where large datasets and portfolios were available for analysis and where web data was abundant. It can help greatly, but at the same time it requires good quality assurance as well as dedicated expertise.

I remain concerned about privacy and security when using AI. It is already difficult to harmonize approaches in international cooperation, especially across projects from different donors and legal systems at the international, national or even institutional level. But we should give it a try!

Best wishes,

Anna Maria Augustyn