RE: Artificial intelligence in the context of evaluation | Eval Forward

Dear Colleagues,

Thank you, Muriel, for initiating this insightful discussion, and thank you to everyone for your valuable contributions.

At DevelopMetrics, we work extensively with USAID and other donors to build large language models that analyze evaluations and other documents for decision-making. One caution I would like to share: we have conducted multiple benchmarking studies and found that general-purpose AI tools like ChatGPT are not effective at understanding the nuances of development terminology. For example, if you ask ChatGPT which interventions have historically been most successful at empowering women in Pakistan, you are relying on a model that is trained on whatever evidence is available on the internet and built on a Silicon Valley data architecture. In other words, you are perpetuating existing biases. This is why domain-specific models that are vetted by technical experts are so important.

I hope that's helpful.

Best regards,

Lindsey

Lindsey Moore

CEO & Founder

+1 646-593-4568

www.developmetrics.com