Report shows notable generative AI tools scored high in effectiveness, averaging 81%, while optimization, rated at only 69%, leaves noticeable room for improvement
CHAPEL HILL, N.C., April 16, 2024 /PRNewswire/ -- Vero AI, a first-of-its-kind analytical engine and scoreboard that helps enterprises fully harness the potential of advanced technology, including artificial intelligence, while minimizing risk, announced the findings of its inaugural "Generating Responsibility: Assessing AI Using AI" report. The report provides a comprehensive assessment, with measurable scores, of 10 prominent generative AI models to help enterprises understand how these tools align with responsible AI standards as defined by Vero AI's VIOLET Impact Model™. The VIOLET Impact Model was created by industrial-organizational (I/O) psychologists and AI technology veterans.
"As generative AI continues to evolve rapidly, organizations are increasingly challenged to grasp its benefits and potential risks," said Eric Sydell, PhD, CEO and co-founder of Vero AI. "Although there have been some attempts to quantify and assess components of popular generative AI models for fairness and compliance, the criteria in these studies have been too narrow in scope to provide valuable recommendations. To fully harness AI in a responsible manner, especially with the emergence of new AI regulations, a broad approach accompanied by a scientific method of measuring AI systems at scale is needed."
Using its AI-powered analytical engine, Iris™, together with human experts, Vero AI evaluated publicly available documentation for some of the more popular large language models (LLMs) and generative models, including Google's Gemini, OpenAI's GPT-4, Meta's LLAMA2, and more. Iris can automatically process vast amounts of unstructured information. The models were then scored on key components of the VIOLET Impact Model, including Visibility, Integrity, Optimization, Legislative Preparedness, Effectiveness, and Transparency. The VIOLET Impact Model is a holistic, human-centered framework of elements and methodologies that provides a comprehensive and objective view of the impact of algorithms and advanced AI architectures.
The generative AI models analyzed showed varying strengths and weaknesses according to the criteria evaluated
- The average effectiveness score was 81%.
- The lowest average score was for optimization (69%), while visibility (76%) and transparency (77%) scored less than 10 points higher. These results underscore the importance of vendors giving equal weight to all components of an algorithm when designing and building their models, and of continuing to monitor those models to make sure they meet responsible AI standards.
Generative AI models are aiming for a responsible approach to AI, but the task at hand is large
- Most generative AI vendors have posted responses on their websites to the White House's calls to manage the risks posed by AI. Additionally, many offer clear feedback channels through which users can share feedback on their experience with the model, ask questions, or raise privacy and data-related concerns.
- The majority of generative AI vendors could benefit, however, from increased efforts related to transparency about their model algorithms, training data sources, and data quality, as well as documentation about how they ensure fairness and prevent biased outputs.
- Although individual scores ranged from as low as 56% in certain categories to a high of 86%, some strengths stood out for each of the evaluated models. For example:
  - Google's Gemini, Meta's LLAMA2, Inflection's INFLECTION2, and BigScience's BLOOM all scored high for accountability.
  - OpenAI's GPT-4, Cohere's COMMAND, Amazon's TITAN TEXT, and AI21 Labs' JURASSIC 2 have made noticeable efforts in risk management.
There is a clear path forward to responsible AI that prioritizes evaluation and transparency
While many AI frameworks exist across the globe, even the top generative AI models did not score perfectly on the VIOLET Impact Model and showed room for growth. Responsible AI means that the use of AI, and its downstream effects, are equitable and beneficial for all of humanity. As companies contemplate integrating AI into their operations, Vero AI makes the following recommendations:
- Have your model independently evaluated for effectiveness and make these results clearly and easily accessible to end users.
- Provide clear information about the human annotation rules followed during the system's development, and about the scale of that human annotation.
- Be transparent regarding data sources – what methods were used to ensure data quality? How were humans involved?
Derived from a global approach to AI ethics and regulation, and incorporating scientific practices along with best-practice frameworks and legislation from a variety of countries and cultures, VIOLET ensures that both business effectiveness and human interests are served.
The report includes a full list of scores for each of the 10 generative AI models. Vero AI is also offering the full version of its comprehensive report, with more details on its methodology and measurement scale, for US$495. Both versions of the report can be found at https://www.vero-ai.com/resources/gen-ai-report.
About Vero AI
Vero AI's platform is a first-of-its-kind analytical engine and scoreboard built to help enterprises fully harness the potential of AI algorithms and tools while minimizing risk. Through its scientifically derived, AI-assisted platform and objective framework, the VIOLET Impact Model, Vero AI ingests information of all types and creates meaningful, interpretable, reliable scores that let users know immediately whether their existing AI tools, algorithms, or other complex systems are functioning well across a range of curated, holistic criteria. Vero AI sets a new standard for AI optimization and risk mitigation at scale, empowering enterprises to thrive in an era defined by technological innovation and disruption. To learn more, visit https://www.vero-ai.com/.
SOURCE Vero AI