Introduction to the Generative AI Comparison
In the evolving world of generative AI, we’ve seen significant advancements in the last ten months. OpenAI’s ChatGPT now includes plugin capabilities, Google’s Bard has been enhanced by Gemini, and Anthropic has introduced Claude. This article revisits an earlier study with more test queries and a revised evaluation method to determine the best generative AI platform.
The Tested Platforms
The platforms evaluated in this study are:
- Google Bard
- Bing Chat Balanced (emphasizing informative and friendly results)
- Bing Chat Creative (focusing on imaginative results)
- ChatGPT (based on GPT-4)
- Claude Pro
GPT-4 Turbo and Google’s Search Generative Experience (SGE) were not included in this study.
Methodology and TLDR
Each platform was asked 44 different questions across a range of topics. The queries were simple, reflecting typical user interactions. Bard/Gemini performed best overall, but that does not make it the outright winner, because each platform led in different areas. Bard excelled at local search queries. Bing Chat lagged on local searches but stood out for providing extensive citations and resources, a feature less common in ChatGPT and Claude. ChatGPT struggled with current events and web access but improved significantly with the MixerBox WebSearchG plugin. Claude trailed overall but excelled at generating article outlines and handling large prompts.
Evaluating the Strengths and Weaknesses
Understanding each tool’s capabilities is crucial for a comprehensive assessment. Categories tested include article creation, bios, commercial queries, disambiguation, jokes, medical questions, article outlines, local searches, and content gap analysis.
Scores Across Different Queries
- Local Searches: Bard excelled by providing precise and comprehensive responses.
- Content Gaps: Bard outperformed others in identifying content improvements.
- Current Events: Bard and Bing Chat Balanced were closely matched, with Bard slightly leading.
The Scoring System
The evaluation included metrics like topical relevance, accuracy, completeness, quality of response, and use of resources. Each platform had its strengths and weaknesses, with Bing Chat standing out for its resource linking capabilities.
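To make the scoring idea concrete, here is a minimal sketch of how per-response metric scores could be rolled up into a platform score. The 1 to 4 scale, the equal weighting of metrics, the metric key names, and the sample numbers are all assumptions for illustration. They are not the study's actual rubric or results.

```python
from statistics import mean

# Metric names follow the article; the key spellings are hypothetical.
METRICS = ("topical_relevance", "accuracy", "completeness", "quality", "resources")


def platform_score(responses):
    """Average each metric over all responses, then average the metric means.

    Assumes every response is a dict with one numeric score per metric
    and that all metrics carry equal weight (an assumption, not the study's rule).
    """
    per_metric = {m: mean(r[m] for r in responses) for m in METRICS}
    overall = mean(list(per_metric.values()))
    return per_metric, overall


# Made-up scores for a placeholder platform on two queries (assumed 1-4 scale).
sample_responses = [
    {"topical_relevance": 4, "accuracy": 3, "completeness": 3, "quality": 4, "resources": 1},
    {"topical_relevance": 4, "accuracy": 4, "completeness": 2, "quality": 3, "resources": 1},
]

per_metric, overall = platform_score(sample_responses)
print(per_metric)
print(round(overall, 2))
```

Keeping the per-metric averages alongside the overall figure matters here: a platform with strong accuracy but few cited resources can look similar in the total to one with the opposite profile.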
Summary and Conclusions
While Bard leads in several categories, it’s essential to consider each platform’s unique strengths. ChatGPT and Claude, despite certain limitations, performed well in specific scenarios. The choice of the best generative AI solution depends on the user’s specific needs and the type of queries they are addressing.