AI models are already as good as experts at half of tasks, a new OpenAI benchmark suggests