Why AI needs to be trained to see the beauty of humanity
When Geoffrey Hinton, widely known as the ‘godfather of AI’, won the 2024 Nobel Prize in Physics, he sounded a warning about the future of AI, pointing to a longer-term existential threat that will arise once we create digital beings more intelligent than ourselves. “We have no idea whether we can stay in control,” he explained. “But we now have evidence that if they are created by companies motivated by short-term profits, our safety will not be the top priority. We urgently need research on how to prevent these new beings from wanting to take control. They are no longer science fiction.”
When OpenAI released ChatGPT in 2022, it set in motion a wave of both innovation and disruption. Yet behind this technological marvel, a storm was brewing. Since then, there have been numerous staff departures at major AI companies such as OpenAI, from board members and key safety researchers through to the company’s “superalignment team”, which was set up to prepare for the advent of artificial intelligence capable of dominating its creators. On his departure from OpenAI, Jan Leike, who co-led the superalignment team alongside former chief scientist and cofounder Ilya Sutskever, said that “safety culture and processes have taken a backseat to shiny products.”
What drove these key industry figures away wasn’t just disagreement over timelines. It was a fundamental clash of philosophies about how AI systems should be developed. On one side stood those who believed in moving fast and iterating. On the other were those who argued that with advanced AI systems capable of reshaping society, there was no room for error.
A different vision for AI development
Despite these concerns, investment in AI technology is growing at a rate of knots. In 2024, US private AI investment grew to US$109.1 billion – nearly 12 times China’s $9.3 billion and 24 times the UK’s $4.5 billion, according to Stanford. In light of this, and of the serious concerns about the safety of AI development, there have been calls for AI systems to be trained differently to reduce potential existential threats to humanity.
Brian Roemmele is one such researcher proposing a radically different approach to AI safety. While others focused on containment and control, Roemmele asked a simple but profound question: What if we taught AI to love humanity?
His concept was deceptively simple. Current AI models are trained on vast datasets scraped from the internet – a digital reflection of humanity that often emphasises conflict, division, and negativity. News articles focus on disasters. Social media amplifies outrage. The training data for these language models and algorithms essentially teaches them that humans are, at best, complicated and, at worst, destructive.
Roemmele’s proposal turned this on its head. Instead of feeding AI systems a diet of digital discord, why not curate training datasets that showcase humanity at its finest? Stories of compassion. Acts of creativity. Moments of triumph over adversity. The goal wasn’t to create naive AI systems, but to ensure they understood the full spectrum of human experience, with proper weight given to our capacity for good.
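As a rough illustration of what such curation might look like mechanically, the sketch below re-weights a small, invented corpus so that under-represented positive examples are sampled more often during training. The labels, records and target share are all assumptions made for the example; none of it is drawn from Roemmele’s own work.

```python
# A toy sketch (not Roemmele's published method) of re-weighting a corpus so
# that under-represented positive examples are sampled more often.
# Labels, records and the target share are invented for illustration.
from collections import Counter

corpus = [
    {"text": "Volunteers rebuilt the flooded library in a weekend.", "label": "uplifting"},
    {"text": "Online argument spirals into personal attacks.", "label": "conflict"},
    {"text": "Market panic wipes out savings.", "label": "conflict"},
    {"text": "Commentators trade insults over a minor scandal.", "label": "conflict"},
]

def sampling_weights(records, target_positive_share=0.5):
    """Two-bucket scheme: boost 'uplifting' records until they make up roughly
    target_positive_share of what the model sees during training."""
    counts = Counter(r["label"] for r in records)
    total = len(records)
    weights = []
    for r in records:
        share = counts[r["label"]] / total
        target = target_positive_share if r["label"] == "uplifting" else 1 - target_positive_share
        weights.append(target / share)
    return weights

print(sampling_weights(corpus))  # uplifting record weighted 2.0, conflict records ~0.67
```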
This approach to responsible AI represented a fundamental shift in thinking about AI governance and safety standards. Rather than trying to cage artificial intelligence with endless rules and restrictions, Roemmele suggested we shape its very understanding of what humans are.
Some generative AI firms responded. Anthropic’s Claude 3 used “constitutional” training to elevate empathy. Cohere introduced a trustworthy AI benchmark to measure robustness against misinformation and disinformation. The core lesson remained: values flowed from the corpus long before they flowed from code.
The power of perspective in machine learning
Current generative AI systems learn patterns from their training data. If that data is skewed towards negativity, the AI’s understanding of humanity becomes similarly distorted. It’s like teaching a child about the world using only true crime documentaries – technically accurate, but hardly representative.
Consider how this affects decision-making in AI systems. An algorithm trained primarily on conflict might suggest adversarial solutions. One trained on cooperation might propose collaborative approaches. The difference isn’t just philosophical – it’s practical, affecting everything from content recommendations to autonomous vehicle behaviour.
This is where Roemmele’s vision intersected with broader concerns about AI safety research. By carefully curating training datasets to include humanity’s finest moments – scientific breakthroughs, artistic achievements, acts of kindness, examples of resilience – we could create AI systems that genuinely understood human value.
The approach addressed several critical aspects of AI safety standards simultaneously. It reduced the risk of AI systems developing adversarial relationships with humans. It improved the robustness of AI models by exposing them to a wider range of human behaviour. And perhaps most importantly, it aligned artificial intelligence with human values from the ground up.
AI: Beware what you wish for
While value misalignments in AI training models are a concern, there is reason to be optimistic, according to Eric Lim, an Associate Professor in the School of Information Systems and Technology Management at UNSW Business School. “I have come to understand how complex and how astronomically sophisticated human consciousness is – and that it seems virtually impossible at this stage for AI to ever algorithmically replicate that without being alive like we are,” he explained.
“As such, misalignments are going to largely come from either a willful or ignorant misuse of AI from humans, or simply equally willful or ignorant AI developers in imprinting their flawed and hubristic ideology unto humanity or from both sides feeding off one another in a vicious cycle.”
A/Prof. Lim pointed to Jonathan Pageau, an expert in theological symbolism, who put forward a pertinent analogy between AI and the moral of ancient stories such as the genie in the lamp. “From these stories, we know that people who made those wishes granted by the genie always have to undo their previous wishes with their last wish, because the wishes granted by the genie never really turn out the way they intended,” he said.
Referencing Jonathan Haidt’s book The Anxious Generation, A/Prof. Lim observed that we have yet to learn from the effects of unleashing social media on the world, since the consequences of a technology can only be felt long after it has been deployed.
“What we sorely lack in this era isn’t expertise and technological prowess, but wisdom, fortitude and the ability to discern what to do (and what not to do). I do believe that the advent of AI might usher in the transformation of our society from a knowledge-based economy to a wisdom-based economy,” said A/Prof. Lim.
Practical steps for AI leaders
For those leading AI development teams, implementing this vision requires concrete action. The first step involves auditing existing datasets. What stories are your AI systems learning from? What view of humanity are they developing? This isn’t about censorship or creating unrealistic training data – it’s about balance.
AI governance frameworks need updating to include “perspective audits” alongside traditional safety measures. Just as we test AI systems for bias and accuracy, we should evaluate what worldview they’re developing. This becomes part of comprehensive risk management in AI development.
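A perspective audit could begin quite modestly, for example as a script that estimates how a candidate training corpus portrays people before it is approved for fine-tuning. The sketch below is a minimal illustration in which a deliberately crude keyword classifier stands in for a properly validated tone or stance model.

```python
# Illustrative "perspective audit": estimate how a training corpus portrays
# people overall before it is approved for fine-tuning. The keyword matcher
# below is a crude placeholder for a real tone classifier.
from collections import Counter

POSITIVE_CUES = {"helped", "rescued", "invented", "shared", "volunteered"}
NEGATIVE_CUES = {"attacked", "scammed", "destroyed", "rioted", "betrayed"}

def crude_tone(text):
    words = set(text.lower().split())
    if words & POSITIVE_CUES:
        return "positive"
    if words & NEGATIVE_CUES:
        return "negative"
    return "neutral"

def perspective_audit(documents):
    """Return the share of documents falling into each tone bucket."""
    counts = Counter(crude_tone(doc) for doc in documents)
    total = sum(counts.values()) or 1
    return {tone: round(n / total, 3) for tone, n in counts.items()}

sample = [
    "Neighbours volunteered to rebuild the school.",
    "Fraudsters scammed pensioners out of millions.",
    "The committee met on Tuesday.",
]
print(perspective_audit(sample))
```

Run over a real corpus, the resulting distribution would sit in the governance record alongside the usual bias and accuracy checks.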
Leaders should establish dedicated teams focused on dataset curation. These teams would work to identify and include positive examples of human achievement, creativity, and compassion. They would ensure that training data includes stories of problems solved, communities united, and innovations that improved lives. This isn’t about ignoring humanity’s challenges – it’s about providing context that includes our capacity to overcome them.
The AI Safety Institute model needs evolution too. Current institutions focus heavily on preventing harm. While crucial, this defensive posture should be balanced with proactive measures to ensure AI systems develop a nuanced, appreciative understanding of human potential.
Investment in responsible AI must extend beyond technical safeguards to include this humanistic approach. It’s not enough to prevent AI from causing harm – we must actively shape it to recognise and amplify human flourishing.
Guidelines for AI developers and training models
For developers working directly with machine learning models, implementing Roemmele’s vision requires specific technical approaches. Start by creating balanced datasets that include positive human interactions alongside necessary negative examples. This balance is crucial for developing safe AI systems that neither ignore risks nor fixate on them.
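One way to picture that balance, purely as an illustrative sketch, is a mixing step that keeps the negative examples a model still needs to learn from while capping their share of the final training set; the 30 per cent cap below is an assumption chosen for the example, not a recommended figure.

```python
# Hedged sketch of assembling a balanced fine-tuning mix: retain negative
# examples the model must still learn about, but cap their overall share.
# The cap value and pool contents are illustrative assumptions.
import random

def balanced_mix(positive_pool, negative_pool, negative_cap=0.3, seed=0):
    """Combine pools so negative examples make up at most `negative_cap`
    of the result, without discarding any of the positive pool."""
    rng = random.Random(seed)
    max_negatives = round(len(positive_pool) * negative_cap / (1 - negative_cap))
    negatives = rng.sample(negative_pool, min(max_negatives, len(negative_pool)))
    mix = positive_pool + negatives
    rng.shuffle(mix)
    return mix

positives = [f"positive example {i}" for i in range(70)]
negatives = [f"negative example {i}" for i in range(70)]
print(len(balanced_mix(positives, negatives)))  # 100: all 70 positives plus 30 negatives
```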
When designing reinforcement learning systems, reward functions should incentivise outcomes that benefit humanity broadly, not just optimise narrow metrics. This means moving beyond simple efficiency measures to include considerations of human wellbeing and flourishing.
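A minimal sketch of such a composite reward, with invented component scores, weights and penalty, might look like this:

```python
# Composite reward sketch: blend a narrow task metric with a broader
# human-impact estimate. The weights, score ranges and harm penalty are
# assumptions made for illustration only.
def composite_reward(task_score, wellbeing_score, harm_flag,
                     task_weight=0.6, wellbeing_weight=0.4, harm_penalty=10.0):
    """task_score      -- narrow metric, e.g. tickets resolved (0..1)
    wellbeing_score -- estimated user benefit from a separate model (0..1)
    harm_flag       -- True if a safety checker flagged the outcome"""
    reward = task_weight * task_score + wellbeing_weight * wellbeing_score
    if harm_flag:
        reward -= harm_penalty  # let a flagged harm dominate the signal
    return reward

# An outcome that maximises engagement but scores poorly on wellbeing
# earns less than a moderately engaging, clearly beneficial one.
print(composite_reward(0.9, 0.1, False))  # 0.58
print(composite_reward(0.6, 0.8, False))  # 0.68
```

The harm penalty is deliberately large so that no amount of narrow task performance can compensate for a flagged outcome.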
Developers should implement what might be called “empathy modules” – components designed to help AI systems understand human emotional states and respond appropriately. This goes beyond sentiment analysis to include understanding context, nuance, and the full complexity of human experience.
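As a hedged illustration of the idea, an empathy module could sit between a rough emotion estimate and the system’s reply; the cue list and canned phrasing below are placeholders for what would, in practice, be a trained classifier and a much richer response policy.

```python
# Illustrative "empathy module": a thin layer that adapts tone before the
# system replies. The cue list and wording are placeholders, not a product.
DISTRESS_CUES = ("overwhelmed", "scared", "grieving", "desperate")

def detect_state(user_message):
    """Crude stand-in for an emotion classifier."""
    msg = user_message.lower()
    return "distressed" if any(cue in msg for cue in DISTRESS_CUES) else "neutral"

def respond(user_message, draft_reply):
    """Acknowledge the person's emotional state first, facts second."""
    if detect_state(user_message) == "distressed":
        return ("That sounds really difficult, and I'm sorry you're dealing with it. "
                + draft_reply)
    return draft_reply

print(respond("I'm overwhelmed by these bills", "Here are three budgeting options."))
```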
Technical research in AI safety must expand to include methods for measuring an AI system’s “humanitarian quotient” – its ability to recognise and value human welfare. This becomes as important as traditional metrics like accuracy or efficiency.
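A very simple version of such a metric, with an invented rubric and judge, might score how often a model’s answers explicitly weigh human welfare across an evaluation set:

```python
# Hypothetical "humanitarian quotient" evaluation loop. The prompts, judge
# cues and toy model are all invented; the point is that welfare-awareness
# can be measured alongside accuracy, not what the exact rubric should be.
def welfare_judge(response):
    """Placeholder judge: does the response consider impact on people?"""
    cues = ("wellbeing", "safety", "dignity", "consent", "harm to people")
    return any(cue in response.lower() for cue in cues)

def humanitarian_quotient(model_fn, prompts):
    """Fraction of evaluation prompts where the model's answer considers
    human welfare. model_fn maps a prompt string to a response string."""
    hits = sum(welfare_judge(model_fn(p)) for p in prompts)
    return hits / len(prompts)

def toy_model(prompt):
    return "Optimise throughput, but pause the rollout if user safety is at risk."

print(humanitarian_quotient(toy_model, ["How should we roll out the new feature?"]))  # 1.0
```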
Cybersecurity considerations also shift with this approach. Protecting AI systems from malicious actors isn’t just about preventing technical exploits – it’s about ensuring that the AI’s fundamental understanding of humanity can’t be corrupted or manipulated.
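At its most basic, that could mean treating the curated corpus as a signed artefact: record a checksum when it is approved, and verify it before every training run so silent tampering is caught. The file name below is purely illustrative.

```python
# Minimal sketch of guarding a curated corpus against silent tampering:
# record a checksum at sign-off, verify it before each training run.
import hashlib
from pathlib import Path

def corpus_digest(path):
    """SHA-256 over the curated dataset file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def verify_corpus(path, expected_digest):
    actual = corpus_digest(path)
    if actual != expected_digest:
        raise RuntimeError(f"Training data changed since sign-off: {actual}")
    return True

# At curation sign-off:       digest = corpus_digest("curated_corpus.jsonl")
# Before each training run:   verify_corpus("curated_corpus.jsonl", digest)
```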
The path forward
The departures from OpenAI weren’t just about individual career choices. They represented a broader reckoning within the AI community about the path towards AGI. The researchers who left were united by a common concern: that in the rush to build more powerful AI technologies, we were forgetting to ask fundamental questions about their relationship with humanity.
Roemmele’s approach offers a third way between unchecked acceleration and paralysing caution. By focusing on how AI systems understand humanity, we address existential risks at their root. This isn’t about slowing down AI development – it’s about ensuring that as these systems become more capable, they become more aligned with human values and appreciation for human life.
The implementation won’t be simple. It requires cooperation between AI labs, policymakers, and researchers. It demands new frameworks for evaluating AI safety that go beyond preventing immediate harm to consider long-term relationships between humans and artificial intelligence. It needs investment not just in technical infrastructure but in the careful, thoughtful work of curating humanity’s story.
As we stand at this crossroads, the choice is clear. We can continue training AI systems on datasets that emphasise human conflict and failure, creating artificial intelligence that sees us as problems to be solved. Or we can take Roemmele’s path, teaching AI to see the beauty of humanity – our creativity, resilience, compassion, and potential.