A North Atlantic right whale generated by AI. Credit: Duke MaRRS Lab

Scrolling through social media, you may have lingered over reels of Leonardo DiCaprio dancing or Tom Cruise crooning, only to realize they're spoofs created with artificial intelligence. Hyper-realistic videos and images like these, also called deepfakes, are notorious for celebrity pranking. But the technology has serious scientific applications, too. In the field of ecology, for example, AI doppelgängers of rare species could improve efforts to understand, monitor and protect them.

Specifically, wildlife deepfakes could help train AI models to detect animals in footage from satellites, planes and drones. Ecologists increasingly rely on such bird's-eye imagery to study species behavior and population trends.

"We are truly in the age of big data when it comes to remote sensing in ecology and conservation," says Dave Johnston, director of the Marine Robotics and Remote Sensing (MaRRS) Lab at Duke's Nicholas School of the Environment. "Over the past two decades, our ability to collect high-resolution remote-sensing imagery has grown exponentially, largely due to advances in drone technology and increased satellite capabilities."

Augmenting Data

Traditionally, researchers had to use their own eyes to scour satellite and aerial images for target species. Now, AI detection tools can expedite the process. The key is the data used to train the computer models, which need to "see" lots and lots of realistic examples of a species to know what to look for in field footage.

For some common wildlife, copious footage exists, so assembling training data is fairly easy. But footage is often limited for species that are rare, that blend into their surroundings or that live in inaccessible areas, such as war zones.

"One of the big challenges in ecology is the idea of data scarcity," says Henry Sun, a 2025 Duke graduate with dual bachelor's degrees in biology and in marine science and conservation, plus a minor in computer science. "For a species where you only have several hundred individuals, you're just not going to have diverse enough images to be able to train a good AI detection model."

What's an ecologist to do? One promising option is to beef up scant training data with AI-generated, or synthetic, data: in essence, deepfakes. This approach, called data augmentation, could enable new ecological insights, according to a paper in Nature.
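In code, the augmentation idea is straightforward: keep every scarce real image and top up the training set with synthetic ones. The Python sketch below is a minimal illustration; the directory layout and the default synthetic share are assumptions made for the example, not details from Sun's study.

```python
# Minimal sketch of augmenting a scarce dataset with synthetic images.
# Directory names and the default synthetic fraction are illustrative
# assumptions, not details from the study.
import random
from pathlib import Path

def build_training_set(real_dir: str, synthetic_dir: str,
                       synthetic_fraction: float = 0.3) -> list[Path]:
    """Combine real aerial images with AI-generated ones.

    synthetic_fraction is the share of the final training set that is
    synthetic; choosing it well is itself a research question.
    """
    real = list(Path(real_dir).glob("*.jpg"))
    synthetic = list(Path(synthetic_dir).glob("*.jpg"))

    # Keep every scarce real image; top up with synthetic ones.
    n_synth = int(len(real) * synthetic_fraction / (1 - synthetic_fraction))
    dataset = real + random.sample(synthetic, min(n_synth, len(synthetic)))
    random.shuffle(dataset)
    return dataset

training_images = build_training_set("data/right_whale/real",
                                     "data/right_whale/synthetic")
```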

Sun, a former Nicholas School Rachel Carson Scholar and North Carolina Space Grant recipient, recently investigated data augmentation for his senior thesis, which he plans to publish. Specifically, he explored whether AI could produce images realistic enough to supplement drone footage of the North Atlantic right whale, whose population has dwindled to just a few hundred individuals. Theoretically, synthetic data could be used to help train other AI tools to detect North Atlantic right whales in real aerial footage.

Sun's research was inspired by a larger collaboration between the Nicholas School and several Canadian organizations (including the Canadian Space Agency, Fisheries and Oceans Canada, the University of New Brunswick and the environmental consulting group Hatfield Consultants) to build a space-based detection system for North Atlantic right whales, which are notoriously elusive, in large part because of their small population.

"There's a lot of ocean, and despite the fact that whales are coming back, there's still a very small number of them compared to the area that you have to search," says Johnston, who was Sun's thesis advisor. "And so that means we need very efficient tools to find them. But it also means that we don't often have really good archives of data to train those models to identify them."

One of these images depicts a real humpback whale photographed by a drone. The other was created by AI. Can you tell which is which? Answer: The one on the left is fake. Credits, left to right: Duke MaRRS Lab; Duke MaRRS Lab, under permit by NOAA

To create deepfake whales, Sun and his team used diffusion models, which generate images in response to prompts in the form of descriptive text, an example image or both. Although other researchers have generated synthetic imagery for use in whale detection, Sun's team says it's the first to use diffusion models for this purpose.

The researchers used several commercially available diffusion models that are pre-trained on reams of internet data. In other words, these base models, as they're known, are primed to produce a variety of images in response to prompts.
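As an illustration, prompting a pre-trained base model takes only a few lines with the open-source diffusers library. Treat this as a hedged sketch: the Stable Diffusion checkpoint and the prompt wording are stand-ins, since the article does not name the specific commercial models the team used.

```python
# Hedged sketch of text-prompted generation with a pre-trained diffusion
# model. The checkpoint and prompt are assumptions; the specific
# commercial models used by the team are not named in the article.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

prompt = ("aerial drone photograph of a North Atlantic right whale "
          "at the ocean surface, top-down view")
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("synthetic_right_whale.png")
```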

Sun and his team experimented with several methods of image generation: first text prompts, then image prompts, and finally a method called fine-tuning, which paired text and image prompts. Fine-tuning is a way to improve the performance of a base model by further training it on a smaller, more specific dataset.
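Image prompting hands the model a real photo to steer generation, as in the img2img sketch below (the checkpoint, file names and strength value are assumptions). Fine-tuning, by contrast, would first update the base model's weights on a small set of real whale images, for example with a DreamBooth-style training script, before generating.

```python
# Hedged sketch of image-prompted (img2img) generation, where a real
# drone photo guides the output. File names, checkpoint and strength
# are illustrative assumptions.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A real aerial photo serves as the visual prompt.
init_image = Image.open("real_right_whale.jpg").convert("RGB").resize((768, 512))

result = pipe(
    prompt="aerial view of a North Atlantic right whale at the ocean surface",
    image=init_image,
    strength=0.6,        # how far the output may drift from the input photo
    guidance_scale=7.5,
).images[0]
result.save("img2img_right_whale.png")
```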

"Sometimes the diffusion model produces anatomically deformed whale images, like whales that are conjoined or whales with multiple sets of fins, which shows that it hasn't exactly learned the most accurate representation yet," Sun explains. Fine-tuning can teach the model to avoid those mistakes.

Testing Credibility

All told, the team created hundreds of aerial images of North Atlantic right whales and, for comparison, hundreds of aerial images of humpback whales. Because far more real-life footage of humpbacks is available for training generative AI, the team hypothesized that their models would produce more realistic synthetic humpback imagery.

The last step was to test the veracity of their deepfake whales. Were they credible? To answer that question, the researchers fed their photos into a Google tool called Reverse Image Search, which analyzes an input image, searches the internet for similar pictures, and spits out results. In this case, the goal was to see if Google could recognize the whales depicted in the synthetic data and return images of the same species.
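Google offers no official public API for this kind of search, so any automated version of the check has to stand in for however the results were actually collected. In the hedged sketch below, reverse_image_search is a hypothetical helper, and the scoring simply tallies how often the returned matches name the intended species.

```python
# Hedged sketch of the credibility check. `reverse_image_search` is a
# hypothetical stand-in (Google provides no official public API for
# Reverse Image Search); it is assumed to map an image path to a list
# of text descriptions of visually similar web images.
def species_recognition_rate(image_paths, species_name, reverse_image_search):
    """Fraction of synthetic images whose search results mention the species."""
    hits = 0
    for path in image_paths:
        results = reverse_image_search(path)
        if any(species_name.lower() in r.lower() for r in results):
            hits += 1
    return hits / len(image_paths)

# e.g., species_recognition_rate(fakes, "North Atlantic right whale", my_search_fn)
```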

Sometimes the models created anatomically deformed whales, like this two-tailed humpback. Credit: Duke MaRRS Lab

In the fake photos produced by text or image prompts, Google mistook many North Atlantic right whales for humpbacks. By contrast, it correctly identified both species of whale in almost all of the images produced through fine-tuning.

The team also found that images of North Atlantic right whales created through fine-tuning were more accurate than those generated with text or image prompts alone.

The next phase of the research is to investigate whether synthetic whale imagery can indeed supplement training data for AI detection models. As a starting point, Sun enlisted Duke undergraduate Max Niu to begin basic testing.

"Max has been training deep-learning models using both real images and some of the fake images that I've made," Sun says. The idea is "to see if there's a proportion of fake images that will benefit the model."
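That experiment might look something like the sweep below, which trains a detector at several real-to-synthetic mixes and compares the resulting scores. The train_detector and evaluate functions are hypothetical placeholders; the article does not describe the actual models or metrics being used.

```python
# Hedged sketch of the proportion experiment. `train_detector` and
# `evaluate` are hypothetical placeholders for whatever detection model
# and accuracy metric are actually used.
def sweep_synthetic_fraction(real_images, synthetic_images,
                             train_detector, evaluate,
                             fractions=(0.0, 0.1, 0.25, 0.5)):
    scores = {}
    for frac in fractions:
        # Number of synthetic images needed for `frac` of the mixed set.
        n_synth = int(len(real_images) * frac / (1 - frac))
        mix = real_images + synthetic_images[:n_synth]
        model = train_detector(mix)
        scores[frac] = evaluate(model)  # e.g., accuracy on held-out real imagery
    return scores
```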

Walking the Line

This fall, Sun will continue his studies as a Ph.D. student at the Duke Marine Lab. Although he plans to turn his attention from whales to sea urchins, he is committed to helping demystify AI for researchers.

"Something that I'm extremely interested in is capacity-building for natural scientists in the realm of artificial intelligence, because I think increasingly, these are skills that everyone needs," Sun says. To that end, Sun hopes to plan AI-related outreach events, such as an hour-long session he hosted last March.

As more ecologists turn to AI, however, ethical considerations will become more pressing, says Holly Houliston, a Ph.D. student with the British Antarctic Survey and the University of Cambridge, who helped supervise Sun's work as a visiting scholar at the Marine Lab. The data centers that power AI are energy- and water-intensive, so practices like generative AI data augmentation should be used conservatively in targeted ways, according to Houliston.

"You have to be really clear on the ecological question you're trying to answer. For example, if you are looking at calves (baby whales), then you might want to generate more images of these younger animals because you probably only have a few. But if you've got a balanced dataset, then maybe you don't need to generate more synthetic imagery," Houliston explains. "The use of diffusion models and generative AI in general have environmental impacts. Studies like this can help us ecologists understand how to responsibly use them."

As Johnston notes, "this intersection between computer science and environmental sciences is only going to grow."


Building Bridges

Dave Johnston with a drone.

In early July, Duke announced that several research teams have received funding through the Artificial Intelligence for Metascience research program. A partnership between the university and OpenAI, the program explores how AI can accelerate scientific discovery through multidisciplinary collaborations.

Dave Johnston, director of the Marine Robotics and Remote Sensing (MaRRS) Lab at Duke's Nicholas School of the Environment, is a team member on the project Consilience: AI-Augmented Interdisciplinary Research. Consilience is a voice-based AI system that supports collaboration among experts in different fields by translating terminology and uncovering novel research connections. The team will test the system through a university hackathon and a randomized controlled trial.

Led by Brinnae Bent of Pratt School of Engineering, the team also includes Christopher Bail of Sanford School of Public Policy, Boyuan Chen of the Thomas Lord Department of Mechanical Engineering and Materials Science, Walter Sinnott-Armstrong of the Department of Philosophy and the Kenan Institute of Ethics, and Lee Tiedrich of Pratt.