# How Large Language Models (LLMs) Are Supporting Journalists in Investigative Reporting
In the past few years, the rise of advanced generative AI models has fueled debate over whether they will replace human workers. In practice, these tools are increasingly being used to augment human skills rather than replace them, particularly in fields like investigative reporting. A recent example from *The New York Times* (NYT) shows how large language models (LLMs) are becoming essential tools for journalists who must sift through enormous troves of data, including leaked audio recordings.
## The Election Integrity Network Investigation
The NYT’s investigative report, titled [“Inside the Movement Behind Trump’s Election Lies”](https://www.nytimes.com/interactive/2024/10/28/us/politics/inside-the-movement-behind-trumps-election-lies.html), explores the Election Integrity Network, a coalition closely linked with the Trump-led Republican National Committee. The article uncovers how this group coordinated activities to contest election outcomes and enhance Republican voter participation. To craft this narrative, the NYT team needed to analyze over 400 hours of leaked audio from the group’s gatherings over a three-year period, alongside reviewing a variety of documents and training resources.
This volume of material posed a daunting challenge for the four-person reporting team. With the help of AI, however, they were able to transcribe and analyze it far more efficiently. The NYT acknowledged that it “utilized artificial intelligence to help pinpoint particularly significant moments” from the audio, allowing the team to concentrate on the most relevant information.
## AI-Driven Transcription: A Breakthrough for Journalists
The first step in the NYT’s process was using AI tools to transcribe the 400 hours of audio, producing a transcript of nearly five million words. Automated transcription is not new, but the quality and reliability of AI-based transcription services have improved dramatically in recent years.
For example, *Wirecutter*, the product review site owned by the NYT, found that the best AI transcription service it tested in 2018 achieved only 73% accuracy. By 2024, even the least accurate service it evaluated reached 94%. OpenAI’s Whisper, one of the leading AI transcription systems, is now considered more accurate than many human transcription services. That level of accuracy lets journalists transcribe long stretches of audio quickly and at a fraction of the cost of manual transcription.
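To make the transcription step concrete, here is a minimal sketch of batch transcription with OpenAI’s open-source `whisper` Python package. The directory name, model size, and output format are assumptions for illustration; this is not the NYT’s actual pipeline.

```python
# Sketch: batch-transcribing meeting recordings with OpenAI's Whisper.
# Assumes `pip install openai-whisper` plus ffmpeg; paths are placeholders.
import pathlib

import whisper

model = whisper.load_model("medium")  # larger models are slower but more accurate

for audio_path in pathlib.Path("recordings").glob("*.mp3"):
    result = model.transcribe(str(audio_path))
    # Keep per-segment timestamps so any excerpt can later be traced
    # back to the exact spot in the original recording.
    with open(audio_path.with_suffix(".txt"), "w") as f:
        for seg in result["segments"]:
            f.write(f"[{seg['start']:.1f}s-{seg['end']:.1f}s] {seg['text'].strip()}\n")
```

Keeping timestamps alongside the text is what later allows a flagged passage to be checked against the original audio rather than trusted on its own.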
For reporters, this is a major time saver. Instead of spending hours or days on the tedious work of transcribing audio, they can focus on the more important task of analyzing its content. Transcription, however, is only the first step; the real challenge is making sense of the enormous volume of information.
## Employing LLMs to Examine and Filter Data
Once the transcription was complete, the NYT reporters faced the daunting task of sifting through five million words of text to find the most important information. To do so, the team turned to several large language models (LLMs), which helped them search the transcripts for specific topics, identify notable participants, and spot recurring themes in the discussions.
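The NYT has not published its prompts or tooling, but the general pattern for this kind of review is straightforward: split the transcripts into chunks that fit within a model’s context window and ask the model to flag passages relevant to the reporters’ questions. The sketch below uses Anthropic’s Python SDK; the model name, chunk size, and prompt are illustrative assumptions, not the NYT’s setup.

```python
# Sketch: asking an LLM to flag potentially notable passages in transcript chunks.
# Assumes `pip install anthropic` and an ANTHROPIC_API_KEY in the environment.
import anthropic

client = anthropic.Anthropic()

PROMPT = (
    "You are helping reporters review meeting transcripts. "
    "Quote verbatim, with timestamps, any passages about the topics the "
    "reporters listed below, or reply NONE.\n\nTopics: {topics}"
)

def flag_passages(transcript: str, topics: str, chunk_chars: int = 20_000) -> list[str]:
    """Return the model's reply for each chunk that contains something relevant."""
    findings = []
    for start in range(0, len(transcript), chunk_chars):
        chunk = transcript[start : start + chunk_chars]
        message = client.messages.create(
            model="claude-3-5-sonnet-latest",  # placeholder model name
            max_tokens=1024,
            messages=[{"role": "user",
                       "content": PROMPT.format(topics=topics) + "\n---\n" + chunk}],
        )
        reply = message.content[0].text.strip()
        if reply != "NONE":
            findings.append(reply)
    return findings
```

Anything the model returns is a lead, not a finding: as discussed below, each flagged passage still has to be checked against the source.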
LLMs are particularly well suited to this kind of work. They can quickly process large datasets and summarize complex documents, which makes them valuable for investigative journalism. Anthropic’s Claude models, for instance, have proven adept at ingesting entire texts and answering questions about them or interpreting their meaning. Likewise, Google’s NotebookLM can summarize long documents and even generate podcast-style scripts from their content.
However, LLMs have real limitations. While they excel at spotting patterns and summarizing material, they can struggle with context, nuance, and implied meaning. They are also prone to “confabulation” (often called hallucination), producing text that is fluent and grammatically correct but factually wrong. These shortcomings make human oversight essential during the analysis phase.
## The Human-AI Collaborative Approach
Acknowledging the constraints of LLMs, the NYT reporters adopted a blended approach for their investigation. After leveraging AI to pinpoint potentially significant excerpts, the team manually scrutinized each passage to confirm its accuracy and relevance. In a note accompanying the article, the NYT underscored that “every quote and video clip from the meetings in this article was verified against the original recording to ensure its accuracy, appropriately reflected the speaker’s intent, and fairly represented the context in which it was expressed.”
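Tooling can assist that manual check without replacing it. A hypothetical helper like the one below, for instance, looks up a candidate quote in the timestamped transcript so a reporter knows exactly where to listen in the original recording; the fuzzy-matching threshold is an arbitrary assumption, and this is not how the NYT describes its process.

```python
# Sketch: locating a candidate quote in a timestamped transcript so a human
# can verify it against the original audio. Threshold and data are placeholders.
from difflib import SequenceMatcher

def locate_quote(quote: str, segments: list[dict], threshold: float = 0.85):
    """Return (start_seconds, segment_text, similarity) for the best match, or None."""
    best = None
    for seg in segments:
        score = SequenceMatcher(None, quote.lower(), seg["text"].lower()).ratio()
        if best is None or score > best[2]:
            best = (seg["start"], seg["text"], score)
    # A quote that spans several segments or matches nothing closely comes
    # back as None and must be treated as unverified.
    return best if best and best[2] >= threshold else None

# Placeholder segments (not real transcript content), e.g. loaded from the
# timestamped files produced during transcription.
segments = [
    {"start": 1203.4, "text": "Placeholder sentence one from a meeting."},
    {"start": 1210.1, "text": "Placeholder sentence two from a meeting."},
]
print(locate_quote("placeholder sentence one from a meeting", segments))
```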
This collaborative approach draws on the strengths of both. The LLMs offer a fast, efficient way to sift through large volumes of data, while the human journalists bring the critical judgment and contextual understanding needed to ensure the accuracy and integrity of the final story.
## The Prospects of AI in Journalism
The use of AI in journalism is still evolving, but its potential is clear. By automating labor-intensive tasks such as transcription and data analysis, AI frees reporters to focus on the more demanding parts of their work.