BERLIN — Robot reporters have arrived in the newsroom. Algorithms are writing up company earnings, covering sports championships and dabbling in crime and politics. One day, if the techno-optimists are to be believed, they will be doing much of the work journalists do today.
For now, your loyal correspondent’s job seems to be safe. Despite the galloping pace of technological progress, computers are still far from being able to develop sources, provide high-level analysis or infuse a narrative with character and color.
But as news organizations integrate artificial intelligence into their operations, it’s becoming increasingly clear that the media industry — and its workforce — is not safe from disruption.
Automatically generated articles
Robots are already performing routine tasks once done by human reporters.
Once, on mornings when companies published quarterly earnings reports, business journalists at the Associated Press would get up early, wait for the numbers and hammer out copy as fast as they could.
No longer. In 2014, the AP automated the process. Now, software monitors the earnings, then spits out a basic story onto the wire in under five minutes, faster than any reporter ever did the job.
Similarly, the newswire has automated its coverage of minor league baseball and college basketball.
“Earnings and sports were obvious for us because they’re data driven,” said Lisa Gibbs, AP’s director of news partnerships. “The sources of data are clean, and there is a value of having information about them out very quickly.”
The agency estimates it will publish roughly 40,000 automatically generated articles by the end of this year — still just a fraction of the more than 700,000 articles, including revisions, it puts out each year.
The AP is not alone. Newsrooms around the world have begun using software to report on sports scores, earnings reports and election results. Bloomberg News has automated its earnings coverage. The Washington Post uses software to cover high-school games. The Los Angeles Times has a bot tweeting about earthquakes.
In Europe, the Austrian Press Agency is planning to use software during the upcoming European Parliament election to quickly push out articles with election results from each of Austria’s more than 2,000 municipalities. Norwegian news agency NTB said it has been running automatically generated stories about football games for three years, and plans to expand its coverage to the country’s minor leagues this year, bringing its automated coverage to up to 170,000 games per year.
Whatever the subject of the coverage, the process for this kind of automated journalism is similar.
First, reporters identify reliable data sets. They team up with programmers to write a template spelling out what a story needs to say, how it is supposed to sound and what possible variations there should be: Have the earnings of a company “soared,” or have they “plummeted”?
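A minimal sketch of what such a template might look like in Python. This is not the AP's system: the function names, the 20 percent thresholds and the sentence wording are all assumptions chosen for illustration.

```python
def describe_change(pct_change: float) -> str:
    """Pick a verb for the earnings move. Thresholds are illustrative assumptions."""
    if pct_change >= 20:
        return "soared"
    if pct_change > 0:
        return "rose"
    if pct_change == 0:
        return "were flat"
    if pct_change > -20:
        return "fell"
    return "plummeted"


def earnings_story(company: str, current: float, previous: float) -> str:
    """Fill a one-sentence template from two earnings figures (in millions of dollars)."""
    if previous == 0:
        # A percentage change is undefined here; a real desk would route this to an editor.
        raise ValueError("previous earnings must be non-zero")
    pct_change = (current - previous) * 100 / abs(previous)
    verb = describe_change(pct_change)
    if pct_change == 0:
        return f"{company} earnings {verb} at ${current:.1f} million in the latest quarter."
    return (
        f"{company} earnings {verb} {abs(pct_change):.0f} percent "
        f"to ${current:.1f} million in the latest quarter, "
        f"from ${previous:.1f} million a year earlier."
    )


# Made-up figures for a made-up company.
print(earnings_story("Acme Corp", 150.0, 100.0))
```

The editorial judgment lives in the thresholds: deciding how large a move has to be before a company's earnings have "soared" is a call made by reporters, not by the software.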
After any glitches are ironed out, the software is let off its leash to write its stories. But this doesn’t mean that the work of the journalists is done: At the AP, reporters still make tweaks to headlines, or mess around with some of the language. And they continue to monitor developments that might make their templates out of date.
If the U.S., for instance, decides to make changes to its tax law, business editors must decide whether to turn off automation because the risk of stories being inaccurate or incomplete is too high.
“We have editors now who used to spend their time writing and editing earning stories,” said Gibbs, who led the AP’s business desk when it first introduced automated reporting. “Now it’s about maintaining a very large database.”
Her agency currently has no plans to expand the use of automatically generated articles into other areas, she said, adding that “we’re not in the business of automating things just because we can.”
In the fall of 2015, researchers from “Structured Stories,” a now-dormant academic project, approached reporters covering car chases in a local NBC Los Angeles newsroom with a request.
After the journalists filed a story, they were asked to enter facts about the incident into a database.
Every car chase is different, but if you think about them as narrative structures, they all have recurring plot elements. Every chase has a triggering event: someone is killed or a police officer notices a speeding vehicle. And every pursuit has an ending: an accident, a surrender, a shooting. By compiling enough examples, the researchers were seeking to teach their machines how to cover a car chase.
And it worked. The subfield of artificial intelligence underlying most newsroom robots is called “natural language generation” — or NLG. The basic idea is that if you want a computer to be able to write something, you must provide it with the information in a form it is able to process, and then you have to teach it how to use it.
Using examples from more than 60 car chases, the researchers devised both a way to encode the chases so that a computer could understand them and a template for a typical story about an L.A. car chase. When they fed it data from another chase, the software spit out an article that read like classic journalistic prose:
Driver Drives Off 300 Ft Cliff During Pursuit
July 13th, 2015 (Point Fermin Park, San Pedro, California) — A vehicle pursuit that began in Wilmington during the late evening of July 13th later ended with the crash of the suspect in Point Fermin Park, San Pedro. The incident began at about 11:00 PM when an unidentified driver, driving a Toyota Prius, fled from officers of the Los Angeles Port Police following a traffic stop on Pacific Coast Highway in Wilmington. The suspect was then pursued by the LAPP along Alameda Street to San Pedro, and then further along Alameda Street to Point Fermin Park in San Pedro. The incident concluded with the crash of the suspect in Point Fermin Park, where the suspect drove over a cliff. The unidentified suspect was injured. Referring to the incident, witness Manuel Castro said “We peeked our heads and it was just a gray Prius and we saw the wreckage and the cops over here.” The suspect was treated for injuries at the scene and was expected to be arrested.
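The idea of a chase as a set of recurring plot elements (a trigger, a route, an ending) can be sketched in a few lines of Python. This is only an illustration, not the encoding "Structured Stories" used: the `Chase` record, its field names and the ending phrases are assumptions. The values are taken from the generated article above.

```python
from dataclasses import dataclass


@dataclass
class Chase:
    """Hypothetical record of one pursuit. Field names are assumptions."""
    date: str
    start_place: str
    end_place: str
    trigger: str   # the triggering event, e.g. "a traffic stop"
    agency: str
    vehicle: str
    ending: str    # must be a key of ENDING_PHRASES


# Every pursuit has an ending; each known type maps to a stock phrase.
ENDING_PHRASES = {
    "crash": "ended when the suspect crashed",
    "surrender": "ended when the suspect surrendered",
    "shooting": "ended in a shooting",
}


def chase_story(chase: Chase) -> str:
    """Turn the structured record into two sentences of template prose."""
    ending = ENDING_PHRASES[chase.ending]
    return (
        f"A vehicle pursuit that began in {chase.start_place} on {chase.date} "
        f"{ending} in {chase.end_place}. "
        f"The incident began when the driver of a {chase.vehicle} fled from "
        f"officers of the {chase.agency} following {chase.trigger}."
    )


example = Chase(
    date="July 13th",
    start_place="Wilmington",
    end_place="Point Fermin Park, San Pedro",
    trigger="a traffic stop on Pacific Coast Highway",
    agency="Los Angeles Port Police",
    vehicle="Toyota Prius",
    ending="crash",
)
print(chase_story(example))
```

Because the facts are stored separately from the wording, the same record could be passed to a different template without any new reporting.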
What made this experiment different from articles about quarterly earnings reports is that the machines weren’t just punching numbers into a template. They were recounting a series of events. In other words, they were telling stories.
The outcome proved that “it’s possible to represent most news stories, and certainly formulaic news stories, as data, and it is also possible to generate news products, say articles, that are very similar to what journalists produce,” said David Caswell, who oversaw the “Structured Stories” project before joining the BBC last year as the executive product manager of its News Labs incubator.
Robot journalists are not yet able to produce articles by themselves, beyond stenography-style reporting about straightforward facts. Even in cases where they’re able to write the story, they still need human reporters to tell them how to process information first. They serve primarily as virtual assistants, allowing one journalist to do work that might have required dozens — if not hundreds.
That’s the idea behind Radar, an “automated news service” that offers a glimpse at the cutting edge of computer-assisted journalism.
Radar — the name is an acronym for “Reporters and Data and Robots” — was launched in September 2017 as a joint venture of the U.K.’s Press Association and startup Urbs Media, with more than €700,000 of funding from Google’s News Initiative.
It seeks to use the vast but often untapped troves of public data released by the British government and other institutions, some of which drill down to the level of the U.K.’s hundreds of local authorities.
A story produced by Radar starts like any other: with an idea. The group’s reporters root around data sets, looking for something interesting. “To my mind, the best judge to a story is still a human journalist rather than a machine,” said Radar Editor-in-Chief Gary Rogers.
Once the team has identified a subject worth picking up — say, how often ambulances are delayed across the U.K. — reporters might make some phone calls or conduct interviews to understand the broader context and harvest general quotes for their articles.
Only then does the automation begin. The reporters write a template, which will allow them to generate hundreds of individual articles from just one data set — in this case, noting how often ambulances are delayed in the community and how that compares to the national average.
They add some analysis and feed it into their NLG software, which spits out hundreds of “localized” articles Radar offers to its subscribers.
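How one data set becomes many localized articles can be sketched as follows. This is not Radar's software: the function names, the sentence wording and the figures are invented for illustration, and the placeholder area names stand in for real local authorities.

```python
def compare(local: float, national: float) -> str:
    """Describe a local rate relative to the national one."""
    if local > national:
        return "above"
    if local < national:
        return "below"
    return "in line with"


def localized_stories(delay_rates: dict[str, float], national_rate: float) -> dict[str, str]:
    """Generate one short story per area from a single table of delay rates (percent)."""
    stories = {}
    for area, rate in delay_rates.items():
        stories[area] = (
            f"Ambulances in {area} were delayed on {rate:.1f}% of calls, "
            f"{compare(rate, national_rate)} the national average of {national_rate:.1f}%."
        )
    return stories


# Invented figures for placeholder areas.
rates = {"Area A": 12.5, "Area B": 7.0, "Area C": 9.0}
for story in localized_stories(rates, national_rate=9.0).values():
    print(story)
```

The loop is trivial. The reporting work described above, choosing the data set, gathering context and quotes, and writing the template, happens before it runs.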
Sometimes, local newsrooms publish the stories as they come in; sometimes, they have their own journalists pick them up, do additional reporting and turn them into larger features.
Radar’s team of five reporters, plus the startup’s two founders, hammers out around 8,000 stories per month, covering issues ranging from crime to transport, education, environment, health and social policy.
Some of the data they use has been available for years but had remained, so far, untouched by journalists — partly because so many local U.K. newspapers have shut down, and partly because much of the material is too detailed for it ever to have been profitable for human reporters working on their own.
“We’re using NLG technologies as a writing tool, in effect,” said Rogers.
Newsroom executives interviewed for this article argued that rather than making journalists redundant, robot reporters will allow them to focus on elements of the job that add more value.
Historically, in most newsrooms, only a small fraction of reporters have been free to truly dig deep into stories. Many of the tasks being performed can be — or one day soon could be — done by robots.
“In a way, these tools make it more likely that more of us will be able to do more sophisticated reporting,” said Gibbs, who now leads AP’s newsroom AI efforts.
Traditionally, most journalism is a single-use product. A reporter is assigned a story, gathers information, and then writes. Over time, they build up sources and knowledge, and get better at what they do — but when it comes to the writing part, they have to start fresh each time.
That’s where robot-journalists could revolutionize the business, technologists like the BBC’s David Caswell believe. He envisions that instead of writing articles, breaking news reporters of the future will work on templates that — fed with new data — can produce a limitless number of stories.
Newsrooms will build up libraries of templates, enabling them to quickly produce stories, sometimes for multiple audiences. The reporting about a car chase in L.A. could, for example, be run through five templates, producing, in turn, a short summary, a listicle, a colloquial blog post, a colorful article with lots of detail and a version in Spanish for the city’s Latino community.
“The editorial side of journalism is going to be more important than ever,” said Caswell. “But it’s going to be completely different.”
For some journalists, that will mean new opportunities. For others, it could mean having to learn new skills — or risk losing their jobs.
Some fields of journalism — investigative projects, magazine features, in-depth political and business analysis, op-eds and commentaries — seem, for now, to be safe from technologies like NLG.
Templates and databases can only do so much. “You can’t automate creative writing,” said Alexander Siebert, one of the CEOs of Berlin-based tech company Retresco. “Artificial intelligence can grasp the structure of grammar and turn data into creative language — but the ‘creative idea’ is and remains in the hands of humans. It will take many additional years of research until this can be done by machines.”
But other newsroom areas could see significant disruption.
“There are new ways of doing journalism that will be completely accessible and possible for new generations of journalists,” said Caswell. “But it’s maybe harder for older journalists to adapt to those kinds of thinking.”
To write templates, journalists will have to learn to look for recurring patterns in whatever they’re covering, much as computer scientists and developers do when they write code. Such computational thinking is a skill most working journalists currently lack.
“It’s essential that employers provide training,” said Sarah Kavanagh, a senior campaigns and communications officer at the U.K.’s National Union of Journalists.
Kavanagh described recent efforts to automate reporting in the industry as a “double-edged sword.” While her union welcomes the use of technology, including AI, to enhance reporting, particularly in underserved news deserts, she also warned that the option to automate reporting could lead newsrooms to lay off reporters.
Similarly disruptive innovations had been used in the past, she warned, “in a way that is not supporting quality, sustainable journalism, but about cutting costs.”
“If there are technological tools that are developed that will help people to dig into lots of information and to save time in their work of journalists, then … they should be welcomed,” she said. “But if the emphasis on AI … is motivated to further reduce costs and resources — meaning people’s jobs — then that’s a problem.”
Just what changes artificial intelligence will ultimately bring to the newsroom is still unclear. While there are distinct limits to NLG, other avenues are just being explored.
“It’s not far-fetched to assume that in five to 10 years from now, technology will be at a place that, depending on the story one is writing, the machine will just make suggestions about sentences, or an outline for the piece,” said Seth Lewis, an emerging media researcher at the University of Oregon.
Enter “deep learning.”
Unlike NLG, where computers are told what to do, deep learning analyzes vast troves of data and learns from that experience. That makes it highly effective, but it also turns computers into black boxes, making it impossible to fully understand the reasons behind their decisions.
In February, the American nonprofit OpenAI made headlines when it said it had created software that uses “deep learning” to generate text, and that the program is so good, and has so much potential for misuse, that it wouldn’t release its full research.
It was trained by being fed 8 million documents. Using an approach similar to the predictive text generator on a smartphone, the program produces articles that read like near-perfect prose by predicting what word is likely to follow another one.
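The principle of predicting the next word can be illustrated with a toy model that is vastly simpler than OpenAI's system. The sketch below is an assumption-laden stand-in, not OpenAI's method: it only counts which word follows which in a tiny sample text and then picks among those followers at random. The function names and the sample sentence are invented.

```python
import random
from collections import defaultdict


def train_bigrams(text: str) -> dict[str, list[str]]:
    """Record, for every word, the words that follow it in the training text."""
    words = text.split()
    followers = defaultdict(list)
    for current, nxt in zip(words, words[1:]):
        followers[current].append(nxt)
    return dict(followers)


def generate(followers: dict[str, list[str]], start: str, length: int, seed: int = 0) -> str:
    """Build a sentence by repeatedly picking a word seen after the previous one."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = [start]
    for _ in range(length - 1):
        options = followers.get(output[-1])
        if not options:
            break  # the last word never had a follower in the training text
        output.append(rng.choice(options))
    return " ".join(output)


# Invented sample text.
corpus = "the suspect fled the scene and the suspect crashed"
model = train_bigrams(corpus)
print(generate(model, start="the", length=6))
```

Nothing in such a model checks whether a generated sentence is true. It only knows which words tend to appear next to each other.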
The only problem: As authentic as the stories may seem, the facts they contain are completely made up. Asked to generate an article about the OECD, the software created a straightforward news piece, including a fabricated quote attributed to a chairperson at the organization.
This makes the program, arguably, a potential competitor to fiction writers, novelists or poets, who aren’t necessarily constrained by the facts. But it’s much less useful for reporters, whose job is to stick to the truth.
For journalists, including your loyal correspondent, that should come as a relief. We won’t have to dust off our resumes. At least not yet.
Judith Mischke contributed reporting.