The Library Of Congress Is A Training Data Playground For AI Companies

Rashi Shrivastava

Forbes

18.09.2024

Black-and-white portraits of Rosa Parks, letters penned by Thomas Jefferson and The Giant Bible of Mainz, a 15th-century manuscript known to be one of the last handwritten Bibles in Europe. These are among the 180 million items, including books, manuscripts, maps and audio recordings, housed within the Library of Congress.

Every year, hundreds of thousands of visitors walk through the library’s high-ceilinged, pillared halls, passing beneath Renaissance-style domes embellished with murals and mosaics. But of late, the more than 200-year-old library has attracted a new type of patron: AI companies that are eager to access the library’s digital archives, and the 185 petabytes of data stored within them, to develop and train their most advanced AI models.

“We know that we have a large amount of digital material that large language model companies are very interested in,” Judith Conklin, chief information officer at the Library of Congress (LOC), told Forbes. “It’s extraordinarily popular.”

The upsurge in interest in the library’s data is also reflected in the numbers. The congress.gov site, which is managed by the LOC and hosts data about bills, statutes and laws, gets between 20 million and 40 million monthly hits on its API, an interface that allows programmers to download........

© Forbes
