Vetenskapsakademiens Handlingar was a Swedish scientific journal first published in 1739. At the time, Latin was the dominant language of science, and the journal aimed to spread practical and economically useful knowledge in Swedish. It covered everything from new scientific findings to practical advice on agriculture, crafts and industry.
“One person wrote about a strange illness, another about mining and the working conditions of miners. It paints a picture of how people lived at this time”, says Sofie Johansson.
She is a language technologist and researcher in multilingualism at the Department of Swedish, Multilingualism and Language Technology at the University of Gothenburg.
“I have been trying to digitize this material for many years now, but it has been difficult. I have written down an estimated 120,000 words by hand”, says Lena Rogström, a language historian at the same department with a strong interest in the history of lexicography, entomology and 18th-century language.
Digital tools for text recognition
The university library tried to help the researchers with OCR (optical character recognition, which automatically converts images of printed text into machine-readable text), but the quality was not good enough. Then Rogström and Johansson heard about Transkribus, a tool for text recognition. Around the same time, the Centre for Digital Humanities announced a call for project funding, which they applied for and received.
“We started testing Transkribus and it worked very well. The program transcribed the texts and we corrected the mistakes manually”, says Sofie Johansson.
Transkribus worked well even on Gothic script, which can be difficult for both humans and machines to decode. The researchers have now digitized around half a million words from eight volumes of the journal.
“We realized from the beginning that we wouldn’t have time to do everything, but we have made a fair bit of progress”, says Lena Rogström.
Great variations in academic language
Sofie Johansson has also analyzed some of the texts with other text analysis tools. Texts on entomology (the study of insects), for example, used a wider range of distinct words than texts on agriculture.
“But entomology is a newer science and its vocabulary had not yet stabilized. Academic language varies greatly between the 18th and 19th centuries, and over time the vocabulary becomes more and more similar to ours”, says Sofie Johansson.
“We would have seen even more similarities if the spelling had been normalized. Many words, for example the little word ‘and’, were spelled in several different ways, and the computer then treats each spelling as a separate word even when they are the ‘same’ word”, says Lena Rogström.
“You can also see concepts emerging in the material. One author investigated how water expands when it freezes, but instead of talking about freezing, describes it as going from liquid to solid”, says Sofie Johansson.
The researchers also discovered that the writers were very familiar with one another's work. Contrary to what one might expect, material circulated widely: there were plenty of scientific journals, and the postal service was efficient.
Rare texts by Carl von Linné
One author who appears among the texts is Carl von Linné.
“He is one of our greatest natural scientists, but there are no comprehensive linguistic studies of his vocabulary. He wrote mostly in Latin and not much in Swedish, but in the Royal Academy of Sciences' proceedings everything is in Swedish, and that gives a picture of how he wrote compared with others”, says Lena Rogström.
The project has received a very positive response, including from the editors of the Swedish Academy's dictionary, SAOB. The Swedish Academy will also provide further support so that the researchers can begin annotating the documents with author, subject and year, and make them available and searchable to other researchers and the public.
“It was very inspiring to read the presidential addresses about why one should pursue science and why one should share it”, says Sofie Johansson.
By: Katarina Wignell
Photo credit: Proceedings of the Royal Swedish Academy of Sciences