How we used AI to extract citation data from conference posters


Can AI extract citations from conference posters posted on social media? An experimental study by researchers at Digital Science shows that Gemini 2.0 Flash can extract citation data from poster images shared on X (previously Twitter). The team linked 63% of these otherwise “invisible” citations to records in the Dimensions database, and a manual review of a random sample found that the links were correct 92% of the time.

Conference posters often contain unpublished or important information. For example, a poster may share early insights from a study or clinical trial, months or years before publication. Access to this information could change clinical and strategic decisions, but because research shared in posters, presentations and other images often lacks structured data, it remains undiscovered by tracking tools, and analyzing it manually for citation data is too time-consuming.

Dr Mike Taylor (Head of Data Insights) and Dr Carlos Areia (Senior Data Scientist), both at Digital Science, conducted an exploratory study to find out whether structured citation data can be extracted from poster images using LLMs. To obtain meaningful insights from this data, the study then assessed how easily the results can be linked to the Dimensions and Altmetric databases.

Here, we summarize the methodology and results of this study, but you can find the full study here: https://doi.org/10.2196/78148

Over 115,000 posts from X associated with the 2024 American Society of Clinical Oncology (ASCO) meeting were screened and filtered for high-quality poster images. Because ASCO conferences have unusually high metadata availability, a smaller, non-clinical conference was also included to double-check accuracy and citation linkage.

The team then used a prompt-engineered Gemini 2.0 Flash model to classify images, summarize posters, and extract structured citation elements (eg, authors, title, DOI) in JSON.
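As a rough illustration of what handling that JSON output might involve, here is a minimal Python sketch that parses a model reply into citation records. The field names (`title`, `authors`, `doi`), the `Citation` class and the `parse_citations` helper are assumptions made for this example, not the schema or code used in the study; the actual prompt and output format are given in the published paper.

```python
import json
import re
from dataclasses import dataclass, field

FENCE = "`" * 3  # Markdown code fence that some models wrap around JSON
DOI_PATTERN = re.compile(r"10\.\d{4,9}/\S+")


@dataclass
class Citation:
    # Hypothetical record layout, chosen for illustration only.
    title: str = ""
    authors: list = field(default_factory=list)
    doi: str = ""


def normalize_doi(value):
    """Return a lowercase bare DOI, or an empty string if none is found."""
    match = DOI_PATTERN.search(value or "")
    if not match:
        return ""
    # Drop trailing punctuation that often follows a DOI in running text.
    return match.group(0).rstrip(".,;)").lower()


def parse_citations(raw):
    """Parse a model reply that should contain a JSON list of citations."""
    text = raw.strip()
    if text.startswith(FENCE):
        # Remove the surrounding fence and an optional "json" language tag.
        text = text.strip("`").strip()
        if text.startswith("json"):
            text = text[len("json"):]
    try:
        items = json.loads(text)
    except json.JSONDecodeError:
        return []  # Malformed reply: treat as "no citations extracted".
    if not isinstance(items, list):
        return []
    citations = []
    for item in items:
        if not isinstance(item, dict):
            continue
        authors = item.get("authors") or []
        if isinstance(authors, str):
            # Accept a comma-separated string as well as a list.
            authors = [a.strip() for a in authors.split(",") if a.strip()]
        citations.append(
            Citation(
                title=str(item.get("title") or "").strip(),
                authors=list(authors),
                doi=normalize_doi(str(item.get("doi") or "")),
            )
        )
    return citations
```

Normalizing the DOI at this stage (stripping a resolver prefix and trailing punctuation) makes a later exact-match lookup more reliable, since posters rarely print identifiers in a consistent form.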

A hierarchical algorithm matched the JSON data against the Dimensions database. Manual validation was performed on a random 20% sample. You can find the full methodology and read the full prompt for the model in the published paper. 
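To give a sense of what hierarchical matching can look like, the Python sketch below tries the most reliable identifier first and falls back to weaker evidence only when needed. The tiers (exact DOI, exact normalized title, fuzzy title), the 0.9 threshold and the in-memory `records` list are assumptions for illustration; the study's actual algorithm queries Dimensions and is described in the paper.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def match_citation(citation, records, fuzzy_threshold=0.9):
    """Return (record_id, tier) for the first tier that matches, else (None, None).

    `citation` is a dict with optional "doi" and "title" keys.
    `records` is a list of dicts with "id", "doi" and "title" keys
    (a hypothetical stand-in for a bibliographic database).
    """
    # Tier 1: exact DOI match, the strongest signal.
    doi = (citation.get("doi") or "").lower()
    if doi:
        for rec in records:
            if rec["doi"].lower() == doi:
                return rec["id"], "doi"

    title = normalize_title(citation.get("title") or "")
    if not title:
        return None, None

    # Tier 2: exact match on the normalized title.
    for rec in records:
        if normalize_title(rec["title"]) == title:
            return rec["id"], "exact_title"

    # Tier 3: best fuzzy title match above the threshold.
    best_id, best_score = None, 0.0
    for rec in records:
        score = SequenceMatcher(None, title, normalize_title(rec["title"])).ratio()
        if score > best_score:
            best_id, best_score = rec["id"], score
    if best_score >= fuzzy_threshold:
        return best_id, "fuzzy_title"
    return None, None
```

Returning the tier alongside the record makes it possible to sample each tier separately during manual validation, since fuzzy matches are more likely to be wrong than DOI matches.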

This study is a successful proof of concept that LLMs can extract citations from images, such as conference posters, in a scalable and accurate way.

In total, of the 115,714 posts and 16,574 images analyzed, 651 met the inclusion criteria and yielded 1,117 potential citations. The algorithm was able to link 63% of these citations to 616 unique research outputs (580 journal articles; 36 clinical trial registrations). The manual review of 135 randomly sampled citations found that they were correctly linked 92% of the time.
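The short Python sketch below works through the arithmetic behind these figures. It assumes the 651 included items are counted against the 16,574 images, and the linked-citation count is only an approximation reconstructed from the rounded 63% figure, not a number reported here.

```python
# Figures reported in the summary above.
images = 16_574
included = 651
citations = 1_117
link_rate = 0.63            # rounded percentage as reported
journal_articles = 580
trial_registrations = 36

# Share of analyzed images that met the inclusion criteria (assumed denominator).
inclusion_rate = included / images

# Average number of potential citations per included item.
citations_per_item = citations / included

# Approximate number of linked citations, derived from the rounded rate.
approx_linked = round(citations * link_rate)

# The two output types should add up to the 616 unique research outputs.
unique_outputs = journal_articles + trial_registrations

print(f"Inclusion rate: {inclusion_rate:.1%}")
print(f"Citations per included item: {citations_per_item:.2f}")
print(f"Approximate linked citations: {approx_linked}")
print(f"Unique research outputs: {unique_outputs}")
```

The linked citations outnumber the 616 unique outputs because several posters cite the same article or trial.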

How do Altmetric and Dimensions turn citations from conference posters into valuable insights?

Besides showing that LLMs can extract citation data from images, the study showcases how Altmetric and Dimensions provide meaningful, actionable context to the discovered citations. Dimensions hosts the largest collection of interconnected research data, and Altmetric monitors and reports the online attention of research, including mentions and citations in social media, mainstream media, policy, patents and more.

Unlike traditional citation-based metrics, these platforms offer a broader perspective on research influence, making them an essential complement for researchers, institutions, and policymakers. Dimensions enabled the team to connect the discovered citations to publications, clinical trials, patents, and grants. 

This enabled an in-depth analysis of the findings. For example, using the International Cancer Research Partnership Cancer Types taxonomy in Dimensions, the researchers were able to quickly identify the main cancer types mentioned in the poster citations. Altmetric, in turn, provides a real-time gauge of the attention that research receives. It shows that the research in several posters was eventually published and achieved a significant level of attention.

This study opens the door to future use of AI-based image extraction to collect scholarly mentions and citations from novel sources, as well as other relevant clinical data from conference posters. The success of this experimental study is evidence of Digital Science’s sustained commitment to innovating industry-leading research methods.

Since 2010, Digital Science has helped enterprises and organizations from various sectors accelerate discovery and create bespoke solutions to their research needs, whether through sentiment analysis, impact summaries, congress tracking or other tailored metrics. Now, our user-friendly platforms and tools, including Altmetric and Dimensions, have made these advanced technologies accessible to everyone, empowering researchers, universities, funders, enterprises, and publishers to advance knowledge and fuel breakthroughs.


