Large-scale investigation of weakly-supervised deep learning for the fine-grained semantic indexing of biomedical literature

Nentidis, Anastasios; Chatzopoulos, Thomas; Krithara, Anastasia; Tsoumakas, Grigorios; Paliouras, Georgios

doi:10.1016/j.jbi.2023.104499

arXiv:2301.09350 (cs)

[Submitted on 23 Jan 2023 (v1), last revised 5 Oct 2023 (this version, v2)]

Abstract: Objective: Semantic indexing of biomedical literature is usually done at the level of MeSH descriptors, where several related but distinct biomedical concepts are often grouped together and treated as a single topic. This study proposes a new method for the automated refinement of subject annotations at the level of MeSH concepts. Methods: Lacking labelled data, we rely on weak supervision based on concept occurrence in the abstract of an article, further enhanced by dictionary-based heuristics. In addition, we investigate deep learning approaches, making design choices that tackle the particular challenges of this task. The new method is evaluated in a large-scale retrospective scenario, based on concepts that have since been promoted to descriptors. Results: In our experiments, concept occurrence was the strongest heuristic, achieving a macro-F1 score of about 0.63 across several labels. The proposed method improved on it by more than 4 percentage points. Conclusion: The results suggest that concept occurrence is a strong heuristic for refining coarse-grained labels to the level of MeSH concepts, and that the proposed method improves on it further.
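
The abstract names two technical ingredients: weak labels derived from whether a concept occurs in an article's abstract, and evaluation by macro-F1 across labels. The sketch below illustrates both under simplifying assumptions and is not the authors' implementation. It assumes a concept "occurs" when any of its terms appears as a whole-word, case-insensitive match in the abstract, and it takes a hypothetical mapping from concept IDs to term lists as input; the paper's dictionary-based enhancements and its deep learning models are not reproduced. The function names `concept_occurrence_labels` and `macro_f1`, and the concept IDs and terms in the usage example, are illustrative only.

```python
import re
from typing import Dict, List, Set


def concept_occurrence_labels(abstract: str,
                              descriptor_concepts: Dict[str, List[str]]) -> Set[str]:
    """Weakly label an article with the concepts whose terms occur in its abstract.

    `descriptor_concepts` maps a concept ID to its terms (a hypothetical input format).
    Matching is whole-word and case-insensitive, which is an assumption of this sketch.
    """
    text = abstract.lower()
    labels: Set[str] = set()
    for concept_id, terms in descriptor_concepts.items():
        for term in terms:
            pattern = r"\b" + re.escape(term.lower()) + r"\b"
            if re.search(pattern, text):
                labels.add(concept_id)
                break  # one matching term is enough to assign the concept
    return labels


def macro_f1(gold: List[Set[str]], pred: List[Set[str]], labels: List[str]) -> float:
    """Unweighted mean of per-label F1 scores over `labels`.

    A label with no gold and no predicted positives scores 0.0 here; this is a
    convention of the sketch, and other tools may treat that case differently.
    """
    scores = []
    for label in labels:
        tp = sum(1 for g, p in zip(gold, pred) if label in g and label in p)
        fp = sum(1 for g, p in zip(gold, pred) if label not in g and label in p)
        fn = sum(1 for g, p in zip(gold, pred) if label in g and label not in p)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical concept IDs and terms, for illustration only.
    concepts = {"C1": ["insulin resistance"], "C2": ["metabolic syndrome"]}
    print(concept_occurrence_labels("Insulin resistance was common.", concepts))  # {'C1'}

    gold = [{"C1"}, {"C2"}, {"C1", "C2"}]
    pred = [{"C1"}, {"C1"}, {"C2"}]
    print(round(macro_f1(gold, pred, ["C1", "C2"]), 4))  # 0.5833
```

Macro-averaging gives each label equal weight regardless of how often it occurs, so rare fine-grained concepts count as much as frequent ones in the reported score.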

Comments:	26 pages, 5 figures, 4 tables. A more concise version
Subjects:	Computation and Language (cs.CL); Digital Libraries (cs.DL); Machine Learning (cs.LG)
Cite as:	arXiv:2301.09350 [cs.CL]
	(or arXiv:2301.09350v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2301.09350
Journal reference:	Journal of Biomedical Informatics, Volume 146, 2023, 104499, ISSN 1532-0464
Related DOI:	https://doi.org/10.1016/j.jbi.2023.104499

Submission history

From: Anastasios Nentidis
[v1] Mon, 23 Jan 2023 10:33:22 UTC (7,809 KB)
[v2] Thu, 5 Oct 2023 14:17:39 UTC (5,020 KB)