Type of site | Search engine |
---|---|
Created by | Allen Institute for Artificial Intelligence |
URL | semanticscholar |
Launched | November 2015 |
Semantic Scholar is an artificial intelligence-backed search engine for academic publications developed at the Allen Institute for AI and publicly released in November 2015.[1] It uses advances in natural language processing to provide summaries for scholarly papers.[2] The Semantic Scholar team is actively researching the use of artificial intelligence in natural language processing, machine learning, human-computer interaction, and information retrieval.[3]
Semantic Scholar began as a database covering computer science, geoscience, and neuroscience.[4] In 2017, the system began including biomedical literature in its corpus.[4] As of November 2021, it includes publications from all fields of science.
Semantic Scholar provides a one-sentence summary of scientific literature. One of its aims was to address the challenge of reading numerous titles and lengthy abstracts on mobile devices.[5] It also seeks to ensure that the three million scientific papers published yearly reach readers, since it is estimated that only half of this literature is ever read.[6]
Artificial intelligence is used to capture the essence of a paper, generating the summary through an "abstractive" technique.[2] The project uses a combination of machine learning, natural language processing, and machine vision to add a layer of semantic analysis to the traditional methods of citation analysis, and to extract relevant figures, tables, entities, and venues from papers.[7][8]
In contrast with Google Scholar and PubMed, Semantic Scholar is designed to highlight the most important and influential elements of a paper.[9] The AI technology is designed to identify hidden connections and links between research topics.[10] Like the previously cited search engines, Semantic Scholar also exploits graph structures, which include the Microsoft Academic Knowledge Graph, Springer Nature's SciGraph, and the Semantic Scholar Corpus.[11]
Each paper hosted by Semantic Scholar is assigned a unique identifier called the Semantic Scholar Corpus ID (abbreviated S2CID).
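As a rough illustration of how an S2CID might be handled in software, the Python sketch below normalises an identifier string and builds a link from it. The resolver prefix `https://api.semanticscholar.org/CorpusID:`, the helper names `parse_s2cid` and `s2cid_url`, and the sample number 12345 are assumptions made for illustration; none of them comes from the sources cited here.

```python
import re

# Assumed resolver prefix (not stated in the article); adjust if the
# service uses a different address.
S2CID_URL_PREFIX = "https://api.semanticscholar.org/CorpusID:"


def parse_s2cid(text: str) -> int:
    """Return the numeric corpus ID from strings such as 'S2CID 12345' or '12345'."""
    # Optional 'S2CID' label (any case), optional colon or spaces, then digits only.
    match = re.fullmatch(r"\s*(?:S2CID[:\s]*)?(\d+)\s*", text, flags=re.IGNORECASE)
    if match is None:
        raise ValueError(f"not a recognisable S2CID: {text!r}")
    return int(match.group(1))


def s2cid_url(text: str) -> str:
    """Build a link for an S2CID using the assumed resolver prefix."""
    return f"{S2CID_URL_PREFIX}{parse_s2cid(text)}"


if __name__ == "__main__":
    # 12345 is a placeholder, not the ID of any particular paper.
    print(s2cid_url("S2CID 12345"))
```

Treating the identifier as a plain integer keeps the sketch independent of how the number is written in a citation, whether with or without the "S2CID" label.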
Semantic Scholar is free to use and, unlike similar search engines (e.g. Google Scholar), does not search for material that is behind a paywall.[12][4]
One study evaluated the search abilities of Semantic Scholar using a systematic approach and found the search engine to be 98.88% accurate at uncovering the data.[12] The same study examined other Semantic Scholar functions, including tools to survey metadata as well as several citation tools.[12]
As of January 2018, following a 2017 project that added biomedical papers and topic summaries, the Semantic Scholar corpus included more than 40 million papers from computer science and biomedicine.[13] In March 2018, Doug Raymond, who developed machine learning initiatives for the Amazon Alexa platform, was hired to lead the Semantic Scholar project.[14] As of August 2019, the number of included paper metadata records (not the actual PDFs) had grown to more than 173 million[15] after the addition of the Microsoft Academic Graph records.[16] In 2020, a partnership between Semantic Scholar and the University of Chicago Press Journals made all articles published under the University of Chicago Press available in the Semantic Scholar corpus.[17] At the end of 2020, Semantic Scholar had indexed 190 million papers.[18]
In 2020, Semantic Scholar reached seven million users a month.[5]