This article is within the scope of WikiProject Computing, a collaborative effort to improve the coverage of computers, computing, and information technology on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article is within the scope of WikiProject Academic Journals, a collaborative effort to improve the coverage of Academic Journals on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
This article is part of WikiProject Websites, an attempt to create and link together articles about the major websites on the web. To participate, you can edit the article attached to this page, or visit the project page.
Bianca Kramer notes that Semantic Scholar is "showing whether a citation cites methods, results or background". I feel this would be worth mentioning, will look later for a secondary source. Nemo 07:01, 25 October 2019 (UTC)
Current text is:
"Semantic Scholar is free to use and unlike similar search engines (i.e. Google Scholar) does not search for material that is behind a paywall.[citation needed]"
which implies that it only returns results that are not behind a paywall, whereas what it actually means is: "it doesn't search in between and across the material behind a paywall".
Yes, this is clearly wrong. It's easy to verify by just searching for any article that's published by a non-open access journal and seeing if it comes up. However, the current citation does use the "behind a paywall" wording without clarifying what that means, so I think it would require a different clarifying source if it's to be changed? Joshisanonymous (talk) 16:50, 20 April 2024 (UTC)
It does not allow anyone to link articles that are not freely available online; it only recognizes articles that are freely available. FrankieItalo (talk) 01:11, 6 May 2024 (UTC)
Ping Kouroshkoratamadia and Joshisanonymous. You're correct. The source that is quoted above and is in the WP article is wrong.
Semantic Scholar (SS) is free to use
SS searches for and extracts information from freely available online journal articles
However, SS ALSO searches for material behind paywalls! In other words, Semantic Scholar can (and does) index many articles that are not published in open access scholarly journals.
Descriptions of SS are very misleading! Even the U.S. Department of Commerce Research Library, LITERATURE SEARCH: SEMANTIC SCHOLAR gets it wrong, "[SS] does not search for material that is behind a paywall."
I found 3 explanations of how Semantic Scholar (SS) works, regarding paywalls.
ONE From the SS FAQ, Content section
Q1. Where does Semantic Scholar source papers from? A1. "Semantic Scholar sources its content via web indexing and from partnerships with scientific journals... You can find a list of our sources by visiting our publisher partners page... We index content from PubMed, arXiv, Springer Nature, and more."
Q2. How do I access the full text of a paper? A2. "...you will find access options below the abstract of the paper located on the paper detail page... you will see options to View PDF, View Paper, or View via Publisher that will redirect you to a full-text PDF... If the paper is not freely accessible, the publisher website has options to purchase the paper. For more information, see How do I access a PDF using my institutional affiliation?"
This is nothing different from Google Scholar or any other research paper repository with subscription-access only journals.
TWO In A1 above, publisher partners links here: https://www.semanticscholar.org/about/publishers University of Chicago Press is listed. This is how U Chicago Press describes its partnership with SS, emphasis mine: "Articles published in University of Chicago Press journals will now appear in the Semantic Scholar corpus, providing readers with bibliographic information and article summaries. Each article entry links directly to the journal’s webpage, so subscribers can read the full text or download a PDF."
So, if the SS citation is for an article in an open access journal, you can read it or download it. All other article citations returned by SS are paywalled.
THREE In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (2020), the paper S2ORC: The Semantic Scholar Open Research Corpus (pdf) runs the citation search on open access papers only. On p. 4570, "Papers in SS are derived from numerous sources: obtained directly from publishers... from arXiv or PubMed, or crawled from the open Internet." This means that most of the papers that SS gets come directly from publishers and are not open access. S2ORC consists of all the papers in the Semantic Scholar corpus that are in English, have abstracts and are open access. The SS full corpus is approximately 300M journal articles. Some filtering is done to get to 81.1M papers.
See pp. 4972−4973 and Table 3. "Our publisher-provided abstract coverage is 90.4%, or 73.4M papers. Our PDF coverage is 35.6%, or 28.9M papers... we extract bibliography entries for 27.6M of the 28.9M PDFs. We identify 8.1M of the 28.9M PDFs as open access, and we provide full text for all papers in this open access subset. Using these extracted bibliographies, we resolve a total of 380.5M citation links between papers..."
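To make the proportions in that quote easier to compare, here is a rough arithmetic check in Python. It uses only the figures quoted above (in millions of papers); the variable names and the `share` helper are my own labels, not anything from the paper. Note that it only shows how small the open access full-text subset is. It does not by itself show that the remaining papers are paywalled, only that they were not identified as open access PDFs.

```python
# Figures quoted from the S2ORC paper, in millions of papers.
total_papers = 81.1            # corpus size after filtering
abstract_coverage = 73.4       # papers with publisher-provided abstracts
pdf_coverage = 28.9            # papers with a PDF
pdfs_with_bibliography = 27.6  # PDFs with extracted bibliography entries
open_access_pdfs = 8.1         # PDFs identified as open access


def share(part, whole):
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Each entry is a percentage derived from the quoted counts.
shares = {
    "abstracts / filtered corpus": share(abstract_coverage, total_papers),
    "PDFs / filtered corpus": share(pdf_coverage, total_papers),
    "bibliographies / PDFs": share(pdfs_with_bibliography, pdf_coverage),
    "open access / PDFs": share(open_access_pdfs, pdf_coverage),
    "open access / filtered corpus": share(open_access_pdfs, total_papers),
}

for label, pct in shares.items():
    print(f"{label}: {pct}%")
```

On these rounded inputs, the open access full-text subset comes to roughly 28% of the PDFs and roughly 10% of the 81.1M filtered papers. The abstract share computed here can differ from the quoted 90.4% in the last digit because the quoted counts are already rounded.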
There are serious problems with its classifications. It uses only single-letter first initials and therefore mixes authors with common names across all kinds of fields.
It throws anything in a foreign language together without analysis; it needs to analyze foreign languages as well as English.
It does not allow scholars to correct errors. For instance, it lists reviews as articles under the author(s) of the book reviewed, which is totally inappropriate.
It also arbitrarily separates sections of one author's works by what it thinks is the subject and will not allow combining of pages by the author concerned. FrankieItalo (talk) 01:15, 6 May 2024 (UTC)
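As an illustration of the first point only: a minimal Python sketch of why keying author profiles on surname plus first initial merges unrelated people. The names, fields, and the `initial_key` function are all hypothetical, and this is not a claim about how Semantic Scholar actually disambiguates authors; it just shows the failure mode described in the comment above.

```python
from collections import defaultdict

# Hypothetical author records (full name, field). Not real Semantic Scholar data.
records = [
    ("Jane Smith", "Linguistics"),
    ("John Smith", "Oncology"),
    ("Julia Smith", "Astrophysics"),
    ("Maria Rossi", "History"),
]


def initial_key(full_name):
    """Reduce a name to (surname, first initial), e.g. 'Jane Smith' -> ('smith', 'j')."""
    parts = full_name.split()
    given, surname = parts[0], parts[-1]
    return (surname.lower(), given[0].lower())


# Group records under the reduced key, as a naive profile builder might.
profiles = defaultdict(list)
for name, field in records:
    profiles[initial_key(name)].append((name, field))

# Keys that ended up holding more than one distinct person.
merged = {key: people for key, people in profiles.items() if len(people) > 1}

for key, people in merged.items():
    print(key, "->", people)
```

Here the three different Smiths collapse into one ('smith', 'j') profile spanning three unrelated fields, while the single Rossi record is unaffected.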
The problems with non-English names (and probably anything that uses the Cyrillic alphabet) remind me of the OFAC and FinCEN lists that I used to work with. I work/worked in bank risk management, and I could not believe how expensive and error-prone some of the sanctions solutions "services" were! So many false negatives due to the first 3 sentences of what you wrote. Paul Allen and BERT AI should do much better.
I noticed in the SS FAQ that it is difficult or impossible for scholars to correct errors of fact in the citation entries for their work, which is ridiculous.--FeralOink (talk) 23:35, 10 July 2024 (UTC)