figshare SEARCH

Introductory considerations

The figshare website can be searched either through the front-end interface or through the API we provide. Either way, the search query is analyzed and sent to a search engine built by figshare on top of the open source Elasticsearch server. This document details the types of search one may perform and the results one should expect from the search engine. The environment used to demonstrate the figshare search feature consists of six research documents, scraped for this purpose from the archives of the Public Library of Science (PLOS). For reference purposes, we provide the links to the articles below: Article 1, Article 2, Article 3, Article 4, Article 5, Article 6.

The indexed article content is taken from the first part of each article's introduction, the tags are the article's subject areas, and the categories are parsed from the article's .xml file. The categories are the same as those that appear on the figshare page corresponding to the article.

=========================

General search engine features/characteristics

  1. Stemming of words: all indexed words are stemmed, so search queries like process/processes/processing are equivalent.
  2. Stopword elimination: stopwords (prepositions, for instance) are not indexed by the search engine, so search queries like "just in time"/"just on time"/"just of time" are equivalent.

Please note, however, that even though stopwords are not indexed, the presence of a stopword inside a phrase does matter for the phrase search detailed below. Thus, search queries like "standing on the table"/"standing table" are not equivalent.
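
The two features above, and the phrase caveat, can be illustrated with a small sketch. It is not the analyzer figshare runs: the stopword list and the suffix-stripping `stem` function are toy stand-ins chosen for this example. The sketch assumes that removed stopwords still occupy a position in the text, which is one plausible mechanism consistent with the behaviour described here.

```python
import re

# Hypothetical stopword list; the list figshare actually uses is not given here.
STOPWORDS = {"a", "an", "and", "in", "of", "on", "the", "to"}

def stem(word):
    """Toy suffix stripper standing in for a real stemmer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("es") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def analyze(text):
    """Return (position, stem) pairs; stopwords are dropped but keep their slot."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [(pos, stem(w)) for pos, w in enumerate(words) if w not in STOPWORDS]

def same_terms(query_a, query_b):
    """Term search: only the surviving stems matter."""
    return {s for _, s in analyze(query_a)} == {s for _, s in analyze(query_b)}

def phrase_matches(phrase, text):
    """Phrase search: stems must appear at the same relative positions."""
    query = analyze(phrase)
    if not query:
        return False
    indexed = set(analyze(text))
    first_pos, first_stem = query[0]
    starts = [pos for pos, s in indexed if s == first_stem]
    return any(
        all((start + pos - first_pos, s) in indexed for pos, s in query)
        for start in starts
    )

print(same_terms("process", "processing"))         # True
print(same_terms("just in time", "just on time"))  # True
sentence = "He was standing on the table"
print(phrase_matches("standing on the table", sentence))  # True
print(phrase_matches("standing table", sentence))         # False
```

In this model "standing table" fails because the two stems are adjacent in the query but three positions apart in the indexed sentence.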

=========================

Search types and expected results

The general term search is the basic type of search one can perform on the figshare website or through the API. Documents are returned in order of relevance: documents in which the query term appears more often are ranked higher. All article fields (in this case: title, description, tags, categories) are taken into consideration.

Example:

Query string: "therapy"
Return results with relevant paragraphs/tags/categories:
  1. Article 4
     - Title: A National Survey of Teachers on Antiretroviral Therapy in Malawi: Access, Retention in Therapy and Survival
     - Description: [...] Malawi is scaling up antiretroviral therapy (ART) (...)
     - Description: [...] retention in therapy and survival (...)
     - Tags: Antiretroviral therapy
  2. Article 2
     - Title: Uptake of WHO Recommendations for First-Line Antiretroviral Therapy in Kenya, Uganda, and Zambia
     - Description: [...] first-line antiretroviral therapies (ART) (...)
     - Description: [...] TDF combined therapy offers (...)
     - Description: [...] TDF-based therapy as the preferred option with AZT-based therapies listed as alternatives. (...)
     - Description: [...] triple combination therapies containing AZT and TDF. (...)
     - Description: [...] lower prices for AZT and TDF combination therapy, (...)
     - Description: [...] more expensive than d4T-based therapies, respectively. (...)
     - Description: [...] AZT- and TDF-based therapies, (...)
     - Tags: Antiretroviral therapy
  3. Article 3
     - Description: [...] access to antiretroviral therapy (ART) (...)
     - Description: [...] life-long HIV therapy with limited (...)
     - Tags: Antiretroviral therapy
  4. Article 5
     - Description: [...] Antiretroviral therapy (ART) is effective (...)
     - Tags: Antiretroviral therapy
  5. Article 1
     - Tags: Antiretroviral therapy

Please note that Article 4 is ranked higher than Article 2, even though the latter has more occurrences of our search term. The reason is that the former has two of those occurrences in the title, and the scoring is weighted rather heavily towards the title field.
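
The effect of title weighting can be sketched with a simple weighted count. The weights below are hypothetical (the actual boosts are not documented here), and a real Elasticsearch relevance score is not a plain linear sum. The per-field counts are read off the snippets listed above, so elided text may contain further occurrences.

```python
# Hypothetical per-field weights; only "title counts for more" comes from the text.
FIELD_WEIGHTS = {"title": 10.0, "description": 1.0, "tags": 1.0, "categories": 1.0}

def score(term_counts):
    """Weighted sum of term occurrences per field (a simplification)."""
    return sum(FIELD_WEIGHTS.get(field, 1.0) * count
               for field, count in term_counts.items())

# Occurrences of "therapy" (after stemming) visible in the listing above.
article_4 = {"title": 2, "description": 2, "tags": 1}
article_2 = {"title": 1, "description": 8, "tags": 1}

print(score(article_4))  # 23.0
print(score(article_2))  # 19.0
```

With these assumed weights, the second title occurrence in Article 4 outweighs the larger number of description occurrences in Article 2.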


The field-specific term search works exactly like the general term search described above, but it is more specialized in that it allows selecting the field(s) in which to perform the search. As in the general term search, documents in which the query term appears more often are ranked higher. This type of search works on the figshare website, as well as in the API.

Example:

Query string: "title: therapy"
Return results with relevant paragraphs/tags/categories:
  1. Article 4
     - Title: A National Survey of Teachers on Antiretroviral Therapy in Malawi: Access, Retention in Therapy and Survival
  2. Article 2
     - Title: Uptake of WHO Recommendations for First-Line Antiretroviral Therapy in Kenya, Uganda, and Zambia

Please note that Articles 1, 3 and 5 no longer appear in the search results, as only the items with the word therapy in the title are returned.
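
For API users, a query string like the one above would be sent in the body of a search request. The sketch below only builds the request and does not send it. The endpoint URL and the `search_for` body parameter are assumptions based on the public figshare API v2 and should be checked against the current API reference.

```python
import json
import urllib.request

# Assumed endpoint; verify against the figshare API reference before use.
API_URL = "https://api.figshare.com/v2/articles/search"

def build_search_request(query):
    """Build (but do not send) a POST request carrying the query string."""
    body = json.dumps({"search_for": query}).encode("utf-8")  # assumed parameter name
    return urllib.request.Request(
        API_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

request = build_search_request("title: therapy")
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))

# To actually run the search, one would do something like:
# with urllib.request.urlopen(request) as response:
#     results = json.load(response)
```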


The general multi-term search returns documents that contain at least one of the query terms. It works like a boolean "OR" search, and the ranking rules are the same as for the search types listed above. This type of search works on the figshare website, as well as in the API.

Example:

Query string: "verticality endorse progress"
Return results with relevant paragraphs/tags/categories:
  1. Article 1
     - Description: [...] Initially, the South African National Department of Health endorsed a vertical ART (...)
     - Description: [...] system able to signpost programme progress and performance (...)
     - Description: [...] These vertical M&E systems ran parallel (...)
     - Description: [...] Kawonga and colleagues discuss the limitations inherent to such a vertical (...)
  2. Article 2
     - Description: [...] this recommendation, stating that countries should take steps to progressively reduce (...)
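
The "OR" behaviour in this example can be sketched as follows. The index contents are hand-written stems for the snippets listed above (a real stemmer may produce different stems), the third entry is a made-up non-matching document, and ranking by raw hit count is a simplification of the actual relevance scoring.

```python
from collections import Counter

# Hypothetical index: stem counts per document, written by hand for illustration.
INDEX = {
    "Article 1": Counter({"endors": 1, "vertic": 3, "progress": 1}),
    "Article 2": Counter({"progress": 1}),
    "Unrelated document": Counter({"survey": 2}),  # made-up non-matching entry
}

def or_search(query_stems, index):
    """Return documents containing at least one query stem, most hits first."""
    scores = {}
    for name, stems in index.items():
        hits = sum(stems[q] for q in query_stems)  # a missing stem counts as 0
        if hits > 0:
            scores[name] = hits
    return sorted(scores, key=scores.get, reverse=True)

# "verticality endorse progress" after (assumed) stemming:
print(or_search(["vertic", "endors", "progress"], INDEX))
# ['Article 1', 'Article 2']
```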

The field-specific multi-term search works like the general multi-term search described above, but it allows specifying the field(s) on which to perform the search. The ranking rules are the same as for the search types listed above. This type of search works on the figshare website, as well as in the API.

Example:

Query string: "title: antiretroviral therapies"
Return results with relevant paragraphs/tags/categories:
  1. Article 4
     - Title: A National Survey of Teachers on Antiretroviral Therapy in Malawi: Access, Retention in Therapy and Survival
  2. Article 2
     - Title: Uptake of WHO Recommendations for First-Line Antiretroviral Therapy in Kenya, Uganda, and Zambia
  3. Article 1
     - Title: Implementation of an Electronic Monitoring and Evaluation System for the Antiretroviral Treatment Programme in the Cape Winelands District, South Africa: A Qualitative Evaluation
  4. Article 3
     - Title: Diminishing Availability of Publicly Funded Slots for Antiretroviral Initiation among HIV-Infected ART-Eligible Patients in Uganda

The general compound term search is the equivalent of a boolean "AND" search: it returns only the documents that contain occurrences of all of the words in the search string. In the example below, the terms are separated by a comma. The ranking rules are the same as for the search types described above. This type of search works on the figshare website, as well as in the API.

Example:

Query string: "antiretroviral, therapies"
Return results with relevant paragraphs/tags/categories:
  1. Article 4
     - Title: A National Survey of Teachers on Antiretroviral Therapy in Malawi: Access, Retention in Therapy and Survival
     - Description: [...] antiretroviral therapy (ART) (...)
     - Description: [...] access, retention in therapy and survival for subgroups of the population. We have (...)
     - Tags: Antiretroviral therapy
  2. Article 2
     - Title: Uptake of WHO Recommendations for First-Line Antiretroviral Therapy in Kenya, Uganda, and Zambia
     - Description: [...] to transition away from first-line antiretroviral therapies (ART) containing stavudine (...)
     - Description: [...] the likelihood of drug resistance. TDF combined therapy offers an additional benefit (...)
     - Description: [...] WHO consolidated guidelines on the use of ART named a TDF-based therapy (...)
     - Description: [...] AZT-based therapies listed as alternatives (...)
     - Description: [...] combination therapies containing AZT and TDF
     - Tags: Antiretroviral therapy
  3. Article 1
     - Title: Implementation of an Electronic Monitoring and Evaluation System for the Antiretroviral Treatment Programme in the Cape Winelands District, South Africa: A Qualitative Evaluation
     - Description: [...] South Africa's antiretroviral treatment (ART) programme (...)
     - Tags: Antiretroviral therapy
     - Tags: antiretrovirals
  4. Article 3
     - Title: Diminishing Availability of Publicly Funded Slots for Antiretroviral Initiation among HIV-Infected ART-Eligible Patients in Uganda
     - Description: [...] about the future of access to antiretroviral therapy (...)
     - Description: [...] the feasibility of implementing life-long HIV therapy with limited health care infrastructure (...)
     - Tags: Antiretroviral therapy
  5. Article 5
     - Description: [...] Antiretroviral therapy (ART) is effective in reducing maternal (...)
     - Tags: Antiretroviral therapy

Please note that, just as in the general term search, Article 4 is ranked higher than Article 2, even though the latter contains more occurrences of the query string words. The reason, as above, is the weighting applied to the title field, which is deemed more important than the description field.


The field-specific compound term search works like the general compound term search above, but it allows specifying the field(s) on which to perform the search. The ranking rules are the same as for the search types described above. This type of search works on the figshare website, as well as in the API.

Example:

Query string: "title: antiretroviral, therapies"
Return results with relevant paragraphs/tags/categories:
  1. Article 4
     - Title: A National Survey of Teachers on Antiretroviral Therapy in Malawi: Access, Retention in Therapy and Survival
  2. Article 2
     - Title: Uptake of WHO Recommendations for First-Line Antiretroviral Therapy in Kenya, Uganda, and Zambia
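
The difference between the general and the field-specific "AND" search can be sketched with the titles and tags quoted in this document. This is a matching-only illustration with no ranking: the `stem` function is a toy stand-in for the real stemmer, only the title and tags fields are modelled, and Article 5 is left out because its title is not quoted here.

```python
import re

def stem(word):
    """Toy suffix stripper standing in for a real stemmer."""
    if word.endswith("ies") and len(word) > 4:
        return word[:-3] + "y"
    if word.endswith("ing") and len(word) > 5:
        return word[:-3]
    if word.endswith("es") and len(word) > 4:
        return word[:-2]
    if word.endswith("s") and not word.endswith("ss") and len(word) > 3:
        return word[:-1]
    return word

def stems(text):
    return {stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())}

ARTICLES = {
    "Article 4": {"title": "A National Survey of Teachers on Antiretroviral Therapy in Malawi: "
                           "Access, Retention in Therapy and Survival",
                  "tags": "Antiretroviral therapy"},
    "Article 2": {"title": "Uptake of WHO Recommendations for First-Line Antiretroviral Therapy "
                           "in Kenya, Uganda, and Zambia",
                  "tags": "Antiretroviral therapy"},
    "Article 1": {"title": "Implementation of an Electronic Monitoring and Evaluation System for "
                           "the Antiretroviral Treatment Programme in the Cape Winelands District, "
                           "South Africa: A Qualitative Evaluation",
                  "tags": "Antiretroviral therapy"},
    "Article 3": {"title": "Diminishing Availability of Publicly Funded Slots for Antiretroviral "
                           "Initiation among HIV-Infected ART-Eligible Patients in Uganda",
                  "tags": "Antiretroviral therapy"},
}

def and_search(query, field=None):
    """Return articles whose chosen field (or all fields) contains every query stem."""
    terms = stems(query)
    results = []
    for name, fields in ARTICLES.items():
        texts = [fields[field]] if field else fields.values()
        indexed = set().union(*(stems(t) for t in texts))
        if terms <= indexed:
            results.append(name)
    return results  # insertion order, not a relevance ranking

print(and_search("antiretroviral, therapies"))
# ['Article 4', 'Article 2', 'Article 1', 'Article 3']
print(and_search("antiretroviral, therapies", field="title"))
# ['Article 4', 'Article 2']
```

Articles 1 and 3 drop out of the title-only search because the word therapy appears in their tags but not in their titles.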

The general phrase search is the most restrictive type of search, as it returns only those documents in which an occurrence of the full query string is found (word ordering matters). The ranking rules are the same as for the search types described above. This type of search works on the figshare website, as well as in the API.

Example:

Query string: " 'antiretroviral therapy (ART)' "
Return results with relevant paragraphs/tags/categories:
  1. Article 4
     - Description: [...] antiretroviral therapy (ART) [6]. Data on all ART patients (...)
  2. Article 2
     - Description: [...] to transition away from first-line antiretroviral therapies (ART) containing stavudine (...)
  3. Article 5
     - Description: [...] service needs. Antiretroviral therapy (ART) is effective in reducing maternal (...)
  4. Article 3
     - Description: [...] have raised uncertainty about the future of access to antiretroviral therapy (ART) and the goal of universal access. (...)

Please note that the ranking, in this case, takes into consideration individual term occurrences in the documents, hence the ordering (remarkably similar to the ordering in the case of the general term search).


The field-specific phrase search works like the general phrase search above, but it allows specifying the field(s) on which to perform the search. The ranking rules are the same as for the search types described above. This type of search works on the figshare website, as well as in the API.

Example:

Query string: "title: 'therapy and survival' "
Return results with relevant paragraphs/tags/categories:
  1. Article 4
     - Title: A National Survey of Teachers on Antiretroviral Therapy in Malawi: Access, Retention in Therapy and Survival
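
The query forms shown throughout this document can be summarized with a small classifier. This is only a sketch of how the example query strings might be told apart. The function name `parse_query`, the returned dictionary layout, and the exact rules (field prefix, then quotes, then commas) are assumptions made for illustration, not figshare's actual parser.

```python
import re

# Fields named in this document.
FIELDS = ("title", "description", "tags", "categories")

def parse_query(query):
    """Classify a query string as a phrase, AND (comma-separated) or OR search."""
    text = query.strip()
    field = None
    prefix = re.match(r"(\w+)\s*:\s*(.*)$", text, re.S)
    if prefix and prefix.group(1).lower() in FIELDS:
        field, text = prefix.group(1).lower(), prefix.group(2).strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "'\"":
        return {"field": field, "mode": "phrase", "terms": [text[1:-1].strip()]}
    if "," in text:
        terms = [t.strip() for t in text.split(",") if t.strip()]
        return {"field": field, "mode": "and", "terms": terms}
    return {"field": field, "mode": "or", "terms": text.split()}

print(parse_query("verticality endorse progress"))
# {'field': None, 'mode': 'or', 'terms': ['verticality', 'endorse', 'progress']}
print(parse_query("title: antiretroviral, therapies"))
# {'field': 'title', 'mode': 'and', 'terms': ['antiretroviral', 'therapies']}
print(parse_query(" 'antiretroviral therapy (ART)' "))
# {'field': None, 'mode': 'phrase', 'terms': ['antiretroviral therapy (ART)']}
```

A single-term query falls into the "or" mode with one term, which matches the description of the general term search as the basic case.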