Alma Mater

Google Launches New Search Engine for Scientists


Putting the power of big data within easy reach of scientists and data journalists, Alphabet Inc. has launched a new search engine, Google Dataset Search, which is currently in the pilot stage. It can be accessed at: https://toolbox.google.com/datasetsearch

"To enable easy access to this data, we launched Dataset Search, so that scientists, data journalists, data geeks, or anyone else can find the data required for their work and their stories, or simply to satisfy their intellectual curiosity," Natasha Noy, Research Scientist at Google AI, said in a blog post Thursday.

The latest tool from the search giant adds to the power of other specialized search engines like Google Scholar and Google Books. It locates files and databases based on how their owners describe and classify them, and does not necessarily read the content of the files, unlike the way search engines normally parse web pages.

Google is using the open-source metadata standards set by schema.org, a collaborative community that creates, maintains, and promotes schemas for structured data on the Internet, on web pages, and in email messages.
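To make the idea of owner-supplied metadata more concrete, here is a minimal sketch of what a schema.org `Dataset` description could look like when generated with Python. The property names (`name`, `description`, `url`, `keywords`, `distribution`, `DataDownload`, `encodingFormat`, `contentUrl`) come from the schema.org vocabulary; the helper functions, the dataset itself, and the example.org URLs are hypothetical and used only for illustration. Which properties Dataset Search actually requires is best checked against Google's developer guidelines.

```python
import json


def build_dataset_jsonld(name, description, url, keywords, download_url, encoding_format):
    """Return a schema.org Dataset description as a JSON-LD dictionary.

    Hypothetical helper: only the keys and @type values are schema.org terms.
    """
    return {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "keywords": list(keywords),
        # A distribution tells a crawler where the actual data file lives.
        "distribution": [
            {
                "@type": "DataDownload",
                "encodingFormat": encoding_format,
                "contentUrl": download_url,
            }
        ],
    }


def to_script_tag(jsonld):
    """Wrap the JSON-LD in the script tag typically embedded in a web page."""
    body = json.dumps(jsonld, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Made-up dataset and placeholder URLs, for illustration only.
example = build_dataset_jsonld(
    name="City rainfall measurements",
    description="Daily rainfall totals recorded at a hypothetical weather station.",
    url="https://example.org/datasets/rainfall",
    keywords=["rainfall", "weather", "climate"],
    download_url="https://example.org/datasets/rainfall.csv",
    encoding_format="text/csv",
)

print(to_script_tag(example))
```

The printed snippet would be placed in the HTML of the page that describes the dataset, so that a crawler can read the description without opening the data file itself.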

According to the Google Dataset Search developer guidelines, the following can be submitted to the search engine:

  • A table or a CSV file with some data
  • An organized collection of tables
  • A file in a proprietary format that contains data
  • A collection of files that together constitute some meaningful dataset
  • A structured object with data in some other format that you might want to load into a special tool for processing
  • Images capturing data
  • Files relating to machine learning, such as trained parameters or neural network structure definitions


Notably, the search engine also surfaces datasets from the environmental and social sciences, as well as data from other disciplines, including government data and data provided by news organizations such as ProPublica. It is also available in multiple languages.
