In a blog post published this morning, Joseph Sirosh, the Microsoft corporate vice president in charge of Azure ML, announced the public preview of the Azure Data Catalog, a tool to help employees discover their company’s data sources.
Azure ML is Microsoft’s machine learning platform, which was launched last February. As companies create more data repositories, it becomes harder for interested parties such as analysts and data scientists to know what data sources exist across a large organization.
The purpose of the data catalog, which is a cloud service managed by Microsoft, is to provide a way to identify the sources, and then search for them in an organized fashion.
Users register their sources with the service’s data source registration tool. It extracts structural metadata — data about the data source such as attribute names and data types. While the metadata is copied to the catalog in the cloud, the source of the data remains in place on premises or in the cloud, depending on where it’s stored.
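To make the idea of structural metadata concrete, here is a minimal sketch of what extracting it can look like. This is not Microsoft’s registration tool; it uses Python’s built-in `sqlite3` module and a made-up `sales` table as a stand-in data source, and the function name `extract_structural_metadata` is hypothetical. The point it illustrates is that only column names and types are collected, while the rows stay where they are.

```python
import sqlite3


def extract_structural_metadata(conn, table):
    """Return column names and declared types for one table; no rows are read."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
    return {
        "source": table,
        "columns": [{"name": r[1], "type": r[2]} for r in rows],
    }


# Hypothetical data source: an in-memory table standing in for a company database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (order_id INTEGER, region TEXT, amount REAL)")
conn.execute("INSERT INTO sales VALUES (1, 'EMEA', 99.5)")

# Only this small dictionary would be sent to a catalog; the row above is not in it.
entry = extract_structural_metadata(conn, "sales")
print(entry)
```

A real registration tool would presumably also record where the source lives and how to connect to it, but the article does not spell out those details.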
Once a source is in the catalog, other employees can find it using keyword searching, filtering and other search techniques. Users can even annotate a source listing with additional information in a crowdsourced approach. As people add more information, such as tags or alternative names, it should in theory help those who come later better understand the source’s purpose.
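The search-and-annotate idea can be sketched in a few lines of Python. Everything here is an assumption for illustration, including the `CatalogEntry` class, the `search` function and the sample entries; none of it reflects how the Azure Data Catalog is actually built. It simply shows how a tag added by one user can make a source findable by the next.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """Hypothetical catalog listing: a source name, its columns and user tags."""
    name: str
    columns: list
    tags: set = field(default_factory=set)

    def annotate(self, tag):
        # Store tags in lowercase so that search is case-insensitive.
        self.tags.add(tag.lower())


def search(catalog, keyword):
    """Return entries whose name, column names or tags contain the keyword."""
    kw = keyword.lower()
    return [
        e for e in catalog
        if kw in e.name.lower()
        or any(kw in c.lower() for c in e.columns)
        or any(kw in t for t in e.tags)
    ]


catalog = [
    CatalogEntry("sales_2015", ["order_id", "region", "amount"]),
    CatalogEntry("hr_roster", ["employee_id", "dept"]),
]

before = search(catalog, "revenue")   # nothing mentions "revenue" yet
catalog[0].annotate("Revenue")        # a colleague tags the sales source
after = search(catalog, "revenue")    # the sales source is now found
print([e.name for e in before], [e.name for e in after])
```

Substring matching keeps the sketch short; a production catalog would more likely rely on a proper search index.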
Once employees find a source they like, the tool lets them connect directly from the catalog to their favorite data visualization tool, or to another data tool, to work with the source.
Microsoft is offering this service to make data sources easier to discover, but some sources shouldn’t be visible to every employee. The company is also offering an upgraded version of the catalog that lets companies control access to those sensitive data stores.
For the data catalog, Microsoft appears to be using the same template it used for the public data marketplace it announced last February.
The service will be available as a public preview beginning next Monday.