Google open sources data-moving tool

Protocol Buffers 'simple' alternative to XML

Tags: open source, google

By Matthew Broersma

Published: 11 July 2008 08:27 BST

Google has open sourced an internal development tool called 'Protocol Buffers', a data description language that forms a basic part of the operation of the company's vast computing cluster.

The tool, which has been in use at Google for several years, is how the company encodes almost any structured information that needs to be passed across the network or stored on disk, Google open-source programs manager Chris DiBona said in a blog post announcing the move.

Protocol Buffers could be useful for other organisations that need an efficient way to move structured data around a network, for instance in large clusters or data centres, DiBona said.

Google uses thousands of data formats for networked messages, and XML is simply too cumbersome to use as an encoding method for it all, Google software engineer Kenton Varda explained in a separate blog post. "As nice as XML is, it isn't going to be efficient enough for this scale," he wrote. "When all of your machines and network links are running at capacity, XML is an extremely expensive proposition."

Various other methods exist for passing encoded data over networks, but Google found that none of them suited its particular need: a system optimised for efficiency above everything else, Varda said. Protocol Buffers is a kind of interface definition language (IDL), but IDLs have a reputation for being over-complicated, he said.

He said: "One of Protocol Buffers' major design goals is simplicity. By sticking to a simple lists-and-records model that solves the majority of problems, and resisting the desire to chase diminishing returns, we believe we have created something that is powerful without being bloated."
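As a rough illustration of what a "lists-and-records" model means, the sketch below uses plain Python dataclasses: a record is a set of named, typed fields, and a field may hold a list of other records. This is only an analogy, not the Protocol Buffers API, and the names Person, PhoneNumber, phones and kind are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhoneNumber:
    # A small record with two typed fields (hypothetical names).
    number: str
    kind: str = "mobile"


@dataclass
class Person:
    # A record whose last field is a list of other records.
    name: str
    id: int
    phones: List[PhoneNumber] = field(default_factory=list)


person = Person(name="Ada", id=150)
person.phones.append(PhoneNumber("555-0100"))
```

Everything in the model is either a record like these or a list of them, which is the simplicity Varda describes.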

He estimated the system is at least an order of magnitude faster than XML, while other Google documentation said Protocol Buffers messages can be parsed 20 to 100 times faster. The binary encoding is three to 10 times smaller than the comparable XML, Google said. Google released an FAQ detailing Protocol Buffers, along with source code for the protocol buffer compiler, which supports Java, Python and C++.
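To give a feel for where the size difference comes from, here is a minimal hand-rolled sketch of the Protocol Buffers binary encoding for one tiny record, set beside an equivalent XML string. It assumes a hypothetical message with a string field numbered 1 and an integer field numbered 2; the helper names are invented for illustration, and real applications would use the generated classes rather than code like this. A single small record says nothing about Google's measured figures.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # more bytes follow
        else:
            out.append(low_bits)         # final byte
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    # Key = (field number << 3) | wire type 0 (varint).
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    # Key = (field number << 3) | wire type 2 (length-delimited).
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


def encode_person(name: str, user_id: int) -> bytes:
    # Hypothetical layout: name is field 1, id is field 2.
    return encode_string_field(1, name) + encode_int_field(2, user_id)


binary = encode_person("Ada", 150)
xml = "<person><name>Ada</name><id>150</id></person>".encode("utf-8")

print(len(binary), "bytes as binary,", len(xml), "bytes as XML")
```

The binary form carries only field numbers, lengths and values, whereas the XML repeats every field name in an opening and a closing tag.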

Google admitted that the system is comparable to long-established projects such as JavaScript Object Notation (JSON), which is often used in Ajax web programming. But JSON, like XML, is a human-readable text format, rather than a binary format such as Protocol Buffers, a fact that reduces JSON's efficiency, Google said.

Even so, Google was criticised on some fronts for creating its own system from scratch and ignoring existing approaches. David Golightly, user experience developer lead for Zillow.com, argued the textual syntax used in Protocol Buffers could have been made interoperable with an existing text-based format.

Golightly said in a blog post: "I'm always just a little disappointed when someone goes about creating their own new textual format syntax on arbitrary grounds, rather than adapting an existing format to their needs."

Google is not the first to open source its internal data interchange system: Protocol Buffers is very similar to the Thrift framework, developed by Facebook and now an open-source project in the Apache Software Foundation Incubator. Thrift, however, differs in that it describes services rather than pure data.

Original article: Google open sources 'Protocol Buffers' from ZDNet UK
