Bryan McQuade gave a great tutorial on understanding and optimizing web performance metrics. What I particularly appreciated is that he really started in the basement: in the TCP/IP stack, and in the hardware. Abstraction has served us well over the years; we've successfully pushed a lot of plumbing into the walls, where nobody has to look at it much. But this abstraction has brought its own problems. It's all too easy for a web developer to spend a whole career dealing with fairly abstract APIs and standards at (or even above) the top of the TCP/IP stack, and to forget what's going on at the lower levels. To do so is tantamount to having water dripping out of the walls without knowing how to fix the plumbing. It's more important than ever to understand how a TCP connection gets started, the importance of packet sizes, how DNS affects performance, and more. As devices and networks get faster, users aren't getting more forgiving about performance; they're getting less forgiving.
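You don't need special tooling to start looking at those lower levels. Here's a minimal sketch (the function name and return format are just illustrative) that splits connection setup cost into its two hidden pieces: the DNS lookup and the TCP three-way handshake — exactly the plumbing that abstract APIs hide.

```python
import socket
import time

def time_connection(host, port=80):
    """Roughly split connection setup into DNS lookup time and TCP handshake time."""
    t0 = time.perf_counter()
    addr = socket.gethostbyname(host)  # DNS resolution
    t1 = time.perf_counter()
    # Opening the socket performs the TCP three-way handshake.
    with socket.create_connection((addr, port), timeout=5):
        t2 = time.perf_counter()
    return {"dns_ms": (t1 - t0) * 1000, "tcp_ms": (t2 - t1) * 1000}
```

Run it against your own site and the numbers make the abstraction concrete: before a single byte of HTML moves, you've already paid for a DNS round trip and a handshake.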
On Tuesday (June 26) and Wednesday (June 27), Dr. Richard Cook and Mike Christian presented two related keynotes. As far as I know, there was no prior coordination between them, but they fit together perfectly. Cook talked about the difference between systems as imagined (or designed) and systems as found in the real world. As he said, the surprise isn't that complex systems fail, but that they fail so rarely. We design for reliability, with multiple levels of redundancy and protection; but what we really want is resilience: the ability to withstand transients and to recover swiftly when things go wrong. Can we build systems in advance (systems that we imagine) that have operational resilience, as found in the real world? That's the problem for web development and operations, as well as for medical systems.
Mike Christian's keynote on Wednesday, "Frying Squirrels and Unspun Gyros," was almost the perfect complement. In addition to lots of disaster porn, he showed us the way out of the predicament, the difference between systems as imagined and systems as they actually are. We build data centers with plenty of backups: UPS supplies, generators, all that. These systems are all extremely well designed, and as imagined, they ought to work. But as Amazon conveniently demonstrated just two days after Velocity ended, they don't necessarily work in the real world. Christian pointed out that many data center outages are caused by problems in the backup systems; 29% are caused by UPS failures alone.
Given our poor track record at building systems that are really reliable, and given that our efforts at reliability can actually leave systems less resilient, what's the alternative? Move up the stack and build networks that are resilient: design software so that, when one data center goes down, load is automatically shifted to another data center in a different area. At that point, we can question whether we need backup systems at all. If, when the Virginia data center fails, load can shift to data centers in Oslo, Oregon, and Tokyo until the Virginia data center comes back online, do we really need to spend millions of dollars on backups that are actually making our systems less resilient?
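The routing decision at the heart of that idea is almost embarrassingly simple. This sketch is purely illustrative (the region names and the shape of the health data are assumptions, not anyone's real architecture): given a preference-ordered list of regions and their current health, send traffic to the first healthy one.

```python
# Preference order: nearest/cheapest region first (illustrative names).
REGIONS = ["virginia", "oregon", "oslo", "tokyo"]

def pick_region(health):
    """Return the first healthy region; failing over is just falling through the list."""
    for region in REGIONS:
        if health.get(region, False):
            return region
    raise RuntimeError("no healthy region available")
```

The hard part, of course, isn't this loop; it's keeping the health data honest and making sure state has already been replicated to the region you fail over to. But notice that nothing in this picture depends on any single data center's generators ever starting.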
It's hard to leave Velocity without mentioning John Rauser's talk. Rauser is one of my favorite speakers, and his talk on the London cholera epidemic of 1854 was a masterpiece. We're all used to looking at summary statistics and ignoring the outlying data. Rauser demonstrated that the outliers are often the most important: they're exactly what you need to prove your point. In the context of operations, rather than epidemiology, outliers often signal the appearance of a new failure mode. Look at the tail of your data; that's where you'll get a preview of your next outage, even if you're not experiencing any problems now.
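A concrete way to "look at the tail" is to compare the median of your response times with a high percentile. The numbers below are invented for illustration, and the nearest-rank percentile function is a deliberately simple sketch:

```python
def percentile(samples, p):
    """Nearest-rank percentile: simple, and good enough for eyeballing the tail."""
    ordered = sorted(samples)
    index = min(len(ordered) - 1, int(p / 100 * len(ordered)))
    return ordered[index]

# Mostly fast responses, plus a thin pathological tail (illustrative numbers):
latencies = [100] * 990 + [2500] * 10

median = percentile(latencies, 50)  # looks perfectly healthy
p99 = percentile(latencies, 99)     # the preview of your next outage
```

The median here is a comfortable 100 ms, and an average would barely budge; only the 99th percentile exposes the handful of 2,500 ms requests that hint at a new failure mode brewing.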
If you missed Velocity, you missed a great event. We're looking forward to seeing you next year, and in the meantime, building a faster, stronger web.
Photo: John Allspaw by O'Reilly Conferences, on Flickr
Convergence of this sort seems inevitable because Hadoop, MPP Databases, and even Linux super-computing clusters all share at least a superficial architectural pattern — horizontally distributed nodes of compute and storage connected either by Gigabit Ethernet or fast interconnects. And more and more it is the heaviness (and host stickiness) of the large scale data hosted on them that is driving design.
I couldn't find Lei Chang's slides, but a previous talk given by Donald Miner of EMC makes it clear that data flexibility is driving their work here (see slides 26-28). They are trying to provide an analytics platform that doesn't require organizations to host multiple SQL and M/R-specific copies of their data. Their Unified Analytics Platform includes the MPP Greenplum DB, Hadoop, and tooling, and today many of their customers presumably have to do just that — store the same data twice to access with both Hadoop M/R and SQL. Today you either continuously ETL it back and forth or rely on slow and inflexible choices like external tables and Hive to access it in place.
At my previous company we sold some work that was designed to demonstrate Hadoop's power to contribute to corporate strategic analysis. The idea was to combine Hadoop with an MPP RDBMS (in this case we used Cloudera with Greenplum DB) to get the power of each. Hadoop could groom unstructured data for combination with a traditional transactional data warehouse for downstream SQL analysis. Hadoop could also be used to do analysis directly on unstructured or semi-structured data using non-traditional approaches, or to do analysis on very large caches of traditional transactional data on a different timescale. The Greenplum DB environment would then provide SQL access to the combined stores of traditional transactional data and freshly groomed unstructured data.
We proved value in the approach, but it was unnecessarily complex because we had to store everything twice so that the SQL and M/R tribes within the group would each have native access to everything. We also made use of GPDB external tables hosted in HDFS, but performance suffered for queries involving that data.
At around the same time I was also working with a customer that already had a significant investment in a Linux super-computing cluster but was looking at moving some of their processing and analysis to a complementary Hadoop cluster. About half of the analysis they were running was amenable to Map/Reduce and the other half still required the more granular parallelism of MPI, but if two distinct clusters were required then all that data was going to have to be moved between processes. It would be a lot more interesting to leave the data where it is and simply shift the processes that were running on the nodes.
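The reason only half the workload fit is the rigidity of the map/reduce pattern: independent map tasks, a shuffle that groups by key, independent reduce tasks, with no communication between tasks mid-phase. MPI jobs, by contrast, exchange messages freely between nodes. A toy word count makes the pattern visible (all names here are illustrative, not any particular framework's API):

```python
from collections import defaultdict
from itertools import chain

# Map phase: each input record is processed independently, emitting (key, value) pairs.
def map_fn(line):
    return [(word, 1) for word in line.split()]

# Shuffle: group all emitted values by key. In a real cluster this is the
# network-heavy step; here it's just a dictionary.
def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

# Reduce phase: each key's values are combined independently of every other key.
def reduce_fn(key, values):
    return key, sum(values)

lines = ["data moves to compute", "compute moves to data"]
pairs = chain.from_iterable(map(map_fn, lines))
counts = dict(reduce_fn(k, v) for k, v in shuffle(pairs).items())
```

Any analysis that decomposes into those three steps parallelizes trivially across a Hadoop cluster; anything needing finer-grained coordination between nodes still wants MPI, which is exactly why the customer was looking at two clusters and a lot of data motion between them.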
Data is getting heavier (more voluminous) relative to the networks that carry it around the data center to the chips that process it (chips that also aren't really getting faster). So the low-energy state is moving inexorably toward stationary data processed by mobile algorithms. Today algorithm mobility is hampered by the national borders that separate similar-but-different-enough machine clusters. But YARN (M/R 2.0), MPI on Hadoop, experiments like GOH, an evolving and improving Hive, Bulk Synchronous Parallel, and a whole slew of other projects all hint at the possibility of a convergence toward a unified cluster of multi-use machines that will be able to expose all of the different kinds of data management capabilities currently resident in different system types. Or, put another way, we'll see something like an E.U. of clusters with materially better algorithm mobility, across borders that are defined by the data resident therein rather than by the kinds of algorithms they can host.
It's easy to imagine a future where large clusters of like machines dynamically adapt between SQL, M/R, MPI, and other programming paradigms depending on a combination of the resident data and the required processing. Regions of nodes will "express" themselves as Hadoop, MPP SQL, or whatever at different times depending on what the data needs, without having to move it across slow networks and I/O.
I guess I'm describing a kind of data utopia where this perfectly homogenized cluster supports every algorithm class against every data type. And we all know that never happens. After all, we never even came close to the master-data-managed every-data-once third-normal-form relational world either. But we did have a well-understood ideal to pursue however imperfectly. Maybe this data-centric convergent cluster becomes that ideal for our emerging era of co-mingled structured/semi-structured/unstructured data.
In honor of the third health care track at the O'Reilly Open Source Convention, I invite everyone to join me in five ways to have a healthy conference.
1. No elevators, no escalators. Given the distance between floors at the Oregon Convention Center, regular stair use will tone you up.
2. Most people will need to sit during the conference talks and meals, so that everyone can see, but during in-between times, visit the exhibitors or just hang about in the halls and chat with your fellow OSConers.
3. The conference caterers provide ample opportunities for healthy eating.
4. Now I'm beginning to ask for difficult commitments. Don't be seduced by free drinks; visit the vendor reception but get home early for a sleep that will keep you perky through the next day.
5. Bring your running or walking shoes to join Nate DiNiro and friends in the "Couch to Quantified Self" run. Walkers are welcome too.
For starters, advanced software cannot emerge from the mind of one programmer. There are simply too many pieces and too many needs that must be addressed. And that vision of the "lone coder" conjuring brilliance in a dark room? It simply doesn't map to the realities of modern development. Programming is largely team-based — either in-person or virtual, official or ad hoc — and that means people need to communicate to get things done.
Brian Fitzpatrick (@therealfitz) and Ben Collins-Sussman (@sussman), authors of Team Geek and speakers at next week's OSCON, discuss the intricacies of software collaboration and communication in the following interview. They also explain the difference between "leaders" and "managers" and they consider how the learn-to-code movement will shape future development teams.
Brian Fitzpatrick: It was something that we were doing, but not really giving a lot of thought to because it came naturally to us. We saw collaboration merely as a part of getting things done. But when we started to think about it and discuss it, we realized that understanding how to work with people makes you a more efficient and more effective programmer. We understand that there are cases where you can brute-force something through technical acumen alone, but there are many other situations where even that isn't sufficient.
Think of collaboration skills as a lever — there are many occasions where you can lift something heavy by sheer strength, but if you use a lever and fulcrum, you can accomplish the same thing with considerably less effort. We see collaboration as another tool that's as important as your compiler or text editor.
Ben Collins-Sussman: We believe that the "history of lone coders" is really an elaborate mythology in our industry. Linus Torvalds didn't write Linux — he wrote a prototype kernel and then spent the next 19 or 20 years coordinating volunteers to write the other 99% of the OS. Bill Gates wrote a BASIC interpreter for early home computers, but he didn't create MS-DOS by himself; he coordinated a whole company to expand, sell, and support it. No matter how great your initial idea is, there isn't a single piece of widely-used software out there that wasn't created by a team of dozens (or hundreds) over many years. So, the upshot is that modern developers need to shake off the myth of the "lone genius" and embrace the maxim that software development is a team sport, and always has been.
Ben Collins-Sussman: Working in isolation is an extremely high-risk activity. In the early stages of a project, it's really easy to go off the tracks without realizing it. A programmer would never write 10,000 lines of code and then hit the "compile" button for the first time; he or she writes a bit, compiles, writes a bit more, compiles. A software project has to develop the same way — in a tight feedback loop with trusted collaborators that are sanity-checking each step forward. People working in a cave almost always make a bad design choice early on and discover far too late that it's unfixable. Or often, they just end up reinventing wheels without realizing it.
Brian Fitzpatrick: Many programmers perceive non-programmer jobs as easy or trivial because they don't require deep technical knowledge to perform. This is often exacerbated when they meet non-programmers who aren't very good at their jobs, are overconfident, or just plain treat programmers poorly. The most common example of this is the "crappy salesperson," and many programmers base their opinions of non-technical people on these encounters. Beyond that, many programmers look at these roles and think "anyone can do that; that's easy" because on the surface, no "hard" technical skill is required as part of the job. The truth is often that not only is there a lot of skill and effort required to do these jobs well, but that many programmers can't do these jobs well — or in many cases, at all!
We've been fortunate in our careers (and especially at Google) to work with some of the most amazing, brilliant, skilled and friendly non-technical people. When you're surrounded by amazing salespeople, lawyers, PR folks, and managers, it's hard for even the most jaded programmer to continue thinking that these non-technical jobs don't require skill or effort.
Ben Collins-Sussman: In our book, we talk about effective tools for communication within a programming team. Some of these tools are specific to software engineering — how to use chat rooms and bug trackers effectively, how to write documentation, and so on. But many of our recommendations are useful for communicating "outward" from the team. We talk about how to run effective meetings, how to gain agreement on project goals, and how to structure emails to get exactly what you need from VPs. Collaboration is a skill that needs to be actively practiced at all levels.
Ben Collins-Sussman: This is a common scenario we refer to as the "accidental manager." Some people are naturally charismatic and end up leading a group unintentionally (often pulled in by a power vacuum). The most important thing to do is not panic. "Manager" has become a dirty word among programmers. We instead advocate leadership — and by "leader," we mean someone who clears roadblocks for the team. A manager tries to tell a team how to do its job, which is almost always a bad idea, while a leader worries about what the team is working on and how easy it is for the members to be productive and happy. Be a leader, not a manager.
Brian Fitzpatrick: I think it will if both technical and non-technical people are open-minded about it. In my experience, learning more about another job often leaves me with a lot more respect for the skill and effort involved in doing that job well. Having spent the last four years restoring my 100-year-old house, I've tried my hand at a lot of do-it-yourself projects as well as hired people to do even more projects. The one universal truth that I've found is that after spending a few days doing something myself (whether it's painting, wood-stripping, drywall, or carpentry), I have a much greater respect for the work that the pros do because I begin to understand that there are many techniques and nuances that make it "look easy" to people who don't do that line of work.
I'd like to see a movement to introduce non-technical jobs (e.g. legal, sales, communications) to more technical people as a means to give them a greater understanding of just how hard it is to do those jobs well.
This interview was edited and condensed. Associated photo on home and category pages: Big red lever by moonlightbulb, on Flickr
Michael B. Farrell at the Boston Globe reported this week on a copyright case making its way through federal court that could change — or establish — copyright laws regarding reselling used copies of digital media, such as music, books and movies.
As Farrell reports, the case involves a lawsuit Capitol Records brought against ReDigi.com (PDF), a website that houses a resale store for digital music. MIT computer science professor Larry Rudolph, who created the site with technology entrepreneur John Ossenmacher, told Farrell he just wants "people to treat virtual goods like physical goods."
Capitol Records, however, disputes the digital-physical analogy. Farrell writes:
"'While ReDigi touts its service as the equivalent of a used record store, that analogy is inapplicable: used record stores do not make copies to fill their shelves,' according to the record company's lawsuit, filed in January in federal court in New York. ... The company wants ReDigi to strip its recordings from its service and pay the maximum damages of $150,000 per song. ReDigi would not say how many songs it has resold, but about 100,000 people have used the service."
Wired's David Kravets reported on the case back in February when the courts refused to shut down the website:
"The brief ruling (.pdf) by U.S. District Judge Richard Sullivan of New York did not clearly outline the reason for the decision. But in a transcript (.pdf) of a court proceeding Monday, he said that Capitol is likely to prevail at trial."
Farrell reports that oral arguments are scheduled to begin in the fall.
In other copyright news, Mary Long at Mediabistro's All Twitter blog reports that Manhattan Criminal Court Judge Matthew Sciarrino Jr. ordered Twitter to turn over three months' worth of tweets from Malcolm Harris, an Occupy Wall Street protestor. Harris is charged with disregarding police orders, and prosecutors believe his tweets could verify this. In a statement to news organizations, a Twitter spokeswoman said the decision was disappointing and that "Twitter's Terms of Service have long made it absolutely clear that its users own their content." Judge Sciarrino disagreed — Tiffany Kary at Bloomberg News quotes from his ruling: "What you give to the public belongs to the public. What you keep to yourself belongs only to you."
Traditional publishers are taking very bumpy steps into an uncertain future, a future many of them fear won't include them. If your business is destined to become a relic replaced by forward-thinking startups, what better way to survive than to invest in such startups and own that which will replace you? Erin Griffith at PandoDaily took a look this week at Big Six publisher Macmillan, which is doing just that. Griffith writes:
"The company hired Troy Williams, former CEO of early e-book company Questia Media, which sold to Cengage. Macmillan gave him a chunk of money and an incredibly unusual mandate: Build a business that will undermine our own." [Emphasis included in original post.]
Griffith reports that Macmillan gave Williams more than $100 million to buy ed-tech startups for the new business, called Macmillan New Ventures. She writes that "[t]he plan is to let them exist autonomously like startups within the organization as Macmillan transitions out of the content business and into educational software and services" in preparation for the day "textbooks go away completely."
You can read Griffith's entire report here and more about Macmillan New Ventures on the company's blog.
QR codes made a few appearances in publishing news this week. First, Lauren Indvik at Mashable reports that Simon & Schuster (S&S) plans to add QR codes to the glossy jackets of its books starting with releases this fall. Consumers who scan a book's code will be taken to the author's page on the S&S website, where the publisher hopes they'll sign up for newsletters and poke around for other books. Indvik writes that S&S executive vice president and chief digital officer Ellie Hirschhorn says the move is "designed as a low-budget marketing technique."
But will it work? Laura Hazard Owen at PaidContent writes that it may be worth a try, but quotes a Bloomberg Businessweek article indicating consumers don't care about the codes. Indvik also addresses this question and cites statistics from comScore.com that not only show a low scan rate, but also that the consumers who do scan tend to be 18- to 34-year-old males — not normally publishing's strongest demographic. All the same, Indvik reports that "Hirschhorn says many publishers who have provided shortcuts to videos within their books have already seen 'good results'."
If a QR link to a corporate website isn't the best idea, what about a link to the first chapter of a book? Julieta Lionetti at Publishing Perspectives reports this week on the recent Bibliotren project in Catalonia, Spain, a campaign sponsored by Random House Mondadori and the Catalan Government Railways that placed QR codes in 10 cars on 10 different trains. The codes led train passengers to the first chapters of 40 different books. The project not only proved a success with riders, but also turned out to be a valuable tool for Random House. Paxti Beascoa, marketing and business director at Random House Mondadori, told Lionetti, "QR codes are trackable and offer a valuable opportunity to gather user data. We have gained a treasury of consumer insight, much of which challenges previous assumptions about our readers."
Sears catalog photo: A Vintage Sears Catalog Jewelry Page! - Free to use! by HA! Designs - Artbyheather, on Flickr
Jeff Jordan, general partner at Andreessen Horowitz, wrote a guest post this week over at All Things Digital. He focuses on the disruption ecommerce is causing in the physical retail space — what he calls a "sea change in retail" — and makes an argument that ecommerce is going to wipe out physical retail across industries.
Jordan shares some statistics from the U.S. Census Bureau's Annual Retail Trade Survey and takes a look at big box retail's decline in comp store sales. He offers the bankruptcies of Circuit City, B. Dalton, Waldenbooks and Borders as examples and highlights the situation Best Buy has found itself in, with its No. 1 and No. 3 sales categories down 11% and 37%, respectively, over two years. Jordan writes:
"Relatively small declines in comp store sales, if sustained, can quickly prove fatal to physical retailers due to this leverage. The Circuit City example is instructive here. Its bankruptcy was preceded by just six quarters of declining comp store sales. ... Continued share gains by e-commerce players shrink the pie available to physical retailers. Marginal physical players go bust, providing only a temporary boost to the remaining offline players and a sustaining boost to online players. But the underlying market dynamics stay the same, and pressure again builds on the remaining physical players. When their top-lines drift below their highly leveraged water lines, they too drown and liquidate. At that point, e-commerce becomes about the only place where consumers seeking a broad selection of merchandise can go. It's essentially unopposed."
In a similar vein, Michael Hsieh over at PandoDaily took a look at the rise of ecommerce from a supply chain perspective. He argues that online retail isn't a new sales model, but a return to the catalog commerce model that Sears used to revolutionize retailing in the early 1900s. As Hsieh describes the cycle, Sears stocked items typically purchased at high prices in local general stores and made them available at cheaper prices, delivered to city and rural populations alike. This model crushed the local general stores. Sears then opened large-format stores, which, together with the rise of the automobile, crushed the catalog model. Today, online retailing is circling back to begin the cycle again. Hsieh writes:
"[W]hen we look at online retailing, it is actually not a new phenomenon but the re-emergence of a previous catalog model with new and more powerful capabilities. The digital format offers unlimited product selection, and more importantly, the scale and coverage of logistics companies like UPS and FedEx have significantly driven down the cost of home delivery. Today, most online merchants deliver products for free, and this makes online shopping much more compelling."
Both posts are compelling reads. You can read Jordan's post here and Hsieh's post here.
Several stories over the past couple weeks have indicated NFC technology may be heating up in the U.S. This week, it's Europe's turn. On Monday, Deutsche Telekom (DT) and MasterCard announced a mobile payment partnership that will allow consumers to make purchases via NFC-enabled mobile phones and devices. Ryan Kim at GigaOm reports:
"The mobile payments system will incorporate a SIM-based chip and will work with a mobile wallet. It will roll out first in Poland while in Germany, DT and MasterCard will start with a trial using NFC stickers and cards this year before a launch in the first half of next year. Eventually, the service will spread out across DT's European footprint."
Kim also notes this wallet partnership is attempting to address many of the issues that have been plaguing the NFC ecosystem so far. First, the "mobile wallet service will be open to other banks and partners, who will be able to access the wallet," so it looks like it won't be a closed, proprietary platform. Analysts also told Kim that DT is "acting as a sales partner for NFC enabled Point of Sale terminals" and is issuing NFC tags for consumers who don't own NFC-enabled mobile devices.
Additionally, Bloomberg reported this week that DT is in talks with Google about partnering with DT's mobile payment system. Thomas Kiessling, DT's head of innovation, confirmed discussions with Google in the Bloomberg interview, but declined to offer further details. It's unclear if Google's Wallet product will figure into the system.
China RFID released a new line of NFC keychains this week designed to work with iPhones and Android phones that lack NFC technology. The shelf life of this particular product may appear to be short, given the growing list of smartphones shipping with NFC technology and the likelihood that a new NFC-enabled iPhone will arrive this fall. Nonetheless, this type of product could be an answer to a global mobile commerce and mobile wallet conundrum: most worldwide cellphone users aren't carrying smartphones.
I've written here before about this dilemma, quoting Nick Hughes at TechCrunch: though 85% of the world's population carries cellphones, 4.5 billion people aren't using smartphones. If the technology used in these keychains — or some other sort of item that commonly accompanies or even attaches to one's cellphone — were designed to work with any cellphone, perhaps one hurdle to worldwide mobile payments could be overcome.
News tips and suggestions are always welcome, so please send them along.
Be sure to check out pages 373 to 377 of Cooking for Geeks for warnings about using liquid nitrogen, advice on where and how to buy it, the science behind the recipe in this podcast, and other culinary applications.