Not The Wizard

Oz Solomon's Blog

Month: February 2012

A Story About Positive Feedback

Let’s play a little social psychology guessing game, and see how well you do.

Imagine you have an app on Facebook where users can hit a Contact Us link and write you anything that’s on their mind.  Further imagine that you get between 7,000 and 10,000 such contact requests a year.

How many of those users do you think clicked on Contact Us to say something nice?

Before I tell you the answer, let me be clear that all the numbers are real.  Status Shuffle has a lot of users, and those users write us that much every year.  In fact, I’m surprised they don’t write us more given the sheer usage volume Status Shuffle has.

The Shocking Truth

The answer is one.  One person a year writing something nice.  That’s less than 0.02% of contacts.  That’s a statistical anomaly.  We actually got one last week, and I can’t even remember the last time it happened before that.

When positive feedback arrives in our inbox, it’s such a celebration that it immediately gets bubbled up the same escalation path reserved for those OMG-we’re-down-and-the-whole-company-needs-to-know tickets.  And I doubt our users or product are unique.  I remember the exact same thing happening at a previous employer.

It’s All About the Effort

I don’t think people lack positive feelings about products they use, but it’s as though it’s not worth the effort to go out of one’s way to say so.

Case in point: As soon as we got the aforementioned nice email, I posted it on our Facebook page and quickly got 5,000 likes and many positive comments.  Thumbs up is easy, so people do it.

Interestingly, inside Status Shuffle users are given the opportunity to thumbs up or down content.  It’s the same amount of effort.  We consistently see 10 thumbs up for every thumb down.  So given equal feedback mechanisms, both effortless, people tend to be positive.

Why are people more inclined to contact you in order to complain?  Maybe it’s because we get nothing in return for just saying something nice, but if we complain we have a vested interest: We might get the company to change its product.  I’m sure this guy or his friends have research about this.

What Have We Learned?

Wouldn’t it be nice if right about now I had an actionable recommendation for you?  Well, I have two.  The first is a bit weak and the second you won’t like.

My first advice is to make it as frictionless as possible for your customers to show their love for you.  It may be as simple as popping up a single question dialog: Do you love us?  Yes?  No?  Aggregate the result and share with your team.  Hopefully they’ll smile at the results.

My second advice is to go out and spread the love.  Your love.  When was the last time you wrote a nice blog post about a positive experience?  When was the last time you left a nice review for a restaurant you frequent or a book you enjoyed?  We all need to help the world lose the negativity bias.

As for myself, I’m as guilty as anyone¹.  That, I’m going to fix with the next post.

¹ In fact, I plan to use this very blog as a platform to complain and rant about many things.

nginx Reverse Proxy Can Cause IE to Fail

Some Background

For years I had Apache serving up Status Shuffle.  It wasn’t perfect, but it worked.  In fact, it worked so well that it handled a million users a day on one box with plenty of room to spare.  However, in late 2011, Facebook started requiring HTTPS support from all its publishers, us included.  We bought an SSL certificate, made the necessary configuration changes, then restarted Apache.  It all seemed to work as planned.

Over the next few days we watched our server logs closely and discovered that our error rate had gone up.  It seemed that the extra overhead caused by the SSL negotiation step was enough to dramatically increase the failure rate for some of our users (probably those with unstable Internet connections).  Ideally, we would use HTTP keep-alives to allow everyone to open the connection once and make multiple requests through it, thereby offsetting the SSL negotiation overhead.  Alas, trying to hold thousands of connections open with Apache’s prefork MPM was a sure way to eat up all the RAM in our box.  Instead, I decided to put up an nginx box in front of Apache in a reverse proxy setup.

The idea is simple and well documented.  You use nginx to handle connections with the end users.  nginx in turn calls Apache to actually do the work, then finally hands off the data to the end user.  nginx can hold thousands of connections open, thus keep-alives are no problem.  (Note: Apache 2.4 was just released and it can natively do all of this using the “event” MPM.  Alas, “event” doesn’t support SSL connections so we’ll be sticking with this nginx setup for now.)

Redmond, We Have a Problem

At first we didn’t realize anything was wrong.  As I mentioned above, the number of connection issues we were logging went down dramatically so we were very happy.  But then the complaints started: “It doesn’t work” and “When I load Status Shuffle it looks funny”.  When more and more complaints started piling up, we couldn’t ignore them anymore, even though the app worked perfectly on every machine and browser we could get our hands on.

All we knew was that all the people complaining were using Internet Explorer (versions 6, 7, 8 and 9).  I added extra client side logging, and it appeared that IE was failing to execute our JavaScript files, while spitting out “Invalid character” errors.

A Google search regarding the “Invalid character” error failed to add clarity.  Some said this error would be returned if IE failed to download the file at all.  Some implicated an old (now fixed) bug in IE 6 where it wasn’t decoding gzip’ed web pages properly.  But that couldn’t be it because we were seeing the issue in Internet Explorer versions up to 9 (the latest version at the time of writing).

User Visible Symptoms

We finally caught a break when a user who reported the issue agreed to a remote session with us.  Over the course of an hour we poked around her computer, trying to figure out what was wrong.  The strangest thing of all was that she shared the computer with her husband, and when she logged in through his account, the problem didn’t exist.  I concluded that there must be a corruption somewhere in her local user registry settings, but couldn’t tell what it was in the time I had with her.

As Dan and I poked around the user’s computer, we noticed IE was in fact downloading the JavaScript files.  We were even able to save them onto her desktop through IE, and they looked fine.  However, when we viewed them in the context of the application using the developer toolbar, they looked like a garbled binary stream.  It was obvious the browser wasn’t decoding the compressed response properly.

Piecing It Together

I had a theory that nginx’s proxy was causing the issue.  We quickly confirmed this by temporarily taking nginx out of the loop and hitting Apache directly.  The errors stopped coming in.

Scanning the nginx documentation for the proxy_cache directive revealed this innocuous looking sentence:

nginx does not handle “Vary” headers when caching.

The Vary header is used to tell caching proxies that a response depends on the value of particular request headers.  For example, when your browser requests a web page, it will tell the web server that it will accept (understand) compressed results by using this request header:

Accept-Encoding: gzip, deflate

The web server will then happily compress the page and return it with (at least) the following two response headers:

Content-Encoding: gzip
Vary: Accept-Encoding

The first means that the response is compressed using gzip.  The second means: This response is only valid for requests that have the same Accept-Encoding value that you just sent me.

As stated in the documentation, nginx doesn’t handle the Vary response header.  The sensible thing to expect from nginx is that it would not cache responses that contain Vary.  Instead, what nginx does is cache the result of the first request and serve it to everyone, even if they don’t have the same request header.
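To make the failure mode concrete, here is a minimal Python sketch of the two caching behaviours.  It is a toy model, not nginx's actual implementation: the origin function, the NaiveCache and VaryAwareCache classes, and the /app.js URL are all hypothetical names invented for illustration.

```python
import gzip

JS_SOURCE = b"function hello() { return 'hi'; }"


def origin(request_headers):
    """Hypothetical origin server: compresses only if the client allows gzip."""
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return headers, gzip.compress(JS_SOURCE)
    return {"Vary": "Accept-Encoding"}, JS_SOURCE


class NaiveCache:
    """Keys on the URL only and ignores Vary (the behaviour described above)."""

    def __init__(self):
        self.store = {}

    def fetch(self, url, request_headers):
        if url not in self.store:
            self.store[url] = origin(request_headers)
        return self.store[url]


class VaryAwareCache:
    """Keys on the URL plus the request headers named in the Vary response."""

    def __init__(self):
        self.vary_names = {}  # url -> request header names listed in Vary
        self.store = {}       # (url, header values) -> cached response

    def _key(self, url, names, request_headers):
        return (url, tuple(request_headers.get(n, "") for n in names))

    def fetch(self, url, request_headers):
        names = self.vary_names.get(url)
        if names is not None:
            key = self._key(url, names, request_headers)
            if key in self.store:
                return self.store[key]
        headers, body = origin(request_headers)
        names = [n.strip() for n in headers.get("Vary", "").split(",") if n.strip()]
        self.vary_names[url] = names
        self.store[self._key(url, names, request_headers)] = (headers, body)
        return headers, body
```

If a gzip-capable client warms the NaiveCache first, a later client that sends no Accept-Encoding header still receives the gzip bytes.  The VaryAwareCache goes back to the origin for that second client and returns plain JavaScript.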

As an experiment, I made a normal request through nginx using Firefox and got a compressed response as expected.  I then modified Firefox by changing the about:config setting network.http.accept-encoding to blank and reissued my request.  By setting network.http.accept-encoding to blank, I’m advertising to the server that I don’t support gzip or deflate encoding.  Sure enough, nginx served me the compressed cached copy and Firefox showed binary garbage instead of my JavaScript file.
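The same check can be automated against captured request/response pairs.  The sketch below is one rough way to flag a response that was gzip-compressed even though the client never offered gzip.  The function names are hypothetical, and the Accept-Encoding parsing is deliberately simplified (it ignores q-values and the * wildcard).

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def client_accepts_gzip(request_headers):
    """Simplified parse of Accept-Encoding; ignores q-values and '*'."""
    value = request_headers.get("Accept-Encoding", "")
    codings = [part.split(";")[0].strip().lower() for part in value.split(",")]
    return "gzip" in codings


def looks_gzipped(body):
    """True if the body starts with the gzip magic number."""
    return body[:2] == GZIP_MAGIC


def served_unrequested_gzip(request_headers, response_headers, body):
    """Flag a gzip response sent to a client that did not advertise gzip."""
    encoded = response_headers.get("Content-Encoding", "").lower() == "gzip"
    return (encoded or looks_gzipped(body)) and not client_accepts_gzip(request_headers)
```

Checking the magic bytes as well as the Content-Encoding header catches the case where the header was lost somewhere along the way but the body is still compressed.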

I assume this is what was happening with IE for all those people.  I estimate that 0.5%-1% of our Internet Explorer users were affected.  If Status Shuffle didn’t have the massive request volume that it does, we would have probably never caught on to this.

Fixing It

I definitely consider this to be an nginx bug.  But at least there is an easy fix.  We moved all compression away from Apache and into nginx.

This means that in Apache, we removed:

And in nginx we added:

We now have no more errors and no more angry users.
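Why moving compression to the proxy helps can be sketched in a few lines of Python.  This is a toy model under stated assumptions, not nginx's real code path: the cache holds the uncompressed response from the origin, and compression is decided per request on the way out.  CompressingProxy and origin_uncompressed are hypothetical names.

```python
import gzip

JS_SOURCE = b"function hello() { return 'hi'; }"


def origin_uncompressed(request_headers):
    """Hypothetical origin that no longer compresses anything."""
    return {}, JS_SOURCE


class CompressingProxy:
    """Caches the plain response, then compresses per client on the way out."""

    def __init__(self):
        self.cache = {}

    def fetch(self, url, request_headers):
        if url not in self.cache:
            self.cache[url] = origin_uncompressed(request_headers)
        headers, body = self.cache[url]
        headers = dict(headers)  # copy so the cached entry is never mutated
        headers["Vary"] = "Accept-Encoding"
        if "gzip" in request_headers.get("Accept-Encoding", ""):
            headers["Content-Encoding"] = "gzip"
            body = gzip.compress(body)
        return headers, body
```

Because the cached entry is never compressed, it no longer matters which kind of client populated the cache first.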

 

Update Feb 29, 2012: I filed a bug against nginx 1.0.12 on the nginx bug tracker.  Maxim Dounin of nginx offered two interesting ways of working around the issue, so please follow the bug report link to read his comment.

© 2024 Not The Wizard
