SEO for Software Companies

This is a rough outline of my verbal remarks while giving a presentation at the Software Industry Conference.  Regular readers of my blog will notice it has a lot of overlap with previous posts on the topic, but I thought posting it would save presentation watchers from having to take copious notes on URLs and expand the reach of the presentation to people who couldn’t attend this year.

Brief Biography

My name is Patrick McKenzie.  For the last six years I was working in Japan, primarily as a software engineer.  In the interim, I started a small software company in my spare time, at about five hours a week, which recently allowed me to quit my day job.  Roughly half my sales and three quarters of my profits come as a result of organic SEO, and the majority of the remainder come from AdWords.  If you need to know about AdWords, talk to Dave Collins, who is also attending. This presentation is about fairly advanced tactics — if you need beginner-friendly SEO advice, I recommend reading SEOMoz’s blog or taking a look at SEOBook.

Bingo Card Creator makes bingo cards for elementary schoolteachers.  This lets them, for example, teach a unit on chemistry and then, as a fun review game, call out the names of compounds like “ozone” and have students search out the chemical symbols on their bingo cards.  Students enjoy the game because it is more fun than drilling.  Teachers enjoy the game because it scales to any number of students and slides easily into the schedule.  However, making the cards by hand is a bit of a pain, so they go searching on the Internet for cards which already exist and find my website.  I try to sell them a program which automates card creation.

SEO can be a powerful tool for finding more prospects for your business and increasing sales.

SEO In A Nutshell

People treat SEO like it is black magic, but at the core it is very simple: Content + Links = You Win.

Content: Fundamentally, users searching are looking for keywords, and Google wants to send searchers to content which is responsive to the intent of the searcher.  Overwhelmingly, this means to content directly responsive to the keywords.  This is particularly true on the long tail, meaning queries which are not near the top of the query frequency distribution.  Many more people search for “credit cards” than for “How do I make a blueberry pie?”

For the most popular queries, the page that ranks will likely not be laser-targeted on “credit cards”.  However, for the long tail, a page that is laser-targeted will tend to win if it exists.  The reason is that Google thinks that your wording carries subtle clues of your intent, so it should generally be respected.  Someone looking for “How to make a blueberry pie?” isn’t necessarily as sophisticated a cook as someone who searches for “blueberry pie recipe” — they might not even be looking with the intention of making a blueberry pie, but rather out of curiosity as to how it is made, and so a recipe does not directly answer their intent.

Links: With billions of pages on the Internet, there needs to be a way to sift the wheat from the chaff and determine who wins out of multiple close pages.  The strongest signal for this is how trusted a site and how trusted a page is, and this is overwhelmingly measured by links.  A link from a trusted page to another page says “I trust this other page”, and the aggregate graph shows you which pages are most trusted on the Internet.  Note that trust is used as a proxy for quality because it is almost impossible to measure quality directly.

It is important to mention that links to one bit of content on your site help all other content — perhaps not as much as the linked content, but still substantially.  Wikipedia’s article on dolphins doesn’t necessarily have thousands of links pointing to it, but over their millions of articles like the History of the Ottoman empire, they have accumulated trust sufficient that a new page on wiki is assumed to be much better than a new page on a hobbyist’s blog.  Note that because Wiki ranks for nearly everything they tend to accumulate new citations when people are looking for someone to cite.  This causes a virtuous cycle (for Wiki, anyway): winners win.  You’ll see this over and over in SEO.

Despite this equation looking additive, SEO very rarely shows linear benefits.  Benefits compound multiplicatively or exponentially.  Sadly, many companies try to develop their SEO in a linear fashion: writing all content by hand, searching out links one at a time, etc.  We’ll present a few techniques to do it more efficiently.

The Biggest Single Problem

The biggest single problem with software company’s SEO is that they treat their website like a second class citizen.  The product gets total focus from a team of talented engineers for a year, and then the website is whipped up at 2 AM in the morning on release day and never touched again.  You have to treat your website like it was a shipping software product of your company.

It needs:

  • testing
  • design
  • strategic thought into feature set
  • continuous improvement
  • for loops

“For loops?”  Yes, for loops.  You’d never do calculations by counting on your fingers when you have a computer available to do them for you.  Hand-writing all content in Notepad is essentially the same.  Content should be separated from presentation — via templates & etc — so that you can reuse both the content and presentational elements.  Code reuse: not just for software.

Scalable Content Generation

Does anyone have a thought about how large a website’s optimal size should be?  10 pages?  A hundred pages?  No, in the current environment, the best size for a website is “as large as it possibly can be”, because of how this helps you exploit the long tail.  As long as you have a well-designed site architecture and sufficient trust, every marginal topic you cover on your website generates marginal traffic.  And if you can outsource or automate this such that the marginal cost of creating a piece of content is less than the marginal revenue received from it, it makes sense to blow your website up.

This is especially powerful if you can make creation of content purely a “Pay money and it happens” endeavor, which lets you treat SEO like a channel like PPC: pour in money, watch sales, laugh all the way to the bank.  The difference is that you get to keep your SEO gains forever rather than having to rebuy them on every click like PPC.  This is extraordinarily powerful if you do it right.  Here’s how:

Use a CMS

The first thing you need to enable scalable content generation is a CMS.  People need to be able to create additional content for the website without hand-editing it.  WordPress is an excellent first choice, but you can get very, very creative with custom CMSes for content types which are specific to your business if you have web development expertise.

Note that “content” isn’t necessarily just blog posts.  It is anything your customers perceive value out of, anything which solves problems for them.  That could be digitizing your documentation, or answering common questions in your niche (“How do I…” is a very common query pattern), or taking large complex data sets and explaining their elements individually in a comprehensible fashion.  Also note that it isn’t strictly text: you can do images and even video in a scalable fashion these days.

For example, using Flickr Creative Commons search, you can tap millions of talented photographers for free to get photos, so illustrating thousands of pages is as simple as searching, copying, and crediting the photographer.  You can use GraphicsMagick or ImageMagick to create or annotate images algorithmically.  You can use graphing libraries to create beautiful graphs from boring CSV files — more on that later.

The reasons why you’d use a CMS are they make content easy to create and edit, so you’ll do more of it.  Additionally, by eliminating the dependency on the webmaster, you can have non-technical employees or freelancers create content for you.  This is key to achieving scale.  You can also automate on-page SEO optimization — proper title tags, interlinking, etc — so that content creators don’t have to worry about this themselves.

Outsource Writing

You are expensive.  English majors are cheap.  Especially in the current down economy, stay at home moms, young graduates, the recently unemployed, and many other very talented folks are willing to write for cheap, particularly from home.  This lets you push the marginal cost of creating a new page to $10 ~ $15 or lower.  As long as you can monetize that page at a multiple of that, you’ll do very well for yourself.  Demand Media is absolutely killing it with this model.

Finding and managing writers is difficult.  If you use freelancers and find good ones, hold onto them for dear life, since training and management are your main costs.  Standardize instructions to freelancers and find folks who you can rely on to exercise independent thinking.

You can also get content created as a service, using TextBroker.  Think of the content on your website as a pyramid: you have a few pages handwritten by domain experts with quality off the chart, and then a base of the pyramid which is acceptable but perhaps not awe-inspiring.  At the 4 star quality level, you can get content in virtually infinite quantity at 2.2 cents per word.  You can either have someone copy/paste this into your website or do a bit of work with their API and automate the whole process.

You can use software to increase the quality of outsourced content.  For example, putting a picture on it automatically makes it better.  You can automate that process so your editors can quickly do it for all pages.  You can remix common page elements — calls to action, etc — which are polished to a mirror shine with the outsourced content.  You can also mix content from multiple sources to multiply its effectiveness: if you have 3 user segments and 3 features they really value, that might be 9 pages.  (If you use 2 features per page, that is 18.  As you can see, the math is gets very compelling very quickly.)

Milk It

Now that you’re set up to do content at scale, you can focus on doing it well.  The best content is:

Modular: You can use it in multiple places on the website.  You paid good money for it.  If you can use it in two places, the cost just declined by half.

Evergreen: The best possible value for an expiration date is “never”.  Chasing the news means your content gets stale and stops providing value for the business.  Answering the common, recurring, eternal problems and aspirations of your market segment means content written this year will never go out of style.  That lets you treat content as durable capital.  Also, because it tends to pick up links over time, it will get increasing traffic over time.

The first piece of content I made for my website took me two hours to write.  It made $100 the first month.  Not bad, but why only get paid once?  It has gone on to make me thousands over the years, and it will never go out of style.

Competitively Defensible: One of the tough things about blog posts is that any idiot can get a blog up as easily as you can.  Ideally, you want to focus on content which other people can’t conveniently duplicate.  OKCupid’s blog posts about dating data are a superb example of this: they use data that only they have, and they’ve made themselves synonymous with the category.  No wonder they’re in the top 3 for “online dating”.  Proprietary data, technical processes which are hard to duplicate, and other similar barriers establish a moat around your SEO advantage.

Process-oriented: If something works, you want to be able to exploit it in a repeatable fashion.  Novelty is an excellent motivational factor and you can’t lose it, but novelty that can be repeated is a wonderful thing to have.  You also want to have a defined step where you see what worked and what didn’t, so that you can improve your efforts as you go on.

Tracking:

Track what works!  Do more of that!  Install Google Analytics or similar to see what keywords people are reaching for on your site.  Keywords (or AdWords data) are great sources of future improvements.  Track conversions based on landing page, and create more content based on the content which is really winning.  If content should be winning but isn’t, figure out why for later iterations — maybe it needs more external or internal promotion, a different slant, a different target market, etc.

Case Study

Getting into the heads of my teachers for a moment — a key step — most teachers have a lesson planned out and need an activity to slot into it.  For example, they know they have a lesson about the American Revolution coming up.  Some of them, who already like bingo, are going to look for American Revolution bingo cards.  If my site ranked for that, that would be an opportunity to tell them that they could use software to create not just American Revolution activities but bingo for any lesson if they just bought my software.

So I made a CMS which, given a list of words and some explanatory text, would create a downloadable set of 8 bingo cards (great for parents, less great for teachers) on that topic, make a page to pitch that download in, and put an ad for Bingo Card Creator on the page.  Note how I’m using this content to upsell the user into more of a relationship with me: signing up for a trial, giving me their email address, maybe eventually buying the software.

I have a teacher in New Mexico who produces the words and descriptions for me.  The pages end up looking like this for the American Revolution.  She produces 30 activities a month for $100, and I approve them and they go live instantly.  This has been going on for a few years.  In the last year, I’ve started doing end-to-end conversion tracking, so I can attribute sales directly to the initial activity people started with.

This really works.  Some of the activities, like Summer bingo cards or Baby Shower Bingo cards, have resulted in thousands of dollars in sales in the last year.  $3.50 in investment, thousands in returns.  And there is a long tail of results:

This graph shows the 132 of the 900 activities which generated a sale in the last year.  You can see that there is a long tail which each generated one sale — in fact, a hundred of them.  Sure, you might not think that Famous Volcano bingo cards would be that popular, but I’ll pay $3.50 to get a $30 sale as often as the opportunity is offered.  These will also continue producing value in the next years, as they already have over the last several years: note that roughly half of these which produced a sale in the last 12 months were written in 2007 or 2008.

This took only a week or two to code up, and now 5 minutes a month sending my check and a thank-you note to the freelancer.  I’ve paid her about $3,000 over the last few years to write content.  In the last year alone, it has generated well over $20,000 in sales.  If you do things this efficiently, SEO becomes a channel like PPC — put in a quarter, get out a dollar, redeploy the profits to increase growth.

Any software company can create content like this, with a bit of strategic thinking, some engineering deployed, and outsourced content creation.  Try it — you can do an experiment for just a few hundred dollars.  If it works, invest more.  (Aaron Wall says that one of the big problems is that people do not exploit things that work.  If you’ve got it, flaunt it — until it stops working.)

Linkbait

Linkbait is creating content intended to solicit links to your website.  This can be by exploiting the psychology of users — they show things to friends because they agree with them strongly, or they hate them.  They create links because it creates value for them, not value for you — it increases their social status, it flatters their view of the world, it solves their problems.

All people are not equal on the Internet: twenty-something technologists in San Fransisco create hundreds of times more links per year than retired teachers in Florida.  All else being equal, it makes sense to create more of your linkbait targeted at heavy linking groups.  They’ve been labeled the Linkerati by SEOMoz, and I recommend the entire series of posts on them highly.

Software developers have some unique, effective ways to create linkbait.  For example:

Open Source Software

OSS developers and users are generally in very link-rich demographics.  OSS which solves problems for businesses tends to pick up links from, e.g., consultants deploying it — they will cite your website to justify their billing rate.  That is a huge win for you.  There are also numerous blogs which cover practically everything which happens in OSS.

OSS is fairly difficult to duplicate as linkbait, because software development is hard.  (Don’t worry about people copying it — you’ll be the canonical source, and the canonical source for OSS tends to win the citation link.  Make sure that is on your site rather than on Github, etc.)

OSS in new fields in software — for example, Rails development the last few years — has landgrab economics.  The first semi-decent OSS in a particular category tends to win a lot of the links and mindshare.  So get cracking!  And keep your eyes open for new opportunities, particularly for bits of infrastructural code which you were going to write for your business needs anyhow.

Case Study: A/Bingo

I’m extraordinarily interested in A/B testing, and wanted to do more of it on my site.  At the time, there was no good A/B testing option for Rails developers.  So I wrote one.  It went on to become one of the major options for A/B testing in Rails, and was covered on the official Rails blog, Ajaxian, and many other fairly authoritative places on the Internet.  It is probably the most effective source of links per unit effort I’ve ever had.

Some tactical notes:

  • Put it on your website.  You did the work, get the credit for it.
  • Invest in a logo — you can get them done very cheaply at 99designs.  Pretty things are trusted more.
  • Spend time “selling” the OSS software.  Documentation, presentation of benefits, etc.
  • OSS doesn’t have to be a huge project like Apache.  You can do projects in 1 day or 1 week which people will happily use.  (Remember, pick things which solve problems.)

Conclusion

I’m always willing to speak to people about this.  Feel free to email me (patrick@ this domain).

26 Comments

Speaking at Software Industry Conference

I’m currently in Dallas at the Software Industry Conference, where I’ll be giving a presentation about SEO strategies on Saturday.  In the meanwhile, if you’re at the conference or feel like coming out to the Hyatt Regency, feel free to get in touch with me.

As you have probably guessed, I’ll be posting the presentation and some textual elaboration on it right after I finish delivering it.  (I don’t know if video will be available this time.)

1 Comment

Running Apache On A Memory-Constrained VPS

Yesterday about a hundred thousand people visited this blog due to my post on names, and the server it was on died several fiery deaths. This has been a persistent issue for me in dealing with Apache (the site dies nearly every time I get Reddited — with only about 10,000 visitors each time, which shouldn’t be a big number on the Internet), but no amount of enabling WordPress cache plugins, tweaking my Apache settings, upgrading the VPS’ RAM, or Googling lead me to a solution.

However, necessity is the mother of invention, and I finally figured out what was up yesterday. The culprit: KeepAlive.

Setting up and tearing down HTTP connections is expensive for both servers and clients, so Apache keeps connections open for a configurable amount of time after it has finished a request.  This is an extraordinarily sensible default, since the vast majority of HTTP requests will be followed by another HTTP request — fetch dynamically generated HTML, then start fetching linked static assets like stylesheets and images, etc.  Look, of 43 requests, 42 were not the last request in a 3 second interval.  It is a huge throughput win.  However, if you’re running a memory constrained VPS and get hit by a huge wave of traffic, KeepAlive will kill you.

When I started getting hit by the wave yesterday, I had 512MB of RAM and a cap (ServerLimit = MaxClients) of 20 worker processes to deal with them.  Each worker was capable of processing a request in a fifth of a second, because everything was cached.  This implies that my throughput should have been close to 20 * 60 * 5 = 60k satisfied clients a minute, enough to withstand even a mighty slashdotting.  (That is a bit of an overestimation, since there were also static assets being requested with each hit, but to fix an earlier Reddit attack I had manually hacked the heck out of my WordPress theme to load static assets from Bingo Card Creator’s Nginx, because there seems to be no power on Earth or under it that can take Nginx down.)

However, I had KeepAlive on, set to three seconds.  This meant that for every 250ms of a worker streaming cached content to a client, it spent 3 seconds sucking its thumb waiting for that client to come back and ask for something else.  In the meantime, other clients were stacking up like planes over O’Hare.  The first twenty clients get in and, from the perspective of every other client, the site totally dies for three seconds.  Then the next twenty clients get served, and the site continues to be dead for everybody else.  Cycle, rinse, repeat.  The worst part was people were joining the queue faster than their clients were either getting handled or timeouted, so it was essentially a denial of service attack caused by the default settings.  The throughput of the server went from about 60k requests per second to about 380 requests per second.  380 is, well, not quite enough.

Thus the solution: turning KeepAlive off.  This caused CPU usage to spike quite a bit, but since the caching plugin was working, it immediately alleviated all of the user-visible problems.  Bingo, done.

Since I tried about a dozen things prior to hitting on this, I thought I’d quick write them down in case you are an unlucky sod Googling for Apache settings for your VPS, possibly Ubuntu Apache settings, or that sort of thing:

  • Increase VPS RAM: Not really worth doing unless you’re on 256MB.  Apache should be able to handle the load with 20 processes.
  • Am I using pre-fork Apache or the worker MPM? If  you’re on Ubuntu, you’re probably using the pre-fork Apache.  MPM settings will be totally ignored.  You can check this by running apache2 -l .  (This is chosen at compile time and can’t be altered via the config files, so if — like me — you just apt-get your way around getting common programs installed, you’re likely stuck.)
  • What should my pre-fork settings be then?

Assuming 512 MB of RAM and you are only running Apache and MySQL on the box:

<IfModule mpm_prefork_module>
StartServers          2
MinSpareServers       2
MaxSpareServers      5
ServerLimit          20
MaxClients           20
MaxRequestsPerChild  10000
</IfModule>
You can bump ServerLimit and MaxClients to 48 or so if you have 1GB of RAM.  Note that this assumes you’re using a fairly typical WordPress installation, and you’ve tried to optimize Apache’s memory usage.  If you see your VPS swapping, move those numbers down (and restart Apache) until you see it stop swapping.  Apache being inaccessible is bad, swapping might slow your server down bad enough to kill even your SSH connection, and then you’ll have to reboot and pray you can get in fast enough to tweak settings before it happens again.
  • How do I tweak Apache’s memory usage? Turn off modules you don’t need.  Go to /etc/apache2/mods-enabled.  Take note of how many things there are that you’re not using.  Run sudo a2dismod (name of module) for them, then restart Apache.  This literally halved my per-process memory consumption last night, which let me run twice as many processes.  (That still won’t help you if KeepAlive is on, but it could majorly increase responsiveness if you’ve eliminated that bottleneck.)  Good choices for disabling are, probably, everything that starts with dav, everything that starts with auth (unless you’re securing wp-admin at the server layer — in that case, enable only the module you need for that), and userdir.
  • What cache to use? WordPress Super Cache.  Installs quickly (follow the directions to the letter, especially regarding permissions), works great.  Don’t try to survive a Slashdotting without it.
  • Any other tips?  Serve static files through Nginx.  Find a Rails developer to explain it to you if you haven’t done it before — it is easier than you’d think and will take major load off your server (Apache only serves like 3 requests of the 43 required to load a typical page on my site — and two of those are due to a plugin that I can’t be bothered to patch).
  • My server is slammed and I can’t get into the WordPress admin to enable the caching plugin I just installed:  Make sure Apache’s KeepAlive is off.   Change your permit directive in the Apache configuration to

<Directory /var/www/blog-directory-getting-slammed-goes-here>

Options FollowSymLinks

AllowOverride All

Order deny,allow

Deny from all

Allow from <your IP address goes here>

</Directory>

This will have Apache just deny requests from clients other than yourself (although Apache will keep the connection open if you’re using KeepAlive, which won’t due you a lick of good since it will still hold the line open so that it can deny their next request promptly — don’t use KeepAlive).  That should let you get into the WordPress admin to enable and test caching.  After doing so, you can switch to Allow from All and then test to see if your site is now surviving.

Sidenote: If you can possibly help it, I recommend Nginx over Apache.  I use Apache because a couple of years ago it was not simple to use Nginx with PHP.  This is no longer the case.  The default settings (or whatever  you’ve copied from the My First Rails Nginx Configuration you just Googled) are much more forgiving than Apache’s defaults.  It is extraordinarily difficult to kill Nginx unless you set out to do so.  Apache.conf, on the other hand, is a whole mess of black magic with subtle interactions that will kill you under plausible deployment scenarios, and the official documentation has copious explanations of What the settings do and almost nothing regarding Why or How you should configure them.

Hopefully, this will save you, brave Googling blog owner from the future, from having to figure this out by trial and error while your server is down.  Godspeed.

19 Comments

Falsehoods Programmers Believe About Names

[This post has been translated into Japanese by one of our readers: 和訳もあります。]

John Graham-Cumming wrote an article today complaining about how a computer system he was working with described his last name as having invalid characters.  It of course does not, because anything someone tells you is their name is — by definition — an appropriate identifier for them.  John was understandably vexed about this situation, and he has every right to be, because names are central to our identities, virtually by definition.

I have lived in Japan for several years, programming in a professional capacity, and I have broken many systems by the simple expedient of being introduced into them.  (Most people call me Patrick McKenzie, but I’ll acknowledge as correct any of six different “full” names, any many systems I deal with will accept precisely none of them.) Similarly, I’ve worked with Big Freaking Enterprises which, by dint of doing business globally, have theoretically designed their systems to allow all names to work in them.  I have never seen a computer system which handles names properly and doubt one exists, anywhere.

So, as a public service, I’m going to list assumptions your systems probably make about names.  All of these assumptions are wrong.  Try to make less of them next time you write a system which touches names.

  1. People have exactly one canonical full name.
  2. People have exactly one full name which they go by.
  3. People have, at this point in time, exactly one canonical full name.
  4. People have, at this point in time, one full name which they go by.
  5. People have exactly N names, for any value of N.
  6. People’s names fit within a certain defined amount of space.
  7. People’s names do not change.
  8. People’s names change, but only at a certain enumerated set of events.
  9. People’s names are written in ASCII.
  10. People’s names are written in any single character set.
  11. People’s names are all mapped in Unicode code points.
  12. People’s names are case sensitive.
  13. People’s names are case insensitive.
  14. People’s names sometimes have prefixes or suffixes, but you can safely ignore those.
  15. People’s names do not contain numbers.
  16. People’s names are not written in ALL CAPS.
  17. People’s names are not written in all lower case letters.
  18. People’s names have an order to them.  Picking any ordering scheme will automatically result in consistent ordering among all systems, as long as both use the same ordering scheme for the same name.
  19. People’s first names and last names are, by necessity, different.
  20. People have last names, family names, or anything else which is shared by folks recognized as their relatives.
  21. People’s names are globally unique.
  22. People’s names are almost globally unique.
  23. Alright alright but surely people’s names are diverse enough such that no million people share the same name.
  24. My system will never have to deal with names from China.
  25. Or Japan.
  26. Or Korea.
  27. Or Ireland, the United Kingdom, the United States, Spain, Mexico, Brazil, Peru, Russia, Sweden, Botswana, South Africa, Trinidad, Haiti, France, or the Klingon Empire, all of which have “weird” naming schemes in common use.
  28. That Klingon Empire thing was a joke, right?
  29. Confound your cultural relativism!  People in my society, at least, agree on one commonly accepted standard for names.
  30. There exists an algorithm which transforms names and can be reversed losslessly.  (Yes, yes, you can do it if your algorithm returns the input.  You get a gold star.)
  31. I can safely assume that this dictionary of bad words contains no people’s names in it.
  32. People’s names are assigned at birth.
  33. OK, maybe not at birth, but at least pretty close to birth.
  34. Alright, alright, within a year or so of birth.
  35. Five years?
  36. You’re kidding me, right?
  37. Two different systems containing data about the same person will use the same name for that person.
  38. Two different data entry operators, given a person’s name, will by necessity enter bitwise equivalent strings on any single system, if the system is well-designed.
  39. People whose names break my system are weird outliers.  They should have had solid, acceptable names, like 田中太郎.
  40. People have names.

This list is by no means exhaustive.  If you need examples of real names which disprove any of the above commonly held misconceptions, I will happily introduce you to several.  Feel free to add other misconceptions in the comments, and refer people to this post the next time they suggest a genius idea like a database table with a first_name and last_name column.

544 Comments

Detecting Bots with Javascript for Better A/B Test Results

I am a big believer in not spending time creating features until you know customers actually need them.  This goes the same for OSS projects: there is no point in overly complicating things until “customers” tell you they need to be a little more complicated.  (Helpfully, here some customers are actually capable of helping themselves… well, OK, it is theoretically possible at any rate.)

Some months ago, one of my “customers” for A/Bingo (my OSS Rails A/B testing library) told me that it needed to exclude bots from the counts.  At the time, all of my A/B tests were behind signup screens, so essentially no bots were executing them.  I considered the matter, and thought “Well, since bots aren’t intelligent enough to skew A/B test results, they’ll be distributed evenly over all the items being tested, and since A/B tests measure for difference in conversion rates rather than measuring absolute conversion rates, that should come out in the wash.”  I told him that.  He was less than happy about that answer, so I gave him my stock answer for folks who disagree with me on OSS design directions: it is MIT licensed, so you can fork it and code the feature yourself.  If you are too busy to code it, that is fine, I am available for consulting.

This issue has come up a few times, but nobody was sufficiently motivated about it to pay my consulting fee (I love when the market gives me exactly what I want), so I put it out of my mind.  However, I’ve recently been doing a spate of run-of-site A/B tests with the conversion being a purchase, and here the bots really are killers.

For example, let’s say that in the status quo I get about 2k visits a day and 5 sales, which are not atypical numbers for summer.  To discriminate between that and a conversion rate 25% higher, I’d need about 56k visits, or a month of data, to hit the 95% confidence interval.  Great.  The only problem is that A/Bingo doesn’t record 2k visits a day.  It records closer to 8k visits a day, because my site gets slammed by bots quite frequently.  This decreases my measured conversion rate from .25% to .0625%.  (If these numbers sound low, keep in mind that we’re in the offseason for my market, and that my site ranks for all manner of longtail search terms due to the amount of content I put out.  Many of my visitors are not really prospects.)

Does This Matter?

I still think that, theoretically speaking, since bots aren’t intelligent enough to convert at different rates over the alternatives, the A/B testing confidence math works out pretty much identically.  Here’s the formula for Z statistic which I use for testing:

The CR stands for Conversion Rate and n stands for sample size, for the two alternatives used.  If we increase the sample sizes by some constant factor X, we would expect the equation to turn into:

We can factor out 1/X from the numerator and bring it to the denominator (by inverting it).  Yay, grade school.

Now, by the magic of high school algebra:

If I screw this up the math team is *so* disowning me:

Now, if you look carefully at that, it is not the same equation as we started with.  How did it change?  Well, the reciprocal of the conversion rate (1 – cr) got closer to 1 than it was previously.  (You can verify this by taking the limit as X approaches infinity.)  Getting closer to 1 means the numerators of the denominator get bigger, which means the denominator as a whole gets modestly bigger, which means the Z score gets modestly smaller, which could possibly hurt the calculation we’re making.

So, assuming I worked my algebra right here, the intuitive answer that I have been giving people for months is wrong: bots do bork statistical significance testing, by artificially depressing z scores and thus turning statistically significant results into null results at the margin.

So what can we do about it?

The Naive Approach

You might think you can catch most bots with a simple User-Agent check.  I thought that, too.  As it turns out, that is catastrophically wrong, at least for the bot population that I deal with.  (Note that since keyword searches would suggest that my site is in the gambling industry, I get a lot of unwanted attention from scrapers.)  It barely got rid of half of the bots.

The More Robust Approach

One way we could try restricting bots is with a CAPCHA, but it is a very bad idea to force all users to prove that they are human just so that you can A/B test them.  We need something that is totally automated which is difficult for bots to do.

Happily, there is an answer for that: arbitrary Javascript execution.  While Googlebot (+) and a (very) few other cutting edge bots can execute Javascript, doing it on web scales is very resource intensive, and also requires substantially more skill for the bot-maker than scripting wget or your HTTP library of choice.

+ What, you didn’t know that Googlebot could execute Javascript?  You need to make more friends with technically inclined SEOs.  They do partial full evaluation (i.e. executing all of the Javascript on a page, just like a human would) and partial evaluation by heuristics (i.e. grep through the code and make guesses without actually executing it).  You can verify full evaluation by taking the method discussed in this blog post and tweaking it a little bit to use GETs rather than POSTs, then waiting for Googlebot to show up in your access logs for the forbidden URL.  (Seeing the heuristic approach is easier — put a URL in syntactically live but logically dead code in Javascript, and watch it get crawled.)

To maximize the number of bots we catch (and hopefully restrict it to Googlebot, who almost always correctly reports its user agent), we’re going to require the agent to perform three tasks:

  1. Add two random numbers together.  (Easy if you have JS.)
  2. Execute an AJAX request via Prototype or JQuery.  (Loading those libraries is, hah, “fairly challenging” to do without actually evaluating them.)
  3. Execute a POST.  (Googlebot should not POST.  It will do all sorts of things for GETs, though, including guessing query parameters that will likely let it crawl more of your site.  A topic for another day.)

This is fairly little code.  Here is the Prototype example


  var a=Math.floor(Math.random()*11);
  var b=Math.floor(Math.random()*11);
  var x=new Ajax.Request('/some-url', {parameters:{a: a, b: b, c: a+b}})

and in JQuery:


  var a=Math.floor(Math.random()*11);
  var b=Math.floor(Math.random()*11);
  var x=jQuery.post('/some-url', {a: a, b: b, c: a+b});

Now, server side, we take the parameters a, b, and c, and we see if they form a valid triplet.  If so, we conclude they are human. If not, we leave continue to assume that they’re probably a bot.

Note that I could have been a bit harsher on the maybe-bot and given them a problem which trusts them less: for example, calculate the MD5 of a value that I randomly picked and stuffed in the session, so that I could reject bots which hypothetically tried to replay previous answers, or bots hand-coded to “knock” on a=0, b=0, c=0 prior to accessing the rest of my site.  However, I’m really not that picky: this isn’t to keep a dedicated adversary out, it is to distinguish the overwhelming majority of bots from humans. (Besides, nobody gains from screwing up my A/B tests, so I don’t expect there to be dedicated adversaries. This isn’t a security feature.)

You might have noticed that I assume humans can run Javascript.  (My site breaks early and often without it.)  While it is not specifically designed that Richard Stallman and folks running NoScript can’t influence my future development directions, I am not overwrought with grief at that coincidence.

Tying It Together

So now we can detect who can and who cannot execute Javascript, but there is one more little detail: we learn about your ability to execute Javascript potentially after you’ve started an A/B test.  For example, it is quite possible (likely, in fact) that the first page you execute has an A/B test in it somewhere, and that you’ll make an AJAX call from that page you register your humanness after we have already counted (or not counted) your participation in the A/B test.

This has a really simple fix.  A/Bingo already tracks which tests you’ve previously participated in, to avoid double-counting.  In “discriminate against bots” mode, it tracks your participation (and conversions) but does not add them to the totals immediately unless you’ve previously proven yourself to be a human.  When you’re first marked as a human, it takes a look at the tests you’ve previously participated in (prior to turning human), and scores your participation for them after the fact.  Your subsequent tests will be scored immediately, because you’re now known to be human.

Folks who are interested in seeing the specifics of the ballet between the Javascript and server-side implementation can, of course, peruse the code at their leisure by git-ing it from the official site.  If you couldn’t care less about implementation details but want your A/B tests to be bot-proof ASAP, see the last entry in the FAQ for how to turn this on.

Other Applications

You could potentially use this in a variety of contexts:

1) With a little work, it is a no interaction required CAPCHA for blog commenting and similar applications. Let all users, known-human and otherwise, immediately see their comments posted, but delay public posting of the comments until you have received the proof of Javascript execution from that user. (You’ll want to use slightly trickier Javascript, probably requiring state on your server as well.) Note that this will mean your site will be forever without the light of Richard Stallman’s comments.

2) Do user discrimination passively all the time. When your server hits high load, turn off “expensive” features for users who are not yet known to be human. This will stop performance issues caused by rogue bots gone wild, and also give you quite a bit of leeway at peak load, since bots are the majority of user agents. (I suppose you could block bots entirely during high load.)

3) Block bots from destructive actions, though you should be doing that anyway (by putting destructive actions behind a POST and authentication if there is any negative consequence to the destruction).

11 Comments

The Most Radical A/B Test I've Ever Done

About four years ago, I started offering Bingo Card Creator for purchase.  Today, I stopped offering it.

That isn’t true, strictly speaking.  The original version of Bingo Card Creator was a downloadable Java application.  It has gone through a series of revisions over the years, but is still there in all its Swing-y glory.  Last year, I released an online version of Bingo Card Creator, which is made through Rails and AJAX.

My personal feeling (backed by years of answering support emails) is that my customers do not understand the difference between downloadable applications and web applications, so I sold Bingo Card Creator without regard to the distinction.  Everyone, regardless of which they are using, goes to the same purchasing page, pays the same price, and is entitled to use either (or both) at their discretion.  It is also sold as a one-time purchase, which is highly unusual for web applications.  This is largely because I was afraid of rocking the boat last summer.

The last year has taught me quite a bit about the difference between web applications and downloadable applications.  To whit: don’t write desktop apps.  The support burden is worse, the conversion rates are lower, the time through the experimental loop is higher, and they retard experimentation in a million and one ways.

Roughly 78% of my sales come from customers who have an account on the online version of the software.  I have tried slicing the numbers a dozen ways (because tracking downloads to purchases is an inexact science in the extreme), and I can’t come up with any explanation other than “The downloadable version of the software is responsible for a bare fraction of your sales.”  I’d totally believe that, too: while the original version of the web application was rough and unpolished, after a year of work it now clocks the downloadable version in almost every respect.

I get literally ten support emails about the downloadable application for every one I get about the web application, and one of the first things I suggest to customers is “Try using the web version, it will magically fix that.”

  • I’m getting some funky Java runtime error.  Try using the web application.
  • I can’t install things on this computer because of the school’s policies.  Try using the web application.
  • How do I copy the files to my niece’s computer?  By the way it is a Mac and I use a Yahoo.  Try using the web application.

However, I still get thousands of downloads a month… and they’re almost all getting a second-best experience and probably costing me money.

Thus The Experiment

I just pushed live an A/B test which was complex, but not difficult.  Testers in group A get the same experience they got yesterday, testers in group B get a parallel version of my website in which the downloadable version never existed.  Essentially, I’m A/B testing dropping a profitable product which has a modest bit of traction and thousands of paying customers.

This is rather substantially more work than typical “Tweak the button” A/B tests: it means that I had to make significant sitewide changes in copy, buttons, calls to action, ordering flow, page architecture, support/FAQ pages, etc etc.  I gradually moved towards this for several months on the day job, refactoring things so that I could eventually make this change in a less painful fashion (i.e. without touching virtually the entire site).  Even with that groundwork laid, when I “flipped the switch”  just now it required changing twenty files.

Doing This Without Annoying Customers

I’m not too concerned about the economic impact of this change: the A/B test is mostly to show me whether it is modestly positive or extraordinarily positive.  What has kept me from doing it for the last six months is the worry that it would inconvenience customers who already use the downloadable version.  As a result, I took some precautions:

The downloadable version isn’t strictly speaking EOLed.  I’ll still happily support existing customers, and will keep it around in case folks want to download it again.  (I don’t plan on releasing any more versions of it, though.  In addition to being written in Java, a language I have no desire to use in a professional capacity anymore, the program is a huge mass of technical debt.  The features I’d most keenly like to add would require close to a whole rewrite of the most complex part of the program… and wouldn’t generate anywhere near an uptick in conversion large enough to make that a worthwhile use of my time, compared to improving the website, web version, or working on other products like Appointment Reminder.

I extended A/Bingo (my A/B testing framework) to give a way to override the A/B test choices for individual users.  I then used this capability to intentionally exclude from the A/B test (i.e. show the original site and not count) folks who hit a variety of heuristics suggesting that they probably already used the downloadable version.  One obvious one is that they’re accessing the site from the downloadable version.  There is also a prominent link in the FAQ explaining where it went, and clicking a button there will show it.  I also have a URL I can send folks to via email to accomplish the same thing, which was built with customer support in mind.

I also scheduled this test to start during the dog days of summer.  Seasonally, my sales always massively crater during the summer, which makes it a great time to spring big changes (like, e.g., new web applications).  Most of my customers won’t be using the software again until August, and that gives me a couple of months to get any hinks out of the system prior to them being seen by the majority of my user base.

My Big, Audacious Goal For This Test

I get about three (web) signups for every two downloads currently, and signups convert about twice as well as downloads do.  (Checking my math, that would imply a 3:1 ratio of sales, which is roughly what I see.)  If I was able to convert substantially all downloads to signups, I would expect to see sales increase by about 25%.

There are a couple of follow-on effects that would have:

  • I think offering two choices probably confuses customers and decreases the total conversion rate.  Eliminating one might help.
  • Consolidating offerings means that work to improve conversion rates automatically helps all prospects, rather than just 60%.

Magic Synergy Of Conversion Optimization And AdWords

Large systemic increases in conversion rates let me walk up AdWords bids.  For example, I use Conversion Optimizer.  Essentially, rather than bidding on a cost per click basis I tell Google how much I’m willing to pay for a signup or trial download.  I tell them 40 cents, with the intention of them actually getting the average at around 30 cents, which implies (given my conversion from trials/signups to purchase) that I pay somewhere around $12 to $15 for each $30 sale.  Working back from 30 cents through my landing page conversion rate, it turns out I pay about 6 cents per click.

Now, assuming my landing page conversion is relatively constant but my trial to sale conversion goes up by 25%, instead of paying $12 to $15 a sale I’d be paying $9.60 to $12 a sale.  I could just pocket the extra money, but rather than doing that, I’m probably going to tell Google “Alright, new deal: I’ll pay you up to 60 cents a trial”, actually end up paying about 40 cents, and end up paying about 8 cents per click.  The difference between 6 and 8 will convince Google to show my ads more often than those of some competitors, increasing the number of trials I get per month out of them.  (And, not coincidentally, my AdWords bill.  Darn, that is a bloody brilliant business model, where they extract rent every time I do hard work.  Oh well, I still get money, too.)

We’ll see if this works or not.  As always, I’ll be posting about it on my blog.  I’m highly interested in both the numerical results of the A/B test as well as whether this turns out being a win-win for my customers and myself or whether it will cause confusion at the margin.  I’m hoping not, but can’t allow myself to stay married to all old decisions just out of a desire to be consistent.

15 Comments

Unveiling My Second Product (Demo Included)

Earlier this week, I went to a small massage parlor which is located at the mall next to my house.  There are three attendants on duty at any given time.  I was a “walk-in” (no appointment) and ordinarily would not have been able to see someone quickly, but luckily for me two clients had failed to make their appointments.  That was rather unlucky for the firm, though: two clients not at their appointments means two massage therapists not seeing anyone, and that costs them $2 a minute in direct economic losses.

What that business needs is some way to reduce the number of no-shows and get earlier warning when no-shows happen, so that they can rearrange the schedul, actively solicit walk-in customers, or invite one of their regular customers to have her weekly massage early.  That would minimize the lost revenue from missed appointments.  For example, they could call every client the day before their appointment.

But calling customers is an expensive proposition, when you think of it: three staff members times an average of 10 clients each per day means they’d need to make 30 phone calls a day.  That is thirty opportunities to hit the answering machine, thirty voicemails to leave, thirty “We’re sorry, our customer is not in the service area” messages to hear.  And, since there is no dedicated receptionist at the store, that is time that has to come from an expensive therapist — and when their hands are on a receiver, they aren’t working the knots out of someone’s shoulders for $1 a minute.

There are many, many businesses like my local massage parlor: massage therapists, hair salons, auto mechanics, private tutors, and large segments of the healthcare industry.  They all have an appointment problem…  and it is about to get a bit better.

Introducing Appointment Reminder

For the last several weeks, since quitting my day job, I’ve been hard at work on Appointment Reminder.  (You can tell that I haven’t lost the same panache for inspiring, creative names that bought you Bingo Card Creator.)  It is a web application that handles scheduling and automated appointment reminders via phone, SMS, email, and post cards.  The phone and SMS reminders are through the magic of Twilio, which lets you make and receive phone calls and SMSes using simple web technology.

My value proposition to my customers is simple:

  1. Schedule your appointments in the easy-to-use web interface.  That’s all you have to do.
  2. Prior to the appointment, we’ll automatically remind your customer of the date and time of their appointment.
  3. The customer will be asked whether they’re coming or not.  If not, we’ll notify you of that immediately, so that you can reschedule them and rescue the emptied slot.
  4. This makes you money.  Plus, it is another opportunity to touch your customers, hopefully improving your commercial relationship.

How This Is A Lot Like Bingo Cards

My assumption, which has been borne out a bit in talking to potential customers, is that the market for this sort of thing is overwhelmingly female.  The personal services industry is mostly female, and dedicated receptionists (who I am replacing making more efficient in one facet of their jobs) are almost universally female.  That is one point in common with the market for Bingo Card Creator.

In addition, the competition is similar to the competition for educational bingo cards in 2006:  they’re structurally incapable of addressing huge segments of the market and I am going to go after those segments with a vengeance.  For example, if you look for appointment reminder services online, you’ll find that most of them go after the healthcare market — that is, after all, where the money is.  (My dentist got $770 for 15 minutes of his time and 30 minutes from the dental hygienist — he’s losing a car payment every hour he doesn’t have his hands in a mouth.)

This is mostly sold as enterprise software, with the long sales cycle, non-transparent pricing, and general cruftiness to match.  It almost has to be, because the software has to plug into patient records systems (the typical enterprise morass of dead languages, horrible interfaces, and software which would have fallen to pieces years ago if it hadn’t sold its soul to the devil).  Additionally, healthcare is a very regulated industry, and compliance with HIPPA and other regulatory requirements inevitably drives costs up.

From my research, the cheapest options available cost about $300 a month as a service (often involving a contract with an actual call center) or ~$1,000 as installable software and hardware.  Those do not strike me as viable options for a hair salon, piano teacher, or small massage parlor.  However, thanks to Twilio incurring the capital expenditures on my behalf, I can afford to offer a superior service for a fraction of that price.

And because my cost structure is absurdly better than my competitors, I don’t need to have a sales force to close the deals.  Instead, I can use the skills I’ve built up over the last several years of selling B2C software, and consummate transactions online on the strength of passive sales techniques like a demo, free trial, and website.  My guess is that the low-friction nature of this is going to help me with the less enterprise-y segment of the market, as they’re least in the mood for “Give us your phone number, address, and financial particulars so we can have a salesman set up a meeting to talk to your office manager about how much this is going to cost you.”

Demo / Minimum Viable Product

Appointment Reminder is not actually ready yet.  Having been on something of a Lean Startup kick recently, I thought that getting the software ready prior to showing it around to customers would put the cart in front of the horse: why spend 2-3 months getting v1.0 of the software ready if it turns out that users are cool on the entire concept.  Instead, I took a bit of inspiration from Dropbox’s minimum viable product, which was just a video showing how awesome your routine tasks would be if the product actually worked.  (They had a working prototype at the time but not one which would 100% reliably keep people’s data, which is sort of a key consideration if you’re making a backup product.)

The way I figure it, since the demo of my software is what ultimately makes the sale, everything that happens after the demo is essentially irrelevant to getting someone’s credit card number.  So everything that happens after the demo is out of scope for the MVP: I can demonstrate the sizzle without actually cooking the steak.  The sizzle for Bingo Card Creator was cards coming off your printer.  The sizzle for Appointment Reminder is demonstrating that I can make a phone ring on command if you type a number into your computer.

I’ve been programming for more than a decade now and very, very rarely get the “kid in a candy store” feel these days from it, but the first time I made my phone ring with an API call, I got all kinds of giddy.  I’m figuring that my customers will likely be the same: this is new and magical territory for them.  And unlike a certain technology company which specializes in new and magical telephone equipment, this will credibly promise to make people money.

You can try the demo of Appointment Reminder yourself.  Get your cellphone out, you’ll need it.  The flow goes something like:

  1. Open the demo page.  (I may eventually capture email addresses here, but folks visiting in the first few weeks are mostly going to be my tech buddies, so I’ll hold off for now rather than collecting a lot of mailinator.com addresses.)
  2. Take a look at the simple calendar interface.
  3. Type in your phone number and hit Schedule Fake Appointment.
  4. Your phone rings and you get a combination sales pitch and product demo in the guise of informing you of your fake appointment.  At the end of the call, you’ll be given an option to confirm or cancel the appointment.
  5. As soon as you confirm or cancel the appointment, that is reflected on your computer screen.
  6. You’ll be asked for a conversion here.  At the moment, it just asks for your email address.  Once the site is live, I’ll be pushing for the sale right there.

In real life I’d be using a more immediate way to contact the customer than updating their web interface, since they won’t be on Appointment Reminder when their customer gets the phone call, but that didn’t make sense for the demo — I’ve already credibly demonstrated my ability to make the phone ring.  Now I just need to credibly demonstrate that I can get information from the phone calls to the computer.

Pricing

I have some tentative thoughts on pricing for the service.  This was mostly a marketing decision — I want to be able to say something similar to “Appointment Reminder will pay for itself the first time it prevents a no-show.”  There is also quite a bit of daylight between the value of an appointment among my various customer groups — for a low-end salon that might only be $10, for a massage therapist $50, and for a lawyer or dentist “quite a bit indeed.”

My plan breakdown is mostly to do price discrimination among those user groups:

  • Personal ($9 / month): This is for folks who either want to send reminders to themselves/family or folks who have a part-time business like piano tutoring on the side.  Candidly, I think the value of these customers is going to be minimal, but I wanted to have this plan available for marketing reasons.  (It gives me a shot at appealing to the web worker/productivity/etc blogging folks, for example.)
  • Professional ($29 / month): The bread and butter plan.  This is intentionally sized to be decent for a low-intensity full-time business, such as a hair salon or single massage therapist.  Of note, putting the ability to record custom reminders here rather than in the personal plan provides a strong incentive for folks to upgrade.
  • Small business ($79 / month): This is where I expect to get the majority of my revenues and profits from (notice how I recommend it?).  It should be sufficient to cover most businesses smaller than a busy dentistry practice.  Speaking of dentistry practices…
  • Enterprise ($669 / month): So here’s a trick I’ve learned in Japan: there are a million ways to tell people “no”.  One way to tell people “No, I don’t want your business if you’re in health care” is to make them check a box certifying they are not in healthcare at signup.  That increases friction and demonstrates contempt for potential customers.  Instead, I’ll say yes, I do want their business eventually (after I get the kinks out of my system and have hardened the security and legal representation enough to feel comfortable with soliciting their business), but it won’t be today and it won’t be cheap.

Common among all plans: the first month will be free (capture credit card on day 1, bill on day 30 for month #2, etc etc), and I’ll have my usual 30 day money-back guarantee.  I’d like to offer discounts for multi-month signups, but I think that Paypal may not be too keen about that until I have some history with this business, so I’ll be avoiding it.  (Oh, trivia note: Paypal Website Payments Pro + Spreedly for subscription billing.)

At these price points, the cost of providing the service (Twilio calls) would cap out at about 30% if customers routinely rode their plans to the limit.  On experience and knowledge of the industry, I think that is highly unlikely, and expect to pay something much closer to 5 ~ 10%.  Obviously, I’m never going to characterize this as a 20x markup on Twilio services to my customers, as my customers don’t care beans about Twilio: they care about making sure their expensive professionals don’t idle for lack of work.

Early Reception From Potential Customers

Sadly, I’m not in a position to get this localized into Japanese at the moment (Twilio doesn’t quite have first-class Japanese support, and I have severe doubts about my ability to market effectively domestically).  This means that I can’t exactly walk over to the massage parlors around town and ask them to try it out.  However, I’ve been talking to friends from high school who work in service industries to verify that my assumption of what their problems are is indeed accurate, lurking on message boards (the number of posts deleted for excessive vituperation about missed appointments suggests to me that there may indeed be a market need for this service), and have been doing keyword research.  There appear to be healthy volumes in the core keywords, although I wish it had a built-in longtail search strategy like bingo cards did.

This summer I’m going back to America for about a month to visit family, do a bit of consulting, and have something of a vacation.  Over that time, I’m going to be taking the demo (or the product) on my laptop and showing it to as many service providers as I can stomach seeing.  Thankfully, I expect that they’ll indulge my request for an interview — I intend to pay them their normal hourly, so coffee and a discussion of their industry and opinions about the software will work out just as well as offering a haircut/massage/etc would.

Business Plan

I didn’t write a formal business plan last time and I have no intention to devolve.  That said, I do intend on documenting and revisiting assumptions.  As usual, I’ll be doing most of that on my blog, so you guys are welcome along for the ride.  I’ll also continue my general transparency policy with regards to what works, what doesn’t, and what my statistics are looking like.  That probably won’t rate automatic reporting for a few months yet — no sense building things to track the sales I don’t have.

I’ve essentially got two notes for marketing and intend to be hitting both of them: SEO and AdWords.  AdWords is likely going to be a tough nut to crack in this market due to high spending by companies with very, very high average ticket prices, but most of my competitors do not strike me as extraordinarily web savvy, and I think I can out-think their outsourced SEO/SEM teams.

In terms of reasonably achievable goals, I’d like to have v1.0 of the service open and accepting money by the end of June.  I think that two hundred paying customers is a very achievable target for a year from now, although I’m still not sure yet how being full time affects my skill at marketing, so that might be understating things by a bit.  The last time I made a sales prediction for a new product was when I expressed the wild desire that Bingo Card Creator eventually sell a whole $200 per month.  I hope to be every bit as mistaken.

A Quick Request

I value your opinions tremendously.  If you have suggestions related to the business or forthcoming feature set, I’d love to hear them.  If you have any particular areas you’d like to hear about in my upcoming blog posts, I’d love to hear that, too.

I’m coming to the market several years behind most of my competitors and will be playing catchup for quite some time.  I would be indebted if you took a few minutes to blog about Appointment Reminder.

57 Comments

Interviewed by Andew Warner On Entrepreneurship [Video]

The interviewed I mentioned earlier got rescheduled due to technical difficulties, but it is now up on Mixergy’s site.  You can see it here.

Topics include:

  • Why would teachers want to play bingo anyhow?
  • How did you pull this off while full-time employed?
  • What is it like being a Japanese salaryman?
  • What is the next product?  (Spoiler: Not telling you yet, come back in May.)
  • How did you get traction early at the start?
  • How do you make your processes more reliable to maximize on the effectiveness of your time?

I’m pretty happy with how it came out, although given that it was about 2 in the morning when I recorded it due to time zone differences, sometimes my ability to speak in coherent sentences leaves a bit to be desired.  If you have any questions, feel free to comment here or there.

Comments Off

Peldi from Balsamiq Interviewed For An Hour

Peldi from Balsamiq, who is hugely inspiring to the rest of the uISV community and myself, was interviewed for over an hour earlier this week on Mixergy.  Go watch it.  Everything he says about customer service, building remarkable products, early marketing (his post on the subject contains some of the best advice I’ve ever read), and competition just knocks it out of the park.

For folks here who have been reading me for a while but do not know about Mixergy yet: Andrew Warner does interviews with successful Internet business folks.  Most of them are inspiring, and many have killer, actionable tips that you can use in your businesses.  (I particularly like the one with the Wufoo guys, Peldi’s, and this one by Hiten Shah of Kissmetrics and, earlier, CrazyEgg, which I’ve mentioned a time or three here.)

Andrew interviewed me earlier, too.  The interview and transcript will be up one of these days, after the editors have made me sound intelligible.  (It is amazing what you can do with computers!)

Comments Off

Dropbox-style Two-sided Sharing Incentives

Last weekend, among a whole schedule of other great presentations at the Startup Lessons Learned conference (you can watch the video here), the folks behind Dropbox had a presentation (video) about how they went about growing their business.  Apparently search ads were too expensive for them (due to bidding up by other venture-funded firms in their space) and the long tail of search was not panning out, but their referral program worked out really well for them.  Really, really superflously well for them.

(For those who haven’t used it, Dropbox is absurdly well-implemented file storage, backup, and sharing in the clooooooooooooud.  They have saved my life twice when hard drives died and save me hours of time schlepping files from my Windows PC to my Ubuntu box and back again.  Go try them out if you haven’t already — I can’t imagine not having them anymore.  Anyhow: the business model is “We’ll give you 2 GB of space for free, or you pay us to get more than that.”)

The biggest single thing about their referral program is that it has a two-sided incentive for sharing: the person who signs up for Dropbox through your referral link gets a better deal than they would have gotten from the homepage, and you also get a free bonus yourself.  That is marvelous, marvelous psychology there: it gives both parties a benefit so the email seems less like spam, and social relations being what they are, the person receiving the gift feels a wee bit obligated to accept it.  (This same dynamic is used by many of the social gaming companies on Facebook, to degrees which almost make me feel icky.)

This works great for Dropbox because they have a product which they can easily make more useful in a granular fashion: just add more space and stir.  The cost of the marginal space is truly miniscule as a cost of customer acquisition: a few pennies a month if the user actually fills it, and most will not.  However, the passionate freebie seeking techies in the audience will use their disproportionate-sized online megaphones to scrape another few gigs out of their account.

The more I thought about it, the more impressed with this idea I was.  I had considered and rejected “Tell a friend” as a marketing scheme for BCC a few years back, on the theory that it just creates more spam and few of my customers would use it, but the double-sided incentive addresses both of those issues for me.  Plus I thought I could potentially implement it very quickly.  (I had a funny idea for a minimum viable tell-a-friend page: just ask for the friend’s email address and then have it ping me when someone submits anything.  I’d send the emails and credit both users manually.  That would have taken my development time from about six hours to one hour, but I decided to do it the “right way” on the theory that a few hours isn’t much of a risk to me anymore.)

So I decided to test out a version of this for Bingo Card Creator.  Historically, I have given free trial users 15 bingo cards for free.  (This neatly segments my markets between parents, who very rarely have 16 children, and teachers and professional users, who rarely play bingo with under a dozen players.)  I’m allowing them to invite friends: each successful invite gives both parties 3 extra cards, with a cap at 12 gained from inviting.  This theoretically will allow a large portion of my core customer base to get their program for free, but I think that paying is ridiculously more efficient for most of them, so it will only be the truly inveterate skinflints who sign up four of their closest friends so that they can get 27 cards for class.

The cost of allowing users to print extra bingo cards is, of course, too low to measure.

This Feature Is Surprisingly Hard

I’m used to pushing changes which only require ten lines of code, but this feature was a monster:

  • Tell a friend page
  • Processing email addresses put into that form
  • Actually sending emails
  • Facebook sharing integration
  • Customized signup page
  • I-Can’t-Believe-They’re-Not-Affiliate URLs
  • Properly crediting people for signups
  • Anti-abuse measures (mostly making it so that folks can’t use it to spam)
  • Minimum viable stats tracking (no charts yet)

All in all, it took a solid day.

I’m really pleased with a couple of implementation choices:

Getting the user’s first name:

My gut feeling (yeah yeah, A/B test incoming) is that users will be overwhelmingly more likely to respond to an invitation from Jane than from jsmith@example.com or from “a friend.”  So I asked customers to provide it if they haven’t already.  It is totally optional but I’m thinking they’re overwhelmingly going to comply.

Highlighting the offer on the signup page:

I hit the social proof fairly hard on the signup page: mentioning again that Bob or whomever sent the invitation, and that both Bob and the user will benefit from accepting the offer.  This page could stand to be a lot prettier, and I could probably throw a testimonial in here somewhere…  In this example, our generous inviting user’s name is Bingo.

Facebook integration:

Recently, having spent far too much of my time playing Facebook games (market research, I swear!) and scouting out the ecosystem more, I’ve noticed something.  One, a quarter of the female members of my family aged 30+ are currently sheering sheep, planting pumpkins, or throwing pigs at each other.  Two, my friends seem to comment on things they share… a lot.  Whoa.  This whole Facebook thing might actually have legs.

It turns out that getting folks to share links on Facebook is child’s play: one line of code that you can copy/paste.  With a bit more work, you can customize the text Facebook will pull out of the page.  I customized the text to include a strong call to action with added social proof, naturally.  Facebook sharing: not just for blog posts.


One feature that I particularly like about the Facebook option is that it only requires two mouse clicks from users (one to open, one to confirm — assuming they’re already cookied on FB), and it doesn’t require them to understand or recall email addresses.  My users have enough problems remembering and managing their own email addresses — I don’t want to include a “look up Anne’s email address” step in the workflow.

Finding folks when they’re ready:

Here’s an idea ripped straight off of the better Facebook games: give folks an opportunity to share stuff right when they hit the wall.  (Well, most of the Facebook games artificially construct the wall such that you have to share to get around it…  but I’m not that tricky.)  For example, if someone wants to print 22 cards and only has 15 quota, that would be a great time to remind them of the incentive.

Metrics Tracking

At the moment I put in very, very basic stats tracking:

  • Who ever accessed the invite page.
  • Who sent invites via email, and how many.
  • How many folks signed up as a result of invites.  (No source tracking, but GA should show me that, easily — referrer is Facebook, etc.)
  • Daily counts of all of the above.
  • As you could probably have guessed, I turned this on via an A/B test and will be watching to see if it hurts conversion to purchases.  (This is just a first cut, since it is possible that shaving 10% off sales from inviters would be worth getting the invitees, if invitees turn out to convert well.)
  • I’ve laid the groundwork for tracking the viral coefficient, although I strongly, strongly suspect it will be far below 1.  (I am not promoting this very aggressively, at all.)

Future Directions

In addition to the obvious (testing to see if this actually works), I have a few ideas for how to improve this in the future.

One obvious thing which I will probably not do is to ask folks for their webmail login details, grab their contact lists, and assist them in selecting folks to receive emails.  That would be stupidly effective, but it teaches bad Internet practices (do not give your Gmail details to random websites!) and frankly I don’t want it to be that easy to send invites.  We’ve all got that one aunt who has not figured out netiquette for Farmville and sends 14 lost kittens a day: I do not want to be her enabler.

I’ll probably also work on placement of the offer to invite, copy on the invite page, invite email, and invitation signup page (including graphical design), and will do some much more sophisticated metrics on this if early results look promising.

Will It Work?

Your guess is as good as mine.  In favor of it working, most of my customers and many of my trial users are very thrilled with Bingo Card Creator.  Many of them have figured out how to share it with friends despite me not giving them any good way to do so (not a trivial thing for elementary school English teachers — one bragged to me that she found out how to make a link to the website on her desktop and then bring it to her sister on a floppy, and if they’re getting over barriers to conversion that high, you know there must be something going on).  There is also the natural penny-pinching nature of teachers operating in my favor — the fact that the program is not free is far and away my #1 user complaint — and the fact that they tend to travel in packs.

In favor of the idea not working out so well: these are not very plugged in people as compared to Dropbox’s early adopter userbase, the actual mechanics of sharing still require non-trivial technical expertise (understanding email addresses and knowing those of your friends, for the option I’m giving highest billing to), and there are non-trivial business risks if it either becomes too popular or if folks feel that the invitation emails are an imposition.

Speaking of which: I capped the number of invites I’ll send out per user at 5 (hard-capped at the moment), capped the number of invitations any individual will receive at 1, and have capped the system at a total of 500 a day until I have some idea of how many is safe to send.

Bug of the Year Award:

It is early, but I think this already won it: a poorly considered after_save callback on my user model caused users’ mailing list settings at MailChimp to be updated if their user record was updated.  That was previously desirable, since there was nothing in the user model which could be updated without touching either their email address or their mailing list settings, and all updates were at the user’s personal request.  However, when I put the users’ card limit in there, then updated that for 50,000 users to set it to the default value, the callback fired and suddenly I got about 30,000 Delayed jobs all waiting to ping MailChimp.  I was ignorant of this until — thank God for checklists — I was testing the deploy and found out that I could not print bingo cards.  I assumed I had botched the Delayed Job worker processes again, but no, they were up… and right after I confirmed that I got the email saying “Delayed Job has spiked to 30,000 jobs in the queue!”

As soon as I realized what had happened I hit the Big Red Button on DJ, but not before a few thousand of them had been processed.  For users who had actually confirmed the signup to the mailing list before, I don’t think anything bad happened.  For those who had second thoughts before double-optin, they were all hit with another email from MailChimp on my behalf, seemingly out of the blue.  I’ve now got an inbox of “Who are you and why are you spamming me?” to deal with.  *sigh*

On the plate for tomorrow: figure out how I could have seen this one coming.  My testing and staging environments simply ignore API calls to MailChimp for the obvious reason — I wonder if I should have them throw exceptions instead unless they’re explicitly expected behavior.

Comments Off