Summary: I got a false alert that the issues were resolved. Foolishly, I didn’t look into it further and didn’t realize we had an ongoing outage.
My deepest apologies for the two outages we’ve had in the past day. The first happened yesterday at around 5 pm Pacific time, the second today at around 6 am. Both were caused by the same simple, easy-to-resolve issue. In both cases, I did not fix the underlying issue because I didn’t realize the outages were happening: just moments after our monitoring system alerted me to the problems, it sent me alerts saying they had resolved themselves. I should have looked into the issues as soon as I got the alerts, regardless of their supposed resolution, and I didn’t. Thus, we had hours of downtime when we should have had minutes.
I am ashamed of the situation, and worried about what it may do to our users’ trust. Because of my mistakes, tens of thousands of people around the world were blocked from doing their work over the past day. My apologies to all of you.
Here is what I will do to prevent this from happening in the future:
- I’ve added two new alerts to our monitoring that directly indicate our most common issues. These alerts will not incorrectly resolve themselves while an outage remains active.
- I will always look into outage alerts, even if I get an automated alert saying an issue is resolved.
- After an alert, I will check our core metrics dashboards, rather than how responsive the app feels to me, to determine whether the app is healthy. Loading the home page is an incredibly dumb way to verify the health of our servers. I did it only because I was away from my computer and had never seen a real outage falsely resolved without further alerts. That was a huge mistake.
- After an alert, I will monitor our user communication channels, such as Twitter and email, to get a sense of how widespread an issue is.
Again, I am truly sorry for these completely unnecessary outages. We will do our best to maintain our previously wonderful reliability.
P.S. Please consider using our offline Chrome Desktop Application. This way you will have access to your WorkFlowy lists even if we have an outage, or if you’re away from the internet.
An admirable response! Cheers!
Never experienced the outage – but your apology says a great deal about your character. Plus you show what you’re going to change for the future.
Long live WorkFlowy.
Ditto this! Long live Workflowy.
Luckily wasn’t affected by the outage, but your response to this makes me love WorkFlowy even more. <3
Hi Jesse,
I didn’t even notice a blip on my end. No problem here. I have been curious about future development of features in workflowy. I used to love coming on to workflowy every few weeks and seeing a new feature that made my experience 10% better each time. I just haven’t seen many feature upgrades in the past year that I can remember (maybe my memory fails me). Is there a monthly upgrades board where you could post a short blurb on feature upgrades? I do love workflowy, and would also love to see (or at least read about) current progress on development. I do remember a couple of messages you sent out months ago saying that you were working on dates and collaboration. Even a short paragraph each month describing current progress would be enough.
Each time I go out exploring other task management apps, I can’t find one that compares to the simplicity and power of workflowy. I will likely be a long-time customer. I’ve just been hoping to see that my subscription is helping to improve the software (which I’m sure it is).
I agree with what a couple of other guys have already written: Let the community expand this excellent tool by making an API! I really do understand that this is a passion project, and that you probably REALLY want to implement those “upcoming features” you have hinted at for a very long time now. But maybe it is time to set the bird free and open source this bad boy so we all can build upon this really solid foundation you guys have made?
Yeah, I am thinking about this general idea, and I hope to figure something out related to harnessing people’s energy around it.
It amazes me that WorkFlowy has such a large number of subscribers, including apparently highly technical extension contributors. Perhaps relying on optional extensions, rather than piling on built-in features, helps keep WorkFlowy running with minimal glitches. It is also interesting that WorkFlowy is so popular and functional with a staff of only two developers, while other apps require entire teams to keep them running.
Thank you so much for the explanation. Much appreciated.
Awesome, really appreciate this update. Very much reassured.
It’s great that there was no data loss, but I’m curious how safe our data actually is and what kind of alerts we as users should get. I added isitdownrightnow.com to my bookmarks in Chrome, exported my data to the cloud, and temporarily switched back to Simplenote. But I can no longer be as (perhaps unreasonably) confident as I used to be that everything is rock solid (even though that might be a good thing).
The Chrome Desktop Application covers offline and outage situations on the desktop – that’s clear.
But how safe is our data, actually? How is it stored?
The data is very safe. It is backed up both daily, as a snapshot stored redundantly, and in real time, to a second database.
Appreciate the frankness and transparency – you’re doing a great job! Couldn’t live without WF now, I’m hooked 🙂
Things happen. You take measures to make sure they don’t happen again. Apparently some people live in a perfect world. They don’t deserve to use your product. I have been using the free product for some time. I will be paying to use it from here on in. Thank you for your honesty and diligence in trying to provide such a great product.
I thought it was my sucky internet because it was having problems at the same time!
Hello
No problem with the outage. I have used workflowy for more than a year and this was the first outage I have seen. And one of the two didn’t affect me.
So no problem. It can happen.
Thanks for the email anyway.
I have been using the Chrome Desktop Application, so I didn’t notice this issue had occurred until I checked the blog.
Didn’t notice the outage but also didn’t know about the offline Workflowy Chrome extension. Am going to give it a try now.
Don’t sweat it; I appreciate the honesty and the corrective steps. Respect!
Thank you for letting us know, Jesse. Next to not having access, the worst thing is not knowing what is going on.
Truly appreciate this post and I love workflowy too! If Workflowy were to glitch forever and I lost my entries, I would have a nervous breakdown. So, does installing the Chrome Desktop Application prevent losing all our data if something were ever to happen to Workflowy? Or… is there a way to back up all the data we enter, in case that were ever to happen one day? (yikes!)
There is an option to sync to Dropbox, but it may be a premium option; I’m not sure. Either way, it exists and I use it. It happens automatically, no muss, no fuss.
Hi Jesse
Thanks for taking ownership of that issue so responsibly and humbly – great leadership 🙂
As with some other commentators, I was also made aware of how reliant I am on your Workflowy awesomeness. Here’s to your company growth and product development! Hope you and your team are doing great – you’ve got a lot of fans out there!
Thanks for your honesty! It was surprising yesterday, but we know that nobody’s perfect and it’s not a problem 😉
Your comments are well appreciated. But please make a decent Android app, with proper background sync. Or let others make one (API)…
I’ve used WorkFlowy ever since Slate did its write-up some years ago. I’ve been a paid annual member since nearly that time. You’ll continue to get my money for a product that has literally changed my life. Thanks for all you do.
The numbered list in this blog article is aligned in a weird way. Here’s a fix:
http://dr2050.com/automatic-images/workflowy-blog-left-align.png
And we get it, Workflowy is a serious enterprise that takes our data very seriously. Most of us knew that already. But… are all the developers dead? If they are not dead but just moribund, please provide an API, so we can make a proper iOS client, please.
Thanks for inventing Workflowy and for keeping it running perfectly.
Typography troll, eh?
While I appreciate the honesty, outage monitoring of workflowy shouldn’t be a one-man show.
My co-founder was on vacation for two days, so it was bad timing.
Why not a one-man show, though? I haven’t had any issues over the past few years, barring the last week.
Thanks for the explanation.
All the more reason to have a desktop app so we are not at the mercy of those servers.
I researched all the desktop outlining apps I could export my OPML to, but none matched the convenience of the Workflowy interface. The Chrome extension was the closest thing to a desktop app for Mac.
But I want to stay with Safari. Could you please consider introducing a Mac app with offline capabilities, something like the Chrome extension, which loads the web page and keeps it in local storage so it can be edited offline? Or is there another way to do this without installing Chrome?
I’ve been using workflowy since 2013 and this is the first time I’ve been inconvenienced by an outage – a great track record so far. I enjoy the simplicity and functionality of the tool and particularly following this message, I appreciate the thoughtfulness and commitment of the people behind it.
Great work getting the system back online. The preventative measures should help. Speaking for most of my fellow software engineers, IT managers, and system admins: we have all run into grey areas with our homemade monitoring systems.
You are also rather understaffed for hosting such a popular app, which means you have no choice but to rely heavily on monitoring. I know this and I accept this.
To echo what everyone else has said, I appreciate the transparency on the issue, and also appreciate the apology and communication of lessons learned and steps taken to correct it. Love this service–keep up the good work.
A+++mmmmaaaazing response to this “crisis” Jesse. Keep up the good work!