Jan 04, 2020
I have been steadily reducing my Google footprint over the past year. I switched away from Gmail. I no longer use Google Drive or Photos. And, most recently, I migrated this blog from Blogger to a statically generated site hosted on Amazon S3.
I use a framework called Pelican to generate the full site from content files written in Markdown. I am writing this post using a Markdown editor called Typora, although most text editors have very good Markdown support. Then I run a command that generates HTML files and pushes them to S3.
Migration was incredibly easy. Google Takeout allows you to export all your posts as an Atom file. Then I wrote a script that turned the Atom feed into Markdown files. It was easy to keep the same URLs thanks to the "blogger:filename" elements in the Atom file. For a template, I chose a theme called Blue Penguin. After changing a couple of colors, I was done!
I am still working out the finer points of the workflow. So far, the main limitation I see is that I can't post from any computer the way I could with Blogger (and, before that, WordPress). Before you can generate the site, you need to set up a local environment -- not hard thanks to GitHub and Pipenv, but not something I would want to do on my work computer. The next time I get inspired to blog while traveling, I will probably email myself a post to publish later.
Overall, I am pretty happy with this setup!
Dec 09, 2019
When I start working with an established software development team, my favorite tool for understanding their process is a "hot lot." "Hot lot" is a manufacturing term for an order that is expedited by being allowed to jump the queue. Hot lots are closely watched for progress and potential delays. In the world of software development, a hot lot can be a feature request that goes through a process of design, implementation, testing, enablement (documentation), release, promotion, and evaluation. A hot lot should accelerate the process by adjusting priorities, but it should not circumvent the process by breaking rules or cutting corners.
By prioritizing a feature and watching it go through the process at top speed, you can learn many things. For example, you can learn...
- Whether the process is even responsive enough to accept a hot lot. Sometimes engineering is tied to rigid roadmaps and nothing, no matter how important, can jump the line. This is concerning if those roadmaps stretch out beyond the opportunities you can reliably anticipate.
- Whether there even is a defined process. Is there a mechanism for ensuring all of the tasks (QA, documentation, deployment, etc.) are completed? Or maybe there is a process but nobody follows it. If there is no practiced process, you can't trust the integrity of the system or whether anything is really "done."
- How the process is structured into steps, roles, and hand-offs. How many people touch the feature? How much time does each step take? How much time is spent waiting between steps? Is there excessive back and forth? Lean Six Sigma has a great framework for studying process cycle time, wait times, and waste across the value stream.
- The theoretical speed limit of the current process. You will never know how responsive your process can be when you always slow it down with competing priorities. Often actual speed is much slower than potential speed because of various delays, distractions, and interruptions that are not the fault of the process.
- Whether there are structural blockers like "we only release every 3 months." Or maybe the team is distributed over many time zones with little overlap for hand-offs and feedback.
- Whether there are capacity blockers like "Joe is the only person who can do that step and he is not available."
- How easy it is to monitor the process. Can you go to one place and see the completed and remaining work?
- The amount of managerial overhead that the process requires. For example, is there a project manager that needs to track and delegate every task?
- The artifacts the process creates. Can you go back and see what was done and why?
- How the response to the feature was measured and incorporated into future improvement ideas.
After running through a couple of these experiments, I have a pretty good understanding of the process structure, its theoretical speed, its strengths, and its flaws. At that point, we can start to come up with ideas for improvement. The low-hanging fruit is usually pretty obvious ... especially to people who have been participating in the process without paying attention to the overall throughput. Optimizations can be designed collaboratively and tested by future hot lots. I find that teams are generally comfortable with this evaluation because it targets the framework people work in rather than the people themselves. Usually processes (at least as they are practiced) form organically, so nobody feels threatened by process improvements -- especially if they are clearly supported by empirical evidence.
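Instrumenting a hot lot can be as simple as recording when each step started and finished. A sketch of the Lean Six Sigma arithmetic (the step names and timestamps below are invented for illustration) that separates touch time from wait time and computes process cycle efficiency:

```python
from datetime import datetime

# (step, started, finished) records captured while watching a hot lot.
# These values are illustrative, not real data.
steps = [
    ("design",    "2019-12-02 09:00", "2019-12-02 15:00"),
    ("implement", "2019-12-03 10:00", "2019-12-04 16:00"),
    ("test",      "2019-12-06 09:00", "2019-12-06 12:00"),
    ("release",   "2019-12-09 14:00", "2019-12-09 15:00"),
]

def parse(ts):
    return datetime.strptime(ts, "%Y-%m-%d %H:%M")

# Touch time: the sum of time actually spent working each step.
touch = sum((parse(end) - parse(start)).total_seconds()
            for _, start, end in steps)
# Lead time: first step start to last step finish, including all waiting.
lead = (parse(steps[-1][2]) - parse(steps[0][1])).total_seconds()
wait = lead - touch
# Process cycle efficiency: value-add time as a fraction of lead time.
efficiency = touch / lead

print(f"touch {touch/3600:.1f}h, wait {wait/3600:.1f}h, "
      f"efficiency {efficiency:.0%}")
```

Even with made-up numbers like these, the shape of the result is typical: the feature spends far more time waiting between steps than being worked on, and that is exactly where the improvement ideas come from.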
Even if you have been working with a team for a while, try pushing through a hot lot and pay close attention to it. There is really no better way to understand the execution of a process.
Nov 07, 2019
Before building any functionality, a product team should first start by fully understanding the problem they are being asked to solve. This may sound obvious but I can’t tell you how many times I see one-liner Jira tickets that ask for something without explaining why. But the “why” is the most important part for a number of reasons.
- The team has to agree that the problem exists and is worth solving. Impact and urgency are primary factors in prioritization.
- Being grounded in the “why” informs creativity to answer the “what” and the “how.” Design begins with empathy and you can’t have empathy if you don’t know what your users are struggling with.
- Solutions should be evaluated on how well they address the problem. This evaluation should drive design, QA, and post-release review.
To help people focus on the problem, I use a simple tool that I call a “problem definition.” This is a document (preferably a wiki page) that describes the problem and why it is important: inefficiency, risk, etc. There is also a section for proposed solutions where the author can suggest their ideas. The problem definition then becomes a focal point for clarification and learning. Stakeholders can ask questions to explore the use case.
I think this type of document was the original intent behind the “User Story” used in various agile methodologies. But over time, the User Story has been corrupted into a formulaic and useless “As a _____, I want to ________ so I can ________”; I have yet to read a User Story that really got to the heart of the problem and why it was worth solving.
Problem definitions are precursors to project artifacts like specifications and work items. They should be easy for anyone to write in their own language. No commitment is made to implement a solution. Sometimes problems can be solved with training or better documentation. Even if no action is taken, expressing and hearing these issues is important in bridging the gap between the product team and its users.
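For the wiki page itself, very little structure is needed. A starting template along the lines described above (the section names are just my suggestion, not a formal standard):

```markdown
# Problem: <one-line summary>

## Who is affected
Which users or roles hit this, and how often.

## What happens today
The current behavior or workaround, with a concrete example.

## Why it matters
The cost: inefficiency, risk, lost revenue, frustration.

## Proposed solutions (optional)
The author's ideas. Not a commitment to implement any of them.

## Questions and discussion
Stakeholders ask questions here to explore the use case.
```

The point of the template is only to prompt the "why"; authors should feel free to write in their own language.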
Everyone on the team should be able to answer the question “why are we doing this?” If they can’t, they can’t be expected to contribute to an effective solution.
Oct 23, 2019
One of my favorite tech presentations is Rich Hickey’s Simple Made Easy. The premise of the talk is that simple and easy are not the same thing. In fact, you often sacrifice simplicity in pursuit of easiness. As Rich says, we consider something easy if it is “at hand.” Like the Staples Easy Button. Simplicity is something totally different. It is the absence of complexity, which is lots of moving parts entwined together in intricate ways. If an “easy button” really existed, it would be supported by a complex network of solutions that could take care of any problem.
Rich was talking about programming and how to keep code maintainable. Simple code is easier to understand and extend. But I apply this perspective to lots of things. For example, a bicycle is a simple machine. A quick glance reveals how it works and what every part does. But pedaling up a hill is not easy. A modern car is complex. There is a lot of stuff going on under the hood and nearly all drivers accept that they have no hope of understanding it all.
In building software, I have come to realize that users only value ease. A user wants the features he/she likes “at hand.” In a mature, multi-featured application, UI design is mainly focused on hiding some features to make the frequently used ones stand out. Users don’t want simple. Take away any feature and there will be complaints.
Developers want simple. They want to work with code that is understandable and behaves predictably. They realize that every new feature is supported by hundreds of lines of code that need to be tested with every modification. Much of Rich's talk deals with programming styles that unnecessarily create complexity. But some requirements will force even well designed code to become complex. Ironically, these requirements are often driven by a desire for a "simple and easy" user experience (personalization, natural language inputs, voice control...).
Why does this matter?
If we don't acknowledge that "simple and easy" are in conflict, there will be unmet expectations that lead to friction between stakeholders. Users can become impatient discussing complex details about their "simple feature." Development teams can feel under-appreciated for the effort required to do a "simple thing." The time taken to wrestle with ignored complexity can look like incompetence.
Take, for example, the Google search box. It is easy to use... just type in some text and click the button. But it is anything but simple. There is an art to constructing effective queries and a whole industry (SEO) dedicated to manipulating what comes back. There is also a set of features that makes the search box like a command line for the web. I can't tell you how many times I have heard "I just want something simple like Google." Google isn't simple. But it is easy. What makes Google search feel easy is that the core functionality is obvious and it gives useful feedback to help you get what you want. You may not get what you want on the first try, but it is easy to refine your search to home in on your target. Voice assistants aim for the same level of ease, but I find the trial-and-error loop more frustrating. That is probably because the system can return only one response and the feedback loop takes longer.
I know how annoying it can be when someone picks at language, and I am not advocating constantly correcting people on their word choices. But I do think it is important for everyone to understand what they are asking for and what they are giving up when they get it. We can do that by probing into what the user means by "simple." That question is reasonable because both "simple" and "easy" are subjective terms that require elaboration. When we document requirements, we should avoid all subjective language. After all, most of the work to achieve the perception of ease and simplicity happens through iteration and refinement. These qualities are not intrinsic to the feature but rather to the sentiment of the user.
May 23, 2019
Aberdeen is a Market Intelligence company. We provide market data (Firmographic, Technographic, Leads, and Intent) as well as quantitative and qualitative insights based on those data. My primary role as Chief Technology Officer is to develop and improve products that deliver and maximize the value of these data and insights. This is really the same "right content, right context, right time" problem that I have been working on for years as a content management professional.
Our strategy for detail data is to push them directly to the systems where they are actionable. For example, our Aberdeen Intent for Salesforce app creates Salesforce opportunities out of companies that are showing intent. The Salesforce app also includes some charts to visualize summaries and trends. We also have other apps to help Salesforce users interact with our firmographic and technographic data. But Salesforce accounts are often rationed and not everyone spends their time there. The conventional answer to reach other users is a customer portal.
But does the world really need yet another portal?
Technical and non-technical roles alike are forced to work in so many different platforms. I feel especially bad for marketers (cue the scary Brinker SuperGraphic). But every office job today seems to involve logging into different systems to check in on various dashboards or consoles.
Yes, single sign-on can make authentication easier. But SSO is rarely configured because so few of these systems are owned by Corp IT. Plus, you need to remember where to go.
Yes, an email alert can suggest when it may be worthwhile to check in on a dashboard. But establishing the right threshold for notification involves time consuming trial and error that few have the patience for. It only takes a few "false alarm" notifications to make you hesitate before following a link.
Corporate portal technologies tried to solve this problem by assembling views (portlets) into one interface but the result was both unwieldy and incomplete. There is a constant flow of BI initiatives that try to solve this problem by bringing the data together into a unified place. Too complicated. Too cumbersome. And yet another place to go.
So most users are doomed to flit from portal to portal like a bee looking for nectar.
I am starting to believe that we already have the unified, integrated portal we have been looking for. It is the email client where we spend hours of every work day. Rather than develop a dashboard or portal that people need to go to, deliver simple glance-able email reports that highlight what is new/different and needs attention.
Longtime readers of this blog may be aware of my contempt for email as a collaboration and information management tool. However, even in the age of Slack, there is no more reliable way to notify business users than email. Decision makers live in their email clients. If you want to get information in front of someone, the best place to put it is in their inbox.
Designing email-friendly business intelligence is not trivial. Beyond the technical limitations of email clients' HTML rendering capabilities, you also have to consider the context. People are already overloaded with email so the reports need to minimize cognitive load. They need to quickly convey what is important within a couple distracted seconds. Perhaps even on a mobile phone in between meetings. Less is more - just a few Key Performance Indicators (KPIs) to make the user feel like he is in the loop and can answer questions about the status of the program or take action if necessary.
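To make this concrete, a glance-able report can be nothing more than a short HTML table of KPIs. A sketch using Python's standard library (the KPI names, values, and addresses are placeholders, not real Aberdeen metrics):

```python
from email.message import EmailMessage

# Illustrative KPIs; a real report would pull these from the BI layer.
kpis = [
    ("New intent signals", 42, "+8"),
    ("Open opportunities", 17, "-2"),
    ("Leads delivered", 310, "+25"),
]

rows = "".join(
    f"<tr><td>{name}</td><td align='right'><b>{value}</b></td>"
    f"<td align='right'>{delta}</td></tr>"
    for name, value, delta in kpis)

msg = EmailMessage()
msg["Subject"] = "Weekly program snapshot"
msg["From"] = "reports@example.com"
msg["To"] = "decision.maker@example.com"
msg.set_content("Your email client does not display HTML.")  # plain-text fallback
# Keep the markup minimal: email clients render only a subset of HTML/CSS,
# and the reader may be glancing at this on a phone between meetings.
msg.add_alternative(f"<table>{rows}</table>", subtype="html")

# Sending is then one call once you have an SMTP host, e.g.:
# smtplib.SMTP("smtp.example.com").send_message(msg)
```

Three numbers and their deltas is about the right cognitive load; anything that needs scrolling or sorting belongs in a drill-down link, not the email.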
Frequency is also an important factor. The cadence should align with the decision making cycle. These emails are not for catching daily anomalies. Those types of warnings are better handled by system alerts that only go out when thresholds are met (behind schedule, over budget, no data detected...).
As I think about a portal to deliver Aberdeen's market intelligence insight, I keep going back to the question, what if our BI portal wasn't a portal at all? Wouldn't it be better to put our data into user interfaces that our clients are already looking at?
Jul 12, 2018
I have been working on an interesting data set about web content management system (WCMS) installs. From these data I am able to identify events when an organization rebuilds their website on a new WCMS. As anyone who has been involved with a web development project knows, a website re-platforming represents a lot of time, expense, and decision making. So these events are important market signals -- especially when you consider the platform they are leaving and how long ago it was deployed.
I am starting to publish interesting observations on the Aberdeen Blog. This first post lists which WCMSs most commonly replace WordPress. I am doing similar analysis on other software categories such as eCommerce.
Subscribe to the Tech Pro Essentials channel of the Aberdeen blog if you want to see more posts like these.
Jun 20, 2018
At Aberdeen, I am taking vast amounts of data and working them into a unified data model that encompasses company information, web traffic, survey results, etc. The actual workflow is nicely summarized in the classic dataists post “A Taxonomy of Data Science”: OSEMN (Obtain, Scrub, Explore, Model, iNterpret).
Here are the tools and tricks that I use on a daily basis.
Command Line Tools
Massive data files are troublesome for most editing programs (such as Excel or even Vim). It takes too much memory to hold all of that data in an editable state. Command line tools don’t have this problem because they process data as a stream, loading only one line at a time.
Kade Killary wrote an excellent article called “Command Line Tricks for Data Scientists”. The tips range from simple to advanced. On the simple end, I learned that “wc -l” is the fastest way to get the number of lines in a file. “split” is another simple but powerful command, useful for breaking a large file into smaller batches for services like Mechanical Turk (more on that later).
When working with CSV files (the lingua franca of data science), I couldn’t live without CSVKit. It doesn’t do anything you can’t do with AWK, but the syntax is optimized for working with CSV files and is much simpler. For example, “csvcut -n filename.csv” lists the columns in filename.csv. “csvcut -c 1,3,4 filename.csv > newfile.csv” exports columns 1, 3, and 4 into a new CSV file called newfile.csv. csvformat is useful for handling delimiters and escapes so that the file can be ingested by other systems.
As an aside, I always work with plain text formats such as CSV because they are more accessible to different tools than binary formats such as Excel.
Most data scientists throw away data that they can identify as bad. Unfortunately, I don’t have that luxury. For example, if I discover that the URL we have for a company is incorrect, I need to fix it because I use the domain to link to other data. But what do you do when you have over 100,000 missing or bad URLs? Automation can only take you so far. After a certain point, you need an actual human to do some research. I have found that Mechanical Turk is the fastest way to get help with these manual tasks. Using Mechanical Turk effectively is an art that I am just starting to get proficient with.
When working with data files, there is a tendency to save copies at various steps in the process so you can compare what has changed, recover from a mistake, or take a different approach. Before long, you get directories full of cryptically named files. Some people have developed good systems for organizing and naming these files, but I think the best approach is to use a source control system like Git. With Git, you can commit a version of the same file with a comment about what you did to it. And, of course, Git helps you work with others.
While Git comes with comparison functionality to show the difference between versions, I don’t think it is particularly easy to use. VisualDiffer is a cheap and simple tool for side-by-side comparisons of text files like CSV. More advanced (and expensive) tools like Beyond Compare, Araxis, and DeltaWalker can handle binary formats such as Excel and can even merge differences. But I have not found a need for those yet. My most common use case is to see changes that a script or someone else made to a file.
I use a lot of AWS tools in my work: S3, DynamoDB, Lambda…. At the bare minimum, EC2 is a quick and cheap way to set up a computer to run a long-running process on. For example, I have one automated process that goes through hundreds of thousands of records and uses various APIs to gather additional data; it literally takes weeks to run. Using EC2 and screen sessions is infinitely better than chaining my own workstation to an internet connection and having it run continuously for days.
Pandas and Jupyter Notebooks
Since I am already a Python programmer, Pandas and Jupyter Notebooks were an obvious choice for exploring and modeling data. I love how you can build a step by step annotated process to assemble and visualize the data.
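A typical notebook cell in that spirit, obtaining, scrubbing, and exploring in a few annotated steps (the companies, columns, and numbers below are invented for illustration):

```python
import pandas as pd

# Obtain: in a real notebook this would be pd.read_csv("companies.csv")
df = pd.DataFrame({
    "company": ["Acme", "Globex", "Initech", "Initech"],
    "domain":  ["acme.com", None, "initech.com", "initech.com"],
    "visits":  [1200, 450, 300, 300],
})

# Scrub: drop rows with no domain (our join key) and de-duplicate
df = df.dropna(subset=["domain"]).drop_duplicates()

# Explore: summarize web traffic by company
summary = df.groupby("company")["visits"].sum().sort_values(ascending=False)
print(summary)
```

The real value of the notebook is that each of these steps sits in its own cell with a Markdown note above it, so the whole chain of reasoning is reproducible.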
At Aberdeen, we add another step onto the end of the OSEMN process: Publish. This is where we use the output of our research to deliver interactive data products to our customers. Those products include embeddable dashboards and alerts that customers can use to make better decisions and seize opportunities. PowerBI is a rapidly improving platform for delivering interactive reports. We have PowerBI experts on staff so I mainly send data for them to turn into customer facing tools.
Jun 15, 2018
I am excited to announce that I am now working at Aberdeen where I am the Vice President of Research Products. This is a big time at Aberdeen because we are shifting from a traditional analyst firm to what we are calling a "Market Intelligence" company. What that means is that our analysis is based on quantitative data rather than anecdote and opinion.
In particular, we are focusing on three categories of performance indicators:
- Awareness quantified through surveys that ask respondents whether they are familiar with a product or brand.
- Consideration, which is based on intent data.
- Market Share, which is based on install data.
In this role I have gone deep into data science and also tapped into my own nerdy creativity and curiosity. If you like FiveThirtyEight, you would love my job. It's especially great for me because it allows me to leverage many skills I have developed over the years as an industry analyst, software developer, and database administrator.
Lately I have been exploring historical install data to analyze events where one web content management system replaces another. For example, when a company replaces a website running on WordPress with a new website running on Sitecore. Re-platforming events such as these are significant because they represent customer requirements, product strengths (at least strengths perceived during the selection process), and big investments of time and money. And then when you combine that with intent data that shows which customers are showing signs of looking for a replacement technology, you get highly actionable insight.
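Detecting a re-platforming event from install snapshots is essentially a self-join on domain across time: the same domain observed on a different WCMS than before. A sketch with pandas (the domains, platforms, and dates are invented for illustration, not our actual data):

```python
import pandas as pd

# Install snapshots: which WCMS each domain ran on at each observation date.
installs = pd.DataFrame({
    "domain": ["a.com", "a.com", "b.com", "b.com", "c.com", "c.com"],
    "wcms":   ["WordPress", "Sitecore", "Drupal", "Drupal",
               "WordPress", "AEM"],
    "seen":   pd.to_datetime(["2016-01-01", "2017-06-01", "2016-01-01",
                              "2017-06-01", "2015-01-01", "2018-01-01"]),
})

# Line up each observation with the previous one for the same domain.
installs = installs.sort_values(["domain", "seen"])
installs["prev_wcms"] = installs.groupby("domain")["wcms"].shift()

# A re-platforming event: same domain, different WCMS than last time.
events = installs[installs["prev_wcms"].notna()
                  & (installs["wcms"] != installs["prev_wcms"])]
print(events[["domain", "prev_wcms", "wcms", "seen"]])
```

Aggregating `prev_wcms` by `wcms` over many such events is what produces the "what replaces WordPress" style of analysis, and the gap between observation dates bounds how long the old platform was deployed.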
These data will be incorporated as features in our subscription products but I do plan to post tidbits on the Aberdeen blog. So stay tuned!
Nov 17, 2017
It is fairly typical for a consultant or new leader to walk into an organization and see nothing but silos. These leaders regard silos as a barrier to efficiency and make them a target for change. What they often wind up doing is replacing organically formed structures with new ones that look better on PowerPoint than in practice.
Why does this happen? Let's start by digging into what a silo is. "Silo" is usually used as a derogatory term to describe a grouping that you don't like. But groupings are important in large organizations because the number of possible point to point connections makes communication too noisy and prioritization too difficult. If everyone is talking to everyone all the time, nothing gets done. Teams naturally form to confront this challenge. Complementary capabilities are assembled and scaled in highly focused work groups. Process is continuously refined because of a tight feedback loop.
To the outsider, trying to navigate these structures is confusing and frustrating. People seem unaware of what is happening outside of their group. They appear oblivious rather than focused. The reactionary impulse is to criticize the duplication of what appear to be identical functions. The ego feels good when you think you see obvious dysfunction that nobody else recognizes. It certainly feels better than having to slog through complexity that everyone else understands.
But there is great risk in introducing sweeping plans to achieve synergy before really understanding how these teams function. Even if the reasoning is valid, it is incredibly disruptive to blow up any working system and make it re-form under stress and uncertainty.
Before eliminating "silos," you need to understand why they formed. Were they imposed from the top down in order to make the organization easier to understand from the top? Or did these structures develop naturally to solve operational problems related to coordination and focus? Can the same benefits be achieved more efficiently?
You can't fix a working system until you fully understand why it is the way it is. You need to understand what is working right now and what obstacles prevented the system from naturally adapting to fix its broken parts. When you hypothesize dysfunction, you need to introduce your corrections scientifically and measure the results. But most importantly, you need to find the best parts and figure out a way to expand on them.
Oct 23, 2017
As a longtime remote worker and manager of both distributed and co-located teams, I think about virtual teams a lot. While I have had great personal experiences with remote teams, there seems to be little consensus about whether they are a good idea. You have some articles touting the health, retention, and productivity benefits of letting people work from home. And you have other articles, like the recent Atlantic piece "When working from home doesn't work," that indicate a shift back to traditional office environments. Based on my own experience, I find it hard to imagine large companies succeeding by dictating enterprise-wide policies around remote workers. The effectiveness of distributed teams depends on critical factors that vary from team to team. Here are three things that undermine the effectiveness of distributed teams.
1. Hybrid teams do not work
A team should be either all co-located or all remote. A remote member of a predominantly co-located team will always be neglected. It is unavoidable. Co-located employees build habits that depend on seeing each other. They look around the room to decide whom to include in a discussion. They respond to visual cues that a colleague may be struggling. The interactions available to remote team members tend to be restricted to events that are either boring (like standing meetings) or stressful (like performance reviews). But relationships are formed in between these two extremes, when people can be themselves and have the space to be curious about each other and build trust.
2. You can't convert a colocated team to a distributed one
A team is not just a collection of people. It is an ecosystem shaped by individual talent, chemistry, goals, and an environment that presents constraints and opportunities. The environment plays a huge role in how people interact. And by interact, I don't just mean communication (although that is part of it) but also how responsibilities are divided and handoffs happen. If people suddenly start working remotely, you need to treat the group as a new team. You need to establish new norms and ways of working together. Roles will change. You need to use different methods to develop camaraderie and create an engaging work experience.
3. Not everyone will thrive as a remote worker
It takes a special type of person to be an effective and happy remote worker. Their work environment has to be conducive to productivity. They need to be goal oriented and invested in the success of the team. They should be committed to their craft and want to build mastery by continuous refinement. I have also recently begun to appreciate the importance of being in the right phase of one's career. At some point in your career, it is helpful to go into an office to do things like: build professional social skills; find a mentor; bond with people; try different roles; get lots of feedback; and have the general sensation that you work for a company. It is harder for remote workers to advance into new roles because they don't get to see other people doing those roles. Personally, I am grateful that I got to work at a number of different kinds of offices and I have some great professional connections from that time. I think I would be a wreck if my early managers had to deliver feedback over phone and email without being able to modulate tone and provide support based on my reaction.
Based on these three observations, a smart executive will not dictate working style based on business journal articles or office leases. Instead, he/she should empower teams to construct and distribute/consolidate themselves for optimal efficiency.