Saturday, January 16, 2016

Daddy, What Does an Enterprise Architect do?

I am an Enterprise Architect. I help companies migrate data centers. That can mean moving from one city to another; upgrading an existing data center to newer hardware and to different or more current versions of applications and middleware; or moving from in-house platforms to the cloud. The title Enterprise Architect (EA) can be daunting; the concept is relatively new to our discipline, and there isn’t much written describing how it fits conceptually into the various roles that comprise normal IT operations. Over dinner, I described my job to a friend using this analogy:
Suppose you wanted to move from one house to another. Typically you would pack everything up, rent a truck or schedule a moving van, load it up, drive to the new house, unpack everything, get settled in, and maybe go back to the old place and clean it out. The Moving Architect would figure out the number of different types of boxes to buy, for books, for clothing and bedding, for the china, for any art. The "bill of materials" would list the furniture, the number and sizes of the boxes, and any other items needing special handling. The Moving Architect would suggest how big a truck to rent, how many movers to hire, how long the move would take, and how much it would cost.
Moving a data center is more complicated. Most organizations cannot tolerate an extended outage, so the challenge is more like moving from one house to another without disrupting the daily routines of the people living in the house, or of the people who help out at either house – the landscapers, the realtors who want to show the old house, and the trash collectors whose vehicles cannot be blocked.
The Enterprise Architect has to consider the family’s daily activities. When does the bus pick up the kids? On moving day, will the kids know to get on to the new bus to take them home to the new house? The EA needs to know which days are school days and which aren’t, and which days the kids might have after-school activities and how long they might take. To move non-disruptively, the EA will stock the fridge and the cupboard in advance, but some foods spoil over time, so that preparation step has to be timed to not waste resources.
Since moving furniture cannot happen instantaneously, the EA will have to fit out the new house with furniture, bedding, towels, and some clothing in advance. The EA has to make sure the utilities are on and the home is ready to occupy. And in preparation for the move, the EA has to lead the family in a dry run for the move, without interrupting their normal daily activities. The EA will provide documentation on how to use the new home features.
The Enterprise Architect has to understand the patterns of use of the IT resources across time, to create a safe, secure, recoverable plan to migrate work non-disruptively from one set of IT infrastructure to another. So the EA will ask questions of the workers that seem as trivial and pointless as asking the kids what they want for breakfast: if the answer is oatmeal, then the shopping list needs to be updated, the utilities have to be up and in place, the cookware that was used on the old gas range may have to be replaced to avoid damaging the new electric radiant heat ceramic stovetop, and the recipe may need to be updated to accommodate that new stove’s different heating and cooking times. Knowing how the IT resource is used in detail helps the EA guide the migration.

This analogy is structured specifically to evoke parallels to the Zachman Framework. What does an Enterprise Architect do? The EA generates an optimal isomorphic mapping of one instantiated Zachman Framework into another.

Wednesday, April 22, 2015

Begin Again, starring Mark Ruffalo and Keira Knightley, written and directed by John Carney

First, a confession: I didn’t know that Keira Knightley could sing. She can. And I didn’t know that Adam Levine could act. He can.
Begin Again starts with a seemingly random evening at an open-mike night in any Village dive. A British girl sings a pretty ballad, and a drunk is transfixed by the song. Then the magic starts to happen. We jump back to the start of the drunk’s day. He’s the co-founder of an independent music label, having a rough time. By the time he ends up at the dive, he’s been fired, punched, embarrassed in front of his 14-year-old daughter, and reminded that he can talk to God. “But what if God doesn’t answer?” He is close to ending his life. That guy is played by Mark Ruffalo. He hears her song and magic happens again: he sees what that pretty ballad could be with the right production behind it.
Then we jump back to the circumstances that brought the Brit to the dive. She’s the girlfriend and co-writer of a musician who has just broken through into the big time – record deal, fawning assistants from the label, US tour in the works, making an album. She gets deftly moved into the background as fame swallows up the singer. He has an affair, she leaves, and she ends up at the dump belonging to a friend, where magic happens. The friend suggests she get out of the place and go hear him sing at this dive. There, he talks her into singing one of the songs she’s written, “for anyone who’s ever been alone in New York.” You will remember that song. That girl is Keira Knightley. The unfaithful boyfriend is Adam Levine. I found that not only could he act, but he did such a good job that I really don’t like him – based on the character he portrayed!
The story is exquisitely crafted – honest, with full characters in every role. I wanted to know more about each individual who spoke – there were no incidental interactions. There were wonderful moments of homage to great films, too. One of my favorites followed Keira asking a seemingly innocent question of Mark, which caused him to freeze up and walk away. He steps outside, then stops as if to turn back and explain, then turns forward and walks away. This beautiful moment captures and extends the moment in Love Actually where Keira Knightley realizes that her new husband’s friend doesn’t hate her, but actually is deeply in love with her. He nearly runs out of his apartment, then stops, turns, and turns away: choreographically identical, emotionally powerful and honest – and artful.
In a few words the rapper Troublegum (played by CeeLo Green) gives us the theme of the movie: “When a man like that falls on hard times, people forget who he is. They don’t give him the respect he deserves.”   
The title recalls Robert Preston and Mary Tyler Moore’s Finnegan Begin Again, with echoes of the lines from Finnegans Wake: “Us, then. Finn, again! Take.” The movie is strong enough to carry both references.
The broader theme of the movie is Renewal. Mark’s character, with the help and support of his friends old and new, picks up the pieces and begins again, even stronger than he was before. Events transform him. His healing resurrects his family, his business, his friends, and the city of New York. It is one of the best movies I’ve ever seen.

Sunday, December 22, 2013

How Fast Does Water Freeze?

Recently there have been some comments on Facebook and elsewhere explaining why hot water freezes more rapidly than cold water. Since hot water does not freeze more rapidly than cold water, these comments allow us to think about the nature of heat, and the use of a scientific theory. 

Heat and temperature are related but different phenomena. Heat refers to the total quantity of thermal energy in a lump of matter of a particular size. Temperature describes how concentrated that energy is, regardless of the lump’s size. That is, a large cup of coffee may have the same temperature as a small cup of coffee, but the large cup holds more heat than the small one.

In my high school physics class, our teacher explained that the amount of heat generated by an engine was a constant – regardless of where the heat went. One of the students asked, “Then how come drag racers put wide tires on the back? Wouldn’t the heat be the same whether the tires were large or small?” The class was stumped for a moment. The answer is that since the total amount of heat is the same, distributing it through the larger mass of a wide tire causes a smaller rise in temperature than distributing it through a narrow one – the bigger tire won’t melt.
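The tire argument is the relation dT = Q / (m * c) in disguise: the same heat Q produces a smaller temperature rise in a larger mass. A minimal sketch in Python – the heat value, masses, and specific heat are illustrative round numbers, not real tire data:

```python
# Temperature rise from a fixed quantity of heat: dT = Q / (m * c).
# All numbers below are illustrative, not real tire data.

def temp_rise(heat_joules: float, mass_kg: float, specific_heat: float) -> float:
    """Return the temperature rise (in kelvin) for a given heat input."""
    return heat_joules / (mass_kg * specific_heat)

RUBBER_SPECIFIC_HEAT = 2000.0  # J/(kg*K), a rough figure for rubber

heat = 500_000.0  # joules of friction heat, the same for both tires

narrow_tire = temp_rise(heat, mass_kg=5.0, specific_heat=RUBBER_SPECIFIC_HEAT)
wide_tire = temp_rise(heat, mass_kg=15.0, specific_heat=RUBBER_SPECIFIC_HEAT)

print(f"Narrow tire: +{narrow_tire:.0f} K")  # +50 K
print(f"Wide tire:   +{wide_tire:.1f} K")    # +16.7 K
```

Tripling the mass cuts the temperature rise to a third – exactly the drag racer’s reason for wide tires.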

The same argument applies to the freezing phenomenon. The classic conundrum states that if you put a cup of hot water next to a cup of cold water in a freezer, they both freeze at the same time – which, the argument goes, means the hot water must have cooled faster than the cold water.

This is an example of a flawed theory. A scientific theory offers an explanation of an observed phenomenon that can be disproved. In the case of the freezing water, the idea that hot water freezes more quickly can be disproved simply by timing the two cases separately: put the hot water in the freezer and see how long it takes to freeze, then put the cold water in the freezer and see how long it takes to freeze. While the hot water is cooling, at some point it will reach the temperature the cold water started at. From that point, the question is how long two identical amounts of water, at the same temperature, take to freeze. The water’s history doesn’t matter. When the hot water and the cold water share the same freezer, the experiment is confounded: the hot water warms up the cold water while the freezer removes heat from all its contents.
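The “history doesn’t matter” point can be illustrated with Newton’s law of cooling, under which each cup loses heat in proportion to its temperature difference from the freezer. A rough numerical sketch – the rate constant, temperatures, and one-minute time step are arbitrary choices, and freezing is reduced to simply reaching 0 °C:

```python
# Newton's law of cooling: each minute, a cup loses heat in proportion to
# the gap between its temperature and the freezer's temperature.
# k, the freezer temperature, and the starting temperatures are arbitrary
# illustrative values.

def minutes_to_reach(start_c: float, target_c: float,
                     freezer_c: float = -18.0, k: float = 0.03) -> int:
    """Simulate cooling in one-minute steps until target_c is reached."""
    temp = start_c
    minutes = 0
    while temp > target_c:
        temp += -k * (temp - freezer_c)  # one minute of cooling
        minutes += 1
    return minutes

hot = minutes_to_reach(90.0, 0.0)
cold = minutes_to_reach(20.0, 0.0)
print(f"Hot cup (90 C) reaches 0 C in {hot} minutes")
print(f"Cold cup (20 C) reaches 0 C in {cold} minutes")
# The hot cup always takes longer: it must first pass through 20 C,
# after which its remaining path is identical to the cold cup's.
```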

A theory that cannot be disproved is not a scientific theory. The theory of evolution, like the theory of gravity, explains an observed set of phenomena and allows for predictions that can be either validated or disproved. Dismissing an untestable conjecture as “a mere theory” is fine. Dismissing a scientific theory as “a mere theory” is thoughtless.

Monday, June 17, 2013

Parables for Architects: Untangling Requirements

The business user comes to you and says, “I need you to transport something from Poland to New York.”
You ask, what is it?  
“It’s worth $30,000,000.”  
Can you tell me anything more about it?
“It is a form of carbon.”
Thanks for that. Are there any other constraints?
“I need it in New York within six months.”
So now you have enough information to begin … what? Nothing, in fact. If the client wants you to transport $30M of coal, you’ll need five cargo ships. If the client wants you to transport $30M of diamonds, you’ll need a bonded courier with a briefcase.
The business user describes what they feel are the most relevant aspects of the problem they want IT to help solve. The business user does not know what decisions IT actually has to make on the way to delivering the solution.
You, the IT architect, perform a complex role. You need to understand the comparative value of the pieces of your organization’s IT infrastructure. You need to understand the IT elements of the business problem the user has to solve. You need to translate the business requirement into the most fit-for-purpose technological solution. You need to flesh out the technical requirement, understand the user’s priorities, and nail down the missing requirements. User input is crucial, but it is generally not sufficient. Users need not understand technology trade-offs. IT architects must.
An IT architect who knows only one technological capability adds no value. Like a stopped clock, this IT-architect-in-utero has only one answer to every question. An effective IT architect must evaluate technology alternatives based on experience and fact, going beyond lore, bias, or taste.
If the business customer has decided to deploy only one IT technology, they will not need an IT architect – there is no problem to solve. The advantage of this simplification is that it saves time and cost. The only disadvantage is that not all IT infrastructures are the same. Misaligning the technology and the business gives the users a brittle and ineffective infrastructure. You, the IT architect, know that you don’t cut meat with scissors.
Even if the user does not ask, you, the IT architect, must develop a solution that meets the functional and non-functional requirements the user relies on. Users do not understand the difference between 99.9% availability and 99.9999% availability. They may ask for six-nines, but they may not realize that the cost of the solution may be an order of magnitude greater than the cost of the three-nines alternative. They may not understand the implications of rapid problem determination, or ease of migration, or resiliency, or fast failover. You, the IT architect, must use your mental checklist of systems and network management requirements to complete the user’s requirements before your job is done.
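The gap between three nines and six nines becomes vivid when converted to permitted downtime per year – a conversion worth doing before signing off on any availability requirement:

```python
# Allowed downtime per year implied by an availability percentage.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes(availability_pct: float) -> float:
    """Minutes of outage per year permitted at the given availability."""
    return MINUTES_PER_YEAR * (1 - availability_pct / 100)

for pct in (99.9, 99.99, 99.999, 99.9999):
    print(f"{pct}% availability allows {downtime_minutes(pct):,.1f} min/year")
```

Three nines allows roughly eight and three-quarter hours of outage a year; six nines allows barely half a minute – a difference users rarely appreciate until they see the price tags.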
Some corporations deploy a configuration management system or CMDB to help track the infrastructure pieces they have; some even put together a service catalog listing groups of those services. A few have service catalogs that actually speak to users in user terms, but none have a robust, automated alternative to you, a knowledgeable and experienced IT architect. You build that mental checklist during your professional career. When you take on a new project, you will re-evaluate that checklist, removing obsolete capabilities and adding newly learned ones. That is why you chose to become an IT architect: the job is self-renewing; it demands your continuous awareness, clear thinking, reasoned analysis, and communication skills. No other industry in human history has the richness and continuously evolving knowledge base that IT offers. You are surfing the wave of the future. You, the IT architect, know that you don’t use five cargo ships to move a handful of diamonds.

Friday, June 7, 2013

Time to Prune those Processes

Anyone who has worked in IT for five years or more knows the truth of the statement: “If you automate a mess, you get an automated mess.” What can we do, and what should we do, about these productivity-sapping kludges? This discussion offers a few short-term actions and some useful longer-term approaches to get some of that productivity back.
One common automation shortcut is to transplant a paper-based process into an exact copy on web screens. The screens look just like the original paper forms, and the process reproduces the exact steps that people used.
The rationale for this relies on the “ease-of-use” and “proven design” fallacies. The business automated the process because it was inefficient, or overly burdensome, or got in the way of needed changes in other business activities. “Ease of use” in this case really means “no learning curve,” which is supposed to reassure the business area that none of their people will have to learn anything new or different. It presumes that the original system was easy to use. Often it was not.
“Proven design” in this case means, “we don’t want to do any business process analysis.” The savings in design and analysis will shorten the front end of the implementation process, but by not making the design explicit, those well-worn processes remain obscure. This is a problem because unless the implementation team accurately describes the process, the translation from hard copy to soft copy will create defects. How could that be? In manual processes, users learn to avoid many options they should not take. In computerized processes, the software must handle every possible decision a user could make. The implementation team does not understand the tribal lore the users follow, so the implementers must guess what to do in every undocumented case.
That is, the implementation team ends up doing business process analysis under the worst possible conditions:
• They do not have the skills to do the job
• They do not have the time to do the job
• They do not have the tools or procedures to do the job
• They do not have formal access to user experts to help with the job, and
• They are not evaluated on how well they do the job

Another way to state this is to observe that every system has an architecture. The only question is, does the implementation team know what it is before they begin? The default architecture is inconsistent and incomplete.
Rushing to get from the old (paper-based) process to the new (automated) process on time, the IT organization abandons its core competence, which is to make complex processes clear. Fixing a kludge like this means first identifying and removing gross process inefficiencies, then second streamlining inputs and outputs.
One easy way to find and fix inefficiencies is to measure the elapsed time that process steps take, and use that measurement to extract any activities that are not adding value to the underlying business activity. For instance, some organizations automate approval processes, replacing a physical form with an e-mail link to an approval web page. Look at how long each approver spends on each request over a few weeks. If you find that a person always approves a certain type of request, and does so within 30 seconds, you can conclude that they are not adding any value to the process. They probably got approval rights because they wanted to be aware of requests. The solution is to replace the “action” link with an “informational” link – and not hold up the process by waiting for that rubber stamp.
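A sketch of how that rubber-stamp check might be automated, assuming the approval system can export records of who was asked, when, and what they did; the record layout, names, and 30-second threshold are all invented for illustration:

```python
# Flag approvers who always approve within a few seconds -- a sign the
# approval step adds no value and could become an informational notice.
# The record layout, the names, and the 30-second threshold are
# illustrative assumptions, not a real system's export format.

from datetime import datetime

records = [
    # (approver, request sent, action taken, decision)
    ("alice", datetime(2013, 6, 3, 9, 0, 0), datetime(2013, 6, 3, 9, 0, 12), "approved"),
    ("alice", datetime(2013, 6, 4, 9, 5, 0), datetime(2013, 6, 4, 9, 5, 20), "approved"),
    ("bob",   datetime(2013, 6, 3, 9, 0, 0), datetime(2013, 6, 3, 14, 30, 0), "rejected"),
]

def rubber_stampers(records, threshold_seconds=30):
    """Return approvers whose every action was an approval under the threshold."""
    stats = {}
    for approver, sent, acted, decision in records:
        fast_approve = (decision == "approved" and
                        (acted - sent).total_seconds() <= threshold_seconds)
        stats.setdefault(approver, []).append(fast_approve)
    return [a for a, flags in stats.items() if all(flags)]

print(rubber_stampers(records))  # ['alice']
```

Anyone this check returns is a candidate for an “informational” link rather than an “action” link.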
Streamlining inputs and outputs can save lots of processing time and complexity. Some paper processes produced intermediate reports to measure a team’s workload. Automating these reports and – what is worse – evaluating a team based on that measurement, locks in the original inefficiencies. Every output should have a natural “consumer” who needs that information to do some other job. The other job may be in-line, meaning it waits for the output before it can start, or it may be cyclical, meaning that at some future time the other job starts using a collection of those outputs. Users regard the in-line type (sometimes called “straight-through processing”) as the real application. They may overlook the cyclical type because they do not usually do that part of the work, such as aggregating financial results, auditing, or quality assurance.
By giving the supervisor a measure of the activity, rather than a copy of the results, the process lets the supervisor focus on the business value of the activity rather than the sheer mass of the activity. This moves towards that oft-sought alignment between the IT organization and the business. If the business gets its job done with little extra activity, few procedural glitches, and optimal efficiency, the IT systems that support it are by definition aligned.
It is now spring 2013, a good time to consider pruning those processes.

Thursday, July 26, 2012

The Economic Failure of Public Cloud

The public cloud business will face severe economic challenges in 2014 and 2015, as the business model collapses. Three converging trends will rob the market of profits. First, the barrier to entry – that is, the cost of the technology that makes up a public cloud – will continue to drop, following Moore’s Law. Second, the steady increase in personnel costs will attack margin performance. Finally, the commoditization of cloud services will inhibit brand loyalty. Cloud consumers will not want to be locked into a specific cloud provider, and any attempt to distinguish one cloud from another weakens portability.

This will result in an economic model we are quite familiar with: airlines. As the larger, more mature airline companies sought better margin performance, they sold their planes and leased their fleets back from leasing companies. The airlines do not own the planes or the airports: they own information about customers, routes, demand, and costs. The largest cost airlines face is staff, and as staff longevity increases the cost of personnel steadily grows. So the airline business over the post-deregulation era consists of a regular cycle:

1. Mature airlines enter bankruptcy  
2. The industry consolidates 
3. A new generation of low-cost airlines arises 

All players experience a calm period of steady growth as lower aircraft cost, better fuel efficiency, debt relief from bankruptcy, and lower personnel costs from younger staff make the rejuvenated industry profitable for a while. 

Then the cycle starts again.

One significant difference between airlines and public cloud is the scale of the cost improvements each sector can achieve. Airlines improve costs in small increments – a few percent in fuel efficiency, a few dollars more revenue from luggage, from food, from increasingly extravagant loyalty programs, and so on. But technology costs have no lower boundary: within ten years an individual consumer could buy a single computing platform with more storage and processing capacity than most current public cloud customers need.

It will be as though the aircraft leasing companies could lease each passenger their own plane, bypassing the airlines entirely.

So early entrants must cope with collapsing prices just when their potential market moves to a lower-cost ownership model. Time-sharing met its market’s needs for a brief while – that moment when early demand for computing capacity far exceeded the consumer’s price range – then disappeared.

Public cloud computing will dissipate within five years.

Monday, February 13, 2012

On the Use, and Misuse, of Software Test Metrics

“You will manage what you measure” – Frederick W. Taylor
Testing verifies that a thing conforms to its requirements. A metric is a measurement, a quantitative valuation. So a test metric is a measurement that helps show how well a thing aligns with its requirements.

Consider a test you get in school. The goal of the test is to show that you understand a topic, by asking questions of you about the topic. Depending on the subject, the questions may be specific, fact-based (When did the USSR launch Sputnik?); they may be logic-based (Sputnik orbits the earth every 90 minutes, at an altitude of 250 km. How fast is it moving?); or they may be interpretative (Why did the Soviet Union launch the Sputnik satellite?)
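The logic-based question has a tidy worked answer: speed is the orbit’s circumference divided by the period. A quick check, assuming a mean Earth radius of 6,371 km:

```python
# Orbital speed = circumference of the orbit / orbital period.
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius
altitude_km = 250.0        # from the test question
period_minutes = 90.0      # from the test question

circumference_km = 2 * math.pi * (EARTH_RADIUS_KM + altitude_km)
speed_km_per_min = circumference_km / period_minutes

print(f"Orbit circumference: {circumference_km:,.0f} km")
print(f"Speed: {speed_km_per_min * 60:,.0f} km/h "
      f"(about {speed_km_per_min / 60:.1f} km/s)")
```

The answer comes out to roughly 27,700 km/h, or about 7.7 km/s – the kind of result a grader can verify in one line.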

Or they can be just evil: write an essay about Sputnik. Whoever provides the longest answer will pass the test.

Note that by asking different kinds of questions we learn about the student's capabilities along different dimensions. So when a piece of software shows up, the purpose of testing should not be to find out what it does (a never-ending quest) but to find out whether it does what it is supposed to do (conformance to requirements). The requirements may be about specific functions (Does the program correctly calculate the amount of interest on this loan?); about operational characteristics (Does the program support 10,000 concurrent users submitting transactions at an average rate of one every three minutes, while providing response times under 1.5 sec for 95 percent of those users as measured at the network port?); or about infrastructural characteristics (Does the program support any W3C-compliant browser?)
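That operational requirement hides a concrete aggregate load, which is worth computing before designing any test:

```python
# Aggregate arrival rate implied by the operational requirement above:
# 10,000 concurrent users, each submitting one transaction every 3 minutes.
users = 10_000
seconds_between_txns = 3 * 60

txns_per_second = users / seconds_between_txns
print(f"Aggregate load: {txns_per_second:.1f} transactions/second")  # 55.6
```

A load test that never pushes roughly 56 transactions per second through the system, while sampling the 95th-percentile response time at the network port, is not actually testing this requirement.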

These metrics follow from the program's intended use. Management may use other metrics to evaluate the staff: How many bugs did we find? Who found the most? How much time does it take, on average, to find a bug? How long does it take to fix one? Who created the most bugs?

The problem with these metrics is they generally misinform managers, and lead to perverse behaviors. If I am rated on the number of bugs I write, then I have a reason to write as little code as possible, and stay away from the hard stuff entirely. If I am rated on the number of bugs I find, then I am going to discourage innovations that would improve the quality of new products. So management must focus on those metrics that will meet the wider goal - produce high quality, low defect code, on time.

Software testing takes a lot of thinking: serious, hard, detailed, clear, patient, logical reasoning. Metrics are not testing – they are a side effect, and they can have unintended consequences if used unwisely. Taylor advised care when picking any metric; his line, often misquoted as "you can't manage what you do not measure," was meant as a warning. Lord Kelvin said "You cannot calculate what you do not measure," but he was talking about chemistry, not management. Choose your metrics with care.

Friday, February 10, 2012

Beyond Risk Quantification

For too many years information security professionals have chased a mirage: the notion that risk can be quantified. It cannot. The core problem with risk quantification has to do with the precision of the estimate.
Whenever you multiply two numbers, you need to understand the precision of those numbers, to properly state the precision of the result. That is usually described as the number of significant digits. When you count up your pocket change, you get an exact number, but when you size a crowd, you don't count each individual, you estimate the number of people.

Now suppose the crowd starts walking over a bridge. How would you derive the total stress on the structure? You might estimate the average weight of the people in the crowd, and multiply that by the estimated number of people on the bridge. So you estimate there are 2,000 people, and the average weight is 191 pounds (for men) and 164.3 pounds (for women), and pull out the calculator. (These numbers come from the US Centers for Disease Control, and refer to 2002 data for adult US citizens).

So let's estimate that half the people are men. That gives us 191,000 pounds, and for the women, another 164,300 pounds. So the total load is 355,300 pounds. Right?
No. Since the least precise estimate has one significant digit (2,000), the calculated result must be rounded off to 400,000 pounds.

In other words, you cannot invent precision, even when some of the numbers are more precise than others.
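The rounding rule is mechanical enough to code. round_sig below is a hypothetical helper written for this example, not a standard-library function:

```python
# Round a value to a given number of significant digits.
import math

def round_sig(value: float, sig_digits: int) -> float:
    """Round value to sig_digits significant digits."""
    if value == 0:
        return 0.0
    magnitude = math.floor(math.log10(abs(value)))
    factor = 10 ** (magnitude - sig_digits + 1)
    return round(value / factor) * factor

men = 1000 * 191       # 191,000 pounds
women = 1000 * 164.3   # 164,300 pounds
total = men + women    # 355,300 pounds

# The crowd estimate (2,000) has one significant digit, so the result
# cannot honestly claim more than one.
print(round_sig(total, 1))  # 400000
```

The same bridge total rounded to two significant digits would be 360,000 pounds – but the crowd estimate only justifies one.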

The problem gets even worse when the estimates are widely different in size. The odds of a very significant information security problem are vanishingly small, while the impact of a very significant information security problem can be inestimably huge. When you multiply two estimates of such low precision, and such widely different magnitudes, you have no significant digits: None at all. The mathematical result is indeterminate, unquantifiable.

Another way of saying this is that the margin of error exceeds the magnitude of the result.

What are the odds that an undersea earthquake would generate a tsunami of sufficient strength to knock out three nuclear power plants, causing (as of 2/5/12) 573 deaths? Attempting that calculation wastes time. (For more on that number, see

The correct approach is to ask, if sufficient force, regardless of origin, could cripple a nuclear power plant, how do I prepare for such an event?

In information security terms, the problem is compounded by two additional factors. First, information security attacks are not natural phenomena; they are often intentional, focused acts with planning behind them. And second, we do not yet understand whether the distribution of intentional acts of varying complexity (both in design and in execution) follow a bell curve, a power law, or some other distribution. This calls into question the value of analytical techniques - including Bayesian analysis.

The core issue is quite simple. If the value of the information is greater than the cost of getting it, the information is not secure. Properly valuing the information is a better starting place than attempting to calculate the likelihood of various attacks. 

Thursday, December 9, 2010

The Coming Data Center Singularity: How Fabric Computing Must Evolve

The next generation in data center structure will be fabric-based computing, but the fabric will be two full steps beyond today’s primitive versions. First, the fabric will include network switching and protection capabilities embedded within. Second, the fabric will incorporate full energy management capabilities: electric power in, and heat out.
Ray Kurzweil describes the Singularity as that moment when the ongoing increase in information and related technologies provides so much information that the sheer magnitude of it overwhelms traditional human mental and physical capacity. Moore’s law predicts this ongoing doubling of the volume of available computing power, data storage, and network bandwidth, at constant cost. There will come a time when the volume of information suddenly present will overwhelm our capacity to comprehend it. In Dr. Kurzweil’s utopian vision, humanity will transcend biology and enter into a new mode of being (which has resonances with Pierre Teilhard de Chardin’s Noosphere).
Data centers will face a similar disruption, but rather sooner than Dr. Kurzweil’s 2029 prediction. Within the next ten years, data centers will be overwhelmed. Current design principles rely on distinct cabling systems for power and information. As processors, storage, and networks all increase capacity exponentially (at constant cost) the demands for power and the need for connectivity will create a rat’s nest of cabling, compounded with ever-increasing requirements for heat dissipation technology.
There will be occasional reductions in power consumption and physical cable density, but these will not avoid the ultimate catastrophe, only defer it for a year or two. Intel’s Nehalem chip technology is both denser and less power-hungry than its predecessor, but such improvements are infrequent. The overall trend is towards more connections, more electricity, more heat, and less space. These trends proceed exponentially, not linearly, and in an instant our data center capacity will run out.
Steady investment in incremental improvements to data center design will be overrun by this deluge of information, connectivity, and power density. Organizations will freeze in place as escalating volumes of data overwhelm traditional configurations of storage, processors, and network connections.
The only apparent solution to this singularity is a radical re-think of data center design. As power and network cabling are the symptoms of the problem, an organizational layout that eliminated these complexities would defer, if not completely bypass, the problem. By embedding connectivity, power, and heat (collectively called energy management) in the framework itself, vendors will deliver increasingly massive compute capabilities in horizontally-extensible units – be they blades, racks, or containers.

Wednesday, December 8, 2010

The Software Product Lifecycle

Traditional software development methodologies end when the product is turned over to operations. Once over that wall, the product is treated as a ‘black box’: Deployed according to a release schedule, instrumented and measured as an undifferentiated lump of code.
This radical transition from the development team’s tight focus on functional internals to the production team’s attention to operational externals can impact the delivery of the service the product is intended to deliver. The separation between development and production establishes a clear, secure, auditable boundary around the organization’s set of IT services, but it also discourages the flow of information between those organizations.
The ITIL Release Management process can improve the handoff between development and production. To understand how, let’s examine the two processes and their interfaces in more detail.
Software development proceeds from a concept to a specification, then by a variety of routes to a body of code. While there are significant differences between Waterfall, RAD, RUP, Agile and its variants (Scrum, Extreme Programming, etc.), the end result is a set of modules that need to supplement or replace existing modules in production. Testing of various kinds assesses the fitness for duty of those modules. During the 1980s I ran the development process for the MVS operating system (predecessor of z/OS) at IBM in Poughkeepsie and participated in the enhancement of that process to drive quality higher and defect rates lower. The approach to code quality improvement echoes other well-defined quality improvement processes, especially Philip Crosby’s “Quality is Free.” Each type of test assesses code quality along a different dimension.
Unit test verifies the correctness of code sequences, and involves running individual code segments in isolation with a limited set of inputs. This is usually done by the developer himself.
Function/component test verifies the correctness of a bounded set of modules against a comprehensive set of test cases, designed jointly with the code, that exercise both “edge conditions” and expected normal processing sequences. This step validates the correctness of the algorithms encoded in the modules: for instance, the calculated withholding tax for various wage levels should be arithmetically correct and conform to the relevant tax laws and regulations.
Function/component test relies on a set of test cases, which are programs that invoke the functions or components with pre-defined sets of input to validate expected module behavior, side effects, and outputs. As a general rule, for each program step delivered the development organization should define three or four equivalent program steps of test case code. These test cases should be part of the eventual release package, for use in subsequent test phases – including tests by the release, production, and maintenance teams.
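To make the idea concrete, here is a minimal sketch of what one such function/component test case might look like, using the withholding-tax example above. The function, its bracket values, and the test names are all invented for illustration; a real implementation would encode the actual tax tables.

```python
# Hypothetical sketch of a function/component test case for the
# withholding-tax example in the text. The bracket boundary (1000) and
# rates (10%/20%) are illustrative, not from any real tax schedule.

def withholding(wage: float) -> float:
    """Toy withholding calculation: 10% below 1000, 20% at or above."""
    return wage * 0.10 if wage < 1000 else wage * 0.20

def test_withholding():
    # Expected normal processing sequence
    assert withholding(500) == 50.0
    # "Edge conditions": exactly at, and just below, the bracket boundary
    assert withholding(1000) == 200.0
    assert abs(withholding(999.99) - 99.999) < 1e-9

test_withholding()
```

Note how the test exercises both the normal case and the edge conditions at the bracket boundary, which is where defects in this kind of algorithm tend to hide; shipping such cases with the release package gives the later test phases the same safety net.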
(Note that we avoid the notion of “code coverage” as this is illusory. It is not possible to cover all paths in any reasonably complex group of modules. For example, consider a simple program that had five conditional branches and no loops. Complete coverage would require 32 sets of input, while minimal coverage would require 10. Moderately complex programs have a conditional branch every seven program steps, and a typical module may have hundreds of such steps: coverage of one such module with about 200 lines of executable code would require approximately 2**28 variations, or over 250 million separate test cases.
For a full discussion of the complexity of code verification, see “The Art of Software Testing,” by Glenford Myers. This book, originally published in 1979 and now available in a second edition with additional commentary on Internet testing, is the strongest work on the subject.)
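The arithmetic behind the path-count argument above can be checked directly. A quick sketch, assuming every conditional branch is independent and there are no loops:

```python
# Rough arithmetic behind the path-count argument in the text: with no
# loops, n independent conditional branches yield 2**n distinct paths.

def path_count(branches: int) -> int:
    return 2 ** branches

# Five independent branches: 32 distinct paths
assert path_count(5) == 32

# A module of ~200 executable lines with a conditional branch roughly
# every 7 program steps has about 28 branches:
branches = 200 // 7          # = 28
print(path_count(branches))  # 2**28 = 268,435,456 -- over 250 million
```

Loops make the situation far worse, since each iteration count multiplies the path space again; this is why the text treats full path coverage as illusory.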
System test validates operational characteristics of the entire environment, but not the correctness of the algorithms. System test consists of a range of specific tests of the whole system, with all new modules incorporated on a stable base. These are:
Performance test: is the system able to attain the level of performance and response time the service requires? This test involves creating a production-like environment and running simulated levels of load to verify response time, throughput, and resource consumption. For instance, a particular web application may be specified to support 100,000 concurrent users submitting transactions at the rate of 200 per second over a period of one hour.
Load and stress test: How does the system behave when pressed past its design limits? This test also requires a production-like environment running a simulated workload, but rather than validating a target performance threshold, it validates the expected behavior beyond those thresholds. Does the system consume storage, processing power, or network bandwidth to the degree that other processes cannot run? What indicators does the system provide to alert operations that a failure is imminent, ideally so automation tools could avert a disaster (by throttling back the load, for instance)?
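One way such throttling automation might look is a simple admission-control guard: once a monitored resource crosses a warning threshold, new work is shed gracefully rather than letting the system fail hard. This is only a sketch; the class name, capacity, and 80% warning ratio are all invented for illustration.

```python
# Illustrative admission-control guard of the kind alluded to above:
# when in-flight work crosses a warning threshold, throttle new requests
# instead of running the system to hard failure. All names and thresholds
# here are hypothetical.

class LoadGuard:
    def __init__(self, capacity: int, warn_ratio: float = 0.8):
        self.capacity = capacity
        self.warn_ratio = warn_ratio
        self.in_flight = 0

    def admit(self) -> bool:
        """Accept new work only while below the warning threshold."""
        if self.in_flight >= self.capacity * self.warn_ratio:
            return False          # shed load before resources are exhausted
        self.in_flight += 1
        return True

    def done(self) -> None:
        self.in_flight -= 1

guard = LoadGuard(capacity=100)
accepted = sum(guard.admit() for _ in range(120))
print(accepted)  # 80 -- the remaining 40 requests are throttled
```

The point of the stress test is precisely to verify that an indicator like `in_flight` exists, is visible to operations, and trips before the hard limit does.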
Installation test: Can the product be installed in a defined variety of production environments? This test requires a set of production environments that represent possible real-world target systems. The goal is to verify that the new system installs on these existing systems without error. For instance, what if the customer is two releases behind? Will the product install, or does the customer first have to install the intermediate release? What if the customer has modified the configuration or provided typical add-on functionality? Does the product install cleanly? And if the product is intended to support continuous operations, can it be installed non-disruptively, without forcing an outage?
Diagnostic test: When the system fails, does it provide sufficient diagnostic information to correctly identify the failing component? This requires a set of test cases that intentionally inject erroneous data to the system causing various components to fail. These tests may be run in a constrained environment, rather than in a full production-like one.
The QA function may be part of the development organization (traditional) or a separate function, reporting to the CIO (desirable). After the successful completion of QA, the product package moves from development into production. The release management function verifies that the development and QA teams have successfully exited their various validation stages, and performs an integration test, sometimes called a pre-production test, to ensure that the new set of modules is compatible with the entire production environment. Release management schedules these upgrades and tests, and verifies that no changes overlap. This specific function, called “collision detection”, can be incorporated into a Configuration Management System (CMS), or Configuration Management Database (CMDB), as described in ITIL version 3 and 2 respectively.
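The “collision detection” function described above reduces, at its core, to an interval-overlap check across scheduled change windows on the same configuration item. A minimal sketch, with a data model and field names invented for illustration (a real CMS/CMDB would of course hold far richer records):

```python
# Sketch of the "collision detection" check described in the text: flag
# any two scheduled changes whose maintenance windows overlap on the same
# configuration item (CI). Change IDs, CIs, and times are invented.

from datetime import datetime

changes = [
    ("CHG-1", "web-server", datetime(2010, 12, 10, 1), datetime(2010, 12, 10, 3)),
    ("CHG-2", "web-server", datetime(2010, 12, 10, 2), datetime(2010, 12, 10, 4)),
    ("CHG-3", "database",   datetime(2010, 12, 10, 1), datetime(2010, 12, 10, 2)),
]

def collisions(changes):
    hits = []
    for i, (id_a, ci_a, start_a, end_a) in enumerate(changes):
        for id_b, ci_b, start_b, end_b in changes[i + 1:]:
            # Two windows overlap iff each starts before the other ends
            if ci_a == ci_b and start_a < end_b and start_b < end_a:
                hits.append((id_a, id_b))
    return hits

print(collisions(changes))  # [('CHG-1', 'CHG-2')]
```

Release management would run a check like this whenever a new change is scheduled, rejecting or rescheduling any change that collides with one already on the calendar.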
Ideally this new pre-production environment is a replica of the current production one. When it is time to upgrade, operations transfers all workloads from the existing to the new configuration. Operations preserves the prior configuration should problems force the team to fall back to that earlier environment. So at any point in time, operations controls three levels of the production environment: the current running one, called “n”, the previous one, called “n-1”, and the next one undergoing pre-production testing, called “n+1”.
Additional requirements for availability and disaster recovery may force the operations team to also maintain a second copy of these environments – an “n-prime” copy of the “n” level system and possibly an “n-1-prime” copy of the previous good system, for fail-over and fall-back, respectively.
When operations (or the users) detect an error in the production environment, the operations team packages the diagnostic information and a copy of the failing components into a maintenance image. This becomes the responsibility of a maintenance function, which may be a separate team within operations, within development, or simply selected members of the development team itself. Typically this maintenance team is called “level 3” support.
Using the package, the maintenance function may provide a work-around, provide a temporary fix, develop a replacement or upgrade, or specify a future enhancement to correct the defect. How quickly the maintenance team responds follows from the severity level and the relevant service level agreements governing the availability and functionality of the failing component. High severity problems require more rapid response, while lower severity issues may be deferred if a work-around can suffice for a time.
Note that the number of severity levels should be small: three or four at most. Also, the severity levels should be defined by the business owner of the function. Users should not report problems as having high severity simply to get their issues fixed more rapidly; the number of high severity problems should be a small percentage of the total volume of problems over a reasonable time period.
The successful resolution of problems, and the smooth integration of new functions into the production environment, requires a high degree of communication and coordination between development, QA, and production. Rather than passing modules over a wall, they should be transferred across a bridge: The operations team needs access to test cases and test results along with the new code itself, for diagnostic purposes; and the development team benefits from diagnostic information, reports on failure modes and problem history, and operational characteristics such as resource consumption and capacity planning profiles. The ITIL release management function, properly deployed, provides this bridging function.

Sunday, September 26, 2010

The Blanchard Bone and the Big Bang of Consciousness

Found at a cave in southwestern France, the Blanchard Bone is a curious artifact. It appears to be between 25,000 and 32,000 years old. It is only four inches long. It is engraved with about 69 small figures, arranged in a sequence of a flattened figure eight. Archeologists tell us that the carving required twenty-four changes of point or stroke.

What is it? Looking closely at the carving, it seems that the 69 images represent the phases of the moon over two lunar months (image courtesy of Harvard University, Peabody Museum). It isn’t writing: That was still 20,000 to 27,000 years – over four hundred generations and two ice ages – in the future.

What would the night sky mean to our ancestors so long ago? The Sun was directly responsible for heat and light, and defined the rhythm of days. But the Moon moved so slowly, in comparison. What was it? What did our fathers and mothers think they were looking at, when the Moon rose and traveled across the sky, changing its shape from day to day, but always following the same pattern?

Yet the Moon’s travels had implications and meanings: The Sea responded to the Moon in its tides – or did the tides somehow pull the Moon along? How did that happen? What was going on between the Moon and the Sea?

The ancient artist/scientist/priest who carved this artifact carved what she saw – and more. The artifact was useful, for knowing when to plant, when the birds, the herds, or the fish were migrating, when it might be a good time to find a warm cave. The Moon measured fertility and gestation. When people speculated on this, they began to think – about what they saw, and what it meant.

Some wondered if the bone might be magically linked to the Moon and the Sea. Who among them could not be perplexed by the gymnastics, by the dance, of the Moon?

What would that inevitable nighttime procession inspire? How many nights could people look at the slow but predictable change, observe its correlations, and not be challenged to wonder? The first instances of human reasoning could have been inspired by this persistent phenomenon.

In “Hamlet’s Mill: An Essay Investigating the Origins of Human Knowledge and its Transmission through Myth,” Giorgio de Santillana and Hertha von Dechend propose that the myths, as Aristotle taught, are about the stars. The authors trace the myth of Hamlet back to Amlodhi, who owned a magical mill. Once it ground out peace, but it fell off its axle and on the beach ground out sand, and now it has fallen into the Sea where it grinds out salt. In their essay, they argue that this myth is a story to capture and preserve the observation of the precession of the equinoxes. This is a 25,950-year-long cycle, during which the Earth’s North Pole traces a great circle through the heavens. Now the North Pole points to the Pole Star in Ursa Minor, but in 13,000 years it will point to Vega. Only a medium as persistent as a story could span the ages, capturing and preserving this observation.

When the Blanchard bone was formed, the sky was much as it will appear tonight. Between then and now we have passed through one Great Year. When the North Pole pointed to Vega last, our species was beginning to colonize the Western hemisphere, the ice age was capturing water and lowering the seas, and the Blanchard bone had been lost for ten thousand years.

Let us remember that ancient scientist/artist/priest, let us regard her qualities of observation, synthesis, and imagination with wonder: her discovery in the sky urged us to consciousness, communication, and endless wonders beyond.

Sunday, August 23, 2009

Goulash Recipe

This recipe was distributed to Culinary Institute of America students in Chef Robert Steiner's adult education class "Around the World in Five Nights," taught in the spring of 1983 or 1984. The material includes my notes incorporated in the original document.



Beef chuck cut in 1/4" cubes 1 lb.

Olive Oil 1 Tblsp

Butter 1 Tblsp

Onions diced 1/4” 1/2 cup

Flour 2 oz.

Ground Cumin 1 tsp

Chili powder 1/2 tsp

Paprika 2 Tblsp

Cayenne pepper 1 tsp

Fresh ground pepper & salt 1 tsp (to taste)

Chicken stock 5 cups

Boiled potato in 1/4" dice 3 medium

Sour cream 2 oz

Egg noodles 1 lb.


Sauté beef in some oil and butter, till lightly brown. Set aside, keep warm. Add oil to pan and add onions and sauté till golden brown (clear). Set aside with beef. Add oil and butter to pan, add flour, cumin, chili powder, paprika, salt, pepper, and cayenne pepper. Blend 5 or 6 minutes over low heat. Remove from heat and mix in meat and onions. Add 2 cups of stock and mix until smooth, let sit for 10 minutes off heat. Cover pan and return to heat, simmer for 45 minutes or until meat is tender. Add remaining stock and potatoes. Skim fat, correct seasonings (add salt and pepper to taste). Serve over egg noodles.

Let melted butter cover top to hold from afternoon till dinner. You may add a little tomato paste to thicken, if you like. When serving, add a dollop of sour cream or chives to top if you like.

Sunday, April 5, 2009

My Journey Through Total Knee Replacement

In 1991 I started playing racquetball with a friend who was much better than me. After analyzing the problem, I bought very light-weight shoes to get an extra step on the court. They did not help much, but within a few weeks I was getting a lot of swelling around my knees after our early morning matches. I eventually found my way to an orthopedist who told me I’d damaged the meniscus in my left knee. I had an arthroscopic repair in Nov 1992.

The repair left the meniscus tapered, thinner on the outside than on the inside. Over the years, that unevenness increased, and in December 2008 the discomfort passed my threshold of tolerance. I spoke with a new orthopaedist (the other doctor had retired), Dr. John Crowe, of Orthopaedic and Neurological Services, who informed me that a knee replacement can last 20 years or more – not the seven to ten I’d learned from some inaccurate article floating around the Internet. I asked him if he were busy next Tuesday. He laughed and said that he was booked for the next few weeks, so we scheduled the surgery for after the turn of the year. On January 19th I drove myself to Greenwich Hospital, registered, and had the TKA (total knee arthroplasty). Greenwich Hospital has the lowest rate of post-operative infection of all hospitals in the state of Connecticut. This is especially important for any kind of implant surgery: things get bleak if there is an infection involving an implant.

The path to the surgery took a good deal of time and effort. The hospital offered a two-hour class on the day before Christmas to guide people facing a knee or hip replacement. After the class, I arranged with the local Red Cross to donate a pint of blood on Dec. 30th that the hospital would hold for the surgery. I started a course of exercise to strengthen the muscles around the knee. Getting the insurance company aligned took exceptional patience, but eventually they were able to provide some measure of support for my impending adventure. I was very fortunate to arrange post-operative recovery and rehabilitation through Waveny Care Center, very near my apartment.

Monday morning was cold. I was up well before dawn, as I had to drive to Greenwich and check in, bringing a bag with clothing and my laptop. The kind folks at the registration desk processed my arrival as courteously and efficiently as at a high-class hotel. I was brought to a pre-op room where I stowed my bags and changed into the hospital gown. I lay down on the hospital bed and a nurse started the IV. After I talked with the OR nurse, I was wheeled into the operating room. The surgery began shortly after 7:00 AM. I remember the bright lights and very cool air in the OR – the temperature suppresses infection, I was told. The anesthesiologist told me that I was about to go under and then I went out – in the middle of a word.

In the course of the operation the surgeon replaces the lower surface of the femur (thigh bone), the upper surface of the tibia (shin bone), and the back of the patella (kneecap). The operation could more accurately be called a knee resurfacing, but the common name is Total Knee Replacement or Total Knee Arthroplasty (TKA). As part of the operation, the surgeon removes the anterior cruciate ligament (ACL), since its attachment point is replaced by the mechanical bearing surface; its function is replaced by elements of the implant. In my case, the surgeon selected the Zimmer Legacy System LPS-Flex.

I have not posted my pre-op or post-op X-rays. I might at some future time. Suffice it to say that the surgery was necessary, as noted in this excerpt from the surgeon’s report:

“… There were advanced degenerative changes noted. There was bone exposed on both the lateral femoral condyle and lateral tibial plateau with grooving of the bone….”

I awoke in recovery three hours later, feeling disoriented but not that bad. A nurse helped me stand for a moment to show me that I could bear weight on my new knee, then guided me back into bed. The pain medicine seemed to not be working, which caused me a sense of panic, but the nurse was able to increase the dose a bit which helped somewhat. Some time later I noticed that my left leg was in a mechanical apparatus (continuous passive motion, or CPM, machine) that gently bent my left knee and then extended it, to keep it mobile while the wound healed. I used that machine for more than an hour each day, increasing the flex a few degrees at each session. The machine’s action was not painful: The discomfort from the surgery was not localized, and seeing my leg actually move was comforting. I did feel some discomfort when the physical therapist increased the range of motion.

By Thursday Jan. 22nd I was beginning to feel more alert, at least enough to complain that the pain medicine was clearly not working, and could they try something else? I was moved to Darvocet, then Vicodin (the drug of choice for my anti-hero Dr. Gregory House). Later I learned that the drugs were working just fine, the problem was that the operation is, frankly, painful. I remained somewhat light-headed and my blood pressure wasn’t coming up very rapidly, so the doctor decided to transfuse my pint of blood back. This helped: my vital signs improved. With the exception of the CPM sessions, my left leg was immobilized using a fabric-lined thigh-to-ankle sleeve, open along the front, strengthened with four or five very firm plastic strips, and secured with five Velcro bands. The first few days, it was hard to close – my leg was quite swollen. Once I got moving, and the swelling began to subside, it closed quite easily. I used the immobilizer into my first week at Waveny.

I brought my laptop with me. That Thursday I wrote my first post-op e-mail. It contained 24 words, and I got seven of them wrong. At the time I thought I was quite lucid. I later forwarded it to my friend and asked him if he could figure out what I might have meant. The pain-killers were stronger than I thought. I wonder what the hospital staff thought I was saying: I felt I was being very clear. That day, I achieved 76 degrees of flex, not quite the 80 degrees the physical therapist at Greenwich Hospital had hoped for, but close enough.

On Friday the 23rd I was helped into a wheelchair and driven from the hospital to Waveny in New Canaan. The driver was a Spanish immigrant, and during the short ride we talked about Madrid and the opportunities that brought him to the US. I got to Waveny in the early afternoon and they had already set a lunch aside for me. The dessert was a home-made cream puff – it was so tasty! In fact, all the food was great. I hadn’t eaten so well since my last good vacation – and the meals were healthy and the portions just right. One of the physical therapists – Trisha – visited me that first afternoon. Her concern and sensitivity to my discomfort and uncertainty were profoundly comforting. On Saturday one of the occupational therapists – Gus – stopped by to see how I was doing. I said that I’d been working out with free weights at home and hadn’t had a chance to do anything for a while – could he get me a small weight I could use? He left for a moment and brought me a weight, then he sat alongside me and we talked while I did some upper body work, lying in my recovery bed, happy to feel that something was unchanged and more would be getting back on track. It is hard to express how emotionally moving it was, and still is in reflection, to feel that genuine compassion and care. I had not felt such a pervasive sense of concern for my well-being since I lived at home as a young boy.

Dr. Crowe said that I should remain in the immobilizer until I could do a straight leg lift. That took into the weekend, six or so days after surgery. Friday was discharge day from the hospital and admittance day at Waveny. Saturday they offered one hour of rehab, but I don’t recall doing much. Sunday was for rest, but I was able to get out of bed unassisted that evening.

On Monday Jan 26th, I began my regimen of two daily one-hour physical therapy sessions, the first at 9:00 AM, the second at 1:00 PM. My physical therapist, Wrenford, wore a shirt labeled “Physical Terrorist” asserting his determination and gusto. And so the work began. It seemed that every day I achieved another milestone. In a few days I started using a cane rather than the walker.

On the evening of Wednesday Jan 28th, after dinner, my daughters visited me with their Mom and stayed till 8:30 PM. When I walked with them to the exit, one remarked with surprise that my legs were straight! I found myself looking at the toes of my left foot anew. Even though my knee was still swollen, it was properly aligned over my ankle, just where it was supposed to be. For years I had grown used to my left foot sitting a bit further out to the side, and here it was right next to the right foot, where it belonged. I thought, “I love my new knee!”

Initially I took my meals in my room, watching TV or working on the computer. Encouraged by the nursing staff, I began walking to the cafeteria. During meals I got to know others who had knee or hip replacements. There were eight or nine of us, and we would usually sit together at two or three tables. One male patient was a schoolteacher from Westchester County. He was a gregarious NY sports fan. One female patient was joined at lunch by her husband, a cultured, charming gentleman from central Europe. They had raised six girls, and he had authored two books: one on gardening and one on the gardens of Moravia – with insightful commentary on the history and politics of the region.

I did not expect to get involved in occupational therapy, but the Waveny Care Center wants its patients to get along after returning home. The difference between OT and PT was put simply: PT is for the waist down and OT is from the waist up. I asked what the specific goals of OT were, and it turned out that I could meet them by using their kitchen. So I took the opportunity to make a batch of my Black Bean Soup (the recipe is posted elsewhere in this blog). One of the OTs bought the ingredients! I cooked it up and it was pretty good. I was able to use the stove, blender, tools and sink; reach items in cabinets overhead and load the dishwasher; and cook without getting off-balance or fumbling with the cane.

On Tuesday February 3rd, during the morning PT session, I was able to get 90 revolutions on the stationary bike, and - for the first time since the surgery - I walked without a cane. I was still apprehensive on the stairs, fearing that I might get my toe caught and trip. But each day I would do a little better, breaking down the motion of walking up a step into its components: minor weight shift (but don’t rock the hips), lift up from the knee, move the heel back then up, place the ball of the foot squarely on the next tread, shift weight (but don't rock the hip), lift with the quadriceps, bring the other leg to the next step, keeping the knee pointing forward. Repeat.

On Wednesday, I was able to achieve 86 degrees of flex in the knee. Stairs were challenging, I didn’t have the strength to climb normally but with the cane I could make my way up and down, one step at a time, haltingly. Thursday February 5th was my check-out from Waveny.

New Canaan has a program called GetAbout – residents can request transportation within the town by phoning in a few days in advance. They have a small bus and a van, and I used both. I had PT three times a week. On Tuesday the 10th, I got a ride with my ex to see our daughters’ choir concert in Weston. I was able to ride in the passenger seat both ways, and the girls were surprised and delighted to see me – and I was so proud of their performances! I used a cane to walk from the car to the auditorium, and got a spot on the end of a row so I could stretch out my leg. I never got too good with the cane. My goal was not to get good with the cane, but to get rid of the cane.

That weekend I picked up my car from Greenwich Hospital and drove home – freedom! My first stop was a car wash: Four weeks in the garage had left a remarkably thick layer of dust over the whole car, and someone had drawn a bit of art in the window-panes. I stopped at the grocery store and picked up a few things. Peapod was an enormous help during my immobile phase.

Over the next few sessions I documented my knee's progress on Facebook:

Friday, Feb 13th: 91 degrees.

Monday, Feb 16th: 99 degrees. My primary physical therapist, Jane, noticed that I was very tight on the outside of my left leg. Prior to the surgery I had become knock-kneed by over 10 degrees in my left leg. Now that my leg was straight, the muscles and tendons on the outside of my left leg were stretched taut. She recommended calf stretches and a particular massage across the tendons. It was acutely uncomfortable for about 55 seconds – and then it felt great. I started doing the massage at home. I was never able to get as much relief as that first time, but every time helped a bit more.

Wednesday, Feb 18th: 105 deg (but was only able to get to 104 on Friday Feb 20th).

I started walking up and down the stairs in my apartment, haltingly.

Tuesday, Feb 24th: 108 degrees.

Wednesday, Feb 25th: 110 degrees. This is an important milestone – once I was able to get 110 degrees I could move my foot enough to safely go up and down stairs.

On Tuesday, March 3, six weeks post-op, I had a follow-up visit with Dr Crowe. I had achieved 110 degrees of flex and in the office, with no warm-up, I was past 105 degrees. Dr. Crowe advised me that my goal was to reach 120 degrees, so I was well along. This was a significant relief – I had assumed that I was behind schedule on my way to 135 degrees. It turns out I was on schedule for 120, and all was well.

Friday, March 20: 118 degrees.

Over time the swelling in my left ankle diminished rapidly, while my calf took longer. I still have swelling around my knee, and I’m told that will persist into the summer. On Wednesday, March 25, I achieved 120 degrees of flex. The discomfort as of April 5 is minor, mostly associated with the swelling and weakness around the knee. I am able to go up and down stairs with just a minor, diminishing halt to my downstairs gait. The biggest problem I have in day-to-day life now is remembering to get up and walk a bit every twenty minutes or so. By the end of the day, my knee is sometimes a bit stiff. I’m told that by June I should be able to golf.

I graduated from physical therapy on Tuesday, March 31. Wrenford (while I was an in-patient), Jane (my lead physical therapist while I was an out-patient), Hillary and Trisha and the nursing, occupational therapy, and support staff at Waveny were profoundly helpful, supportive, understanding, and positive. You are all amazing people and make a superior team!

If you want to understand the surgical procedure involved in a total knee replacement, see this lecture by Dr. Seth Leopold of the University of Washington in which he discusses both the total knee and uni-compartmental knee surgery, and also discusses conventional hip and minimally invasive hip repair. His lecture includes a brief edited video showing elements of the procedure.

I met three others during my stint at Waveny who had both knees done simultaneously. I could not imagine that degree of discomfort – but they each said they wanted to get through it. One said that if she hadn’t done both at the same time, she probably wouldn’t have had the courage to get the second one done at all. On the other hand, a neighbor of mine had one knee done last fall and the other a few weeks before I had my surgery. He and I met at the pre-Christmas class at Greenwich Hospital. He’s doing very well.

If you want to talk about your TKA please post to this blog and I’d be happy to hear your story, or share more about mine. I’m done with the drugs, except for an occasional ibuprofen and some ice. I took a walk around the block this afternoon and it felt great! It’s been years since I was last able to do that. I'm looking forward to golfing this summer with my daughters and my doctors.