CIOs Extending Their Responsibilities to Clinical Care

July 6, 2022

Instructions:

All systems down http://www.computerworld.com/s/article/print/78803/All_systems_down…

Scott Berinato, CIO magazine February 25, 2003 (CIO)

Among the 30-odd CIOs who serve Boston’s world-famous health-care institutions, John Halamka is a star among stars. He has been CIO of the CareGroup health organization and its premier teaching hospital—the prestigious Beth Israel Deaconess Medical Center—since 1998. He helps set the agenda for the Massachusetts Health Data Consortium, a confederation of executives that determines health-care data policies for New England.

Until 2001, the 40-year-old Halamka also worked as an emergency room physician, but he gave that up to take on the additional responsibilities of being CIO of Harvard Medical School in 2002. However, as a globally recognized expert on mushroom and wild plant poisonings, he is still called when someone ingests toxic flora.

All of this has earned Halamka a considerable measure of renown. For two years running, InformationWeek named Halamka’s IT organization number one among hospitals in its yearly ranking of innovative IT groups. In September 2002, CareGroup was ranked 16th on InformationWeek’s list of 500.

Two months later, Beth Israel Deaconess experienced one of the worst health-care IT disasters ever. Over four days, Halamka’s network crashed repeatedly, forcing the hospital to revert to the paper patient-records system that it had abandoned years ago. Lab reports that doctors normally had in hand within 45 minutes took as long as five hours to process. The emergency department diverted traffic for hours during the course of two days. Ultimately, the hospital’s network would have to be completely overhauled.

This crisis struck just as health-care CIOs are extending their responsibilities to clinical care. Until recently, only ancillary systems like payroll and insurance had been in the purview of the CIO. But now, in part because of Halamka and his peers, networked systems such as computerized prescription order entry, electronic medical records, lab reports and even Web conferencing for surgery have entered the life of the modern hospital. These new applications were something for health-care CIOs to boast about, and Halamka often did, even as the network that supported the applications was being taken for granted.

“Everything’s the Web,” Halamka says now. “If you don’t have the Web, you’re down.”

Until last Nov. 13, no one, not even Halamka, knew what it really meant to be down. Now, in the wake of the storm, the CIO is calling it his moral obligation to share what he’s learned.

“I made a mistake,” he says. “And the way I can fix that is to tell everybody what happened so they can avoid this.”

Sitting in his office three weeks after the crash, Halamka appears relaxed and self-possessed. There’s another reason he’s opening up, talking now about the worst few days of his professional life at CareGroup. “It’s therapeutic for me,” he says, and then he begins reliving the disaster.

Wednesday: The network flaps

On Nov. 13, 2002, a foggy, rainy Wednesday, Halamka was alone in his office at Beth Israel when he noticed the network acting sluggishly. It was taking five or 10 seconds to send and receive e-mail. Around 1:45 p.m., he strolled over to the network team to find out what was up.

A few of his 250 IT staff members, who range from low-level administrators to senior application developers, had already noted the problem. They told him not to worry. There was a CPU spike—a sudden surge in traffic. RCA, one of the core network switches, was getting pummeled. From where, they didn’t know. It might have to do with a consultant who was working on RCA, preparing it for a network remediation project.

“We happened to have had a guy in there,” recalls Russell Rusch of Callisma, the company leading the remediation project. “We knew the hospital had had similar incidents in the past few months.” Those previous CPU spikes lasted anywhere from 15 minutes to two hours, he says. Then they worked themselves out. Like indigestion.

Halamka’s team decided to begin shutting down virtual LANs, or VLANs. They would turn off switches to isolate the source of the problem, much in the same way one would go around a house shutting off lights to find out which one was buzzing. Halamka thought the plan sounded reasonable.

It was a mistake.

Shutting switches forced other switches to recalculate their traffic patterns. These calculations were so complex that those switches gave up doing everything else.

Traffic stopped. The network was down.

Within 15 minutes, by 2 p.m., the team reversed course and turned all the switches back on. A sluggish network, they figured, was preferable to a dead one.

For the rest of the day and into the night, the network flapped—a term Halamka uses to describe the network’s state of lethargy dotted by moments of availability and, more often, spurts of dead nothing. The team searched for the cause. Around 6 p.m., when most of the doctors, nurses, staff and students left, the network settled down. Finally, at 9 p.m., the IT staff found its gremlin: a spanning tree protocol loop.

Spanning tree protocol is like a traffic cop. Data arrives at a switch and asks spanning tree for directions. Say, from John’s server to Mary’s desktop. Spanning tree calculates the shortest route. It then blocks off every other possible route so that the data will go straight to its destination without having to make decisions at other crossroads along the way.

But spanning tree will look only as far out as seven intersections. Should data reach an eighth intersection, called a hop in networking, it will lose its way. Often, it will drive itself into a loop. This clogs the network in two ways. First, the looped traffic itself gums up the works. Then, other switches start to use their computing horsepower to recalculate their spanning trees—to make up for the switch that is directing traffic in a loop—instead of directing their own traffic.
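Why looped traffic clogs a switched network can be illustrated with a toy flooding model. The sketch below is a simplification, not a model of the hospital's network: it assumes every switch copies a received frame onto every link except the one it arrived on, which is how switches treat broadcasts and frames for unknown destinations. Ethernet frames carry no hop counter, so nothing ever discards the copies. The `flood` function and both topologies are hypothetical.

```python
from collections import Counter


def flood(links, source, steps):
    """Count frames in flight at each step when every switch floods.

    links: iterable of (a, b) pairs, each an undirected switch-to-switch link.
    A switch that receives a frame forwards a copy on every other link.
    There is no hop counter, so copies are never discarded.
    """
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    # Each frame in flight is keyed by (sender, receiver).
    in_flight = Counter((source, n) for n in neighbors.get(source, ()))
    history = [sum(in_flight.values())]
    for _ in range(steps - 1):
        nxt = Counter()
        for (sender, receiver), count in in_flight.items():
            for n in neighbors[receiver]:
                if n != sender:  # never send back out the ingress link
                    nxt[(receiver, n)] += count
        in_flight = nxt
        history.append(sum(in_flight.values()))
    return history


# Hypothetical topologies: four switches fully meshed, and the same four in a chain.
mesh = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
chain = [("A", "B"), ("B", "C"), ("C", "D")]

print(flood(mesh, "A", 5))   # [3, 6, 12, 24, 48]
print(flood(chain, "A", 5))  # [1, 1, 1, 0, 0]
```

In the loop-free chain the frame dies out once it reaches the far end. In the mesh the number of copies doubles at every step, which is the kind of growth that blocking redundant paths is meant to prevent.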

That’s what happened at Beth Israel Deaconess. On Wednesday, a researcher uploaded data into a medical file-sharing application, and it looped. The data was several gigabytes, so it clogged the pipes. Then, when Halamka’s team turned off a switch at 1:45 p.m., it was as if one cop closed an intersection and every other cop stopped traffic in all directions to figure out alternate routes.

Halamka’s team now knew what happened, if not where it happened. Standard troubleshooting protocol for spanning tree loops calls for cutting off redundant links on the network. “What you’re doing is eliminating potential spots where there are too many hops, and creating one path from every source to every destination,” Callisma’s Rusch says. “It might make for a slower environment,” one without backup, “but it should make for a stable environment.”
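Rusch's description, one path from every source to every destination, amounts to reducing the network to a tree. A minimal sketch of finding which links are redundant, assuming the network is given as a plain list of switch-to-switch links, uses a union-find structure: any link whose two ends are already connected would close a loop. The switch names are hypothetical, and which link gets cut depends on input order here. The real protocol chooses using bridge priorities and path costs.

```python
def redundant_links(links):
    """Split links into a loop-free set and the redundant ones that close a loop."""
    parent = {}

    def find(x):
        # Return the representative of x's connected group.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps lookups short
            x = parent[x]
        return x

    keep, cut = [], []
    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            cut.append((a, b))  # a path already exists, so this link forms a loop
        else:
            parent[root_a] = root_b
            keep.append((a, b))
    return keep, cut


# Hypothetical topology: three core switches in a ring plus a dual-homed edge switch.
links = [
    ("core1", "core2"),
    ("core2", "core3"),
    ("core3", "core1"),
    ("core3", "edge1"),
    ("edge1", "core1"),
]
keep, cut = redundant_links(links)
print(cut)  # [('core3', 'core1'), ('edge1', 'core1')]
```

With the two redundant links removed, the four switches are joined by exactly three links, so there is a single path between any pair and no backup if one fails.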

“We cut the links,” Halamka says. “It seemed to work. We went home feeling great. We had figured it out.”

Thursday: Clogged arteries

Hospitals come alive early. By 7 a.m., doctors and nurses started to send some of Beth Israel Deaconess’s 100,000 daily e-mails. The pharmacy began filling prescriptions, transferring the first bits of the 40 terabytes that traverse the network daily. Some of the 3,000 daily lab reports were beginning to move.

By 8 a.m., the network again started acting as if it were flying into a headwind. Halamka realized the network had settled down the night before only because hardly anyone was using it. When the workday began in earnest, CPU usage spiked. The network started flapping. The problem hadn’t been fixed.

Halamka’s team scrambled to find other possible sources of the trouble. One suspect was CareGroup’s network of outlying hospitals in Cambridge, Needham, Ayer and elsewhere in Massachusetts. They operated as a distinct network that plugged into Beth Israel Deaconess. The community hospitals’ network was sluggish, and a billing application wasn’t working, according to Jeanette Clough, CEO of Mount Auburn Hospital in Cambridge, which serves as the hub for the outlying hospitals’ network.

The easiest thing to do would be to cut the links, eliminating the potential for spanning tree loops. But that would isolate the outlying hospitals. Instead, the IS team, along with Callisma engineers, chose a more complex option. They would try converting from switching to routing between the core network and the outlying hospitals. That would eliminate spanning tree issues while keeping those hospitals connected.

They tried for seven hours, and, for arcane reasons that have to do with VLAN Trunking Protocol (VTP), they never got the routing to work. The network flapped all day.

Around midmorning, as Halamka was explaining the routing strategy to CareGroup executives in an ad hoc meeting, a patient, an alcoholic in her 50s, was admitted to Beth Israel Deaconess’s ICU. Dr. Daniel Sands, a primary care physician and director of the hospital’s clinical computing staff, saw her. She had what Sands calls “astounding electrolyte deficiencies,” a problem common to people who drink their meals. In fact, Sands says, “It was incredible she was alive.

“I needed to be careful with this woman. I needed to try treatments based on lab reports and then monitor progress and adjust as I went,” recalls Sands. “But all of a sudden, we couldn’t operate like that. Usually I get labs back in less than an hour; they were taking five hours, and here I have a patient who could die. I was scared.” (The patient would survive.)

At 4 p.m., Halamka met with a minicrisis team that included the head of nursing, the heads of the lab and the pharmacy, and hospital COO Dr. Michael Epstein. “Even then,” Halamka says, “I’m still saying, ‘We’re one configuration change away,’ and my assumption is things will be back up soon.”

But his team was tense and frustrated. CareGroup’s help desk had been flooded with calls. They were hearing everything from “I can’t check my e-mail” to “I don’t know if the blood work I just requested went through.”

At 3:50 p.m., Beth Israel closed its emergency room. It stayed closed for four hours, until 7:50 p.m., according to Massachusetts Department of Public Health documents.

It was at the 4 p.m. meeting that COO Epstein says he realized “this was more than a garden-variety down-and-up network.” Clinical users, like Sands, were signaling that they were worried. Epstein and Halamka, along with hospital executives and network consultants, decided to take extreme measures. They called Cisco Systems, the hospital’s San Jose, Calif.-based equipment and support vendor. Cisco responded by triggering its Customer Assurance Program (CAP), a bland name that belies how rare and how serious CAPs are. CAP means Cisco commits any amount of money and every resource available until a crisis is resolved.

CAP was declared shortly after 4 p.m. By 6 p.m., a local CAP team from nearby Chelmsford, Mass., had set up a command center at the hospital and initiated “follow the sun” support—meaning additional staff at Cisco’s technical assistance centers would be plugged in to the crisis until their workday ended, when they’d hand off support to a similar group a few time zones behind them.

First, the CAP team wanted an instant network audit to locate CareGroup’s spanning tree loop. The team needed to examine 25,000 ports on the network. Normally, this is done by querying the ports. But the network was so listless, queries wouldn’t go through.

As a workaround, they decided to dial in to the core switches by modem. All hands went searching for modems, and they found some old US Robotics 28.8Kbps models buried in a closet. They blew the dust off them like musty yearbooks pulled from an attic. They ran them to the core switches around Boston’s Longwood medical area and plugged them in. CAP was in business.

An outmoded network

By 9 p.m., they had pinpointed the problematic spanning tree loop. The Picture Archive Communication System (PACS) network, for sharing high-bandwidth visual files and other clinical data, was 10 hops away from the closest core network switch, three too many for spanning tree to handle.
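The "three too many" arithmetic can be checked mechanically. The sketch below assumes the seven-hop limit as the article describes it and counts hops with a breadth-first search from a chosen core switch. The topology, a hypothetical chain of ten switches hanging off a core, stands in for the PACS path and is not the hospital's actual layout.

```python
from collections import deque

MAX_HOPS = 7  # the limit as described in the article


def hops_from(root, links):
    """Breadth-first search: fewest switch-to-switch hops from root to each switch."""
    neighbors = {}
    for a, b in links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)

    dist = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for n in neighbors.get(node, ()):
            if n not in dist:
                dist[n] = dist[node] + 1
                queue.append(n)
    return dist


def too_far(root, links, limit=MAX_HOPS):
    """Map each switch beyond the limit to the number of hops it exceeds it by."""
    return {sw: d - limit for sw, d in hops_from(root, links).items() if d > limit}


# Hypothetical chain: a core switch (sw0) followed by ten daisy-chained switches.
chain = [(f"sw{i}", f"sw{i + 1}") for i in range(10)]
print(too_far("sw0", chain))  # {'sw8': 1, 'sw9': 2, 'sw10': 3}
```

The last switch in the chain sits 10 hops out, which is 3 past the limit, matching the figure the team arrived at.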

And that’s when the dimensions of the problem fully dawned on the team members: They were struggling with an outmoded network. In September 2002, Halamka had hired Callisma’s Rusch to audit CareGroup’s infrastructure. When Rusch finished, he told Halamka, “You have a state-of-the-art network, for 1996.”

Halamka’s network was all Layer 2 switches with no Layer 3 routing. Switching is fast, inexpensive and relatively dumb, and it relies on spanning tree protocol. Routing is more expensive but smarter. Routers have quality-of-service throttles to control bandwidth and to isolate heavy traffic before it overwhelms the network. State-of-the-art networks in 2002 have routing at their core.

In 1996, CareGroup’s network was Beth Israel Hospital, and at its core was a switch called Libby030. In October of that year, the hospital merged with Deaconess Hospital. Deaconess’s network was plugged into Libby030.

Other systems were tacked on in the same way. In 1998, CareGroup connected PACS to what used to be Deaconess Hospital. A year later, CareGroup linked a new data center and its two core switches (RCA and RCB) to Libby030. There would be a fourth core switch added and a skein of redundant links, but Libby030 remained the main outlet. Halamka now understands that this was a “network of extension cords to extension cords. It was very fragile,” he says.

To fix the problem, the CAP team decided to put a Cisco 6509 router between the core network and PACS, eliminating spanning tree protocol and its seven-hop limitation. (The 6509 also has switching capabilities, so the team decided to kill three switches inside PACS and use the 6509 for that too.)
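The effect of putting a router in the path can be sketched by treating it as a boundary: spanning tree runs separately on each side, so each side only has to stay within the hop limit on its own. The example below is an illustration under assumed names, not the CAP team's design. It groups switches into Layer 2 domains, skipping any node listed as a router, and reports each domain's diameter, meaning the longest shortest path in hops. The position of the hypothetical router `r1` is chosen arbitrarily.

```python
from collections import deque


def layer2_domains(links, routers):
    """Group switches into Layer 2 domains, treating routers as boundaries.

    Returns a list of (sorted switch names, diameter in hops), one per domain.
    """
    routers = set(routers)
    neighbors = {}
    for a, b in links:
        for node in (a, b):
            if node not in routers:
                neighbors.setdefault(node, set())
        # Only switch-to-switch links extend a Layer 2 domain.
        if a not in routers and b not in routers:
            neighbors[a].add(b)
            neighbors[b].add(a)

    def bfs(start):
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for n in neighbors[node]:
                if n not in dist:
                    dist[n] = dist[node] + 1
                    queue.append(n)
        return dist

    domains, seen = [], set()
    for start in sorted(neighbors):
        if start in seen:
            continue
        members = set(bfs(start))
        seen |= members
        # Brute-force diameter: fine for a small illustrative graph.
        diameter = max(max(bfs(m).values()) for m in members)
        domains.append((sorted(members), diameter))
    return domains


# Hypothetical flat chain: core, nine intermediate switches, then PACS (10 hops).
names = ["core"] + [f"s{i}" for i in range(1, 10)] + ["pacs"]
flat = list(zip(names, names[1:]))

# Same chain with a hypothetical router r1 inserted between s5 and s6.
routed = [link for link in flat if link != ("s5", "s6")]
routed += [("s5", "r1"), ("r1", "s6")]

print([d for _, d in layer2_domains(flat, [])])        # [10]
print([d for _, d in layer2_domains(routed, ["r1"])])  # [5, 4]
```

In the flat chain the single domain is 10 hops across. With the router in place, the two domains are 5 and 4 hops across, each within the seven-hop figure.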

Soon after 9 p.m., a Boeing 747 with a Cisco 6509 on board left Mineta International Airport in San Jose bound for Boston’s Logan International Airport.

The local CAP team spent the night rebuilding the PACS network, a feat Halamka talks about with a fair bit of awe: The first time around, PACS took six months to build.

After working through the night, the team was momentarily disheartened Friday morning to see that, despite PACS being routed, the network was still saturated. But they rebooted Libby030 and another core switch, which brought out the smiles.

“We rebooted and things looked pretty,” Halamka says.

Friday: Back to paper

By 8 a.m., the network started to flap again.

At 10 a.m., Halamka and COO Epstein decided to shut down the network and run the hospital on paper. The decision turned out to be liberating.

“We needed to stop bothering the devil out of the IT team,” says Epstein.

Shutting down the network also freed Sands and the hospital’s clinicians. Some had already given up on the computers but felt guilty about it. But “once the declaration came that we were shutting down the network, we felt absolved of our guilt,” Sands recalls.

The first job in adapting to paper is to find it: prescription forms, lab request forms. They had been tucked away and forgotten. And many of the newer interns had never used them before. On Friday, they were taught how to write prescriptions. When Sands had to write one, it was his first in 10 years at CareGroup. “When I do this on computer, it checks for allergy complications and makes sure I prescribe the correct dosage and refill period. It prints out educational materials for the patient. I remember being scared. Forcing myself to write slowly and legibly.”

At noon, Epstein came in to lend a hand … and walked into 1978. Epstein worked the copier, then sorted a three-inch stack of microbiology reports and handed them to runners who took them to patients’ rooms where they were left for doctors. (There were about 450 patients at the hospital.)

In time, the chaos gave way to a loosely defined routine, which was slower than normal and far more harried. The pre-IT generation, Sands says, adapted quickly. For the IT generation, himself included, it was an unnerving transition. He was reminded of a short story by the English author E.M. Forster, “The Machine Stops,” about a world that depends upon an über-computer to sustain human life. Eventually, those who designed the computer die and no one is left who knows how it works.

“We depend upon the network, but we also take it for granted,” Sands says. “It’s a credit to Halamka that we operate with a mind-set that the computers never go down. And that we put more and more critical demands on the systems. Then there’s a disaster. And you turn around and say, Oh my God.”

Halamka had become an ad hoc communications officer for anyone looking for information. Halamka was the hub of a wheel with spokes coming in to him from everywhere—the CAP team, executive staff, clinicians and the outlying hospitals. Halamka leaned on his emergency room

RUBRIC

Quality levels: Excellent Quality (95-100%), Average Score (50-85%), Poor Quality (0-45%)

Introduction

Excellent Quality (45-41 points): The background and significance of the problem and a clear statement of the research purpose is provided. The search history is mentioned.

Average Score (40-38 points): More depth/detail for the background and significance is needed, or the research detail is not clear. No search history information is provided.

Poor Quality (37-1 points): The background and/or significance are missing. No search history information is provided.

Literature Support

Excellent Quality (91-84 points): The background and significance of the problem and a clear statement of the research purpose is provided. The search history is mentioned.

Average Score (83-76 points): Review of relevant theoretical literature is evident, but there is little integration of studies into concepts related to problem. Review is partially focused and organized. Supporting and opposing research are included. Summary of information presented is included. Conclusion may not contain a biblical integration.

Poor Quality (75-1 points): Review of relevant theoretical literature is evident, but there is no integration of studies into concepts related to problem. Review is partially focused and organized. Supporting and opposing research are not included in the summary of information presented. Conclusion does not contain a biblical integration.

Methodology

Excellent Quality (58-53 points): Content is well-organized with headings for each slide and bulleted lists to group related material as needed. Use of font, color, graphics, effects, etc. to enhance readability and presentation content is excellent. Length requirements of 10 slides/pages or less is met.

Average Score (52-49 points): Content is somewhat organized, but no structure is apparent. The use of font, color, graphics, effects, etc. is occasionally detracting to the presentation content. Length requirements may not be met.

Poor Quality (48-1 points): There is no clear or logical organizational structure. No logical sequence is apparent. The use of font, color, graphics, effects etc. is often detracting to the presentation content. Length requirements may not be met.
