Services to state agencies being restored

The Virginia Information Technologies Agency, Northrop Grumman, IT staff at state agencies and EMC are continuing to work to restore services to agencies. The outage resulted from a damaged networked storage system, which now has been repaired. Currently, technicians are bringing up servers and agencies are testing applications.



Update on storage outage
12:00 p.m., Wednesday, September 1, 2010

Significant progress has been made and we have reached high confidence that all services will be restored soon.

* At 1:45 this morning, VITA and Northrop Grumman turned systems back over to DMV for additional testing. DMV still is unable to process drivers' licenses at its customer centers, but other services continue to be available.

* Core functions at the Department of Taxation are online today. Citizen access to Tax systems will be restored tomorrow.

* The State Board of Elections has reached a partially operational status. SBE will work with local registrars to ensure accuracy of voter registrations related to additions and changes and to applications for absentee voting.

* This outage is regrettable and the time it has taken to restore agency operations is unacceptable. We extend our apologies to citizens and state agency employees who were impacted, and ask for their continued patience.

* This outage, while significant and particularly so for those attempting to secure a driver's license, was not as widespread as some think. Of the 89 state agencies, 26 were impacted. Of those 26, three had core business functions impacted. State government functions and computer systems continued to operate. Mainframes, networks, PCs, e-mail and phone systems continued to be available. The outage affected 13% of the Commonwealth's file servers.

* VITA fully supports the Governor's call for a third-party review of the outage and actions taken to restore services.

* Updates will continue to be provided as they become available.

(Note: Earlier reports that 27 agencies were impacted were incorrect. The correct number is 26. The 27th system impacted was Northrop Grumman.)



Update on storage outage
11:45 a.m., Tuesday, August 31, 2010

VITA, Northrop Grumman, EMC and impacted state agencies are continuing to work around the clock to restore services. This networked storage system outage impacted 27 of 89 state agencies. State agencies are operating and serving citizens. The most public-facing issue is that the Department of Motor Vehicles (DMV) cannot issue drivers' licenses at its customer service centers.

* The networked storage unit -- an EMC DMX 3 -- has been repaired. We are continuing the time-consuming process of verifying and restoring data. This is especially time-consuming for those agencies with large amounts of complex data.

* Twenty-four of the 27 affected agencies are up and running. However, three agencies are not yet fully operational. They are, as we've said before, the Department of Motor Vehicles, Department of Taxation and the State Board of Elections.

* This outage has not crippled state government. It has created some challenges and the DMV outage has impacted citizens seeking drivers' licenses but the vast majority of state government computing functions are fully operational. Approximately two-thirds of state agencies were not impacted by this outage.

* We ask for the continued understanding and patience of state employees and citizens as the recovery effort continues.



STATEMENT FROM VIRGINIA SECRETARY OF TECHNOLOGY JIM DUFFEY
5 p.m., Monday, August 30, 2010

On Wednesday, August 25, at approximately 3 p.m., the Commonwealth of Virginia experienced an information technology (IT) infrastructure outage that affected 27 of the Commonwealth's 89 agencies and caused 13 percent of the Commonwealth's file servers to fail. The failure was in the equipment used for data storage, commonly known as a storage area network (SAN). Specifically, the SAN that failed was an EMC DMX-3.

According to the manufacturer of the storage system, the events that led to the outage appear to be unprecedented. The manufacturer reports that the system and its underlying technology have an exemplary history of reliability and industry-leading data availability of more than 99.999 percent, and that no similar failure has occurred in more than one billion hours of run time. A root cause analysis of the failure is currently being conducted.

The storage unit has been repaired and we have been in the meticulous process of carefully restoring data since the failure. This is a time-consuming process that requires close collaboration with the impacted agencies, especially those agencies with large, complex amounts of data.

Twenty-four of the 27 affected agencies were up and running this morning. However, three agencies are not yet fully operational. These agencies are the Department of Motor Vehicles, Department of Taxation and the State Board of Elections. Other agencies continue to experience minor issues.

The DMV was heavily impacted by this hardware failure and has been unable to process in-person driver's licenses or ID cards at its 74 customer service centers. Please keep checking the DMV website for updates concerning this situation. We understand that this is a great inconvenience for our citizens and we are doing everything in our power to restore service as quickly as possible.

Teams and staff from the affected state agencies, the Virginia Information Technologies Agency, Northrop Grumman and EMC continue to work around the clock to correct the situation. I have confidence in the teams that are working on this problem. They are aggressively executing our recovery plan and are working tirelessly to restore all the affected agencies to a fully operational status. They have made significant progress and continue to do so. I ask for the continued understanding and patience of state employees and citizens as this work continues.



Update on storage outage
10 p.m., Sunday, August 29, 2010

Throughout the weekend, teams have been working steadily and deliberately to ensure the restoration process is complete and that all data is verified following the networked storage system failure experienced last week. We appreciate the cooperation and patience of every agency and citizen affected by this issue. Below is an update regarding the substantial progress that has been made over the weekend and that we expect will continue to be made through the night:

* Repair of the storage system hardware is complete, and all but three or possibly four of the 26 affected agency systems have been restored. Agencies continue to perform verification testing.
* Progress continues, but work is not yet complete for the three or four agencies that have some of the largest and most complex databases. These databases make the restoration process extremely time consuming. The unfortunate result is the agencies will not be able to process some customer transactions until additional testing and validation are complete.
* According to the manufacturer of the storage system, the events that led to the outage appear to be unprecedented. The manufacturer reports that the system and its underlying technology have an exemplary history of reliability, industry-leading data availability of more than 99.999% and no similar failure in one billion hours of run time.
* While most issues have been resolved over the weekend, some issues may continue as the impacted systems are tested and validated. State agencies should report any issues to the VITA Customer Care Center (VCCC) at (866) 637-8482 or vccc@vita.virginia.gov. Additional staff will be available to handle any increase in call volume. Please note: E-mail should not be used to report critical issues or outages impacting an agency. To report a critical issue, please call the VCCC directly.
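
For readers unfamiliar with availability figures, the short sketch below converts the manufacturer's reported 99.999% into the downtime it would allow in a year. It is illustrative arithmetic only: the function name and the 365.25-day year are assumptions made for this example, not part of the manufacturer's report.

```python
# Illustrative arithmetic only. The 99.999 percent figure is the manufacturer's
# claim quoted in the update; the year length is an assumption for this sketch.
HOURS_PER_YEAR = 365.25 * 24


def allowed_downtime_minutes(availability_percent, hours=HOURS_PER_YEAR):
    """Return the downtime, in minutes, permitted over `hours` at the given availability."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * hours * 60


five_nines = allowed_downtime_minutes(99.999)
print(f"99.999% availability allows about {five_nines:.2f} minutes of downtime per year")
```

Under these assumptions the result is roughly five minutes per year, which gives a sense of how unusual a multi-day outage is for equipment in this class.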



Update on storage outage
11 a.m., Friday, Aug. 27, 2010

* Maintenance and repair work proceeded as expected overnight. Applications, servers and the damaged storage system were taken down as planned.
* The storage system has been repaired.
* More than 60 percent of the servers attached to that storage system have been brought back up and are operational. Agencies are testing those servers and related applications.
* Work continues to restore the remaining servers.
* The system is not yet 100 percent operational. We will be bringing services up throughout the day. Some data must be restored.
* Unfortunately, DMV still cannot process driver's licenses at its customer centers. Please check the DMV website for details on services available at the customer centers and online. Some other agencies continue to be impacted.



Update on storage outage
3 p.m., Thursday, Aug. 26, 2010

* While work continued today (Thursday, August 26, 2010) on repairing the faulty networked storage system, information technology operations continued in a degraded mode impacting 24 state agencies.
* The storage provider, EMC, determined that the best course of action is to perform an extensive maintenance and repair process. VITA and Northrop Grumman, in consultation, have determined this is the best way to proceed.
* That process of maintenance and repair will begin at 5:30 p.m. today.
* That process will impact the original 24 agencies. Impact to additional agencies still is being determined. We are aggressively notifying agencies of these plans so that they can plan accordingly.
* VITA, Northrop Grumman and EMC will coordinate efforts to execute a plan to replace the failed components and any others that appear suspect. This process involves shutting down applications, bringing down servers and bringing down the storage system, and restoring all three in reverse order. Applications will be brought back online during the early morning hours tomorrow.
* Afterwards, corruption of some data is expected. VITA and Northrop Grumman will work directly with agencies tomorrow for remediation and to restore data from backups.
* This is the best course of action to remedy the current situation and bring all applications back online as expeditiously as possible.
* The SAN is a high-performance, highly reliable unit. EMC representatives report it is highly unusual to encounter a dual failure in a networked storage system. This is a unique issue that requires this extensive maintenance and repair process.
* A special thank you to the citizens and state employees impacted for their patience and understanding during this challenge.
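
The maintenance sequence described above (shut down applications, then servers, then the storage system, and restore all three in reverse order) can be sketched as a simple ordered plan. This is a minimal illustration, not the actual recovery procedure: the layer names are taken from the update, and the function name is hypothetical.

```python
# Layer names come from the update above; nothing here touches real systems.
SHUTDOWN_ORDER = ["applications", "servers", "storage system"]


def plan_maintenance(shutdown_order):
    """Return (action, layer) steps: shut down in order, then restore in reverse."""
    steps = [("shut down", layer) for layer in shutdown_order]
    steps += [("restore", layer) for layer in reversed(shutdown_order)]
    return steps


plan = plan_maintenance(SHUTDOWN_ORDER)
for number, (action, layer) in enumerate(plan, start=1):
    print(f"{number}. {action} {layer}")
```

Restoring in reverse order means each layer comes back only after the layer it depends on is available: storage first, then the servers attached to it, then the applications running on those servers.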



Update on storage outage
10 a.m., Thursday, Aug. 26, 2010

* A storage area network (SAN) failed. Redundancy also failed. This hardware failure occurred shortly before 3 p.m. Wednesday.
* Servers attached to this storage network could not access data.
* VITA and Northrop Grumman activated the rapid response team and began work with the appropriate vendors to restore service. Teams have worked through the night.
* Not all systems at a given agency are down. Possible impacts include databases, applications and websites.
* Approximately two dozen agencies are impacted. For security reasons and because the degree of severity varies at the impacted agencies, we are not providing the names of those agencies.
* Such failures occur with information technology systems, the same as they do with power grids. We work to avoid such failures, minimize the impact and restore service as quickly as possible. And we are doing just that now.
* The work we have done in recent years to improve the information technology for the state has made our systems more secure and reliable. When something like this does happen, we have processes in place and dedicated, knowledgeable staff at the agencies, VITA and Northrop Grumman to respond appropriately.