Research on copy detection techniques has attracted considerable
attention from researchers in Vietnam and abroad, and this thesis
therefore pursues this class of problems. In the course of the research,
we found that existing proposals for copy detection still have
limitations: methods for handling copied text that has been modified are
not yet truly effective, and the application of copy detection techniques
to Vietnamese text remains limited. The research direction of the thesis
is therefore necessary. The thesis has met its goals: proposing techniques
for the global copy detection problem, building Vietnamese corpora, and
improving the proposed techniques and testing them on these corpora,
thereby helping to overcome the limitations above.
The results of the thesis are:
- A study of the global copy detection problem, with an analysis and
assessment of the strengths and weaknesses of existing research on its
two component problems: keyword extraction for retrieving the set of
candidate documents, and detection of copied passages.
- A proposed keyword extraction method for retrieving candidate documents
and two proposed methods for detecting copied passages in English text,
with experiments comparing and evaluating the proposed methods against
existing international approaches to each problem.
- A proposed keyword extraction method for long Vietnamese texts, and
improvements to the techniques proposed for English text so that they can
be applied to Vietnamese text.
- A proposed solution and procedure for building a Vietnamese corpus for
copied-passage detection, to support testing and evaluation of copy
detection algorithms for Vietnamese text.
- Two collected and constructed Vietnamese corpora, a news article corpus
and a graduation project (ĐATN) corpus, used for the Vietnamese keyword
extraction problem
                
              
                                            
                                
            
 
            
Preview excerpt from the thesis "Nghiên cứu phát triển một số kỹ thuật hỗ trợ phát hiện đạo văn và ứng dụng cho văn bản Tiếng Việt" (Research and development of techniques supporting plagiarism detection, with application to Vietnamese text).
e time. 
Our approach to addressing the networking requirements for live 
WAN migration builds on the observations that not all networking 
changes in this approach are time critical and further that 
instantaneous changes are best achieved in a localized manner. 
Specifically, in our solution, described in detail in Section 3, we allow the 
migration software to initiate the necessary networking changes as 
soon as the need for migration has been identified. We make use 
of tunneling technologies during this initial phase to preemptively 
establish connectivity between the data centers involved. Once 
server migration is complete, the migration software initiates a 
local change to direct traffic towards the new data center via the 
tunnel. Slower time scale network changes then phase out this local 
network connectivity change for a more optimal network wide path 
to the new data center. 
2.3 Storage Replication Requirements 
Data availability is typically addressed by replicating business 
data on a local/primary storage system, to some remote location 
from where it can be accessed. From a business/usability point of 
view, such remote replication is driven by two metrics [9]. First
is the recovery-point-objective, which is the consistent data point to
which data can be restored after a disaster. Second is the
recovery-time-objective, which is the time it takes to recover to that
consistent data point after a disaster [13].
Remote replication can be broadly classified into the following 
two categories: 
- Synchronous replication: every data block written to a local
storage system is replicated to the remote location before the
local write operation returns.
- Asynchronous replication: in this case the local and remote
storage systems are allowed to diverge. The amount of
divergence between the local and remote copies is typically
bounded by either a certain amount of data, or by a certain
amount of time.
Synchronous replication is normally recommended for 
applications, such as financial databases, where consistency between local 
and remote storage systems is a high priority. However, these 
desirable properties come at a price. First, because every data block 
needs to be replicated remotely, synchronous replication systems 
cannot benefit from any local write coalescing of data if the same
data blocks are written repeatedly [16]. Second, because data have 
to be copied to the remote location before the write operation 
returns, synchronous replication has a direct performance impact on 
the application, since both lower throughput and increased latency 
of the path between the primary and the remote systems are 
reflected in the time it takes for the local disk write to complete. 
An alternative is to use asynchronous replication. However, 
because the local and remote systems are allowed to diverge, 
asynchronous replication always involves some data loss in the event 
of a failure of the primary system. But, because write operations 
can be batched and pipelined, asynchronous replication systems 
can move data across the network in a much more efficient 
manner than synchronous replication systems. 
For WAN live server migration we seek a more flexible 
replication system where the mode can be dictated by the migration 
semantics. Specifically, to support live server migration we propose a 
remote replication system where the initial transfer of data between 
the data centers is performed via asynchronous replication to 
benefit from the efficiency of that mode of operation. When the bulk of 
the data have been transferred in this manner, replication switches
to synchronous replication in anticipation of the completion of the 
server migration step. The final server migration step triggers a 
simultaneous switch-over to the storage system at the new data 
center. In this manner, when the virtual server starts executing in the 
new data center, storage requirements can be locally met. 
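The mode switch described above can be illustrated with a minimal sketch. It is not the paper's implementation: the `Replicator` class, its in-memory dictionaries standing in for the two storage systems, and all method names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum


class Mode(Enum):
    ASYNC = "asynchronous"
    SYNC = "synchronous"


@dataclass
class Replicator:
    """Toy replicator (hypothetical): dicts stand in for storage systems."""
    mode: Mode = Mode.ASYNC
    local: dict = field(default_factory=dict)
    remote: dict = field(default_factory=dict)
    pending: list = field(default_factory=list)

    def write(self, block: int, data: bytes) -> None:
        self.local[block] = data
        if self.mode is Mode.SYNC:
            # Synchronous: the remote copy is updated before write() returns.
            self.remote[block] = data
        else:
            # Asynchronous: queue the write; local and remote may diverge.
            self.pending.append((block, data))

    def flush(self) -> int:
        """Ship queued writes, coalescing repeated writes to one block."""
        coalesced = dict(self.pending)  # the latest write to a block wins
        self.remote.update(coalesced)
        self.pending.clear()
        return len(coalesced)

    def switch_to_sync(self) -> None:
        """Drain pending asynchronous writes, then replicate synchronously."""
        self.flush()
        self.mode = Mode.SYNC

    def in_sync(self) -> bool:
        return not self.pending and self.local == self.remote


# Bulk phase runs asynchronously; the switch precedes the final migration.
r = Replicator()
r.write(1, b"a")
r.write(1, b"b")
r.write(2, b"c")
r.switch_to_sync()
r.write(3, b"d")
```

In this sketch the two writes to block 1 are coalesced into a single transfer during the asynchronous phase, while the write to block 3 reaches the remote copy before `write` returns.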
3. WAN MIGRATION SCENARIOS 
In this section we illustrate how our cooperative, context aware 
approach can combine the technical building blocks described in 
the previous section to realize live server migration across a wide 
area network. We demonstrate how the coordination of server 
virtualization and migration technologies, the storage replication 
subsystem and the network can achieve live migration of the entire data 
center across the WAN. We utilize different scenarios to 
demonstrate our approach. In Section 3.1 we outline how our approach 
can be used to achieve the safe live migration of a data center when 
planned maintenance events are handled. In Section 3.2 we show 
the use of live server migration to mitigate the effects of unplanned 
outages or failures. 
3.1 Maintenance Outages 
We deal with maintenance outages in two parts. First, we 
consider the case where the service has no (or very limited) storage 
requirements. This might for example be the case with a network 
element such as a voice-over-IP (VoIP) gateway. Second, we deal 
with the more general case where the service also requires the 
migration of data storage to the new data center. 
Without Requiring Storage to be Migrated: Without storage to 
be replicated, the primary components that we need to coordinate 
are the server migration and network mobility. Figure 1 shows 
the environment where the application running in a virtual server 
VS has to be moved from a physical server in data center A to a 
physical server in data center B. 
Prior to the maintenance event, the coordinating migration 
management system (MMS) would signal to both the server 
management system as well as the network that a migration is imminent. 
The server management system would initiate the migration of the 
virtual server from physical server a (PS_a) to physical server b
(PS_b). After an initial bulk state transfer as preparation for
migration, the server management system will mirror any state changes 
between the two virtual servers. 
Similarly, for the network part, based on the signal received 
from the MMS, the service provider edge (PE) router will
initiate a number of steps to prepare for the migration. Specifically,
as shown in Figure 1(b), the migration system will cause the
network to create a tunnel between PE_a and PE_b which will be used
subsequently to transfer data destined to VS to data center B.
When the MMS determines a convenient point to quiesce the VS, 
another signal is sent to both the server management system and 
the network. For the server management system, this signal will 
indicate the final migration of the VS from data center A to data 
center B, i.e., after this the VS will become active in data center B. 
For the network, this second signal enables the network data path to 
switchover locally at PE_a to the remote data center. Specifically,
from this point in time, any traffic destined for the virtual server
address that arrives at PE_a will be switched onto the tunnel to
PE_b for delivery to data center B.
Note that at this point, from a server perspective the migration is 
complete as the VS is now active in data center B. However, traffic 
is sub-optimally flowing first to PE_a and then across the tunnel
to PE_b. To rectify this situation another networking step is
involved. Specifically, PE_b starts to advertise a more preferred route
to reach VS, than the route currently being advertised by PE_a. In
this manner, as the ingress PEs to the network (shown in
Figure 1) receive the more preferred route, traffic will start to flow to
PE_b directly and the tunnel between PE_a and PE_b can be torn
down leading to the final state shown in Figure 1(c).
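The three network steps in this scenario (tunnel setup, local switchover, and route re-advertisement) can be traced with a small sketch. The `Network` class, its attributes, and the labels `PE_a`, `PE_b`, `DC_A` and `DC_B` (the provider edge routers and data centers on sides A and B) are hypothetical names chosen for illustration; real routers expose no such interface.

```python
from typing import List


class Network:
    """Toy model (hypothetical) of the path taken by traffic for one VS."""

    def __init__(self) -> None:
        self.tunnel_up = False      # tunnel between PE_a and PE_b
        self.local_switch = False   # PE_a redirects VS traffic into the tunnel
        self.preferred_pe = "PE_a"  # PE whose route the ingress PEs prefer

    def path_to_vs(self) -> List[str]:
        """Hops that traffic for VS takes after entering the network."""
        if self.preferred_pe == "PE_b":
            return ["PE_b", "DC_B"]
        if self.local_switch:
            if not self.tunnel_up:
                raise RuntimeError("local switchover requires the tunnel")
            return ["PE_a", "tunnel", "PE_b", "DC_B"]
        return ["PE_a", "DC_A"]


def migrate(net: Network) -> List[List[str]]:
    """Apply the network steps in order, recording the path after each."""
    paths = [net.path_to_vs()]
    # First MMS signal: migration is imminent, so pre-establish the tunnel.
    net.tunnel_up = True
    paths.append(net.path_to_vs())
    # Second MMS signal: VS is active in data center B, switch locally.
    net.local_switch = True
    paths.append(net.path_to_vs())
    # Slower step: PE_b advertises a more preferred route, tunnel torn down.
    net.preferred_pe = "PE_b"
    net.local_switch = False
    net.tunnel_up = False
    paths.append(net.path_to_vs())
    return paths


paths = migrate(Network())
```

Only the second step is time critical in this model; the tunnel is created ahead of time and the route change happens afterwards at a slower time scale.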
Requiring Storage Migration: When storage has to also be 
replicated, it is critical that we achieve the right balance between 
performance (impact on the application) and the recovery point or 
data loss when the switchover occurs to the remote data center. To 
achieve this, we allow the storage to be replicated asynchronously, 
prior to any initiation of the maintenance event, or, assuming the 
amount of data to be transferred is relatively small, asynchronous
replication can be started in anticipation of a migration that is 
expected to happen shortly. Asynchronous replication during this 
initial phase allows for the application to see no performance 
impact. However, when the maintenance event is imminent, the MMS 
would signal to the replication system to switch from asynchronous 
replication to synchronous replication to ensure that there is no 
loss of data during migration. When data is being replicated 
synchronously, there will be a performance impact on the application. 
Figure 1: Live server migration across a WAN 
This requires us to keep the exposure to the amount of time we 
replicate on a synchronous basis to a minimum. 
When the MMS signals to the storage system the requirement 
to switch to synchronous replication, the storage system completes 
all the pending asynchronous operations and then proceeds to 
perform all the subsequent writes by synchronously replicating them to
the remote data center. Thus, between the server migration and 
synchronous replication, both the application state and all the 
storage operations are mirrored at the two environments in the two data 
centers. When all the pending write operations are copied over, 
then as in the previous case, we quiesce the application and the 
network is signaled to switch traffic over to the remote data center. 
From this point on, both storage and server migration operations 
are complete and activated in data center B. As above, the network 
state still needs to be updated to ensure optimal data flow directly 
to data center B. 
Note that while we have described the live server migration 
process as involving the service provider for the networking part, it 
is possible for a data center provider to perform a similar set of 
functions without involving the service provider. Specifically, by 
creating a tunnel between the customer edge (CE) routers in the 
data center, and performing local switching on the appropriate CE, 
rather than on the PE, the data center provider can realize the same 
functionality. 
3.2 Unplanned Outages 
We propose to also use cooperative, context aware migration to 
deal with unplanned data center outages. There are multiple 
considerations that go into managing data center operations to plan 
and overcome failures through migration. Some of these are: (1) 
amount of overhead under normal operation to overcome 
anticipated failures; (2) amount of data loss affordable (recovery point 
objective - RPO); (3) amount of state that has to be migrated; and 
(4) time available from anticipated failure to occurrence of event. 
At the one extreme, one might incur the overhead of completely 
mirroring the application at the remote site. This has the 
consequence of both incurring processing and network overhead under 
normal operation as well as impacting application performance 
(latency and throughput) throughout. The other extreme is to only 
ensure data recovery and to start a new copy of the application at the 
remote site after an outage. In this case, application memory state 
such as ongoing sessions are lost, but data stored on disk is 
replicated and available in a consistent state. Neither the hot standby
nor the cold standby approach described above is desirable, due to the
overhead of the former and the loss of application memory state in the latter.
An intermediate approach is to recover control and essential state 
of the application, in addition to data stored on disk, to further 
minimize disruptions to users. A spectrum of approaches are possible. 
In a VoIP server, for instance, session-based information can be 
mirrored without mirroring the data flowing through each session. 
More generally, this points to the need to checkpoint some 
application state in addition to mirroring data on disk. Checkpointing 
application state involves storing application state either periodically 
or in an application-aware manner like databases do and then 
copying it to the remote site. Of course, this has the consequence that 
the application can be restarted remotely at the checkpoint 
boundary only. Similarly, for storage one may use asynchronous 
replication with a periodic snapshot ensuring all writes are up-to-date 
at the remote site at the time of checkpointing. Some data loss 
may occur upon an unanticipated, catastrophic failure, but the 
recovery point may be fairly small, depending on the frequency of 
checkpointing application and storage state. Coordination between 
the checkpointing of the application state and the snapshot of 
storage is key to successful migration while meeting the desired RPOs. 
Incremental checkpointing of application and storage is key to 
efficiency, and we see existing techniques to achieve this [4, 3, 11]. 
For instance, rather than full application mirroring, a virtualized
replica can be maintained as a warm standby, in a dormant or
hibernating state, enabling a quick switch-over to the previously
checkpointed state. To make the switch-over seamless, in addition
to replicating data and recovering state, network support is needed. 
Specifically, on detecting the unavailability of the primary site, the 
secondary site is made active, and the same mechanism described 
in Section 3.1 is used to switch traffic over to reach the secondary 
site via the pre-established tunnel. Note that for simplicity of 
exposition we assume here that the PE that performs the local switch 
over is not affected by the failure. The approach can, however,
easily be extended to make use of a switchover at a router deeper in 
the network. 
The amount of state and storage that has to be migrated may vary 
widely from application to application. There may be many 
situations where, in principle, the server can be stateless. For example, 
a SIP proxy server may not have any persistent state and the 
communication between the clients and the proxy server may be using 
UDP. In such a case, the primary activity to be performed is in 
the network to move the communication over to the new data 
center site. Little or no overhead is incurred under normal operation to 
enable the migration to a new data center. Failure recovery involves 
no data loss and we can deal with near instantaneous, catastrophic 
failures. 
As more and more state is involved with the server, more 
overhead is incurred to checkpoint application state and potentially 
to take storage snapshots, either periodically or upon application 
prompting. It also means that the RPO is a function of the 
interval between checkpoints, when we have to deal with instantaneous 
failures. The more advance information we have of an impending
failure, the more effective we can be in having the state migrated 
over to the new data center, so that we can still have a tighter RPO 
when operations are resumed at the new site. 
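The relationship between checkpoint interval, advance warning, and RPO can be made concrete with a deliberately simplified model. The function below is an illustrative assumption, not a formula from the paper: it supposes that a warning at least as long as the time needed for a final state transfer removes all loss, and that otherwise the worst case is a failure just before the next checkpoint.

```python
def worst_case_loss_window(checkpoint_interval_s: float,
                           warning_s: float = 0.0,
                           final_sync_s: float = 0.0) -> float:
    """Worst-case window of lost updates, in seconds (simplified model).

    checkpoint_interval_s: time between coordinated checkpoints/snapshots.
    warning_s: advance notice of the failure (0 for instantaneous failures).
    final_sync_s: time needed to migrate the remaining state (0 if the
        server is stateless). All three parameters are hypothetical inputs.
    """
    if min(checkpoint_interval_s, warning_s, final_sync_s) < 0:
        raise ValueError("durations must be non-negative")
    if warning_s >= final_sync_s:
        # Enough notice to move all remaining state: nothing is lost.
        return 0.0
    # Otherwise recovery restarts from the last checkpoint boundary.
    return checkpoint_interval_s


# Instantaneous failure of a stateful server with 60 s checkpoints.
loss = worst_case_loss_window(60.0, warning_s=0.0, final_sync_s=5.0)
```

Under these assumptions a stateless server (`final_sync_s` of zero) loses nothing even on an instantaneous failure, while a stateful server without warning is exposed to a full checkpoint interval.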
4. RELATED WORK 
Prior work on this topic falls into several categories: virtual 
machine migration, storage replication and network support. 
At the core of our technique is the ability to encapsulate
applications within virtual machines that can be migrated without
application downtimes [15]. Most virtual machine software, such as Xen
[8] and VMWare [14], supports live migration of VMs that involves
extremely short downtimes ranging from tens of milliseconds to a
second; details of Xen's live migration techniques are discussed in
[8]. As indicated earlier, these techniques assume that migration is
being done on a LAN. VM migration has also been studied in the 
Shirako system [10] and for grid environments [17, 19]. 
Current virtual machine software supports a suspend and resume
feature that can be used to support WAN migration, but with 
downtimes [18, 12]. Recently live WAN migration using IP tunnels was 
demonstrated in [21], where an IP tunnel is set up from the source 
to destination server to transparently forward packets to and from 
the application; we advocate an alternate approach that assumes 
edge router support. 
In the context of storage, there exist numerous commercial 
products that perform replication, such as IBM Extended Remote Copy, 
HP Continuous Access XP, and EMC RepliStor. An excellent 
description of these and others, as well as a detailed taxonomy of the 
different approaches for replication can be found in [11]. The Ursa 
Minor system argues that no single fault model is optimal for all 
applications and proposed supporting data-type specific selections 
of fault models and encoding schemes for replication [1]. Recently, 
we proposed the notion of semantic-aware replication [13] where
the system supports both synchronous and asynchronous
replication concurrently and uses signals from the file system to
determine whether to replicate a particular write synchronously or
asynchronously.
In the context of network support, our work is related to the 
RouterFarm approach [2], which makes use of orchestrated 
network changes to realize near hitless maintenance on provider edge 
routers. In addition to being in a different application area, our 
approach differs from the RouterFarm work in two regards. First, 
we propose to have the required network changes be triggered by 
functionality outside of the network (as opposed to network 
management functions inside the network). Second, due to the stringent 
timing requirements of live migration, we expect that our approach 
would require new router functionality (as opposed to being 
realizable via the existing configuration interfaces). 
Finally, the recovery oriented computing (ROC) work 
emphasizes recovery from failures rather than failure avoidance [6]. In a 
similar spirit to ROC, we advocate using mechanisms from live VM 
migration to storage replication to support planned and unplanned 
outages in data centers (rather than full replication to mask such 
failures). 
5. CONCLUSION 
A significant concern for Internet-based service providers is the 
continued operation and availability of services in the face of 
outages, whether planned or unplanned. In this paper we advocated 
a cooperative, context-aware approach to data center migration 
across WANs to deal with outages in a non-disruptive manner. We 
sought to achieve high availability of data center services in the 
face of both planned and incidental outages of data center 
facilities. We advocated using server virtualization technologies to 
enable the replication and migration of server functions. We proposed 
new network functions to enable server migration and replication 
across wide area networks (such as the Internet or a geographically 
distributed virtual private network), and finally showed the utility 
of intelligent and dynamic storage replication technology to ensure 
applications have access to data in the face of outages with very 
tight recovery point objectives. 
6. REFERENCES 
[1] M. Abd-El-Malek, W. V. Courtright II, C. Cranor, G. R. 
Ganger, J. Hendricks, A. J. Klosterman, M. Mesnier, 
M. Prasad, B. Salmon, R. R. Sambasivan, S. Sinnamohideen, 
J. D. Strunk, E. Thereska, M. Wachs, and J. J. Wylie. Ursa 
minor: versatile cluster-based storage. USENIX Conference 
on File and Storage Technologies, December 2005. 
[2] Mukesh Agrawal, Susan Bailey, Albert Greenberg, Jorge 
Pastor, Panagiotis Sebos, Srinivasan Seshan, Kobus van der 
Merwe, and Jennifer Yates. Routerfarm: Towards a dynamic, 
manageable network edge. SIGCOMM Workshop on 
Internet Network Management (INM), September 2006. 
[3] L. Alvisi. Understanding the Message Logging Paradigm for 
Masking Process Crashes. PhD thesis, Cornell, January 
1996. 
[4] L. Alvisi and K. Marzullo. Message logging: Pessimistic, 
optimistic, and causal. In Proceedings of the 15th 
International Conference on Distributed Computing Systems, 
pages 229-236. IEEE Computer Society, June 1995. 
[5] Paul Barham, Boris Dragovic, Keir Fraser, Steven Hand, Tim 
Harris, Alex Ho, Rolf Neugebar, Ian Pratt, and Andrew 
Warfield. Xen and the art of virtualization. In the 
Proceedings of the ACM Symposium on Operating Systems 
Principles (SOSP), October 2003. 
[6] A. Brown and D. A. Patterson. Embracing failure: A case for 
recovery-oriented computing (roc). 2001 High Performance 
Transaction Processing Symposium, October 2001. 
[7] K. Brown, J. Katcher, R. Walters, and A. Watson. Snapmirror
and snaprestore: Advances in snapshot technology. Network
Appliance Technical Report TR3043.
www.netapp.com/tech_library/3043.html.
[8] C. Clark, K. Fraser, S. Hand, J. Hanse, E. Jul, C. Limpach, 
I. Pratt, and A. Warfiel. Live migration of virtual machines. 
In Proceedings of NSDI, May 2005. 
[9] Disaster Recovery Journal. Business continuity glossary.
http://www.drj.com/glossary/drjglossary.html.
[10] Laura Grit, David Irwin, Aydan Yumerefendi, and Jeff
Chase. Virtual machine hosting for networked clusters:
Building the foundations for autonomic orchestration. In
the First International Workshop on Virtualization
Technology in Distributed Computing (VTDC), November
2006.
[11] M. Ji, A. Veitch, and J. Wilkes. Seneca: Remote mirroring 
done write. USENIX 2003 Annual Technical Conference, 
June 2003. 
[12] M. Kozuch and M. Satyanarayanan. Internet suspend and 
resume. In Proceedings of the Fourth IEEE Workshop on 
Mobile Computing Systems and Applications, Calicoon, NY, 
June 2002. 
[13] Xiaotao Liu, Gal Niv, K. K. Ramakrishnan, Prashant Shenoy, 
and Jacobus Van der Merwe. The case for semantic aware 
remote replication. In Proc. 2nd International Workshop on 
Storage Security and Survivability (StorageSS 2006), 
Alexandria, VA, October 2006. 
[14] Michael Nelson, Beng-Hong Lim, and Greg Hutchins. Fast 
Transparent Migration for Virtual Machines. In USENIX 
Annual Technical Conference, 2005. 
[15] Mendel Rosenblum and Tal Garfinkel. Virtual machine 
monitors: Current technology and future trends. Computer, 
38(5):39-47, 2005. 
[16] C. Ruemmler and J. Wilkes. Unix disk access patterns. 
Proceedings of Winter 1993 USENIX, Jan 1993. 
[17] Paul Ruth, Junghwan Rhee, Dongyan Xu, Rick Kennell, and 
Sebastien Goasguen. Autonomic Live Adaptation of Virtual 
Computational Environments in a Multi-Domain 
Infrastructure. In IEEE International Conference on 
Autonomic Computing (ICAC), June 2006. 
[18] Constantine P. Sapuntzakis, Ramesh Chandra, Ben Pfaff, Jim 
Chow, Monica S. Lam, and Mendel Rosenblum. Optimizing 
the migration of virtual computers. In Proceedings of the 5th
Symposium on Operating Systems Design and
Implementation, December 2002.
[19] A. Sundararaj, A. Gupta, and P. Dinda. Increasing 
Application Performance in Virtual Environments through 
Run-time Inference and Adaptation. In Fourteenth 
International Symposium on High Performance Distributed 
Computing (HPDC), July 2005. 
[20] Symantec Corporation. Veritas Volume Replicator
Administrator's Guide.
http://ftp.support.veritas.com/pub/support/products/Volume_Replicator/2%83842.pdf,
5.0 edition, 2006.
[21] F. Travostino, P. Daspit, L. Gommans, C. Jog, C. de Laat, 
J. Mambretti, I. Monga, B. van Oudenaarde, S. Raghunath, 
and P. Wang. Seamless live migration of virtual machines 
over the man/wan. Elsevier Future Generations Computer 
Systems, 2006. 
[22] T. Wood, P. Shenoy, A. Venkataramani, and M. Yousif. 
Black-box and gray-box strategies for virtual machine 
migration. In Proceedings of the Usenix Symposium on 
Networked System Design and Implementation (NSDI), 
Cambridge, MA, April 2007. 
[23] A xen way to iscsi virtualization? 
2. Predefined keyword set
The file C-20.key in the SemEval2010 corpus contains the
author-assigned keywords:
internetbased, service, data, center, migration, wan, lan, virtual, server, 
storage, replication, synchronous, replication, asynchronous, replication, network, 
support, storage, voiceoverip, voip, database 
3. Candidate keyword set
Candidate keywords extracted from noun phrases, named entities, and
three-word phrases that occur repeatedly:
migration, replication, virtualization, server, live, outages, center, storage, 
network, virtual, recovery, technologies, data, ramakrishnan, internetbased, 
application, remote, cooperative, wan, availability, maintenance, services, 
applications, service, technology, aware, distributed, operation, contextaware, 
management, approach, continued, centers, context, unplanned, access, providers, 
requirements, networks, significant, nondisruptive, business, systems, facilities, 
shenoy, prashant, intelligent, face, design, advances, objectives, disaster, manner, 
concern, categories, unanticipated, functions, wide, lan, high, minor, wans, 
propose, area, computercommunication, dynamic, internet, environment, 
mechanisms, physical, der, utility, disk, local, networking, functionality, failures, 
van, asynchronous, tight, ongoing, point, synchronous, jacobus, support, paper, 
disruptions, catastrophic, effective, current, servers, particular, critical, seamless, 
connectivity, administrator, tunnel, merwe, subsystems, environments, use, 
entertainment, operations, general, new, balance, labsresearch, outage, 
applicationservice, first, operating, second, available, address, users, essential, 
technical, location, allow, servicesapplications, realtime, building, events, 
continuous, instantaneous, introduction, milliseconds, sophisticated, reliability, 
semantics, descriptors, different, number, university, mirroring, machine, 
components, interactions, frequency, configuration, state, similar, private, system, 
techniques, abstract, downtime, amount, prior, concerns, underlying, connections, 
framework, work, changes, same, mission, ip, recent, robust, massachusetts, 
reasons, tool, contribution, coordinate, fashion, viewpoint, reachability, multiple, 
level, terms, redundancy, subject, example, failure, presents, challenges, 
performance, blocks, requirement, disruption, provider, section, http, software, 
platforms, extensions, write, networkbased, hotswappable, active, traffic, case, 
redundant, kk, such, today, throughput, robustness, running, shared, practices, 
knowledge, little, essence, appropriate, capabilities, businessusability, large, 
individual, needs, tens, consistent, es, addresses, unsolicited, cognizant, 
implication, necessity, complete, supplies, ability, load, router, nature, process, 
towards, databases, eg, primary, coordination, becomes, mobility, replicas, unique, 
experience, checkpoint, optimal, devices, efficiency, desirable, entirety, varies, 
attractive, difficulty, unresolved, anticipation, entire, san, switchover, 
observations, convergence, feat, hundreds, logic, constraints, applicationaware, 
way, main, simultaneous, means, unavailability, localprimary, reason, coalescing, 
challenge, other, considerations, actions, part, driver, initial, consistency, binding, 
latency, overhead, extension, whole, desire, weight, switch, latter, preparation, 
signal, snapshot, anticipated, actual, subset, divergence, space, operators, transfer, 
localized, enabler, alternative, processor, sessionbased, heavy, protocols, 
layertwo, discussion, power, phase, vpns, decades, focus, purposes, kind, routers, 
completion, loss, basic, preestablished, routed, event, site, signals, properties, 
consequence, key, recoveryoriented, financial, edge, necessary, communication, 
impact, perspective, intermediate, virtualized, subsystem, order, convenient, time, 
previous, switches, routing, orchestrated, reply, approaches, checkpoints, cases, 
suspend, bulk, information, encapsulate, preferred, modern, efficient, solution, 
mac, priority, concert, nonlan, manual, years, clusterbased, sans, returns, flexible, 
need, scales, semanticaware, problem, secondary, occurrence, path, initiation, 
subsequent, builds, writes, direct, status, metrics, starts, forms, change, situation, 
exposition, addition, selections, scenarios, resume, gateway, issues, disks, 
voiceoverip, persistent, affordable, certain, delivery, exposure, copies, successful, 
block, detail, simplicity, ie, orchestration, snapshots, situations, description, 
incidental, final, emphasizes, sessions, normal, element, proceeds, checkpointing, 
effects, customer, periodic, route, commercial, stringent, interfaces, extreme, 
spectrum, instance, over, blades, principle, processing, foundations, avoidance, 
arp, scale, manageable, mechanism, datatype, alternate, standbyin, continuity, 
excellent, snaprestore, technique, versatile, mode, step, downtimes, price, ingress, 
minimum, fact, prompting, replica, impending, activity, session, taxonomy, 
advanced, memory, interval, products, numerous, parts, safe, control, view, clients, 
standby, function, detailed, figure, steps, regards, strategies, dormant, specific, 
deeper, computers, vs, proxy, monitors, fault, clusters, basis, right, machines, 
course, details, feature, points, schemes, quick, cold, voip, differs, timing, semantic, 
incremental, several, tunnels, graybox, patterns, full, hitless, ht, glossary, spirit, 
causal, international, notion, source, usenix, set, models, others, edition, thesis, 
topic, ar, single, note, hot, survivability, short, proceedings, trends, autonomic, 
implementation, pe, grid, warm, core, infrastructure, venkataramani, model, copy, 
computing, many, 15th, symposium, ml, ce, pages, manwan, computational, 
routerfarm, conference, y3043, multidomain, transaction, www, os, uppor, 
transparent, generations, networked, conclusion, ne, december, gl, replicator, art, 
pdf, adaptation, principles, logging, roc, pes, replistor, 5th, fourteenth, abdelmalek, 
garfinkel, workshop, dr, related, sebastien, ydr, references, computer, pl, pr, 
paradigm, crashes, sundararaj, inference, agrawal, appliance, travostino, 
oudenaarde, cornell, sigcomm, thereska, message, blackbox, prasad, seshan, rpos, 
ve, security, dragovic, vm, raghunath, society, file, calicoon, mesnier, benghong, 
june, goasguen, nelson, journal, runtime, winter, annual, cranor, sip, pastor, 
dongyan, barham, fourth, ieee, report, wilkes, tr3043, future, de, xp, sosp, seneca, 
volume, unix, wachs, gupta, icac, guide, yousif, wood, gal, mms, phd, dinda, proc, 
lim, vtdc, wang, liu, jan, inm, rhee, andrew, hpdc, ii, va, sinnamohideen, niv, 
monica, xu, tal, ny, satyanarayanan, hp, kobus, re, keir, alexandria, srinivasan, 
rosenblum, hendricks, storagess, laat, marzullo, courtright, shirako, klosterman, 
sambasivan, panagiotis, vmware, yumerefendi, corporation, snapmirror, xen, 
greenberg, bailey, hutchins, constantine, sapuntzakis, patterson, albert, app, 
harris, jennifer, nsdi, steven, neugebar, warfield, ursa, watson, ibm, paul, susan, 
chandra, kozuch, brown, vms, ganger, salmon, strunk, katcher, walters, irwin, 
clark, mukesh, alvisi, limpach, warfiel, emc, fraser, alex, michael, ruemmler, 
mambretti, chase, junghwan, ramesh, wylie, xiaotao, jorge, sebos, yates, symantec, 
pratt, cambridge, boris, jeff, tim, kennell, ian, david, oduc, veitch, hand, ho, rick, 
mendel, hanse, veritas, gommans, laura, aydan, rolf, daspit, grit, acm, lam, ben, 
jim, pfaff, jul, greg, ji, monga, ruth, chow, vol, jog, ume, ma, rpo 
4. Results of determining word importance 
The predicted importance of each word in the candidate keyword set, sorted in 
descending order of importance (words in bold among the first 10 results belong 
to the predefined keyword set). 
Table P1. Predicted word importance 
No. Candidate keyword y value No. Candidate keyword y value 
1 migration 0.840984 382 resume 0.0045808 
2 replication 0.678164 383 gateway 0.0045794 
3 virtualization 0.659803 384 issues 0.004577 
4 server 0.618527 385 disks 0.0045655 
5 live 0.61789 386 voiceoverip 0.0045531 
6 outages 0.610963 387 persistent 0.0045473 
7 center 0.606313 388 affordable 0.004527 
8 storage 0.542996 389 certain 0.0045241 
9 network 0.533428 390 delivery 0.0045182 
10 virtual 0.490329 391 exposure 0.0045027 
11 recovery 0.462664 392 copies 0.0044478 
12 technologies 0.439565 393 successful 0.0044444 
13 data 0.439148 394 block 0.0044061 
14 ramakrishnan 0.371115 395 detail 0.0043981 
15 internetbased 0.363683 396 simplicity 0.0043912 
16 application 0.32836 397 ie 0.0043785 
17 remote 0.311041 398 orchestration 0.0043709 
18 cooperative 0.298606 399 snapshots 0.00437 
19 wan 0.260697 400 situations 0.0043109 
20 availability 0.249949 401 description 0.0042526 
21 maintenance 0.246622 402 incidental 0.0042457 
22 services 0.236617 403 final 0.0042226 
23 applications 0.21758 404 emphasizes 0.0042183 
24 service 0.208958 405 sessions 0.0041841 
25 technology 0.188583 406 normal 0.0041711 
26 aware 0.184637 407 element 0.0041647 
27 distributed 0.184114 408 proceeds 0.0041553 
28 operation 0.178693 409 checkpointing 0.0041511 
29 contextaware 0.174755 410 effects 0.0041382 
30 management 0.16334 411 customer 0.00413 
31 approach 0.160562 412 periodic 0.0040518 
32 continued 0.159632 413 route 0.0040377 
33 centers 0.151135 414 commercial 0.0040179 
34 context 0.149825 415 stringent 0.0039577 
35 unplanned 0.136594 416 interfaces 0.0039557 
36 access 0.135795 417 extreme 0.0039349 
37 providers 0.132818 418 spectrum 0.0039304 
38 requirements 0.132789 419 instance 0.0039263 
39 networks 0.131715 420 over 0.003924 
40 significant 0.129641 421 blades 0.0039219 
41 nondisruptive 0.127865 422 principle 0.0038941 
42 business 0.126391 423 processing 0.0038913 
43 systems 0.123377 424 foundations 0.0038743 
44 facilities 0.122692 425 avoidance 0.0038511 
45 shenoy 0.113144 426 arp 0.0038416 
46 prashant 0.109356 427 scale 0.0038255 
47 intelligent 0.107079 428 manageable 0.0038238 
48 face 0.10703 429 mechanism 0.0038141 
49 design 0.10576 430 datatype 0.0037792 
50 advances 0.105361 431 alternate 0.0037481 
51 objectives 0.103294 432 standbyin 0.0037446 
52 disaster 0.102727 433 continuity 0.0037397 
53 manner 0.102325 434 excellent 0.0037301 
54 concern 0.101271 435 snaprestore 0.0036539 
55 categories 0.098923 436 technique 0.0036314 
56 unanticipated 0.097049 437 versatile 0.0036119 
57 functions 0.097024 438 mode 0.0036081 
58 wide 0.094731 439 step 0.0036024 
59 lan 0.09354 440 downtimes 0.0035835 
60 high 0.091383 441 price 0.0035753 
61 minor 0.087343 442 ingress 0.0035746 
62 wans 0.081532 443 minimum 0.0035724 
63 propose 0.081235 444 fact 0.0035698 
64 area 0.075401 445 prompting 0.0035426 
65 computercommunication 0.071124 446 replica 0.0035229 
66 dynamic 0.070823 447 impending 0.0035184 
67 internet 0.070188 448 activity 0.003507 
68 environment 0.07006 449 session 0.0034768 
69 mechanisms 0.068699 450 taxonomy 0.0034586 
70 physical 0.067211 451 advanced 0.0033992 
71 der 0.065018 452 memory 0.0033911 
72 utility 0.064928 453 interval 0.0033575 
73 disk 0.063486 454 products 0.0033335 
74 local 0.062574 455 numerous 0.0032772 
75 networking 0.062066 456 parts 0.0032592 
76 functionality 0.060316 457 safe 0.0032163 
77 failures 0.058608 458 control 0.0031983 
78 van 0.056861 459 view 0.0031885 
79 asynchronous 0.055627 460 clients 0.0031725 
80 tight 0.055299 461 standby 0.0031548 
81 ongoing 0.055233 462 function 0.0031449 
82 point 0.055094 463 detailed 0.0031171 
83 synchronous 0.055003 464 figure 0.0030952 
84 jacobus 0.054747 465 steps 0.0030943 
85 support 0.05382 466 regards 0.0030621 
86 paper 0.053773 467 strategies 0.0030361 
87 disruptions 0.053526 468 dormant 0.0029639 
88 catastrophic 0.052759 469 specific 0.0029358 
89 effective 0.052542 470 deeper 0.0029295 
90 current 0.051912 471 computers 0.0029269 
91 servers 0.049387 472 vs 0.0029162 
92 particular 0.049132 473 proxy 0.0029011 
93 critical 0.047906 474 monitors 0.0028663 
94 seamless 0.046336 475 fault 0.00285 
95 connectivity 0.044587 476 clusters 0.002828 
96 administrator 0.043383 477 basis 0.002826 
97 tunnel 0.04324 478 right 0.0028248 
98 merwe 0.041538 479 machines 0.0028083 
99 subsystems 0.041409 480 course 0.0028024 
100 environments 0.040257 481 details 0.0027973 
101 use 0.036387 482 feature 0.0027855 
102 entertainment 0.036088 483 points 0.0027779 
103 operations 0.03538 484 schemes 0.0027776 
104 general 0.034748 485 quick 0.0027557 
105 new 0.034575 486 cold 0.0027474 
106 balance 0.034293 487 voip 0.0027359 
107 labsresearch 0.034123 488 differs 0.0027316 
108 outage 0.032523 489 timing 0.0026993 
109 applicationservice 0.0323 490 semantic 0.0026863 
110 first 0.032053 491 incremental 0.0026836 
111 operating 0.031873 492 several 0.002665 
112 second 0.031871 493 tunnels 0.0026577 
113 available 0.031776 494 graybox 0.0026532 
114 address 0.031016 495 patterns 0.0025977 
115 users 0.030828 496 full 0.0025534 
116 essential 0.030411 497 hitless 0.0025488 
117 technical 0.030131 498 ht 0.0025404 
118 location 0.029677 499 glossary 0.0025208 
119 allow 0.029313 500 spirit 0.002509 
120 servicesapplications 0.029243 501 causal 0.0024637 
121 realtime 0.029228 502 international 0.002438 
122 building 0.029168 503 notion 0.0024247 
123 events 0.028764 504 source 0.0024038 
124 continuous 0.028121 505 usenix 0.0023922 
125 instantaneous 0.027937 506 set 0.0023618 
126 introduction 0.027788 507 models 0.0023371 
127 milliseconds 0.027768 508 others 0.0023345 
128 sophisticated 0.027693 509 edition 0.0022988 
129 reliability 0.027484 510 thesis 0.0022935 
130 semantics 0.02676 511 topic 0.0022763 
131 descriptors 0.025592 512 ar 0.0022534 
132 different 0.025459 513 single 0.0022509 
133 number 0.025158 514 note 0.0022465 
134 university 0.024702 515 hot 0.0022409 
135 mirroring 0.024501 516 survivability 0.0022022 
136 machine 0.024371 517 short 0.0021964 
137 components 0.024161 518 proceedings 0.0021695 
138 interactions 0.024085 519 trends 0.0021362 
139 frequency 0.023768 520 autonomic 0.0021327 
140 configuration 0.023489 521 implementation 0.002083 
141 state 0.023362 522 pe 0.0020795 
142 similar 0.023309 523 grid 0.0020748 
143 private 0.023003 524 warm 0.002055 
144 system 0.022504 525 core 0.0020492 
145 techniques 0.021737 526 infrastructure 0.0020421 
146 abstract 0.021286 527 venkataramani 0.0020299 
147 downtime 0.021142 528 model 0.0020187 
148 amount 0.020983 529 copy 0.00194 
149 prior 0.020438 530 computing 0.0018993 
150 concerns 0.020368 531 many 0.0018784 
151 underlying 0.020181 532 15th 0.0018723 
152 connections 0.020169 533 symposium 0.0018621 
153 framework 0.019967 534 ml 0.0018518 
154 work 0.019943 535 ce 0.0018451 
155 changes 0.019746 536 pages 0.0017967 
156 same 0.019699 537 manwan 0.0017894 
157 mission 0.019429 538 computational 0.0017754 
158 ip 0.019258 539 routerfarm 0.0017595 
159 recent 0.018481 540 conference 0.0017545 
160 robust 0.018437 541 y3043 0.00175 
161 massachusetts 0.018398 542 multidomain 0.0017365 
162 reasons 0.018173 543 transaction 0.0017094 
163 tool 0.018069 544 www 0.0016864 
164 contribution 0.017874 545 os 0.0016773 
165 coordinate 0.017508 546 uppor 0.0016162 
166 fashion 0.01731 547 transparent 0.0016075 
167 viewpoint 0.017007 548 generations 0.0015975 
168 reachability 0.016715 549 networked 0.0015931 
169 multiple 0.016661 550 conclusion 0.0015557 
170 level 0.016497 551 ne 0.0015469 
171 terms 0.016478 552 december 0.0015402 
172 redundancy 0.016408 553 gl 0.0015227 
173 subject 0.015692 554 replicator 0.0015208 
174 example 0.015675 555 art 0.0015121 
175 failure 0.015571 556 pdf 0.0015027 
176 presents 0.015501 557 adaptation 0.001501 
177 challenges 0.015161 558 principles 0.0014718 
178 performance 0.014878 559 logging 0.0014657 
179 blocks 0.014839 560 roc 0.0014649 
180 requirement 0.014562 561 pes 0.0014648 
181 disruption 0.014552 562 replistor 0.0014273 
182 provider 0.014527 563 5th 0.0014269 
183 section 0.014516 564 fourteenth 0.0014263 
184 http 0.014488 565 abdelmalek 0.0014201 
185 software 0.01448 566 garfinkel 0.0014167 
186 platforms 0.014328 567 workshop 0.0014111 
187 extensions 0.014232 568 dr 0.0013682 
188 write 0.014197 569 related 0.0013521 
189 networkbased 0.014171 570 sebastien 0.0013487 
190 hotswappable 0.013964 571 ydr 0.0013394 
191 active 0.01382 572 references 0.0013047 
192 traffic 0.013746 573 computer 0.0012926 
193 case 0.013394 574 pl 0.0012628 
194 redundant 0.013225 575 pr 0.0012412 
195 kk 0.012854 576 paradigm 0.0012372 
196 such 0.01283 577 crashes 0.0012056 
197 today 0.01281 578 sundararaj 0.001189 
198 throughput 0.012766 579 inference 0.001188 
199 robustness 0.012753 580 agrawal 0.0011803 
200 running 0.012621 581 appliance 0.001175 
201 shared 0.012589 582 travostino 0.0011587 
202 practices 0.012569 583 oudenaarde 0.0011525 
203 knowledge 0.012438 584 cornell 0.0011472 
204 little 0.012241 585 sigcomm 0.00111 
205 essence 0.012232 586 thereska 0.0011092 
206 appropriate 0.012105 587 message 0.0011037 
207 capabilities 0.012089 588 blackbox 0.0011029 
208 businessusability 0.011849 589 prasad 0.0010992 
209 large 0.011811 590 seshan 0.0010836 
210 individual 0.011753 591 rpos 0.0010718 
211 needs 0.011703 592 ve 0.0010707 
212 tens 0.01148 593 security 0.0010704 
213 consistent 0.011461 594 dragovic 0.0010703 
214 es 0.011417 595 vm 0.001057 
215 addresses 0.01139 596 raghunath 0.0010229 
216 unsolicited 0.011261 597 society 0.001008 
217 cognizant 0.011184 598 file 0.0010009 
218 implication 0.011171 599 calicoon 0.0009904 
219 necessity 0.011126 600 mesnier 0.0009895 
220 complete 0.011042 601 benghong 0.000976 
221 supplies 0.010953 602 june 0.0009648 
222 ability 0.010771 603 goasguen 0.0009563 
223 load 0.010654 604 nelson 0.0009411 
224 router 0.010647 605 journal 0.0009258 
225 nature 0.010615 606 runtime 0.000911 
226 process 0.010454 607 winter 0.0009097 
227 towards 0.009911 608 annual 0.0008994 
228 databases 0.009863 609 cranor 0.0008817 
229 eg 0.009755 610 sip 0.0008663 
230 primary 0.009714 611 pastor 0.0008657 
231 coordination 0.009567 612 dongyan 0.000851 
232 becomes 0.009501 613 barham 0.0008447 
233 mobility 0.009493 614 fourth 0.0008392 
234 replicas 0.009441 615 ieee 0.0008385 
235 unique 0.009179 616 report 0.000827 
236 experience 0.009144 617 wilkes 0.0008237 
237 checkpoint 0.009134 618 tr3043 0.0008223 
238 optimal 0.009079 619 future 0.0008174 
239 devices 0.009068 620 de 0.0008126 
240 efficiency 0.009034 621 xp 0.0007982 
241 desirable 0.009008 622 sosp 0.0007951 
242 entirety 0.008996 623 seneca 0.0007899 
243 varies 0.008913 624 volume 0.0007782 
244 attractive 0.008811 625 unix 0.0007777 
245 difficulty 0.008741 626 wachs 0.0007764 
246 unresolved 0.008637 627 gupta 0.0007693 
247 anticipation 0.008615 628 icac 0.0007677 
248 entire 0.008518 629 guide 0.0007181 
249 san 0.008446 630 yousif 0.0007099 
250 switchover 0.008434 631 wood 0.0006844 
251 observations 0.008378 632 gal 0.0006833 
252 convergence 0.008318 633 mms 0.0006742 
253 feat 0.007965 634 phd 0.0006715 
254 hundreds 0.007892 635 dinda 0.0006558 
255 logic 0.00782 636 proc 0.0006469 
256 constraints 0.007783 637 lim 0.0006428 
257 applicationaware 0.007772 638 vtdc 0.0006254 
258 way 0.00776 639 wang 0.0006206 
259 main 0.007727 640 liu 0.000605 
260 simultaneous 0.007708 641 jan 0.0006041 
261 means 0.007664 642 inm 0.000602 
262 unavailability 0.007623 643 rhee 0.0005962 
263 localprimary 0.007621 644 andrew 0.0005909 
264 reason 0.007501 645 hpdc 0.0005783 
265 coalescing 0.00743 646 ii 0.0005641 
266 challenge 0.007417 647 va 0.0005618 
267 other 0.007394 648 sinnamohideen 0.0005494 
268 considerations 0.00738 649 niv 0.0005456 
269 actions 0.007338 650 monica 0.0005414 
270 part 0.007235 651 xu 0.0005404 
271 driver 0.007214 652 tal 0.0005352 
272 initial 0.007191 653 ny 0.0005255 
273 consistency 0.007187 654 satyanarayanan 0.0004802 
274 binding 0.007176 655 hp 0.0004593 
275 latency 0.007136 656 kobus 0.0004488 
276 overhead 0.007092 657 re 0.0004384 
277 extension 0.007062 658 keir 0.0003816 
278 whole 0.007004 659 alexandria 0.0003689 
279 desire 0.006951 660 srinivasan 0.0003569 
280 weight 0.006927 661 rosenblum 0.0003449 
281 switch 0.006921 662 hendricks 0.0003437 
282 latter 0.006908 663 storagess 0.0003361 
283 preparation 0.006881 664 laat 0.0002986 
284 signal 0.006875 665 marzullo 0.0002801 
285 snapshot 0.006874 666 courtright 0.0002798 
286 anticipated 0.006833 667 shirako 0.0002795 
287 actual 0.006811 668 klosterman 0.0002775 
288 subset 0.006804 669 sambasivan 0.0002757 
289 divergence 0.006801 670 panagiotis 0.0002679 
290 space 0.006789 671 vmware 0.0002562 
291 operators 0.006741 672 yumerefendi 0.0002521 
292 transfer 0.006687 673 corporation 0.0002517 
293 localized 0.006684 674 snapmirror 0.0002388 
294 enabler 0.006667 675 xen 0.0002384 
295 alternative 0.006654 676 greenberg 0.0002366 
296 processor 0.00665 677 bailey 0.0002238 
297 sessionbased 0.006645 678 hutchins 0.000217 
298 heavy 0.00664 679 constantine 0.0002148 
299 protocols 0.006616 680 sapuntzakis 0.0002146 
300 layertwo 0.006604 681 patterson 0.000214 
301 discussion 0.006598 682 albert 0.0002124 
302 power 0.006525 683 app 0.0002115 
303 phase 0.006509 684 harris 0.0002076 
304 vpns 0.006507 685 jennifer 0.0002048 
305 decades 0.006464 686 nsdi 0.0001985 
306 focus 0.006448 687 steven 0.0001931 
307 purposes 0.006413 688 neugebar 0.0001924 
308 kind 0.006337 689 warfield 0.0001916 
309 routers 0.006307 690 ursa 0.0001906 
310 completion 0.006291 691 watson 0.0001886 
311 loss 0.006257 692 ibm 0.0001831 
312 basic 0.00625 693 paul 0.0001793 
313 preestablished 0.006211 694 susan 0.0001779 
314 routed 0.006206 695 chandra 0.0001771 
315 event 0.006204 696 kozuch 0.0001758 
316 site 0.006195 697 brown 0.0001757 
317 signals 0.006155 698 vms 0.0001715 
318 properties 0.006106 699 ganger 0.000165 
319 consequence 0.006096 700 salmon 0.0001633 
320 key 0.006056 701 strunk 0.0001625 
321 recoveryoriented 0.006006 702 katcher 0.0001613 
322 financial 0.005989 703 walters 0.000161 
323 edge 0.00598 704 irwin 0.0001609 
324 necessary 0.00597 705 clark 0.0001604 
325 communication 0.005953 706 mukesh 0.0001593 
326 impact 0.005874 707 alvisi 0.0001563 
327 perspective 0.005872 708 limpach 0.0001553 
328 intermediate 0.005856 709 warfiel 0.0001546 
329 virtualized 0.005826 710 emc 0.0001529 
330 subsystem 0.00579 711 fraser 0.0001527 
331 order 0.005773 712 alex 0.0001513 
332 convenient 0.005768 713 michael 0.0001502 
333 time 0.005661 714 ruemmler 0.0001499 
334 previous 0.005661 715 mambretti 0.0001485 
335 switches 0.005652 716 chase 0.0001482 
336 routing 0.005564 717 junghwan 0.0001478 
337 orchestrated 0.00556 718 ramesh 0.0001438 
338 reply 0.00556 719 wylie 0.0001412 
339 approaches 0.005505 720 xiaotao 0.0001395 
340 checkpoints 0.005482 721 jorge 0.0001386 
341 cases 0.005471 722 sebos 0.0001385 
342 suspend 0.005456 723 yates 0.0001374 
343 bulk 0.005414 724 symantec 0.0001365 
344 information 0.005397 725 pratt 0.0001327 
345 encapsulate 0.005384 726 cambridge 0.0001317 
346 preferred 0.005355 727 boris 0.0001299 
347 modern 0.005348 728 jeff 0.0001295 
348 efficient 0.005325 729 tim 0.0001289 
349 solution 0.005286 730 kennell 0.0001287 
350 mac 0.005283 731 ian 0.0001283 
351 priority 0.005279 732 david 0.0001278 
352 concert 0.005264 733 oduc 0.0001276 
353 nonlan 0.005264 734 veitch 0.0001266 
354 manual 0.005255 735 hand 0.000125 
355 years 0.005195 736 ho 0.0001217 
356 clusterbased 0.005181 737 rick 0.0001214 
357 sans 0.005139 738 mendel 0.0001195 
358 returns 0.005111 739 hanse 0.0001193 
359 flexible 0.005098 740 veritas 0.0001192 
360 need 0.005092 741 gommans 0.0001144 
361 scales 0.005086 742 laura 0.0001142 
362 semanticaware 0.005044 743 aydan 0.000114 
363 problem 0.004982 744 rolf 0.0001129 
364 secondary 0.004961 745 daspit 0.0001003 
365 occurrence 0.004953 746 grit 0.0001001 
366 path 0.004924 747 acm 9.91E-05 
367 initiation 0.004921 748 lam 9.84E-05 
368 subsequent 0.004894 749 ben 9.84E-05 
369 builds 0.004858 750 jim 9.81E-05 
370 writes 0.004834 751 pfaff 9.61E-05 
371 direct 0.004823 752 jul 9.06E-05 
372 status 0.004808 753 greg 9.00E-05 
373 metrics 0.004762 754 ji 8.76E-05 
374 starts 0.004734 755 monga 8.70E-05 
375 forms 0.004701 756 ruth 8.66E-05 
376 change 0.004692 757 chow 8.39E-05 
377 situation 0.004689 758 vol 6.78E-05 
378 exposition 0.004662 759 jog 6.70E-05 
379 addition 0.00465 760 ume 5.71E-05 
380 selections 0.004642 761 ma 5.24E-05 
381 scenarios 0.004629 762 rpo 1.36E-05
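
The ordering used in Table P1 (candidate keywords sorted by decreasing predicted importance y) can be illustrated with a minimal sketch. The function name `rank_candidates` and the parameter `top_k` are hypothetical and introduced only for illustration; the model that produces the y values is not shown here, and the sample scores are a handful of rows copied from the table.

```python
from typing import Dict, List, Tuple


def rank_candidates(scores: Dict[str, float], top_k: int = 10) -> List[Tuple[str, float]]:
    """Return the top_k candidate keywords sorted by predicted importance y, highest first."""
    # Sort (word, y) pairs by the y value in descending order.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top_k]


# A few (word, y) pairs copied from Table P1, deliberately listed out of order.
sample_scores = {
    "rpo": 1.36e-05,
    "server": 0.618527,
    "migration": 0.840984,
    "resume": 0.0045808,
    "replication": 0.678164,
    "virtualization": 0.659803,
}

# Print the three highest-scoring candidates from the sample.
for word, y in rank_candidates(sample_scores, top_k=3):
    print(f"{word}\t{y}")
```

Applied to all 762 candidates, the same sort would yield the order listed in the table, assuming ties are broken by the original candidate order (Python's sort is stable).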