
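Architecture Design of High Performance and Reliable Computer Memory Systems: Chinese and English Abstracts

Thesis Abstracts (Chinese and English)

Author: Sun Hongbin (孫宏濱)

Thesis title: Architecture Design of High Performance and Reliable Computer Memory Systems (高性能高可靠的計算機存儲系統架構設計研究)

Chinese Abstract (English translation)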

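Over the past few decades, shrinking integrated-circuit feature sizes have brought enormous performance improvements to circuit design. As commonly extrapolated from Moore's law, processor speed doubles every 18 months, whereas memory speed grows by only about 7% per year. As a result, the speed gap between processor and memory doubles roughly every 21 months, a problem known as the "memory wall". The architecture of the computer memory system, including caches and main memory, is the main means of bridging this processor-memory performance gap. As CMOS feature sizes continue to shrink, both the reliability and the performance of memory systems are seriously threatened: rising rates of hardware defects and soft errors steadily degrade cache yield and reliability. At the same time, maturing three-dimensional (3D) integration technology offers better means of tackling the memory wall. Designing high-performance, highly reliable memory architectures has therefore become a key technology for computer systems. This thesis makes the following contributions toward narrowing the processor-memory performance gap and improving memory-system reliability: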
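The 21-month figure follows from the two growth rates. If processor speed doubles every 18 months and memory speed grows 7% per year, the log growth rate of their ratio is ln 2 / 18 - ln 1.07 / 12 per month, and the gap doubles after ln 2 divided by that rate. A minimal sketch of this arithmetic, assuming both speeds grow exponentially at constant rates (the function name is ours, not the thesis's):

```python
import math


def gap_doubling_months(cpu_doubling_months=18.0, mem_growth_per_year=0.07):
    """Months needed for the processor/memory speed ratio to double.

    Assumes both speeds grow exponentially at constant rates.
    """
    cpu_log_rate = math.log(2) / cpu_doubling_months       # ln growth per month, processor
    mem_log_rate = math.log(1 + mem_growth_per_year) / 12  # ln growth per month, memory
    gap_log_rate = cpu_log_rate - mem_log_rate             # ln growth per month of the ratio
    return math.log(2) / gap_log_rate


print(f"{gap_doubling_months():.1f} months")  # prints 21.1 months
```

With memory speed held constant (`mem_growth_per_year=0.0`), the gap simply doubles with the processor, every 18 months.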

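First, this thesis proposes an efficient built-in repair analysis (BIRA) method to improve the yield of embedded memories. Embedded memories have become core components of processors and systems-on-chip (SoCs) and largely determine overall chip yield. Unlike stand-alone memories, embedded memories are difficult to test and to analyze for repair with external test equipment; instead, they rely on built-in self-test (BIST) and BIRA circuits. Previous BIRA work assumed that hardware defects can be repaired only by on-chip redundant rows or columns, yet most embedded memories already integrate error correction code (ECC) circuits to protect against soft errors. By appropriately reusing this existing on-chip ECC circuitry, the proposed method yields a memory BIRA with a high repair rate and low hardware overhead. It uses a very simple block-defect-first repair analysis to reduce hardware cost, relies on the on-chip ECC to correct the remaining defects, and finally applies suitable techniques to restore soft error tolerance. The method effectively reduces the hardware overhead of BIRA while maintaining the same or a higher defect repair rate and the same soft error tolerance.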
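The idea of sharing repair work between spare rows and the on-chip ECC can be illustrated with a toy repair analyzer. This is a minimal sketch, not the thesis's block-defect-first algorithm: it assumes row-only redundancy and a single-error-correcting (SEC) code per word, and both the function name `ecc_aware_repair` and the policy of spending leftover spare rows to restore soft-error margin are hypothetical.

```python
from collections import Counter


def ecc_aware_repair(faults, word_bits, spare_rows):
    """Hypothetical ECC-aware repair analysis (illustrative sketch only).

    faults:     iterable of (row, col) coordinates of faulty cells
    word_bits:  bits per ECC-protected word along a row
    spare_rows: number of redundant rows available on chip
    Returns (replaced_rows, ecc_words), or None if the array is unrepairable.
    """
    # Count faulty bits inside each (row, word) ECC codeword.
    per_word = Counter((row, col // word_bits) for row, col in faults)

    # Step 1: SEC corrects only one bit per word, so any word with two or
    # more faulty bits forces its whole row onto a spare row.
    replaced = {row for (row, _), n in per_word.items() if n >= 2}
    if len(replaced) > spare_rows:
        return None  # not enough redundancy

    # Step 2 (assumed policy): leftover spares replace the rows holding the
    # most single-fault words, restoring soft-error margin for those words.
    single = Counter(row for (row, _), n in per_word.items()
                     if n == 1 and row not in replaced)
    for row, _ in single.most_common(spare_rows - len(replaced)):
        replaced.add(row)

    # Step 3: every remaining single-fault word is left to the on-chip ECC.
    ecc_words = sorted(w for w, n in per_word.items()
                       if n == 1 and w[0] not in replaced)
    return sorted(replaced), ecc_words


faults = [(0, 1), (0, 3), (2, 5), (4, 9), (4, 20)]
print(ecc_aware_repair(faults, word_bits=8, spare_rows=2))  # ([0, 4], [(2, 0)])
```

Row 0 must be replaced because its first word holds two faults. With two spares, the second spare goes to row 4 (two single-fault words), and only the word at row 2 relies on ECC. With zero spares the function reports the array as unrepairable.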

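Second, this thesis proposes using multi-bit error correction codes (ECCs) to improve the defect tolerance of the L2 cache. Errors in static random-access memory (SRAM) include hardware defects and particle-induced soft errors. In conventional memory design, hardware defects are usually repaired with on-chip redundant rows and columns, while soft errors are covered by single-bit error correction codes. Continued feature-size scaling makes reliable high-density cache design increasingly complex, and conventional reliability techniques will no longer meet yield requirements. Although multi-bit ECCs can markedly improve cache reliability, they are generally considered unsuitable for caches because they noticeably degrade performance and increase area. This thesis studies the feasibility, and the potential reliability gain, of using multi-bit ECC in the L2 cache to tolerate large numbers of random hardware defects while still protecting against soft errors. Rather than developing new multi-bit ECCs, the work focuses on architectural techniques for using them effectively in the L2 cache. Because cache blocks containing one or more defective bits can be identified during memory testing, a natural approach is to protect with multi-bit ECC only the blocks that need it, instead of uniformly protecting every block. This selective protection greatly reduces the impact of the long multi-bit decoding latency on processor performance and also reduces the number of ECC check bits that must be stored. Selective use of multi-bit ECC requires a run-time lookup table based on content-addressable memory (CAM) to decide whether the block being accessed is protected by multi-bit ECC. Although implementing selective multi-bit ECC directly with a CAM looks simple, it falls short at high defect densities: (1) as the random defect rate rises, most cache accesses trigger multi-bit ECC decoding, which lowers overall system performance; and (2) a CAM consumes far more power than ordinary SRAM, so constant lookups in the defect table waste excessive energy. This thesis further exploits the locality of cache accesses by supplementing the L2 cache with several small special-purpose caches, which avoids most of the multi-bit ECC decoding latency and greatly reduces the area overhead of the multi-bit ECC. In addition, the proposed L2 cache design improves reliability while maintaining the same level of soft error tolerance.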
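A toy read-latency model shows both the selective-protection idea and why a small buffer that exploits locality helps. In this sketch a Python set stands in for the CAM defect table, and the class name `SelectiveEccL2`, the LRU buffer of already-decoded defective blocks, and the latencies (10 cycles base, 20 extra cycles per multi-bit decode) are hypothetical; the thesis's special-purpose caches may be organized differently.

```python
from collections import OrderedDict


class SelectiveEccL2:
    """Toy read-latency model of an L2 cache with selective multi-bit ECC.

    defective_blocks: block addresses flagged as defective during memory test
    buffer_entries:   capacity of a small LRU buffer holding already-decoded
                      copies of defective blocks (0 disables the buffer)
    All latencies are hypothetical placeholders.
    """

    BASE_CYCLES = 10       # assumed L2 hit latency on the normal (SEC) path
    MULTI_ECC_CYCLES = 20  # assumed extra latency of a multi-bit ECC decode

    def __init__(self, defective_blocks, buffer_entries):
        self.defective = set(defective_blocks)  # stands in for the CAM defect table
        self.capacity = buffer_entries
        self.buffer = OrderedDict()             # LRU order: oldest entry first
        self.decodes = 0                        # multi-bit decodes on the read path

    def read(self, block):
        """Return the latency in cycles of reading one cache block."""
        if block not in self.defective:
            return self.BASE_CYCLES             # healthy block: fast path
        if block in self.buffer:
            self.buffer.move_to_end(block)      # corrected copy already buffered
            return self.BASE_CYCLES
        self.decodes += 1                       # slow path: full multi-bit decode
        if self.capacity:
            self.buffer[block] = True
            if len(self.buffer) > self.capacity:
                self.buffer.popitem(last=False)  # evict the least recently used copy
        return self.BASE_CYCLES + self.MULTI_ECC_CYCLES


trace = [1, 2, 1, 1, 3, 2, 1]
for entries in (0, 2):
    l2 = SelectiveEccL2(defective_blocks={1, 2}, buffer_entries=entries)
    total = sum(l2.read(b) for b in trace)
    print(entries, total, l2.decodes)  # "0 190 6", then "2 110 2"
```

Without the buffer every read of a defective block pays the decode; with two entries, repeated reads hit the corrected copies. Writes and keeping buffered copies consistent are ignored in this sketch.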

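Three-dimensional (3D) integration has become a promising technology in processor design and offers a viable solution to the memory wall of high-performance processors. Targeting 3D integration technology, this thesis develops a 3D dynamic random-access memory (DRAM) architecture that uses a coarse-grained partitioning strategy. Compared with prior work, this architecture exploits the benefits of 3D integration without being constrained by through-silicon via (TSV) fabrication limits: the global address and data buses are shared across all silicon layers, so only a small number of TSVs and relatively relaxed TSV pitch requirements are needed. Building on this memory architecture, the thesis further proposes a heterogeneous 3D DRAM organization for multi-core computing systems in which 3D DRAM implements both the L2 cache and main memory. To improve the performance of the DRAM L2 cache, the thesis applies techniques such as variable sub-array sizes and multi-threshold circuits to reduce access latency. Contrary to the common impression that DRAM is much slower than SRAM, a modified memory modeling tool shows that a 3D DRAM L2 cache can match, or even exceed, the access speed of an SRAM cache. With these techniques, the proposed 3D DRAM architecture effectively reduces access latency and thereby improves the overall performance of 3D integrated computing systems.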
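In a coarse-grained partition each die holds complete banks, so all dies can listen to one shared address bus and only the addressed die drives the shared data bus. The sketch below splits a flat physical address into die, bank, row, and column fields to make this concrete. It is a hypothetical illustration, not the thesis's design: the function name `decode_address`, the field order, and the default sizes (4 dies, 8 banks per die, 16384 rows, 1024 columns) are all assumptions.

```python
def decode_address(addr, dies=4, banks=8, rows=16384, cols=1024):
    """Split a flat address into (die, bank, row, col) for a shared-bus 3D DRAM.

    Every die sees the same address on the shared TSV bus; the die field
    selects which die responds. Field order and sizes are assumptions.
    """
    if not 0 <= addr < dies * banks * rows * cols:
        raise ValueError("address out of range")
    addr, col = divmod(addr, cols)  # lowest field: column within a row
    addr, row = divmod(addr, rows)  # next: row within a bank
    die, bank = divmod(addr, banks)  # highest fields: bank, then die select
    return die, bank, row, col


print(decode_address(((2 * 8 + 3) * 16384 + 100) * 1024 + 7))  # (2, 3, 100, 7)
```

Placing the die field in the highest address bits keeps consecutive addresses within one die; interleaving dies at a finer granularity would be an equally valid choice under different bandwidth goals.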

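In future 3D integrated microprocessors, vertically stacked dies shield one another, so different dies are affected by particle-induced soft errors to different degrees. Research has shown that outer dies can shield inner dies from particle strikes, a phenomenon known as the shielding effect. Motivated by this effect, this thesis proposes a soft-error-tolerant 3D cache architecture. Because the outer dies block alpha particles, circuits on the inner dies may be inherently immune to alpha-particle strikes, so their error correction circuitry can be omitted. Accessing cache data on these soft-error-invulnerable inner dies therefore costs much less latency and energy than accessing data on the other dies. The thesis further develops several techniques that dynamically move data from the outer dies to the inner dies, so that cache accesses concentrate on the dies that soft errors cannot affect.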
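A toy latency model can show why moving hot blocks toward the inner dies pays off. This minimal sketch assumes a cache set whose first `sid_ways` ways sit on soft-error-invulnerable inner dies (no ECC check), hit latencies of 3 and 5 cycles, and a swap-on-hit promotion policy; the class name and all numbers are hypothetical, not taken from the thesis.

```python
class StackedCacheSet:
    """Toy model of one cache set whose ways are spread over stacked dies.

    Ways 0..sid_ways-1 sit on inner, soft-error-invulnerable dies and skip the
    ECC check; the other ways sit on outer dies and pay an ECC penalty.
    Latencies and the swap-on-hit policy are assumptions for illustration.
    """

    SID_CYCLES = 3    # assumed hit latency on an inner die (no ECC check)
    OUTER_CYCLES = 5  # assumed hit latency on an outer die (ECC check included)

    def __init__(self, blocks, sid_ways, migrate=True):
        self.ways = list(blocks)  # ways[i] holds the tag of the block in way i
        self.sid_ways = sid_ways
        self.migrate = migrate

    def access(self, tag):
        """Return the hit latency; assumes the tag is present in the set."""
        i = self.ways.index(tag)
        if i < self.sid_ways:
            return self.SID_CYCLES
        if self.migrate and self.sid_ways:
            # Promote the hot block by swapping it into the last inner-die way.
            j = self.sid_ways - 1
            self.ways[i], self.ways[j] = self.ways[j], self.ways[i]
        return self.OUTER_CYCLES


s = StackedCacheSet(["a", "b", "c", "d"], sid_ways=1)
print(sum(s.access("c") for _ in range(4)), s.ways)  # 14 ['c', 'b', 'a', 'd']
```

After its first (outer-die) hit, block `c` is served from the inner die; without migration the same four accesses would cost 20 cycles.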

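For the L1 cache, we propose an inner-die direct-mapped cache architecture that maximizes accesses to data on the inner dies while avoiding the energy wasted on unnecessary data accesses. For lower-level caches, we propose decoupling tag entries from data blocks to compensate for the relatively poor access locality of these caches. The resulting 3D cache hierarchy significantly improves processor performance and energy efficiency.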
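Decoupling tags from data means a tag entry stores a pointer to a data frame rather than owning a fixed frame, so a block can move to an inner-die frame by swapping two pointers while the tags stay in place. A minimal sketch, assuming frames numbered below `sid_frames` lie on the inner dies and an LRU choice of which inner-die block to demote; the class name and policy are hypothetical.

```python
from collections import OrderedDict


class DecoupledCache:
    """Hypothetical sketch: tags point at data frames instead of owning them.

    Frames 0..sid_frames-1 lie on the soft-error-invulnerable (inner) dies.
    Moving a block between dies only swaps two frame pointers; tag entries
    never move.
    """

    def __init__(self, sid_frames):
        self.sid_frames = sid_frames
        self.frame_of = {}            # block address -> data frame index
        self.sid_lru = OrderedDict()  # blocks on inner-die frames, oldest first

    def fill(self, addr, frame):
        """Place a block in a given frame (caller picks a free frame)."""
        self.frame_of[addr] = frame
        if frame < self.sid_frames:
            self.sid_lru[addr] = True

    def access(self, addr):
        """Return True if served from an inner-die frame; else promote the block."""
        if addr in self.sid_lru:
            self.sid_lru.move_to_end(addr)
            return True
        if self.sid_lru:
            # Swap data pointers with the least recently used inner-die block.
            victim, _ = self.sid_lru.popitem(last=False)
            self.frame_of[addr], self.frame_of[victim] = (
                self.frame_of[victim], self.frame_of[addr])
            self.sid_lru[addr] = True
        return False


c = DecoupledCache(sid_frames=2)
for addr, frame in [("a", 0), ("b", 1), ("x", 5), ("y", 6)]:
    c.fill(addr, frame)
print(c.access("x"), c.access("x"), c.frame_of["x"])  # False True 0
```

Because any block can be paired with any inner-die frame, hot blocks from different sets can share the inner dies, which is one way to cope with weak locality in lower-level caches.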

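Finally, this thesis analyzes the performance and power improvements of future 3D integrated video processing circuits. As video processing algorithms grow more complex, memory bandwidth has become the main bottleneck of advanced video coding and display processing systems, and this shortfall will worsen further. Because 3D logic-memory integration provides massive vertical interconnect, it will have a significant impact on video processing applications that demand large memory capacity and high bandwidth. To quantify the performance and power gains of a 3D integrated video processing system, the thesis develops a 3D integrated motion estimation accelerator that can be integrated seamlessly into a multimedia multi-core processing system. We propose a 3D integrated DRAM organization and an image frame storage strategy, and design a fully parallel 2D motion estimation engine that exploits the 3D-stacked DRAM to reduce system power. The approach seamlessly supports a variety of motion estimation algorithms, including the variable block-size motion estimation of the H.264/AVC coding standard. Using multi-frame motion estimation as a case study, we demonstrate the energy efficiency of the accelerator with hardware design and DRAM modeling tools.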
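Full-search block matching with the sum of absolute differences (SAD) is the standard kernel that motion estimation accelerators parallelize, and its heavy reference-frame traffic is why memory bandwidth matters. The sketch below is a sequential software reference, assuming integer-pixel search, a square n x n block, and a window of plus or minus `radius` pixels; it is not the thesis's hardware engine, and the function names are ours.

```python
def sad(cur, ref, by, bx, dy, dx, n):
    """SAD between the n x n block at (by, bx) in cur and the block
    displaced by (dy, dx) in ref."""
    return sum(abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
               for i in range(n) for j in range(n))


def full_search(cur, ref, by, bx, n, radius):
    """Exhaustive block matching; returns (best_sad, (dy, dx))."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip candidate blocks that fall outside the reference frame.
            if not (0 <= by + dy and by + dy + n <= h
                    and 0 <= bx + dx and bx + dx + n <= w):
                continue
            cost = sad(cur, ref, by, bx, dy, dx, n)
            if best is None or cost < best[0]:
                best = (cost, (dy, dx))
    return best


# Synthetic test: the current frame is the reference shifted by (1, 2).
ref = [[(7 * r + 13 * c) % 256 for c in range(16)] for r in range(16)]
cur = [[ref[min(r + 1, 15)][min(c + 2, 15)] for c in range(16)] for r in range(16)]
print(full_search(cur, ref, 4, 4, 4, 2))  # (0, (1, 2))
```

Every candidate displacement reads a fresh n x n window of the reference frame; a fully parallel engine evaluates many candidates at once, which multiplies the required memory bandwidth. In variable block-size hardware, SADs of small sub-blocks are commonly summed to form larger-block SADs, so one pass can serve several block sizes; this sketch handles a single block size only.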

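Combining the design requirements of memory systems with recent advances in integrated-circuit technology, this thesis proposes systematic solutions to several key problems in computer memory system design. All proposed architectures and methods have been validated with system-level and circuit-level simulation tools. At the circuit level, memory designs were evaluated mainly with hardware design and simulation tools and memory modeling tools to estimate circuit parameters; at the system level, single-core and multi-core processor simulators were used to evaluate the performance and power of the proposed architectures.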

Keywords: memory architecture; reliability; fault-tolerance techniques; 3D integration

Architecture Design of High Performance and Reliable Computer Memory Systems

Sun Hongbin

ABSTRACT

Scaling of CMOS devices has brought remarkable improvements in integrated-circuit performance over the past few decades. As commonly extrapolated from Moore's law, processor speed doubles every 18 months with technology scaling, whereas memory speed has increased by only about 7% per year. As a consequence, the processor-memory speed gap doubles roughly every 21 months, a problem known as the "memory wall". The computer memory hierarchy, including both caches and main memory, plays a key role in bridging this gap and alleviating the effect of slow memory. As CMOS technology continues to scale down, designing a high-performance and reliable memory hierarchy has become a grand challenge: the yield and reliability of cache memory are threatened by both hard faults and soft errors. Meanwhile, emerging three-dimensional (3D) integration technology offers better approaches to the memory wall. Designing high-performance, reliable memory architectures has therefore become a critical technique in computer systems. This thesis makes several contributions toward mitigating the processor-memory gap and improving the reliability of the memory hierarchy.

First, we present a cost-efficient built-in repair analysis (BIRA) approach to improve the yield of embedded memory. As embedded memories come to dominate system-on-chip (SoC) designs, achieving sufficiently high embedded memory yield is crucial. With a growing number of diverse embedded memories on chip, external memory testing and redundancy repair analysis become inadequate, making BIRA attractive and even indispensable. All prior work on BIRA assumed that defects can be repaired only by redundant rows or columns. Motivated by the fact that most embedded memories use an error correction code (ECC) to uniformly protect all memory words from soft errors, we propose to leverage the existing on-chip error correction circuit to enable very low-cost BIRA implementations while maintaining the same or even a higher defect repair rate and the same soft error tolerance.

Second, we propose a defect-tolerant L2 cache that uses multi-bit error correction codes. Potential faults in SRAM can be parametric or catastrophic defects or transient soft errors, and both are becoming increasingly serious as the technology feature size shrinks. In conventional design practice, memory defects are handled by using spare (redundant) rows, columns, and/or words to repair (i.e., replace) the defective ones, while soft errors are handled by single-error-correcting codes. As technology continues to scale, a repair-only defect tolerance strategy may no longer ensure a high enough yield. Although strong multi-bit ECCs appear to be a natural choice for improving reliability, they are commonly believed to incur prohibitive performance degradation and silicon and energy cost in cache memory. This work examines the feasibility and potential of using multi-bit ECC to tolerate a large number of random defects in the L2 cache without losing soft error tolerance. Rather than developing any new multi-bit ECC, we focus on enabling its effective use in the L2 cache. Since cache blocks containing one or more defective cells can be identified during memory testing, it is intuitive to apply multi-bit ECC only to the blocks that need it instead of uniformly protecting every block. Such selective use of multi-bit ECC can largely reduce the impact on overall cache performance and area overhead. A direct implementation must perform a run-time lookup in a content-addressable memory (CAM) to check whether the cache block being accessed is protected by the multi-bit ECC. Although such a realization is straightforward, it may be inadequate at relatively high random defect densities for two main reasons: (i) as the random defect density increases, a larger percentage of cache reads invoke multi-bit ECC decoding, directly degrading overall system performance (e.g., IPC); and (ii) a CAM consumes far more energy than normal SRAM, and its size grows with the random defect density, so a significant energy overhead results. By supplementing a conventional L2 cache core with several small special-purpose caches/buffers, however, we greatly reduce the silicon cost and minimize the probability of explicitly executing multi-bit ECC decoding on the cache read critical path. Moreover, the proposed L2 cache design maintains the same level of soft error tolerance.

Three-dimensional (3D) integration is emerging as an attractive technology for microprocessor design and provides a viable and promising option for addressing the well-known memory wall in high-performance computing systems. Based on 3D integration technology, we develop a 3D DRAM design with a coarse-grained 3D partitioning strategy, which requires far fewer through-silicon vias (TSVs) and imposes less stringent TSV pitch constraints than prior work. The key is to share the global routing of the memory address and data buses among all DRAM dies through a small number of coarse-grained TSVs. We also investigate using 3D DRAM to implement both the L2 cache and main memory in 3D multi-core processor-DRAM integrated computing systems. In contrast to the common impression that DRAM is much slower than SRAM, we use a modified CACTI tool to show that a 3D DRAM L2 cache may achieve comparable or even faster speed than a 2D SRAM L2 cache. With these design techniques, the proposed 3D DRAM design effectively reduces access latency and hence improves overall 3D integrated computing system performance.

3D microarchitecture offers another interesting property: circuits on different dies may exhibit heterogeneous soft error vulnerability because of the shielding effect of die stacking. Recent research characterized microarchitecture soft error vulnerability across 3D-stacked dies and concluded that the inner dies can be shielded from particle strikes by the outer dies. Motivated by this shielding effect, we propose a soft-error-resilient 3D cache architecture. The underlying idea is to eliminate the error correction circuits on the soft-error-invulnerable dies (SIDs), since the inner dies may be inherently invulnerable to soft errors when the outer dies implicitly protect them from particle strikes. As a result, data accesses on the SIDs incur much lower latency and energy dissipation. Moreover, we develop techniques for dynamic data block movement in the cache that maximize data accesses on the SIDs. For the L1 cache, we propose an SID direct-mapped cache architecture that maximizes accesses to the SIDs while avoiding energy wasted on useless data accesses. For lower-level caches, we propose decoupling tag entries from data blocks to compensate for their relatively poor locality. The overall cache hierarchy achieves significant improvements in performance and energy efficiency.

Finally, we analyze the potential performance and energy benefits of 3D-stacked video processing circuits. Memory bandwidth has become the primary bottleneck of advanced video coding and display processing systems, and the deficiency may worsen as more sophisticated algorithms are adopted to further improve performance. We show that 3D integration will have a significant impact on memory-intensive video processing, given the massive logic-memory interconnect bandwidth enabled by die stacking. To demonstrate these advantages quantitatively, we develop a 3D integrated motion estimation accelerator that can be integrated into a multimedia multi-core processor. We develop a 3D integrated DRAM organization and an image frame storage strategy geared to motion estimation, and apply a fully parallel 2D motion estimation engine that exploits the 3D-stacked DRAM to minimize energy consumption. The approach seamlessly supports various motion estimation algorithms, including the variable block-size motion estimation (VBSME) adopted in H.264/AVC. A case study on multi-frame motion estimation, based on DRAM performance modeling and ASIC design, demonstrates the energy efficiency of the proposed accelerator.

Focusing on the design requirements of the memory hierarchy and new advances in semiconductor technology, this thesis proposes several efficient architectural solutions to critical problems in computer memory systems. All the architectures and approaches proposed in this thesis are validated with system-level and/or circuit-level simulation tools. The electrical properties of the memory circuit designs are mainly estimated with circuit design and simulation tools, while unicore and multicore microprocessor simulators are used to evaluate the computational capability and energy consumption of the proposed architectures.

Key words: Memory hierarchy; Reliability; Defect tolerance; 3D integration
