<?xml version="1.0" encoding="utf-8"?>
<?xml-stylesheet href="/xsl/rss.xsl" type="text/xsl" media="screen"?>
<rss version="2.0" 
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:ppp="http://blog.sohu.com/rss/module/ppp/"
	>

	<channel>
		<title>龙居</title>
		<link>http://longrenrex.blog.sohu.com/</link>
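		<description><![CDATA[[Dudu Long]: Own up to your mistakes, and stand at attention when you take a beating! Don't blame the government for a hard life, and don't blame society for bad luck!]]></description>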
		<pubDate>Fri, 8 Aug 2008 10:41:32 +0800</pubDate>
		<generator>搜狐博客</generator>
		<ppp:ebi>7373a63792</ppp:ebi>
		<image>
			<title>http://blog.sohu.com</title>
			<url>http://js.pp.sohu.com/ppp/blog/images/common/logo_150_60.gif</url>
			<link>http://blog.sohu.com/</link>
			<width>100</width>
			<height>43</height>
			<description>搜狐博客</description>
		</image>
		<item>
			<title>Central processing unit (4) [Repost]</title>
			<link>http://longrenrex.blog.sohu.com/96678282.html</link>
			<comments>http://longrenrex.blog.sohu.com/96678282.html#comment</comments>
			<dc:creator>龙居</dc:creator>
			<pubDate>Thu, 7 Aug 2008 20:14:51 +0800</pubDate>
			<category>Notebook</category>
			<guid>http://longrenrex.blog.sohu.com/96678282.html</guid>
			<description><![CDATA[<h3><span>Parallelism</span></h3>
<dl>
<dd>
<div><i>Main article: <a title="Parallel computing" href="http://en.wikipedia.org/wiki/Parallel_computing">Parallel computing</a></i></div></dd></dl>
<div>
<div style="WIDTH: 302px"><a title="Model of a subscalar CPU. Notice that it takes fifteen cycles to complete three instructions." href="http://en.wikipedia.org/wiki/Image:Nopipeline.png"><img height="56" alt="Model of a subscalar CPU. Notice that it takes fifteen cycles to complete three instructions." src="http://upload.wikimedia.org/wikipedia/commons/thumb/2/2c/Nopipeline.png/300px-Nopipeline.png" width="300" border="0" /></a> 
<div>
Model of a subscalar CPU. Notice that it takes fifteen cycles to complete three instructions.</div></div></div>
<p>The description of the basic operation of a CPU offered in the previous section describes the simplest form that a CPU can take. This type of CPU, usually referred to as <b>subscalar</b>, operates on and executes one instruction on one or two pieces of data at a time.</p>
<p>This process gives rise to an inherent inefficiency in subscalar CPUs. Since only one instruction is executed at a time, the entire CPU must wait for that instruction to complete before proceeding to the next instruction. As a result the subscalar CPU gets &quot;hung up&quot; on instructions which take more than one clock cycle to complete execution. Even adding a second execution unit (see below) does not improve performance much; rather than one pathway being hung up, now two pathways are hung up and the number of unused transistors is increased. This design, wherein the CPU's execution resources can operate on only one instruction at a time, can only possibly reach <b>scalar</b> performance (one instruction per clock). However, the performance is nearly always subscalar (less than one instruction per cycle).</p>
<p>Attempts to achieve scalar and better performance have resulted in a variety of design methodologies that cause the CPU to behave less linearly and more in parallel. When referring to parallelism in CPUs, two terms are generally used to classify these design techniques. <a title="Instruction level parallelism" href="http://en.wikipedia.org/wiki/Instruction_level_parallelism">Instruction level parallelism</a> (ILP) seeks to increase the rate at which instructions are executed within a CPU (that is, to increase the utilization of on-die execution resources), and <a title="Thread level parallelism" href="http://en.wikipedia.org/wiki/Thread_level_parallelism">thread level parallelism</a> (TLP) aims to increase the number of <a title="Thread (computer science)" href="http://en.wikipedia.org/wiki/Thread_%28computer_science%29">threads</a> (effectively individual programs) that a CPU can execute simultaneously. The two methodologies differ both in how they are implemented and in how effectively they increase the CPU's performance for a given application.<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-9">[10]</a></sup></p>
<p><a name="Instruction_level_parallelism"></a></p>
<h4><span>Instruction level parallelism</span></h4>
<dl>
<dd>
<div><i>Main articles: <a title="Instruction pipelining" href="http://en.wikipedia.org/wiki/Instruction_pipelining">Instruction pipelining</a> and <a title="Superscalar" href="http://en.wikipedia.org/wiki/Superscalar">Superscalar</a></i></div></dd></dl>
<div>
<div style="WIDTH: 302px"><a title="Basic five-stage pipeline.  In the best case scenario, this pipeline can sustain a completion rate of one instruction per cycle." href="http://en.wikipedia.org/wiki/Image:Fivestagespipeline.png"><img height="87" alt="Basic five-stage pipeline.  In the best case scenario, this pipeline can sustain a completion rate of one instruction per cycle." src="http://upload.wikimedia.org/wikipedia/commons/thumb/2/21/Fivestagespipeline.png/300px-Fivestagespipeline.png" width="300" border="0" /></a> 
<div>
Basic five-stage pipeline. In the best case scenario, this pipeline can sustain a completion rate of one instruction per cycle.</div></div></div>
<p>One of the simplest methods used to accomplish increased parallelism is to begin the first steps of instruction fetching and decoding before the prior instruction finishes executing. This is the simplest form of a technique known as <b><a title="Instruction pipelining" href="http://en.wikipedia.org/wiki/Instruction_pipelining">instruction pipelining</a></b>, and is utilized in almost all modern general-purpose CPUs. Pipelining allows more than one instruction to be executed at any given time by breaking down the execution pathway into discrete stages. This separation can be compared to an assembly line, in which an instruction is made more complete at each stage until it exits the execution pipeline and is retired.</p>
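<p>The benefit of overlapping stages can be estimated with simple cycle arithmetic. The following is a minimal sketch of an idealized model, assuming five stages, one cycle per stage, and no stalls; the function names are illustrative and do not come from the article.</p>

```python
def cycles_unpipelined(instructions: int, stages: int = 5) -> int:
    """Each instruction passes through every stage before the next one starts."""
    return instructions * stages


def cycles_pipelined(instructions: int, stages: int = 5) -> int:
    """Idealized pipeline: fill it once, then retire one instruction per cycle."""
    if instructions == 0:
        return 0
    return stages + (instructions - 1)


if __name__ == "__main__":
    for n in (3, 100):
        print(n, cycles_unpipelined(n), cycles_pipelined(n))
```

<p>For three instructions the unpipelined model needs 15 cycles, the same count quoted for the subscalar model earlier in this section, while the idealized pipeline needs 7.</p>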
<p>Pipelining does, however, introduce the possibility that the result of a previous operation is needed to complete the next operation, a condition often termed a data dependency conflict. To cope with this, the processor must check for such conditions and delay a portion of the instruction pipeline when they occur. Naturally, this requires additional circuitry, so pipelined processors are more complex than subscalar ones (though not very significantly so). A pipelined processor can become very nearly scalar, inhibited only by pipeline stalls (an instruction spending more than one clock cycle in a stage).</p>
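<p>The dependency check described above can be sketched as a small scheduling model. This is an illustration under stated assumptions, not a description of any particular CPU: instructions issue in order, at most one per cycle, and a result becomes readable a fixed <code>latency</code> cycles after its producer issues. The tuple format <code>(dest, srcs)</code> and the name <code>schedule</code> are hypothetical.</p>

```python
def schedule(program, latency=3):
    """Return the issue cycle of each instruction in an in-order pipeline.

    program: list of (dest, srcs) tuples, where dest is the register written
             and srcs is a tuple of registers read.
    latency: assumed number of cycles after issue before a result can be read.
    """
    ready = {}           # register -> first cycle its new value can be read
    cycle = 0            # earliest cycle the next instruction may issue
    issue_cycles = []
    for dest, srcs in program:
        # Stall until every source register is available.
        earliest = max([cycle] + [ready.get(r, 0) for r in srcs])
        issue_cycles.append(earliest)
        ready[dest] = earliest + latency
        cycle = earliest + 1
    return issue_cycles


if __name__ == "__main__":
    dependent = [("r1", ("r2", "r3")), ("r4", ("r1", "r5")), ("r6", ("r7", "r8"))]
    print(schedule(dependent))
```

<p>In the dependent program the second instruction reads <code>r1</code>, so it waits two extra cycles and the issue cycles are 0, 3, 4 instead of 0, 1, 2.</p>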
<div>
<div style="WIDTH: 302px"><a title="Simple superscalar pipeline.  By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed." href="http://en.wikipedia.org/wiki/Image:Superscalarpipeline.png"><img height="173" alt="Simple superscalar pipeline.  By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed." src="http://upload.wikimedia.org/wikipedia/commons/thumb/c/ce/Superscalarpipeline.png/300px-Superscalarpipeline.png" width="300" border="0" /></a> 
<div>
Simple superscalar pipeline. By fetching and dispatching two instructions at a time, a maximum of two instructions per cycle can be completed.</div></div></div>
<p>Further improvement upon the idea of instruction pipelining led to the development of a method that decreases the idle time of CPU components even further. Designs that are said to be <b>superscalar</b> include a long instruction pipeline and multiple identical execution units. <span><sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#endnote_Huynh2003a">[Huynh 2003]</a></sup></span> In a superscalar pipeline, multiple instructions are read and passed to a dispatcher, which decides whether or not the instructions can be executed in parallel (simultaneously). If so they are dispatched to available execution units, resulting in the ability for several instructions to be executed simultaneously. In general, the more instructions a superscalar CPU is able to dispatch simultaneously to waiting execution units, the more instructions will be completed in a given cycle.</p>
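<p>The dispatcher's decision can be sketched as a greedy grouping pass. This is a simplified model, assuming in-order issue, identical execution units, and only register dependencies (read-after-write and write-after-write); the <code>dispatch</code> function and the <code>(dest, srcs)</code> tuple format are hypothetical.</p>

```python
def dispatch(program, width=2):
    """Group instructions, in program order, into bundles issued together.

    An instruction joins the current bundle only if the bundle has a free
    slot and the instruction neither reads nor writes a register that an
    earlier instruction in the same bundle writes.
    """
    bundles = []
    current = []
    written = set()      # registers written by the current bundle
    for dest, srcs in program:
        conflict = dest in written or any(s in written for s in srcs)
        if current and (conflict or len(current) == width):
            bundles.append(current)
            current, written = [], set()
        current.append((dest, srcs))
        written.add(dest)
    if current:
        bundles.append(current)
    return bundles


if __name__ == "__main__":
    program = [
        ("r1", ("r2", "r3")),
        ("r4", ("r5", "r6")),
        ("r7", ("r1", "r4")),   # needs both earlier results
        ("r8", ("r7",)),        # needs the previous result
    ]
    print([len(b) for b in dispatch(program)])
```

<p>Only the first two instructions are independent, so the four instructions take three issue cycles in bundles of 2, 1, and 1.</p>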
<p>Most of the difficulty in the design of a superscalar CPU architecture lies in creating an effective dispatcher. The dispatcher needs to be able to quickly and correctly determine whether instructions can be executed in parallel, as well as dispatch them in such a way as to keep as many execution units busy as possible. This requires that the instruction pipeline is filled as often as possible and gives rise to the need in superscalar architectures for significant amounts of <a title="CPU cache" href="http://en.wikipedia.org/wiki/CPU_cache">CPU cache</a>. It also makes <a title="Hazard (computer architecture)" href="http://en.wikipedia.org/wiki/Hazard_%28computer_architecture%29">hazard</a>-avoiding techniques like <a title="Branch prediction" href="http://en.wikipedia.org/wiki/Branch_prediction">branch prediction</a>, <a title="Speculative execution" href="http://en.wikipedia.org/wiki/Speculative_execution">speculative execution</a>, and <a title="Out-of-order execution" href="http://en.wikipedia.org/wiki/Out-of-order_execution">out-of-order execution</a> crucial to maintaining high levels of performance. By attempting to predict which branch (or path) a conditional instruction will take, the CPU can minimize the number of times that the entire pipeline must wait until a conditional instruction is completed. Speculative execution often provides modest performance increases by executing portions of code that may or may not be needed after a conditional operation completes. Out-of-order execution somewhat rearranges the order in which instructions are executed to reduce delays due to data dependencies.</p>
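<p>As one concrete illustration of branch prediction, the sketch below models a two-bit saturating counter. The article does not name a specific scheme, so this choice is an assumption, and the class and function names are hypothetical.</p>

```python
class TwoBitPredictor:
    """Two-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self, state=2):
        self.state = state

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def correct_predictions(outcomes):
    """Count how many branch outcomes the predictor guesses correctly."""
    predictor = TwoBitPredictor()
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


if __name__ == "__main__":
    # A loop branch that is taken nine times and then falls through, run three times.
    outcomes = ([True] * 9 + [False]) * 3
    print(correct_predictions(outcomes), "of", len(outcomes))
```

<p>For this loop-like pattern the predictor is wrong only on each loop exit, giving 27 correct predictions out of 30.</p>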
<p>In the case where a portion of the CPU is superscalar and part is not, the part which is not suffers a performance penalty due to scheduling stalls. The original <a title="Intel Pentium" href="http://en.wikipedia.org/wiki/Intel_Pentium">Intel Pentium</a> (P5) had two superscalar ALUs which could accept one instruction per clock each, but its FPU could not accept one instruction per clock. Thus the P5 was integer superscalar but not floating point superscalar. Intel's successor to the Pentium architecture, <a title="Intel P6" href="http://en.wikipedia.org/wiki/Intel_P6">P6</a>, added superscalar capabilities to its floating point features, and therefore afforded a significant increase in floating point instruction performance.</p>
<p>Both simple pipelining and superscalar design increase a CPU's ILP: pipelining allows a single processor to approach a completion rate of one instruction per cycle (<b>IPC</b>), and superscalar design allows it to surpass that rate.<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-10">[11]</a></sup> Nearly all general purpose CPUs designed in the last decade are at least somewhat superscalar. In later years some of the emphasis in designing high-ILP computers has been moved out of the CPU's hardware and into its software interface, or ISA. The strategy of the <a title="Very long instruction word" href="http://en.wikipedia.org/wiki/Very_long_instruction_word">very long instruction word</a> (VLIW) causes some ILP to become implied directly by the software, reducing the amount of work the CPU must perform to boost ILP and thereby reducing the design's complexity.</p>
<p><a name="Thread_level_parallelism"></a></p>
<h4><span>Thread level parallelism</span></h4>
<p>Another strategy for achieving performance is to execute multiple programs or <a title="Thread (computer science)" href="http://en.wikipedia.org/wiki/Thread_%28computer_science%29">threads</a> in parallel. This area of research is known as <a title="Parallel computing" href="http://en.wikipedia.org/wiki/Parallel_computing">parallel computing</a>. In <a title="Flynn's taxonomy" href="http://en.wikipedia.org/wiki/Flynn%27s_taxonomy">Flynn's taxonomy</a>, this strategy is known as Multiple Instructions-Multiple Data or MIMD.</p>
<p>One technology used for this purpose was <a title="Multiprocessing" href="http://en.wikipedia.org/wiki/Multiprocessing">multiprocessing</a> (MP). The initial flavor of this technology is known as <a title="Symmetric multiprocessing" href="http://en.wikipedia.org/wiki/Symmetric_multiprocessing">symmetric multiprocessing</a> (SMP), where a small number of CPUs share a coherent view of their memory system. In this scheme, each CPU has additional hardware to maintain a constantly up-to-date view of memory. By avoiding stale views of memory, the CPUs can cooperate on the same program and programs can migrate from one CPU to another. To increase the number of cooperating CPUs beyond a handful, schemes such as <a title="Non-uniform memory access" href="http://en.wikipedia.org/wiki/Non-uniform_memory_access">non-uniform memory access</a> (NUMA) and <a title="Directory-based coherence protocols" href="http://en.wikipedia.org/wiki/Directory-based_coherence_protocols">directory-based coherence protocols</a> were introduced in the 1990s. SMP systems are limited to a small number of CPUs while NUMA systems have been built with thousands of processors. Initially, multiprocessing was built using multiple discrete CPUs and boards to implement the interconnect between the processors. When the processors and their interconnect are all implemented on a single silicon chip, the technology is known as a <a title="Multi-core (computing)" href="http://en.wikipedia.org/wiki/Multi-core_%28computing%29">multi-core</a> microprocessor.</p>
<p>It was later recognized that finer-grain parallelism existed within a single program. A single program might have several threads (or functions) that could be executed separately or in parallel. Some of the earliest examples of this technology implemented <a title="Input/output" href="http://en.wikipedia.org/wiki/Input/output">input/output</a> processing such as <a title="Direct memory access" href="http://en.wikipedia.org/wiki/Direct_memory_access">direct memory access</a> as a separate thread from the computation thread. A more general approach to this technology was introduced in the 1970s when systems were designed to run multiple computation threads in parallel. This technology is known as <a title="Multi-threading" href="http://en.wikipedia.org/wiki/Multi-threading">multi-threading</a> (MT). This approach is considered more cost-effective than multiprocessing, as only a small number of components within a CPU is replicated in order to support MT as opposed to the entire CPU in the case of MP. In MT, the execution units and the memory system including the caches are shared among multiple threads. The downside of MT is that the hardware support for multithreading is more visible to software than that of MP, and thus supervisor software such as operating systems has to undergo larger changes to support MT. One type of MT that was implemented is known as block multithreading, where one thread is executed until it is stalled waiting for data to return from external memory. In this scheme, the CPU would then quickly switch to another thread which is ready to run, the switch often done in one CPU clock cycle. Another type of MT is known as <a title="Simultaneous multithreading" href="http://en.wikipedia.org/wiki/Simultaneous_multithreading">simultaneous multithreading</a>, where instructions of multiple threads are executed in parallel within one CPU clock cycle.</p>
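<p>Block multithreading as described above can be illustrated with a toy cycle-level model. It is a sketch under simplifying assumptions: every operation takes one cycle, a memory operation (<code>"m"</code>) blocks its thread for a fixed <code>miss_latency</code>, and switching threads is free. The function name and the operation encoding are hypothetical.</p>

```python
def run_block_mt(threads, miss_latency=4):
    """Return the cycles one core needs to finish all threads.

    threads: list of operation lists; "c" is a compute cycle and "m" is a
             memory access that blocks the thread for miss_latency cycles.
    """
    pcs = [0] * len(threads)     # next operation index per thread
    wake = [0] * len(threads)    # cycle at which each thread may run again
    cycle = 0
    current = 0
    while any(pc < len(t) for pc, t in zip(pcs, threads)):
        # Prefer the running thread; switch only when it is blocked or done.
        order = [current] + [i for i in range(len(threads)) if i != current]
        ready = [i for i in order
                 if pcs[i] < len(threads[i]) and wake[i] <= cycle]
        if not ready:
            cycle += 1           # every unfinished thread is waiting on memory
            continue
        current = ready[0]
        op = threads[current][pcs[current]]
        pcs[current] += 1
        cycle += 1
        if op == "m":
            wake[current] = cycle + miss_latency
    return max([cycle] + wake)


if __name__ == "__main__":
    work = ["c", "m", "c"]
    print(run_block_mt([work]), run_block_mt([work, work]))
```

<p>In this model one thread needs 7 cycles and two threads together need 9, rather than the 14 required if they ran back to back, because the second thread's work fills part of the first thread's memory wait.</p>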
<p>For several decades from the 1970s to the early 2000s, the focus in designing high performance general purpose CPUs was largely on achieving high ILP through technologies such as pipelining, caches, superscalar execution, and out-of-order execution. This trend culminated in large, power-hungry CPUs such as the Intel <a title="Pentium 4" href="http://en.wikipedia.org/wiki/Pentium_4">Pentium 4</a>. By the early 2000s, CPU designers were thwarted from achieving higher performance from ILP techniques due to the growing disparity between CPU operating frequencies and main memory operating frequencies as well as escalating CPU power dissipation owing to more esoteric ILP techniques.</p>
<p>CPU designers then borrowed ideas from commercial computing markets such as <a title="Transaction processing" href="http://en.wikipedia.org/wiki/Transaction_processing">transaction processing</a>, where the aggregate performance of multiple programs, also known as <a title="Throughput" href="http://en.wikipedia.org/wiki/Throughput">throughput</a> computing, was more important than the performance of a single thread or program.</p>
<p>This reversal of emphasis is evidenced by the proliferation of dual and multiple core CMP (chip-level multiprocessing) designs and notably, Intel's newer designs resembling its less superscalar <a title="Intel P6" href="http://en.wikipedia.org/wiki/Intel_P6">P6</a> architecture. Late designs in several processor families exhibit CMP, including the <a title="X86-64" href="http://en.wikipedia.org/wiki/X86-64">x86-64</a> <a title="Opteron" href="http://en.wikipedia.org/wiki/Opteron">Opteron</a> and <a title="Athlon 64 X2" href="http://en.wikipedia.org/wiki/Athlon_64_X2">Athlon 64 X2</a>, the <a title="SPARC" href="http://en.wikipedia.org/wiki/SPARC">SPARC</a> <a title="UltraSPARC T1" href="http://en.wikipedia.org/wiki/UltraSPARC_T1">UltraSPARC T1</a>, IBM <a title="POWER4" href="http://en.wikipedia.org/wiki/POWER4">POWER4</a> and <a title="POWER5" href="http://en.wikipedia.org/wiki/POWER5">POWER5</a>, as well as several <a title="Video game console" href="http://en.wikipedia.org/wiki/Video_game_console">video game console</a> CPUs like the <a title="Xbox 360" href="http://en.wikipedia.org/wiki/Xbox_360">Xbox 360</a>'s triple-core PowerPC design, and the <a title="PS3" href="http://en.wikipedia.org/wiki/PS3">PS3</a>'s 8-core <a title="Cell (microprocessor)" href="http://en.wikipedia.org/wiki/Cell_%28microprocessor%29">Cell microprocessor</a>.</p>
<p><a name="Data_parallelism"></a></p>
<h4><span>Data parallelism</span></h4>
<dl>
<dd>
<div><i>Main articles: <a title="Vector processor" href="http://en.wikipedia.org/wiki/Vector_processor">Vector processor</a> and <a title="SIMD" href="http://en.wikipedia.org/wiki/SIMD">SIMD</a></i></div></dd></dl>
<p>A less common but increasingly important paradigm of CPUs (and indeed, computing in general) deals with data parallelism. The processors discussed earlier are all referred to as some type of scalar device.<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-11">[12]</a></sup> As the name implies, vector processors deal with multiple pieces of data in the context of one instruction. This contrasts with scalar processors, which deal with one piece of data for every instruction. Using <a title="Flynn's taxonomy" href="http://en.wikipedia.org/wiki/Flynn%27s_taxonomy">Flynn's taxonomy</a>, these two schemes of dealing with data are generally referred to as <a title="SISD" href="http://en.wikipedia.org/wiki/SISD">SISD</a> (single instruction, single data) and <a title="SIMD" href="http://en.wikipedia.org/wiki/SIMD">SIMD</a> (single instruction, multiple data), respectively. The great utility in creating CPUs that deal with vectors of data lies in optimizing tasks that tend to require the same operation (for example, a sum or a <a title="Dot product" href="http://en.wikipedia.org/wiki/Dot_product">dot product</a>) to be performed on a large set of data. Some classic examples of these types of tasks are <a title="Multimedia" href="http://en.wikipedia.org/wiki/Multimedia">multimedia</a> applications (images, video, and sound), as well as many types of <a title="Scientific computing" href="http://en.wikipedia.org/wiki/Scientific_computing">scientific</a> and engineering tasks. Whereas a scalar CPU must complete the entire process of fetching, decoding, and executing each instruction and value in a set of data, a vector CPU can perform a single operation on a comparatively large set of data with one instruction. Of course, this is only possible when the application tends to require many steps which apply one operation to a large set of data.</p>
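<p>The difference between the two schemes can be sketched by counting the instructions each would issue for the same element-wise addition. This is a pure-Python model, not real vector hardware; the lane count of 4 and the function names are assumptions made for illustration.</p>

```python
def scalar_add(a, b):
    """SISD style: one add instruction per pair of elements."""
    result, instructions = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions


def vector_add(a, b, lanes=4):
    """SIMD style: one add instruction per group of `lanes` element pairs."""
    result, instructions = [], 0
    for i in range(0, len(a), lanes):
        result.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        instructions += 1
    return result, instructions


if __name__ == "__main__":
    a, b = list(range(8)), [10] * 8
    print(scalar_add(a, b)[1], vector_add(a, b)[1])
```

<p>Both functions produce the same sums for eight elements, but the scalar model issues 8 add instructions and the four-lane vector model issues 2.</p>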
<p>Most early vector CPUs, such as the <a title="Cray-1" href="http://en.wikipedia.org/wiki/Cray-1">Cray-1</a>, were associated almost exclusively with scientific research and <a title="Cryptography" href="http://en.wikipedia.org/wiki/Cryptography">cryptography</a> applications. However, as multimedia has largely shifted to digital media, the need for some form of SIMD in general-purpose CPUs has become significant. Shortly after <a title="Floating point unit" href="http://en.wikipedia.org/wiki/Floating_point_unit">floating point execution units</a> started to become commonplace to include in general-purpose processors, specifications for and implementations of SIMD execution units also began to appear for general-purpose CPUs. Some of these early SIMD specifications like Intel's <a title="MMX (instruction set)" href="http://en.wikipedia.org/wiki/MMX_%28instruction_set%29">MMX</a> were integer-only. This proved to be a significant impediment for some software developers, since many of the applications that benefit from SIMD primarily deal with <a title="Floating point" href="http://en.wikipedia.org/wiki/Floating_point">floating point</a> numbers. Progressively, these early designs were refined and remade into some of the common, modern SIMD specifications, which are usually associated with one ISA. Some notable modern examples are Intel's <a title="Streaming SIMD Extensions" href="http://en.wikipedia.org/wiki/Streaming_SIMD_Extensions">SSE</a> and the PowerPC-related <a title="AltiVec" href="http://en.wikipedia.org/wiki/AltiVec">AltiVec</a> (also known as VMX).<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-12">[13]</a></sup></p>]]></description>
		</item>
		    
		
		<item>
			<title>Central processing unit (3) [Repost]</title>
			<link>http://longrenrex.blog.sohu.com/96678227.html</link>
			<comments>http://longrenrex.blog.sohu.com/96678227.html#comment</comments>
			<dc:creator>龙居</dc:creator>
			<pubDate>Thu, 7 Aug 2008 20:14:17 +0800</pubDate>
			<category>Notebook</category>
			<guid>http://longrenrex.blog.sohu.com/96678227.html</guid>
			<description><![CDATA[<h2><span>CPU operation</span></h2>
<p>The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence of stored instructions called a program. The program is represented by a series of numbers that are kept in some kind of <a title="Memory (computers)" href="http://en.wikipedia.org/wiki/Memory_%28computers%29">computer memory</a>. There are four steps that nearly all von Neumann CPUs use in their operation: <b>fetch</b>, <b>decode</b>, <b>execute</b>, and <b>writeback</b>.</p>
<p>The first step, <b>fetch</b>, involves retrieving an <a title="Instruction (computer science)" href="http://en.wikipedia.org/wiki/Instruction_%28computer_science%29">instruction</a> (which is represented by a number or sequence of numbers) from program memory. The location in program memory is determined by a <a title="Program counter" href="http://en.wikipedia.org/wiki/Program_counter">program counter</a> (PC), which stores a number that identifies the current position in the program. In other words, the program counter keeps track of the CPU's place in the current program. After an instruction is fetched, the PC is incremented by the length of the instruction word in terms of memory units.<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-2">[3]</a></sup> Often the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches and pipeline architectures (see below).</p>
<p>The instruction that the CPU fetches from memory is used to determine what the CPU is to do. In the <b>decode</b> step, the instruction is broken up into parts that have significance to other portions of the CPU. The way in which the numerical instruction value is interpreted is defined by the CPU's instruction set architecture (<b>ISA</b>).<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-3">[4]</a></sup> Often, one group of numbers in the instruction, called the opcode, indicates which operation to perform. The remaining parts of the number usually provide information required for that instruction, such as operands for an addition operation. Such operands may be given as a constant value (called an immediate value), or as a place to locate a value: a <a title="Processor register" href="http://en.wikipedia.org/wiki/Processor_register">register</a> or a memory address, as determined by some addressing mode. In older designs the portions of the CPU responsible for instruction decoding were unchangeable hardware devices. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU. This microprogram is sometimes rewritable so that it can be modified to change the way the CPU decodes instructions even after it has been manufactured.</p>
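<p>Decoding can be sketched as splitting an instruction word into bit fields. The 16-bit layout below (4-bit opcode, 4-bit register number, 8-bit immediate) and the opcode table are invented for illustration and do not correspond to any real ISA.</p>

```python
# Hypothetical opcode table for an invented 16-bit instruction format.
OPCODES = {0x1: "LOAD_IMM", 0x2: "ADD", 0x3: "JUMP"}


def decode(word):
    """Split a 16-bit instruction word into (operation, register, immediate)."""
    opcode = (word >> 12) & 0xF     # bits 15..12 select the operation
    register = (word >> 8) & 0xF    # bits 11..8 name a register operand
    immediate = word & 0xFF         # bits 7..0 hold a constant operand
    return OPCODES.get(opcode, "UNKNOWN"), register, immediate


if __name__ == "__main__":
    print(decode(0x1A2B))
```

<p>Under this invented encoding, the word <code>0x1A2B</code> decodes to the operation <code>LOAD_IMM</code> with register 10 and immediate value 43.</p>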
<p>After the fetch and decode steps, the <b>execute</b> step is performed. During this step, various portions of the CPU are connected so they can perform the desired operation. If, for instance, an addition operation was requested, an arithmetic logic unit (<b>ALU</b>) will be connected to a set of inputs and a set of outputs. The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the circuitry to perform simple arithmetic and logical operations on the inputs (like addition and bitwise operations). If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags register may also be set.</p>
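<p>The addition case can be sketched as a function that returns both the truncated sum and a flag. This is a minimal model assuming unsigned operands and an 8-bit result; real CPUs typically distinguish a carry flag from a signed overflow flag, which this sketch does not, and the name <code>alu_add</code> is hypothetical.</p>

```python
def alu_add(a, b, bits=8):
    """Add two unsigned values and report whether the sum exceeded `bits` bits.

    Returns (result, overflow), where result is the sum truncated to `bits`
    bits and overflow is True when the full sum did not fit.
    """
    mask = (1 << bits) - 1      # largest representable value, e.g. 255
    total = a + b
    return total & mask, total > mask


if __name__ == "__main__":
    print(alu_add(1, 2), alu_add(200, 100))
```

<p>Adding 200 and 100 in 8 bits yields 44 with the flag set, because the true sum of 300 does not fit in the range 0 to 255.</p>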
<p>The final step, <b>writeback</b>, simply &quot;writes back&quot; the results of the execute step to some form of memory. Very often the results are written to some internal CPU register for quick access by subsequent instructions. In other cases results may be written to slower, but cheaper and larger, <a title="Random access memory" href="http://en.wikipedia.org/wiki/Random_access_memory">main memory</a>. Some types of instructions manipulate the program counter rather than directly produce result data. These are generally called &quot;jumps&quot; and facilitate behavior like <a title="Control flow" href="http://en.wikipedia.org/wiki/Control_flow#Loops">loops</a>, conditional program execution (through the use of a conditional jump), and <a title="Subroutine" href="http://en.wikipedia.org/wiki/Subroutine">functions</a> in programs.<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-4">[5]</a></sup> Many instructions will also change the state of digits in a &quot;flags&quot; register. These flags can be used to influence how a program behaves, since they often indicate the outcome of various operations. For example, one type of &quot;compare&quot; instruction considers two values and sets a number in the flags register according to which one is greater. This flag could then be used by a later jump instruction to determine program flow.</p>
<p>After the execution of the instruction and writeback of the resulting data, the entire process repeats, with the next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in the program counter. If the completed instruction was a jump, the program counter will be modified to contain the address of the instruction that was jumped to, and program execution continues normally. In more complex CPUs than the one described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally referred to as the &quot;Classic RISC pipeline,&quot; which in fact is quite common among the simple CPUs used in many electronic devices (often called microcontrollers). It largely ignores the important role of <a title="CPU cache" href="http://en.wikipedia.org/wiki/CPU_cache">CPU cache</a>, and therefore the <b>access</b> stage of the pipeline.</p>
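<p>The four steps can be tied together in a toy interpreter. This is a sketch of the cycle described in this section, not a model of any real processor: the instruction names, the four-register file, and the single zero flag are all invented for illustration, and instructions are stored as already-separated tuples rather than encoded numbers.</p>

```python
def run(program, max_steps=1000):
    """Execute a tiny invented instruction set and return the register file."""
    regs = [0] * 4
    flags = {"zero": False}
    pc = 0                                 # program counter
    for _ in range(max_steps):
        if pc >= len(program):
            break
        instr = program[pc]                # fetch
        pc += 1                            # PC now points at the next instruction
        op, a, b = instr                   # decode
        if op == "LOADI":                  # execute
            result = b
        elif op == "ADD":
            result = regs[a] + regs[b]
        elif op == "SUBI":
            result = regs[a] - b
        elif op == "JNZ":                  # jump: changes the PC, writes no data
            if not flags["zero"]:
                pc = a
            continue
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown operation: {op}")
        regs[a] = result                   # writeback
        flags["zero"] = (result == 0)      # flag consulted by a later jump
    return regs


# Sum 3 + 2 + 1 with a countdown loop.
PROGRAM = [
    ("LOADI", 0, 0),   # r0 = 0 (accumulator)
    ("LOADI", 1, 3),   # r1 = 3 (counter)
    ("ADD", 0, 1),     # r0 = r0 + r1
    ("SUBI", 1, 1),    # r1 = r1 - 1, updates the zero flag
    ("JNZ", 2, 0),     # loop back to the ADD while r1 is not zero
    ("HALT", 0, 0),
]

if __name__ == "__main__":
    print(run(PROGRAM))
```

<p>The loop runs three times and leaves 6 in register 0 and 0 in register 1, with the conditional jump reading the flag set by the preceding subtraction.</p>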
<p><a name="Design_and_implementation"></a></p>
<h2><span>Design and implementation</span></h2>
<dl>
<dd>
<div><i>Main article: <a title="CPU design" href="http://en.wikipedia.org/wiki/CPU_design">CPU design</a></i></div></dd></dl>
<table style="#" cellpadding="4" border="0">
<tbody>
<tr>
<th>Prerequisites</th></tr>
<tr>
<td><a title="Computer architecture" href="http://en.wikipedia.org/wiki/Computer_architecture">Computer architecture</a></td></tr>
<tr>
<td><a title="Digital circuit" href="http://en.wikipedia.org/wiki/Digital_circuit">Digital circuits</a></td></tr></tbody></table>
<p><a name="Integer_range"></a></p>
<h3><span>Integer range</span></h3>
<p>The way a <b>CPU</b> represents numbers is a design choice that affects the most basic ways in which the device functions. Some early digital computers used an electrical model of the common <a title="Decimal" href="http://en.wikipedia.org/wiki/Decimal">decimal</a> (base ten) <a title="Numeral system" href="http://en.wikipedia.org/wiki/Numeral_system">numeral system</a> to represent numbers internally. A few other computers have used more exotic numeral systems like <a title="Balanced ternary" href="http://en.wikipedia.org/wiki/Balanced_ternary">ternary</a> (base three). Nearly all modern CPUs represent numbers in <a title="Binary numeral system" href="http://en.wikipedia.org/wiki/Binary_numeral_system">binary</a> form, with each digit being represented by some two-valued physical quantity such as a &quot;high&quot; or &quot;low&quot; <a title="Volt" href="http://en.wikipedia.org/wiki/Volt">voltage</a>.<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-5">[6]</a></sup></p>
<div>
<div style="WIDTH: 252px"><a title="MOS 6502 microprocessor in a dual in-line package, an extremely popular 8-bit design." href="http://en.wikipedia.org/wiki/Image:MOS_6502AD_4585_top.jpg"><img height="91" alt="MOS 6502 microprocessor in a dual in-line package, an extremely popular 8-bit design." src="http://upload.wikimedia.org/wikipedia/commons/thumb/4/49/MOS_6502AD_4585_top.jpg/250px-MOS_6502AD_4585_top.jpg" width="250" border="0" /></a> 
<div>
<a title="MOS Technology 6502" href="http://en.wikipedia.org/wiki/MOS_Technology_6502">MOS 6502</a> microprocessor in a <a title="Dual in-line package" href="http://en.wikipedia.org/wiki/Dual_in-line_package">dual in-line package</a>, an extremely popular 8-bit design.</div></div></div>
<p>Related to number representation is the size and precision of numbers that a CPU can represent. In the case of a binary CPU, a <b>bit</b> refers to one significant place in the numbers a CPU deals with. The number of bits (or numeral places) a CPU uses to represent numbers is often called &quot;<a title="Word (computer science)" href="http://en.wikipedia.org/wiki/Word_%28computer_science%29">word size</a>&quot;, &quot;bit width&quot;, &quot;data path width&quot;, or &quot;integer precision&quot; when dealing with strictly integer numbers (as opposed to floating point). This number differs between architectures, and often within different parts of the very same CPU. For example, an <a title="8-bit" href="http://en.wikipedia.org/wiki/8-bit">8-bit</a> CPU deals with a range of numbers that can be represented by eight binary digits (each digit having two possible values), that is, 2<sup>8</sup> or 256 discrete numbers. In effect, integer size sets a hardware limit on the range of integers the software run by the CPU can utilize.<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-6">[7]</a></sup></p>
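<p>The relationship between bit width and integer range described above can be checked with a short sketch. The function names are hypothetical, and the signed case assumes two's complement encoding, which the text does not specify but which is the usual choice on binary CPUs.</p>

```python
def unsigned_range(bits):
    """Return (minimum, maximum, count) for an unsigned integer of the given width."""
    count = 2 ** bits                  # each extra bit doubles the number of values
    return 0, count - 1, count


def twos_complement_range(bits):
    """Return (minimum, maximum, count) for a signed integer.

    Assumes two's complement encoding (an assumption, not stated in the text).
    """
    half = 2 ** (bits - 1)
    return -half, half - 1, 2 ** bits


for width in (4, 8, 16, 32):
    low, high, count = unsigned_range(width)
    print(f"{width}-bit unsigned: {low}..{high} ({count} values)")
```

<p>For an 8-bit width this gives the 256 discrete values mentioned above: 0 to 255 unsigned, or -128 to 127 under the two's complement assumption.</p>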
<p>Integer range can also affect the number of locations in memory the CPU can <b>address</b> (locate). For example, if a binary CPU uses 32 bits to represent a memory address, and each memory address represents one <a title="Octet (computing)" href="http://en.wikipedia.org/wiki/Octet_%28computing%29">octet</a> (8 bits), the maximum quantity of memory that CPU can address is 2<sup>32</sup> octets, or 4 <a title="GiB" href="http://en.wikipedia.org/wiki/GiB">GiB</a>. This is a very simple view of CPU <a title="Address space" href="http://en.wikipedia.org/wiki/Address_space">address space</a>, and many designs use more complex addressing methods like <a title="Bank switching" href="http://en.wikipedia.org/wiki/Bank_switching">paging</a> in order to locate more memory than their integer range would allow with a flat address space.</p>
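<p>The flat address space arithmetic above can be sketched as follows. The helper name and the bytes_per_address parameter are illustrative assumptions; the sketch models only the simple flat case, not paging or bank switching.</p>

```python
GIB = 2 ** 30  # one gibibyte, in octets


def addressable_bytes(address_bits, bytes_per_address=1):
    """Return how many octets a flat address space of this width can reach."""
    return (2 ** address_bits) * bytes_per_address


# A 32-bit address with one octet per address reaches 4 GiB.
print(addressable_bytes(32) / GIB)
# A 16-bit address reaches only 64 KiB.
print(addressable_bytes(16) / 1024)
```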
<p>Higher levels of integer range require more structures to deal with the additional digits, and therefore more complexity, size, power usage, and general expense. It is not at all uncommon, therefore, to see 4- or 8-bit <a title="Microcontroller" href="http://en.wikipedia.org/wiki/Microcontroller">microcontrollers</a> used in modern applications, even though CPUs with much higher range (such as 16, 32, 64, even 128-bit) are available. The simpler microcontrollers are usually cheaper, use less power, and therefore dissipate less heat, all of which can be major design considerations for electronic devices. However, in higher-end applications, the benefits afforded by the extra range (most often the additional address space) are more significant and often affect design choices. To gain some of the advantages afforded by both lower and higher bit lengths, many CPUs are designed with different bit widths for different portions of the device. For example, the IBM <a title="System/370" href="http://en.wikipedia.org/wiki/System/370">System/370</a> used a CPU that was primarily 32 bit, but it used 128-bit precision inside its <a title="Floating point" href="http://en.wikipedia.org/wiki/Floating_point">floating point</a> units to facilitate greater accuracy and range in floating point numbers <span><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#endnote_Amdahl1964b">(Amdahl <i>et al.</i> 1964)</a></span>. Many later CPU designs use similar mixed bit width, especially when the processor is meant for general-purpose usage where a reasonable balance of integer and floating point capability is required.</p>
<p><a name="Clock_rate"></a></p>
<h3><span>Clock rate</span></h3>
<dl>
<dd>
<div><i>Main article: <a title="Clock rate" href="http://en.wikipedia.org/wiki/Clock_rate">Clock rate</a></i></div></dd></dl>
<p>Most CPUs, and indeed most <a title="Sequential logic" href="http://en.wikipedia.org/wiki/Sequential_logic">sequential logic</a> devices, are <a title="Synchronous circuit" href="http://en.wikipedia.org/wiki/Synchronous_circuit">synchronous</a> in nature.<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-7">[8]</a></sup> That is, they are designed and operate on assumptions about a synchronization signal. This signal, known as a <b>clock signal</b>, usually takes the form of a periodic <a title="Square wave" href="http://en.wikipedia.org/wiki/Square_wave">square wave</a>. By calculating the maximum time that electrical signals can move in various branches of a CPU's many circuits, the designers can select an appropriate <a title="Frequency" href="http://en.wikipedia.org/wiki/Frequency">period</a> for the clock signal.</p>
<p>This period must be longer than the amount of time it takes for a signal to move, or propagate, in the worst-case scenario. In setting the clock period to a value well above the worst-case propagation delay, it is possible to design the entire CPU and the way it moves data around the &quot;edges&quot; of the rising and falling clock signal. This has the advantage of simplifying the CPU significantly, both from a design perspective and a component-count perspective. However, it also carries the disadvantage that the entire CPU must wait on its slowest elements, even though some portions of it are much faster. This limitation has largely been compensated for by various methods of increasing CPU parallelism (see below).</p>
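<p>As a rough illustration of the rule above, the sketch below picks a clock period from a set of path delays. The path names, delay figures, and the 25% safety margin are invented for the example and do not describe any real CPU.</p>

```python
def min_clock_period_ns(path_delays_ns, margin=0.0):
    """Return a clock period that covers the slowest path, plus a safety margin."""
    worst = max(path_delays_ns.values())   # the whole CPU waits on this path
    return worst * (1.0 + margin)


def max_clock_mhz(period_ns):
    """Convert a clock period in nanoseconds to a frequency in megahertz."""
    return 1000.0 / period_ns


# Hypothetical worst-case propagation delays, in nanoseconds.
paths = {"alu_add": 8.0, "register_read": 2.5, "decode": 4.0}

period = min_clock_period_ns(paths, margin=0.25)
print(period, max_clock_mhz(period))
```

<p>Here the 8 ns adder path sets the period for every other unit, even though the register read would be ready in 2.5 ns. That is the cost of a single global clock.</p>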
<p>However, architectural improvements alone do not solve all of the drawbacks of globally synchronous CPUs. For example, a clock signal is subject to the same delays as any other electrical signal. Higher clock rates in increasingly complex CPUs make it more difficult to keep the clock signal in phase (synchronized) throughout the entire unit. This has led many modern CPUs to require multiple identical clock signals, so that no single signal is delayed enough to cause the CPU to malfunction. Another major issue as clock rates increase dramatically is the amount of heat dissipated by the CPU. The constantly changing clock causes many components to switch regardless of whether they are being used at that time. In general, a component that is switching uses more energy than one in a static state. Therefore, as clock rate increases, so does heat dissipation, and the CPU requires more effective cooling.</p>
<p>One method of dealing with the switching of unneeded components is called <a title="Clock gating" href="http://en.wikipedia.org/wiki/Clock_gating">clock gating</a>, which involves turning off the clock signal to unneeded components (effectively disabling them). However, this is often regarded as difficult to implement and therefore does not see common usage outside of very low-power designs.<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-8">[9]</a></sup> Another method of addressing some of the problems with a global clock signal is the removal of the clock signal altogether. While removing the global clock signal makes the design process considerably more complex in many ways, asynchronous (or clockless) designs carry marked advantages in power consumption and heat dissipation in comparison with similar synchronous designs. While somewhat uncommon, entire CPUs have been built without utilizing a global clock signal. Two notable examples of this are the <a title="ARM architecture" href="http://en.wikipedia.org/wiki/ARM_architecture">ARM</a> compliant <a title="AMULET microprocessor" href="http://en.wikipedia.org/wiki/AMULET_microprocessor">AMULET</a> and the <a title="MIPS architecture" href="http://en.wikipedia.org/wiki/MIPS_architecture">MIPS</a> R3000 compatible MiniMIPS. Rather than totally removing the clock signal, some CPU designs allow certain portions of the device to be asynchronous, such as using asynchronous <a title="Arithmetic logic unit" href="http://en.wikipedia.org/wiki/Arithmetic_logic_unit">ALUs</a> in conjunction with superscalar pipelining to achieve some arithmetic performance gains. While it is not altogether clear whether totally asynchronous designs can perform at a comparable or better level than their synchronous counterparts, it is evident that they do at least excel in simpler math operations. 
This, combined with their excellent power consumption and heat dissipation properties, makes them very suitable for <a title="Embedded computer" href="http://en.wikipedia.org/wiki/Embedded_computer">embedded computers</a> <span><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#endnote_Garside1999a">(Garside <i>et al.</i> 1999)</a></span>.</p>]]></description>
		</item>
		    
		
		<item>
			<title>Central processing unit (2) [repost]</title>
			<link>http://longrenrex.blog.sohu.com/96678144.html</link>
			<comments>http://longrenrex.blog.sohu.com/96678144.html#comment</comments>
			<dc:creator>龙居</dc:creator>
			<pubDate>Thu, 7 Aug 2008 20:13:22 +0800</pubDate>
			<category>记事本</category>
			<guid>http://longrenrex.blog.sohu.com/96678144.html</guid>
			<description><![CDATA[<h3><span>Discrete transistor and IC CPUs</span></h3>
<div>
<div style="WIDTH: 352px"><a title="CPU, core memory, and external bus interface of a DEC PDP-8/I. made of medium-scale integrated circuits" href="http://en.wikipedia.org/wiki/Image:PDP-8i_cpu.jpg"><img height="263" alt="CPU, core memory, and external bus interface of a DEC PDP-8/I. made of medium-scale integrated circuits" src="http://upload.wikimedia.org/wikipedia/commons/thumb/0/03/PDP-8i_cpu.jpg/350px-PDP-8i_cpu.jpg" width="350" border="0" /></a> 
<div>
CPU, <a title="Magnetic core memory" href="http://en.wikipedia.org/wiki/Magnetic_core_memory">core memory</a>, and <a title="Computer bus" href="http://en.wikipedia.org/wiki/Computer_bus">external bus</a> interface of a DEC <a title="PDP-8" href="http://en.wikipedia.org/wiki/PDP-8">PDP-8</a>/I, made of medium-scale integrated circuits.</div></div></div>
<p>The design complexity of CPUs increased as various technologies facilitated building smaller and more reliable electronic devices. The first such improvement came with the advent of the <a title="Transistor" href="http://en.wikipedia.org/wiki/Transistor">transistor</a>. Transistorized CPUs during the 1950s and 1960s no longer had to be built out of bulky, unreliable, and fragile switching elements like <a title="Vacuum tube" href="http://en.wikipedia.org/wiki/Vacuum_tube">vacuum tubes</a> and <a title="Relay" href="http://en.wikipedia.org/wiki/Relay">electrical relays</a>. With this improvement more complex and reliable CPUs were built onto one or several <a title="Printed circuit board" href="http://en.wikipedia.org/wiki/Printed_circuit_board">printed circuit boards</a> containing discrete (individual) components.</p>
<p>During this period, a method of manufacturing many transistors in a compact space gained popularity. The <a title="Integrated circuit" href="http://en.wikipedia.org/wiki/Integrated_circuit">integrated circuit</a> (<b>IC</b>) allowed a large number of transistors to be manufactured on a single <a title="Semiconductor" href="http://en.wikipedia.org/wiki/Semiconductor">semiconductor</a>-based <a title="Die (integrated circuit)" href="http://en.wikipedia.org/wiki/Die_%28integrated_circuit%29">die</a>, or &quot;chip.&quot; At first only very basic non-specialized digital circuits such as <a title="NOR gate" href="http://en.wikipedia.org/wiki/NOR_gate">NOR gates</a> were miniaturized into ICs. CPUs based upon these &quot;building block&quot; ICs are generally referred to as &quot;small-scale integration&quot; (<b>SSI</b>) devices. SSI ICs, such as the ones used in the <a title="Apollo guidance computer" href="http://en.wikipedia.org/wiki/Apollo_guidance_computer">Apollo guidance computer</a>, usually contained transistor counts numbering in multiples of ten. Building an entire CPU out of SSI ICs required thousands of individual chips, but the result still consumed much less space and power than earlier discrete transistor designs. As microelectronic technology advanced, an increasing number of transistors were placed on ICs, thus decreasing the quantity of individual ICs needed for a complete CPU. <b>MSI</b> and <b>LSI</b> (medium- and large-scale integration) ICs increased transistor counts to hundreds, and then thousands.</p>
<p>In 1964, <a title="IBM" href="http://en.wikipedia.org/wiki/IBM">IBM</a> introduced its <a title="System/360" href="http://en.wikipedia.org/wiki/System/360">System/360</a> computer architecture, which was used in a series of computers that could run the same programs at different speeds and levels of performance. This was significant at a time when most electronic computers were incompatible with one another, even those made by the same manufacturer. To facilitate this improvement, IBM utilized the concept of a <a title="Microprogram" href="http://en.wikipedia.org/wiki/Microprogram">microprogram</a> (often called &quot;microcode&quot;), which still sees widespread usage in modern CPUs <span><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#endnote_Amdahl1964a">(Amdahl <i>et al.</i> 1964)</a></span>. The System/360 architecture was so popular that it dominated the <a title="Mainframe computer" href="http://en.wikipedia.org/wiki/Mainframe_computer">mainframe computer</a> market for decades and left a legacy that is still continued by similar modern computers like the IBM <a title="ZSeries" href="http://en.wikipedia.org/wiki/ZSeries">zSeries</a>. In the same year (1964), <a title="Digital Equipment Corporation" href="http://en.wikipedia.org/wiki/Digital_Equipment_Corporation">Digital Equipment Corporation</a> (DEC) introduced another influential computer aimed at the scientific and research markets, the <a title="PDP-8" href="http://en.wikipedia.org/wiki/PDP-8">PDP-8</a>. DEC would later introduce the extremely popular <a title="PDP-11" href="http://en.wikipedia.org/wiki/PDP-11">PDP-11</a> line that originally was built with SSI ICs but was eventually implemented with LSI components once these became practical. 
In stark contrast with its SSI and MSI predecessors, the first LSI implementation of the PDP-11 contained a CPU composed of only four LSI integrated circuits <span><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#endnote_dec1975a">(Digital Equipment Corporation 1975)</a></span>.</p>
<p>Transistor-based computers had several distinct advantages over their predecessors. Aside from facilitating increased reliability and lower power consumption, transistors also allowed CPUs to operate at much higher speeds because of the short switching time of a transistor in comparison to a tube or relay. Thanks to both the increased reliability as well as the dramatically increased speed of the switching elements (which were almost exclusively transistors by this time), CPU clock rates in the tens of megahertz were obtained during this period. Additionally, while discrete transistor and IC CPUs were in heavy usage, new high-performance designs like <a title="SIMD" href="http://en.wikipedia.org/wiki/SIMD">SIMD</a> (Single Instruction Multiple Data) <a title="Vector processor" href="http://en.wikipedia.org/wiki/Vector_processor">vector processors</a> began to appear. These early experimental designs later gave rise to the era of specialized <a title="Supercomputer" href="http://en.wikipedia.org/wiki/Supercomputer">supercomputers</a> like those made by <a title="Cray Inc." href="http://en.wikipedia.org/wiki/Cray_Inc.">Cray Inc.</a></p>
<p><a name="Microprocessors"></a></p>
<h3><span>Microprocessors</span></h3>
<dl>
<dd>
<div><i>Main article: <a title="Microprocessor" href="http://en.wikipedia.org/wiki/Microprocessor">Microprocessor</a></i></div></dd></dl>
<div>
<div style="WIDTH: 252px"><a title="The integrated circuit from an Intel 8742, an 8-bit microcontroller that includes a CPU running at 12 MHz, 128 bytes of RAM, 2048 bytes of EPROM, and I/O in the same chip." href="http://en.wikipedia.org/wiki/Image:153056995_5ef8b01016_o.jpg"><img height="193" alt="The integrated circuit from an Intel 8742, an 8-bit microcontroller that includes a CPU running at 12 MHz, 128 bytes of RAM, 2048 bytes of EPROM, and I/O in the same chip." src="http://upload.wikimedia.org/wikipedia/commons/thumb/c/c7/153056995_5ef8b01016_o.jpg/250px-153056995_5ef8b01016_o.jpg" width="250" border="0" /></a> 
<div>
The <a title="Integrated circuit" href="http://en.wikipedia.org/wiki/Integrated_circuit">integrated circuit</a> from an <a title="Intel" href="http://en.wikipedia.org/wiki/Intel">Intel</a> 8742, an 8-bit microcontroller that includes a <a title="CPU" href="http://en.wikipedia.org/wiki/CPU">CPU</a> running at 12 MHz, 128 bytes of <a title="RAM" href="http://en.wikipedia.org/wiki/RAM">RAM</a>, 2048 bytes of <a title="EPROM" href="http://en.wikipedia.org/wiki/EPROM">EPROM</a>, and <a title="Input/output" href="http://en.wikipedia.org/wiki/Input/output">I/O</a> in the same chip.</div></div></div>
<div>
<div style="WIDTH: 252px"><a title="Intel 80486DX2 microprocessor in a ceramic PGA package." href="http://en.wikipedia.org/wiki/Image:Intel_80486DX2_bottom.jpg"><img height="205" alt="Intel 80486DX2 microprocessor in a ceramic PGA package." src="http://upload.wikimedia.org/wikipedia/commons/thumb/e/e7/Intel_80486DX2_bottom.jpg/250px-Intel_80486DX2_bottom.jpg" width="250" border="0" /></a> 
<div>
<a title="Intel 80486DX2" href="http://en.wikipedia.org/wiki/Intel_80486DX2">Intel 80486DX2</a> microprocessor in a ceramic <a title="Pin grid array" href="http://en.wikipedia.org/wiki/Pin_grid_array">PGA</a> package.</div></div></div>
<p>The introduction of the <a title="Microprocessor" href="http://en.wikipedia.org/wiki/Microprocessor">microprocessor</a> in the 1970s significantly affected the design and implementation of CPUs. Since the introduction of the first microprocessor (the <a title="Intel 4004" href="http://en.wikipedia.org/wiki/Intel_4004">Intel 4004</a>) in 1971 and the first widely used microprocessor (the <a title="Intel 8080" href="http://en.wikipedia.org/wiki/Intel_8080">Intel 8080</a>) in 1974, this class of CPUs has almost completely overtaken all other central processing unit implementation methods. Mainframe and minicomputer manufacturers of the time launched proprietary IC development programs to upgrade their older <a title="Computer architecture" href="http://en.wikipedia.org/wiki/Computer_architecture">computer architectures</a>, and eventually produced microprocessors whose instruction sets were backward-compatible with their older hardware and software. With the advent and eventual vast success of the now ubiquitous <a title="Personal computer" href="http://en.wikipedia.org/wiki/Personal_computer">personal computer</a>, the term &quot;CPU&quot; is now applied almost exclusively to microprocessors.</p>
<p>Previous generations of CPUs were implemented as discrete components and numerous small <a title="Integrated circuit" href="http://en.wikipedia.org/wiki/Integrated_circuit">integrated circuits</a> (ICs) on one or more circuit boards. Microprocessors, on the other hand, are CPUs manufactured on a very small number of ICs; usually just one. The overall smaller CPU size as a result of being implemented on a single die means faster switching time because of physical factors like decreased gate parasitic <a title="Capacitance" href="http://en.wikipedia.org/wiki/Capacitance">capacitance</a>. This has allowed synchronous microprocessors to have clock rates ranging from tens of megahertz to several gigahertz. Additionally, as the ability to construct exceedingly small transistors on an IC has increased, the complexity and number of transistors in a single CPU has increased dramatically. This widely observed trend is described by <a title="Moore's law" href="http://en.wikipedia.org/wiki/Moore%27s_law">Moore's law</a>, which has proven to be a fairly accurate predictor of the growth of CPU (and other IC) complexity to date.</p>
<p>While the complexity, size, construction, and general form of CPUs have changed drastically over the past sixty years, it is notable that the basic design and function has not changed much at all. Almost all common CPUs today can be very accurately described as von Neumann stored-program machines. As the aforementioned Moore's law continues to hold true, concerns have arisen about the limits of integrated circuit transistor technology. Extreme miniaturization of electronic gates is causing the effects of phenomena like <a title="Electromigration" href="http://en.wikipedia.org/wiki/Electromigration">electromigration</a> and <a title="Subthreshold leakage" href="http://en.wikipedia.org/wiki/Subthreshold_leakage">subthreshold leakage</a> to become much more significant. These newer concerns are among the many factors causing researchers to investigate new methods of computing such as the <a title="Quantum computer" href="http://en.wikipedia.org/wiki/Quantum_computer">quantum computer</a>, as well as to expand the usage of <a title="Parallel computing" href="http://en.wikipedia.org/wiki/Parallel_computing">parallelism</a> and other methods that extend the usefulness of the classical von Neumann model.</p>]]></description>
		</item>
		    
		
		<item>
			<title>Central processing unit (1) [repost]</title>
			<link>http://longrenrex.blog.sohu.com/96678065.html</link>
			<comments>http://longrenrex.blog.sohu.com/96678065.html#comment</comments>
			<dc:creator>龙居</dc:creator>
			<pubDate>Thu, 7 Aug 2008 20:12:41 +0800</pubDate>
			<category>记事本</category>
			<guid>http://longrenrex.blog.sohu.com/96678065.html</guid>
			<description><![CDATA[<div>
<div>&quot;CPU&quot; redirects here. For other uses, see <a title="CPU (disambiguation)" href="http://en.wikipedia.org/wiki/CPU_%28disambiguation%29">CPU (disambiguation)</a>.</div>
<div>
<div style="WIDTH: 252px"><a title="Die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging." href="http://en.wikipedia.org/wiki/Image:80486dx2-large.jpg"><img height="187" alt="Die of an Intel 80486DX2 microprocessor (actual size: 12×6.75 mm) in its packaging." src="http://upload.wikimedia.org/wikipedia/commons/thumb/0/02/80486dx2-large.jpg/250px-80486dx2-large.jpg" width="250" border="0" /></a> 
<div>
<a title="Die (integrated circuit)" href="http://en.wikipedia.org/wiki/Die_%28integrated_circuit%29">Die</a> of an <a title="Intel 80486DX2" href="http://en.wikipedia.org/wiki/Intel_80486DX2">Intel 80486DX2</a> microprocessor (actual size: 12&times;6.75&nbsp;mm) in its packaging.</div></div></div>
<p>A <b>central processing unit</b> (<b>CPU</b>), sometimes simply called a <b>processor</b>, is a class of logic machine that can execute <a title="Computer program" href="http://en.wikipedia.org/wiki/Computer_program">computer programs</a>. This broad definition can easily be applied to many early computers that existed long before the term &quot;CPU&quot; came into widespread usage. The term itself and its initialism have been in use in the computer industry at least since the early 1960s <span><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#endnote_weik1961a">(Weik 1961)</a></span>. The form, design and implementation of CPUs have changed dramatically since the earliest examples, but their fundamental operation has remained much the same.</p>
<p>Early CPUs were custom-designed as a part of a larger, sometimes one-of-a-kind, computer. However, this costly method of <a title="Designing" href="http://en.wikipedia.org/wiki/Designing">designing</a> custom CPUs for a particular application has largely given way to the development of mass-produced processors that are suited for one or many purposes. This standardization trend generally began in the era of discrete <a title="Transistor" href="http://en.wikipedia.org/wiki/Transistor">transistor</a> <a title="Mainframe computer" href="http://en.wikipedia.org/wiki/Mainframe_computer">mainframes</a> and <a title="Minicomputer" href="http://en.wikipedia.org/wiki/Minicomputer">minicomputers</a> and has rapidly accelerated with the popularization of the <a title="Integrated circuit" href="http://en.wikipedia.org/wiki/Integrated_circuit">integrated circuit</a> (IC). The IC has allowed increasingly complex CPUs to be designed and manufactured to tolerances on the order of <a title="Nanometer" href="http://en.wikipedia.org/wiki/Nanometer">nanometers</a>. Both the miniaturization and standardization of CPUs have increased the presence of these digital devices in modern life far beyond the limited application of dedicated computing machines. Modern microprocessors appear in everything from <a title="Automobile" href="http://en.wikipedia.org/wiki/Automobile">automobiles</a> to <a title="Cell phone" href="http://en.wikipedia.org/wiki/Cell_phone">cell phones</a> to children's toys.</p>
<h2><span>History of CPUs</span></h2>
<dl>
<dd>
<div><i>Main article: <a title="History of general purpose CPUs" href="http://en.wikipedia.org/wiki/History_of_general_purpose_CPUs">History of general purpose CPUs</a></i></div></dd></dl>
<div>
<div style="WIDTH: 252px"><a title="EDVAC, one of the first electronic stored program computers." href="http://en.wikipedia.org/wiki/Image:Edvac.jpg"><img height="324" alt="EDVAC, one of the first electronic stored program computers." src="http://upload.wikimedia.org/wikipedia/commons/thumb/1/17/Edvac.jpg/250px-Edvac.jpg" width="250" border="0" /></a> 
<div>
<a title="EDVAC" href="http://en.wikipedia.org/wiki/EDVAC">EDVAC</a>, one of the first electronic stored-program computers.</div></div></div>
<p>Prior to the advent of machines that resemble today's CPUs, computers such as the <a title="ENIAC" href="http://en.wikipedia.org/wiki/ENIAC">ENIAC</a> had to be physically rewired in order to perform different tasks. These machines are often referred to as &quot;fixed-program computers,&quot; since they had to be physically reconfigured in order to run a different program. Since the term &quot;CPU&quot; is generally defined as a software (computer program) execution device, the earliest devices that could rightly be called CPUs came with the advent of the stored-program computer.</p>
<p>The idea of a stored-program computer was already present during ENIAC's design, but was initially omitted so the machine could be finished sooner. On June 30, 1945, before ENIAC was even completed, mathematician <a title="John von Neumann" href="http://en.wikipedia.org/wiki/John_von_Neumann">John von Neumann</a> distributed the paper entitled &quot;<a title="First Draft of a Report on the EDVAC" href="http://en.wikipedia.org/wiki/First_Draft_of_a_Report_on_the_EDVAC">First Draft of a Report on the EDVAC</a>.&quot; It outlined the design of a stored-program computer that would eventually be completed in August 1949 <span><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#endnote_vonNeumann1945a">(von Neumann 1945)</a></span>. EDVAC was designed to perform a certain number of instructions (or operations) of various types. These instructions could be combined to create useful programs for the EDVAC to run. Significantly, the programs written for EDVAC were stored in high-speed <a title="Memory (computers)" href="http://en.wikipedia.org/wiki/Memory_%28computers%29">computer memory</a> rather than specified by the physical wiring of the computer. This overcame a severe limitation of ENIAC, which was the large amount of time and effort it took to reconfigure the computer to perform a new task. With von Neumann's design, the program, or software, that EDVAC ran could be changed simply by changing the contents of the computer's memory. <sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-0">[1]</a></sup></p>
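<p>The stored-program idea can be illustrated with a toy machine. The four-instruction set below is entirely hypothetical and is not EDVAC's; it only shows that changing the contents of memory changes the program, with no change to the machine itself.</p>

```python
def run(memory):
    """Execute the program held in memory and return the accumulator."""
    acc = 0   # accumulator register
    pc = 0    # program counter: index of the next instruction in memory
    while True:
        op, arg = memory[pc]   # fetch the instruction from memory
        pc += 1
        if op == "LOAD":
            acc = arg
        elif op == "ADD":
            acc += arg
        elif op == "MUL":
            acc *= arg
        elif op == "HALT":
            return acc
        else:
            raise ValueError(f"unknown instruction: {op}")


# The program is ordinary data in memory.
memory = [("LOAD", 2), ("ADD", 3), ("HALT", 0)]
first = run(memory)

# "Reprogramming" means rewriting one memory cell; run() is unchanged.
memory[1] = ("MUL", 3)
second = run(memory)

print(first, second)
```

<p>A fixed-program machine would correspond to hard-coding the instruction sequence inside run() itself, so that every new task required rewriting the machine.</p>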
<p>While von Neumann is most often credited with the design of the stored-program computer because of his design of EDVAC, others before him such as <a title="Konrad Zuse" href="http://en.wikipedia.org/wiki/Konrad_Zuse">Konrad Zuse</a> had suggested similar ideas. Additionally, the so-called <a title="Harvard architecture" href="http://en.wikipedia.org/wiki/Harvard_architecture">Harvard architecture</a> of the <a title="Harvard Mark I" href="http://en.wikipedia.org/wiki/Harvard_Mark_I">Harvard Mark I</a>, which was completed before EDVAC, also utilized a stored-program design using <a title="Punched tape" href="http://en.wikipedia.org/wiki/Punched_tape">punched paper tape</a> rather than electronic memory. The key difference between the von Neumann and Harvard architectures is that the latter separates the storage and treatment of CPU instructions and data, while the former uses the same memory space for both. Most modern CPUs are primarily von Neumann in design, but elements of the Harvard architecture are commonly seen as well.</p>
<p>Being <a title="Digital" href="http://en.wikipedia.org/wiki/Digital">digital</a> devices, all CPUs deal with discrete states and therefore require some kind of switching elements to differentiate between and change these states. Prior to commercial acceptance of the transistor, <a title="Relay" href="http://en.wikipedia.org/wiki/Relay">electrical relays</a> and <a title="Vacuum tube" href="http://en.wikipedia.org/wiki/Vacuum_tube">vacuum tubes</a> (thermionic valves) were commonly used as switching elements. Although these had distinct speed advantages over earlier, purely mechanical designs, they were unreliable for various reasons. For example, building <a title="Direct current" href="http://en.wikipedia.org/wiki/Direct_current">direct current</a> <a title="Sequential logic" href="http://en.wikipedia.org/wiki/Sequential_logic">sequential logic</a> circuits out of relays requires additional hardware to cope with the problem of <a title="Switch" href="http://en.wikipedia.org/wiki/Switch#Contact_bounce">contact bounce</a>. While vacuum tubes do not suffer from contact bounce, they must heat up before becoming fully operational and eventually stop functioning altogether.<sup><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#cite_note-1">[2]</a></sup> Usually, when a tube failed, the CPU would have to be diagnosed to locate the failing component so it could be replaced. Therefore, early electronic (vacuum tube based) computers were generally faster but less reliable than electromechanical (relay based) computers.</p>
<p>Tube computers like <a title="EDVAC" href="http://en.wikipedia.org/wiki/EDVAC">EDVAC</a> tended to average eight hours between failures, whereas relay computers like the (slower, but earlier) <a title="Harvard Mark I" href="http://en.wikipedia.org/wiki/Harvard_Mark_I">Harvard Mark I</a> failed very rarely <span><a title="" href="http://en.wikipedia.org/wiki/Central_processing_unit#endnote_weik1961b">(Weik 1961:238)</a></span>. In the end, tube based CPUs became dominant because the significant speed advantages afforded generally outweighed the reliability problems. Most of these early synchronous CPUs ran at low <a title="Clock rate" href="http://en.wikipedia.org/wiki/Clock_rate">clock rates</a> compared to modern microelectronic designs (see below for a discussion of clock rate). Clock signal frequencies ranging from 100 <a title="Hertz" href="http://en.wikipedia.org/wiki/Hertz">kHz</a> to 4&nbsp;MHz were very common at this time, limited largely by the speed of the switching devices they were built with.</p></div>]]></description>
		</item>
		    
		
		<item>
			<title>JVM Parameter Settings Memo [Original]</title>
			<link>http://longrenrex.blog.sohu.com/95868786.html</link>
			<comments>http://longrenrex.blog.sohu.com/95868786.html#comment</comments>
			<dc:creator>龙居</dc:creator>
			<pubDate>Wed, 30 Jul 2008 09:57:53 +0800</pubDate>
			<category>JAVA</category>
			<guid>http://longrenrex.blog.sohu.com/95868786.html</guid>
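			<description><![CDATA[<p>&nbsp;&nbsp;&nbsp; I have been stress-testing a system lately, and raising the load required a few JVM settings, so I am writing them down here for future reference.</p>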
<p><strong>Heap Tuning Parameters</strong></p>
<p>Maximum heap size depends on maximum address space per process. The following table shows the maximum per-process address values for various platforms:</p>
<p><a name="gacna"></a>Table 4&ndash;1 Maximum Address Space Per Process 
<table cellspacing="0" cellpadding="10" border="2">
<tbody>
<tr>
<td valign="top" align="left">
<p>Operating System </p></td>
<td valign="top" align="left">
<p>Maximum Address Space Per Process </p></td></tr>
<tr>
<td valign="top" align="left">
<p>Redhat Linux 32 bit</p></td>
<td valign="top" align="left">
<p>2 GB</p></td></tr>
<tr>
<td valign="top" align="left">
<p>Redhat Linux 64 bit</p></td>
<td valign="top" align="left">
<p>3 GB</p></td></tr>
<tr>
<td valign="top" align="left">
<p>Windows 98/2000/NT/Me/XP</p></td>
<td valign="top" align="left">
<p>2 GB</p></td></tr>
<tr>
<td valign="top" align="left">
<p>Solaris x86 (32 bit)</p></td>
<td valign="top" align="left">
<p>4 GB</p></td></tr>
<tr>
<td valign="top" align="left">
<p>Solaris 32 bit</p></td>
<td valign="top" align="left">
<p>4 GB</p></td></tr>
<tr>
<td valign="top" align="left">
<p>Solaris 64 bit</p></td>
<td valign="top" align="left">
<p>Terabytes</p></td></tr></tbody></table></p>
<p></p>
<p>Maximum heap space is always smaller than maximum address space per process, because the process also needs space for stack, libraries, and so on. To determine the maximum heap space that can be allocated, use a profiling tool to examine the way memory is used. Gauge the maximum stack space the process uses and the amount of memory taken up by libraries and other memory structures. The difference between the maximum address space and the total of those values is the amount of memory that can be allocated to the heap.</p>
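<p>The subtraction described above can be sketched in a few lines of Java. This is only an illustration: <tt>HeapBudget</tt> and its parameters are hypothetical names, and the figures passed in are assumed values, not measurements. <tt>Runtime.maxMemory()</tt> is the standard call for checking what the running JVM was actually given.</p>

```java
// Hypothetical sizing helper; the numbers used in main() are assumed, not measured.
class HeapBudget {
    /** Largest heap (in MB) that still fits after stack and native usage are subtracted. */
    static long maxHeapMb(long addressSpaceMb, long stackMb, long nativeMb) {
        long heap = addressSpaceMb - stackMb - nativeMb;
        return Math.max(heap, 0);
    }

    public static void main(String[] args) {
        // Assumed example: 4 GB address space, 256 MB of stacks, 256 MB of libraries.
        System.out.println("Heap budget: " + maxHeapMb(4096, 256, 256) + " MB");

        // What this JVM was actually allowed to use (reflects -Xmx).
        long mb = 1024L * 1024L;
        System.out.println("Runtime max heap: " + Runtime.getRuntime().maxMemory() / mb + " MB");
    }
}
```

<p>In practice the stack and library figures would come from a profiling tool, as the paragraph above says.</p>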
<p>You can improve performance by increasing your heap size or using a different garbage collector. In general, for long-running server applications, use the J2SE throughput collector on machines with multiple processors (<tt>-XX:+AggressiveHeap</tt>) and as large a heap as you can fit in the free memory of your machine.</p>
</a>&nbsp;</p>">
<h3>Heap Tuning Parameters</h3>
<p>You can control the heap size with the following JVM parameters:</p>
</a>&nbsp;</p>">
<ul>
<li>
<p><tt>-Xms</tt>value </p></li>
<li>
<p><tt>-Xmx</tt>value </p></li>
<li>
<p><tt>-XX:MinHeapFreeRatio=</tt>minimum </p></li>
<li>
<p><tt>-XX:MaxHeapFreeRatio=</tt>maximum </p></li>
<li>
<p><tt>-XX:NewRatio=</tt>ratio </p></li>
<li>
<p><tt>-XX:NewSize=</tt>size </p></li>
<li>
<p><tt>-XX:MaxNewSize=</tt>size </p></li>
<li>
<p><tt>-XX:+AggressiveHeap</tt> </p></li></ul><span>
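<p>To confirm which of these heap options a running JVM was really started with, one option is to read them back through the standard <tt>RuntimeMXBean</tt>. A minimal sketch; the class name <tt>HeapFlags</tt> and the helper <tt>isHeapFlag</tt> are my own names for illustration.</p>

```java
import java.lang.management.ManagementFactory;

class HeapFlags {
    /** True for the heap-sizing options listed above. */
    static boolean isHeapFlag(String arg) {
        return arg.startsWith("-Xms") || arg.startsWith("-Xmx")
            || arg.startsWith("-XX:MinHeapFreeRatio=") || arg.startsWith("-XX:MaxHeapFreeRatio=")
            || arg.startsWith("-XX:NewRatio=") || arg.startsWith("-XX:NewSize=")
            || arg.startsWith("-XX:MaxNewSize=") || arg.equals("-XX:+AggressiveHeap");
    }

    public static void main(String[] args) {
        // Arguments passed to the JVM itself, not to main().
        String[] jvmArgs = ManagementFactory.getRuntimeMXBean()
                .getInputArguments().toArray(new String[0]);
        for (String arg : jvmArgs) {
            if (isHeapFlag(arg)) {
                System.out.println(arg);
            }
        }
    }
}
```

<p>Started as <tt>java -Xms64m -Xmx256m HeapFlags</tt>, it should echo those two options back.</p>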
<p><strong>Behavioral Options</strong></p>
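<table cellspacing="0" cellpadding="10" border="2">
<tbody>
<tr><td valign="top" align="left"><strong>Option and Default Value</strong></td><td valign="top" align="left"><strong>Description</strong></td></tr>
<tr><td valign="top" align="left"><tt>-XX:-AllowUserSignalHandlers</tt></td><td valign="top" align="left">Do not complain if the application installs signal handlers. (Relevant to Solaris and Linux only.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:AltStackSize=16384</tt></td><td valign="top" align="left">Alternate signal stack size (in Kbytes). (Relevant to Solaris only, removed from 5.0.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-DisableExplicitGC</tt></td><td valign="top" align="left">Disable calls to System.gc(); the JVM still performs garbage collection when necessary.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+FailOverToOldVerifier</tt></td><td valign="top" align="left">Fail over to old verifier when the new type checker fails. (Introduced in 6.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+HandlePromotionFailure</tt></td><td valign="top" align="left">The youngest generation collection does not require a guarantee of full promotion of all live objects. (Introduced in 1.4.2 update 11.) [5.0 and earlier: false.]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+MaxFDLimit</tt></td><td valign="top" align="left">Bump the number of file descriptors to max. (Relevant to Solaris only.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:PreBlockSpin=10</tt></td><td valign="top" align="left">Spin count variable for use with -XX:+UseSpinning. Controls the maximum spin iterations allowed before entering operating system thread synchronization code. (Introduced in 1.4.2.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-RelaxAccessControlCheck</tt></td><td valign="top" align="left">Relax the access control checks in the verifier. (Introduced in 6.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+ScavengeBeforeFullGC</tt></td><td valign="top" align="left">Do young generation GC prior to a full GC. (Introduced in 1.4.1.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseAltSigs</tt></td><td valign="top" align="left">Use alternate signals instead of SIGUSR1 and SIGUSR2 for VM internal signals. (Introduced in 1.3.1 update 9, 1.4.1. Relevant to Solaris only.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseBoundThreads</tt></td><td valign="top" align="left">Bind user level threads to kernel threads. (Relevant to Solaris only.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-UseConcMarkSweepGC</tt></td><td valign="top" align="left">Use concurrent mark-sweep collection for the old generation. (Introduced in 1.4.1.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseGCOverheadLimit</tt></td><td valign="top" align="left">Use a policy that limits the proportion of the VM's time that is spent in GC before an OutOfMemory error is thrown. (Introduced in 6.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseLWPSynchronization</tt></td><td valign="top" align="left">Use LWP-based instead of thread based synchronization. (Introduced in 1.4.0. Relevant to Solaris only.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-UseParallelGC</tt></td><td valign="top" align="left">Use parallel garbage collection for scavenges. (Introduced in 1.4.1.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-UseParallelOldGC</tt></td><td valign="top" align="left">Use parallel garbage collection for the full collections. Enabling this option automatically sets -XX:+UseParallelGC. (Introduced in 5.0 update 6.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-UseSerialGC</tt></td><td valign="top" align="left">Use serial garbage collection. (Introduced in 5.0.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-UseSpinning</tt></td><td valign="top" align="left">Enable naive spinning on Java monitor before entering operating system thread synchronization code. (Relevant to 1.4.2 and 5.0 only.) [1.4.2, multi-processor Windows platforms: true]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseTLAB</tt></td><td valign="top" align="left">Use thread-local object allocation. (Introduced in 1.4.0, known as UseTLE prior to that.) [1.4.2 and earlier, x86 or with -client: false]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseSplitVerifier</tt></td><td valign="top" align="left">Use the new type checker with StackMapTable attributes. (Introduced in 5.0.) [5.0: false]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseThreadPriorities</tt></td><td valign="top" align="left">Use native thread priorities.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseVMInterruptibleIO</tt></td><td valign="top" align="left">Thread interrupt before or with EINTR for I/O operations results in OS_INTRPT. (Introduced in 6. Relevant to Solaris only.)</td></tr>
</tbody></table>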
<p>--------------------------------------------------------------------------------</p>
<p><strong>Performance Options</strong></p>
<table cellspacing="0" cellpadding="10" border="2">
<tbody>
<tr><td valign="top" align="left"><strong>Option and Default Value</strong></td><td valign="top" align="left"><strong>Description</strong></td></tr>
<tr><td valign="top" align="left"><tt>-XX:+AggressiveOpts</tt></td><td valign="top" align="left">Turn on point performance compiler optimizations that are expected to be default in upcoming releases. (Introduced in 5.0 update 6.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:CompileThreshold=10000</tt></td><td valign="top" align="left">Number of method invocations/branches before compiling. [-client: 1,500]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:LargePageSizeInBytes=4m</tt></td><td valign="top" align="left">Sets the large page size used for the Java heap. (Introduced in 1.4.0 update 1.) [amd64: 2m.]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:MaxHeapFreeRatio=70</tt></td><td valign="top" align="left">Maximum percentage of heap free after GC to avoid shrinking.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:MaxNewSize=size</tt></td><td valign="top" align="left">Maximum size of new generation (in bytes). Since 1.4, MaxNewSize is computed as a function of NewRatio. [1.3.1 Sparc: 32m; 1.3.1 x86: 2.5m.]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:MaxPermSize=64m</tt></td><td valign="top" align="left">Size of the Permanent Generation. [5.0 and newer: 64 bit VMs are scaled 30% larger; 1.4 amd64: 96m; 1.3.1 -client: 32m.]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:MinHeapFreeRatio=40</tt></td><td valign="top" align="left">Minimum percentage of heap free after GC to avoid expansion.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:NewRatio=2</tt></td><td valign="top" align="left">Ratio of old to new generation sizes (2 means the old generation is twice the size of the new one). [Sparc -client: 8; x86 -server: 8; x86 -client: 12; -client in 1.3: 4.]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:NewSize=2.125m</tt></td><td valign="top" align="left">Default size of new generation (in bytes). [5.0 and newer: 64 bit VMs are scaled 30% larger; x86: 1m; x86, 5.0 and older: 640k]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:ReservedCodeCacheSize=32m</tt></td><td valign="top" align="left">Reserved code cache size (in bytes), that is, the maximum code cache size. [Solaris 64-bit, amd64, and -server x86: 48m; in 1.5.0_06 and earlier, Solaris 64-bit and amd64: 1024m.]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:SurvivorRatio=8</tt></td><td valign="top" align="left">Ratio of eden/survivor space size. [Solaris amd64: 6; Sparc in 1.3.1: 25; other Solaris platforms in 5.0 and earlier: 32]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:TargetSurvivorRatio=50</tt></td><td valign="top" align="left">Desired percentage of survivor space used after scavenge.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:ThreadStackSize=512</tt></td><td valign="top" align="left">Thread stack size (in Kbytes); 0 means use the default stack size. [Sparc: 512; Solaris x86: 320 (was 256 in 5.0 and earlier); Sparc 64 bit: 1024; Linux amd64: 1024 (was 0 in 5.0 and earlier); all others 0.]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseBiasedLocking</tt></td><td valign="top" align="left">Enable biased locking. (Introduced in 5.0 update 6.) [5.0: false]</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseFastAccessorMethods</tt></td><td valign="top" align="left">Use optimized versions of Get&lt;Primitive&gt;Field.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-UseISM</tt></td><td valign="top" align="left">Use Intimate Shared Memory. [Not accepted for non-Solaris platforms.] For details, see Intimate Shared Memory.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseLargePages</tt></td><td valign="top" align="left">Use large page memory. (Introduced in 5.0 update 5.) For details, see Java Support for Large Memory Pages.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:+UseMPSS</tt></td><td valign="top" align="left">Use Multiple Page Size Support with 4 MB pages for the heap. Do not use with ISM, as this replaces the need for ISM. (Introduced in 1.4.0 update 1. Relevant to Solaris 9 and newer.) [1.4.1 and earlier: false]</td></tr>
</tbody></table>
<p>--------------------------------------------------------------------------------</p>
<p><strong>Debugging Options</strong></p>
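<table cellspacing="0" cellpadding="10" border="2">
<tbody>
<tr><td valign="top" align="left"><strong>Option and Default Value</strong></td><td valign="top" align="left"><strong>Description</strong></td></tr>
<tr><td valign="top" align="left"><tt>-XX:-CITime</tt></td><td valign="top" align="left">Prints time spent in JIT Compiler. (Introduced in 1.4.0.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:ErrorFile=./hs_err_pid&lt;pid&gt;.log</tt></td><td valign="top" align="left">If an error occurs, save the error data to this file. (Introduced in 6.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-ExtendedDTraceProbes</tt></td><td valign="top" align="left">Enable performance-impacting dtrace probes. (Introduced in 6. Relevant to Solaris only.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:HeapDumpPath=./java_pid&lt;pid&gt;.hprof</tt></td><td valign="top" align="left">Path to directory or filename for heap dump. Manageable. (Introduced in 1.4.2 update 12, 5.0 update 7.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-HeapDumpOnOutOfMemoryError</tt></td><td valign="top" align="left">Dump heap to file when java.lang.OutOfMemoryError is thrown. Manageable. (Introduced in 1.4.2 update 12, 5.0 update 7.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:OnError=&quot;&lt;cmd args&gt;;&lt;cmd args&gt;&quot;</tt></td><td valign="top" align="left">Run user-defined commands on fatal error. (Introduced in 1.4.2 update 9.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:OnOutOfMemoryError=&quot;&lt;cmd args&gt;;&lt;cmd args&gt;&quot;</tt></td><td valign="top" align="left">Run user-defined commands when an OutOfMemoryError is first thrown. (Introduced in 1.4.2 update 12, 6.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-PrintClassHistogram</tt></td><td valign="top" align="left">Print a histogram of class instances on Ctrl-Break. Manageable. (Introduced in 1.4.2.) The jmap -histo command provides equivalent functionality.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-PrintConcurrentLocks</tt></td><td valign="top" align="left">Print java.util.concurrent locks in Ctrl-Break thread dump. Manageable. (Introduced in 6.) The jstack -l command provides equivalent functionality.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-PrintCommandLineFlags</tt></td><td valign="top" align="left">Print flags that appeared on the command line. (Introduced in 5.0.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-PrintCompilation</tt></td><td valign="top" align="left">Print message when a method is compiled.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-PrintGC</tt></td><td valign="top" align="left">Print messages at garbage collection. Manageable.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-PrintGCDetails</tt></td><td valign="top" align="left">Print more details at garbage collection. Manageable. (Introduced in 1.4.0.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-PrintGCTimeStamps</tt></td><td valign="top" align="left">Print timestamps at garbage collection. Manageable. (Introduced in 1.4.0.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-PrintTenuringDistribution</tt></td><td valign="top" align="left">Print tenuring age information.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-TraceClassLoading</tt></td><td valign="top" align="left">Trace loading of classes.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-TraceClassLoadingPreorder</tt></td><td valign="top" align="left">Trace all classes loaded in order referenced (not loaded). (Introduced in 1.4.2.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-TraceClassResolution</tt></td><td valign="top" align="left">Trace constant pool resolutions. (Introduced in 1.4.2.)</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-TraceClassUnloading</tt></td><td valign="top" align="left">Trace unloading of classes.</td></tr>
<tr><td valign="top" align="left"><tt>-XX:-TraceLoaderConstraints</tt></td><td valign="top" align="left">Trace recording of loader constraints. (Introduced in 6.)</td></tr>
</tbody></table>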
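<p>Besides the <tt>-XX:-PrintGC</tt> family of flags, GC activity can also be read from inside the process through the standard <tt>GarbageCollectorMXBean</tt> interface. The sketch below is only an illustration (the class name <tt>GcStats</tt> is mine), and the collector names it prints depend on which collector the JVM is running.</p>

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class GcStats {
    /** All collectors the running JVM exposes. */
    static GarbageCollectorMXBean[] beans() {
        return ManagementFactory.getGarbageCollectorMXBeans()
                .toArray(new GarbageCollectorMXBean[0]);
    }

    /** One summary line per collector: name, collection count, accumulated time. */
    static String describe(GarbageCollectorMXBean gc) {
        return gc.getName() + ": collections=" + gc.getCollectionCount()
             + ", timeMs=" + gc.getCollectionTime();
    }

    public static void main(String[] args) {
        System.gc(); // only a request; the JVM may ignore it (e.g. with -XX:+DisableExplicitGC)
        for (GarbageCollectorMXBean gc : beans()) {
            System.out.println(describe(gc));
        }
    }
}
```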
<p>&nbsp;</p>
<p>&nbsp;</p>
<p><strong>This is an example heap configuration used by Application Server on Solaris for large applications:</strong></p><pre><strong>-Xms3584m
 -Xmx3584m
 -verbose:gc
 -Dsun.rmi.dgc.client.gcInterval=3600000</strong></pre>
<p><br /><strong>&nbsp;Reference: Sun Java System Application Server Enterprise Edition 8.2 Performance Tuning Guide</strong></p></span>]]></description>
		</item>
		    
		
		<item>
			<title>Memo: Shell Scripts for Starting Java on Solaris</title>
			<link>http://longrenrex.blog.sohu.com/95356153.html</link>
			<comments>http://longrenrex.blog.sohu.com/95356153.html#comment</comments>
			<dc:creator>龙居</dc:creator>
			<pubDate>Thu, 24 Jul 2008 10:18:48 +0800</pubDate>
			<category>JAVA</category>
			<guid>http://longrenrex.blog.sohu.com/95356153.html</guid>
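			<description><![CDATA[<p>&nbsp;&nbsp;&nbsp; A while ago I posted "Start and Stop Scripts for Java Applications on Linux". Today I wanted to start a set of system test programs on Solaris, so I copied those old scripts, and to my surprise they would not run. Only then did I learn that Unix and Linux shells differ in the details: different shells and versions call for different syntax. How embarrassing...</p>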
<p>start.sh:</p>
<p>classpath=/export/home/wl92user/system_test/lib/common-axis.jar:/export/home/wl92user/system_test/lib/common-characteristics.jar:/export/home/wl92user/system_test/lib/common-executors.jar:/export/home/wl92user/system_test/lib/common-functiontest.jar:/export/home/wl92user/system_test/lib/common-handlers.jar:/export/home/wl92user/system_test/lib/common-messages.jar:/export/home/wl92user/system_test/lib/common-misc.jar:/export/home/wl92user/system_test/lib/common-network.jar:/export/home/wl92user/system_test/lib/common-systemtest.jar:/export/home/wl92user/system_test/lib/common-xmlbeans.jar:/export/home/wl92user/system_test/lib/commons-codec-1.3.jar:/export/home/wl92user/system_test/lib/commons-httpclient-3.1.jar:/export/home/wl92user/system_test/lib/commons-io-1.4.jar:/export/home/wl92user/system_test/lib/mlp-all-plugin.jar:/export/home/wl92user/system_test/lib/ntt.jar:/export/home/wl92user/system_test/lib/mlp_system_mlp320.jar<br />java -cp &quot;$classpath&quot; se.ericsson.st.testcases.BATReciver &gt; console.log &amp; echo $! &gt; system_test_receiver.pid</p>
<p>&nbsp;</p>
<p>stop.sh:</p>
<p>#!/bin/sh</p>
<p>if [ -f system_test_receiver.pid ]; then<br />&nbsp;&nbsp;&nbsp; kill -9 `cat system_test_receiver.pid`<br />&nbsp;&nbsp;&nbsp; rm system_test_receiver.pid<br />else<br />&nbsp;&nbsp;&nbsp; echo &quot;Receiver not started (cannot find system_test_receiver.pid)&quot;<br />fi<br /></p>]]></description>
		</item>
		    
		
		<item>
			<title>Striving in Guangzhou [8]</title>
			<link>http://longrenrex.blog.sohu.com/94921710.html</link>
			<comments>http://longrenrex.blog.sohu.com/94921710.html#comment</comments>
			<dc:creator>龙居</dc:creator>
			<pubDate>Sat, 19 Jul 2008 13:33:49 +0800</pubDate>
			<category>记事本</category>
			<guid>http://longrenrex.blog.sohu.com/94921710.html</guid>
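			<description><![CDATA[<p>&nbsp;&nbsp;&nbsp; The previous post covered what I learned from doing testing. It ran long, and I was worried about hitting the length limit, so I am starting a new entry for a few other things I have picked up.</p>
<p>&nbsp;&nbsp;&nbsp; I have long wanted to add a GUI to some of my own programs, but building one directly on the J2SE APIs gets painful for anything complex, and the result looks ugly, so I never found a good approach. Recently the topic came up while chatting with people online, and someone recommended RCP (Rich Client Platform): you build the GUI on Eclipse's own packages, and every GUI effect found in Eclipse can be used directly, which is great. Unfortunately there is not much documentation, or perhaps there is plenty that I do not know about. With their help I built a simple example and it turned out well; I will dig deeper when I have time. We currently test with ntt, a test framework built in-house at Ericsson. If someone were dedicated to building an RCP GUI on top of the ntt framework, we might end up with a Java version of LoadRunner. A LoadRunner with an Eclipse interface is something I would really look forward to!</p>
<p>&nbsp;&nbsp;&nbsp; Another find is Apache's open-source MINA project. I used to implement SMS gateway clients myself with the standard J2SE API, and that worked reasonably well. Again while discussing some problems online, someone asked why I did not try MINA. I had in fact looked at MINA long ago, when it was still at version 1.*, and concluded that it was too far from what I needed. Only from this conversation did I learn that MINA 2.* exists, although it is still under development and has no stable release. I downloaded it right away, liked what I saw, and wrote a demo SMS gateway client on the MINA framework, which worked well.</p>
<p>&nbsp;&nbsp;&nbsp; I used to design communication programs in two layers: a data control layer and a business control layer. The data control layer manages opening and closing the socket (it keeps the socket connected at all times), reads and writes the data stream, and converts packets into the business objects that the business control layer needs. The business control layer takes those business objects from the data layer and runs the business logic. MINA 2.* is layered more finely. The bottom layer is the I/O Service, which is like my data control layer and manages socket connection and closing. The second layer is the I/O Filter Chain, where you can add filters, encoders/decoders or anything else as needed; it is remarkably powerful and freely extensible. The third layer is the I/O Handler, which includes an event interface around an I/O Session (similar to a socket handle); the lower layers trigger events on it, and complex business logic is built from those events. This saves a lot of trouble, especially thread management, which MINA handles automatically underneath. The official Apache site says its performance comes close to socket programs written in C/C++. I strongly recommend it, and I am looking forward to the final 2.* release.</p>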
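<p>To make the three layers concrete, here is a minimal plain-Java sketch of the idea. Every name in it (<tt>LayeredPipeline</tt>, <tt>Filter</tt>, <tt>Handler</tt>, <tt>fireReceived</tt>) is hypothetical; this is <strong>not</strong> the MINA API, just the shape of "I/O layer, then filter chain, then handler".</p>

```java
import java.nio.charset.StandardCharsets;

// Hypothetical names throughout; this is NOT the MINA API, only the layering idea.
class LayeredPipeline {
    /** One step of the chain, e.g. a decoder or a logging filter. */
    interface Filter { Object apply(Object message); }

    /** Entry point for business logic, called with the fully decoded message. */
    interface Handler { void messageReceived(Object message); }

    private final Filter[] chain;
    private final Handler handler;

    LayeredPipeline(Handler handler, Filter... chain) {
        this.handler = handler;
        this.chain = chain;
    }

    /** Called by the (omitted) I/O layer whenever raw bytes arrive. */
    void fireReceived(byte[] raw) {
        Object message = raw;
        for (Filter f : chain) {
            message = f.apply(message); // each filter transforms the message in turn
        }
        handler.messageReceived(message);
    }

    public static void main(String[] args) {
        Filter decoder = m -> new String((byte[]) m, StandardCharsets.UTF_8);
        Filter trimmer = m -> ((String) m).trim();
        LayeredPipeline p = new LayeredPipeline(
                m -> System.out.println("handler got: " + m), decoder, trimmer);
        p.fireReceived(" SUBMIT \n".getBytes(StandardCharsets.UTF_8));
    }
}
```

<p>The point of the split is that the handler never sees raw bytes, and filters can be added or reordered without touching the business logic.</p>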
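<p>&nbsp;&nbsp; I also recently discovered another nice tool: jpcap. Working in telecom, I often capture packets with Wireshark, and I kept wondering whether something similar could be done in Java. Someone told me to use jpcap. It simply uses JNI to call the pcap library. I tried it and it works well; the only hassle is that the pcap package has to be installed first. jpcap covers most communication protocol problems, but going through JNI adds a delay of about 1 ms, which is worth keeping in mind. I am still exploring what interesting things can be built with it (a Java version of Wireshark? Probably not much point). By the way, jpcap contains C code as well as Java code. Luckily I brushed up on C a while ago and have recovered most of it (so university was not a waste after all), which makes this a good time to read it.</p>
<p>&nbsp;&nbsp; That is all for today. I am off for a run, so I will stop here. We programmers sit for hours every day staring at a PC, so we need to exercise regularly and keep ourselves on a sustainable path.</p>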
<p>&nbsp; </p>]]></description>
		</item>
		    
		
		<item>
			<title>Striving in Guangzhou [7]</title>
			<link>http://longrenrex.blog.sohu.com/94917155.html</link>
			<comments>http://longrenrex.blog.sohu.com/94917155.html#comment</comments>
			<dc:creator>龙居</dc:creator>
			<pubDate>Fri, 18 Jul 2008 20:35:54 +0800</pubDate>
			<category>记事本</category>
			<guid>http://longrenrex.blog.sohu.com/94917155.html</guid>
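			<description><![CDATA[<p>&nbsp;&nbsp;&nbsp; It has been a while since my last "Striving in Guangzhou" entry, mainly because lately I have been either too idle or too busy. When idle there is nothing to write about; when busy there is no time to write. Things are the same as ever: I am busy with SIG testing. When the tests turn up a problem I get a break, and when they do not I stay busy. Enough preamble; here is what I have been up to.</p>
<p>&nbsp;&nbsp;&nbsp; Functional testing of the system is essentially done, so now it is mostly system testing: run the functional test cases together, multi-threaded and under heavy concurrent load, leave them running for a few days, and see whether the system dies. Needless to say, it would be a miracle if it did not. When something breaks, you track it down, fix it, ship a new build and test again. Testing, like development, is an iterative process. While testing Ericsson projects and products I thought of some programs I had written earlier and decided to test them the same way. I tried it and found plenty of problems. For example, in an SMS gateway client I once wrote, each packet has a fixed-length header and a variable-length body, with the body size given in the header. So the program read the header first and then dynamically allocated memory for the body. That was fine under light load, but under heavy load the JVM could not reclaim the heap memory allocated for bodies fast enough and soon hit an OOM. Some might say to allocate the memory up front, but Java NIO does not seem to offer an API for reading exactly a given number of bytes from a socket, whereas the old I/O does. Another fix is to read everything available in one go, parse it, and merge any leftover bytes with the next packet. If Java had pointers like C, this would not be a problem at all. I will not go into the details here; it is solved, anyway. My point is that testing matters, and heavy concurrent load testing in particular exposes bugs in code very easily. IBM has a good article on Java benchmarking: <a href="http://www.ibm.com/developerworks/cn/java/j-benchmark1.html?S_TACT=105AGX52&amp;S_CMP=tec-csdn">http://www.ibm.com/developerworks/cn/java/j-benchmark1.html?S_TACT=105AGX52&amp;S_CMP=tec-csdn</a>.</p>
<p>&nbsp;&nbsp;&nbsp; The same code can behave very differently in different environments. That reminds me of the SOA project I worked on last year: the runtime environment was complex and the system had many coupling points, so finding bugs through testing was genuinely hard. I mainly worked on the EJB3 persistence layer. Fortunately the team lead insisted on layered testing: first test all the entities, then test the basic methods inside the container, and only then test the business layer. Even so there were plenty of problems, but the approach was a very good one and deserves credit. Many projects that use a persistence framework simply generate the DAO objects and start coding, never testing the basic DAOs and going straight to testing the business layer. That shows rather too much faith in generated code. Usually a problem surfaces in the business layer and is then traced back to the DAO. Generated code does have its share of problems, for example generated SQL that cannot use the database indexes that were created.</p>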
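<p>The "read whatever is available, parse it, and carry leftover bytes into the next packet" approach mentioned above can be sketched with a single reusable <tt>ByteBuffer</tt>. This is a minimal sketch, not the code from my gateway client: the wire format (a 4-byte big-endian body length followed by the body) and the names <tt>FrameDecoder</tt>, <tt>feed</tt> and <tt>next</tt> are assumptions made for illustration.</p>

```java
import java.nio.ByteBuffer;

// Assumed wire format (not from the real protocol): 4-byte big-endian body length, then the body.
class FrameDecoder {
    private final ByteBuffer acc; // reused accumulation buffer, always left in write mode

    FrameDecoder(int capacity) {
        acc = ByteBuffer.allocate(capacity);
    }

    /** Appends bytes just read from the channel; throws BufferOverflowException if they do not fit. */
    void feed(byte[] data, int offset, int length) {
        acc.put(data, offset, length);
    }

    /** Returns the next complete body, or null if more bytes are needed. */
    byte[] next() {
        acc.flip(); // switch to read mode
        try {
            if (acc.remaining() < 4) {
                return null; // header incomplete
            }
            int bodyLen = acc.getInt(acc.position()); // peek at the header without consuming it
            if (bodyLen < 0 || bodyLen > acc.capacity() - 4) {
                throw new IllegalStateException("bad body length " + bodyLen);
            }
            if (acc.remaining() < 4 + bodyLen) {
                return null; // body incomplete
            }
            acc.getInt(); // consume the header
            byte[] body = new byte[bodyLen];
            acc.get(body);
            return body;
        } finally {
            acc.compact(); // keep leftover bytes and return to write mode
        }
    }

    public static void main(String[] args) {
        FrameDecoder d = new FrameDecoder(64);
        byte[] wire = ByteBuffer.allocate(7).putInt(3).put(new byte[] {1, 2, 3}).array();
        d.feed(wire, 0, 5);                    // header plus one body byte
        System.out.println(d.next() == null);  // not enough bytes yet
        d.feed(wire, 5, 2);                    // the rest of the body
        System.out.println(d.next().length);
    }
}
```

<p>The accumulation buffer is allocated once, so only the per-message body array is still created under load; decoding the business object straight out of the buffer would remove that allocation too.</p>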
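<p>&nbsp;&nbsp; Now a few words about testing tools. Many people know tools like LoadRunner well: basically you write the test scripts and leave them running. The hard part is working out from the generated reports where the problem actually is. So we sometimes use a profiler such as JProfiler to monitor memory and CPU and narrow the problem down. That has its own drawback: such tools affect the system and cannot be left attached for long without hurting performance. On top of that, some systems are already on site and nobody will let you touch them, so you are stuck and really have to get creative. Compared with Windows, though, Linux and Unix are somewhat easier to debug on: you can use jstack, or jmap/jstat/jps, to take a look. On Linux or Unix a crashed program also leaves a core file that can be inspected, for example: /usr/lib/jvm/java-6-sun-1.6.0.03/bin/jstack /usr/lib/jvm/java-6-sun-1.6.0.03/bin/java core.5216&nbsp; &gt; jstack-5216.txt and /usr/lib/jvm/java-6-sun-1.6.0.03/bin/jmap -histo /usr/lib/jvm/java-6-sun-1.6.0.03/bin/java core.5216&nbsp; &gt; jmap-histo-5216.txt.</p>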
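<p>When attaching jstack from outside is not an option, a rough equivalent can be produced from inside the process with the standard <tt>Thread.getAllStackTraces()</tt> call. A minimal sketch, assuming you can add a hook (a debug command, a signal handler, a timer) that calls it; <tt>ThreadDump</tt> is a name I made up, and the output is simpler than a real jstack dump (no lock information).</p>

```java
class ThreadDump {
    /** Builds a jstack-like text dump of every live thread in this JVM. */
    static String dump() {
        StringBuilder sb = new StringBuilder();
        Thread[] threads = Thread.getAllStackTraces().keySet().toArray(new Thread[0]);
        for (Thread t : threads) {
            sb.append('"').append(t.getName()).append("\" state=")
              .append(t.getState()).append('\n');
            for (StackTraceElement frame : t.getStackTrace()) {
                sb.append("    at ").append(frame).append('\n');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```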
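<p>&nbsp;&nbsp; Important as testing is, the climate in China does not take it very seriously. What counts is development speed: the sooner it is finished the better, with little concern for polish. Put bluntly, it is mostly a matter of cost: not enough people and not enough money. When there are not even enough person-months for development, who is going to grant several times that for testing? Dream on... Still, we should take testing seriously; some is better than none.</p>
<p>&nbsp;&nbsp; All in all, doing testing this year has taught me a lot and let me see things that are invisible from the developer's side. I figure that after two more years of hands-on work it will be time to change roles. I cannot drink, I do not smoke, and I am not much of a talker, so sales is out. I will see whether I can move toward project management, and I hope a stint in testing helps me build up some experience. The next step is probably learning project management: how to do requirements, how to do high-level design, how to estimate person-months and so on. There is still a long road ahead! It would be nice to attend some relevant training, but I am not going to pay for it myself; it would be great if the company were willing to. The training I would most like right now is Oracle database training and Linux system development and administration training. Sadly I have neither the money nor the opportunity. Poor me...</p>
<p>&nbsp;&nbsp;&nbsp; This has already run long and I am not done, so I will start another post. The next one covers what I have learned technically of late. To be continued...</p>]]></description>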
		</item>
		    
		
		<item>
			<title>Memo: Using JNI on Linux [Original]</title>
			<link>http://longrenrex.blog.sohu.com/93931899.html</link>
			<comments>http://longrenrex.blog.sohu.com/93931899.html#comment</comments>
			<dc:creator>龙居</dc:creator>
			<pubDate>Mon, 7 Jul 2008 20:59:41 +0800</pubDate>
			<category>JAVA</category>
			<guid>http://longrenrex.blog.sohu.com/93931899.html</guid>
			<description><![CDATA[<p>&nbsp; A few years ago I wrote a post on this blog about calling native code from Java through JNI (<a href="http://longrenrex.blog.sohu.com/579247.html">http://longrenrex.blog.sohu.com/579247.html</a>). That one used Windows as the example and compiled with VC 6.0. This memo covers how to use JNI on Linux, and is based mainly on the example in an IBM article (<a href="http://www.ibm.com/developerworks/cn/java/l-linux-jni/">http://www.ibm.com/developerworks/cn/java/l-linux-jni/</a>). Using JNI on Linux is slightly more trouble than on Windows, because there is no IDE as powerful as VS2005, and because of environment variable and permission issues.</p>
<p>1 First create a simple Java class:</p>
<pre>public class Hello
{
    static
    {
        try
        {
            // Name of the shared library that contains the native method
            System.loadLibrary(&quot;hello&quot;);
        }
        catch (UnsatisfiedLinkError e)
        {
            System.err.println(&quot;Cannot load hello library:\n &quot; + e.toString());
        }
    }

    public Hello()
    {
    }

    // Declaration of the native method
    public native void SayHello(String strName);
}
</pre>
<p>One thing to note: it is best not to put this class in a package. If the class is in a package, the *.h header that javah generates is named differently (for a class in package test it would be test_Hello.h), and the JNI function name differs as well. To keep things simple we leave the package out here. :)</p>
<p>Then compile it to get Hello.class.</p>
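<p>To see why a package changes things, here is a small sketch of how the native function name is derived from the class and method name. It covers only the simple cases of the JNI naming rules ('.' becomes '_', a literal '_' becomes '_1'); overloaded methods and non-ASCII characters need extra escapes that are left out. <tt>JniNames</tt> and its methods are hypothetical helpers, not part of any JDK tool.</p>

```java
// Simplified sketch of JNI native-method naming; overloads and non-ASCII names are not handled.
class JniNames {
    /** Escapes one dotted name: '.' becomes '_', and a literal '_' becomes "_1". */
    static String mangle(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c == '.') {
                sb.append('_');
            } else if (c == '_') {
                sb.append("_1");
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    /** C function name the JVM looks up for a native method. */
    static String functionName(String className, String method) {
        return "Java_" + mangle(className) + "_" + mangle(method);
    }

    /** Header name for a class whose name contains no underscores, e.g. test.Hello. */
    static String headerName(String className) {
        return className.replace('.', '_') + ".h";
    }

    public static void main(String[] args) {
        System.out.println(functionName("Hello", "SayHello"));      // Java_Hello_SayHello
        System.out.println(functionName("test.Hello", "SayHello")); // Java_test_Hello_SayHello
        System.out.println(headerName("test.Hello"));               // test_Hello.h
    }
}
```

<p>So moving Hello into package test would mean renaming the C function to Java_test_Hello_SayHello as well.</p>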
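<p>2 Generate Hello.h</p>
<p>&nbsp; Use the command: javah Hello</p>
<p>&nbsp;&nbsp;Note: if this command reports an error, it may be because the current directory is not on the classpath environment variable. In that case use:</p>
<p>&nbsp; javah -classpath . Hello</p>
<p>&nbsp;&nbsp;That works. Similar environment variable problems may come up again later.</p>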
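<p>3 Create a C++ file, Hello.cpp, in the same directory as Hello.h, with the following content:</p><pre>#include &lt;stdio.h&gt;
#include &quot;Hello.h&quot;

JNIEXPORT void JNICALL Java_Hello_SayHello(JNIEnv *env, jobject arg, jstring instring)
{
    // The second argument is a jboolean* out-parameter (isCopy); NULL means we do not need it.
    const char *str = env-&gt;GetStringUTFChars(instring, NULL);
    if (str == NULL) return; // an OutOfMemoryError is already pending
    printf(&quot;Hello,%s\n&quot;, str);
    env-&gt;ReleaseStringUTFChars(instring, str);
}
</pre>
<p>This Hello.cpp differs slightly from the code in the IBM example; see jni.h for the details. (In C++ the functions are called directly on env, as in env-&gt;GetStringUTFChars(...), whereas C code has to write (*env)-&gt;GetStringUTFChars(env, ...).)</p>
<p>4 Compile the shared library</p>
<p>a. Compile to produce Hello.o:</p>
<pre>g++ -I /usr/lib/jvm/java-6-sun-1.6.0.03/include -I /usr/lib/jvm/java-6-sun-1.6.0.03/include/linux -fPIC -c Hello.cpp
</pre>
<p>b. Build the shared library file libhello.so.1.0:</p>
<pre>g++ -shared -Wl,-soname,libhello.so.1 -o libhello.so.1.0 Hello.o</pre>
<p>These two commands also differ from the IBM article. A shared library built with gcc can, in some cases, cause errors when called through JNI, so g++ is used here instead.</p>
<p>Next, copy the generated library to the standard file name:</p>
<p><code>cp libhello.so.1.0 libhello.so</code> </p>
<p>Finally, tell the dynamic linker where the shared library is:</p>
<p><code>export LD_LIBRARY_PATH=`pwd`:$LD_LIBRARY_PATH</code> </p>
<p>&nbsp;</p>
<p>Adding the path with <code>export</code> sometimes causes small problems, for example the environment variable not taking effect right away.</p>
<p>Another option is to copy libhello.so straight into a system library directory such as /usr/lib or /lib.</p>
<p>&nbsp;</p>
<p>5 Write a simple Java program to test the native method.</p>
<p>Save the following source as ToSay.java:</p>
<pre>public class ToSay
{
	public static void main(String argv[])
	{
		ToSay say = new ToSay();
	}
	public ToSay()
	{
		Hello h = new Hello();
		// Call the native method to greet John
		h.SayHello(&quot;John&quot;);
	}
}
</pre>
<p>Compile ToSay.java with javac to produce ToSay.class, then run it like any other Java program with java ToSay. Hello,John appears on the screen.</p>
<p>&nbsp;</p>
<p>6 The IBM article gives the following advice:</p>
<p><a name="3"><span>Points to note in real applications</span></a></p>
<p>1. If Java code can interact with native C/C++ code over TCP/IP, it is better not to use JNI as described above, because a single JNI call is expensive, taking roughly 0.5 to 1 millisecond.</p>
<p>2. Do not use JNI in an applet, because it may raise security exceptions there.</p>
<p>3. Put all native methods in a single class that calls a single DLL. For each target operating system, that DLL can then be replaced with a platform-specific version. This keeps the impact of native code to a minimum and helps contain later porting work.</p>
<p>4. Keep native methods simple. Minimise the generated DLL's dependence on third-party run-time DLLs, and keep native methods as self-contained as possible, to minimise the overhead of loading the DLL and the application. If run-time DLLs are required, ship them with the application.</p>
<p>5. Native code runs without effective protection against array overruns, bad pointer dereferences and the indirect errors they cause. The native code therefore has to be stable, because the slightest error can crash the Java virtual machine.</p>
<p>&nbsp;</p>
<p>7 A question to think about</p>
<p>I have long wondered: if I used JNI to call a socket library implemented in C/C++, would it perform better than Java's own NIO?</p>
<p>The IBM authors' answer appears to be no.</p>
<p>Others disagree, arguing that Java code and native C/C++ code do not cross the boundary that often, so losing 0.5 to 1 millisecond per call does not matter.</p>
<p>BEA's WebLogic (version 9 and later) seems to take this approach.</p>
<p>I will run an experiment to measure it when I get the chance.</p>
<p>Of course, if someone has already measured it, telling me would be even better!</p>
<p>&nbsp;</p>]]></description>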
		</item>
		    
		
		<item>
			<title>ClassLoader Fundamentals [Repost]</title>
			<link>http://longrenrex.blog.sohu.com/90363579.html</link>
			<comments>http://longrenrex.blog.sohu.com/90363579.html#comment</comments>
			<dc:creator>龙居</dc:creator>
			<pubDate>Tue, 17 Jun 2008 17:30:28 +0800</pubDate>
			<category>JAVA</category>
			<guid>http://longrenrex.blog.sohu.com/90363579.html</guid>
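			<description><![CDATA[<ul><strong>JVM</strong> <br />The JVM is a dynamic link library inside the JRE. The JRE inside the JDK is generally used to run Java's own tools, such as javac; the JRE under Program Files is used to run Java programs written by users. <br />The jvm.dll under the JRE's bin\client or bin\server directory is the JVM itself. <br /><br />When a machine has several JVMs to choose from, the JVM is selected in this order: <br />1) a jre directory in the current directory (this is not entirely accurate), <br />2) a jre subdirectory of the parent directory, <br />3) the registry key HKEY_LOCAL_MACHINE\SOFTWARE\JavaSoft\Java Runtime Environment\ <br />So when jdk\bin\java.exe is run, the JRE used is jdk\jre\, under the parent directory of bin. <br />After java.exe finds a JRE, a check verifies that the JRE and java.exe versions match; a mismatch causes an error.</ul><br />
<ul>java -verbose:class Main prints details of each class as it is loaded.</ul><br />
<ul>A classloader loads classes in two ways: 1) pre-loading, which loads the core classes up front, and 2) load-on-demand. <br />A class is loaded by the classloader only when it is actually used, for example instantiated; merely declaring a variable of that type does not load it.</ul><br />
<ul><strong>Two ways to load a class dynamically in Java:</strong> <br />1) implicit: rely on the fact that a class is loaded when it is first instantiated <br />2) explicit, which again comes in two forms: <br />1) the forName() method of java.lang.Class <br />2) the loadClass() method of java.lang.ClassLoader</ul><br />
<ul><strong>When does a static block run?</strong> <br />1) It runs when the class is loaded with forName(String). It does not run on ClassLoader.loadClass, nor on forName(String, false, ClassLoader). <br />2) If the static block did not run when the class was loaded, it runs on first instantiation, for example with new or Class.newInstance(). <br />3) A static block runs only once.</ul><br />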
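<p>The three rules above are easy to check with a small experiment. This is a minimal sketch with made-up names (<tt>StaticInitDemo</tt>, <tt>Lazy</tt>); a counter records how often the static block has run after each kind of load.</p>

```java
class StaticInitDemo {
    static int count = 0;

    static class Lazy {
        static { count++; } // the static block under observation
    }

    /** Loads Lazy three different ways and records the counter after each step. */
    static int[] run() {
        try {
            ClassLoader cl = StaticInitDemo.class.getClassLoader();
            String name = StaticInitDemo.class.getName() + "$Lazy";
            int[] seen = new int[3];
            cl.loadClass(name);             // loads the class but does not initialize it
            seen[0] = count;
            Class.forName(name, false, cl); // initialize=false: still no static block
            seen[1] = count;
            Class.forName(name);            // initializes: the static block runs, once
            seen[2] = count;
            return seen;
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] seen = run();
        System.out.println("after loadClass:            " + seen[0]);
        System.out.println("after forName(.., false, ..): " + seen[1]);
        System.out.println("after forName(name):        " + seen[2]);
    }
}
```

<p>On a fresh JVM the counter stays at 0 for the first two steps and becomes 1 after the plain <tt>forName</tt>; calling <tt>run()</tt> again leaves it at 1, since a static block runs only once.</p>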
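<ul><br /><strong>Instances of Class</strong> <br />&gt;&gt;Class cannot be instantiated by hand. Whenever any class is loaded, an instance of Class corresponding to it is created automatically. <br />&gt;&gt;Every instance of a class internally holds a field that records the location of the corresponding Class instance. <br />&gt;&gt;The Class instance for each Java class can be thought of as the class's in-memory proxy. Whenever you need information about the class (its fields, its methods and so on), the Class instance can supply it. Java's Reflection mechanism relies heavily on this. <br />&gt;&gt;Every Java class is loaded by some class loader (an instance of ClassLoader), so each Class instance has a field that records its ClassLoader instance. If that field is null, the class was loaded by the bootstrap loader (also called the root loader). The bootstrap loader is not written in Java, so there is no instance for it. <br /><br />Native methods: forName0() and similar methods carry the native modifier.</ul><br />
<ul><strong>A custom ClassLoader:</strong> <br />For example, instantiate a URLClassLoader: URLClassLoader ucl = new URLClassLoader(new URL[]{new URL(&quot;file:/e:/bin/&quot;)}). URLClassLoader first delegates to its parent, which searches the classpath (typically including the current directory), and only then looks for .class files under its own URLs. Do not forget the trailing &quot;/&quot; in the URL to mark it as a directory.</ul><br />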
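<p>A runnable variant of the one-liner above, as a sketch. It uses <tt>File.toURI().toURL()</tt>, which appends the trailing "/" for an existing directory so it cannot be forgotten; the directory chosen here (the system temp directory) and the names <tt>UrlLoaderDemo</tt> and <tt>forDirectory</tt> are only for illustration.</p>

```java
import java.io.File;
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;

class UrlLoaderDemo {
    /** Creates a loader that looks for .class files under dir (after asking its parent). */
    static URLClassLoader forDirectory(File dir) {
        try {
            // For an existing directory, toURI() yields a URI that ends with "/".
            URL url = dir.toURI().toURL();
            return new URLClassLoader(new URL[] { url });
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws Exception {
        File dir = new File(System.getProperty("java.io.tmpdir"));
        try (URLClassLoader ucl = forDirectory(dir)) {
            System.out.println("search URL: " + ucl.getURLs()[0]);
            System.out.println("parent is system loader: "
                    + (ucl.getParent() == ClassLoader.getSystemClassLoader()));
            // Resolved through parent delegation; core classes report a null loader.
            System.out.println(ucl.loadClass("java.lang.String").getClassLoader());
        }
    }
}
```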
<ul><strong>Which class loader loads which Java class?</strong> <br />1) For any Java class you can find out with instance.getClass().getClassLoader(). <br />2) Interfaces on the application classpath are loaded by AppClassLoader (the system class loader; its instance is available from ClassLoader.getSystemClassLoader()). <br />3) The ClassLoader class itself is loaded by the bootstrap loader.</ul><br />
<ul><strong>ClassLoader hierarchy:</strong> <br />The JVM starts -&gt; initialization -&gt; the first class loader, the bootstrap loader, is created -&gt; the bootstrap loader loads ExtClassLoader from the sun.misc.Launcher class and sets its parent to null -&gt; the bootstrap loader loads sun.misc.Launcher$AppClassLoader and sets its parent to ExtClassLoader (although AppClassLoader itself is also loaded by the bootstrap loader) -&gt; AppClassLoader loads each xx.class, though an xx.class may instead be loaded by ExtClassLoader or the bootstrap loader. <br />&gt;&gt;For a custom ClassLoader, getParent() returns AppClassLoader by default. A loader's parent has nothing to do with which loader loaded the loader's own class. <br />&gt;&gt;ExtClassLoader and AppClassLoader are both subclasses of URLClassLoader. AppClassLoader's URLs come from the string in the system property java.class.path, which in turn is set by -cp or -classpath when running java.exe, or by the CLASSPATH environment variable. <br />&gt;&gt;ExtClassLoader searches the URLs in the system property java.ext.dirs, which defaults to jdk\jre\lib\ext. <br />&gt;&gt;The bootstrap loader searches sun.boot.class.path. <br />&gt;&gt;Calling System.setProperty() after the program has started does not change any of these search paths, because the class loaders read them before System.setProperty runs. sun.boot.class.path is fixed by the time the program runs and cannot be changed from inside it. <br /><br />Delegation model <br />When a class loader needs to load a class, it first asks its parent to search the parent's path. Only if the parent cannot find the class does it search its own path. The ClassLoader hierarchy has this property by design.</ul><br />
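<p>The parent chain described above can be printed with a short loop over <tt>getParent()</tt>. A minimal sketch; <tt>LoaderChain</tt> is a made-up name. On the Java 6 era JVMs this post describes, the chain shows sun.misc.Launcher$AppClassLoader and sun.misc.Launcher$ExtClassLoader; newer JDKs print different class names, but the chain still ends at null, which stands for the bootstrap loader.</p>

```java
class LoaderChain {
    /** Follows getParent() from cl up to the bootstrap loader, which is represented by null. */
    static String chain(ClassLoader cl) {
        StringBuilder sb = new StringBuilder();
        while (cl != null) {
            sb.append(cl.getClass().getName()).append(" -> ");
            cl = cl.getParent();
        }
        return sb.append("bootstrap (null)").toString();
    }

    public static void main(String[] args) {
        // A class from the application classpath: starts at the application loader.
        System.out.println(chain(LoaderChain.class.getClassLoader()));
        // A core class: getClassLoader() returns null, so only the bootstrap entry is printed.
        System.out.println(chain(String.class.getClassLoader()));
    }
}
```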
<ul><br /><strong>NoClassDefFoundError and ClassNotFoundException</strong> <br />NoClassDefFoundError: the Java source compiled into .class files without trouble, but at run time the ClassLoader cannot find the .class file of a class it needs on its search path. <br />ClassNotFoundException: thrown when an attempt to load a class from a String name (for example with Class.forName) fails.</ul>]]></description>
		</item>
		    
		
	</channel>
</rss>
