Subscribe: Untitled
http://www.freepatentsonline.com/rssfeed/rssapp712.xml
Added By: Feedage Forager Feedage Grade B rated
Language: English
Tags:
branch  computer processor  configured  execution  includes  instruction  instructions  processing  processor  register  vector 

Method, apparatus, and system for speculative abort control mechanisms

Thu, 25 Aug 2016 08:00:00 EDT

An apparatus and method are described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.
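
The abort-then-fallback flow in this abstract can be illustrated with a small software model. This is only a sketch under stated assumptions: the names `SpeculativeAbort`, `run_speculative`, and the `always_abort` flag are hypothetical, and a dictionary copy stands in for whatever checkpointing real hardware would use.

```python
class SpeculativeAbort(Exception):
    """Hypothetical signal that a speculative region must be rolled back."""


def run_speculative(region, fallback, state, always_abort=False):
    """Run `region` speculatively on `state`; on abort, restore and run `fallback`.

    `always_abort` models the test mechanism from the abstract: every
    speculative region is aborted so that the fallback path gets exercised.
    """
    snapshot = dict(state)  # assumption: a shallow copy is a sufficient checkpoint
    try:
        if always_abort:
            raise SpeculativeAbort("forced abort (test mode)")
        region(state)
        return "committed"
    except SpeculativeAbort:
        # Discard speculative updates, then follow the fallback path.
        state.clear()
        state.update(snapshot)
        fallback(state)
        return "fallback"
```

Calling `run_speculative(region, fallback, state, always_abort=True)` forces the fallback path even when the region itself would have committed, which is the testing use the abstract mentions.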



PROCESSOR EXCEPTION HANDLING

Thu, 25 Aug 2016 08:00:00 EDT

Data processing apparatus comprises a processor configured to execute instructions, the processor having a pipelined instruction fetching unit configured to fetch instructions from memory during a pipeline period of two or more processor clock cycles prior to execution of those instructions by the processor; exception logic configured to respond to a detected processing exception having an exception type selected from a plurality of exception types, by storing a current processor status and diverting program flow to an exception address dependent upon the exception type so as to control the instruction fetching unit to initiate fetching of an exception instruction at the exception address; and an exception cache configured to cache information, for at least one of the exception types, relating to execution of the exception instruction at the exception address corresponding to that exception type and to provide the cached information to the processor in response to detection of an exception of that exception type.



MEMORY CONTROLLER AND DECODING METHOD

Thu, 25 Aug 2016 08:00:00 EDT

According to one embodiment, a memory controller includes a decoder configured to perform approximate maximum likelihood decoding, the decoder including: an initial value generation unit configured to calculate first data on the basis of a received word read from a non-volatile memory; a storage unit configured to store the first data and a predetermined number of second data; an update unit configured to calculate new second data by using the predetermined number of second data stored and update the storage unit; an arithmetic unit configured to output an addition result of the first data and the latest second data as decoded word information; and a selection unit configured to select a decoded word with the maximum likelihood on the basis of a plurality of the decoded word information.



PATH SELECTION BASED ACCELERATION OF CONDITIONALS IN COARSE GRAIN RECONFIGURABLE ARRAYS (CGRAS)

Thu, 25 Aug 2016 08:00:00 EDT

The present invention discloses a solution to accelerate control flow loops by utilizing the branch outcome. The embodiments of the invention eliminate fetching and execution of unnecessary operations and also the overhead due to predicate communication thus overcoming the inefficiencies associated with existing techniques. Experiments on several benchmarks are also disclosed, demonstrating that the present invention achieves optimal acceleration with minimum hardware overhead.



TECHNIQUE FOR TRANSLATING DEPENDENT INSTRUCTIONS

Thu, 25 Aug 2016 08:00:00 EDT

In response to determining an operation is a dependent operation, a mapper of a processor determines the source registers of the operation on which the dependent operation depends. The mapper translates the dependent operation to a new operation that uses as its source operands at least one of the determined source registers and a source register of the dependent operation. The new operation is independent of other pending operations and therefore can be executed without waiting for execution of other operations, thus reducing execution latency.



EFFICIENT INSTRUCTION FUSION BY FUSING INSTRUCTIONS THAT FALL WITHIN A COUNTER-TRACKED AMOUNT OF CYCLES APART

Thu, 25 Aug 2016 08:00:00 EDT

A technique to enable efficient instruction fusion within a computer system. In one embodiment, a processor logic delays the processing of a second instruction for a threshold amount of time if a first instruction within an instruction queue is fusible with the second instruction.



Hardware Instruction Generation Unit for Specialized Processors

Thu, 25 Aug 2016 08:00:00 EDT

Methods, devices and systems are disclosed that interface a host computer to a specialized processor. In an embodiment, an instruction generation unit comprises attribute, decode, and instruction buffer stages. The attribute stage is configured to receive a host-program operation code and a virtual host-program operand from the host computer and to expand the virtual host-program operand into an operand descriptor. The decode stage is configured to receive the first operand descriptor and the host-program operation code, convert the host-program operation code to one or more decoded instructions for execution by the specialized processor, and allocate storage locations for use by the specialized processor. The instruction buffer stage is configured to receive the decoded instruction, place the one or more decoded instructions into one or more instruction queues, and issue decoded instructions from at least one of the one or more instruction queues for execution by the specialized processor.



PESSIMISTIC DEPENDENCY HANDLING

Thu, 25 Aug 2016 08:00:00 EDT

Techniques are disclosed relating to handling dependencies between instructions. In one embodiment, an apparatus includes decode circuitry and dependency circuitry. In this embodiment, the decode circuitry is configured to receive an instruction that specifies a destination location and determine a first storage region that includes the destination location. In this embodiment, the storage region is one of a plurality of different storage regions accessible by instructions processed by the apparatus. In this embodiment, the dependency circuitry is configured to stall the instruction until one or more older instructions that specify source locations in the first storage region have read their source locations. The disclosed techniques may be described as “pessimistic” dependency handling, which may, in some instances, maintain performance while limiting complexity, power consumption, and area of dependency logic.
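
A region-granular stall check of the kind described here might look like the following sketch. The region size, the class name `PessimisticScoreboard`, and the per-region counter are assumptions made for illustration; the abstract does not specify how the dependency circuitry tracks pending reads.

```python
from collections import defaultdict

REGION_SIZE = 8  # hypothetical number of locations per storage region


class PessimisticScoreboard:
    """Tracks, per storage region, how many older instructions still have to read."""

    def __init__(self):
        self.pending_reads = defaultdict(int)

    @staticmethod
    def region_of(location):
        return location // REGION_SIZE

    def issue(self, sources):
        """Record an older instruction that will read the given source locations."""
        for src in sources:
            self.pending_reads[self.region_of(src)] += 1

    def sources_read(self, sources):
        """Record that an instruction has finished reading its source locations."""
        for src in sources:
            self.pending_reads[self.region_of(src)] -= 1

    def must_stall(self, destination):
        """Stall if any older read is pending anywhere in the destination's region."""
        return self.pending_reads[self.region_of(destination)] > 0
```

The check is pessimistic because a write to location 5 stalls behind a pending read of location 3: both fall in the same region even though they do not actually conflict.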



APPARATUS AND METHOD FOR LOW-LATENCY INVOCATION OF ACCELERATORS

Thu, 25 Aug 2016 08:00:00 EDT

An apparatus and method are described for providing low-latency invocation of accelerators. For example, a processor according to one embodiment comprises: a command register for storing command data identifying a command to be executed; a result register to store a result of the command or data indicating a reason why the command could not be executed; execution logic to execute a plurality of instructions including an accelerator invocation instruction to invoke one or more accelerator commands, the accelerator invocation instruction to store command data specifying the command within the command register; one or more accelerators to read the command data from the command register and responsively attempt to execute the command identified by the command data, wherein if the one or more accelerators successfully execute the command, the one or more accelerators are to store result data comprising the results of the command in the result register; and if the one or more accelerators cannot successfully execute the command, the one or more accelerators are to store result data indicating a reason why the command cannot be executed, wherein the execution logic is to temporarily halt execution until the accelerator completes execution or is interrupted, wherein the accelerator includes logic to store its state if interrupted so that it can continue execution at a later time.



MULTI-LEVEL HIERARCHICAL ROUTING MATRICES FOR PATTERN-RECOGNITION PROCESSORS

Thu, 18 Aug 2016 08:00:00 EDT

Multi-level hierarchical routing matrices for pattern-recognition processors are provided. One such routing matrix may include one or more programmable and/or non-programmable connections in and between levels of the matrix. The connections may couple routing lines to feature cells, groups, rows, blocks, or any other arrangement of components of the pattern-recognition processor.



RECONFIGURABLE GRAPH PROCESSOR

Thu, 18 Aug 2016 08:00:00 EDT

A graph processor has a planar matrix array of system resources. Resources in a same matrix or different planar matrices are interconnected through port blocks or global switched memories. Each port block includes a broadcast switch element and a receive switch element. The graph processor executes atomic execution paths that are generated from data flow graphs or computer programs by a scheduler. The scheduler linearizes resources and memories. The scheduler further maintains a linearized score board for tracking states of the resources.



MANAGEMENT OF TRACKING QUEUES USED IN OUT-OF-ORDER PROCESSING WITHIN COMPUTING ENVIRONMENTS

Thu, 18 Aug 2016 08:00:00 EDT

A queue management capability enables allocation and management of tracking queue entries, such as load and/or store queue entries, at execution time. By introducing execution-time allocation of load/store queue entries, the allocation point of those entries is delayed further into the execution stage of the instruction pipeline, reducing the overall time the entry remains allocated to a specific instruction. The queue management capability may also resolve deadlock conditions resulting from execution-time allocation of the queue entries and/or provide a mechanism to avoid such deadlock conditions.



SPECULATIVE BRANCH HANDLING FOR TRANSACTION ABORT

Thu, 18 Aug 2016 08:00:00 EDT

Embodiments relate to speculative branch handling for transaction abort. An aspect includes detecting a beginning of a current execution of a transaction. Another aspect includes, based on detecting the beginning of the transaction, disabling speculative execution based on branch prediction of an initial branch instruction of the transaction, wherein the initial branch instruction branches to two possible paths, and wherein a first path of the two possible paths comprises an abort handler. Another aspect includes disabling updating of a history table for the initial branch instruction.



BRANCH TARGET BUFFER COLUMN PREDICTOR

Thu, 18 Aug 2016 08:00:00 EDT

A processor receives a first instruction with a first instruction address within a first instruction stream. The processor selects a row of a branch target buffer and a row of a one-dimensional array based on the first instruction address. The processor reads information in the current row of the one-dimensional array, where the current row of the one-dimensional array includes a first target address and a column of the row of the branch target buffer expected to contain a second target address. The processor receives a second instruction within a second instruction stream, which includes a second instruction address equal to the first target address. The processor reads information included in the row of the branch target buffer, where the information included in the row of the branch target buffer includes the second target address. The processor encounters a branch including a third target address within the first instruction stream.



LOAD QUEUE ENTRY REUSE FOR OPERAND STORE COMPARE HISTORY TABLE UPDATE

Thu, 18 Aug 2016 08:00:00 EDT

Embodiments relate to load queue entry reuse for operand store compare (OSC) history table update. An aspect includes allocating a load queue entry in a load queue to a load instruction that is issued into an instruction pipeline, the load queue entry comprising a valid tag that is set and a keep tag that is unset. Another aspect includes based on the flushing of the load instruction, unsetting the valid tag and setting the keep tag. Another aspect includes reissuing the load instruction into the instruction pipeline. Another aspect includes based on determining that the allocated load queue entry corresponds to the reissued load instruction, setting the valid tag and leaving the keep tag set. Another aspect includes based on completing the reissued load instruction, and based on the valid tag and the keep tag being set, updating the OSC history table corresponding to the load instruction.
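
The valid/keep tag transitions listed in this abstract can be followed in a small state model. This is a sketch only: `LoadQueueEntry`, the string load identifiers, and the use of a Python set for the OSC history table are hypothetical stand-ins.

```python
class LoadQueueEntry:
    """Models one load queue entry with the valid and keep tags from the abstract."""

    def __init__(self, load_id):
        self.load_id = load_id
        self.valid = True   # set when the entry is allocated
        self.keep = False   # unset when the entry is allocated

    def flush(self):
        """The load was flushed: unset valid, set keep so the entry can be reused."""
        self.valid = False
        self.keep = True

    def reissue(self, load_id):
        """On reissue, reclaim the entry if it corresponds to the same load."""
        if load_id == self.load_id:
            self.valid = True  # keep is deliberately left set
            return True
        return False

    def complete(self, osc_history):
        """On completion, update the history table only if both tags are set."""
        if self.valid and self.keep:
            osc_history.add(self.load_id)
```

A load that completes without ever being flushed has `keep` unset, so it leaves the history table untouched in this model.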



DYNAMIC RESOURCE ALLOCATION ACROSS DISPATCH PIPES

Thu, 18 Aug 2016 08:00:00 EDT

Dynamic resource allocation is provided in which additional resources, such as additional architected registers, are provided to an instruction, if it is determined that resources in addition to those configured to be provided to the instruction are to be used for the particular instruction. An instruction to be executed is dispatched on a pipe of a pipeline and that pipe is configured to have a set number of architected registers for use by the instruction. However, if one or more other architected registers are needed, those additional architected registers are dynamically allocated to the instruction by assigning one or more source ports of an additional pipe to the instruction.



METHOD, APPARATUS, AND SYSTEM FOR SPECULATIVE ABORT CONTROL MECHANISMS

Thu, 18 Aug 2016 08:00:00 EDT

An apparatus and method are described herein for providing robust speculative code section abort control mechanisms. Hardware is able to track speculative code region abort events, conditions, and/or scenarios, such as an explicit abort instruction, a data conflict, a speculative timer expiration, a disallowed instruction attribute or type, etc. And hardware, firmware, software, or a combination thereof makes an abort determination based on the tracked abort events. As an example, hardware may make an initial abort determination based on one or more predefined events or choose to pass the event information up to a firmware or software handler to make such an abort determination. Upon determining an abort of a speculative code region is to be performed, hardware, firmware, software, or a combination thereof performs the abort, which may include following a fallback path specified by hardware or software. And to enable testing of such a fallback path, in one implementation, hardware provides software a mechanism to always abort speculative code regions.



MICROPROCESSOR USING COMPRESSED AND UNCOMPRESSED MICROCODE STORAGE

Thu, 18 Aug 2016 08:00:00 EDT

A microprocessor includes compressed and uncompressed microcode memory storages, having N-bit wide and M-bit wide addressable words, respectively, where N



DYNAMIC WAVEFRONT CREATION FOR PROCESSING UNITS USING A HYBRID COMPACTOR

Thu, 18 Aug 2016 08:00:00 EDT

A method, a non-transitory computer readable medium, and a processor for repacking dynamic wavefronts during program code execution on a processing unit, each dynamic wavefront including multiple threads, are presented. If a branch instruction is detected, a determination is made whether all wavefronts following a same control path in the program code have reached a compaction point, which is the branch instruction. If no branch instruction is detected in executing the program code, a determination is made whether all wavefronts following the same control path have reached a reconvergence point, which is a beginning of a program code segment to be executed by both a taken branch and a not taken branch from a previous branch instruction. The dynamic wavefronts are repacked with all threads that follow the same control path, if all wavefronts following the same control path have reached the branch instruction or the reconvergence point.
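
The repacking step itself, grouping threads that follow the same control path into new wavefronts, can be sketched as below. The function name `repack`, the `(thread_id, path)` tuples, and the fixed wavefront size are assumptions; the sketch does not model the check that all wavefronts have reached the compaction or reconvergence point.

```python
def repack(threads, wavefront_size):
    """Group threads by control path, then split each group into wavefronts.

    threads: list of (thread_id, path) pairs, where `path` labels the control
    path the thread follows (for example "taken" or "not_taken").
    Returns a list of (path, [thread_id, ...]) wavefronts.
    """
    by_path = {}
    for thread_id, path in threads:
        by_path.setdefault(path, []).append(thread_id)

    wavefronts = []
    for path, ids in by_path.items():
        # Fill each wavefront with threads that all follow the same path.
        for start in range(0, len(ids), wavefront_size):
            wavefronts.append((path, ids[start:start + wavefront_size]))
    return wavefronts
```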



VECTOR OPERATIONS WITH OPERAND BASE SYSTEM CONVERSION AND RE-CONVERSION

Thu, 18 Aug 2016 08:00:00 EDT

Methods and apparatuses relating to vector operations with operand base system conversion and re-conversion are described. In one embodiment, a method includes executing a single instruction by receiving a vector element of a first input vector and a vector element of a second input vector expressed in a first base system, converting the vector elements into a second lower base system to form a converted vector element of the first input vector and a converted vector element of the second input vector, performing an operation on the converted vector element of the first input vector and the converted vector element of the second input vector to form a result, accumulating in a register a portion of the result with a portion of a result of a prior operation expressed in the second lower base system, and converting contents of the register into the first base system.



SYSTEM, APPARATUS, AND METHOD FOR IMPROVED EFFICIENCY OF EXECUTION IN SIGNAL PROCESSING ALGORITHMS

Thu, 18 Aug 2016 08:00:00 EDT

Embodiments of methods, apparatuses, and machine-readable mediums for performing a bit reversal instruction in a computer processor are described. In some embodiments, the execution of such instruction causes the bit ordering for a source operand to be reversed and stored.
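
For reference, the effect of a bit reversal on a fixed-width operand can be written in a few lines. This is a software sketch of the operation, not the instruction's actual encoding or semantics; the function name and the explicit `width` parameter are assumptions.

```python
def bit_reverse(value, width):
    """Return `value` with the order of its lowest `width` bits reversed."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)  # move the lowest bit to the other end
        value >>= 1
    return result
```

In signal processing this is typically used to compute the bit-reversed index order of a radix-2 FFT, for example `[bit_reverse(i, 3) for i in range(8)]`.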



METHOD, INFORMATION PROCESSING APPARATUS, AND MEDIUM

Thu, 18 Aug 2016 08:00:00 EDT

A method includes: calculating a percentage of an instruction belonging to a certain instruction type among instruction types included in each of a plurality of blocks partitioned from a program; extracting an execution address and a number of execution instructions from an arithmetic processing unit that executes the program and performs sampling of the execution address and the number of execution instructions at a plurality of time points; calculating a first execution frequency of the instruction included in each of the plurality of blocks based on the extracted execution address and the number of execution instructions; calculating a second execution frequency of the instruction belonging to the instruction type by multiplying the first execution frequency of the block by the percentage of the instruction in the block; and calculating a total number of second execution frequencies calculated for each of the plurality of blocks.
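
The arithmetic in this method can be made concrete with a short sketch. The data layout is assumed for illustration: each block is a dictionary with an address range and a list of instruction types, each sample is an `(address, executed_count)` pair, and the first execution frequency of a block is taken to be the sum of the counts sampled inside its address range.

```python
def type_frequency_total(blocks, samples, instr_type):
    """Estimate how often instructions of `instr_type` executed across all blocks."""
    total = 0.0
    for block in blocks:
        kinds = block["types"]
        # Share of the block's instructions that belong to the type.
        percentage = kinds.count(instr_type) / len(kinds)
        # First execution frequency: counts from samples that fall in the block.
        first_freq = sum(count for addr, count in samples
                         if block["start"] <= addr < block["end"])
        # Second execution frequency: scale by the type's share, then accumulate.
        total += first_freq * percentage
    return total
```

With one block of four instructions (two of them loads) sampled at 160 executed instructions, the load estimate for that block is 160 × 0.5 = 80.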



Instruction Class for Digital Signal Processors

Thu, 18 Aug 2016 08:00:00 EDT

A class of digital signal processor instructions, comprising at least a first instruction type and a second instruction type, is proposed. The class of instructions may be added to the instruction set of a digital signal vector processor and a program instruction is selected from the digital signal processor instruction set. The digital signal processor is adapted to cause execution of a method comprising obtaining a program instruction, selecting a real valued input as one of a first real valued input and a second real valued input (the first and second real valued inputs organized as adjacent elements of a first input vector), performing an arithmetic operation on the selected real valued input to provide a real valued result, and providing a first real valued output and a second real valued output during a first operation cycle (the first and second real valued outputs organized as adjacent elements of a second output vector). The real valued input is selected as the first real valued input if the program instruction is of the first instruction type and as the second real valued input if the program instruction is of the second instruction type. Furthermore (if the program instruction is of one of the first instruction type and the second instruction type), the real valued result is provided as the first real valued output and as the second real valued output, and the second output vector is a real valued second output vector for real-complex multiplication with a complex valued third vector.



Apparatus of wave-pipelined circuits

Thu, 11 Aug 2016 08:00:00 EDT

The present invention classifies all critical paths into two basic types: a series critical path and a feedback critical path, and divides each of wave-pipelined circuits into two components: a static logic part, called critical path component (CPC), and a dynamic logic part, formalized into four wave-pipelining components (WPC) shared by all wave-pipelined circuits. Each wave-pipelining ready code in HDL comprises two components: a WPC instantiation and a CPC instantiation wire-connected and linked by a new link statement. Each WPC has new wave constants which play the same role as generic constants do, but whose initial values are determined and assigned by a synthesizer after code analysis, so designers can use after-synthesization information in their code before synthesization for wave-pipelining technology. The responsibility of analyzing and manipulating wave-pipelining ready code, generating and implementing wave-pipelined circuits on a design-wide or chip-wide scale in HDL is shifted from designers to synthesizers.



INTERCONNECT CIRCUITS AT THREE DIMENSIONAL (3-D) BONDING INTERFACES OF A PROCESSOR ARRAY

Thu, 11 Aug 2016 08:00:00 EDT

Embodiments of the invention relate to processor arrays, and in particular, a processor array with interconnect circuits for bonding semiconductor dies. One embodiment comprises multiple semiconductor dies and at least one interconnect circuit for exchanging signals between the dies. Each die comprises at least one processor core circuit. Each interconnect circuit corresponds to a die of the processor array. Each interconnect circuit comprises one or more attachment pads for interconnecting a corresponding die with another die, and at least one multiplexor structure configured for exchanging bus signals in a reversed order.



HETEROGENEOUS MULTICORE PROCESSOR WITH GRAPHENE-BASED TRANSISTORS

Thu, 11 Aug 2016 08:00:00 EDT

Techniques described herein generally include methods and systems related to the use of processors that include graphene-containing computing elements while minimizing or otherwise reducing the effects of high leakage energy associated with graphene computing elements. Furthermore, embodiments of the present disclosure provide systems and methods for scheduling instructions for processing by a chip multiprocessor that includes graphene-containing computing elements arranged in multiple processor groups.



ELECTRONIC DEVICE FOR PACKING MULTIPLE COMMANDS IN ONE COMPOUND COMMAND FRAME AND ELECTRONIC DEVICE FOR DECODING AND EXECUTING MULTIPLE COMMANDS PACKED IN ONE COMPOUND COMMAND FRAME

Thu, 11 Aug 2016 08:00:00 EDT

An electronic device includes a control circuit and a bus interface. The control circuit packs a plurality of commands in a compound command frame. The bus interface communicates with another electronic device via a bus between the electronic device and the another electronic device, and packs the compound command frame in a single packet and transmits the single packet over the bus.
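
Packing several commands into one frame, and unpacking them on the receiving device, could be sketched as follows. The frame layout (a one-byte command count followed by length-prefixed commands) is purely hypothetical; the abstract does not define a wire format.

```python
import struct


def pack_compound_frame(commands):
    """Pack a list of byte-string commands into one compound command frame.

    Assumed layout: 1-byte command count, then for each command a 1-byte
    length followed by the command bytes.
    """
    frame = struct.pack("B", len(commands))
    for command in commands:
        frame += struct.pack("B", len(command)) + command
    return frame


def unpack_compound_frame(frame):
    """Recover the individual commands from a compound command frame."""
    count = frame[0]
    offset = 1
    commands = []
    for _ in range(count):
        length = frame[offset]
        offset += 1
        commands.append(frame[offset:offset + length])
        offset += length
    return commands
```

The resulting frame would then travel as the payload of a single bus packet, so the per-packet overhead is paid once for all the commands it carries.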



Last Branch Record Indicators For Transactional Memory

Thu, 11 Aug 2016 08:00:00 EDT

In one embodiment, a processor includes an execution unit and at least one last branch record (LBR) register to store address information of a branch taken during program execution. This register may further store a transaction indicator to indicate whether the branch was taken during a transactional memory (TM) transaction. This register may further store an abort indicator to indicate whether the branch was caused by a transaction abort. Other embodiments are described and claimed.
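
One way to picture such a record is as an address field with two flag bits. The bit positions and the 48-bit address mask below are hypothetical and are not taken from any real processor's register layout.

```python
ADDR_MASK = (1 << 48) - 1  # hypothetical width of the address field
IN_TX_BIT = 1 << 62        # hypothetical: branch taken during a TM transaction
ABORT_BIT = 1 << 63        # hypothetical: branch caused by a transaction abort


def encode_lbr(address, in_transaction, abort):
    """Pack a branch address and the two indicators into one record value."""
    record = address & ADDR_MASK
    if in_transaction:
        record |= IN_TX_BIT
    if abort:
        record |= ABORT_BIT
    return record


def decode_lbr(record):
    """Return (address, in_transaction, abort) from a record value."""
    return record & ADDR_MASK, bool(record & IN_TX_BIT), bool(record & ABORT_BIT)
```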



Prioritising of Instruction Fetching in Microprocessor Systems

Thu, 11 Aug 2016 08:00:00 EDT

A method and a system are provided for prioritising the fetching of instructions for each of a plurality of executing instruction threads in a multi-threaded processor. Instructions come from at least one source of instructions. Each thread has a number of instructions buffered for execution in an instruction buffer. A first metric for each thread is determined based on the number of instructions currently buffered. A second metric is then determined for each thread, this being an execution based metric. A priority order for the threads is determined from the first and second metrics, and an instruction is fetched from the source for the thread with the highest determined priority which is requesting an instruction.
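
A possible shape for the selection step is sketched below. How the two metrics are derived and combined is not stated in the abstract, so the choices here are assumptions: fewer buffered instructions give a higher first metric, the execution-based metric is supplied as a number, and the two are simply added.

```python
def pick_thread(threads):
    """Return the id of the requesting thread with the highest priority, or None.

    threads: list of dicts with keys "id", "buffered" (instructions currently
    buffered), "exec_metric" (execution-based metric), and "requesting".
    """
    candidates = [t for t in threads if t["requesting"]]
    if not candidates:
        return None

    def priority(thread):
        first_metric = -thread["buffered"]    # assumption: emptier buffer is more urgent
        second_metric = thread["exec_metric"]
        return first_metric + second_metric   # assumption: metrics are summed

    return max(candidates, key=priority)["id"]
```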



FAN OUT OF RESULT OF EXPLICIT DATA GRAPH EXECUTION INSTRUCTION

Thu, 11 Aug 2016 08:00:00 EDT

An apparatus for fan out of a result of a first instruction can include first through fourth sets of memory cells and circuitry. The first set can be configured to store the result of the first instruction. The second set can be configured to store an operation code of a second instruction. The third set can be configured to store information of the second instruction. The fourth set can be configured to store an operand for the second instruction. The circuitry can be configured to connect the fourth set to an execution unit and to cause, in response to a presence of the information in the third set, the execution unit to be configured to receive a content of the first set as the operand for the second instruction. A format of the second instruction can include a set of bits designated for the operation code and for the information.



SYSTEM LEVEL TESTING OF MULTI-THREADING FUNCTIONALITY

Thu, 11 Aug 2016 08:00:00 EDT

A testing facility is provided to test the multithreading functionality of a computing environment. The testing of this functionality includes building independent instruction streams to test threads of a multi-threaded environment while honoring architecturally imposed common fields and constraints, if any, of the threads. Certain features may be enabled/disabled for all threads. The instruction streams generated for testing this functionality may vary from being identical for all the threads being tested to being totally different, such as having different architectures.



SYSTEM FOR SELECTING A TASK TO BE EXECUTED ACCORDING TO AN OUTPUT FROM A TASK CONTROL CIRCUIT

Thu, 11 Aug 2016 08:00:00 EDT

The speed of task scheduling by a multitask OS is increased. A task processor includes a CPU, a save circuit, and a task control circuit. The CPU is provided with a processing register and an execution control circuit operative to load data from a memory into a processing register and execute a task in accordance with the data in the processing register. The save circuit is provided with a plurality of save registers respectively associated with a plurality of tasks. In executing a predetermined system call, the execution control circuit notifies the task control circuit as such. The task control circuit switches between tasks for execution upon receipt of the system call signal, by saving, in the save register associated with a task being executed, the data in the processing register, selecting a task to be executed next, and loading data in the save register associated with the selected task into the processing register.
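
The save, select, and load sequence performed by the task control circuit can be modelled in software as below. This is a sketch under assumptions: a single integer stands in for the processing register contents, and round-robin selection is a placeholder for whatever policy the circuit actually implements.

```python
class TaskProcessor:
    """Software model of a task switch using one save register per task."""

    def __init__(self, task_ids):
        self.save_registers = {task_id: 0 for task_id in task_ids}
        self.ready = list(task_ids)
        self.current = self.ready.pop(0)
        self.processing_register = self.save_registers[self.current]

    def system_call(self):
        """Switch tasks: save the running task's data, pick the next, load its data."""
        self.save_registers[self.current] = self.processing_register
        self.ready.append(self.current)
        self.current = self.ready.pop(0)  # assumption: round-robin selection
        self.processing_register = self.save_registers[self.current]
```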



VECTOR PROCESSOR CONFIGURED TO OPERATE ON VARIABLE LENGTH VECTORS WITH REGISTER RENAMING

Thu, 04 Aug 2016 08:00:00 EDT

A computer processor is disclosed. The computer processor comprises a vector unit comprising a vector register file comprising at least one vector register to hold a varying number of elements. The number of architected vector registers in the vector register file differs from the number of physical vector registers in the vector register file.



VECTOR PROCESSOR CONFIGURED TO OPERATE ON VARIABLE LENGTH VECTORS WITH OUT-OF-ORDER EXECUTION

Thu, 04 Aug 2016 08:00:00 EDT

A computer processor is disclosed. The computer processor comprises a vector unit comprising a vector register file comprising at least one vector register to hold a varying number of elements. The computer processor further comprises out-of-order issue logic that holds a pool of vector instructions, selects a vector instruction from the pool, and sends the vector instruction for execution. The vector instruction operates on the varying number of elements of the at least one vector register.



MONOLITHIC VECTOR PROCESSOR CONFIGURED TO OPERATE ON VARIABLE LENGTH VECTORS

Thu, 04 Aug 2016 08:00:00 EDT

A computer processor comprising a vector unit is disclosed. The vector unit may comprise a vector register file comprising at least one register to hold a varying number of elements. The vector unit may further comprise a vector length register file comprising at least one register to specify the number of operations of a vector instruction to be performed on the varying number of elements in the at least one register of the vector register file. The computer processor may be implemented as a monolithic integrated circuit.
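
The role of a vector length register can be shown with a toy vector add. The register capacity `MAX_ELEMENTS` and the function name are hypothetical; the point is only that the length value, not the register size, decides how many element operations are performed.

```python
MAX_ELEMENTS = 8  # hypothetical capacity of one vector register


def vector_add(dst, a, b, vector_length):
    """Add the first `vector_length` elements of a and b into dst.

    Elements of dst beyond `vector_length` are left unchanged.
    """
    if not 0 <= vector_length <= MAX_ELEMENTS:
        raise ValueError("vector length out of range")
    for i in range(vector_length):
        dst[i] = a[i] + b[i]
    return dst
```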



VECTOR PROCESSOR CONFIGURED TO OPERATE ON VARIABLE LENGTH VECTORS USING IMPLICITLY TYPED INSTRUCTIONS

Thu, 04 Aug 2016 08:00:00 EDT

A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising one or more registers to hold a varying number of elements. The computer processor may further comprise processing logic configured to implicitly type each of the varying number of elements in the vector register file. The computer processor may be implemented as a monolithic integrated circuit.



VECTOR PROCESSOR CONFIGURED TO OPERATE ON VARIABLE LENGTH VECTORS USING GRAPHICS PROCESSING INSTRUCTIONS

Thu, 04 Aug 2016 08:00:00 EDT

A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more graphics processing instructions. The computer processor may be implemented as a monolithic integrated circuit.



VECTOR PROCESSOR CONFIGURED TO OPERATE ON VARIABLE LENGTH VECTORS WITH ASYMMETRIC MULTI-THREADING

Thu, 04 Aug 2016 08:00:00 EDT

A computer processor is disclosed. The computer processor comprises one or more processor resources. The computer processor further comprises a plurality of hardware thread units coupled to the one or more processor resources. The computer processor may be configured to permit simultaneous access to the one or more processor resources by only a subset of hardware thread units of the plurality of hardware thread units. The number of hardware threads in the subset may be less than the total number of hardware threads of the plurality of hardware thread units.



PROCESSOR WITH HYBRID PIPELINE CAPABLE OF OPERATING IN OUT-OF-ORDER AND IN-ORDER MODES

Thu, 04 Aug 2016 08:00:00 EDT

A method and circuit arrangement provide support for a hybrid pipeline that dynamically switches between out-of-order and in-order modes. The hybrid pipeline may selectively execute instructions from at least one instruction stream that require the high performance capabilities provided by out-of-order processing in the out-of-order mode. The hybrid pipeline may also execute instructions that have strict power requirements in the in-order mode where the in-order mode conserves more power compared to the out-of-order mode. Each stage in the hybrid pipeline may be activated and fully functional when the hybrid pipeline is in the out-of-order mode. However, stages in the hybrid pipeline not used for the in-order mode may be deactivated and bypassed by the instructions when the hybrid pipeline dynamically switches from the out-of-order mode to the in-order mode. The deactivated stages may then be reactivated when the hybrid pipeline dynamically switches from the in-order mode to the out-of-order mode.
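
The stage activation and bypass described here can be modelled as a simple filter over a stage list. The stage names and the choice of which stages the in-order mode needs are invented for illustration only.

```python
# (stage name, needed in in-order mode) -- hypothetical stage list
STAGES = [
    ("fetch", True),
    ("decode", True),
    ("rename", False),
    ("issue_queue", False),
    ("execute", True),
    ("reorder_buffer", False),
    ("writeback", True),
]


def active_stages(mode):
    """Return the stages an instruction passes through in the given mode."""
    if mode == "out_of_order":
        return [name for name, _ in STAGES]
    if mode == "in_order":
        # Stages not needed for in-order execution are deactivated and bypassed.
        return [name for name, needed in STAGES if needed]
    raise ValueError(f"unknown mode: {mode}")
```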



CONCURRENT MULTIPLE INSTRUCTION ISSUE OF NON-PIPELINED INSTRUCTIONS USING NON-PIPELINED OPERATION RESOURCES IN ANOTHER PROCESSING CORE

Thu, 04 Aug 2016 08:00:00 EDT

A method and circuit arrangement utilize inactive non-pipelined operation resources in one processing core of a multi-core processing unit to execute non-pipelined instructions on behalf of another processing core in the same processing unit. Adjacent processing cores in a processing unit may be coupled together such that, for example, when one processing core's non-pipelined execution sequencer is busy, that processing core may issue into another processing core's non-pipelined execution sequencer if that other processing core's non-pipelined execution sequencer is idle, thereby providing intermittent concurrent execution of multiple non-pipelined instructions within each individual processing core.



METHOD AND APPARATUS FOR REALIZING SELF-TIMED PARALLELIZED MANY-CORE PROCESSOR

Thu, 04 Aug 2016 08:00:00 EDT

A self-timed parallelized multi-core processor and method for operating the processor are provided. The processor has an instruction decoder unit to receive a program code instruction, determine an operating code and latency for the program code instruction, and assign a loop index to the program code instruction. The processor further includes an instruction decomposer unit coupled to the instruction decoder unit, the instruction decomposer configured to create a primitive by decomposing the instruction, replace the loop index with a core index, and broadcast the primitive. The processor further has a plurality of self-timed processing cores coupled to the instruction decomposer unit, each core having a unique core index and having a dispatch unit for comparing the core index in the primitive with the core index of its processing core, each core acting on the primitive when the index of the processing core is within a threshold of the core index.
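
The dispatch comparison can be sketched as below. Reading "within a threshold" as an absolute difference no larger than the threshold is an assumption; the abstract does not define the comparison.

```python
def cores_acting(primitive_core_index, core_indices, threshold):
    """Return the cores that would act on a broadcast primitive.

    A core acts when its own index is within `threshold` of the core index
    carried by the primitive (absolute difference, by assumption).
    """
    return [c for c in core_indices if abs(c - primitive_core_index) <= threshold]
```

For eight cores numbered 0 to 7, a primitive carrying core index 3 and a threshold of 1 would be acted on by cores 2, 3, and 4 under this reading.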



APPARATUS AND METHOD FOR ARCHITECTURAL PERFORMANCE MONITORING IN BINARY TRANSLATION SYSTEMS

Thu, 04 Aug 2016 08:00:00 EDT

Methods and apparatuses relate to emulating architectural performance monitoring in a binary translation system. In one embodiment, a processor includes an architectural performance counter to maintain an architectural value associated with instruction execution, a register to store the architectural value of the architectural performance counter, binary translation logic to embed an architectural value from the architectural performance counter into a stream of translated instructions having a transactional code region and to store the architectural value into the register, and an execution unit to execute the transactional code region of the stream of translated instructions. The binary translation logic is configured to add the architectural value from the register to the architectural performance counter upon completion of the transactional code region of the stream of translated instructions. In one embodiment, a binary translation system overcomes software incompatibilities by using microarchitectural support to transparently and accurately emulate architectural performance counter behavior.



PARSING-ENHANCEMENT FACILITY

Thu, 04 Aug 2016 08:00:00 EDT

An instruction for parsing a buffer, to be utilized within a data processing system, includes: an operation code field that identifies the instruction; a control field that controls operation of the instruction; and one or more general registers, wherein a first general register stores an argument address, a second general register stores a function code, a third general register stores the length of an argument-character buffer, and a fourth general register contains the address of the function-code data structure.



VECTOR PROCESSOR CONFIGURED TO OPERATE ON VARIABLE LENGTH VECTORS USING INSTRUCTIONS TO COMBINE AND SPLIT VECTORS

Thu, 04 Aug 2016 08:00:00 EDT

A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more instructions that separate a vector or combine two vectors. The computer processor may be implemented as a monolithic integrated circuit.



VECTOR PROCESSOR CONFIGURED TO OPERATE ON VARIABLE LENGTH VECTORS USING INSTRUCTIONS THAT CHANGE ELEMENT WIDTHS

Thu, 04 Aug 2016 08:00:00 EDT

A computer processor is disclosed. The computer processor may comprise a vector unit comprising a vector register file comprising at least one register to hold a varying number of elements. The computer processor may further comprise processing logic configured to operate on the varying number of elements in the vector register file using one or more instructions that produce results with elements of widths different than that of the input elements. The computer processor may be implemented as a monolithic integrated circuit.
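
A widening multiply is one familiar example of an instruction whose result elements are wider than its inputs. The sketch below assumes unsigned elements and a result twice the input width; neither detail comes from the abstract.

```python
def widening_multiply(a, b, in_bits=8):
    """Multiply unsigned in_bits-wide elements into 2*in_bits-wide results."""
    mask_in = (1 << in_bits) - 1
    mask_out = (1 << (2 * in_bits)) - 1
    # Inputs are truncated to the input width; products fit in the doubled width.
    return [((x & mask_in) * (y & mask_in)) & mask_out for x, y in zip(a, b)]
```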



VECTOR PROCESSOR CONFIGURED TO OPERATE ON VARIABLE LENGTH VECTORS USING DIGITAL SIGNAL PROCESSING INSTRUCTIONS

Thu, 04 Aug 2016 08:00:00 EDT

A computer processor is disclosed. The computer processor comprises a vector unit comprising a vector register file comprising one or more registers to hold a varying number of elements. The computer processor further comprises processing logic configured to operate on the varying number of elements in the vector register file using one or more digital signal processing instructions. The computer processor may be implemented as a monolithic integrated circuit.



METHOD AND APPARATUS FOR PERFORMING REGISTER ALLOCATION

Thu, 04 Aug 2016 08:00:00 EDT

A method of performing register allocation for at least one program code module is described. The method comprises constructing a restriction graph for program variables within at least one program instruction, and determining whether the constructed restriction graph is colourable. The method further comprises, if it is determined that the constructed restriction graph is not colourable, determining whether at least one alternative form of the at least one program instruction is available, and modifying the at least one program instruction to comprise an alternative form if it is determined that at least one alternative form is available.
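
The colourability check with a fallback to alternative instruction forms can be sketched as follows. The sketch treats the restriction graph like a conventional interference graph, where adjacent variables cannot share a register, and uses exhaustive backtracking that only suits small graphs; both choices are assumptions, and the function names are hypothetical.

```python
def colourable(graph, k):
    """Return True if `graph` (dict: node -> set of neighbours) can be k-coloured."""
    nodes = list(graph)
    colours = {}

    def assign(i):
        if i == len(nodes):
            return True
        node = nodes[i]
        for c in range(k):
            # A colour is usable if no already-coloured neighbour has it.
            if all(colours.get(m) != c for m in graph[node]):
                colours[node] = c
                if assign(i + 1):
                    return True
                del colours[node]
        return False

    return assign(0)


def choose_form(forms, k):
    """Return the index of the first instruction form whose graph is k-colourable.

    forms -- list of restriction graphs, original form first, alternatives after.
    Returns None when no available form is colourable.
    """
    for idx, graph in enumerate(forms):
        if colourable(graph, k):
            return idx
    return None
```

With two registers, a form whose three variables all restrict one another (a triangle) is rejected, while an alternative form whose variables form a chain is accepted.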