Subscribe: Untitled
http://www.freepatentsonline.com/rssfeed/rsspat712.xml
Added By: Feedage Forager Feedage Grade B rated
Language: English
Tags:
code  data  execution  hardware  includes  instruction  instructions  memory  method  processing  processor  system  thread 
Rate this Feed
Rate this feedRate this feedRate this feedRate this feedRate this feed
Rate this feed 1 starRate this feed 2 starRate this feed 3 starRate this feed 4 starRate this feed 5 star

Comments (0)

Feed Details and Statistics Feed Statistics
Preview: Untitled

Untitled





 



Adjustment of threads for execution based on over-utilization of a domain in a multi-processor system by destroying parallizable group of threads in sub-domains

Tue, 26 May 2015 08:00:00 EDT

Embodiments provide various techniques for dynamic adjustment of a number of threads for execution in any domain based on domain utilizations. In a multiprocessor system, the utilization for each domain is monitored. If a utilization of any of these domains changes, then the number of threads for each of the domains determined for execution may also be adjusted to adapt to the change.



Identification and management of unsafe optimizations

Tue, 26 May 2015 08:00:00 EDT

Techniques for implementing identification and management of unsafe optimizations are disclosed. A method of the disclosure includes receiving, by a managed runtime environment (MRE) executed by a processing device, a notice of misprediction of optimized code, the misprediction occurring during a runtime of the optimized code, determining, by the MRE, whether a local misprediction counter (LMC) associated with a code region of the optimized code causing the misprediction exceeds a local misprediction threshold (LMT) value, and when the LMC exceeds the LMT value, compiling, by the MRE, native code of the optimized code to generate a new version of the optimized code, wherein the code region in the new version of the optimized code is not optimized.



Dynamic energy savings for digital signal processor modules using plural energy savings states

Tue, 26 May 2015 08:00:00 EDT

In an example embodiment, there is described herein an apparatus comprising an interface for communicating with a plurality of digital signal processors and logic operable to send and receive data via the interface. The logic is configured to determine a first set of digital signal processors to be maintained in a ready state, a second set of digital signal processors to be maintained in a first energy saving state, and a third set of digital signal processors to be maintained in a second energy saving state.



Generating hardware events via the instruction stream for microprocessor verification

Tue, 26 May 2015 08:00:00 EDT

A processor receives an instruction operation (OP) code from a verification system. The instruction OP code includes instruction bits and forced event bits. The processor identifies a forced event based upon the forced event bits, which is unrelated to an instruction that corresponds to the instruction bits. In turn, the processor executes the forced event.



Load/move and duplicate instructions for a processor

Tue, 26 May 2015 08:00:00 EDT

A method includes, in a processor, loading/moving a first portion of bits of a source into a first portion of a destination register and duplicate that first portion of bits in a subsequent portion of the destination register.



Enhanced instruction scheduling during compilation of high level source code for improved executable code

Tue, 26 May 2015 08:00:00 EDT

Systems and methods for static code scheduling are disclosed. A method can include receiving an intermediate representation of source code, building a directed acyclic graph (DAG) for the intermediate representation, and creating chains of dependent instructions from the DAG for cluster formation. The chains are merged into clusters and each node in the DAG is marked with an identifier of a cluster it is part of to generate a marked instruction DAG. Instruction DAG scheduling is then performed using information about the clusters to generate an ordered intermediate representation of the source code.



Storing in other queue when reservation station instruction queue reserved for immediate source operand instruction execution unit is full

Tue, 26 May 2015 08:00:00 EDT

A processing apparatus includes an execution unit which performs computation on two operand inputs each being selectable between read data from a register and an immediate value. The processing apparatus also includes another execution unit which performs computation on two operand inputs, one of which is selectable between read data from a register and an immediate value, and the other of which is an immediate value. A control unit determines, based on a received instruction specifying a computation on two operands, whether each of the two operands specifies read data from a register or an immediate value. Depending on the determination result, the control unit causes one of the execution units to execute the computation specified by the received instruction.



Accessing model specific registers (MSR) with different sets of distinct microinstructions for instructions of different instruction set architecture (ISA)

Tue, 26 May 2015 08:00:00 EDT

A microprocessor capable of running both x86 instruction set architecture (ISA) machine language programs and Advanced RISC Machines (ARM) ISA machine language programs. The microprocessor includes a mode indicator that indicates whether the microprocessor is currently fetching instructions of an x86 ISA or ARM ISA machine language program. The microprocessor also includes a plurality of model-specific registers (MSRs) that control aspects of the operation of the microprocessor. When the mode indicator indicates the microprocessor is currently fetching x86 ISA machine language program instructions, each of the plurality of MSRs is accessible via an x86 ISA RDMSR/WRMSR instruction that specifies an address of the MSR. When the mode indicator indicates the microprocessor is currently fetching ARM ISA machine language program instructions, each of the plurality of MSRs is accessible via an ARM ISA MRRC/MCRR instruction that specifies the address of the MSR.



Prefetch optimizer measuring execution time of instruction sequence cycling through each selectable hardware prefetch depth and cycling through disabling each software prefetch instruction of an instruction sequence of interest

Tue, 26 May 2015 08:00:00 EDT

A prefetch optimizer tool for an information handling system (IHS) may improve effective memory access time by controlling both hardware prefetch operations and software prefetch operations. The prefetch optimizer tool selectively disables prefetch instructions in an instruction sequence of interest within an application. The tool measures execution times of the instruction sequence of interest when different prefetch instructions are disabled. The tool may hold hardware prefetch depth constant while cycling through disabling different prefetch instructions and taking corresponding execution time measurements. Alternatively, for each disabled prefetch instruction in the instruction sequence of interest, the tool may cycle through different hardware prefetch depths and take corresponding execution time measurements at each hardware prefetch depth. The tool selects a combination of hardware prefetch depth and prefetch instruction disablement that may improve the execution time in comparison with a baseline execution time.



System and method for virtual machine conversion

Tue, 26 May 2015 08:00:00 EDT

System and method for conversion of virtual machine files without requiring copying of the virtual machine payload (data) from one location to another location. By eliminating this step, applicant's invention significantly enhances the efficiency of the conversion process. In one embodiment, a file system or storage system provides indirections to locations of data elements stored on a persistent storage media. A source virtual machine file includes hypervisor metadata (HM) data elements in one hypervisor file format, and virtual machine payload (VMP) data elements. The source virtual machine file is converted by transforming the HM data elements of the source file to create destination HM data elements in a destination hypervisor format different from the source hypervisor format; maintaining the locations of the VMP data elements stored on the persistent storage media constant during the conversion from source to destination file formats without reading or writing the VMP data elements; and creating indirections to reference the destination HM data elements in the destination hypervisor format and the existing stored VMP data elements.



Reception according to a data transfer protocol of data directed to any of a plurality of destination entities

Tue, 26 May 2015 08:00:00 EDT

A data processing system arranged for receiving over a network, according to a data transfer protocol, data directed to any of a plurality of destination identities, the data processing system comprising: data storage for storing data received over the network; and a first processing arrangement for performing processing in accordance with the data transfer protocol on received data in the data storage, for making the received data available to respective destination identities; and a response former arranged for: receiving a message requesting a response indicating the availability of received data to each of a group of destination identities; and forming such a response; wherein the system is arranged to, in dependence on receiving the said message.



System and method for communicating with sensors/loggers in integrated radio frequency identification (RFID) tags

Tue, 26 May 2015 08:00:00 EDT

A system and method is disclosed for communicating with sensors/loggers in integrated radio frequency identification (RFID) tags. An RFID reader uses a Communicate With Data Logger Command to communicate with a data logger in an RFID tag. The RFID reader performs data access processes using an Index Register and a Data Register of the RFID tag. The RFID reader selects one of (1) Index Read access (2) Index Write access (3) Data Write access (4) Data Read access with parity and (5) Data Read access with cyclic redundancy check (CRC). The RFID tag performs the requested data access and then performs an error detection process.



Debug in a multicore architecture

Tue, 19 May 2015 08:00:00 EDT

A method of monitoring thread execution within a multicore processor architecture which comprises a plurality of interconnected processor elements for processing the threads, the method comprising receiving a plurality of thread parameter indicators of one or more parameters relating to the function and/or identity and/or execution location of a thread or threads, comparing at least one of the thread parameter indicators with a first plurality of predefined criteria each representative of an indicator of interest, and generating an output consequential upon thread parameter indicators which have been identified to be of interest as a result of the said comparison.



System, method and computer program product for recursively executing a process control operation to use an ordered list of tags to initiate corresponding functional operations

Tue, 19 May 2015 08:00:00 EDT

In accordance with embodiments, there are provided mechanisms and methods for controlling a process using a process map. These mechanisms and methods for controlling a process using a process map can enable process operations to execute in order without necessarily having knowledge of one another. The ability to provide the process map can avoid a requirement that the operations themselves be programmed to follow a particular sequence, as can further improve the ease by which the sequence of operations may be changed.



Data mover moving data to accelerator for processing and returning result data based on instruction received from a processor utilizing software and hardware interrupts

Tue, 19 May 2015 08:00:00 EDT

Efficient data processing apparatus and methods include hardware components which are pre-programmed by software. Each hardware component triggers the other to complete its tasks. After the final pre-programmed hardware task is complete, the hardware component issues a software interrupt.



Multiprocessor messaging system

Tue, 19 May 2015 08:00:00 EDT

A multiprocessor system includes a first microprocessor and a second microprocessor. A first signaling pathway is configured to send message transmission coordination signals from the first microprocessor to the second microprocessor. The first signaling pathway may be coupled to at least two flag registers associated with the second microprocessor. A second signaling pathway is configured to send message transmission coordination signals from the second microprocessor to the first microprocessor. The second signaling pathway may be coupled to at least two flag registers associated with the first microprocessor. The first signaling pathway is independent of the second signaling pathway.



Hardware assist thread for increasing code parallelism

Tue, 19 May 2015 08:00:00 EDT

Mechanisms are provided for offloading a workload from a main thread to an assist thread. The mechanisms receive, in a fetch unit of a processor of the data processing system, a branch-to-assist-thread instruction of a main thread. The branch-to-assist-thread instruction informs hardware of the processor to look for an already spawned idle thread to be used as an assist thread. Hardware implemented pervasive thread control logic determines if one or more already spawned idle threads are available for use as an assist thread. The hardware implemented pervasive thread control logic selects an idle thread from the one or more already spawned idle threads if it is determined that one or more already spawned idle threads are available for use as an assist thread, to thereby provide the assist thread. In addition, the hardware implemented pervasive thread control logic offloads a portion of a workload of the main thread to the assist thread.



Shared load-store unit to monitor network activity and external memory transaction status for thread switching

Tue, 19 May 2015 08:00:00 EDT

An array of a plurality of processing elements (PEs) are in a data packet-switched network interconnecting the PEs and memory to enable any of the PEs to access the memory. The network connects the PEs and their local memories to a common controller. The common controller may include a shared load/store (SLS) unit and an array control unit. A shared read may be addressed to an external device via the common controller. The SLS unit can continue activity as if a normal shared read operation has taken place, except that the transactions that have been sent externally may take more cycles to complete than the local shared reads. Hence, a number of transaction-enabled flags may not have been deactivated even though there is no more bus activity. The SLS unit can use this state to indicate to the array control unit that a thread switch may now take place.



Data processing method and apparatus for prefetching

Tue, 19 May 2015 08:00:00 EDT

A data processing device includes processing circuitry 20 for executing a first memory access instruction to a first address of a memory device 40 and a second memory access instruction to a second address of the memory device 40, the first address being different from the second address. The data processing device also includes prefetching circuitry 30 for prefetching data from the memory device 40 based on a stride length 70 and instruction analysis circuitry 50 for determining a difference between the first address and the second address. Stride refining circuitry 60 is also provided to refine the stride length based on factors of the stride length and factors of the difference calculated by the instruction analysis circuitry 50.



Method and system for managing hardware resources to implement system functions using an adaptive computing architecture

Tue, 19 May 2015 08:00:00 EDT

An adaptable integrated circuit is disclosed having a plurality of heterogeneous computational elements coupled to an interconnection network. The interconnection network changes interconnections between the plurality of heterogeneous computational elements in response to configuration information. A first group of computational elements is allocated to form a first version of a functional unit to perform a first function by changing interconnections in the interconnection network between the first group of heterogeneous computational elements. A second group of computational elements is allocated to form a second version of a functional unit to perform the first function by changing interconnections in the interconnection network between the second group of heterogeneous computational elements. One or more of the first or second group of heterogeneous computational elements are reallocated to perform a second function by changing the interconnections between the one or more of the first or second group of heterogeneous computational elements.



High performance computing (HPC) node having a plurality of switch coupled processors

Tue, 19 May 2015 08:00:00 EDT

A High Performance Computing (HPC) node comprises a motherboard, a switch comprising eight or more ports integrated on the motherboard, and at least two processors operable to execute an HPC job, with each processor communicably coupled to the integrated switch and integrated on the motherboard.



Data accessing method for flash memory storage device having data perturbation module, and storage system and controller using the same

Tue, 19 May 2015 08:00:00 EDT

A data accessing method, and a storage system and a controller using the same are provided. The data accessing method is suitable for a flash memory storage system having a data perturbation module. The data accessing method includes receiving a read command from a host and obtaining a logical block to be read and a page to be read from the read command. The data accessing method also includes determining whether a physical block in a data area corresponding to the logical block to be read is a new block and transmitting a predetermined data to the host when the physical block corresponding to the logical block to be read is a new block. Thereby, the host is prevented from reading garbled code from the flash memory storage system having the data perturbation module.



Method for activating processor cores within a computer system

Tue, 12 May 2015 08:00:00 EDT

A method for activating processor cores within a computer system is disclosed. Initially, a value representing a number of processor cores to be enabled within the computer system is received. The computer system includes multiple processors, and each of the processors includes multiple processor cores. Next, a scale variable value representing a specific type of tasks to be optimized during an execution of the tasks within the computer system is received. From a pool of available processor cores within the computer system, a subset of processor cores can be selected for activation. The subset of processor cores is activated in order to achieve system optimization during an execution of the tasks.



Multiprocessor system, multiprocessor control method, and multiprocessor integrated circuit

Tue, 12 May 2015 08:00:00 EDT

In a multiprocessor system, in general, a processor assigned with a larger amount of tasks is apt to perform a larger amount of communication with other processors assigned with tasks, than a processor assigned with a smaller amount of tasks. Thus in order for each processor to be able to perform the routing process efficiently, tasks are assigned such that, when there are a first processor and a second processor, the number of processors each assigned with one or more tasks and directly connected with the second processor being smaller than the number of processors each assigned with one or more tasks and directly connected with the first processor, the amount of tasks assigned to the first processor is equal to or larger than the amount of tasks assigned to the second processor.



Efficient parallel computation of dependency problems

Tue, 12 May 2015 08:00:00 EDT

A computing method includes accepting a definition of a computing task, which includes multiple Processing Elements (PEs) having execution dependencies. The computing task is compiled for concurrent execution on a multiprocessor device, by arranging the PEs in a series of two or more invocations of the multiprocessor device, including assigning the PEs to the invocations depending on the execution dependencies. The multiprocessor device is invoked to run software code that executes the series of the invocations, so as to produce a result of the computing task.



Virtualization support for branch prediction logic enable / disable at hypervisor and guest operating system levels

Tue, 12 May 2015 08:00:00 EDT

A hypervisor and one or more guest operating systems resident in a data processing system and hosted by the hypervisor are configured to selectively enable or disable branch prediction logic through separate hypervisor-mode and guest-mode instructions. By doing so, different branch prediction strategies may be employed for different operating systems and user applications hosted thereby to provide finer grained optimization of the branch prediction logic for different operating scenarios.



Recovering from an error in a fault tolerant computer system

Tue, 12 May 2015 08:00:00 EDT

A leading thread and a trailing thread are executed in parallel. Assuming that no transient fault occurs in each section, a system is speculatively executed in the section, with the leading thread and the trailing thread preferably being assigned to two different cores. At this time, the leading thread and the trailing thread are simultaneously executed, performing a buffering operation on a thread local area without performing a write operation on a shared memory. When the respective execution results of the two threads match each other, the content buffered to the thread local area is committed and written to the shared memory. When the respective execution results of the two threads do not match each other, the leading thread and the trailing thread are rolled back to a preceding commit point and re-executed.



Efficient conditional ALU instruction in read-port limited register file microprocessor

Tue, 12 May 2015 08:00:00 EDT

A microprocessor having performs an architectural instruction that instructs it to perform an operation on first and second source operands to generate a result and to write the result to a destination register only if its architectural condition flags satisfy a condition specified in the architectural instruction. A hardware instruction translator translates the instruction into first and second microinstructions. To execute the first microinstruction, an execution pipeline performs the operation on the source operands to generate the result. To execute the second microinstruction, it writes the destination register with the result generated by the first microinstruction if the architectural condition flags satisfy the condition, and writes the destination register with the current value of the destination register if the architectural condition flags do not satisfy the condition.



Issue policy control within a multi-threaded in-order superscalar processor

Tue, 12 May 2015 08:00:00 EDT

A multi-threaded in-order superscalar processor 2 includes an issue stage 12 including issue circuitry 22, 24 for selecting instructions to be issued to execution units 14, 16 in dependence upon a currently selected issue policy. A plurality of different issue policies are provided by associated different policy circuitry 28, 30, 32 and a selection between which of these instances of the policy circuitry 28, 30, 32 is active is made by policy selecting circuitry 34 in dependence upon detected dynamic behavior of the processor 2.



Instruction execution

Tue, 12 May 2015 08:00:00 EDT

A method of executing an instruction set including a first instruction and a second instruction, includes reading the first instruction; determining whether the first instruction is an instruction which is integral with the second instruction; reading the second instruction; if the first instruction is integral with the second instruction, interpreting the operand field of the second instruction to indicate at least one value to be used in conjunction with at least one bit of the first instruction; and if the first instruction is not integral with the second instruction, interpreting the operand field of the second instruction to indicate an entry of a look-up table.



Utilization of a microcode interpreter built in to a processor

Tue, 12 May 2015 08:00:00 EDT

Augmented processor hardware contains a microcode interpreter. When encrypted microcode is included in a message from a service, the microcode may be passed to the microcode interpreter. Based on decryption and execution of the microcode taking place at the processor hardware, extended functionality may be realized.



Active memory command engine and method

Tue, 12 May 2015 08:00:00 EDT

A command engine for an active memory receives high level tasks from a host and generates corresponding sets of either DCU commands to a DRAM control unit or ACU commands to a processing array control unit. The DCU commands include memory addresses, which are also generated by the command engine, and the ACU command include instruction memory addresses corresponding to an address in an array control unit where processing array instructions are stored.



Information processing apparatus for restricting access to memory area of first program from second program

Tue, 12 May 2015 08:00:00 EDT

A processor determines whether a first program is under execution when a second program is executed, and changes a setting of a memory management unit based on access prohibition information so that a fault occurs when the second program makes an access to a memory when the first program is under execution. Then, the processor determines whether an access from the second program to a memory area used by the first program is permitted based on memory restriction information when the fault occurs while the first program and the second program are under execution, and changes the setting of the memory management unit so that the fault does not occur when the access to the memory area is permitted.



Method and device for passing parameters between processors

Tue, 12 May 2015 08:00:00 EDT

The disclosure provides a method for passing a parameter between processors. The method comprises the following steps: in a source program of a slave processor, directly introducing a static configuration parameter to be passed; obtaining a relative address of the static configuration parameter when converting the source program of the slave processor into a target program of the slave processor; and configuring directly, by a master processor, a parameter value of the static configuration parameter in the target program of the slave processor according to the obtained relative address of the static configuration parameter. The disclosure also provides a system for passing a parameter between processors. The system has no need to use external hardware such as a dual-port Random Access Memory (RAM) and a register, thus, the requirement of parameter transmission on the external hardware is reduced, and further the area and static power consumption of a chip are reduced. The disclosure reduces the cycle delay of the slave processor in accessing the dual-port RAM and the register, thereby effectively reducing the dynamic power consumption of the chip, improving the processing capability of the slave processor and enhancing the effective performance of the slave processor.



Client-allocatable bandwidth pools

Tue, 12 May 2015 08:00:00 EDT

Methods and apparatus for client-allocatable bandwidth pools are disclosed. A system includes a plurality of resources of a provider network and a resource manager. In response to a determination to accept a bandwidth pool creation request from a client for a resource group, where the resource group comprises a plurality of resources allocated to the client, the resource manager stores an indication of a total network traffic rate limit of the resource group. In response to a bandwidth allocation request from the client to allocate a specified portion of the total network traffic rate limit to a particular resource of the resource group, the resource manager initiates one or more configuration changes to allow network transmissions within one or more network links of the provider network accessible from the particular resource at a rate up to the specified portion.



Method for activating processor cores within a computer system

Tue, 05 May 2015 08:00:00 EDT

A technique for activating processor cores within a computer system is disclosed. Initially, a value representing a number of processor cores to be enabled within the computer system is received. The computer system includes multiple processors, and each of the processors includes multiple processor cores. Next, a scale variable value representing a specific type of tasks to be optimized during an execution of the tasks within the computer system is received. From a pool of available processor cores within the computer system, a subset of processor cores can be selected for activation. The subset of processor cores is activated in order to achieve system optimization during an execution of the tasks.



Detecting and reissuing of loop instructions in reorder structure

Tue, 05 May 2015 08:00:00 EDT

A processor for processing loop instructions can include an instruction reorder structure and a loop processing controller. The instruction reorder structure is configured to store decoded instructions according to program order and issue the decoded instructions for execution out of program order. The loop processing controller is configured to detect a loop in the decoded instructions stored in the instruction reorder structure and cause the instruction reorder structure to reissue the decoded instructions that form the loop for re-execution.



Executing machine instructions comprising input/output pairs of execution nodes

Tue, 05 May 2015 08:00:00 EDT

A computing machine is disclosed having a memory system for storing a collection of execution nodes, a head for reading a sequence of symbols in the execution nodes in the memory system, and writing a sequence of symbols in the memory system. The machine is configured to execute a computation with a collection of pairs of execution nodes. Each pair of execution nodes represents a machine instruction. One execution node in the pair represents input of the machine instruction represented by the execution nodes. Another execution node in the pair represents output of the machine instruction represented by the execution nodes. Each execution node has a state of the machine, a sequence of symbols and a number.



APC model extension using existing APC models

Tue, 05 May 2015 08:00:00 EDT

A method of extending advanced process control (APC) models includes constructing an APC model table including APC model parameters of a plurality of products and a plurality of work stations. The APC model table includes empty cells and cells filled with existing APC model parameters. Average APC model parameters of the existing APC model parameters are calculated, and filled into the empty cells as initial values. An iterative calculation is performed to update the empty cells with updated values.



Operand and limits optimization for binary translation system

Tue, 28 Apr 2015 08:00:00 EDT

Methods and systems for optimizing generation of natively executable code from non-native binary code are disclosed. One method includes receiving a source file including binary code configured for execution according to a non-native instruction set architecture. The method also includes translating one or more code blocks included in the executable binary code to source code, and applying an optimizing algorithm to instructions in the one or more code blocks. The optimizing algorithm is selected to reduce a number of memory address translations performed when translating the source code to native executable binary code, thereby resulting in one or more optimized code blocks. The method further includes compiling the source code to generate an output file comprising natively executable binary code including the one or more optimized code blocks.



Combined branch target and predicate prediction for instruction blocks

Tue, 28 Apr 2015 08:00:00 EDT

Embodiments provide methods, apparatus, systems, and computer readable media associated with predicting predicates and branch targets during execution of programs using combined branch target and predicate predictions. The predictions may be made using one or more prediction control flow graphs which represent predicates in instruction blocks and branches between blocks in a program. The prediction control flow graphs may be structured as trees such that each node in the graphs is associated with a predicate instruction, and each leaf associated with a branch target which jumps to another block. During execution of a block, a prediction generator may take a control point history and generate a prediction. Following the path suggested by the prediction through the tree, both predicate values and branch targets may be predicted. Other embodiments may be described and claimed.



System and method for Controlling restarting of instruction fetching using speculative address computations

Tue, 28 Apr 2015 08:00:00 EDT

A system and method for controlling restarting of instruction fetching using speculative address computations in a processor are provided. The system includes a predicted target queue to hold branch prediction logic (BPL) generated target address values. The system also includes target selection logic including a recycle queue. The target selection logic selects a saved branch target value between a previously speculatively calculated branch target value from the recycle queue and an address value from the predicted target queue. The system further includes a compare block to identify a wrong target in response to a mismatch between the saved branch target value and a current calculated branch target, where instruction fetching is restarted in response to the wrong target.



Implementation of multi-tasking on a digital signal processor with a hardware stack

Tue, 28 Apr 2015 08:00:00 EDT

The disclosure relates to the implementation of multi-tasking on a digital signal processor. Blocking functions are arranged such that they do not make use of a processor's hardware stack. Respective function calls are replaced with a piece of inline assembly code, which instead performs a branch to the correct routine for carrying out said function. If a blocking condition of the blocking function is encountered, a task switch can be done to resume another task. Whilst the hardware stack is not used when a task switch might have to occur, mixed-up contents of the hardware stack among function calls performed by different tasks are avoided.



System for accessing a register file using an address retrieved from the register file

Tue, 28 Apr 2015 08:00:00 EDT

A data processing system and method are disclosed. The system comprises an instruction-fetch stage where an instruction is fetched and a specific instruction is input into decode stage; a decode stage where said specific instruction indicates that contents of a register in a register file are used as an index, and then, the register file pointed to by said index is accessed based on said index; an execution stage where an access result of said decode stage is received, and computations are implemented according to the access result of the decode stage.



Low latency variable transfer network communicating variable written to source processing core variable register allocated to destination thread to destination processing core variable register allocated to source thread

Tue, 28 Apr 2015 08:00:00 EDT

A method and circuit arrangement utilize a low latency variable transfer network between the register files of multiple processing cores in a multi-core processor chip to support fine grained parallelism of virtual threads across multiple hardware threads. The communication of a variable over the variable transfer network may be initiated by a move from a local register in a register file of a source processing core to a variable register that is allocated to a destination hardware thread in a destination processing core, so that the destination hardware thread can then move the variable from the variable register to a local register in the destination processing core.



Methods and apparatus for storing expanded width instructions in a VLIW memory for deferred execution

Tue, 28 Apr 2015 08:00:00 EDT

Techniques are described for decoupling fetching of an instruction stored in a main program memory from earliest execution of the instruction. An indirect execution method and program instructions to support such execution are addressed. In addition, an improved indirect deferred execution processor (DXP) VLIW architecture is described which supports a scalable array of memory centric processor elements that do not require local load and store units.



Data processing device

Tue, 28 Apr 2015 08:00:00 EDT

A statue management section of a control section is provided with a corresponding real number storage section that stores a real number converted from a logical number by a configuration number converting section. When the corresponding real number storage section has stored configuration information with a real number of the next transition state, the state management section directly supplies the real number to the configuration information storage section in the next or later processing cycle.



Indirect designation of physical configuration number as logical configuration number based on correlation information, within parallel computing

Tue, 28 Apr 2015 08:00:00 EDT

A computing section is provided with a plurality of computing units and correlatively stores entries of configuration information that describes configurations of the plurality of computing units with physical configuration numbers that represent the entries of configuration information and executes a computation in a configuration corresponding to a designated physical configuration number. A status management section designates a physical configuration number corresponding to a status to which the computing section needs to advance the next time for the computing section and outputs the status to which the computing section needs to advance the next time as a logical status number that uniquely identifies the status to which the computing section needs to advance the next time in an object code. A determination section determines whether or not the computing section has stored an entry of configuration information corresponding to the status to which the computing section needs to advance the next time based on the logical status number that is output from the status management section. A rewriting section correlatively stores the entry of the configuration information and a physical configuration number corresponding to the entry of the configuration information in the computing section when the determination section determines that the computing section has not stored the entry of configuration information corresponding to the status to which the computing section needs to advance the next time.



Interleaving data accesses issued in response to vector access instructions

Tue, 28 Apr 2015 08:00:00 EDT

A vector data access unit includes data access ordering circuitry, for issuing data access requests indicated by elements of earlier and a later vector instructions, one being a write instruction. An element indicating the next data access for each of the instructions is determined. The next data accesses for the earlier and the later instructions may be reordered. The next data access of the earlier instruction is selected if the position of the earlier instruction's next data element is less than or equal to the position of the later instruction's next data element minus a predetermined value. The next data access of the later instruction may be selected if the position of the earlier instruction's next data element is higher than the position of the later instruction's next data element minus a predetermined value. Thus data accesses from earlier and later instructions are partially interleaved.



Data processing apparatus and method for controlling data processing apparatus

Tue, 28 Apr 2015 08:00:00 EDT

A data processing apparatus includes multiple processing means that are connected in a ring shape via corresponding communication means respectively. Each communication means includes a reception means for receiving data from a previous communication means, and a transmission means for transmitting data to a next communication means. Connection information is assigned to each of the reception means and the transmission means. The communication means, when receiving a packet that has same connection information as one assigned to its reception means, causes the corresponding processing means to perform data processing on the packet, sets the connection information assigned to its transmission means to the packet, and transmits the packet to the next communication means, and when receiving a packet that has connection information that is not same as one assigned to its reception means, transmits the packet to the next communication means without changing the connection information of the packet.