Last updated Monday, 07Sep1998  Edited by Jan Madsen 


Session 1: Opening 


In this paper we present efficient procedures for delay-constrained minimization of the leakage power in CMOS digital circuits for a dual threshold voltage (Vt) technology. The availability of two or more threshold voltages on the same chip provides a new opportunity for circuit designers to make tradeoffs between power and delay. We present two efficient procedures that take a gate-level netlist as input and assign the proper threshold voltage to each transistor so that the leakage power is minimized without violating the delay constraints. Experimental results on the MCNC91 benchmark circuits show that up to one order of magnitude power reduction can be achieved without any delay increase when compared to a circuit where all transistors are low-Vt devices.
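
The trade-off described above can be illustrated with a minimal gate-level sketch. It is not the procedure from the paper (which assigns thresholds per transistor); it is a simple greedy pass under assumed inputs. The function names, the per-gate delay and leakage tables, and the example netlist are all hypothetical.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def critical_delay(fanin, delay):
    """Longest path delay through a combinational DAG.

    fanin maps each gate to the list of gates driving it (assumed format).
    """
    arrival = {}
    for g in TopologicalSorter(fanin).static_order():
        preds = fanin.get(g, [])
        arrival[g] = delay[g] + max((arrival[p] for p in preds), default=0.0)
    return max(arrival.values())


def assign_dual_vt(fanin, d_low, d_high, leak_low, leak_high, t_max):
    """Greedy sketch: start all-low-Vt (fast, leaky), then move gates to
    high-Vt (slow, low leakage) whenever the delay constraint still holds."""
    vt = {g: "low" for g in d_low}
    delay = dict(d_low)
    # Try the gates with the largest leakage saving first.
    order = sorted(d_low, key=lambda g: leak_low[g] - leak_high[g], reverse=True)
    for g in order:
        delay[g] = d_high[g]
        if critical_delay(fanin, delay) <= t_max:
            vt[g] = "high"
        else:
            delay[g] = d_low[g]  # revert: this gate is timing critical
    return vt


# Hypothetical netlist: a -> b -> c is the critical path, d -> c has slack.
fanin = {"b": ["a"], "c": ["b", "d"]}
gates = ["a", "b", "c", "d"]
d_low = {g: 1.0 for g in gates}
d_high = {g: 1.5 for g in gates}
leak_low = {g: 10.0 for g in gates}
leak_high = {g: 1.0 for g in gates}
vt = assign_dual_vt(fanin, d_low, d_high, leak_low, leak_high, t_max=3.0)
```

In this example only the off-critical gate `d` can be moved to high Vt, because the all-low-Vt critical path already uses the whole delay budget.
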
Session 2: Cell design and libraries 


A new methodology for the design of low-power standard cell libraries is presented. The proposed approach addresses power consumption at various steps in the design flow, applying new design automation algorithms and incorporating innovative cell designs. CAD techniques are used to speed development of the library, allowing for quick analysis of power and delay characteristics, with subsequent feedback for redesign. The effectiveness of the proposed flow is demonstrated on several benchmark circuits implemented in a 0.35 µm CMOS, 1.8 volt standard cell library designed using this methodology.


Design cycles are currently shortened by designing digital systems at gate or cell level, using precharacterized gate or cell delays to speed up the performance analysis. Modeling the timing library format is one of the most difficult tasks for library designers. We present in this paper a piecewise linear approximation of the delay performance equations of a submicron CMOS library, considering input-to-output coupling, loading and input waveform induced nonlinearity. The approach is applied to a 0.25 µm CMOS process, for which linear performance equations are deduced for the generally used design range. Limits of linear representations are clearly identified. Validations are obtained through comparison with Spice simulations.




The transistor density is one of the parameters to be considered for an optimal use of a CMOS process. Therefore, layout strategies have to be evaluated through metrics considering all the involved parameters. The objective of this paper is to study the real transistor density available for a given technology at the cell and circuit levels. The study serves three main purposes: (i) to evaluate the quality of the layout synthesis tools in terms of area; (ii) to give feedback to the logic synthesis step, allowing an accurate area prediction from the gate-level abstraction; and (iii) to determine a transistor density roadmap.

Session 3: Gate Level Time and Power Modeling 


An accurate model for the input-to-output coupling capacitance of the CMOS inverter is proposed. This model takes into account capacitance contributions derived from submicron sizes, which are neglected in traditional approximations. Model predictions are within 10% of HSPICE results for a wide range of inverter configurations. Excellent results are obtained when applied to the evaluation of the overshoot time, making it suitable to improve the timing analysis of CMOS inverters.


The impact of Miller capacitance on power consumption of digital CMOS VLSI circuits is investigated and analyzed. The fixed Miller, input and output gate capacitances have been compared to the more accurate voltage-dependent counterparts. The result shows that the Miller capacitance amplification factor of digital circuits can have lower values than the common, yet practically valid assumption which predicts a factor of two for each high-to-low or low-to-high output transition.


In this paper, a metric for the capacitive and short-circuit transition energy of a circuit, described at logic level, is proposed. The metric is based on the well-known Weighted Switching Activity estimator, which is improved to include the short-circuit component of the transition energy. The short-circuit equation is obtained by the integration of Sakurai's alpha-power model of the transistors. The main parameters of the metric are the weights of the gate's input and output nodes and the normalized pentode resistances of the P and N nets. The proposed model has been validated by means of the HSPICE electrical simulator, using the 0.7 µm ECPD07 technology of ES2 with level 13 equations. The results show that if the weight of the inputs is kept below 20, the relative error is smaller than 11%. Additionally, the proposed model is used to compute fast upper and lower bounds of the total transition energy.


This paper characterizes the energy consumed by a fully complementary digital CMOS circuit during its power-on phase. In the analysis, the energy dissipation in the conducting transistors has been taken into account as well as the loss due to leakage and the energy stored in the circuit. The analysis shows that it is possible to obtain important savings in energy by controlling the rise time t_r of the power supply voltage during power-on (adiabatic power-on). However, the analysis also shows that the power savings are not proportional to the increase in the rise time, due to the unavoidable energy loss produced when the digital circuit is configured. Thus, the energy consumed during the power-on process is not proportional to t_r^(-1) but tends to be constant. In addition, when the rise time is very long, the energy consumed during power-on actually increases because the leakage power becomes significant.
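
For reference, the ideal inverse dependence on rise time can be sketched with the textbook case of a single capacitor charged through a resistor by a linear supply ramp. This is only an assumed lumped RC model, not the circuit analysis of the paper: it leaves out the configuration loss and the leakage that the abstract identifies as the reasons the real behaviour departs from this curve. The function name and parameter values are illustrative.

```python
import math


def ramp_charging_loss(r_ohm, c_farad, v_dd, t_rise):
    """Energy dissipated in R when a supply that ramps linearly from 0 to
    v_dd in t_rise (and then stays at v_dd) charges C through R.

    Closed form for the ideal RC case, with tau = R*C and x = t_rise/tau:
        E = C * v_dd**2 * (1/x) * (1 - (1 - exp(-x)) / x)
    """
    tau = r_ohm * c_farad
    x = t_rise / tau
    # 1 - (1 - exp(-x))/x, written with expm1 to stay accurate for small x.
    bracket = 1.0 + math.expm1(-x) / x
    return c_farad * v_dd ** 2 * bracket / x


# Assumed example values: R = 1 kOhm, C = 1 pF, 1.8 V supply (tau = 1 ns).
cv2 = 1e-12 * 1.8 ** 2
fast = ramp_charging_loss(1e3, 1e-12, 1.8, 1e-12)  # near-step supply
slow = ramp_charging_loss(1e3, 1e-12, 1.8, 1e-6)   # ramp of 1000 * tau
```

For a very fast ramp the loss approaches the familiar C·Vdd²/2, and for a ramp much longer than RC it approaches (RC/t_r)·C·Vdd², which is the 1/t_r dependence that the abstract reports is not reached in a real circuit.
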

Session 4: Panel 


Panelists:
D. Flandre, UCL-DICE, BELGIUM
Jim Garside, Univ. of Manchester, UK
Alain Guyot, INPG-TIMA, FRANCE
Wolfgang Nebel, University of Oldenburg and OFFIS, GERMANY

Session 5: High Level Power Modeling and Optimization 




A protocol for delay-insensitive asynchronous communication based on the time-multiplexing of two bits on a dual-rail line is proposed and analyzed. Compared to conventional dual-rail protocols, it halves the number of bus wires and, according to simulations, significantly reduces energy consumption with small throughput loss and nearly no area overhead.


Conventional hardware description languages do not provide all the facilities required for efficient behavioural modelling of asynchronous systems. This paper presents a new HDL incorporating CSP-like channel communication and other features making it more suitable for this task. A model of a simple microprocessor is used to illustrate how the language and tools can be applied to real design problems. We use the tool to investigate the complex relationships that can exist between the speed of individual blocks and the system's overall performance, and look at power modelling based on channel activity.


The introduction of VHDL and high-level tools has reduced the design time of VLSI circuits. These tools are, however, not adapted to low-power techniques. This paper presents a methodology to adapt the tools to the low-power gated-clock technique, and adds some improvements to the gated clock for further reduction of power consumption and greater design simplicity.

Session 6: Architectural Techniques and ROM Modeling 




This paper addresses the problem of modeling the power consumption of on-chip ROMs for gate-level and RT-level power estimations. A route to memory power model development is presented that is also applicable to other memory architectures. The model proposed operates within an error margin of less than 5%.


Large memories require too many resources to be fully extracted and simulated. In this paper we present an accurate power and timing modeling technique that significantly reduces characterization resources. The model is based on current-controlled generators, and can be used in several regular structures. Low-power memories are easier to characterize using this technique, since they are block partitioned. Moreover, this modeling technique is fully parameterizable and is very useful for efficiently developing and optimizing memory compilers. It has been successfully applied to characterize and improve the performance of a low-power ROM compiler.

Session 7: Power Simulation and Estimation 




One way of minimizing the time required to perform simulation-based power estimation is to reduce the length of the input trace to be simulated, at the price of introducing some error in the estimation results. Existing techniques exploit statistical and/or spectral information about the original input trace to generate a shorter stream that matches such characteristics as closely as possible, with the objective of minimizing the estimation error. Very often, however, the stream to be simulated consists of validation patterns provided by the designer, whose power consumption may vary considerably over time as the system responds to the inputs. In these cases, classical stream compaction solutions are not very suitable, and may result in unacceptable errors. In this paper, we introduce a compaction technique that specifically targets user-provided traces characterized by a large variance of the average power dissipation over time. The proposed approach leverages an existing multi-level power simulation engine that can be used for accurate estimation of input streams of this type. The effectiveness of the proposed compaction procedure is demonstrated by the experimental results we have obtained on the complete set of the ISCAS'85 combinational benchmarks.


Power models for RT-level power estimation of common components, exploiting their functionality, regularity, symmetry, and separability, are introduced. The accuracy of the proposed models is higher than that of the existing ones, as it is identical to that of a real-delay gate-level power estimator. The power model of a K-bit Ripple Carry Adder is presented in detail. Comparison results show that the computational complexity of the proposed models is similar to or lower than that of the existing ones for the majority of the components.


We propose a power estimation technique for controllers that operates at the register-transfer level to provide early warning of power problems. Our estimator is based on the use of entropy as a measure of the average activity in the final implementation of a circuit, given an FSM network description. FSM networks are constructed from a partition of an initial FSM state transition graph (STG). The technique has been implemented and tested on a variety of benchmarks.
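
As a reminder of the quantity involved, the Shannon entropy of a probability distribution can be computed as in the sketch below. How the paper maps entropy to activity in an FSM network is not stated in the abstract; applying it to state occupation probabilities, as done here, is an assumption made only for illustration, and the function name is hypothetical.

```python
import math


def entropy_bits(probabilities):
    """Shannon entropy H = -sum(p * log2(p)) in bits; zero terms are skipped."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


# Assumed state occupation probabilities of a 4-state controller.
uniform = entropy_bits([0.25, 0.25, 0.25, 0.25])  # states equally likely
skewed = entropy_bits([0.97, 0.01, 0.01, 0.01])   # controller mostly idle
```

A controller that sits in one state most of the time has low entropy, which matches the intuition that its state lines rarely toggle.
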

Session 8: Timing Issues and Tools 






This paper investigates retiming and clock skew scheduling for improving the performance of synchronous circuits. It is shown that when both long and short paths are considered, circuits optimized by the simultaneous application of retiming and clock scheduling can achieve shorter clock periods than optimized circuits generated by applying either of the two techniques separately. A novel mixed-integer linear program is given for the problem of simultaneous retiming and clock scheduling with a target clock period and tolerance to delay variations under setup and hold constraints. Experiments with LGSynth93 and ISCAS89 benchmark circuits demonstrate the effectiveness of simultaneous retiming and clock scheduling. For one-third of the test circuits, the operating frequency increased by at least 14% over the optimized circuits obtained by applying retiming or clock scheduling separately.
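
The setup (long-path) and hold (short-path) constraints mentioned above can be made concrete with a small checker. This is a sketch of the standard clock skew scheduling constraints only, not the mixed-integer program of the paper, and it does not model retiming or delay-variation tolerance. The function name, the data layout and the example delays are assumptions.

```python
def skew_schedule_ok(paths, skew, period, t_setup=0.0, t_hold=0.0):
    """Check a clock skew schedule against setup and hold constraints.

    paths maps (source_register, destination_register) to (d_min, d_max),
    the shortest and longest combinational delay between the two registers.
    skew maps each register to its clock arrival time.
    """
    for (src, dst), (d_min, d_max) in paths.items():
        # Setup: the slowest signal must arrive before the next capturing edge.
        if skew[src] + d_max + t_setup > skew[dst] + period:
            return False
        # Hold: the fastest signal must not overwrite the current capture.
        if skew[src] + d_min < skew[dst] + t_hold:
            return False
    return True


# Hypothetical two-register loop with unbalanced path delays.
paths = {("A", "B"): (4, 10), ("B", "A"): (2, 6)}
zero_skew = {"A": 0, "B": 0}
scheduled = {"A": 0, "B": 2}  # clock reaches B two time units later
```

With zero skew this loop needs a period of 10, whereas delaying the clock of `B` by 2 lets both paths meet setup and hold at a period of 8.
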


We present an interactive environment, named AStErIx, to help develop new logic synthesis algorithms and to visualize the synthesis results graphically. This environment is based on public domain tools (SIS and LEDA), with added facilities to improve the programming support (based on the Obelix library), and a graph interface (Idefix) to analyze the experimental results. The whole system is governed by a friendly interface (Panoramix) for calling the logic synthesis commands and scripts interactively.

Session 9: Interconnections & Technology 


We present in this paper new analytical formulations defining the ramp duration at the entrance and at the end of an interconnection wire in a CMOS structure. With this method, it is possible to describe the signal degradation with respect to the line characteristics. Moreover, waveform information is necessary to accurately model the nonlinearity of the delay with input ramp effects. Waveforms estimated using our method are within 10% of Spice results, and the estimation is over two orders of magnitude faster than the simulations.


It is well known that in deep submicron technologies the coupling capacitance between adjacent wires is a critical portion of the total wire capacitance, while at the same time the capacitance between wire and substrate has become the fringing component. High frequency signals travelling across multiple level interconnect structures generate proximity effects, i.e. crosstalk effects, between adjacent wires. Such effects include delay and noise injection and are a serious performance limitation in deep submicron VLSI circuits. An analytical model of the crosstalk effects would be extremely useful both in the design front end and in the design back end. For instance, a net ranking procedure based on such a model could efficiently identify potential signal integrity problems between nets. A compact model of the coupled noise pulse amplitude, which considerably improves on the simple charge sharing model, has been proposed in [4]. In our paper we demonstrate that this model turns out to be quite inaccurate in several cases that often occur in practical circuits, because it does not consider the wire resistance. Moreover, we introduce a heuristic technique that takes the resistive effects into account, thus achieving a considerable accuracy improvement at an equivalent computational cost.


This paper presents a modeling technique to be used in substrate coupling computation of mixed-signal circuits. Substrate crosstalk can negatively impact the performance of analog circuit elements such as mixers or amplifiers and can lead to a total breakdown of circuit parts. In order to forecast substrate-related problems and to analyze their influence, EDA tools are needed that can be applied during layout verification. Depending on the required degree of accuracy, interfaces between the designed elements and the substrate region have to be modeled. A very important issue is the calculation of junction capacitors. We propose a method that is capable of accurately determining lumped junction capacitors without measurement or device simulation.


Performance and complexity are considered here as two orthogonal axes. Performance metrics are recalled. Then different complexity metrics and scales are proposed. Different definitions of complexity are used depending on the considered level of abstraction. Finally, SOI and bulk CMOS technologies are compared in this space.

Session 10: Adiabatic 




Finite-time thermodynamics is illustrated by some examples of process optimization. The objective is minimum entropy production during an electrical process, i.e. the charging of a capacitor. Nonlinear electronic processes are investigated in particular, because of their importance in computers and displays.


Low-loss circuits composed of CMOS transmission gates as switches are proposed. Functional instability caused by parasitic capacitors and leakage current is overcome by connecting each wire to the ground through the inverse of the logic function which connects it to the quasi-adiabatic clock. A canonical retractile PLA (Programmable Logic Array) and an autonomous FSM (Finite State Machine) are presented.

Session 11: Asynchronous 




We present a technique for the estimation of power consumption in asynchronous circuits through the modelling of transition switching activity. Unlike most existing techniques for analytic (non-simulation) power estimation that use reachability state traversal and Markov chain analysis, our method is based on an invariant analysis of Petri net models using matrix representations. This approach is in general more efficient than Markov chain analysis, due to the avoidance of state explosion, but may lose accuracy for some classes of nets. The asynchronous circuits under analysis are speed-independent designs that are synthesized from Signal Transition Graph descriptions using the tool Petrify.


This paper presents a new methodology for the synthesis of asynchronous circuits, which employs the partial states derived from an STG rather than total ones. The approach described in this paper is devoted to synthesis from large STG specifications of control-dominated circuits in concurrent systems and has the advantage of reduced processing time compared to other methods.


A stack controller design for use with conventional register files is evaluated. The design is based on asynchronous circuit techniques, but is suitable for use both in asynchronous and in synchronous environments. In comparison with another compatible asynchronous design it yields higher speed and lower energy consumption. In addition, a comparison with a gated-clock design is provided: the asynchronous solution is smaller (93% of the transistors), consumes significantly less energy per operation (56% according to simulation) and can operate within the same clock cycle time in 0.5 µm CMOS technology.

Session 12: Arithmetic & Processors 


Processors having both low power consumption and high performance are more and more required in the portable systems market. Although it is easy to find processors with one of these characteristics, it is harder to find a processor having both of them at the same time. In this paper, we evaluate the possibility of designing a high-performance, low-consumption processor and investigate whether instruction-level parallelism architectures can be adapted to low-power processors. We find that an adaptation of a high-performance architecture, such as the VLIW architecture, to low-power 8b or 16b microprocessors yields a significant improvement in the processor's performance while keeping the same energy consumption.


In this paper, we introduce the architecture of a new embedded field programmable processor array (EFPPA) which consists of a low-power multiprocessor system embedded with standard programmable logic blocks and memory. Each block (processor, programmable logic,...) is coupled to a transfer controller (TC) responsible for all the transfers between blocks. Instead of using a classical crossbar interconnection network, we propose a low-cost hierarchical ring which combines a simple interface with high-performance communication when data locality is observed. Based on the EFPPA, high-performance reconfigurable systems can be easily built, and we demonstrate that this architecture is an interesting alternative to traditional DSPs for low-power applications. Using the 8-bit CoolRisc processor [1,2], an EFPPA including a cluster of 16 processors and 16 TCs, working respectively at 25 and 50 MHz, with 1 kbyte of data SRAM for each processor, consumes 2 W with a peak performance of 1200 Mops. The chip size in 0.35 µm technology has been evaluated at 52 mm².




A low-power, self-timed, CMOS array multiplier, optimized for asynchronous DSP but also applicable to synchronous DSP applications, is presented. In order to reduce average power consumption, a strategy termed conditional evaluation is introduced whereby addition is carried out only in rows of the carry-save array whose bit-product is non-zero. Simulation results are presented for a transistor-level, 8-bit x 8-bit implementation which shows an average-case energy consumption of 73 pJ with an average delay of 30.5 ns.
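
The idea of skipping rows with a zero bit-product can be sketched behaviourally. The model below is a plain shift-and-add loop that counts how many rows would actually perform an addition; it is not the carry-save circuit of the paper and says nothing about its timing or energy. The function name and the `width` parameter are illustrative.

```python
def conditional_multiply(a, b, width=8):
    """Unsigned shift-and-add multiply that skips rows whose bit-product is zero.

    Returns (product, rows_evaluated) for `width`-bit operands.
    """
    product = 0
    rows_evaluated = 0
    for i in range(width):
        # Row i adds a * b[i] * 2**i; it is all zeros if b[i] or a is zero.
        if (b >> i) & 1 and a != 0:
            product += a << i
            rows_evaluated += 1
    return product, rows_evaluated


sparse = conditional_multiply(13, 0b00000101)  # two non-zero multiplier bits
dense = conditional_multiply(200, 0b11111111)  # every row is needed
```

Operands with few set multiplier bits evaluate few rows, which is where an average-case saving over a multiplier that always evaluates every row would come from.
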
