Last updated Monday, 07-Sep-1998 Edited by Jan Madsen

Session 1: Opening

1.1 "Welcome"
Vice-Rector, Prof. Knut Conradsen

1.2 "The PATMOS 1998 Program"
A.-M. Trullemans

1.3 "Multiple Vth CMOS for Leakage Control in Deep Submicron IC's"
Kaushik Roy, Liqiong Wei

1.4 "Efficient Procedures for Minimizing the Standby Power in Dual Vt CMOS Circuits"
Sarma B.K. Vrudhula, Qi Wang

In this paper we present efficient procedures for delay-constrained minimization of leakage power in CMOS digital circuits implemented in a dual threshold voltage (Vt) technology. The availability of two or more threshold voltages on the same chip gives circuit designers a new opportunity to trade off power against delay. We present two efficient procedures that take a gate-level netlist as input and assign a threshold voltage to each transistor so that leakage power is minimized without violating the delay constraints. Experimental results on the MCNC91 benchmark circuits show that, compared with a circuit in which all transistors are low-Vt devices, power can be reduced by up to an order of magnitude with no increase in delay.
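The abstract does not describe the two procedures, so the following is only a minimal sketch of the underlying trade-off under stated assumptions: Vt is assigned per gate rather than per transistor, every gate has the same hypothetical low-Vt and high-Vt delay and leakage values, and a simple greedy pass (start all low-Vt, move a gate to high-Vt if the critical-path delay still meets the constraint) stands in for the authors' algorithms. The names `assign_dual_vt`, `critical_delay`, `LOW_VT` and `HIGH_VT` are illustrative.

```python
from graphlib import TopologicalSorter

# Hypothetical per-gate characterization: (delay, leakage) for each Vt flavor.
LOW_VT = (1.0, 10.0)   # fast but leaky
HIGH_VT = (1.5, 1.0)   # slower, far less leakage

def critical_delay(netlist, high_vt):
    """Longest-path arrival time. netlist maps gate -> list of fanins;
    fanin names that are not gates are primary inputs (arrival time 0)."""
    graph = {g: [f for f in fanins if f in netlist] for g, fanins in netlist.items()}
    arrival = {}
    for g in TopologicalSorter(graph).static_order():
        delay = (HIGH_VT if g in high_vt else LOW_VT)[0]
        arrival[g] = delay + max((arrival[f] for f in graph[g]), default=0.0)
    return max(arrival.values())

def total_leakage(netlist, high_vt):
    return sum((HIGH_VT if g in high_vt else LOW_VT)[1] for g in netlist)

def assign_dual_vt(netlist, max_delay):
    """Greedy: try each gate at high Vt and keep the move only if timing is still met."""
    high_vt = set()
    for g in sorted(netlist):
        high_vt.add(g)
        if critical_delay(netlist, high_vt) > max_delay:
            high_vt.discard(g)   # gate lies on a critical path: revert to low Vt
    return high_vt

netlist = {"g1": ["a"], "g2": ["g1"], "g3": ["g2", "g4"], "g4": ["b"]}
target = critical_delay(netlist, set())   # all-low-Vt delay: no delay increase allowed
chosen = assign_dual_vt(netlist, target)
print(chosen, total_leakage(netlist, chosen))   # {'g4'} 31.0
```

In this toy netlist only the off-critical gate g4 can move to high Vt, which cuts the leakage total from 40 to 31 units at unchanged delay.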


Session 2: Cell design and libraries

2.1 "An Innovative Methodology for the Design Automation of Low Power Libraries"
Roberto Zafalon, Nicola Dragone, Carlo Guardiani, Pascal Meier

A new methodology for the design of low-power standard cell libraries is presented. The proposed approach addresses power consumption at several steps of the design flow, applying new design automation algorithms and incorporating innovative cell designs. CAD techniques speed up library development by allowing quick analysis of power and delay characteristics, with the results fed back for redesign. The effectiveness of the proposed flow is demonstrated on several benchmark circuits implemented with a 0.35 µm, 1.8 V CMOS standard cell library designed using this methodology.

2.2 "Piece wise linear performance modeling of submicronic CMOS library"
M. Rezzoug, D. Auvergne

Design cycles are currently shortened by designing digital systems at the gate or cell level, using precharacterized gate or cell delays to speed up performance analysis. Modeling the timing library format is one of the most difficult tasks facing library designers. In this paper we present a piecewise linear approximation of the delay performance equations of a submicron CMOS library, taking into account the nonlinearities induced by input-to-output coupling, loading and the input waveform. The approach is applied to a 0.25 µm CMOS process, for which linear performance equations are derived over the commonly used design range. The limits of the linear representations are clearly identified. The models are validated by comparison with SPICE simulations.
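As a rough illustration of deriving piecewise linear delay equations and identifying where a single linear representation breaks down, the sketch below fits least-squares lines to characterization samples (for example, delay versus input transition time at a fixed load) and bisects the range until each segment meets a relative-error tolerance. The bisection strategy, the tolerance and the sample data are assumptions for illustration; the paper's equations also model input-to-output coupling and load, which this one-dimensional sketch does not.

```python
def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx

def pwl_fit(xs, ys, tol):
    """Bisect the sample range until every linear segment meets the relative-error tolerance.
    Returns a list of (x_start, x_end, slope, intercept, max_relative_error)."""
    slope, icpt = fit_line(xs, ys)
    err = max(abs(slope * x + icpt - y) / abs(y) for x, y in zip(xs, ys))
    if err <= tol or len(xs) < 4:
        return [(xs[0], xs[-1], slope, icpt, err)]
    mid = len(xs) // 2   # segments share the breakpoint sample
    return pwl_fit(xs[:mid + 1], ys[:mid + 1], tol) + pwl_fit(xs[mid:], ys[mid:], tol)

# Hypothetical samples: delay (ps) vs input transition time (ps), steeper above 100 ps.
tin = [float(x) for x in range(0, 201, 10)]
delay = [20 + 0.5 * t if t <= 100 else 70 + 2.0 * (t - 100) for t in tin]
for segment in pwl_fit(tin, delay, tol=0.05):
    print(segment)
```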

2.3 "Application of Sensitivity Analysis in Modelling Power and Delay for HFET DCFL Circuits"
Javier Garcia, Javier del Pino, Benito Gonzalez, A. Hernandez, Antonio Nunez

2.4 "Estimation of layout densities for CMOS digital circuits"
Lionel Torres, Daniel Auvergne, F. Moraes, Michel Robert

The transistor density is one of the parameters to consider for optimal use of a CMOS process. Layout strategies therefore have to be evaluated with metrics that take all the involved parameters into account. The objective of this paper is to study the actual transistor density achievable in a given technology at the cell and circuit levels. This study serves three main purposes: (i) to evaluate the quality of layout synthesis tools in terms of area; (ii) to provide feedback to the logic synthesis step, allowing accurate area prediction from the gate-level abstraction; and (iii) to determine a transistor density roadmap.


Session 3: Gate Level Time and Power Modeling

3.1 "CMOS inverter input-to-output coupling capacitance modelling for timing analysis"
Jorge Juan-Chico, A.J. Acosta, A. Barriga, M.J. Bellido

An accurate model for the input-to-output coupling capacitance of the CMOS inverter is proposed. The model takes into account capacitance contributions arising in submicron devices, which are neglected in traditional approximations. Model predictions are within 10% of HSPICE results for a wide range of inverter configurations. The model gives excellent results when applied to the evaluation of the overshoot time, making it suitable for improving the timing analysis of CMOS inverters.

3.2 "Impact of the Miller capacitance on Power Consumption"
Atila Alvandpour, Per Larsson-Edefors, Christer Svensson

The impact of the Miller capacitance on the power consumption of digital CMOS VLSI circuits is investigated and analyzed. Fixed Miller, input and output gate capacitances are compared with their more accurate voltage-dependent counterparts. The results show that the Miller capacitance amplification factor in digital circuits can be lower than the common, yet practically valid, assumption of a factor of two for each high-to-low or low-to-high output transition.

3.3 "A Metric for the Capacitive and Short-Circuit Transition Energy at Logic Level"
S. Manich, Joan Figueras i Pamies

In this paper, a metric for the capacitive and short-circuit transition energy of a circuit described at the logic level is proposed. The metric is based on the well-known Weighted Switching Activity estimator, extended to include the short-circuit component of the transition energy. The short-circuit equation is obtained by integrating Sakurai's alpha-power model of the transistors. The main parameters of the metric are the weights of the gate's input and output nodes and the normalized pentode resistances of the P and N networks. The proposed model has been validated with the HSPICE electrical simulator, using the ES2 0.7 µm ECPD07 technology with level 13 model equations. The results show that if the weight of the inputs is kept below 20, the relative error is smaller than 11%. In addition, the proposed model is used to compute fast upper and lower bounds on the total transition energy.
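The Weighted Switching Activity baseline that the metric extends is the sum, over circuit nodes, of a per-node weight times the number of transitions on that node. A minimal sketch follows, using zero-delay evaluation of a hypothetical two-gate circuit (a NAND driving an inverter) with illustrative fanout-based weights; the paper's short-circuit term based on the alpha-power model is not reproduced.

```python
def evaluate(a, b):
    """Zero-delay node values of a hypothetical circuit: y = NAND(a, b), z = NOT(y)."""
    y = 1 - (a & b)
    z = 1 - y
    return {"a": a, "b": b, "y": y, "z": z}

# Illustrative weights: 1 + number of gate inputs each node drives (output z drives none).
WEIGHTS = {"a": 2, "b": 2, "y": 2, "z": 1}

def wsa(prev, curr, weights=WEIGHTS):
    """Weighted switching activity between two consecutive node-value snapshots."""
    return sum(w for node, w in weights.items() if prev[node] != curr[node])

def wsa_sequence(vectors):
    """Total WSA over a sequence of (a, b) input vectors."""
    states = [evaluate(a, b) for a, b in vectors]
    return sum(wsa(p, c) for p, c in zip(states, states[1:]))

print(wsa_sequence([(0, 0), (0, 1), (1, 1), (1, 1)]))   # 7
```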

3.4 "Characterization of the Energy Loss during the Power-on of CMOS Circuits"
Josep Rius, Antoni Ferre, Joan Figueras i Pamies

This paper characterizes the energy consumed by a fully complementary digital CMOS circuit during its power-on phase. The analysis takes into account the energy dissipated in the conducting transistors, the loss due to leakage, and the energy stored in the circuit. It shows that significant energy savings are possible by controlling the rise time t_r of the supply voltage during power-on (adiabatic power-on). However, the savings are not proportional to the increase in rise time, because of the unavoidable energy loss produced when the digital circuit is configured. Thus, the energy consumed during power-on does not scale as t_r^(-1) but tends to a constant. Moreover, when the rise time is very long, the energy consumed during power-on actually increases because leakage power becomes significant.
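The qualitative trend (savings that saturate, then rise again for very slow ramps) can be reproduced with a deliberately simple toy model. For the capacitive part it uses the standard closed-form result for charging a capacitance C through a resistance R with a linear supply ramp of duration t_r, where tau = RC: E = C V^2 (tau/t_r) [1 - (tau/t_r)(1 - exp(-t_r/tau))], which falls from C V^2 / 2 for a step to about C V^2 tau / t_r for slow ramps. The constant configuration loss `e_config` and the leakage term (half the full-supply leakage power over the ramp) are hypothetical placeholders, not the paper's model.

```python
import math

def rc_ramp_loss(c, v, r, t_r):
    """Energy dissipated in R while C charges to v from a linear supply ramp of duration t_r."""
    if t_r <= 0:
        return 0.5 * c * v * v                 # step power-on: the classic C v^2 / 2 loss
    x = r * c / t_r                            # tau / t_r
    # E = C v^2 x [1 - x (1 - exp(-1/x))]; expm1 keeps precision when x is large
    return c * v * v * x * (1.0 + x * math.expm1(-1.0 / x))

def power_on_energy(t_r, c, v, r, e_config, p_leak):
    """Toy total: capacitive loss + fixed configuration loss + leakage during the ramp.
    e_config and the 'half of full-supply leakage power' term are hypothetical placeholders."""
    return rc_ramp_loss(c, v, r, t_r) + e_config + 0.5 * p_leak * t_r

# Hypothetical parameters: C = 1 nF, V = 3.3 V, R = 10 ohm (tau = 10 ns).
params = dict(c=1e-9, v=3.3, r=10.0, e_config=1e-10, p_leak=1e-6)
for t_r in (0.0, 1e-8, 1e-6, 1e-4, 1.0):
    print(f"t_r = {t_r:8.0e} s  E = {power_on_energy(t_r, **params):.3e} J")
```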


Session 4: Panel

"1997 SIA Roadmap and the implications on the design methods and CAD tools for the future"
Christian Piguet


    Daniel Auvergne, LIRMM, FRANCE
    D. Flandre, UCL-DICE, BELGIUM
    Jim Garside, Univ. of Manchester, UK
    Alain Guyot, INPG-TIMA, FRANCE
    Wolfgang Nebel, University of Oldenburg and OFFIS, GERMANY


Session 5: High Level Power Modeling and Optimization

5.1 "Innovative Power Reduction Techniques for SAR ADC's"
Colin Lyden, John Doyle, Kevin Gallagher

5.2 "Time-Multiplexed Dual-Rail Protocol for Low-Power Delay-Insensitive Asynchronous Communication"
Roberto Saletti, M. Storto

A protocol for delay-insensitive asynchronous communication based on time-multiplexing two bits on a dual-rail line is proposed and analyzed. Compared with conventional dual-rail protocols, it halves the number of bus wires and, according to simulations, significantly reduces energy consumption with a small throughput loss and almost no area overhead.
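For reference, conventional dual-rail coding sends each bit on its own pair of wires: (1, 0) for a one, (0, 1) for a zero, and an all-zero spacer between data tokens, so an n-bit bus needs 2n wires. The abstract does not specify how the two time-multiplexed bits are framed, so the sketch below assumes a hypothetical schedule in which dual-rail pair k carries bit 2k in a first phase and bit 2k+1 in a second, each followed by a spacer. It only illustrates the halved wire count and per-pair completion detection; handshaking and acknowledgment are omitted.

```python
SPACER = (0, 0)

def dr_encode(bit):
    """Conventional dual-rail code: (true rail, false rail)."""
    return (1, 0) if bit else (0, 1)

def dr_decode(code):
    if code == (1, 0):
        return 1
    if code == (0, 1):
        return 0
    raise ValueError(f"not a data code: {code}")

def is_data(code):
    """Completion detection on one pair: exactly one rail high."""
    return code[0] ^ code[1] == 1

def tm_schedule(word_bits):
    """Hypothetical time-multiplexed schedule: pair k carries bit 2k, then bit 2k+1,
    each data phase followed by a return-to-zero spacer phase."""
    if len(word_bits) % 2:
        raise ValueError("word width must be even")
    pairs = len(word_bits) // 2
    phases = []
    for slot in (0, 1):
        phases.append([dr_encode(word_bits[2 * k + slot]) for k in range(pairs)])
        phases.append([SPACER] * pairs)
    return phases

def tm_receive(phases):
    """Keep only complete data phases and reassemble the word."""
    pairs = len(phases[0])
    bits = [0] * (2 * pairs)
    data_phases = [p for p in phases if all(is_data(code) for code in p)]
    for slot, phase in enumerate(data_phases):
        for k, code in enumerate(phase):
            bits[2 * k + slot] = dr_decode(code)
    return bits

word = [1, 0, 1, 1, 0, 0, 1, 0]
phases = tm_schedule(word)
print(tm_receive(phases) == word, "wires:", 2 * len(phases[0]), "vs", 2 * len(word))
```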

5.3 "Behavioural Modelling of Asynchronous Systems for Power and Performance Analysis"
Philip Endecott, Steve Furber

Conventional hardware description languages do not provide all the facilities required for efficient behavioural modelling of asynchronous systems. This paper presents a new HDL incorporating CSP-like channel communication and other features making it more suitable for this task. A model of a simple microprocessor is used to illustrate how the language and tools can be applied to real design problems. We use the tool to investigate the complex relationships that can exist between the speed of individual blocks and the system's overall performance, and look at power modelling based on channel activity.

5.4 "Low-Power Gated-Clock Schemes Using High-Level VHDL Modelling"
Bernard Steenis, Christian Piguet

The introduction of VHDL and high-level tools has reduced the design time of VLSI circuits. However, these tools are not adapted to low-power techniques. This paper presents a methodology for adapting the tools to the low-power gated-clock technique, and proposes some improvements to gated clocking that further reduce power consumption and simplify design.


Session 6: Architectural Techniques and ROM Modeling

6.1 "Architectural Techniques for Design of Energy-Efficient Video Encoders"
Vasily Moshnyaga

6.2 "Power Consumption of On-Chip ROMs: Analysis and Modeling"
Lars Kruse, Eike Schmidt, Ed. Huijbregts, Gerd Jochens, Wolfgang Nebel

This paper addresses the problem of modeling the power consumption of on-chip ROMs for gate-level and RT-level power estimation. An approach to developing memory power models is presented that is also applicable to other memory architectures. The proposed model has an error margin of less than 5%.

6.3 "An Accurate Power and Timing Modeling Technique Applied To A Low-Power ROM Compiler"
Ben Ammar, Arnaud Turier

Large memories require too many resources to be fully extracted and simulated. In this paper we present an accurate power and timing modeling technique that significantly reduces the resources needed for characterization. The model is based on current-controlled generators and can be used for several regular structures. Low-power memories are easier to characterize with this technique, since they are block-partitioned. Moreover, the technique is fully parameterizable, which makes it very useful for efficiently developing and optimizing memory compilers. It has been successfully applied to characterize and improve the performance of a low-power ROM compiler.


Session 7: Power Simulation and Estimation

7.1 "A Reconfigurable Power-Simulator Based on an Accurate Execution Model"
Jaap Smit

7.2 "A Stream Compaction Technique Based on Multi-Level Power Simulation"
Enrico Macii, A. Macii, M. Poncino, R. Scarsi, Luca Benini, Giovanni De Micheli

One way of minimizing the time required to perform simulation-based power estimation is that of reducing the length of the input trace to be simulated, at the price of the introduction of some errors in the estimation results.

Existing techniques exploit statistical and/or spectral information about the original input trace to generate a shorter stream that matches these characteristics as closely as possible, with the objective of minimizing the estimation error.

Very often, however, the stream to be simulated consists of validation patterns provided by the designer, whose power consumption may vary significantly over time as the system responds to the inputs. In these cases, classical stream compaction solutions are not well suited and may produce unacceptable errors.

In this paper, we introduce a compaction technique that specifically targets user-provided traces whose average power dissipation shows a large variance over time. The proposed approach leverages an existing multi-level power simulation engine that can accurately estimate the power of input streams of this type.

The effectiveness of the proposed compaction procedure is demonstrated by experimental results on the complete set of ISCAS'85 combinational benchmarks.
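The compaction algorithm itself is not given in the abstract. Purely to illustrate why power-aware selection helps for traces whose power varies strongly over time, the sketch below splits a per-cycle power profile (assumed to come from a cheap, coarse simulation) into fixed windows, groups windows by average power level, and keeps one representative window per group, weighted by the group's size. The callback `window_power` is a hypothetical stand-in for an accurate simulation of a single window; window length and bin count are arbitrary choices.

```python
def compact_trace(power, window, n_bins):
    """Return [(window_index, weight)]: one representative window per power level,
    weighted by how many windows it stands for. power: per-cycle coarse estimates."""
    n = len(power) // window
    avgs = [sum(power[i * window:(i + 1) * window]) / window for i in range(n)]
    lo, hi = min(avgs), max(avgs)
    width = (hi - lo) / n_bins or 1.0          # guard against a perfectly flat trace
    bins = {}
    for i, a in enumerate(avgs):
        b = min(int((a - lo) / width), n_bins - 1)
        bins.setdefault(b, []).append(i)
    return [(members[0], len(members)) for members in bins.values()]

def estimate_mean_power(reps, window_power):
    """Weighted mean over representatives; only these windows need accurate simulation."""
    total = sum(w for _, w in reps)
    return sum(window_power(i) * w for i, w in reps) / total

trace = [1.0] * 100 + [5.0] * 100 + [1.0] * 100   # hypothetical bursty power profile
reps = compact_trace(trace, window=10, n_bins=4)
accurate = lambda i: sum(trace[i * 10:(i + 1) * 10]) / 10
print(reps, estimate_mean_power(reps, accurate), sum(trace) / len(trace))
```

Here 2 windows (20 cycles) stand in for a 300-cycle trace with no error, because each group is perfectly uniform; on real traces the error depends on how homogeneous each group is.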

7.3 "Accurate Data Path Models For RT-Level Power Estimation"
Spyros A. Theoharis, C. E. Goutis, G. Theodoridis, Dimitrios Soudris

Power models for RT-level power estimation of common components are introduced, exploiting the components' functionality, regularity, symmetry and separability. The proposed models are more accurate than existing ones, since their accuracy is identical to that of a real-delay gate-level power estimator. The power model of a K-bit ripple-carry adder is presented in detail. Comparisons show that, for the majority of components, the computational complexity of the proposed models is similar to or lower than that of existing models.
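A gate-level reference for a K-bit ripple-carry adder power model can be obtained by counting signal transitions on the sum and carry nodes between consecutive input vectors. The sketch below does this under a zero-delay assumption, so unlike a real-delay estimator it ignores glitches caused by carry ripple; the function names are illustrative.

```python
def rca_signals(a, b, k):
    """Sum bits and carry-out of each stage of a k-bit ripple-carry adder (zero delay)."""
    carry, sums, carries = 0, [], []
    for i in range(k):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        sums.append(ai ^ bi ^ carry)
        carry = (ai & bi) | (carry & (ai ^ bi))
        carries.append(carry)
    return sums + carries

def rca_toggles(vectors, k):
    """Total signal transitions over a sequence of (a, b) input pairs."""
    prev = rca_signals(*vectors[0], k)
    total = 0
    for a, b in vectors[1:]:
        cur = rca_signals(a, b, k)
        total += sum(p != c for p, c in zip(prev, cur))
        prev = cur
    return total

print(rca_toggles([(0, 0), (15, 1), (3, 5)], k=4))
```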

7.4 "On the Problem of FSM Networks Power Estimation Capability"
Lilia Kashirova, A. Karabanov

We propose a power estimation technique for controllers that operates at the register-transfer level to provide early warning of power problems. Our estimator uses entropy as a measure of the average activity in the final implementation of a circuit, given its FSM network description. FSM networks are constructed from a partition of an initial FSM state transition graph (STG). The technique has been implemented and tested on a variety of benchmarks.
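How the estimator maps entropy to switching activity is not described in the abstract, so the sketch below covers only the first ingredient: treating an STG with (assumed known) transition probabilities as a Markov chain, computing its state-occupancy probabilities by power iteration, and taking their Shannon entropy in bits. The two-state controller and its probabilities are hypothetical, and power iteration may fail to converge for periodic chains.

```python
import math

def stationary_distribution(P, iters=1000):
    """State-occupancy probabilities; P[i][j] is the probability of moving from state i to j."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability states contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical 2-state controller that mostly stays in its idle state.
P = [[0.9, 0.1],
     [0.5, 0.5]]
pi = stationary_distribution(P)
print(pi, entropy_bits(pi))
```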


Session 8: Timing Issues and Tools

8.1 "POPS: a new tool for path evaluation"
Severine Cremoux, Nadine Azemard-Crestani, Daniel Auvergne

8.2 "False Path Analysis in Sequential Circuits"
Karem Sakallah, Jeffrey L. Bell

8.3 "Retiming and Clock Scheduling for High-Performance Synchronous Circuits"
Marios C. Papaefthymiou, Eby G. Friedman, Xun Liu

This paper investigates retiming and clock skew scheduling for improving the performance of synchronous circuits. It is shown that when both long and short paths are considered, circuits optimized by the simultaneous application of retiming and clock scheduling can achieve shorter clock periods than optimized circuits generated by applying either of the two techniques separately. A novel mixed-integer linear program is given for the problem of simultaneous retiming and clock scheduling with a target clock period and tolerance to delay variations under setup and hold constraints. Experiments with LGSynth93 and ISCAS89 benchmark circuits demonstrate the effectiveness of simultaneous retiming and clock scheduling. For one third of the test circuits, the operating frequency increased by at least 14% over the optimized circuits obtained by applying retiming or clock scheduling separately.
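The clock scheduling half of the problem has a classic formulation as a system of difference constraints. For a combinational path from register i to register j with minimum and maximum delays dmin and dmax, and clock arrival times t_i and t_j, the setup constraint t_i + dmax <= t_j + T and the hold constraint t_i + dmin >= t_j both have the form x_v - x_u <= w, so feasibility for a given period T can be checked with Bellman-Ford and the minimum period found by bisection. The sketch assumes zero setup, hold and clock-to-Q times and omits retiming and the paper's mixed-integer formulation entirely; the delay values are hypothetical.

```python
def skew_schedule(edges, n, period):
    """Return (feasible, clock arrival times) for registers 0..n-1 at the given period.
    edges: (i, j, dmin, dmax) for each combinational path from register i to register j."""
    cons = []                                   # (u, v, w) encodes t_v - t_u <= w
    for i, j, dmin, dmax in edges:
        cons.append((j, i, period - dmax))      # setup: t_i + dmax <= t_j + period
        cons.append((i, j, dmin))               # hold:  t_i + dmin >= t_j
    t = [0.0] * n                               # implicit virtual source at distance 0
    for _ in range(n):
        changed = False
        for u, v, w in cons:
            if t[u] + w < t[v]:
                t[v] = t[u] + w
                changed = True
        if not changed:
            return True, t
    return False, None                          # negative cycle: period infeasible

def min_period(edges, n, tol=1e-6):
    lo, hi = 0.0, max(e[3] for e in edges)      # zero skew always works at T = max dmax
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if skew_schedule(edges, n, mid)[0]:
            hi = mid
        else:
            lo = mid
    return hi

edges = [(0, 1, 2.0, 10.0),   # reg0 -> reg1: dmin = 2, dmax = 10 (hypothetical units)
         (1, 0, 1.0, 4.0)]    # reg1 -> reg0: dmin = 1, dmax = 4
T = min_period(edges, 2)
print(round(T, 3), skew_schedule(edges, 2, T)[1])
```

With zero skew this loop needs T = 10, the longest path. Skew brings it down to 8, where the limit is dmax - dmin = 8 on the path from register 0 to register 1: any skew that relaxes its setup constraint tightens its hold constraint.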

8.4 "AStErIx: An Interactive Environment for Logic Synthesis and Analysis"
Ricardo Dos Santos Ferreira, A.-M. Trullemans

We present an interactive environment, named AStErIx, that supports the development of new logic synthesis algorithms and the graphical visualization of synthesis results. The environment is based on public-domain tools (Sis and LEDA), with additional programming support (based on the Obelix library) and a graph interface (Idefix) for analyzing experimental results. The whole system is driven by a user-friendly interface (Panoramix) for interactively calling logic synthesis commands and scripts.


Session 9: Interconnections & Technology

9.1 "Modeling ramp duration with interconnect wire"
Denis Deschacht, Eric Vanier

In this paper we present new analytical formulations for the ramp duration at the input and at the far end of an interconnect wire in a CMOS structure. With this method, signal degradation can be described in terms of the line characteristics. Waveform information is also needed to model accurately the nonlinear dependence of delay on the input ramp. Waveforms estimated with our method are within 10% of SPICE, and the computation is more than two orders of magnitude faster than simulation.
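Analytical formulas of this kind are meant to replace simulations such as the following one. The sketch computes a numerical reference: the 10%-90% transition time at the open far end of a uniform RC line, approximated by an n-section RC ladder driven by an ideal linear ramp and integrated with forward Euler. Normalized units (a 1 V swing, given R and C totals) and the section count are assumptions; this is not the authors' formulation.

```python
def far_end_transition(r_total, c_total, t_in, n=20):
    """10%-90% transition time at the open far end of a uniform RC line.
    The line is an n-section RC ladder (series r, shunt c per section) driven by an
    ideal 0 -> 1 V linear ramp of duration t_in, integrated with forward Euler."""
    r, c = r_total / n, c_total / n
    dt = 0.1 * r * c                    # well inside the explicit-Euler stability limit (~r*c/2)
    v = [0.0] * n                       # node voltages; v[n-1] is the far end
    t, t10, t90 = 0.0, None, None
    while t90 is None:
        vin = min(t / t_in, 1.0) if t_in > 0 else 1.0
        new = []
        for k in range(n):
            upstream = vin if k == 0 else v[k - 1]
            i_in = (upstream - v[k]) / r
            i_out = (v[k] - v[k + 1]) / r if k + 1 < n else 0.0   # open (unloaded) far end
            new.append(v[k] + dt * (i_in - i_out) / c)
        v, t = new, t + dt
        if t10 is None and v[-1] >= 0.1:
            t10 = t
        if v[-1] >= 0.9:
            t90 = t
    return t90 - t10

print(far_end_transition(1.0, 1.0, t_in=0.0))   # near-step input, RC = 1
print(far_end_transition(1.0, 1.0, t_in=2.0))   # slow input ramp
```

For a step input the far-end transition should come out close to 0.9 RC, the textbook 10%-90% value for a distributed RC line; slower input ramps lengthen the output transition accordingly.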

9.2 "Modeling the Effect of Wire Resistance in Deep Submicron Coupled Interconnects for Accurate Crosstalk Based Net Sorting"
Carlo Guardiani, Christiano Forzan, Bruno Franzini

It is well known that in deep submicron technologies the coupling capacitance between adjacent wires is a critical portion of the total wire capacitance, while the capacitance between wire and substrate has become the fringing component. High-frequency signals travelling across multilevel interconnect structures generate proximity effects, i.e. crosstalk, between adjacent wires. These effects include delay and noise injection, and they are a serious performance limitation in deep submicron VLSI circuits. An analytical model of crosstalk would be extremely useful in both the design front-end and the design back-end; for instance, a net ranking procedure based on such a model could efficiently identify potential signal integrity problems between nets. A compact model of the coupled noise pulse amplitude that considerably improves on the simple charge sharing model has been proposed in [4]. In this paper we demonstrate that this model is quite inaccurate in several cases that often occur in practical circuits, because it does not consider wire resistance. We also introduce a heuristic technique that accounts for resistive effects, achieving a considerable accuracy improvement at equivalent computational cost.
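The simple charge sharing model that serves as the starting point treats the victim as a floating node coupled to a switching aggressor: the noise peak is the capacitive divider Vdd * Cc / (Cc + Cg), where Cc is the coupling capacitance and Cg the victim's capacitance to ground. A net ranking based on it is sketched below. It ignores wire and driver resistance, which is precisely the limitation the paper addresses, and the net names and capacitance values are hypothetical.

```python
def charge_sharing_noise(vdd, c_couple, c_ground):
    """Peak noise on a floating victim for a full-swing aggressor step (capacitive divider).
    Wire and driver resistances are ignored."""
    return vdd * c_couple / (c_couple + c_ground)

def rank_victims(pairs, vdd):
    """pairs: iterable of (aggressor, victim, Cc, Cg_victim). Highest estimated noise first."""
    scored = [(charge_sharing_noise(vdd, cc, cg), agg, vic) for agg, vic, cc, cg in pairs]
    return sorted(scored, reverse=True)

pairs = [("clk", "data3", 10e-15, 40e-15),    # hypothetical extracted capacitances (F)
         ("bus7", "ctrl1", 30e-15, 30e-15)]
for noise, agg, vic in rank_victims(pairs, vdd=1.8):
    print(f"{agg} -> {vic}: {noise:.2f} V")
```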

9.3 "Accurate Junction Capacitance Modeling for Substrate Crosstalk Calculation"
Mathias Klemme, E. Barke

This paper presents a modeling technique for substrate coupling computation in mixed-signal circuits. Substrate crosstalk can degrade the performance of analog circuit elements such as mixers or amplifiers and can lead to a total breakdown of circuit parts. To predict substrate-related problems and analyze their influence, EDA tools that can be applied during layout verification are needed. Depending on the required accuracy, the interfaces between the designed elements and the substrate region have to be modeled. A very important issue is the calculation of junction capacitors. We propose a method that accurately determines lumped junction capacitors without measurement or device simulation.
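For context, the usual compact description of a reverse-biased junction capacitance (as in SPICE diode and MOSFET models) has an area (bottom) term and a perimeter (sidewall) term, each decreasing with the reverse bias V_R: C = CJ·A/(1 + V_R/PB)^MJ + CJSW·P/(1 + V_R/PB)^MJSW. The sketch evaluates it with illustrative parameter values that do not correspond to any real process; the paper's own method for obtaining lumped capacitors is not reproduced.

```python
def junction_cap(area, perimeter, v_reverse, cj=1e-3, cjsw=3e-10, pb=0.8, mj=0.5, mjsw=0.33):
    """Reverse-biased junction capacitance: bottom (area) plus sidewall (perimeter) terms.
    Units: area in m^2, perimeter in m, cj in F/m^2, cjsw in F/m, voltages in V.
    Default parameter values are illustrative only."""
    bottom = cj * area / (1.0 + v_reverse / pb) ** mj
    side = cjsw * perimeter / (1.0 + v_reverse / pb) ** mjsw
    return bottom + side

# Hypothetical 10 um x 10 um diffusion region at 0 V and 1.8 V reverse bias.
for vr in (0.0, 1.8):
    print(f"V_R = {vr} V: C = {junction_cap(1e-10, 4e-5, vr):.3e} F")
```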

9.4 "Performance/Complexity Space Exploration: Bulk vs. SOI"
Selim J. Abou-Samra, Alain Guyot

Performance and complexity are considered here as two orthogonal axes. Performance metrics are first reviewed; then different complexity metrics and scales are proposed, with the definition of complexity depending on the level of abstraction considered. Finally, SOI and bulk CMOS technologies are compared in this space.


Session 10: Adiabatic

10.1 "Some aspects on adiabatic switching"
Josef Nossek

10.2 "Optimal Charging of Capacitors"
B. Desoete, Alexis De Vos

Finite-time thermodynamics is illustrated by some examples of process optimization. The objective is to minimize entropy production during an electrical process, namely the charging of a capacitor. Non-linear electronic processes are investigated in particular, because of their importance in computers and displays.
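For the linear special case (an ideal resistor R in series with the capacitor C, charged from 0 to V in a fixed time T), the minimum-dissipation protocol follows in a few lines from the Cauchy-Schwarz inequality. This is a textbook derivation given for orientation only; the nonlinear processes studied in the paper are not covered.

```latex
Q = \int_0^T i(t)\,dt = CV, \qquad E = \int_0^T R\,i(t)^2\,dt
% Cauchy-Schwarz: \left(\int_0^T i\,dt\right)^2 \le T \int_0^T i^2\,dt, hence
E \ge \frac{R\,Q^2}{T} = \frac{R\,C^2 V^2}{T}
% equality iff i(t) = CV/T (constant current), i.e. a ramped source voltage
v_s(t) = \frac{V}{T}\,t + \frac{R\,C\,V}{T}, \qquad 0 \le t \le T
% entropy produced when E is released into surroundings at temperature T_0
\Delta S = \frac{E}{T_0} \ge \frac{R\,C^2 V^2}{T\,T_0}
```

Compared with step charging, which dissipates CV^2/2 regardless of R, the optimal ramp dissipates a fraction 2RC/T of that amount, a saving whenever T > 2RC.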

10.3 "Low Power Conservative Circuits in CMOS"
Bela A. Frigyik, Arpad Csurgay, Ferenc Kovacs

Low-loss circuits using CMOS transmission gates as switches are proposed. Functional instability caused by parasitic capacitances and leakage current is overcome by connecting each wire to ground through the inverse of the logic function that connects it to the quasi-adiabatic clock. A canonical retractile PLA (Programmable Logic Array) and an autonomous FSM (Finite State Machine) are presented.


Session 11: Asynchronous

11.1 "Design of Speed-Independent CMOS Cells from Signal Transition Graph"
Christian Piguet, Jacques Zahnd

11.2 "Estimations of Power Consumption in Asynchronous Logic as Derived from Graph Based Circuit Representations"
Lee Lloyd, Albert M. Koelmans, Enric Pastor, Alex V. Yakovlev

We present a technique for the estimation of power consumption in asynchronous circuits through the modelling of transition switching activity. Unlike most existing techniques for analytic (non-simulation) power estimation that use reachability state traversal and Markov chain analysis, our method is based on an invariant analysis of Petri net models using matrix representations. This approach is in general more efficient than Markov chain analysis, due to the avoidance of state explosion, but may lose accuracy for some classes of nets. The asynchronous circuits under analysis are speed-independent designs that are synthesized from Signal Transition Graph descriptions using the tool Petrify.
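Invariant analysis works on the incidence matrix C of the Petri net (places by transitions, each entry being tokens produced minus tokens consumed). A T-invariant is a vector x with C x = 0: firing each transition x_t times returns the net to its marking, so x gives the relative number of transition firings per cycle of operation. The sketch computes a rational null-space basis for a hypothetical four-phase STG a+ → b+ → a- → b- using exact fractions. It illustrates the linear algebra only; general nets also need the non-negativity condition x >= 0 (for example via Farkas-type algorithms), and the authors' activity estimator is not reproduced.

```python
from fractions import Fraction
from math import lcm

def nullspace(matrix):
    """Integer-scaled basis of {x : matrix x = 0}, via exact reduced row echelon form."""
    rows = [[Fraction(v) for v in row] for row in matrix]
    ncols = len(rows[0])
    pivots, r = [], 0
    for c in range(ncols):
        p = next((i for i in range(r, len(rows)) if rows[i][c] != 0), None)
        if p is None:
            continue                                  # no pivot: column c is a free variable
        rows[r], rows[p] = rows[p], rows[r]
        pv = rows[r][c]
        rows[r] = [v / pv for v in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    basis = []
    for free in [c for c in range(ncols) if c not in pivots]:
        x = [Fraction(0)] * ncols
        x[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            x[pc] = -rows[row_idx][free]
        scale = lcm(*(v.denominator for v in x))      # clear denominators
        basis.append([int(v * scale) for v in x])
    return basis

# Hypothetical four-phase STG: a+ -> b+ -> a- -> b- -> a+, one place between transitions.
# Columns: a+, b+, a-, b-.  Rows: places p0..p3.
C = [[ 1, -1,  0,  0],
     [ 0,  1, -1,  0],
     [ 0,  0,  1, -1],
     [-1,  0,  0,  1]]
print(nullspace(C))   # [[1, 1, 1, 1]]
```

The single invariant (1, 1, 1, 1) says each transition fires once per cycle, so signals a and b each switch twice per handshake cycle.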

11.3 "Verification driven synthesis of asynchronous circuits from STG specification"
Ilya V. Klotchkov, Alexander B. Smirnov

This paper presents a new methodology for the synthesis of asynchronous circuits that uses partial states derived from the STG rather than total states. The approach targets synthesis from large STG specifications of control-dominated circuits in concurrent systems, and it has the advantage of reduced processing time compared with other methods.

11.4 "A low-power, high-speed stack controller designed using asynchronous techniques"
Francesco Pessolano, MB Josephs

A stack controller design for use with conventional register files is evaluated. The design is based on asynchronous circuit techniques, but is suitable for use both in asynchronous and in synchronous environments. In comparison with another compatible asynchronous design it yields higher speed and lower energy consumption. In addition, a comparison with a gated-clock design is provided: the asynchronous solution is smaller (93% of the transistors), consumes significantly less energy per operation (56% according to simulation) and can operate within the same clock cycle time in 0.5 µm CMOS technology.


Session 12: Arithmetic & Processors

12.1 "Low-Power VLIW Processors: A High-Level Evaluation"
Jean-Michel Puiatti, Josep Llosa, Christian Piguet, Eduardo Sanchez

Processors with both low power consumption and high performance are increasingly required in the portable systems market. Although it is easy to find processors with one of these characteristics, it is harder to find a processor with both at the same time. In this paper, we evaluate the possibility of designing a high-performance, low-power processor and investigate whether instruction-level parallelism architectures can be adapted to low-power processors. We find that adapting a high-performance architecture, such as VLIW, to low-power 8-bit or 16-bit microprocessors significantly improves the processor's performance while keeping the same energy consumption.

12.2 "A Low-Power Multiprocessor Architecture For Embedded Reconfigurable Systems"
Christophe Amerijckx, Jean-Didier Legat

In this paper, we introduce the architecture of a new embedded field programmable processor array (E-FPPA), which consists of a low-power multiprocessor system embedded with standard programmable logic blocks and memory. Each block (processor, programmable logic, ...) is coupled to a transfer controller (TC) responsible for all transfers between blocks. Instead of a classical crossbar interconnection network, we propose a low-cost hierarchical ring that combines a simple interface with high-performance communication when data locality is observed. High-performance reconfigurable systems can easily be built on the E-FPPA, and we demonstrate that this architecture is an interesting alternative to traditional DSPs for low-power applications. Using the 8-bit CoolRisc processor [1,2], an E-FPPA including a cluster of 16 processors and 16 TCs, working at 25 and 50 MHz respectively, with 1 kbyte of data SRAM per processor, consumes 2 W with a peak performance of 1200 Mops. The chip size in a 0.35 µm process has been estimated at 52 mm².

12.3 "Low-Power Design of Asynchronous Microprocessors"
Jean-Luc Nagel, Christian Piguet

12.4 "A Self-Timed Multiplier using Conditional Evaluation"
V. A. Bartlett, E. Grass

A low-power, self-timed CMOS array multiplier, optimized for asynchronous DSP but also applicable to synchronous DSP applications, is presented. To reduce average power consumption, a strategy termed conditional evaluation is introduced, whereby addition is carried out only in those rows of the carry-save array whose bit-products are non-zero. Simulation results are presented for a transistor-level 8-bit × 8-bit implementation, showing an average-case energy consumption of 73 pJ with an average delay of 30.5 ns.
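The conditional-evaluation idea can be illustrated with a behavioural shift-and-add model of an n × n array multiplier in which a row does work only if its bit-products are non-zero, that is, only if the corresponding multiplier bit is 1 and the multiplicand is non-zero. This is a functional sketch, not the self-timed circuit, and the function names are illustrative.

```python
def conditional_multiply(a, b, n=8):
    """Return (a * b, number of array rows actually evaluated) for n-bit unsigned operands."""
    product, active_rows = 0, 0
    for i in range(n):
        # Row i holds bit-products a_j & b_i; they are all zero if b_i == 0 or a == 0.
        if (b >> i) & 1 and a != 0:
            product += a << i        # add this row's partial product (modelled as an integer add)
            active_rows += 1
    return product, active_rows

def average_active_rows(n=8):
    """Mean active rows over all pairs of n-bit operands (uniform distribution assumed)."""
    total = sum(conditional_multiply(a, b, n)[1] for a in range(1 << n) for b in range(1 << n))
    return total / (1 << (2 * n))

print(conditional_multiply(200, 0b00010010), average_active_rows())   # (3600, 2) 3.984375
```

Averaged over all 65 536 pairs of uniformly distributed 8-bit operands, 3.984375 of the 8 rows are active, so roughly half of the row additions are skipped; real DSP data will have different statistics.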
