DSpace Community:
http://dspace.cityu.edu.hk:80/handle/2031/710
2016-02-09T05:50:11Z
Configurable architectures for mixed high precision floating point arithmetic
http://dspace.cityu.edu.hk:80/handle/2031/8164
Title: Configurable architectures for mixed high precision floating point arithmetic
Authors: Jaiswal, Manish Kumar
Abstract: Floating point arithmetic is widely used in scientific and engineering computations
and in numerical and signal processing applications. Its huge dynamic range and convenient
scaling of the number range give designers a convenient platform for realizing their
algorithms. However, implementing arithmetic operations for floating point numbers in
hardware is very challenging. In addition, in response to the growing demand for higher precision
arithmetic, the IEEE-754 floating point standard incorporated the quadruple
precision (128-bit) format in 2008.
Field Programmable Gate Arrays (FPGAs) are becoming major competitors to
high performance computing machines, and even current supercomputers use
FPGAs to off-load and accelerate complex parallelizable routines.
Over the last two to three decades, FPGAs have been researched and adopted for a large set of
floating point related applications, and a significant body of literature focuses on the
design of efficient floating point arithmetic implementations for FPGA platforms.
Despite several advances and many implementation strategies, the area requirements
and performance of these arithmetic computations remain a main bottleneck,
especially as operand size increases (from single precision to quadruple precision).
In view of the above, part of this research work aims at high performance
and area efficient architectures for floating point arithmetic, especially for the double and
quadruple precision formats, on FPGA platforms, which can easily be extended to ASIC
synthesis platforms. In this thesis, FPGA based architectures for double and quadruple (high)
precision multiplication and division arithmetic are proposed, which outperform
the best available works in the literature in terms of area, speed and latency.
A significant portion of this thesis is focused on the development of standard cell
based ASIC (Application Specific Integrated Circuit) architectures for "dynamically configurable
multi-mode multi-precision (mixed) floating point arithmetic". Based on the
IEEE-754 standard formats, three categories of configurable multi-mode multi-precision
architectures for basic arithmetic (adder/subtractor, multiplier and divider) are developed:
(Dual-mode) Double Precision with dual (two-parallel) Single Precision
(DPdSP) Arithmetic Architecture; (Dual-mode) Quadruple Precision with dual (two-parallel)
Double Precision (QPdDP) Arithmetic Architecture; and (Tri-mode) Quadruple
Precision with dual (two-parallel) Double Precision, quad (four-parallel) Single Precision
(QPdDPqSP) Arithmetic Architecture. These architectures aim towards a unified
multi-mode multi-precision architecture for better resource utilization. Designed
for high precision computation, they can be dynamically configured
for multiple lower precision computations, and they support normal
as well as sub-normal computations. The literature contains very limited work in this area,
and mainly addresses dual-mode architectures with normal support only. The proposed
dual-mode architectures, for each targeted arithmetic, show a significant benefit over existing
dual-mode works, whereas the tri-mode architectures stand among fresh proposals.
These multi-mode arithmetic architectures are further combined to form a multi-mode
multi-precision floating point arithmetic unit (FPU).
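The dual-mode idea can be illustrated in software with a minimal sketch. It assumes that, in dual single precision mode, the two single precision operands simply occupy the upper and lower halves of the 64-bit operand word; the thesis abstract does not state the actual packing, and the names `split_fields`, `split_operand` and `to_bits` are hypothetical helpers introduced here. Only the field widths are taken from the IEEE-754 binary formats.

```python
import struct

# (exponent bits, fraction bits) of the IEEE-754 binary interchange formats.
FORMATS = {"SP": (8, 23), "DP": (11, 52), "QP": (15, 112)}


def split_fields(bits, fmt):
    """Split a raw bit pattern into (sign, biased exponent, fraction)."""
    exp_bits, frac_bits = FORMATS[fmt]
    sign = bits >> (exp_bits + frac_bits)
    exponent = (bits >> frac_bits) & ((1 << exp_bits) - 1)
    fraction = bits & ((1 << frac_bits) - 1)
    return sign, exponent, fraction


def split_operand(word64, mode):
    """Interpret one 64-bit operand word according to the selected mode.

    Assumed packing (not from the thesis): in "dSP" mode the upper 32 bits
    hold the first single precision operand and the lower 32 bits the second.
    """
    if mode == "DP":
        return [split_fields(word64, "DP")]
    if mode == "dSP":
        high, low = word64 >> 32, word64 & 0xFFFFFFFF
        return [split_fields(high, "SP"), split_fields(low, "SP")]
    raise ValueError(f"unknown mode: {mode}")


def to_bits(layout, *values):
    """Pack Python floats with a big-endian struct layout into a 64-bit integer."""
    return struct.unpack(">Q", struct.pack(layout, *values))[0]


one_double = to_bits(">d", 1.0)          # one double precision operand
two_singles = to_bits(">ff", 1.0, -2.0)  # two packed single precision operands

print(split_operand(one_double, "DP"))    # [(0, 1023, 0)]
print(split_operand(two_singles, "dSP"))  # [(0, 127, 0), (1, 128, 0)]
```

The same 64-bit word feeds both modes; only the field boundaries change. A sub-normal operand shows up in this decomposition as a zero biased exponent with a non-zero fraction.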
This thesis currently targets mixed high precision arithmetic architectures
for the standard IEEE-754 formats; however, the approach can easily be extended to any custom
precision arithmetic architecture.
Notes: CityU Call Number: QA76.9.C62 J34 2014; xviii, 183 p. : ill. 30 cm.; Thesis (Ph.D.)--City University of Hong Kong, 2014.; Includes bibliographical references (p. 170-176)
2014-01-01T00:00:00Z
Precoder design for MIMO systems over spatially correlated Ricean fading channels
http://dspace.cityu.edu.hk:80/handle/2031/8148
Title: Precoder design for MIMO systems over spatially correlated Ricean fading channels
Authors: Zhang, Lin (張琳)
Abstract: In response to the considerable increase in mobile data traffic driven by multimedia
and cloud-based services, multiple input multiple output (MIMO) is one of the most
powerful communication technologies to deal with this continuously growing demand.
By using a precoder, to be designed with the knowledge of channel state information
(CSI) at the transmitter (CSIT) to transform the input signal prior to MIMO transmission, the bit error rate (BER) and data rate can be improved. A precoder designed
with perfect instantaneous CSI can achieve either best BER performance or best data
rate performance. However, perfect CSI is practically unavailable because of estimation errors, feedback delay, and quantization errors. Imperfect CSI can substantially
degrade the system performance. Furthermore, the frequent feedback of instantaneous CSI incurs costly bandwidth overhead. Statistical CSIT, including channel
mean and spatial correlation, is an efficient form of CSI. Because it varies slowly, it
does not need frequent feedback to the transmitter and therefore saves much
bandwidth overhead. The design of an optimal precoder based on statistical CSIT
is thus of vital importance.
The objective of this thesis is to investigate precoder design methods over spatially
correlated Ricean fading channels for MIMO systems with statistical CSI. Regarding
the statistical CSI feedback, the spatial correlation requires more feedback overhead
than the mean. Also, the estimation of the spatial correlation using training data will
consume bandwidth and will incur delay, apart from computations. Therefore, deriving an analytical correlation expression for a given spatial
antenna configuration can reduce the feedback and the bandwidth spent on training data.
Clustered channels with a hierarchical angle structure, which describes the azimuth angle
in terms of the direction of departure (DOD) at the transmitter antenna array and the direction of arrival (DOA) at the receiver antenna array, have been used to model
communication channels in standards such as the 3GPP spatial channel model (SCM).
A cluster is a resolvable channel path composed of a number of unresolvable sub-paths. In the hierarchical angle structure, the DOAs and DODs of the sub-paths are
expressed as the sum of the cluster's centered angle and the sub-paths' angle offsets. In
this thesis, two different hierarchical angle models are investigated to derive analytical
spatial correlation formulas for clustered channels. The first model assumes that the
centered angles of the clusters are independent Gaussian random variables while the
sub-paths' angle offsets are deterministic as defined by 3GPP SCM. For the above
angle model, existing methods either require many expansion terms or limit clustered
angle spread within a small range to achieve the desired accuracy. This thesis derives
a simplified spatial correlation analysis by using the Gauss-Hermite quadrature, to
avoid numerical integration for uniform linear array (ULA) and uniform circular array
(UCA). Compared with the existing expansion solutions, far fewer terms (e.g.,
fewer than 10) are required to generate accurate spatial correlations.
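The role of the Gauss-Hermite quadrature can be sketched as follows. For a Gaussian cluster angle with mean mu and standard deviation sigma, an expectation E[f(theta)] is approximated by (1/sqrt(pi)) times the sum of w_i f(mu + sqrt(2) sigma x_i) over the quadrature nodes x_i and weights w_i. The sketch assumes a ULA whose antennas a distance `delta` apart (in elements) see the phase difference 2 pi d delta sin(theta), with spacing d in wavelengths; the function names, the parameter values and the sub-path offsets are illustrative assumptions, not the formula or the 3GPP SCM values used in the thesis.

```python
import cmath
import math


def hermite(n, x):
    """Physicists' Hermite polynomial H_n(x) via the three-term recurrence."""
    h_prev, h = 1.0, 2.0 * x
    if n == 0:
        return h_prev
    for k in range(1, n):
        h_prev, h = h, 2.0 * x * h - 2.0 * k * h_prev
    return h


def bisect_root(f, lo, hi, iters=100):
    """Bisection on an interval whose end points have opposite signs."""
    f_lo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gauss_hermite(n):
    """Nodes and weights for the weight function exp(-x^2).

    Roots of H_k interlace the roots of H_(k-1) and lie inside
    (-sqrt(2k+1), sqrt(2k+1)), so each root can be bracketed and bisected.
    """
    roots = []
    for k in range(1, n + 1):
        bound = math.sqrt(2 * k + 1)
        edges = [-bound] + roots + [bound]
        roots = [bisect_root(lambda x: hermite(k, x), a, b)
                 for a, b in zip(edges, edges[1:])]
    norm = 2 ** (n - 1) * math.factorial(n) * math.sqrt(math.pi)
    weights = [norm / (n * hermite(n - 1, x)) ** 2 for x in roots]
    return roots, weights


def ula_correlation(delta, spacing, mean_angle, sigma, offsets, order=8):
    """Approximate the correlation between ULA elements `delta` apart.

    The cluster centre is Gaussian(mean_angle, sigma); `offsets` are
    deterministic sub-path offsets in radians (hypothetical values).
    """
    nodes, weights = gauss_hermite(order)
    total = 0j
    for x, w in zip(nodes, weights):
        centre = mean_angle + math.sqrt(2.0) * sigma * x
        for offset in offsets:
            phase = 2.0 * math.pi * spacing * delta * math.sin(centre + offset)
            total += w * cmath.exp(1j * phase)
    return total / (math.sqrt(math.pi) * len(offsets))


# Half-wavelength spacing, adjacent elements, three illustrative sub-path offsets.
rho = ula_correlation(1, 0.5, math.radians(20), math.radians(5),
                      [math.radians(a) for a in (-2.0, 0.0, 2.0)])
print(abs(rho))
```

With `order=8` the sum has eight terms per sub-path, which is in the spirit of the "fewer than 10 terms" remark; the accuracy for a given angle spread should be checked against direct numerical integration.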
The second hierarchical angle model treats both the cluster's centered angle and the sub-paths' angle offsets as random variables. Hence the hierarchical angle is bivariate,
which is different from the single random variable approach of the first model and
existing methods. It is assumed that the centered angle is Gaussian distributed while
the angle offset is Laplacian distributed. An analytical correlation formula is derived
for the above angle model for ULA and UCA. Computer evaluation shows that the
derived formula matches well with the simulated correlations with channel parameters
defined in the 3GPP SCM. The analytical spatial correlation expressions are useful
for system performance evaluation and precoder design.
In the literature, several precoder design methods using statistical CSIT over correlated Ricean fading channels were proposed. However, these methods can only provide either asymptotic solutions with degraded performance or non-eigen-structured
iterative solutions with slow convergence and high computational complexity. In this
thesis, the eigen-structure of the precoder is exploited to improve convergence
and reduce computation. The eigen-structure approach converts the precoder design
into a joint power allocation and unitary beamforming design problem.
The Kronecker correlation model is commonly used for modeling the spatial covariance matrix. Two transmit precoding schemes are proposed for MIMO systems over
correlated Ricean channels with a Kronecker covariance matrix. The first scheme deals
with the case in which the data received at the receive antennas are correlated and the
waveforms transmitted from the transmit antennas are uncorrelated. It is known that the optimal BER based precoder is
the one-dimensional scheme using the largest eigen-mode (rank one) at low
signal-to-noise ratios (SNRs) and the equal power control scheme at high SNRs. Based on
these asymptotic solutions at low and high SNRs, a simple scheme is proposed that
assumes only two values for power allocation. A bigger value is assigned to the largest
eigen-beam and a smaller value to the rest of the eigen-beams. The two power control
values are optimized to minimize a pairwise error probability (PEP) bound. Simulations
show that the simplified solution can achieve a performance close to the existing
optimal solution with fast convergence speed.
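The two-value parameterization described above can be sketched in a few lines. This is only an illustration of the search space, assuming a total power budget shared by all eigen-beams; the function name `two_level_allocation` and the `dominant_share` parameter are hypothetical, and the PEP bound that the thesis minimizes over these two values is not reproduced here.

```python
def two_level_allocation(num_beams, total_power, dominant_share):
    """Give `dominant_share` of the power to the largest eigen-beam and
    split the remainder equally among the other eigen-beams.

    dominant_share = 1 reproduces rank-one beamforming (low-SNR limit);
    dominant_share = 1 / num_beams reproduces equal power (high-SNR limit).
    """
    if num_beams < 2:
        raise ValueError("need at least two eigen-beams")
    if not 1.0 / num_beams <= dominant_share <= 1.0:
        raise ValueError("dominant_share must lie in [1/num_beams, 1]")
    strong = dominant_share * total_power
    weak = (total_power - strong) / (num_beams - 1)
    return [strong] + [weak] * (num_beams - 1)


print(two_level_allocation(4, 4.0, 1.0))   # rank-one limit
print(two_level_allocation(4, 4.0, 0.25))  # equal-power limit
print(two_level_allocation(4, 4.0, 0.7))   # an intermediate operating point
```

Because the whole allocation depends on a single scalar, a one-dimensional search over `dominant_share` is enough to explore it, which is consistent with the fast convergence reported for the simplified scheme.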
The second scheme handles the general case of correlated transmit and receive
antennas. For this general correlation problem, the PEP bound is used as the design
criterion with an average power constraint. When the constrained optimization problem is expressed in terms of the power control matrix and the unitary matrix of the precoder,
the objective function and power constraint become nonlinear functions of the power
control parameters and the unitary matrix. This optimization problem suffers from local
solutions and convergence difficulties. By defining a new set of power control variables, the
power constraint becomes a linear function of these variables. For a
given unitary matrix, the constrained optimization is a convex problem and the new
power parameters can be solved for numerically, namely by an interior point method.
For given power control parameters, we propose to employ a Riemannian optimization method to solve for the unitary beamforming matrix on the Lie group of
unitary matrices. Computer simulation shows that the above iterative optimization procedure achieves a locally
optimal solution with guaranteed convergence.
A generalized precoding scheme is proposed to handle a channel covariance
matrix with no specified spatial correlation structure. This general correlation structure can cover any doubly correlated channel, including those of distributed antenna
systems. The existing iterative method, which needs to search for a full-rank precoding
matrix of large dimension, has high computational complexity and slow convergence, and its convergence cannot be guaranteed. Our precoding scheme is also
an eigen-structure based solution composed of power allocation and unitary beamforming. Using a formulation similar to the Kronecker case, power allocation can be
solved as a sequential quadratic programming (SQP) problem and the unitary matrix obtained by optimization on a Riemannian manifold. Compared
with the existing method, the proposed method has a much lower matrix dimension
and thus requires significantly less computation. Simulation results show that the proposed
method gives a locally optimal solution with guaranteed convergence.
Notes: CityU Call Number: TK5102.92 .Z45 2014; xx, 120 p. : ill. 30 cm.; Thesis (Ph.D.)--City University of Hong Kong, 2014.; Includes bibliographical references (p. 113-120)
2014-01-01T00:00:00Z
SRAM-based architectures for high-speed IP address lookup and packet classification
http://dspace.cityu.edu.hk:80/handle/2031/8147
Title: SRAM-based architectures for high-speed IP address lookup and packet classification
Authors: Lu, Ziyan (陸紫妍)
Abstract: When a packet arrives at a flow-aware router in the Internet, the router performs two basic functions, namely IP address lookup and packet classification, to decide how to process the packet. First, the packet header is checked against an access control list and/or firewall to determine whether the packet will be accepted or rejected. This operation uses multiple TCP/IP header fields to classify packets into flows, and it is called packet classification. Packet classification is also used to support access control, per-flow quality-of-service provisioning, traffic policing, billing and accounting, and policy-based forwarding for virtual private networks (VPNs). For basic packet forwarding, the router uses the packet's destination IP address as the key to look up its routing table and determine the packet's next hop. This operation is called IP address lookup.
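IP address lookup selects the routing table entry with the longest prefix that matches the destination address. The sketch below shows that rule with a plain one-bit-per-level binary trie. It is a textbook baseline for illustration, not the bit-shuffled trie proposed in the thesis, and the class name, the prefixes and the next-hop labels are made up for the example. One trie instance is assumed per address family.

```python
import ipaddress


class BinaryTrie:
    """Unibit trie: each node is [child_for_0, child_for_1, next_hop]."""

    def __init__(self):
        self.root = [None, None, None]

    def insert(self, prefix, next_hop):
        """Store `next_hop` at the node reached by the prefix bits."""
        net = ipaddress.ip_network(prefix)
        bits, width = int(net.network_address), net.max_prefixlen
        node = self.root
        for i in range(net.prefixlen):
            bit = (bits >> (width - 1 - i)) & 1
            if node[bit] is None:
                node[bit] = [None, None, None]
            node = node[bit]
        node[2] = next_hop

    def lookup(self, address):
        """Walk the address bits and remember the last next hop seen."""
        addr = ipaddress.ip_address(address)
        bits, width = int(addr), addr.max_prefixlen
        node, best = self.root, self.root[2]
        for i in range(width):
            node = node[(bits >> (width - 1 - i)) & 1]
            if node is None:
                break
            if node[2] is not None:
                best = node[2]
        return best


table = BinaryTrie()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "port-A")
table.insert("10.1.0.0/16", "port-B")

print(table.lookup("10.1.2.3"))     # port-B (longest match is the /16)
print(table.lookup("10.2.0.1"))     # port-A
print(table.lookup("192.168.1.1"))  # default
```

A lookup in this baseline can touch up to 32 nodes for IPv4, one memory access per address bit, which is the kind of cost that pipelined, memory-efficient index structures aim to reduce.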
IP address lookup and packet classification are the two most computation-intensive tasks, and they are often the bottlenecks of packet processing in high-speed routers. For a 100 Gbps communication line, the packet arrival rate can be up to 312.5 million packets per second. In this thesis, application-specific hardware architectures to speed up these two operations are presented. An algorithmic RAM-based IP address lookup method called the bit-shuffled trie is presented. By rearranging the bits of the prefixes, memory-efficient index tables can be constructed to support IP address lookup. The address lookup engine can be implemented using a pipelined architecture with simple processing logic. The proposed method has superior memory efficiency: the memory cost for an IPv4 routing table with 474K prefixes is only 1.1 MB, and the memory cost for an IPv6 routing table with 215K 64-bit prefixes is about 1.7 MB. This memory efficiency allows the IP address lookup engine for both IPv4 and IPv6 to be implemented on a single FPGA device. Incremental updates to the routing table can be handled efficiently: on average, about 8 memory-write operations to the data structures are required to process an insertion or deletion.
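The 312.5 million packets per second figure can be reproduced with a short calculation, assuming back-to-back minimum-size packets of 40 bytes (a TCP/IP header with no payload) and ignoring link-layer framing overhead. The 40-byte packet size is an assumption made here; the abstract does not state it.

```python
def packets_per_second(line_rate_bps, packet_bytes):
    """Worst-case packet arrival rate for back-to-back packets of one size."""
    return line_rate_bps / (packet_bytes * 8)


rate = packets_per_second(100e9, 40)  # 100 Gbps line, 40-byte packets (assumed)
budget_ns = 1e9 / rate                # time available per packet, in nanoseconds

print(rate)       # 312500000.0 packets per second
print(budget_ns)  # about 3.2 ns per packet
```

A budget of roughly 3.2 ns per packet leaves room for only about one memory access per pipeline stage, which motivates the pipelined on-chip SRAM designs described in the abstract.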
In typical algorithmic packet classification methods, the data structure is tailored to the given ruleset. It is common among published algorithmic methods that the worst-case number of memory accesses per classification depends on the properties of the ruleset, such as the distribution of the address prefixes and port ranges. As a result, existing methods do not guarantee a constant classification rate. A novel multi-pipeline architecture for packet classification is presented in this thesis. The method has outstanding performance in both space and time. The method incorporates the prefix inclusion coding scheme to achieve outstanding memory efficiency. For rulesets with 10 thousand rules, the storage cost of our method is between 16 and 24.5 bytes per rule. The hardware uses fixed-length linear pipelines. Hence, the classification rate is constant regardless of the ruleset properties. To demonstrate the feasibility of the method, the proposed architecture is implemented on a Virtex-6 FPGA, and the device can achieve a classification rate of 340 million packets per second (MPPS).
Notes: CityU Call Number: TK5105.875.I57 L8 2014; ix, 129 p. : ill. 30 cm.; Thesis (Ph.D.)--City University of Hong Kong, 2014.; Includes bibliographical references (p. 124-127)
2014-01-01T00:00:00Z
Performance analysis and improvement of wireless networks
http://dspace.cityu.edu.hk:80/handle/2031/7986
Title: Performance analysis and improvement of wireless networks
Authors: Zou, Mingrui (鄒明芮)
Abstract: In recent years, wireless communications have attracted an explosion of interest.
Early adopters of wireless networks were primarily in the military and
emergency services areas. With the development of wireless communication technologies,
wireless networking has become common in people's daily life. We use wireless devices, such
as mobile phones and laptops, to access the worldwide Internet. By using
wireless networks, we can share information within a small group of buildings or
across the world at any time and anywhere. There are different types of wireless
networks. Our research focuses on the performance analysis and improvement of
existing wireless networks, i.e., wireless metropolitan area networks (WMANs),
which are described by the IEEE 802.16 standard, and wireless local area networks
(WLANs), which are based on the IEEE 802.11 standard.
We first introduce the background and previous modeling work of IEEE 802.16
and 802.11 networks in detail, respectively. Then we investigate the network performance
of an unsaturated IEEE 802.16 network with the contention-based access
mechanism. To capture the bursty characteristics of best-effort (BE) traffic, we model
packet arrivals at each subscriber station as a Markov modulated Poisson process
(MMPP). Based on the MMPP arrival assumption, we derive analytical expressions
for the network throughput and packet delay. We validate our analytical
model by comparing it with simulation results under various operating parameters.
To demonstrate the benefit of adopting the MMPP arrival process, our analytical
model is compared with previous work in which the arrival of packets is modeled
by a Poisson process.
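For readers unfamiliar with the MMPP, the following minimal sketch simulates a two-state version: a hidden two-state Markov chain switches between a high and a low Poisson arrival rate, which produces bursts that a single-rate Poisson process cannot. The number of states, the rates and the function names are illustrative assumptions, not the parameters used in the thesis.

```python
import random


def simulate_mmpp(arrival_rates, switch_rates, horizon, seed=0):
    """Return arrival times of a two-state MMPP on [0, horizon).

    arrival_rates[s]: Poisson arrival rate while in state s.
    switch_rates[s]:  rate of leaving state s for the other state.
    In each state, the next event is an arrival or a state switch,
    chosen in proportion to the two competing exponential rates.
    """
    rng = random.Random(seed)
    t, state, arrivals = 0.0, 0, []
    while True:
        lam, sw = arrival_rates[state], switch_rates[state]
        t += rng.expovariate(lam + sw)
        if t >= horizon:
            return arrivals
        if rng.random() < lam / (lam + sw):
            arrivals.append(t)
        else:
            state = 1 - state


def stationary_mean_rate(arrival_rates, switch_rates):
    """Long-run average arrival rate of the two-state MMPP."""
    r0, r1 = switch_rates
    p0 = r1 / (r0 + r1)  # stationary probability of state 0
    return p0 * arrival_rates[0] + (1.0 - p0) * arrival_rates[1]


# Illustrative parameters: a bursty state (10 packets per unit time)
# and a quiet state (1 packet per unit time), switching at rate 1.
rates, switches = (10.0, 1.0), (1.0, 1.0)
times = simulate_mmpp(rates, switches, horizon=20000.0, seed=1)
print(len(times) / 20000.0, stationary_mean_rate(rates, switches))
```

A Poisson process with the same mean rate would match the first printed value but not the clustering of arrival times, which is the difference the comparison with the earlier Poisson-based model is meant to expose.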
Finally, we consider improving the throughput of an IEEE 802.11 network by randomizing
transmission power levels. Successive interference cancellation can
resolve collisions involving multiple packets and significantly improve the throughput
of an 802.11 network. This technique, however, requires packets to be sent
with different power levels. We first develop a detailed analytical model to determine
the resulting network throughput with the above multiple-packet reception
capability. We then study the problem of determining the optimal probability
distribution associated with these power levels when the network is operated in
infrastructure and ad hoc modes, respectively. In the infrastructure mode, the
problem is formulated as an optimization problem whose solution is broadcast to all the nodes by the access point. In the ad hoc mode, the same problem is
formulated as a mixed strategy game in which each node strategically chooses
the probability distribution of its transmitting power levels so as to maximize its
own throughput. We show that the Nash equilibrium of this game is Pareto optimal
and fair. Furthermore, the resulting throughput of the distributed approach
is close to the optimal performance of the infrastructure mode studied earlier.
Notes: CityU Call Number: TK5103.2 .Z67 2013; viii, 88 leaves : ill. 30 cm.; Thesis (Ph.D.)--City University of Hong Kong, 2013.; Includes bibliographical references (leaves [82]-88)
2013-01-01T00:00:00Z