More recent articles: see Google Scholar.

2018-10 Randen - fast backtracking-resistant random generator with AES+Feistel+Reverie (GitHub)
Unpredictable and backtracking-resistant random number generation faster than MT19937.

2016-03 Fast strong hash functions: SipHash/HighwayHash (GitHub)
High-speed hashes with thorough mixing and near-cryptographic strength. Provides SipHash and a tree version (1.5x and 4.2x speedup) plus "HighwayHash" (10x speedup).

2011-10 Efficient Algorithms for Large-Scale Image Analysis
PhD thesis demonstrating the feasibility of analyzing gigapixel images within minutes on a single workstation. Introduces seven new algorithms for various stages of the analysis pipeline that outperform previous techniques by factors of 10-100 while maintaining output quality.

2011-09 Engineering the Ideal Gigapixel Image Viewer [bibtex]
Smooth pan and zoom in gigapixel images via lossless compression, asynchronous I/O and shaders.

2011-08-31 Lossless asymmetric single instruction multiple data codec [bibtex]
Novel SIMD predictor and entropy coder: 50% compression and 3 GB/s (per core) decompression.

2011-08 Engineering a Multi-Core Radix Sort [bibtex]
Expanded version published at EuroPar 2011; 10% speedup vs. 2010-08-17 technical report.

2010-09 Highly optimized weighted-IHS pan sharpening with edge-preserving denoising [bibtex]
Pan sharpening: fast (100 MPixel/s) and high-quality (reduced noise, adaptive weights).

2010-09 Fast, High-Quality Line Antialiasing by Prefiltering with an Optimal Cubic Polynomial [bibtex]
Software line rasterizer with optimal low-pass filter; outperforms mid-range GPUs.

2010-09 Highly Efficient Screening for Point-Like Targets via Concentric Shells [bibtex]
Asymptotically optimal pipelined divide and conquer algorithm for finding point-like objects.

2010-08-17 Faster Radix Sort via Virtual Memory and Write-Combining [bibtex]
Sort throughput > 88% of memory bandwidth (1.24x speedup vs. a Fermi GPU).

2009-03-27 An Efficient Parallel Algorithm for Graph-Based Image Segmentation [bibtex]
Fast but high-quality image segmentation, made possible by a new parallel algorithm that doesn't just chop images into tiles.

Additional Information: Paper, Presentation, Poster

2008-02 Determination of Maximally Stable Extremal Regions in Large Images [bibtex]
Efficient algorithm for extracting MSERs (e.g. for image segmentation).

2007-06-10 Timing Pitfalls and Solutions [171 KB]
Describes PC timing hardware, its pitfalls for reliable, high-resolution timing, and a solution.

2007-03-29 Automatische Gebäudemodellierung aus Laserscanning-Daten [DE, 2443 KB]
Diploma thesis: an algorithm for automatic building reconstruction from point clouds.

Additional Information: Presentation and Video

2006-03-26 Speeding up Memory Copy [135 KB]
A drop-in replacement for VC7.1's memcpy that is 3.5 times as fast on an Athlon XP.

2006-04-07 Optimizing File Accesses via Ordering and Caching [145 KB]
Study thesis: how to speed up file loading by a factor of 10.

2002-11-10 Introduction to Program Optimization [12 KB]
A quick rundown on optimizing for size and speed.