Rudi Cilibrasi, Ph.D.
Sacramento, CA, USA
email: cilibrar@gmail.com
research URL: http://cilibrar.com/
Education:
PhD, Computer Science, University of Amsterdam, Holland (2007)
BS, with Honors, Computer Science, Division of Engineering and
Applied Science
California Institute of Technology, Pasadena, CA (1996)
Publications:
- R.L. Cilibrasi, P.M.B. Vitanyi, The Google Similarity Distance, IEEE
Trans. Knowledge and Data Engineering, 19:3(2007), 370-383.
- R. Cilibrasi, P.M.B. Vitanyi, A New Quartet Tree Heuristic for Hierarchical Clustering, EU-PASCAL Statistics and Optimization of Clustering Workshop, 5-6 July 2005, London, UK.
- R. Cilibrasi, R. de Wolf, P. Vitanyi. Algorithmic clustering of music based on string
compression, Computer Music J., 28:4(2004), 49-67.
- R. Cilibrasi, P. Vitanyi. Automatic Extraction of Meaning from the Web.
IEEE International Symposium on Information Theory, Seattle, Washington, 2006.
- R. Cilibrasi, Zvi Lotker, Alfredo Navarra, Stephane Perennes, Paul Vitanyi. Lifespan of Peer to Peer Networks: Models and Insight, 10th International Conference On Principles Of Distributed Systems (OPODIS 2006), December 12-15, 2006, Bordeaux Saint-Emilion, France, Lecture Notes in Computer Science, Vol. 4305, Springer Verlag, Berlin, 2006, 290-305
- R. Cilibrasi, P. Vitanyi. Clustering by compression, IEEE Trans. Information
Theory, 51:4(2005), 1523-1545.
http://xxx.lanl.gov/abs/cs.CV/0312044.
- R. Cilibrasi, L. van Iersel, S. Kelk, J. Tromp.
On the complexity of several haplotyping problems,
to appear in proceedings of WABI2005.
http://arxiv.org/pdf/q-bio.GN/0505023.
- R. Cilibrasi, L. van Iersel, S. Kelk, J. Tromp.
On the complexity of the Single Individual SNP Haplotyping Problem,
Feb 2006: Accepted to appear in Algorithmica.
http://arxiv.org/pdf/q-bio.GN/0508012.
- R. Cilibrasi, P. Vitanyi. Similarity of Objects and the Meaning of Words,
Proc. 3rd Conf. Theory and Applications of Models of Computation (TAMC),
15-20 May, 2006. http://arxiv.org/pdf/cs.CV/0602065
- R. Cilibrasi. Domain Independent Hierarchical Clustering,
Nieuwsbrief van de Nederlandse Vereniging voor Theoretische Informatica, 2004, nummer 8.
- J. Tromp, R. Cilibrasi, The Limits of Rush-Hour Complexity, March 23, 2006.
http://arxiv.org/pdf/cs.CC/0502068.
- T. Roos, T. Heikki, R. Cilibrasi, P. Myllymaki.
Compression-based Stemmatology: A Study of the Legend of St. Henry of Finland,
(tech. report number HIIT-2005-3), 2005.
- R. Cilibrasi. What is Java, Really? In Linux Journal, October 1996.
- R. Cilibrasi gave an expert consultation to D. Butler on consumer grid computing for Nature, published in November 2006: Amazon puts network power online
Work Experience:
Research Consultant
Contract programming and scientific consultations
Developed advanced algorithms for natural language processing, data
mining, and website architectures for various clients. (2007-2008)
FHP Wireless
Contract Developer
Worked with internal developers to create a new variant of WEP encryption that
accelerated wireless traffic by moving encryption off the wireless card
firmware and into the Linux kernel, then subtly modifying the encryption
algorithm to allow certain types of precalculation. This enabled
turbo-charged demonstrations ahead of critical-path wireless-card hardware
availability.
(1/02-4/02)
Cranite Systems
Lead Software Developer
Worked in a small team of engineers to create a wireless security
appliance. Our product consisted of a customized Linux kernel and
complementary Windows NDIS intermediate driver to provide for transparent
Layer 2 encryption and tunnelling using AES and TLS authentication. We
also had several user-level daemons running to control access based on
a Microsoft RADIUS authentication server and centrally administered
security policies. I contributed code to many areas of this project, but
most extensively to the AES encryption core functions and the
packet-manipulation primitives in C and C++.
(1/01-10/01)
Weema Technologies Inc.
Chief Technology Officer / Lead Software Architect
Created a Linux-based high-scalability TCP/UDP high-reliability streaming media
server in several tens of thousands of lines of C++. Made kernel patches
against 2.2 and 2.3 Linux kernels to support very high efficiency
single-threaded live media serving via a new socket-send system call. Designed and
implemented internal and external cryptographic security solutions, including
firewalling, tunnelling, and custom protocol design using RC4, MD5, and other
protocols. Developed real world knowledge of network failure modes under very
high loads.
(1/00-1/01)
Narus Inc., as offsite
solution architect. Duties included Unix systems-level redesign, advanced
Perl integration, website strategic planning, and miscellaneous
consultation.
(8/98-9/99)
M.S.Young and Associates,
as legacy-systems migration solution architect. Projects included a
Unix pseudoterminal based Web WAN replacement system, working in
conjunction with DuPont Chemical, AutoImpact, and other strategic
partners under a high-value contract.
Also created artificial intelligence and simulation-modelling prototypes.
Finally, served as ongoing offsite Linux system administrator.
(3/97-7/98)
Idealab, as developer
and development manager for very fast-paced internet development
projects. Projects included Java and CGI-based websites, data
compression, steganography, and adaptive statistical methods.
(6/96-2/97)
Tanner Research, a
Pasadena-based VLSI software and research company. One of my tasks was to
design and implement new advanced techniques for rapid design-rule checking
systems. Another duty was to research and critique various strategies for
rewriting one of their major software products. Evaluated several
container-class framework approaches and solutions, weighing
criteria such as run-time efficiency, memory-efficiency, portability,
and (most importantly) long-term internal benefits, such as
flexibility, safety, and maintainability. (6/95-9/95)
Caltech as Assistant Head TA for CS 1, 2, and 3, a
series of introductory courses designed to teach the rudiments
of C programming (CS 1), data structures (CS 2), and parallel
programming (CS 3). Responsibilities included recruiting,
interviewing, selecting, and hiring a team of 16 TAs. Additionally,
designed and implemented course material and labwork, delivered
biweekly lectures, and provided hands-on help for students during
laboratory periods. (9/95-6/96)
Contract developer / Linux consultant / Private teacher, as time,
opportunity, and interest allow. (3/95-present)
Consulted for Bacchus Inc. Primarily developed graphics
support tools in C++. Functionality included: file-format conversion
from 3D Studio to DEC OFF format; binary space partitioning algorithms
using heuristics to minimize polygonal splits; illumination models for
shading. (9/94-6/95)
Microsoft as a software developer in
the Advanced Technology department, specifically the 4D Graphics
Group. My duties included writing a supporting math library in
C++, and leading a team to write a real-time interactive 3D graphics
demo for Bill Gates. As core programmer, delegated subtasks to other
members of the team as appropriate. Postscript
review available.
(Rating: Outstanding) (6/94-9/94)
Knowledge Adventure Inc., a software company in Pasadena.
Knowledge Adventure specializes in multimedia educational games for
use on the PC platform. My duties were primarily C++ and C
programming tasks. (1/94-6/94)
Green Hills Software in Santa Barbara as a software
engineer / technical support representative. Green Hills makes
programming tools; their main products are multiplatform compilers.
My duties included determining whether reported bugs were, in fact,
bugs in our compiler, or simply a result of customer error. Isolated
and aided in fixing bugs. (6/93-9/93)
Special Skills:
Strong grasp of machine learning, artificial intelligence, and
scientific computing via extensive research using data compression, Support
Vector Machines, neural networks, simulated annealing, optimization, Minimum
Description Length theory, and other techniques. Applied to diverse
areas such as computational linguistics, music analysis, radio astronomy,
bioinformatics, and automated ontological reasoning using Google.
Programming in a variety of languages, including
Ruby, Perl, C++, Java, C, Python, BASIC, SQL, Lisp, Pascal, Prolog, and some
assembly languages, including 680x0 and 80x86. Experience with UNIX,
MS-DOS/MS-Windows, Amiga, and Macintosh operating systems. Experience
with most major development environments including Sun's JDK,
Microsoft Developer Studio (Visual J++, Visual C++, Visual Basic),
Microsoft Access/SQL, most open-source tools, etc.
Understand UNIX as user, developer, and
system administrator. Experience with shell programming in sh, (t)csh, ksh, and
zsh. Fluent with a large variety of tools for software development:
autoconf/automake, bison/yacc, (f)lex, svn/cvs/rcs, make, python, and m4,
to name a few. Clear knowledge of networking from the link layer on up;
have used TCP and UDP protocols extensively for fun and profit. Experimented
with raw packet analysis and synthesis as a hobby. Built a security tool to
monitor a subnet for all active telnet sessions, demultiplex them, and
maintain in memory a set of virtual vt100 screens which could be displayed
using a curses interface, enabling real-time monitoring of all sessions
or eavesdropping (to demonstrate a good reason to move to ssh).
Deep understanding of computers from the VLSI layout / transistor level
up to high-level languages; have created the layout for and simulated a CPU and a
floating point processor using the Magic toolset. Have designed and built on
wirewrap hardware an 80186-based computer with dynamic memory refresh, keyboard,
and serial port. Also created an assembly-language operating system from scratch.
Have worked with device drivers and driver-like software on many platforms,
including graphics coprocessor instruction sets on the Amiga computer,
hardware ethernet MAC addresses in the Linux kernel (changes incorporated
into source tree), terminate-and-stay-resident programs under DOS to
intercept keystrokes, etc.
Strong understanding of cryptography - in particular,
public and private key encryption, authentication, one-way hash
functions, electronic cash, key-exchange protocols, and elliptic curve
cryptography. Understand many integer factoring algorithms such as
Pollard Rho and Multiple Polynomial Quadratic Sieve. Familiar
with theory as well as practice of all popular encryption algorithms.
Clear grasp of MD5, RSA, DES, IDEA, RC4, ECC, and others. Sent bug fixes
to Bruce Schneier that were incorporated into Applied Cryptography 2nd Edition.
Excellent knowledge of free software with predilection
towards GNU tools in general. Specialized knowledge about Linux
in particular. Made contributions to the Linux kernel (v. 1.3.38
and later). Maintain several open-source projects.
Good understanding of computer graphics.
Have implemented several algorithms ranging from ray tracing to
binary space partitioning to palette quantization, both for
employment and as a hobby. Have also done anti-aliasing transforms,
fractals, and vector-based image morphing. Have also done computer
music at Microsoft and as a hobby, using waveform synthesis. Have
worked with voice synthesis and recognition since 1992.
Data Compression - implemented several
"classical" data compression algorithms, and wrote some variants of
my own. Also took EE150, Data Compression, by Dr. R. J. McEliece, and
won a cash prize by creating a compression algorithm slightly better (but
much slower) than gzip on an unknown sample file set. Recreated
Claude Shannon's famous experiment to determine the entropy of the English
language using the internet, and published these results, which were later
used by a professor at the University of Nevada in a class project.
Automated theorem proving - spent several months researching
methods of theorem proving in first-order predicate logic. Experience
with many different models for automated deduction, including diverse
tasks ranging from game-tree searches with pruning to neural nets.
Have also applied genetic algorithms to cryptanalysis of simple substitution
ciphers with some interesting results regarding the relationship between
evolution speed and artificial punctuated population barriers.
Awards:
Appeared in Marquis Who's Who in Science and Engineering, 2006-2007 as
a notable figure in science and the creator of the CompLearn open source
data mining software.
Won fifth, third, second, and finally first place in
successive years in the Association for Computing Machinery programming
contest at the regional (Southern California and Nevada) level, as part of
a three-person team. (freshman-senior year) Competed in the worldwide
ACM programming contest representing Caltech in 1996, finally finishing
within the top twenty teams in the world that year.
Won a $10,000 Microsoft Scholarship in sophomore
year. This scholarship was awarded to a single student from
each participating school, for merit in computer and computer-related
fields.
Won first place in Caltech EE150.2 Data Compression
program contest. First prize was $500 cash, paid to the team or individual
who wrote the best compression program. The competitors were primarily
seniors and graduate-level students. (freshman year) Taught by Dr. McEliece.
Won the Rensselaer Medal, an award given to one
junior at each high school for outstanding achievements in science
and math.