Harvard Computer Science Technical Reports for 2014
- TR-01-14 [tr-01-04.ps.gz (415 K), tr-01-14.pdf (268 K)]
Stephen Chong, Christian Skalka and Jeffrey A. Vaughan. “Self-Identifying Data for Fair Use”
Public-use earth science datasets are a useful resource with the unfortunate feature that their provenance
is easily disconnected from their content. “Fair-use policies” typically associated with these datasets
require appropriate attribution of providers by users, but sound and complete attribution is difficult if
provenance information is lost. To address this, we introduce a technique to directly associate provenance
information with sensor datasets. Our technique is similar to traditional watermarking but is intended
for application to unstructured time-series datasets. The embedded mark is potentially imperceptible, given
sufficient margins of error in the datasets, and is robust to a number of benign but likely transformations,
including truncation, rounding, bit-flipping, sampling, and reordering. We provide algorithms for both
one-bit and blind mark checking, and show how our system can be adapted to various data representation
types. Our algorithms are probabilistic in nature and are characterized by both combinatorial and empirical
analyses. Mark embedding can be applied at any point in the data lifecycle, allowing adaptation
of our scheme to social or scientific concerns.