NDSS'20 Fall Review Cycle Paper #136 Reviews and Comments
===========================================================================

Paper #136 DeepAuth: Protecting Integrity of Deep Neural Networks using Wavelet-based Steganography

Review #136A
===========================================================================

Overall Recommendation
----------------------
2. Leaning towards reject

Writing Quality
---------------
4. Well-written

Reviewer Confidence
-------------------
2. Passable confidence

Paper Summary
-------------
Paper #136 proposes a wavelet-based steganography approach to determine whether a deep convolutional neural network image classifier has been modified after training. The authors study the effects of various parameter settings on several image classifiers trained on well-known datasets such as MNIST, CIFAR-10, and ImageNet. The results show little degradation in accuracy from adding the steganographic secret to the model. They also show that the weights are not changed significantly by the proposed approach. What is not clear is why one should not simply sign the model if the goal is to ensure that it has not been altered.

Strengths
---------
MLaaS is becoming more widespread, and ensuring that the correct models are used is an important problem.

The experiments are thorough.

The paper is well written.

Weaknesses
----------
Signing the model and its parameters is a much better way to ensure that the model parameters have not been altered than trying to bind a steganographic secret to them. The authors do not discuss a signature-based protection method at all.

The authors do not adequately motivate why the proposed solution is appropriate in the context of MLaaS. No threat model is defined.

Previous authors have proposed adding watermarks to DNNs to claim ownership of the model. Although the authors propose using steganography to ensure the integrity of the model, this prior work significantly decreases the novelty of the paper.

Detailed Comments for Authors
-----------------------------
Is the proposed steganography-based authenticity method only applicable to images?

On page 1, are the authors suggesting that the attacker directly perturbs the MLaaS model's weights using poisoning? Typically, in poisoning attacks, the attacker injects data into the labeled training examples in order to achieve a desired prediction for similar samples. The authors do not define a threat model.

Using a signature-based authenticity scheme seems to be much more reliable than inserting secrets into the model, and it does not alter the model's accuracy (a minimal sketch of this approach appears after these comments). Since the authors motivate this research using MLaaS as the attack scenario and the pretrained model is presumably run on the backend, a signed model provides much stronger security guarantees than an embedded steganographic secret.

I would suggest the authors devote much more effort to developing the attack scenario. Fig. 3 very lightly touches on it, but the one-paragraph discussion is not convincing. Convincing the reader that the problem is real is more important than showing that the proposed method can work.

What are the y-axis units in Fig. 6? Percent? dB?

Multiple works have proposed using watermarks to claim ownership of models [41,42,52,53,54]. The authors have changed the problem definition, but the techniques are similar if not the same. This significantly decreases the novelty of the paper.

What are the exponents in (b1) and (b2) of the weights circled by the green ovals in Fig. 14?

How does this work compare with "HiDDeN: Hiding Data With Deep Networks", ECCV 2018?
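To make the signature-based alternative concrete, the following is a minimal sketch of the "naive" approach raised in this review and in Review #136B: a detached signature over the serialized model file. It assumes the Python "cryptography" package; the file name and key handling are hypothetical placeholders, not anything taken from the paper under review.

    # Sketch: detached Ed25519 signature over a serialized DNN model file.
    from cryptography.hazmat.primitives.asymmetric import ed25519
    from cryptography.exceptions import InvalidSignature

    MODEL_FILE = "model_weights.bin"  # hypothetical serialized model

    # Model owner: sign the exact bytes shipped to the MLaaS backend.
    private_key = ed25519.Ed25519PrivateKey.generate()
    public_key = private_key.public_key()
    with open(MODEL_FILE, "rb") as f:
        model_bytes = f.read()
    signature = private_key.sign(model_bytes)

    # Verifier (customer or validator): any change to the weights breaks the check.
    try:
        public_key.verify(signature, model_bytes)
        print("model is authentic and unmodified")
    except InvalidSignature:
        print("model has been tampered with")

Because the weights themselves are untouched by this check, it has no effect on model accuracy, which is the point of comparison raised in both reviews.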
Minor Edits
-----------

Review #136B
===========================================================================

Overall Recommendation
----------------------
2. Leaning towards reject

Writing Quality
---------------
3. Adequate

Reviewer Confidence
-------------------
3. Sufficient confidence

Paper Summary
-------------
The authors propose to use steganographic techniques to protect the integrity of deep neural networks used in a Machine Learning as a Service setting. To this end, they change parameters of the DNN layers so that a hash value is embedded, which can be checked during use.

Strengths
---------
+ Steganographic method specifically tailored to DNN models

Weaknesses
----------
- Unclear application domain
- Unclear robustness against attacks that aim at removing the hidden information

Detailed Comments for Authors
-----------------------------
While I value the construction of various steganographic schemes, in this particular case I do not see a compelling use case for the technology. It seems to me that the main purpose is to assure that the DNN has not been tampered with (e.g., by re-training it with adversarial samples). However, the same functionality can be achieved by simply signing the DNN model -- this would also assure that the model is authentic and has not been modified. I do not see the need to embed the authentication information itself directly within the model. Unless the authors can convincingly argue why such an embedding is necessary (and which additional feature it provides in contrast to the "naive" approach mentioned above), I am not inclined to accept the paper.

Minor details:

The paper is not clear about which data is needed to verify the embedded signature. Are parts of the original DNN needed? If yes, which ones?

I am not sure what added value scrambling brings in terms of security over encryption (see Equation 6). Why is the scrambling necessary? What happens if it is omitted?

The experiments do not include metrics on the hardness of removing the embedded data, as is commonly done in the field of steganography.

Review #136C
===========================================================================

Overall Recommendation
----------------------
2. Leaning towards reject

Writing Quality
---------------
3. Adequate

Reviewer Confidence
-------------------
2. Passable confidence

Paper Summary
-------------
This paper tries to protect the integrity of deep neural networks using a technique called wavelet-based steganography.

Strengths
---------
Like other software artifacts, machine learning models face the problem of integrity verification. The problem is important.

Weaknesses
----------
- The example scenario for using the proposed method (Fig. 3) has a flaw. For instance, the MLaaS provider can deploy a manipulated network D' and provide malicious predictions in step 3. However, when the customer tries to verify the network, the MLaaS provider can send the original network D to the validator in step 5. Thus, the network will still be verified by the validator.

- It is not clear how the output-layer information is embedded into the weights of the layers. The output of a network depends on its input. Thus, when computing the hash of the output layer, each input will produce a different hash value. How are these (infinitely many) different hash values embedded?
- The reported accuracy for ResNet-18 on ImageNet is actually much higher (not 'below' as claimed in the paper) than in the original paper (83.71 compared to 72.12). Is this number wrong?

- The writing needs improvement. For example, the described training setting is hard to understand: 'We optimized all models using Stochastic Gradient Descent (SGD), an initial learning rate of 1e-4, 10 epochs, weights learn rate factor of 10, bias learn rate factor of 10 and batch size of 100'.

Detailed Comments for Authors
-----------------------------
See the above weaknesses.