The evolution of machine learning (ML) offers ever broader possibilities for its use. However, wide adoption also enlarges the attack surface on ML's security and privacy. ML models often rely on private and sometimes sensitive data, for example specific information about people (names, photos, addresses, preferences, etc.). In addition, the architecture of the network itself can be stolen. In response to these risks, several methods for anonymizing data and securing the different stages of the machine learning process have been, and are still being, developed. In practice, however, these solutions are only rarely applied.
In a professional context, the different steps (training/inference) and the data required to operate the model can be held by various stakeholders, such as customers and companies. They can also take place or be stored in different locations (the model provider's server, the data owner's premises, the cloud, etc.). The risk of attack is present at any of these entities. One promising approach to obtaining trustworthy, privacy-preserving ML is confidential computing. Given the importance of, and the challenges related to, the security and confidentiality of machine learning models, a research team from England proposed a systematization of knowledge (SoK) paper. In it, the authors introduce the problem and outline future directions for achieving ML with confidential computing at the hardware, system, and framework levels.
The authors state that confidential computing technology provides a level of assurance of privacy and integrity by employing Trusted Execution Environments (TEEs) to run code on data. A TEE is one of the newest methods for isolating and verifying code execution inside protected memory, also known as an enclave or secure world, away from the host's privileged system stack such as the operating system or hypervisor. It rests on three key building blocks: root-of-trust measurement, remote trust establishment and attestation, and trustworthy code execution and compartmentalization. In confidential computing-assisted ML, owners of data/models must confidentially supply their data/models to the TEE of the untrusted host. More precisely, the owners prepare the model and/or data, perform remote attestation to check the integrity of the remote TEE, and then establish secure communication channels with the TEE. The primary feature offered by confidential computing is the hardware-assisted separation of enclaves/TEEs from the untrusted environment.
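To make this workflow concrete, here is a minimal Python sketch of the attestation round trip described above: the enclave measures the code it loaded, answers the owner's challenge with a quote, and the owner checks the quote before provisioning any secrets. All names (ENCLAVE_CODE, enclave_generate_quote, owner_verify_quote) are hypothetical, and the shared HMAC key is only a toy stand-in for the hardware-rooted attestation key that a real TEE (e.g. SGX plus the vendor's attestation service) would use.

```python
import hashlib
import hmac
import os

# Toy stand-in for the hardware root of trust: in a real TEE this key never
# leaves the CPU and quotes are checked through the vendor's attestation service.
HARDWARE_ATTESTATION_KEY = os.urandom(32)

ENCLAVE_CODE = b"def infer(model, x): return model(x)"  # code loaded into the enclave


def enclave_measurement(code: bytes) -> bytes:
    """Measurement of the enclave contents: a hash of the loaded code."""
    return hashlib.sha256(code).digest()


def enclave_generate_quote(code: bytes, nonce: bytes) -> tuple[bytes, bytes]:
    """Enclave side: produce a 'quote' binding the measurement to the verifier's nonce."""
    measurement = enclave_measurement(code)
    quote = hmac.new(HARDWARE_ATTESTATION_KEY, measurement + nonce, hashlib.sha256).digest()
    return measurement, quote


def owner_verify_quote(expected_code: bytes, measurement: bytes, quote: bytes, nonce: bytes) -> bool:
    """Owner side: check the measurement matches the expected code and that the
    quote is authentic and fresh before sending any model or data."""
    if measurement != enclave_measurement(expected_code):
        return False
    expected = hmac.new(HARDWARE_ATTESTATION_KEY, measurement + nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, quote)


# Remote attestation round trip: the owner sends a nonce, the enclave answers with a quote.
nonce = os.urandom(16)
measurement, quote = enclave_generate_quote(ENCLAVE_CODE, nonce)
if owner_verify_quote(ENCLAVE_CODE, measurement, quote, nonce):
    # Only after successful attestation would the owner open a secure channel
    # (e.g. a key exchange terminating inside the enclave) and upload the model/data.
    print("attestation succeeded: provision encrypted model/data to the enclave")
else:
    print("attestation failed: do not send secrets")
```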
Several recommendations are presented in this SoK article. The authors believe that the notion of privacy is still less clearly defined than security or integrity. To obtain a well-founded privacy assurance, one has to establish a theoretically grounded protection goal, for instance via differential privacy. They insist that the upstream portion of the ML pipeline, such as data preparation, must be protected at all costs, because leaving it unprotected has unavoidable detrimental effects; this can be accomplished by incorporating TEE-based verification into data signatures. Protecting the whole ML pipeline may also benefit from combining several TEEs/enclaves. The privacy and integrity weaknesses of the various ML components (layers, feature maps, numerical computations) need to be studied carefully before designing an ML framework that is TEE-aware and partitionable across heterogeneous TEEs. Additionally, the TEE system must be managed so that the most sensitive ML components are protected effectively and with high priority.
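The following short sketch illustrates one possible reading of "TEE-based verification of data signatures": the data owner signs a digest of the prepared dataset, and the enclave recomputes the digest and verifies the signature before training starts. This is an illustrative assumption, not the paper's prescribed mechanism; the record contents and the enclave_check_dataset helper are hypothetical, and Ed25519 from the cryptography package is used only as an example signature scheme.

```python
import hashlib

from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# --- Data-preparation side (outside the enclave) ---
owner_key = Ed25519PrivateKey.generate()          # hypothetical data owner's signing key
raw_records = [b"record-1", b"record-2", b"record-3"]
dataset_digest = hashlib.sha256(b"".join(raw_records)).digest()
signature = owner_key.sign(dataset_digest)        # shipped alongside the dataset


# --- Enclave side (inside the TEE, before training starts) ---
def enclave_check_dataset(records, digest_signature, owner_public_key) -> bool:
    """Recompute the dataset digest inside the enclave and verify the owner's
    signature, so tampering during data preparation or transfer is detected."""
    digest = hashlib.sha256(b"".join(records)).digest()
    try:
        owner_public_key.verify(digest_signature, digest)
        return True
    except InvalidSignature:
        return False


assert enclave_check_dataset(raw_records, signature, owner_key.public_key())
```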
This paper points to an exciting and challenging new era of protecting ML against privacy leaks and integrity breaches using confidential computing techniques. Although running the training and inference processes inside TEEs has been the subject of numerous studies, these approaches continue to struggle with the scarcity of trusted resources inside TEEs. As a result, existing protection measures only guarantee the confidentiality and integrity of the training/inference stage of the full ML pipeline, even though ML requires far more trusted resources than that. Confidential computing establishes a more reliable execution environment for ML operations by providing a hardware-based root of trust. The idea that hiding the training/inference process inside such enclaves is the best course of action must be reconsidered. Future researchers and developers must better understand the privacy challenges that span the ML pipeline so that future security measures can concentrate on the essential components.
Mahmoud is a PhD researcher in machine learning. He also holds a bachelor's degree in physical science and a master's degree in telecommunications and networking systems. His current areas of research concern computer vision, stock market prediction, and deep learning. He has produced several scientific articles about person re-identification and the study of the robustness and stability of deep networks.