The Yes We Trust community is a data privacy hub to stay updated on industry news, gain expert insights, and connect with other privacy-minded professionals. Community member interviews are a series in which we gain valuable insights from our members. Authors contribute to these articles in their personal capacity. The views expressed are their own and do not necessarily represent the views of Yes We Trust. Got something to share? Get in touch at community@yes-we-trust.com.
Clara Ripault is a French lawyer based in Paris, specializing in AI and data protection.
She started her career in London as an AI Legal Engineer at Cognitiv+, an AI start-up specializing in contract analysis and management. In this role, she worked with data scientists and developers to design AI algorithms capable of processing legal documents such as contracts and policies. For example, they established guidelines that the algorithms and code had to follow to ensure the AI system produced accurate results. She also recruited a team of annotators, trained them, and reviewed the accuracy of their work.
She then worked at a law firm in Paris as an AI and data protection lawyer, advising a wide range of clients on the implementation of AI systems. After that, she joined an AI start-up specializing in voice analysis using NLP (natural language processing), where she took on the responsibility of building the legal department. There, she learned how important it is to come up with pragmatic, business-oriented legal solutions.
Most recently, Clara founded Acmai, a law firm specializing in AI and data protection. Acmai takes a pragmatic, operational approach to its clients' issues, aiming to deliver legal solutions that are practical and beneficial to the business.
We interviewed Clara to learn from her experience in the data privacy industry.
For me, privacy is essential. It plays a crucial role in maintaining a fair and healthy society. I truly believe that democracy needs privacy to exist. Privacy is needed to fight against mass surveillance. If we’re under constant surveillance, it will restrain our curiosity, our knowledge acquisition, our media consumption, and ultimately our freedom.
Businesses have to handle new kinds of cyberattacks that only apply to AI models, such as data poisoning. As a result, they have to assess whether the AI they are using is secure enough. They also have to design new security rules and processes.
The security of AI models is not only a matter for the Security or Legal team; it requires involvement from the whole company. Developers and data scientists also have to take those risks into account, because their decisions can affect how likely the risks are to materialize. For example, an AI model is more vulnerable to data poisoning if user inputs are automatically re-used to train and improve the model without any filtering to check them.
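To make that point concrete, here is a minimal Python sketch of the kind of filtering step described above: user-submitted examples are screened by cheap automated checks, and anything that fails is routed to human review instead of being fed straight back into training. The function names, labels, and thresholds are invented for illustration, not a prescribed implementation.

```python
from dataclasses import dataclass

@dataclass
class CandidateExample:
    text: str
    label: str
    source: str  # e.g. "user_feedback", "internal"

# Illustrative allow-list of labels the model is expected to handle (assumption).
KNOWN_LABELS = {"contract", "policy", "other"}

def passes_basic_filters(example: CandidateExample) -> bool:
    """Cheap automated checks before an example may enter the training set."""
    if example.label not in KNOWN_LABELS:        # reject unexpected labels
        return False
    if not (10 <= len(example.text) <= 10_000):  # reject degenerate or oversized inputs
        return False
    return True

def build_retraining_batch(candidates: list[CandidateExample]) -> list[CandidateExample]:
    """Keep only filtered examples; everything else goes to human review
    instead of being silently re-used to train the model."""
    accepted, for_review = [], []
    for ex in candidates:
        (accepted if passes_basic_filters(ex) else for_review).append(ex)
    # In a real pipeline, `for_review` would be queued for manual annotation and checks.
    return accepted
```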
Another common local challenge is the reuse of personal data. The challenges here include obtaining or refusing authorization for the reuse (depending on which side we are on and the context of the processing) and ensuring the security and confidentiality of the reused personal data.
A common use case is an AI company that develops its own models and provides them to its clients, who then use the models with their own data. To improve its models, the company wants to use its clients’ data. To do so, it needs to obtain its clients’ authorization, inform the data subjects, and make sure it acts in a compliant way. Challenges around the reuse of personal data are even greater where businesses use GenAI: because of the way GenAI works, there is a risk that reused personal data will be disclosed.
GDPR has a strong impact on the development and deployment of AI systems using personal data. There are many examples, but here are two of them:
Minimization: According to this principle, data shall be adequate, relevant, and limited. As a result, it is not possible to start an AI project by saying “we’re going to build a training data set with every data category we’re able to collect, and then we’ll see which categories are relevant.” At the collection stage, the company must apply minimization by carefully choosing which types of data will be used and in what quantity. Minimization applies from the beginning of the AI project and must be applied to the creation of the training data set. To do so, businesses can, for instance, use synthetic data sets to test the model and determine which data types are relevant for training it (see the first sketch after these two examples). The company will then be able to document and justify that certain data categories are necessary for training the model. Tests and documentation are strong tools for complying with this principle. They must also be used once the training phase and then the production phase have been launched, to monitor ongoing compliance with minimization. Minimization should not be seen only as a hurdle: it can also be used to justify working with a larger amount of data. For instance, during the training phase, minimization can be invoked to argue for increasing the quantity of data used when there is not enough data in the training set to obtain accurate results.
Exercise of rights: The exercise of data subjects’ rights can also impact the development and deployment of AI systems processing personal data. For instance, developers have to design and build the system in a certain way so that rights such as the right of access can be exercised effectively. If they do not anticipate this, it will be difficult to extract personal data from the training database in order to comply with, for example, the right of access (see the second sketch below).
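As a first illustration of the minimization point above, here is a small, hypothetical Python sketch using scikit-learn and purely synthetic data: each candidate data category is scored with and without the others, so the decision to keep or drop a category can be documented and justified. The category names, model choice, and numbers are assumptions made for the example only.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500

# Purely synthetic stand-ins for candidate data categories (no real personal data).
features = {
    "age":            rng.normal(40, 12, n),
    "contract_value": rng.normal(10_000, 3_000, n),
    "postal_code":    rng.integers(1_000, 99_999, n).astype(float),
}
# Synthetic target, loosely driven by two of the three candidate categories.
y = (0.03 * features["age"] + 0.0002 * features["contract_value"]
     + rng.normal(0, 1, n) > 3.2).astype(int)

def score(columns):
    """Cross-validated score of a simple model trained on a subset of categories."""
    X = np.column_stack([features[c] for c in columns])
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    return cross_val_score(model, X, y, cv=5).mean()

baseline = score(list(features))
report = {col: {"score_without": round(score([c for c in features if c != col]), 3),
                "baseline": round(baseline, 3)}
          for col in features}

# Categories whose removal barely changes the score are candidates for exclusion,
# and the report itself becomes part of the minimization documentation.
print(report)
```

As a second illustration, for the point on the exercise of rights, here is a toy sketch of a training data store that keeps an index by data subject, so that an access or erasure request can be answered without having to redesign the system after the fact. The class and field names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class TrainingRecord:
    record_id: str
    subject_id: Optional[str]  # None for records containing no identifiable person
    payload: dict

@dataclass
class TrainingStore:
    """Toy store keeping an index by data subject, so access and erasure
    requests do not require scanning the whole training corpus."""
    records: dict = field(default_factory=dict)
    by_subject: dict = field(default_factory=lambda: defaultdict(set))

    def add(self, rec: TrainingRecord) -> None:
        self.records[rec.record_id] = rec
        if rec.subject_id:
            self.by_subject[rec.subject_id].add(rec.record_id)

    def export_for_subject(self, subject_id: str) -> list:
        """Right of access: return all personal data held about one person."""
        return [self.records[rid].payload for rid in self.by_subject.get(subject_id, set())]

    def erase_subject(self, subject_id: str) -> int:
        """Right to erasure: remove a person's records from the training store."""
        ids = self.by_subject.pop(subject_id, set())
        for rid in ids:
            self.records.pop(rid, None)
        return len(ids)
```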
I think that one of the biggest current challenges in terms of personal data is to reconcile the protection of personal data with innovation and economic interests.
Currently, the most prominent innovations are mainly in generative AI. These AIs require a significant amount of data, including personal data, to be trained, and the data present in the training sets is not always collected lawfully. It often happens that it is collected in violation of personal data regulations, for example without the individuals concerned being informed.
Despite these violations, there have so far been few strong enforcement responses fining the companies developing these tools. It seems to me that one of the reasons for this is the desire not to hinder innovation, especially as it carries significant economic and geopolitical stakes. That argument makes sense. However, it is essential not to lose sight of the fact that the protection of personal data is crucial to a free and democratic society.
For now, it’s too early to know how this will play out. There are several pending legal claims around GenAI, and we’re waiting for the answers.
Yes, I know the community through its webinars and because I used Didomi in one of my previous jobs.
I think one of the significant challenges will be finding a balance between companies’ need to collect data to train their algorithms and individuals’ rights to refuse and oppose it.
As companies need personal data to train their AIs, there is a significant trend among data-collecting companies to anticipate the reuse of data for training or improving their algorithms. At the same time, individuals are becoming more aware of this issue and are less inclined to accept such reuse. They increasingly understand that, in the context of GenAI, this reuse could, for example, lead to their personal data being disclosed to third parties.
Another significant challenge, somewhat related to the previous one, is creating trustworthy AIs. To achieve this, AIs should be developed ethically, considering principles such as fairness, non-discrimination, and transparency. This ensures that the AIs developed have a positive impact on society and are accepted and adopted by users. Paradoxically, the GDPR sometimes works against this. For example, Article 9 prohibits the processing of special categories of data, even though it is technically necessary to process this data to avoid developing biased algorithms. Indeed, some processing of special data categories helps to debias algorithms and ensure they are not discriminatory. This contradiction is about to be addressed by Article 10 of the AI Act, which will authorize, under certain conditions, the processing of this data for debiasing purposes.
When a company engages a processor or uses third-party software relying on AI, special attention should be paid to the clauses regarding data reuse by the processor. The contract may stipulate that the processor has the right to use the company's data to train its own algorithms. In that case, the data is, in a sense, transferred to the processor so that it can be used to improve its algorithms. This is not trivial and requires specific measures to comply with personal data protection regulations. For example, it is necessary to inform the individuals concerned and to conduct a compatibility test before allowing the processor to reuse the data.
Lawyers can assist companies in implementing and using artificial intelligence by alerting them to risks, proposing operational solutions to limit those risks, or training employees. For instance, when a company asks me whether it can allow its employees to use a GenAI tool, I always advise it to:
Prefer paid versions, which are often more GDPR-compliant.
Implement an internal policy for the use of AI.
Establish concrete measures that help and encourage employees to adhere to the policy, such as a pop-up window summarizing the main rules of the policy that opens each time an employee uses the tool (a minimal sketch follows this list).
Provide training and raise awareness among employees.
Avoid prohibiting the use of such tools because otherwise, employees may use them clandestinely, posing an even greater risk to the company.
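As a purely illustrative sketch of the "reminder on each use" measure mentioned above, the snippet below (Python, with an invented policy text and a placeholder model call) requires an explicit acknowledgement before any request is sent to the GenAI tool. It is one possible way to implement such a measure, not a recommended product.

```python
POLICY_SUMMARY = """Before using the GenAI assistant, remember:
 1. Never paste personal data or client-confidential information.
 2. Always review and verify the output before reusing it.
 3. Follow the internal AI policy for any significant use."""

def confirm_policy_acknowledged() -> bool:
    """Show the policy summary and require explicit acknowledgement on each use."""
    print(POLICY_SUMMARY)
    answer = input("Type 'yes' to confirm you will follow these rules: ")
    return answer.strip().lower() == "yes"

def ask_genai(prompt: str) -> str:
    """Gate every request to the tool behind the acknowledgement step."""
    if not confirm_policy_acknowledged():
        raise PermissionError("AI policy not acknowledged; request blocked.")
    # Placeholder for the actual call to the company's approved GenAI tool.
    return f"[model response to: {prompt!r}]"
```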