Scaling up is our greatest challenge, says Eiffage CIO Jean-Philippe Faure


L'Usine Digitale: You announced a partnership with Google Cloud. Can you walk us through the reasons that led you to choose this provider over another?


Jean-Philippe Faure: The choice stems from all the work done by the IT department. We ran a lot of POCs throughout 2023. We tested many solutions, and in the end we had to make a decision and be able to commit. We chose Google for two main reasons. On the one hand, it is an open world, unlike Microsoft's, which is notably closed; on the other, it is a world built on security. Technologically there are many solutions and the big market players all have very strong technologies, but openness and security are the two elements that made the difference for us.



Did you nonetheless consider Microsoft as a possibility?

We considered every possibility on the market. We are never “pro” anything; we really weigh the pros and cons. Obviously, this was not the decision of a single person but of a team that worked on it. More than a year of work went into reaching this result: we had to know whether we were capable of interfacing with each technological “world”, because each world has its own complexity. In the end, we realized that we had a lot in common with the Google world, and it came together quite naturally.

We met all the players and talked with them. I met many CIOs who went down this road before me and made these choices. We were also looking for a simple solution: there is a kind of compactness inherent to the Eiffage group, and we wanted it reflected in our solution as well. Today we combine Google Cloud and Dataiku.


Where are you with Google Cloud today? What solutions do you use?


We are in production and have completed the build phase of our “Data Cloud Foundation”. We have interconnected our systems and can now use the entire Google technology platform. We use everything in GCP, meaning all the different modules that exist, including the Gemini models, Vertex AI, BigQuery and Apigee. We are building our data lake. Bear in mind that historically we have had a single-ERP policy, which means we have one ERP for finance, one for HR, one for equipment, and so on.

We inject data from all of these repositories into our data lake. We held a launch meeting with operational staff and our business lines, since our approach is to do this for them. We serve the business lines, and we believe that if we offer the best technology and the necessary training, we will find very interesting use cases for them.
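To illustrate what landing an ERP extract in such a data lake can look like on GCP, here is a minimal sketch of a BigQuery load job. The project, bucket path, dataset and table names are illustrative assumptions, not details of Eiffage's actual pipeline.

```python
# Minimal sketch: load a (hypothetical) finance-ERP export from Cloud Storage
# into a BigQuery dataset acting as the data lake's landing zone.
# All project, bucket and table names are illustrative, not Eiffage's.
from google.cloud import bigquery

client = bigquery.Client(project="my-data-foundation")  # hypothetical project ID

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,   # header row in the ERP export
    autodetect=True,       # infer the schema from the file
    write_disposition="WRITE_APPEND",
)

load_job = client.load_table_from_uri(
    "gs://erp-exports/finance/2024-05-01.csv",      # hypothetical export path
    "my-data-foundation.datalake.finance_erp_raw",  # hypothetical landing table
    job_config=job_config,
)
load_job.result()  # wait for the load to finish
print(f"Loaded {load_job.output_rows} rows")
```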

Our number one challenge is scaling up, because very often we hold an “AI day” or a “Data day” that produces 600 to 700 project ideas, and it is then very difficult, in governance terms, to arbitrate between them and prioritize them. Here, we have collectively decided to go with 15 priority projects that we will implement, because they are largely common to all our entities.


You have developed an in-house AI based on Gemini. Can you go into more detail on this?

We are at the very beginning. The goal is to have a private generative AI solution on which we can put all our data and keep it secure. We will be able to inject all our technical briefs – documents resulting from the calls for tenders we responded to – into this AI. We receive tenders every day; sorting them properly takes time, and we believe AI could help us.

We have also injected data from our HR ERP, called People. We are in the testing phase. Ultimately, anyone with a question about this ERP will be able to ask it in natural language and get an answer through this tool. The answer is grounded in a specific document, a specific decision that was made, and so on. There is traceability of every decision, and the response concerning the employee will come from a structured Eiffage source.
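As a rough illustration of that traceability requirement, here is a minimal sketch of grounding a Gemini answer in retrieved internal documents so each statement can be traced back to a source file. The retrieval step is stubbed out, and the document names, project and model version are assumptions, not Eiffage's implementation.

```python
# Minimal sketch: answer an HR question from retrieved internal documents and
# ask the model to cite its sources. Retrieval is a placeholder; names are
# illustrative, not Eiffage's actual system.
import vertexai
from vertexai.generative_models import GenerativeModel

vertexai.init(project="my-data-foundation", location="europe-west1")  # hypothetical

def retrieve_passages(question: str) -> list[dict]:
    """Placeholder for a search over indexed HR documents (e.g. a vector store)."""
    return [
        {"source": "people_erp/leave_policy_2024.pdf", "text": "..."},
    ]

def answer_hr_question(question: str) -> str:
    passages = retrieve_passages(question)
    context = "\n\n".join(f"[{p['source']}]\n{p['text']}" for p in passages)
    prompt = (
        "Answer using only the excerpts below and cite the source file "
        "for each statement.\n\n"
        f"Excerpts:\n{context}\n\nQuestion: {question}"
    )
    model = GenerativeModel("gemini-1.5-pro")
    return model.generate_content(prompt).text

print(answer_hr_question("How many days of notice are required for parental leave?"))
```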


So this is a fine-tuned and secure version of Gemini?

Yes, that's it. What we are looking to do is move forward vertically. We will start with HR, then finance, and then move on to the business areas, which also have a lot of needs and a lot of technical documentation. For now, we are working on functional streams that we control relatively well. Once the POC phase is over and we are getting answers without hallucinations, we can move on to other areas.

The goal is to tackle them one after the other. We want to take our time; we are not here to make a revolution but to deliver a significant development that brings comfort and added value to the user and to the business line. The objective is really to create value.


To come back to Gemini, which is a multimodal model: how do you use it?

We plan to bring all types of data to the model we have developed: structured data, unstructured data, video, images. We are starting with documents that are structured and simple. We are learning as we go and training our technical teams on these subjects; Google helps us too, but we are trying to move very quickly towards autonomy. We have training videos and want to feed them in as well, because today people learn far more from video than from reading documentation, so it will obviously be multimodal.
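For a sense of what a multimodal request looks like in practice, here is a minimal sketch of asking Gemini to summarize a training video stored in Cloud Storage. The project, bucket path and model version are illustrative assumptions.

```python
# Minimal sketch: a multimodal Gemini request over a (hypothetical) training
# video stored in Cloud Storage. Names are illustrative, not Eiffage's.
import vertexai
from vertexai.generative_models import GenerativeModel, Part

vertexai.init(project="my-data-foundation", location="europe-west1")  # hypothetical

model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content([
    Part.from_uri("gs://training-videos/safety_induction.mp4", mime_type="video/mp4"),
    "Summarize the key safety instructions covered in this video.",
])
print(response.text)
```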

To give you an estimate of what this data represents, we have around 10 TB of data for our historical ERPs, namely finance and HR. This is why we build a lot of models around these two areas, given the mass and quality of data that we have.


Setting up a data lake necessarily involves focusing on security. How did you do it?

We have put very strong governance in place so that, within our data lake, data is only accessible after validation. We created a governance group with the IT department and the business lines, and within the data lake everyone has access to specific data according to their scope. We also make everyone sign a charter, because ethics are fundamental.

There is no question of Eiffage one day being caught up in stories of misused data. All users and consumers of data must sign this ethics charter and comply with it. They therefore only have access to the limited subset of data validated by their branch and by the IT department.
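As an illustration of scope-limited access of this kind, here is a minimal sketch of granting one business line's analyst group read access to only its validated BigQuery dataset, once the branch and the IT department have approved the request. The group, project and dataset names are assumptions for the example.

```python
# Minimal sketch: grant a (hypothetical) HR analyst group read-only access to
# the HR dataset only, reflecting scope-based access after validation.
from google.cloud import bigquery

client = bigquery.Client(project="my-data-foundation")  # hypothetical project

dataset = client.get_dataset("my-data-foundation.hr_validated")  # hypothetical dataset
entries = list(dataset.access_entries)
entries.append(
    bigquery.AccessEntry(
        role="READER",
        entity_type="groupByEmail",
        entity_id="hr-analysts@example.com",  # hypothetical analyst group
    )
)
dataset.access_entries = entries
client.update_dataset(dataset, ["access_entries"])  # apply the new grant
```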


You mention ethics. With the arrival of ChatGPT a little over a year ago now, did you have to deal with “ShadowGPT” at the beginning?

Anyone who tells you otherwise would see their nose grow. For us it has been very interesting, because we never banned it. When you ban something, everyone obviously wants to use it and does so anyway. What we explained is that people have the right to use AI, but they must not put in any information relating to Eiffage.


Finally, have you run into any difficulties getting people acculturated to data and AI?

No, quite the opposite: people are chasing after us for it. Today everyone has tried ChatGPT, and there is a lot of anticipation about being able to build tools based on generative AI. This is why we are now going to open up the capabilities of secure, private generative AI gradually, stream by stream.

There is of course a counterpart: the costs. As soon as we run these models, we request more computing power and pay for it. We have therefore put a principle in place: the branches are responsible for their usage costs. The IT department pays for the platform and security, takes care of training, and sets up the support and tools, but consumption – the use made of the data – is the responsibility of the branches. However, now that we are working on private generative AI, which does not fit this model but rather a shared one, we need to work out how to manage these costs.
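One common way to support this kind of chargeback on GCP is to label workloads by branch so that their cost appears under that label in billing exports; the sketch below shows the idea on a BigQuery query job. The label key and value, project and table are illustrative assumptions, not Eiffage's actual setup.

```python
# Minimal sketch: tag a BigQuery query job with the branch that runs it so the
# cost can be attributed to that branch in billing exports. Names are illustrative.
from google.cloud import bigquery

client = bigquery.Client(project="my-data-foundation")  # hypothetical project

job_config = bigquery.QueryJobConfig(labels={"branch": "infrastructure"})
query = "SELECT COUNT(*) AS row_count FROM `my-data-foundation.datalake.finance_erp_raw`"
result = client.query(query, job_config=job_config).result()
for row in result:
    print(row.row_count)
```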


You mentioned use cases with the Finance and HR ERPs; could you describe a concrete case?

There is a case in production that covers everything we wanted to do: it is called Metronia. It is a solution for monitoring concrete structures. When you build a civil engineering structure, you may need to bore a tunnel. That generates vibrations, so we place sensors all over the site – up to 1,000 at times – and receive data from them – around 500,000 messages per day – which tells us whether any structure around the site is being weakened. That is more or less what happened on the A13, for example. The goal is to measure the impact of this type of work and avoid ending up in a situation nobody wants. We developed this and put it into production on one of the Toulouse metro lines, over several kilometres.

We have put a hybrid architecture in place, with local data collection from the sensors, then storage and transformation in the cloud. For this project, a data scientist was hired who uses the solution we developed on a daily basis. We are there to assist, of course, but the data scientist really owns this project, and it allows us to keep the works safe.

Here we are halfway between data and AI: there is the data collection, and the AI translates the vibrations it perceives into a level of risk. Thanks to this data, the pilot of the tunnel boring machine – the machine that digs the tunnel – can decide whether to keep going, slow the drilling speed, and so on. These tools make it possible to protect the structure and everything surrounding it.
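To picture the "local collection, cloud transformation" pattern described here, the sketch below shows an on-site collector publishing one vibration reading to a Pub/Sub topic, from which cloud pipelines could store and score it. The topic, field names and site label are illustrative assumptions, not Metronia's actual schema.

```python
# Minimal sketch: an on-site collector publishing a (hypothetical) vibration
# reading to Pub/Sub for cloud-side storage and risk scoring.
import json
from google.cloud import pubsub_v1

publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path("my-data-foundation", "vibration-readings")  # hypothetical

reading = {
    "sensor_id": "S-0421",
    "timestamp": "2024-05-01T10:15:30Z",
    "peak_particle_velocity_mm_s": 2.7,  # vibration level near a structure
}
future = publisher.publish(
    topic_path,
    data=json.dumps(reading).encode("utf-8"),
    site="toulouse-metro",  # message attribute for routing, illustrative
)
print(f"Published message {future.result()}")
```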
