The use of AI and data protection in innovation projects at local level: the case of EPIU intelligence unit
Data collection and AI in EPIU
EPIU data has been gathered to feed a data lake (a repository of data) and the EPIU platform. Several types of data are stored in the data lake. On the one hand, socioeconomic data on users comes from a combination of local census data, data from Getafe municipality, surveys, and the Healthy Households Office (OHS). On the other hand, energy data is collected through the OHS, surveys, energy monitoring, energy sensors installed in households (IoT), third parties such as Datadis, and simulations. EPIU has devoted considerable effort to generating data on the built environment at household and building level.
All this data feeds the AI unit, which provides operational support to the OHS, helps identify hidden energy poverty, and generates patterns that characterize energy poverty, which is useful for designing tailor-made solutions.
AI units can be effective, but they need a sufficient volume of data to learn from and to provide useful solutions. In addition, data protection must be ensured, which can constrain the development of such tools.
The limits of an energy intelligence unit: volume of data and data protection
AI tools, especially machine learning models, rely on data to learn patterns, correlations, and intricacies within a given domain. The more diverse and extensive the dataset, the better equipped the AI is to understand and generalize from it. A large volume of data helps in building robust and resilient tools.
However, data is not easy to obtain. In the case of EPIU, access to real data was limited for several reasons. Data collected specifically for the project, through surveys, energy monitoring or OHS beneficiaries, was limited in volume due to budget restrictions. Access to existing datasets held by the municipality or energy companies was restricted for other reasons, such as lack of connectivity between them or data protection.
AI models trained on limited or biased datasets, such as the one developed in EPIU, therefore risk perpetuating and amplifying existing biases. This is a real limitation today, but it should not stop research and innovation in this area. Instead, it should be taken into account so that measures to tackle energy poverty do not depend too heavily on the results of AI tools.
The other limitation when developing an AI tool such as the EPIU intelligence unit is data protection requirements. Data protection is a critical aspect of today’s information management, especially when handling datasets. These compilations of structured or unstructured data fuel the algorithms that power artificial intelligence, machine learning, and other applications. However, with this power comes the responsibility of safeguarding the privacy and integrity of the data they contain. The EPIU project targets people in vulnerable situations and deals with energy data, which makes it very sensitive in terms of data protection. Datasets often contain sensitive information, ranging from personal socioeconomic data to energy consumption information, so robust anonymization and encryption techniques are needed. This makes the process more complex, and when many stakeholders are involved, more time and effort are needed. In the case of EPIU, Universidad Carlos III de Madrid led the intelligence unit at local level, while data was mainly retrieved from Getafe municipality, EMSV (the public housing company), energy companies and vulnerable consumers.
As datasets continue to grow in volume and significance, safeguarding them against privacy breaches, ethical lapses, and security threats becomes a collective responsibility.
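As an illustration of the anonymization techniques mentioned above, the following is a minimal sketch, not the method used in EPIU. It assumes a hypothetical household record with invented field names (`household_id`, `name`, `address`, `kwh_per_month`, `district`) and illustrative consumption thresholds. It drops direct identifiers, replaces the household identifier with a keyed hash, and generalizes the exact energy reading into a coarse band. Strictly speaking this is pseudonymization rather than full anonymization, since whoever holds the key can re-link the records.

```python
import hashlib
import hmac

# Hypothetical direct identifiers; the real EPIU schema is not described here.
DIRECT_IDENTIFIERS = {"name", "address", "phone"}


def pseudonymize_id(household_id: str, secret_key: bytes) -> str:
    """Replace an identifier with a keyed hash (HMAC-SHA256, hex encoded)."""
    digest = hmac.new(secret_key, household_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()


def consumption_band(kwh_per_month: float) -> str:
    """Generalize an exact reading into a coarse band (illustrative thresholds)."""
    if kwh_per_month < 100:
        return "low"
    if kwh_per_month < 300:
        return "medium"
    return "high"


def protect_record(record: dict, secret_key: bytes) -> dict:
    """Return a copy of the record without direct identifiers or exact readings."""
    removed = DIRECT_IDENTIFIERS | {"household_id", "kwh_per_month"}
    protected = {k: v for k, v in record.items() if k not in removed}
    protected["household_key"] = pseudonymize_id(record["household_id"], secret_key)
    protected["consumption_band"] = consumption_band(record["kwh_per_month"])
    return protected


if __name__ == "__main__":
    # Assumption: the key is stored separately from the data, for example
    # under the control of the officer responsible for data protection.
    key = b"replace-with-a-secret-key"
    raw = {
        "household_id": "H-001",
        "name": "Jane Doe",
        "address": "Example Street 1",
        "kwh_per_month": 250.0,
        "district": "Centro",
    }
    print(protect_record(raw, key))
```

Using a keyed hash means the same household receives the same key across surveys, sensor readings and third-party data, so datasets can still be joined for analysis without exposing who the household is. Encryption of the stored data is a separate measure and is not shown here.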
Lessons learnt from the EPIU intelligence unit and data protection
During the EPIU implementation period, the foundations of the intelligence unit were laid. It will be enriched in the coming years to consolidate the use of AI in identifying energy poverty in the municipality. Some lessons from the experience are:
- The need for a responsible officer within projects such as EPIU who is in charge of personal data and data protection. This helps guarantee effective protection and governance of the data.
- The importance of tracing the origin and quality of data. Within EPIU, 90% of the available data was considered “soft data”, meaning subjective data that is difficult to measure, contextualize or enrich.
- The obligation to ensure data protection when testing the digital environment. New threats such as IT attacks appear every day, so all testing environments should be well protected. This is also very relevant for data-related public innovation.
- The added value of training public professionals on data at local level. Following the premise that “if it is not known, it does not exist”, local authorities could prioritize data-related capacity building to ensure a sound approach to data gathering and use it to improve quality of life in municipalities.
Data protection issues should be incorporated at the design stage as an inherent part of AI tools (Privacy by design/Privacy by default). Data processing procedures are best adhered to when they are already integrated in the technology.
More on EPIU
EPIU’s website: https://hogaressaludables.getafe.es/en/
Or social media channels:
YouTube: https://www.youtube.com/channel/UCaE9esoYPZng6jW3bXk7yng?
Instagram: https://www.instagram.com/epiugetafe/
Twitter: https://twitter.com/epiugetafe