6.3 PROTECTING YOURSELF AND YOUR DATA

BIG DATA

Big data is a combination of structured, semi-structured and unstructured data collected by organisations, which can be used to extract relevant information for machine learning projects and other advanced applications. Examples of structured data are transactions and financial records; semi-structured data includes web server logs and streaming data from sensors; unstructured data includes texts, documents and multimedia files. A data warehouse is a type of data management system used to perform queries on big data. Data mining tools are a set of techniques that use special algorithms, statistical analysis, artificial intelligence and database systems to analyse data from different dimensions and perspectives.

Features of big data
According to an early definition from 2001, big data is characterised by 3 Vs:
- the large volume of data in many environments;
- the wide variety of data types frequently stored in big data systems;
- the velocity at which much of the data is generated, collected and processed.
More recently, other Vs have been added:
- veracity, which refers to the level of accuracy of the data;
- value, because data can have real business value;
- variability, as data can have different meanings and be formatted in different ways.
Although the term big data does not correspond to any specific volume of data, it usually involves terabytes or even exabytes of data created and collected over time.

Glossary: to cleanse = pulire; raw = grezzo; repository = archivio

Big data storage
Big data is often stored in a data lake, i.e. a large repository which can hold large amounts of unprocessed data of different kinds in its original form. Many big data environments combine multiple systems in a distributed architecture: for example, a central data lake might be integrated with other platforms, including relational databases or a data warehouse. The data can be left in its raw form and then filtered and organised as needed for particular uses, or pre-processed using data mining tools and data preparation software.

Big data processing
Big data processing requires a large amount of computing power, often provided by hundreds or even thousands of server computers working together and using special technologies. Since setting up such systems is both costly and challenging, clouds are popular locations for big data systems.

Big data analytics
Big data analytics is the name given to the process of gathering and analysing large volumes of data in order to extract information. To get relevant results from big data analytics applications, data scientists and data analysts must have a detailed understanding of the available data and a precise idea of what they are looking for. For this reason, data preparation is necessary: it includes the profiling, cleansing, validation and transformation of data sets. Different tools, such as data mining and statistical analysis, can then be used to analyse the data.
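
As an illustration of the three kinds of data described above, the short Python sketch below shows how the same online purchase might appear as a structured record, as a semi-structured web server log entry, and as unstructured text. The records, field names and values are invented for the example.

```python
import json

# Structured data: fixed fields, like a row in a financial transactions table.
structured_record = {
    "transaction_id": 1001,
    "customer_id": 42,
    "amount_eur": 59.90,
    "date": "2024-03-15",
}

# Semi-structured data: a JSON web server log entry; fields may vary between entries.
semi_structured_log = json.loads(
    '{"timestamp": "2024-03-15T10:21:07Z", "path": "/checkout", '
    '"status": 200, "user_agent": "Mozilla/5.0"}'
)

# Unstructured data: free text with no predefined schema.
unstructured_text = "Great shop, the parcel arrived in two days and the shoes fit perfectly."

print(structured_record["amount_eur"])    # easy to query: the schema is known
print(semi_structured_log.get("status"))  # queryable, but the fields are flexible
print(len(unstructured_text.split()))     # free text needs extra processing to analyse
```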
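
The processing section above says that big data work is spread across many server computers working together. The sketch below imitates that divide-and-combine idea in miniature on a single machine, splitting a word count across worker processes with Python's multiprocessing module; in a real big data system the chunks would live on different servers, and the sample lines here are invented.

```python
from collections import Counter
from multiprocessing import Pool

def count_words(chunk: list[str]) -> Counter:
    """Map step: count the words in one chunk of text lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts

def split_into_chunks(lines: list[str], n: int) -> list[list[str]]:
    """Divide the data set into n roughly equal parts."""
    return [lines[i::n] for i in range(n)]

if __name__ == "__main__":
    # A tiny stand-in for a huge collection of documents or log lines.
    lines = [
        "big data is a combination of structured and unstructured data",
        "data lakes store raw data in its original form",
        "big data processing needs a large amount of computing power",
    ] * 1000

    chunks = split_into_chunks(lines, n=4)

    # Each worker process counts its own chunk in parallel (the "divide" step).
    with Pool(processes=4) as pool:
        partial_counts = pool.map(count_words, chunks)

    # Combine the partial results into one total (the "combine" step).
    total = Counter()
    for partial in partial_counts:
        total.update(partial)

    print(total.most_common(5))
```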
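
Data preparation is described above as the profiling, cleansing, validation and transformation of data sets. The Python sketch below walks a small, made-up list of sales records through those four steps; the field names and the validation rules are assumptions chosen only for the example.

```python
from statistics import mean

# A small, made-up raw data set: some records are incomplete or badly formatted.
raw_records = [
    {"customer": "Anna",  "amount": "120.50", "country": "IT"},
    {"customer": "Luca",  "amount": "",       "country": "IT"},  # missing amount
    {"customer": "Marie", "amount": "89,90",  "country": "fr"},  # comma decimal, lowercase
    {"customer": "John",  "amount": "-15.00", "country": "UK"},  # negative amount
]

# 1. Profiling: inspect the data to understand its quality.
missing = sum(1 for r in raw_records if not r["amount"])
print(f"{len(raw_records)} records, {missing} with a missing amount")

# 2. Cleansing: fix obvious formatting problems and drop unusable records.
cleansed = []
for r in raw_records:
    if not r["amount"]:
        continue                                   # drop records with no amount
    amount = float(r["amount"].replace(",", "."))  # normalise the decimal separator
    cleansed.append({"customer": r["customer"], "amount": amount,
                     "country": r["country"].upper()})

# 3. Validation: keep only records that satisfy the business rules.
validated = [r for r in cleansed if r["amount"] > 0]

# 4. Transformation: reshape the data for analysis, e.g. average amount per country.
by_country = {}
for r in validated:
    by_country.setdefault(r["country"], []).append(r["amount"])
summary = {country: round(mean(amounts), 2) for country, amounts in by_country.items()}

print(summary)  # e.g. {'IT': 120.5, 'FR': 89.9}
```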