Assignment – 15
Design and implement a Python script to detect Deep Fake videos utilizing the "Deepfake Detection Challenge" dataset available on Kaggle.

1. Define the objective of the "Deepfake Detection Challenge" dataset.

answer: Deepfake Detection Datasets
Most online face forgery tools (such as DeepFaceLive [10] and Roop [11]) are open source and do not require sophisticated technical skills, so using open-source software such as Basic DeepFake maker [12] is the main method for creating deepfake datasets. Due to the multiple forgery methods, deepfake data are increasing at a very high rate of approximately 300% every year [2], but the data published online are of varying forgery quality. This section introduces several representative datasets and illustrates their advantages and disadvantages.

2.1. FaceForensics++
FaceForensics++ [13] is a pioneering large-scale dataset in the field of face manipulation detection. The main facial manipulations are representative, including the DeepFakes, Face2Face, FaceSwap, FaceShifter, and Neural Textures methods, and the data are of random compression levels and sizes [14]. This database originates from YouTube videos, with 1000 real videos and 4000 fake videos, of which 60% are female videos and 40% male videos. In addition, there are three video resolutions: 480p (VGA), 720p (HD), and 1080p (FHD). As a pioneering dataset, it offers different quality levels of data and an equalized gender distribution. The deepfake algorithms include face alignment and Gauss–Newton optimization. However, this dataset suffers from low visual quality with high compression and visible boundaries of the fake mask. Its main limitation is the lack of advanced color-blending processing, resulting in some source facial colors being easily distinguishable from target facial colors. In addition, some target samples cannot effectively fit on the source faces because of facial landmark mismatch, as shown in Figure 2.

Figure 2. Several FaceForensics++ samples. The manipulation methods are DeepFakes (Row 1), Face2Face (Row 2), FaceSwap (Row 3), and Neural Textures (Row 4). The DeepFakes and FaceSwap methods usually create low-quality manipulated facial sequences with color, landmark, and boundary mismatch. The Face2Face and Neural Textures methods can output slightly better-quality manipulated sequences but with different resolutions.

2.2. DFDC
From 2020 to 2023, Facebook, Microsoft, Amazon, and research institutions put efforts into this field and jointly launched the Deepfake Detection Challenge (DFDC) [8] on Kaggle, to address the problem of deepfakes presenting realistic AI-generated videos of people performing illegal activities, with a strong impact on how people determine the legitimacy of online information. The DFDC dataset is currently the largest public facial forgery dataset; it contains 119,197 video clips of 10 s duration filmed by real actors. The manipulated data (see Figure 3) are generated by deepfake, GAN-based, and non-learned techniques, with resolutions ranging from 320 × 240 to 3840 × 2160 and frame rates from 15 fps to 30 fps. Compared with FaceForensics++, this database has a large enough sample amount, different poses, and a rich diversity of human races. In addition, the original videos are from 66 paid actors instead of YouTube videos, and fake videos are generated with attributes similar to those of the original real videos.
However, the main drawback is that the quality level of the data varies due to the different deepfake generative abilities. Therefore, some samples have the problem of boundary mismatch, and source faces and target faces have different resolutions.

Figure 3. DFDC samples. We manually utilized the InsightFace facial detection model to extract human faces from the DFDC. Although some of the samples lack color blending and have obvious facial boundaries, the average quality is a little higher than that of the first-generation deepfake datasets.

2.3. Celeb-DF V2
Celeb-DF V2 is derived from 590 original YouTube celebrity videos and 5639 manipulated videos generated through FaceSwap [15] and DFaker as mainstream techniques. It covers multiple age, race, and sex distributions with many visual improvements, making fake videos almost indistinguishable to the human eye [16]. The dataset exhibits a large variation in face sizes, orientations, and backgrounds. In addition, some post-processing work is added by increasing the resolution of facial regions, applying color transfer algorithms, and correcting inaccurate face masks. However, the main limitation of this dataset is the low data amount with less sample diversity, because all original samples are downloaded from YouTube celebrity videos, and there is little ethnic diversity, especially for Asian faces. Here, we present a few samples of Celeb-DF V2 (see Figure 4).

Figure 4. Celeb-DF V2 cropped manipulated facial frames. Except for transgender and transracial fake samples (Row 3), it is hard to distinguish real and fake images with the human eye.

There are other higher-quality deepfake datasets created by extensive application of GAN-based methods; for example, DFFD [17], published in 2020, created entirely synthesized faces with StyleGAN [18]. Comparing datasets published after 2020 with previous datasets, it can be observed that the data amount is much larger, with multiple forgery methods such as GANs and forgery tools. In addition, the original data sources are not limited to online videos such as YouTube and also include videos shot by real actors. Thus, we predict the trend of future deepfake datasets to be larger scale, with various forgery methods, multiple shooting scenarios, and different human races. We summarize the advantages and disadvantages of several commonly used datasets in Table 1.

Table 1. The typical and commonly used datasets of facial forgery detection.

2. Describe the characteristics of Deep Fake videos and the challenges associated with their detection.

answer: The following is drawn from "Deep fake detection and classification using error-level analysis and deep learning" (Rimsha Rafique, Rahma Gantassi, Rashid Amin, Jaroslav Frnda, Aida Mustapha and Asma Hassan Alshehri, Scientific Reports, volume 13, Article number 7422, published 08 May 2023).

Abstract
The wide availability of easy-to-access content on social media, along with advanced tools and inexpensive computing infrastructure, has made it very easy for people to produce deep fakes that can spread disinformation and hoaxes. This rapid advancement can cause panic and chaos, as anyone can easily create propaganda using these technologies. Hence, a robust system to differentiate between real and fake content has become crucial in this age of social media.
This paper proposes an automated method to classify deep fake images by employing Deep Learning and Machine Learning based methodologies. Traditional Machine Learning (ML) based systems employing handcrafted feature extraction fail to capture more complex patterns that are poorly understood or not easily represented using simple features. These systems cannot generalize well to unseen data. Moreover, these systems are sensitive to noise or variations in the data, which can reduce their performance. Hence, these problems can limit their usefulness in real-world applications where the data constantly evolve. The proposed framework initially performs an Error Level Analysis of the image to determine if the image has been modified. This image is then supplied to Convolutional Neural Networks for deep feature extraction. The resultant feature vectors are then classified via Support Vector Machines and K-Nearest Neighbors after hyper-parameter optimization. The proposed method achieved the highest accuracy of 89.5% via Residual Network and K-Nearest Neighbor. The results prove the efficiency and robustness of the proposed technique; hence, it can be used to detect deep fake images and reduce the potential threat of slander and propaganda.

Introduction
In the last decade, social media content such as photographs and movies has grown exponentially online due to inexpensive devices such as smartphones, cameras, and computers. The rise in social media applications has enabled people to quickly share this content across platforms, drastically increasing online content and providing easy access. At the same time, we have seen enormous progress in complex yet efficient machine learning (ML) and Deep Learning (DL) algorithms that can be deployed to manipulate audiovisual content in order to disseminate misinformation and damage the reputation of people online. We now live in times where disinformation can easily be used to sway people's opinions, to manipulate elections, or to defame any individual. Deep fake creation has evolved dramatically in recent years, and it might be used to spread disinformation worldwide, posing a serious threat. Deep fakes are synthesized audio and video content generated via AI algorithms. Using videos as evidence in legal disputes and criminal court cases is standard practice; the authenticity and integrity of any video submitted as evidence must be established, and this is anticipated to become a difficult task as deep fake generation becomes more sophisticated. The following categories of deep fake videos exist: face-swap, synthesis, and manipulation of facial features. In face-swap deep fakes, a person's face is swapped with that of the source person to create a fake video targeting that person for activities they have not committed1, which can tarnish the person's reputation2. In another type of deep fake called lip-synching, the target person's lips are manipulated to alter their movements according to a certain audio track; the aim is to make the victim appear to speak whatever audio the attacker supplies.
With puppet-master deep fakes, the target's facial expressions, eye movements, and head movements are imitated; using fictitious profiles, this is done to propagate false information on social media. Last but not least, deep audio fakes, or voice cloning, manipulate an individual's voice so that the speaker appears to have said something they never actually said1,3. The importance of discovering the truth in the digital realm has therefore increased. Dealing with deep fakes is significantly more difficult because they are mostly utilized for harmful objectives, and virtually anyone can now produce deep fakes using readily available tools. Many different strategies have been put forward so far to find deep fakes. Since most are also based on deep learning, a conflict between bad and good deep learning applications has developed4. Hence, to solve this problem, the United States Defense Advanced Research Projects Agency (DARPA) launched a media forensics research plan to develop fake digital media detection methods5. Moreover, in collaboration with Microsoft, Facebook also announced an AI-based deep fake detection challenge to prevent deep fakes from being used to deceive viewers6.

Over the past few years, several researchers have explored Machine Learning and Deep Learning (DL) approaches to detect deep fakes in audiovisual media. The ML-based algorithms use labor-intensive and error-prone manual feature extraction before the classification phase. As a result, the performance of these systems is unstable when dealing with bigger databases. However, DL algorithms carry out these tasks automatically, which has proven tremendously helpful in various applications, including deep fake detection. The convolutional neural network (CNN), one of the most prominent DL models, is frequently used due to its state-of-the-art performance and its ability to automatically extract low-level and high-level features from the database. Hence, these methods have drawn the interest of researchers across the globe7. Despite substantial research on the subject of deep fake detection, there is always potential for improvement in terms of efficiency and efficacy. It may be noted that deep fake generation techniques are improving quickly, resulting in increasingly challenging datasets on which previous techniques may not perform effectively.

The motivation behind developing automated DL-based deep fake detection systems is to mitigate the potential harm caused by deep fake technology. Deep fake content can deceive and manipulate people, leading to serious consequences such as political unrest, financial fraud, and reputational damage. The development of such systems can have significant positive impacts on various industries and fields, and these systems also improve the trust and reliability of media and online content. As deep fake technology becomes more sophisticated and accessible, it is important to have reliable tools to distinguish between real and fake content. Hence, developing a robust system to detect deep fakes in media has become very necessary in this age of social media. This paper is a continuation of the study provided by Rimsha et al.8, which compares the performance of CNN architectures such as AlexNet and VGG16 to detect whether an image is real or has been digitally altered. The main contributions of this study are as follows: In this study, we propose a novel deep fake detection and classification method employing DL and ML-based methods.
The proposed framework preprocesses the image by resizing it according to the CNN's input layer and then performing Error Level Analysis to find any digital manipulation at the pixel level. The resultant ELA image is supplied to Convolutional Neural Networks, i.e., GoogLeNet, ResNet18 and SqueezeNet, for deep feature extraction. Extensive experiments are conducted to find the optimal hyper-parameter setting by hyper-parameter tuning. The performance of the proposed technique is evaluated on a publicly available dataset for deep fake detection.

Related work
The first ever deep fake was developed in 1860, when a portrait of southern leader John Calhoun was expertly altered for propaganda by swapping his head out for that of the US President. Such manipulations are typically done by splicing, painting, and copy-moving items inside or between two photos. Appropriate post-processing steps are then used to enhance the visual appeal, scale, and perspective coherence; these steps include scaling, rotating, and color modification9,10. A range of automated procedures for digital manipulation with improved semantic consistency are now available in addition to these conventional methods of manipulation, due to developments in computer graphics and ML/DL techniques. Modification of digital media has become relatively affordable due to widely available software for developing such content. Manipulation of digital media is increasing at a very fast pace, which requires the development of algorithms that can robustly detect and analyze such content to distinguish right from wrong11,12,13. Despite being a relatively new technology, deep fakes have been a topic of investigation, and there was a considerable increase in deep fake articles towards the end of 2020. Due to the advent of ML and DL-based techniques, many researchers have developed automated algorithms to detect deep fakes in audiovisual content. These techniques have helped in distinguishing real and fake content easily. Deep learning is well renowned for its ability to represent complicated and high-dimensional data11,14.

Matern et al.15 detected deep fakes from the Face Forensics dataset using a Multilayered Perceptron (MLP) with an AUC of 0.85; however, the study considers facial images with open eyes only. Agarwal et al.16 extracted features using the OpenFace 2 toolkit and performed classification via SVM. The system obtained 93% AUC; however, it provides incorrect results when a person is not facing the camera. The authors in Ciftci et al.17 extracted medical signal features and performed classification via CNN with 97% accuracy; however, the system is computationally complex due to a very large feature vector. In their study, Yang et al.18 extracted 68-D facial landmarks using DLib and classified these features via SVM. The system obtained 89% ROC; however, it is not robust to blurred images and requires a preprocessing stage. Rossle et al.19 employed SVM + CNN for feature classification and a co-occurrence matrix for feature extraction. The system attained 90.29% accuracy on the Face Forensics dataset; however, it provides poor results on compressed videos. McCloskey et al.20 developed a deep fake detector by using the dissimilarity of colors between real camera samples and synthesized image samples. The SVM classifier was trained on color-based features from the input samples; however, the system may struggle on non-preprocessed and blurry images.
A Hybrid Multitask Learning Framework with a Fire Hawk Optimizer for Arabic Fake News Detection aims to address the issue of identifying fake news in the Arabic language. The study proposes a hybrid approach that leverages the power of multiple tasks to detect fake news more accurately and efficiently. The framework uses a combination of three tasks, namely sentence classification, stance detection, and relevance prediction, to determine the authenticity of a news article. The study also suggests the use of the Fire Hawk Optimizer algorithm, a nature-inspired optimization algorithm, to fine-tune the parameters of the framework. This helps to improve the accuracy of the model and achieve better performance. The Fire Hawk Optimizer is an efficient and robust algorithm inspired by the hunting behavior of hawks; it uses a global and local search strategy to search for the optimal solution21.

The authors in22 propose a Convolution Vision Transformer (CVT) architecture that differs from CNNs in that it relies on a combination of attention mechanisms and convolution operations, making it more effective in recognizing patterns within images. The CVT architecture consists of multi-head self-attention and multi-layer perceptron (MLP) layers. The self-attention layer learns to focus on critical regions of the input image without the need for convolution operations, while the MLP layer helps to extract features from these regions. The extracted features are then forwarded to the output layer to make the final classification decision. However, the system is computationally expensive due to its deep architecture. Guarnera et al.23 identified deep fake images using Expectation Maximization for extracting features and SVM, KNN, and LDA as classification methods; however, the system fails to recognize compressed images. Nguyen et al.24 proposed a CNN-based architecture to detect deep fake content and obtained 83.7% accuracy on the Face Forensics dataset; however, the system is unable to generalize well to unseen cases. Khalil et al.25 employed Local Binary Patterns (LBP) for feature extraction and CNN and Capsule Network for deep fake detection; the models were trained on the Deep Fake Detection Challenge-Preview dataset and tested on the DFDC-Preview and Celeb-DF datasets. A deep fake approach developed by Afchar et al.26 employed MesoInception-4 and achieved an 81.3% True Positive Rate on the Face Forensics dataset; however, the system requires preprocessing before feature extraction and classification, which results in low overall performance on low-quality videos. Wang et al.27 evaluated the performance of Residual Networks on deep fake classification; the authors employed ResNet and ResNeXt on videos from the Face Forensics dataset. In another study by Stehouwer et al.28, the authors presented a CNN-based approach for deep fake content detection that achieved 99% overall accuracy on the Diverse Fake Face Dataset; however, the system is computationally expensive due to a very large feature vector.

Despite significant progress, existing DL algorithms are computationally expensive to train and require high-end GPUs or specialized hardware. This can make it difficult for researchers and organizations with limited resources to develop and deploy deep learning models. Moreover, some of the existing DL algorithms are prone to overfitting, which occurs when the model becomes too complex and learns to memorize the training data rather than learning generalizable patterns.
This can result in poor performance on new, unseen data. The limitations of the current methodologies show there is still a need to develop a robust and efficient deep fake detection and classification method using ML and DL based approaches.

Proposed methodology
This section discusses the proposed workflow employed for deep fake detection. The workflow diagram of our proposed framework is illustrated in Fig. 1. The proposed system comprises three core steps: (i) image preprocessing by resizing the image according to the CNN's input layer and then generating an Error Level Analysis of the image to determine pixel-level alterations; (ii) deep feature extraction via CNN architectures; and (iii) classification via SVM and KNN with hyper-parameter optimization. (Figure 1. Workflow diagram of the proposed method.)

(i) Error level analysis
Error level analysis, also known as ELA, is a forensic technique used to identify image segments with varying compression levels. By measuring these compression levels, the method determines whether an image has undergone digital editing. This technique works best on .JPG images because, in that case, all image pixels should have roughly the same compression level, which may vary in case of tampering29,30. JPEG (Joint Photographic Experts Group) is a technique for the lossy compression of digital images: a data compression algorithm discards (loses) some of the data to compress it. The compression level can be chosen as an acceptable compromise between image size and image quality; typically, the JPEG compression ratio is 10:1. The JPEG technique uses 8 × 8 pixel image grids that are compressed independently. Matrices larger than 8 × 8 are more difficult to manipulate theoretically or are not supported by the hardware, whereas matrices smaller than 8 × 8 lack sufficient information, so the compressed images would be of poor quality. All 8 × 8 grids of an unaltered image should have the same error level, allowing the image to be resaved. Given that faults are distributed uniformly throughout the image, each square should deteriorate at roughly the same pace, whereas the altered grid in a modified image should show a higher error potential than the rest31. In ELA, the image is resaved at a fixed JPEG quality (95%), and the difference between the two images is computed. This technique determines whether there is any change in cells by checking whether the pixels are at their local minima8,32, which helps determine whether there is any digital tampering in the database. The ELA is computed on our database, as shown in Fig. 2. (Figure 2. Result of ELA on dataset images.)

(ii) Feature extraction using convolutional neural networks
The discovery of CNNs has raised their popularity among academics and motivated them to work through difficult problems that they had previously given up on. Researchers have designed several CNN architectures in recent years to deal with challenges in various research fields, including deep fake detection. The general architecture of a CNN, shown in Fig. 3, is usually made up of many layers stacked on top of one another: a feature extraction module composed of convolutional layers that learn the features and pooling layers that reduce image dimensionality, followed by a module comprising a fully connected (FC) layer to classify the image33,34. (Figure 3. General CNN architecture.) The image is passed from the input layer to the convolution layers for deep feature extraction.
The convolution layer learns the visual features from the image by preserving the relationship between its pixels; this mathematical calculation is performed on an image matrix using a filter/kernel of a specified size35. The max-pooling layer reduces the image dimensions, which helps increase training speed and reduce the computational load of the following stages36. Some networks include normalization layers, i.e., batch normalization or dropout layers. The batch normalization layer stabilizes network training by standardizing the inputs to mini-batches, whereas the dropout layer randomly drops some nodes to reduce the network complexity, increasing the network performance37,38. The last layers of the CNN include an FC layer with a softmax probability function. The FC layer stores all the features extracted in the previous phases; these features are then supplied to classifiers for image classification38.

Since CNN architectures can extract significant features without any human involvement, we used pre-trained CNNs such as GoogLeNet39, ResNet1831, and SqueezeNet40 in this study. It may be noted that developing and training a deep learning architecture from scratch is not only time-consuming but also requires considerable computational resources; hence we use pre-trained CNN architectures as deep feature extractors in our proposed framework.

Microsoft introduced the Residual Network (ResNet) architecture in 2015; it consists of several convolution layers of kernel size 3 × 3 and an FC layer followed by an additional softmax layer for classification. Because they use shortcut connections that skip one or more layers, residual networks are efficient and low in computational cost41. Instead of expecting every stack of layers to directly fit a desired underlying mapping, the layers fit a residual mapping. Because their outputs are added to those of the stacked layers, these shortcut connections reduce the loss of information during training; this also helps the algorithm train considerably faster than conventional CNNs. Furthermore, this shortcut mapping has no parameters because it simply transfers the output to the next layer. The ResNet architecture outperformed other CNNs by achieving the lowest top-5 error rate in the classification task, namely 3.57%31,42. The ResNet architecture used here is shown in Fig. 443. (Figure 4. ResNet18 architecture44.)

SqueezeNet, developed by researchers at UC Berkeley and Stanford University, is a very lightweight and small architecture. Smaller CNN architectures are useful because they require less communication across servers in distributed training; they also train faster and require less memory, and hence are not computationally expensive compared to conventional deep CNNs. By modifying the architecture, the researchers claim that SqueezeNet can achieve AlexNet-level accuracy with a much smaller CNN45. Because a 1 × 1 filter contains 9× fewer parameters than a 3 × 3 filter, most 3 × 3 filters are replaced with 1 × 1 filters in these modifications. Furthermore, the number of input channels to the remaining 3 × 3 filters is reduced via squeeze layers, which lowers the overall number of parameters. Last but not least, downsampling is carried out very late in the network so that the convolution layers have large activation maps, which is said to increase classification accuracy40.
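Before turning to GoogLeNet, the ELA step (i) and the pretrained-CNN feature-extraction step (ii) described above can be sketched in Python. The snippet below is only an illustrative sketch under stated assumptions, not the authors' implementation: the choice of Pillow, a recent torchvision (ResNet18 weights API), and scikit-learn, as well as the helper names and placeholder variables, are assumptions made here.

import io
import numpy as np
import torch
from PIL import Image, ImageChops
from torchvision import models, transforms
from sklearn.neighbors import KNeighborsClassifier

def error_level_analysis(path, quality=95):
    # Resave the image as JPEG at a known quality and amplify the difference,
    # so regions with an unusual compression history stand out.
    original = Image.open(path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)
    buffer.seek(0)
    resaved = Image.open(buffer)
    diff = ImageChops.difference(original, resaved)
    max_diff = max(channel_max for _, channel_max in diff.getextrema()) or 1
    return diff.point(lambda value: min(255, value * 255 // max_diff))

# Pretrained ResNet18 with its final classification layer removed -> 512-D features.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()
backbone.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),  # resize to match the CNN input layer
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def deep_features(path):
    ela_image = error_level_analysis(path)
    with torch.no_grad():
        return backbone(preprocess(ela_image).unsqueeze(0)).squeeze(0).numpy()

# 'train_paths', 'train_labels', and 'test_paths' are hypothetical placeholders for
# labelled real/fake frames (e.g., extracted from DFDC videos).
# X_train = np.stack([deep_features(p) for p in train_paths])
# knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, train_labels)
# predictions = knn.predict(np.stack([deep_features(p) for p in test_paths]))

The resulting 512-dimensional vectors would then be passed to the SVM or KNN classifiers described in step (iii) below, with hyper-parameters tuned on a validation split.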
Developed by Google researchers, GoogLeNet is a 22-layer deep convolutional neural network that uses 1 × 1 convolution filters, global average pooling, and an input size of 224 × 224 × 3. The architecture of GoogLeNet is shown in Fig. 5. To increase the depth of the network architecture, the convolution filter size is reduced to 1 × 1. Additionally, the network uses global average pooling towards the end of the architecture, which takes a 7 × 7 feature map and averages it to a 1 × 1 feature map. This helps reduce trainable parameters and enhances the system's performance. A dropout regularization of 0.7 is also used in the architecture, and the features are stored in an FC layer39. (Figure 5. GoogLeNet architecture46.)

CNNs extract features from images hierarchically using convolutional, pooling, and fully connected layers. The features extracted by CNNs can be broadly classified into two categories: low-level features and high-level features. Low-level features include edges, corners, and intensity variations. CNNs can detect edges by convolving the input image with a filter that highlights the edges in the image, and corners by convolving the input image with a filter that highlights corners. Moreover, CNNs can extract color features by convolving the input image with filters that highlight specific colors. On the other hand, high-level features include texture, objects, and contextual and hierarchical features. Textures are detected by convolving the input image with filters that highlight different textures, and objects by convolving the input image with filters that highlight different shapes, whereas contextual features are extracted by considering the relationships between different objects in the image. Finally, CNNs can learn to extract hierarchical features by stacking multiple convolutional layers on top of each other: the lower layers extract low-level features, while the higher layers extract high-level features.

(iii) Classification via support vector machines and k-nearest neighbors
We classified the deep CNN features via SVM and KNN classifiers in this phase. KNN has gained much popularity in the research community for classification and regression tasks, since it outperforms many other existing classifiers due to its simplicity and robustness. KNN calculates the distance between a test sample and its neighbours and then assigns the test sample to the class of its k nearest neighbours. The KNN classifier is shown in Fig. 6. The second classifier used in this study is SVM, a widely popular classifier used frequently in many research fields because of its fast speed and superior prediction outcomes even on a minimal dataset. The classifier finds the plane with the largest margin that separates the two classes; the wider the margin, the better the classification performance of the classifier30,47. Figure 7A depicts potential hyperplanes for a particular classification problem, whereas Fig. 7B depicts the best hyperplane determined by SVM for that problem.

3. Outline the key steps involved in the implementation of a Deep Fake video detection algorithm using Python.

answer: MRI-GAN: A Generalized Approach to Detect DeepFakes using Perceptual Image Assessment
This readme file gives a basic overview of the scope of this project, sample results, and the steps needed to replicate the work, either from scratch or using pre-trained models. Reproducing the results from scratch is a very involved process and includes training all of the models.
In either case, data processing needs to be done; details are described below. The full research paper is available at: https://arxiv.org/abs/2203.00108

Abstract
DeepFakes are synthetic videos generated by swapping a face of an original image with the face of somebody else. In this paper, we describe our work to develop general, deep learning-based models to classify DeepFake content. We propose a novel framework for using Generative Adversarial Network (GAN)-based models, which we call MRI-GAN, that utilizes perceptual differences in images to detect synthesized videos. We test our MRI-GAN approach and a plain-frames-based model using the DeepFake Detection Challenge Dataset. Our plain-frames-based model achieves 91% test accuracy, and a model which uses our MRI-GAN framework with the Structural Similarity Index Measurement (SSIM) for the perceptual differences achieves 74% test accuracy. The results of MRI-GAN are preliminary and may be improved further by modifying the choice of loss function, tuning hyper-parameters, or using a more advanced perceptual similarity metric.

MRI-GAN
MRI-GAN generates the MRI of the input image. The MRI of a DeepFake image contains artifacts which highlight regions of synthesized pixels; the MRI of a non-DeepFake image is just a black image. (README figures: MRI-GAN adversarial training, MRI-GAN training data formulation, MRI-GAN sample output on the validation set, MRI-GAN training progress.)

Steps to replicate the overall work
Note: This is a very involved process.

Set up the development environment. We have used conda for our Python distribution and related libraries on Ubuntu 20.04. Create a new environment using the command below and activate it. We have provided our environment.yml in the codebase.
conda env create -f environment.yml

Download the datasets and extract them.
DFDC dataset from https://ai.facebook.com/datasets/dfdc/
Celeb-DF-v2 dataset from https://github.com/yuezunli/celeb-deepfakeforensics
FFHQ dataset from https://github.com/NVlabs/ffhq-dataset
FDF dataset from https://github.com/hukkelas/FDF

Configure the paths and other params. Note: configuration of these paths could have been simplified by using relative paths, but due to the huge size of the dataset and the limited available storage space, we set an absolute path for each entity to have the flexibility to choose where to save individual outcomes. The downside of this choice is that all paths have to be set individually, which can be tedious. config.yml is the key configuration file that controls the whole flow. Update the dataset paths as needed; you would need to update all paths which start with /home/directory, while the other filenames do not need to be changed.

DFDC dataset configuration
update ['data_path']['dfdc']['train'] : path of the training set
update ['data_path']['dfdc']['valid'] : path of the validation set
update ['data_path']['dfdc']['test'] : path of the test set
Update all key-value pairs under ['features']['dfdc']['landmarks_paths'] to point to where you want to save generated landmarks for DFDC
Update all key-value pairs under ['features']['dfdc']['crop_faces'] to point to where you want to save extracted images of faces for DFDC
update ['features']['dfdc']['mri_path'] : path where all MRIs will be saved; these MRIs are used for MRI-GAN training
update ['features']['dfdc']['train_mrip2p_faces'] : after MRI-GAN is trained, it is used to predict MRIs of DFDC, and all predicted MRIs are saved here. Do the same for valid_mrip2p_faces and test_mrip2p_faces: update the paths.

Celeb-DF-v2 dataset configuration
['data_path']['celeb_df_v2']['real'] : path of real samples (Celeb-real)
['data_path']['celeb_df_v2']['fake'] : path of fake samples (Celeb-synthesis)
['features']['celeb_df_v2']['landmarks_path']['train'] : path where landmarks will be saved
['features']['celeb_df_v2']['crop_faces']['train'] : path where extracted faces will be saved

FDF dataset configuration
['data_path']['fdf']['data_path'] : path of samples (cc-by-nc-sa-2/128)
['data_path']['fdf']['landmarks_path']['train'] : path where landmarks will be saved
['features']['fdf']['json_filename'] : path to a json file where landmarks will be saved
['features']['fdf']['crops_path'] : path where extracted faces will be saved

FFHQ dataset configuration
['data_path']['ffhq']['data_path'] : path of samples (images1024x1024)
['features']['ffhq']['json_filename'] : path to a json file where landmarks will be saved
['features']['ffhq']['crops_path'] : path where extracted faces will be saved

Data pre-processing. Enter the following commands in sequence:
python data_preprocess.py --gen_aug_plan (select random video files in the DFDC training set and make a plan to apply various random combinations of augmentations and distractions. This command generates the plan and saves it in a .pkl file.)
python data_preprocess.py --apply_aug_to_all (execute the plan generated in step #1. This command reads the .pkl file generated in step #1 and executes the plan one-by-one for each video file selected in the DFDC training set)
python data_preprocess.py --extract_landmarks (use the pre-trained MTCNN to extract landmarks of each face detected in the video frames. Every 10th frame is used by default in each video. Landmarks are extracted for each video in the train, validation and test sets. All landmarks are saved in separate .json files for each video)
python data_preprocess.py --crop_faces (save faces from the landmarks json files for each video)
python data_preprocess.py --gen_mri_dataset (generate the MRI-DF dataset. This generates the images of perceptual dissimilarity for the DFDC train set (50% of the DFDC train set, as mentioned in the paper))

MRI-GAN training
Configure config.yml. Parameters under the ['MRI_GAN']['model_params'] section can be tweaked: 'tau' is adjusted for different results, and 'batch_size' can be changed depending upon the GPU memory available on your machine.
python train_MRI_GAN.py --train_from_scratch (train the MRI-GAN model. Check the help for the --train_resume option to resume training if it was stopped earlier. Logs will be generated and saved under the logs/ directory, and model weights will also be saved in the same directory)
cp logs//MRI_GAN/checkpoint_best_G.chkpt assets/weights/MRI_GAN_weights.chkpt (copy the trained MRI-GAN weights)
python data_preprocess.py --gen_dfdc_mri (use the trained MRI-GAN to predict MRIs for the DFDC dataset)

Train and test the DeepFake Detection model
python data_preprocess.py --gen_deepfake_metadata (generate the metadata csv files used by the PyTorch DataLoader classes)

Using the plain-frames method
Configure config.yml. Parameters under the ['deep_fake']['model_params'] section can be tweaked. For the plain-frames method, set the following params:
'train_transform' : 'complex'
'dataset' : 'plain'
'batch_size' can be changed depending upon the GPU memory available on your machine.
python deep_fake_detect.py --train_from_scratch (start training from scratch. Also check the --train_resume command line option if you want to resume previously started training.
After all epochs are done, testing of the model will start)
python deep_fake_detect.py --test_saved_model (test the model which was saved on disk, e.g. if the training was killed before all epochs were completed, this option can be used to test the model which was saved during the training process)

Using the MRI-based method
Configure config.yml. Parameters under the ['deep_fake']['model_params'] section can be tweaked. For the MRI-based method, set the following params:
'train_transform' : 'simple'
'dataset' : 'mri'
'batch_size' can be changed depending upon the GPU memory available on your machine.
python deep_fake_detect.py --train_from_scratch (start training from scratch. Also check the --train_resume command line option if you want to resume previously started training. After all epochs are done, testing of the model will start)
python deep_fake_detect.py --test_saved_model (test the model which was saved on disk, e.g. if the training was killed before all epochs were completed, this option can be used to test the model which was saved during the training process)

Other notes
Check --help of all the scripts mentioned above to see more utility methods, e.g. to resume training of models if the training was stopped in between.

Pre-trained models
Download all pre-trained model weights to reproduce the results.
MRI-GAN (model with tau = 0.3 and the Generator with the lowest loss): https://drive.google.com/uc?id=1qEfI96SYOWCumzPdQlcZJZvtAW_OXUcH
DeepFake detection models:
Plain-frames based: https://drive.google.com/uc?id=1_Pxv6ptxqXKtDJNkodkDmMTD_KRo08za
MRI based: https://drive.google.com/uc?id=1xKzehNuq1B1th-_-U6OG9v2Q2Odws6VG

DeepFake Detection App
Use the model to test a given video file. Download all pre-trained model weights, then run the command-line App:
python detect_deepfake_app.py --input_videofile <path to video file> --method <detection method>
The detection method can be plain_frames or MRI.

How to cite our research!
Pratikkumar Prajapati and Chris Pollett, MRI-GAN: A Generalized Approach to Detect DeepFakes using Perceptual Image Assessment. arXiv preprint arXiv:2203.00108 (2022), or
@misc{2203.00108,
Author = {Pratikkumar Prajapati and Chris Pollett},
Title = {MRI-GAN: A Generalized Approach to Detect DeepFakes using Perceptual Image Assessment},
Year = {2022},
Eprint = {arXiv:2203.00108},
}

4. Discuss the importance of dataset preprocessing in training a Deep Fake detection model and suggest potential preprocessing techniques.

answer: 5.1 Data preprocessing
Data preprocessing in SHM (structural health monitoring) systems deals with extraction of the effective data hidden by noise and other factors to provide the basis for further data analysis (García, Ramírez-Gallego, Luengo, Benítez, & Herrera, 2016; Han, Kamber, & Booksx, 2015). The main function of data preprocessing is to extract the data sources related to the monitoring target based on the mining requirements, check the legality of the data, and generate the core data awaiting the next analysis (Chen, 2012). Data preprocessing consists of three parts: noise filtering, data classification, and data evaluation (Farrar, Doebling, & Nix, 2001; Garcia, Luengo, & Herrera, 2016; López, Fernández, García, Palade, & Herrera, 2013).

Inherently Safer Design (Arnab Chakrabarty, ... Tahir Cagin, in Multiscale Modeling for Process Safety Applications, 2016), 8.4.2.2 Data preprocessing
Data preprocessing is an important step to prepare the data to form a QSPR model. There are many important steps in data preprocessing, such as data cleaning, data transformation, and feature selection (Nantasenamat et al., 2009).
Data cleaning and transformation are methods used to remove outliers and standardize the data so that they take a form that can be easily used to create a model. This section focuses mostly on the applications of QSPR models and will emphasize the feature selection portion of data preprocessing. When trying to create a QSPR model, a data set may contain hundreds of variables (descriptors); however, many of these variables will contain redundant data. In order to simplify the dimensionality of the model, it is important to select only variables that contain unique and important information. Data mining procedures can be used to remove variables that do not contribute to the model. Another step in feature selection is feature fusion, the process by which multiple variables are combined using mathematical operations. This process is performed to reduce the number of variables in the model while still maintaining the level of information.

Robust geomechanical characterization by analyzing the performance of shallow-learning regression methods using unsupervised clustering methods (Siddharth Misra, ... Jiabo He, in Machine Learning for Subsurface Characterization, 2020), 2.2 Data preprocessing
Data preprocessing aims to facilitate the training/testing process by appropriately transforming and scaling the entire dataset. Preprocessing is necessary before training the machine learning models. Preprocessing removes outliers and scales the features to an equivalent range. We use min-max scaling, which ensures fast convergence of the gradient-based learning process, especially for neural network models. Min-max scaling is performed on one feature at a time using the following equation:

(5.1)  y'_i = 2 * (y_i - y_min) / (y_max - y_min) - 1

where y_i is the original value of a log response (y) and y'_i is the scaled value of the log response (y) at a depth i. y_min and y_max are the minimum and maximum values of the log response (y), respectively. Min-max scaling is performed only on the 13 "easy-to-acquire" logs, which are considered as features for the shallow-learning task of synthesizing DTS and DTC logs. We do not scale the DTS and DTC logs, which are the targets for the machine learning task. As mentioned in previous chapters, a machine learning workflow first learns from the training dataset, then is evaluated on the testing dataset, and finally is deployed on the new dataset. Any data preprocessing step should adopt the following sequence of steps: (1) perform data preprocessing on the training dataset; (2) learn the statistical parameters required for the data preprocessing of the training dataset; and (3) perform data preprocessing on the testing dataset and new dataset by applying the statistical parameters learnt from the preprocessing of the training dataset. In our case, the minimum and maximum of each feature (log) are first learnt during the scaling of the training dataset, and then those minimum and maximum values are used for scaling the corresponding features in the testing dataset and the new dataset.

Machine-learning models for predicting survivability in COVID-19 patients (Ijegwa David Acheme, Olufunke Rebecca Vincent, in Data Science for COVID-19, 2021), 2.1.2 Data preprocessing
Data preprocessing is an iterative process for the transformation of raw data into understandable and useable forms. Raw datasets are usually characterized by incompleteness, inconsistencies, lacking behavior and trends, while containing errors [37].
The preprocessing is essential to handle missing values and address inconsistencies. In this work, the data gathering was carried out so as to avoid out-of-range values; impossible data combinations such as (Sex: Male, Pregnant: Yes) were handled, and missing values and redundancies were also treated during the data preprocessing stage, resulting in a more reliable and relevant dataset fit for knowledge discovery. Transforming data into suitable formats for a particular machine-learning problem is an essential consideration at the beginning of the project. The presence of irrelevant or redundant information and noisy, unreliable data significantly affects the model outcomes and knowledge discovery, making the training phase more difficult. The data preparation and filtering steps take the most time spent on an ML project but are worth it. The steps involved include cleaning, instance selection, normalization, transformation, feature extraction, and selection. The product of data preprocessing is the training set.

Change detection techniques for a remote sensing application: An overview (Rohini Selvaraj, Sureshkumar Nagarajan, in Cognitive Systems and Signal Processing in Image Processing, 2022), 3 Data preprocessing
Data preprocessing is known to be the most critical stage of image processing to achieve greater accuracy. Satellite data are influenced by multiple factors, including geographical, spectral, temporal, and atmospheric conditions. Image preprocessing eliminates these effects by correction processes such as geometric correction and radiometric correction. Geometric correction [17] is an inevitable process in change detection and involves image registration and rectification; geometric distortions arise because of changes in the sensor, sensor positioning, Earth axis, and terrain effects. Radiometric correction [12, 18] resolves the errors produced by variations in atmospheric circumstances, solar angle, sensor characteristics, and view angle. Radiometric correction [19] is carried out in different ways, such as absolute radiometric correction (ARC) and relative radiometric correction (RRC).

Diagnosing of disease using machine learning (Pushpa Singh, ... Akansha Singh, in Machine Learning and the Internet of Medical Things in Healthcare, 2021), 5.4.1 Data preprocessing
Data preprocessing is essential before the data's actual use. Data preprocessing is the concept of changing raw data into a clean data set. The dataset is preprocessed in order to check for missing values, noisy data, and other inconsistencies before feeding it to the algorithm. Data must be in a format appropriate for ML. For example, if the algorithm processes only numeric data, then a class labeled "malignant" or "benign" must be replaced by "0" or "1". Data transformation and feature extraction are used to improve the performance of classifiers, so that a classification algorithm will be able to create a meaningful diagnosis. Only relevant features are selected and extracted for the particular disease. For example, a cancer patient may have diabetes, so it is essential to separate features related to cancer from those related to diabetes. An unsupervised learning algorithm such as PCA is a familiar algorithm for feature extraction. Supervised learning is appropriate for classification and predictive modeling.

Machine learning in genomics: identification and modeling of anticancer peptides
(Girish Kumar Adari, ... P. Vijaya, in Data Science for Genomics, 2023), 2.5.1 Data preprocessing
Data preprocessing is a basic and primary step for converting raw data into useful information. In general, raw data can be incomplete, redundant, or noisy. By data preprocessing, all these issues can be resolved so the data can be used for generating machine learning models (Fig. 3.2) [24]. (Figure 3.2. Steps of data preprocessing.) The data sets that were used for this chapter are standard sets without any inconsistencies, noise, or imperfections. By using the CD-HIT program, any peptide with 90% similarity is treated as redundant and is removed from the data sets. CD-HIT is a popular program (more can be found about it in the Further Reading section [25]), created by Dr. Weizhong Li at the Burnham Institute, for clustering and comparing protein or nucleotide sequences [26]. Both ACPs and non-ACPs of both data sets are subjected to the CD-HIT program with a similarity threshold of 90%, and on combining the ACPs and non-ACPs, nonredundant standard data sets are generated (Fig. 3.3). (Figure 3.3. Peptides preprocessing.) For each of the following features, AAC, dipeptide composition (DPC), CTC, and distance distribution of residues (DDR), the feature properties and their peptide compositions are calculated. Each of the features is assigned to X and the class label to Y. The labels "Positive" and "Negative" are encoded as "1" and "0," respectively. Feature selection is done by setting a variance threshold of 0.1.

Encyclopedia of Biomedical Engineering, Volume 3 (Andreas Holzinger, 2019), Data Preprocessing, Data Integration, and Data Fusion
Data preprocessing is a required first step before any machine learning machinery can be applied, because the algorithms learn from the data, and the learning outcome for problem solving heavily depends on the proper data needed to solve a particular problem, which are called features. These features are key for learning and understanding, and therefore machine learning is often considered as feature engineering. Data preprocessing, however, carries a danger: during preprocessing, data can be inadvertently modified, e.g., "interesting" data may be removed. Consequently, for discovery purposes, it would be wise to have a look at the original raw data first and maybe do a comparison between nonprocessed and preprocessed data. Data integration is a hot topic generally and in health informatics specifically, and solutions can bridge the gap between clinical and biomedical research (bench vs. bed). This is becoming even more important due to the increasing amounts of heterogeneous, complex patient-related data sets resulting from various sources, including picture archiving and communication systems (PACS), radiological information systems (RIS), hospital information systems (HIS), laboratory information systems (LIS), physiological and clinical data repositories, and all sorts of *omics data from laboratories using samples from biobanks. The latter include large collections of DNA sequence data and proteomic and metabolic data resulting from sophisticated high-throughput analytic technologies. Along with classical patient records containing large amounts of unstructured information (N.B., avoid the term unstructured data) and semistructured information, integration efforts involve enormous problems but at the same time offer new possibilities for translational research.
While data integration is about combining data from different sources and providing users with a unified view of these data (e.g., combining research results from different bioinformatics repositories), data fusion is matching various data sets that represent one and the same object into a single, consistent representation. In health informatics, these unified views are particularly important in high dimensions, for example, for integrating heterogeneous descriptions of the same set of genes. The main expectation is that fused data are more informative than the original inputs. Capturing all information describing a biological system is the implicit objective of all *omics methods; however, genomics, transcriptomics, proteomics, metabolomics, etc. need to be combined to approach this goal: valuable information can be obtained using various analytic techniques such as nuclear magnetic resonance, liquid chromatography, or gas chromatography coupled to mass spectrometry. Each method has inherent advantages and disadvantages but is complementary in terms of biological information; consequently, combining multiple data sets provided by different analytic platforms is of utmost importance for the discovery of new knowledge. For each platform, the relevant information is extracted in the first step. The obtained latent variables are then fused and further analyzed. The influence of the original variables is then calculated back and interpreted. There is plenty of open future research to include all possible sources of information.

Deep learning applications for disease diagnosis (Deepak Kumar Sharma, ... Suchitra Vavilala, in Deep Learning for Medical Applications with Unique Data, 2022), 7.1.3 Preprocessing
Data preprocessing is a critical step in building accurate ML models. The preprocessing work includes noise reduction, data normalization, feature selection, and extraction [14]. To train a model, we initially split the data into three sections: training, validation, and testing. The training set enables the model to learn and fit the parameters of the classifier. The validation set is used to prevent overfitting, and the test set is used to evaluate the performance of the trained model. Evaluation is an integral part of the development process; it helps to determine whether the model will do a good job of predicting the target on new and future data [15]. For skin lesions, we have a variety of preprocessing techniques, including the contrast limited adaptive histogram equalization (CLAHE) technique, black frame removal, automatic color equalization, hair removal, Karhunen–Loeve transform, Gaussian filter, pseudorandom filter, non-skin masking, and color space transformation.

5. Propose and justify the choice of at least two machine learning or deep learning algorithms suitable for Deep Fake video detection.

answer: The growing popularity of social networks such as Facebook, Twitter, and YouTube, along with the availability of advanced camera cell phones, has made the generation, sharing, and editing of videos and images more accessible than before. Recently, many hyper-realistic fake images and videos created by the deepfake technique and distributed on these social networks have raised public privacy concerns. Deepfake is a deep-learning-based technique that can replace the face of a source person with that of a target person in a video, creating a video of the target saying or doing things said or done by the source person.
Deepfake technology causes harm because it can be abused to create fake videos of leaders, defame celebrities, create chaos and confusion in financial markets by generating false news, and deceive people. Manipulating faces in photos or videos is a critical issue that poses a threat to world security. Faces play an important role in human interactions and in biometrics-based human authentication and identification services. Thus, plausible manipulations of face frames can destroy trust in security applications and digital communications [1]. As a result, analyzing and detecting faces in photos or videos plays a central role in detecting fakes. Several research papers have been presented in this area: facial landmark detection-based methods [2,3], the Viola–Jones face detector [4], the dlib detector [5], BlazeFace [6], RetinaFace [7], and the multi-task convolutional neural network (MTCNN) [8], to name just a few. The first deepfake video appeared in 2017, when a Reddit user transposed celebrity faces into porn videos, and consequently, several deepfake video detection methods have been presented. Some of these methods detect the temporal inconsistencies across a video's face frames using recurrent networks, while other methods detect visual artifacts inside frames using convolutional networks [9].

This paper introduces a new efficient architecture, YOLO-InceptionResNetV2-XGBoost (YIX), which discovers the visual discrepancies and artifacts within video frames and then judges whether a given video is real or a deepfake. The combination of these three methods is justified as follows. The YOLO detector proves its efficiency in object detection and face recognition systems over the state-of-the-art detectors [10,11], since it offers a good trade-off between performance and speed [12,13]. Additionally, it is characterized by its ability to produce fewer false positives in the background [14], thus improving the detection method's performance. In Dave et al. [15], the YOLO detector is used for detecting and counting various classes of vehicles, aiming to improve smart traffic management systems. A face detection method based on YOLO is employed for detecting faces from the WiderFace dataset [13]; the performance achieved by this method surpasses that of other face detectors, and it is designed for real-time detection on mobile or embedded devices. As a result, YOLO is proposed as the face detector that extracts faces from video frames. Moreover, CNNs have proven successful at automatically learning key features from images and videos. Therefore, a fine-tuned InceptionResNetV2 CNN is proposed here as the feature extraction method, aiming to discover the inconsistencies in the spatial information of manipulated facial video frames. Furthermore, the XGBoost model produces competitive results; it is a highly flexible and scalable machine learning model which avoids overfitting. Again, Dave et al. [15] use the XGBoost method on top of the YOLO vehicle detector to address the traffic congestion problem by estimating the optimized time of the green-light window. A deep-learning-based feature extraction method with the XGBoost model is employed to diagnose COVID-19 and pneumonia patients on chest X-ray images [16]; this XGBoost-based method achieves high performance compared to other machine learning methods. Traditionally, a densely connected layer with a Softmax activation function is used on top of a CNN [17,18,19].
The approach adopted here is to use XGBoost to distinguish a deepfake video from a real one. This aims to combine the advantages of both the CNN and XGBoost models to improve deepfake video detection, since a single model may not be powerful enough to reach the accuracy required for detecting deepfakes. Furthermore, different state-of-the-art face detection methods, CNN models, and machine learning algorithms are explored. The newly proposed hybrid method, YIX, outperforms the alternatives in all scenarios on the CelebDF-FaceForensics++ (c23) dataset. In summary, this paper introduces the following contributions. A new model, namely InceptionResNetV2-XGBoost, is presented to learn the spatial information and then detect the authenticity of videos; this is motivated by the visual artifacts and discrepancies that deepfake videos exhibit within frames. The proposed model provides more accurate output by combining InceptionResNetV2 as a trainable extractor that automatically extracts informative features from video frames and XGBoost as a classifier on top of the network to detect the deepfakes; this distinctive two-phase model ensures highly reliable feature extraction and detection. A YOLO face detector, an improved version of YOLO v3, is used to detect face regions in videos, helping to enhance the performance of detecting deepfakes. A comparative study of different deep learning and classification approaches for deepfake detection is presented in terms of AUC, accuracy, specificity, sensitivity, recall, precision, and F-measure. The rest of the paper is organized as follows: Section 2 reviews deepfake video creation and detection methods and popular existing deepfake datasets; Section 3 proposes a new architecture for detecting deepfakes in video frames; Section 4 is dedicated to the experimental results and analysis; Section 5 presents the conclusion and future work.

6. Evaluate the performance metrics that can be used to assess the effectiveness of a Deep Fake detection model.
answer: 3.1.1 Standard Datasets
The standard datasets consist of real and manipulated data from the autoencoder-based UADFV, Celeb-DF, and DF-1.0, the GAN-based DF-TIMIT (higher quality), and the mixed-manipulation FF++ (Raw), DFDC, and ForgeryNet. The amounts of data extracted from these 7 datasets are kept proportional to reflect their original scale differences. Specifically, taking the 2,527,384 frames of FaceForensics++ as a baseline, the number of frames extracted from each of the other datasets is doubled or reduced accordingly. Each of the standard datasets was split into training, validation, and test sets for method re-implementation and evaluation. Specifically, the video-level split of each dataset follows the default setting if one is released; otherwise, we carry out a reasonable split as illustrated in Table 1. The frame-level data adopted in the experiments were randomly extracted from the split videos and keep a frame-level split ratio of 14:1:1, following the FaceForensics++ split strategy. Moreover, we maintain each dataset's distribution of real and fake data while balancing them in the experiments.
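As one concrete reading of the split just described (not the benchmark's actual code), the sketch below divides videos, rather than frames, into training, validation, and test portions in a 14:1:1 ratio per class, so that frames sampled afterwards never share a video across splits; real_video_ids and fake_video_ids are hypothetical lists of video identifiers.

import random

def split_videos(video_ids, ratios=(14, 1, 1), seed=0):
    # Shuffle a copy, then cut it into train/val/test portions proportional to `ratios`.
    ids = list(video_ids)
    random.Random(seed).shuffle(ids)
    total = sum(ratios)
    n_train = len(ids) * ratios[0] // total
    n_val = len(ids) * ratios[1] // total
    return ids[:n_train], ids[n_train:n_train + n_val], ids[n_train + n_val:]

# Split real and fake videos separately so the real/fake balance is preserved per split.
real_train, real_val, real_test = split_videos(real_video_ids)
fake_train, fake_val, fake_test = split_videos(fake_video_ids)

Splitting at the video level is what prevents frames of the same identity or clip from leaking between training and evaluation data.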
3.1.2 Imperceptible and Diverse Test (ID Test) Set
To explore the robustness of forensic approaches when confronting the threat of fake videos with high visual authenticity and rich content diversity, we construct a high-quality Imperceptible and Diverse test (ID test) set by integrating the hard (highly imperceptible) examples from the 7 public datasets we benchmark and from our hosted private dataset. The hard examples from the public datasets undergo a two-phase selection pipeline, namely detection model selection and user perception selection. The detection model selection retains fake examples that are falsely accepted with high confidence. The user perception selection then carries out a blind experiment and preserves the high-quality fake videos considered real by 15 out of 30 participants, meaning that these examples are indistinguishable to both detection models and human vision. The hard examples from our private dataset are self-generated fakes manipulated by the recently introduced GAN-based FSGAN [27] and autoencoder-based MegaFS [37] approaches, whose original data are images from CelebA [25] and raw videos from FaceForensics++. These manipulated hard examples also go through the selection pipeline to guarantee their visual realism; finally, 40 videos manipulated by FSGAN and 2937 images manipulated by MegaFS are preserved. Overall, the ID test set comprises 976 fake videos and 2348 real videos, from which 25,697 fake images and 25,697 real images are extracted, respectively.

        AE       GAN      Graphic   Unknown
Video   522      202      22        230
Image   10,514   10,778   2,171     2,234
Table 2: Overview of the manipulation type distribution of the ID test set. The unknown manipulation type indicates that the related information was unavailable from the source dataset.

To guarantee diversity, the fake data in the ID test set achieve full coverage of manipulation types and include at least 13 manipulation approaches. The specific manipulation type distribution and forgery approach distribution of the ID test set are shown in Table 2 and Figure 1. We extract an almost equal quantity of frames and sequential frames for each manipulation method, which enables fair evaluation of image-level and video-level detection methods trained on different datasets. Moreover, to simulate the restricted video quality caused by the video pre-processing pipeline, 5 types of common perturbations, as shown in Figure 2, are added to the videos and images for extra evaluation.

Figure 1: Illustration of the frame data and sequential frame data distribution of each manipulation method in the ID test set.

Methods                  Category     OS    Data Pre   Backbone/Method           Param (M)   GFLOPs   Infer T (ms)
HeadPose                 Frame-Know   yes   no         SVM                       -           -        159.70
FWA-Resnet50             Frame-Know   part  D          Resnet50                  25.56       8.24     101.21
Face X-ray               Frame-Know   no    D          HRNet-W48-C               77.47       42.58    35.62
Xception                 Frame-Data   part  D          XceptionNet               20.81       16.84    5.25
MesoNet-4                Frame-Data   part  D + A      4-layer Conv              0.28        0.12     5.11
MesoInception-4          Frame-Data   part  D + A      2-Inception + 2-Conv      0.28        0.11     7.31
Patch Resnet Layer1      Frame-Data   yes   D + A      Resnet18                  0.15        2.10     0.73
Patch Xception Block2    Frame-Data   yes   D + A      XceptionNet               0.19        3.34     1.17
FFD                      Frame-Know   yes   D          XceptionNet + Reg. Map    20.82       16.84    6.04
Multiple-attention       Frame-Know   part  D + A      EfficientNet-b4           18.83       6.80     25.48
Conv LSTM                Video        no    D          InceptionV3 & LSTM        30.36       229.48   221.64
Table 3: Overview of the evaluated forensic detection algorithms. OS represents their open-source status. Data Pre is the data pre-processing procedure, in which D stands for face detection and A stands for face alignment. Infer T represents inference time.
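Beyond the benchmark statistics above, question 6 asks about the metrics used to score a detector. The standard choices named earlier in this assignment are accuracy, precision, recall (sensitivity), specificity, F-measure, and AUC; the snippet below is a minimal scikit-learn sketch assuming hypothetical arrays y_true (1 = fake, 0 = real) and y_prob (predicted fake probabilities).

from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, roc_auc_score, confusion_matrix)

y_pred = (y_prob >= 0.5).astype(int)                  # threshold the probabilities
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

print("Accuracy            :", accuracy_score(y_true, y_pred))
print("Precision           :", precision_score(y_true, y_pred))
print("Recall / Sensitivity:", recall_score(y_true, y_pred))
print("Specificity         :", tn / (tn + fp))
print("F-measure           :", f1_score(y_true, y_pred))
print("AUC                 :", roc_auc_score(y_true, y_prob))  # threshold-independent

AUC is often preferred for comparing detectors across datasets because it does not depend on a particular decision threshold.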
7. Consider the ethical implications of Deep Fake technology and discuss the role of detection mechanisms in addressing these concerns.
answer: The "successful" moon mission was a hoax! The "truth" is, the Apollo 11 astronauts actually never returned from the moon. In an incredibly realistic video, the then president of the United States, Richard Nixon, delivered a televised speech to the nation in a gloomy voice: "Fate has ordained that the men who went to the moon to explore in peace will stay on the moon to rest in peace!" A sad day for humanity! Although the Apollo 11 mission was successful in reality, this "deepfake" video was created by the MIT Center for Advanced Virtuality to generate public awareness of the dangers of this emerging artificial intelligence (AI)-based technology. In the words of Francesca Panetta, the Project Co-Lead and XR Creative Director: "We hope that our work will spark critical awareness among the public. We want them to be alert to what is possible with today's technology (…) and to be ready to question what they see and hear as we enter a future fraught with challenges over the question of truth." Deepfakes are digitally manipulated synthetic media content (e.g., videos, images, sound clips) in which people are shown to do or say something that never existed or happened in the real world (Boush et al., 2015, Chesney and Citron, 2019, Westerlund, 2019). Advances in AI, particularly machine learning (ML) and deep neural networks (DNNs), have contributed to the development of deepfakes (Chesney and Citron, 2019, Dwivedi et al., 2021, Kietzmann et al., 2020, Mirsky and Lee, 2021). They look highly credible and "true to life", to the extent that distinguishing them from authentic media can be very challenging for a human (see Fig. 1). Thus, they can be used for the purpose of widespread marketplace deception, with varied ramifications for both firms and consumers (Europol, 2022, Luca and Zervas, 2016). In fact, a recent study by scientists from University College London ranks fake audio or video content as the most worrisome use of AI in terms of its potential applications for crime or terrorism (Caldwell et al., 2020). At the same time, however, this emerging technology has the potential to bring forth major business opportunities for content creation and engagement (Etienne, 2021, Farish, 2020, Kietzmann et al., 2020). Deception in the marketplace is ubiquitous, which makes it a fundamental issue in consumer research and marketing (Boush et al., 2015, Darke and Ritchie, 2007, Ho et al., 2016). In general, deception refers to a deliberate attempt or act to present others with false or omitted information with the aim of creating a belief that the communicator considers false (Darke and Ritchie, 2007, Ludwig et al., 2016, Xiao and Benbasat, 2011). Thus, it is an intentional manipulation of information to create a false belief in others' minds (i.e., deceiving parties), all of which can be further increased through deepfakes and can hurt consumers and firms alike (Xiao & Benbasat, 2011). Deception permeates the marketplace, harms health, welfare, and financial resources, and undermines trust in organizations and the marketplace as a whole. For example, a fake video of a CEO admitting the company has been charged with a large regulatory fine (or class-action lawsuit) could cause severe damage, with a crash in the company's stock value being one of the first negative consequences. These types of attacks have already begun to occur.
According to The Wall Street Journal (Stupp, 2019), in one high-profile case, cybercriminals used "deepfake phishing" to deceive the CEO of a UK energy company into transferring $243,000 into their account. Using AI-based voice spoofing software, the criminals successfully impersonated the head of the firm's parent company, deceiving the CEO into believing he was speaking with his boss. The cybersecurity organization Symantec has stated that it encountered at least three examples of deepfake-based fraud in 2019, resulting in millions of dollars being lost (Zakrzewski, 2019). Moreover, consumers are susceptible to blackmail, intimidation, sabotage, harassment, defamation, revenge porn, identity theft, and bullying (Chesney and Citron, 2019, Cross, 2022, Europol, 2022, Fido et al., 2022, Karasavva and Noorbhai, 2021, Whittaker et al., 2020). Yet at the same time, this emerging technology also carries positive potential through different forms of commercialization (Johnson and Diakopoulos, 2021, Maksutov et al., 2020). Deepfakes may even help change or innovate business models (Kietzmann et al., 2020). The opportunities pertaining to deepfakes are becoming even more relevant as consumers start spending more time in virtual worlds, which will foreseeably attract more attention and investment from firms across the board. For example, Facebook has changed its name to Meta and is pursuing a virtual reality world called the Metaverse, in which the company is purported to have invested 10 billion dollars in fiscal year 2021 alone. This virtual world will largely be composed of deepfake objects. Thus, this latest technology will usher in new opportunities as well as new dangers. This dualistic nature is why, in the present article, we investigate the risks and opportunities of deepfakes, which are virtually unexplored in the present business literature. Another critical factor making deepfakes relevant is their dissemination via the internet and social media, both of which have become integral to people's personal and professional lives, allowing consumers to access easy-to-use platforms for real-time discussions, ideological expression, information dissemination, and the sharing of emotions and sentiments (Perse & Lambe, 2016). Consequently, the scale, volume, and distribution speed of deepfakes, combined with the increasing pervasiveness of digital technologies in all areas of society, will have profound positive and negative implications in the marketplace (Kietzmann et al., 2020, Westerlund, 2019). However, as deepfakes are an emergent and complex technology (Chesney and Citron, 2019, Dwivedi et al., 2021, Kietzmann et al., 2020, Westerlund, 2019), the current understanding of their implications is scattered, sparse, and nascent (Botha and Pieterse, 2020, Chesney and Citron, 2019, Kietzmann et al., 2020). As the extant literature only offers anecdotal and disparate indications of the possibilities of deepfakes for firms and consumers (Chesney and Citron, 2019, Vimalkumar et al., 2021, Wagner and Blewer, 2019), there is a lack of coherent understanding of marketplace deception through deepfakes and the specific opportunities they present for both companies and consumers (Chesney and Citron, 2019, Kietzmann et al., 2020, Westerlund, 2019). To date, marketplace deception has been primarily investigated from the consumer perspective, with a heavy emphasis on how it affects consumers (Taylor, 2021, Xie et al., 2020).
The effects of deepfakes on businesses have received scant attention, despite the fact that researchers have noted firms are not immune to their effects (Chadderton and Croft, 2006, Xie et al., 2020). Moreover, deepfakes have a legitimate potential to create commercial opportunities, distinguishing them further from other forms of deception, such as fake reviews or opinion spam, that only produce adverse effects (Johnson and Diakopoulos, 2021, Kietzmann et al., 2020, Malbon, 2013). Consequently, both consumers and firms must develop their understanding of deepfake deception and their ability to avoid it, mitigate the harm deepfakes can create, and capture the opportunities they may offer (Boush et al., 2015, Taylor, 2021). Against this background, the purpose of this study is to generate a holistic understanding of deepfakes vis-à-vis marketplace deception and the potential opportunities they offer. More specifically, we address the following research questions (RQs):
• RQ 1: How might deepfakes contribute to marketplace deceptions?
• RQ 2: How might firms and consumers avoid the malicious effects of deepfakes?
• RQ 3: What opportunities do deepfakes offer to firms and consumers?
Through the application of an integrative literature review (ILR; Toronto and Remington, 2020, Torraco, 2016), we analyzed the previous research to create a comprehensive understanding in relation to our purpose. In addition to business academia, we reviewed literature from multiple research streams with footprints in deepfake research, including communications, computer science, information science, journalism, and social sciences, to synthesize existing knowledge. Through the current study, we establish a foundational understanding of deepfakes in terms of marketplace deception for firms and consumers (van Heerde et al., 2021). We also accumulate and present the protection mechanisms against their harmful effects, offering insights into the legitimate opportunities presented by this emerging technology.

Understanding marketplace deception: Marketplace deceptions are based on misperception, misprediction, non-perception, or non-prediction (Mechner, 2010, Taylor, 2021). Deception is a common feature of marketplace interactions between business entities, marketers, consumers, and any other party seeking to gain benefit in an illegal or unethical manner (Boush et al., 2015). Such deceptions may include misrepresentations through numerical information or research results, distraction and information overload, display of false emotions (…)

Methodology: The ILR approach that we have applied in this study is "a form of research that reviews, critiques, and synthesizes representative literature on a topic in an integrated way such that new frameworks and perspectives on the topic are generated" (Torraco, 2005, p. 356). It is considered a particular form of systematic literature review (SLR; Toronto & Remington, 2020). However, the SLR approach tends to focus narrowly on a specific topic or type of study (Booth et al., 2016). In contrast, the (…)

Findings: Based on our detailed analysis of the reviewed literature, we develop a conceptual framework to capture the deepfake phenomenon in the context of marketplace deception and the opportunities it offers (Fig. 4). The framework provides an overview of the phenomenon and simultaneously facilitates an organized presentation of the findings.
We conceptualize that this emergent and highly potent technology is dualistic in nature, thus posing radical threats as well as opportunities for innovation (…)

General discussion: Deepfakes are highly realistic synthetic media generated by algorithms (Chesney and Citron, 2019, Maksutov et al., 2020) and typically distributed as social media content. They carry the potential to create marketplace deceptions for both firms and consumers. Deepfakes also offer various opportunities (Chesney and Citron, 2019, Dwivedi et al., 2021, Kietzmann et al., 2020, Westerlund, 2019). The current knowledge on deepfakes is scant and diffuse (Maksutov et al., 2020, Zotov et al., 2020). (…)

8. Write a complete code for this assignment.
answer: Deepfake technology has evolved rapidly in recent years, making it increasingly challenging to distinguish real from fabricated content. Detecting deepfakes has become a critical task, and deep learning models like EfficientNet have proven effective in this endeavor. So let's get started!

Prerequisites
Before we dive into the code, make sure you have the following prerequisites installed: Python 3.x, TensorFlow (2.x), Keras, OpenCV, NumPy, and Matplotlib. You can install these libraries using pip if you haven't already:

pip install tensorflow opencv-python numpy matplotlib

Step 1: Data Collection
To create a deepfake detection model, you need a dataset of both real and deepfake videos/images. Various deepfake datasets are available online, such as the Deepfake Detection Challenge (DFDC) dataset on Kaggle, FaceForensics++, or Celeb-DF. Download and organize your dataset into real and fake subdirectories.

Step 2: Data Preprocessing
Load and preprocess your dataset. Resize images to a common size, typically 224x224 pixels, and normalize pixel values to the range [0, 1]. You can use Keras' ImageDataGenerator to facilitate this process.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Define data augmentation and preprocessing
datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=20,
    width_shift_range=0.2,
    height_shift_range=0.2,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
    validation_split=0.2  # split data into training and validation
)

# Load and preprocess data
train_generator = datagen.flow_from_directory(
    'data/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    subset='training'
)

validation_generator = datagen.flow_from_directory(
    'data/train',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary',
    subset='validation'
)

(Note: recent tf.keras versions of EfficientNet perform their own input rescaling inside the model, so if accuracy stays near chance, try dropping the rescale=1./255 argument; the exact behavior depends on your TensorFlow version.)

Step 3: Build the Deepfake Detection Model
EfficientNet is an efficient convolutional neural network architecture that performs well on various computer vision tasks.
We'll use the pre-trained EfficientNetB0 model and fine-tune it for deepfake detection.

from tensorflow.keras.applications import EfficientNetB0
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model

# Load pre-trained EfficientNetB0
base_model = EfficientNetB0(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

# Add custom classification head
x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1024, activation='relu')(x)
predictions = Dense(1, activation='sigmoid')(x)

model = Model(inputs=base_model.input, outputs=predictions)

# Freeze pre-trained layers
for layer in base_model.layers:
    layer.trainable = False

# Compile the model
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

Step 4: Train the Model
Train the deepfake detection model using the training and validation data generators created earlier.

history = model.fit(
    train_generator,
    steps_per_epoch=len(train_generator),
    epochs=10,
    validation_data=validation_generator,
    validation_steps=len(validation_generator)
)

Step 5: Evaluate the Model
Evaluate the model's performance on a separate test dataset or real-world data to assess its deepfake detection accuracy.

test_generator = datagen.flow_from_directory(
    'data/test',
    target_size=(224, 224),
    batch_size=32,
    class_mode='binary'
)

loss, accuracy = model.evaluate(test_generator)
print(f"Test loss: {loss:.4f}")
print(f"Test accuracy: {accuracy*100:.2f}%")

Conclusion
This article demonstrated how to build a deepfake detection model using EfficientNet. Detecting deepfakes is a challenging task, and while this is a simplified example, it can serve as a starting point for more sophisticated deepfake detection systems. Remember that the effectiveness of such a model depends on the quality and diversity of your dataset, and continuous training and evaluation are essential to stay ahead of evolving deepfake technology.
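The tutorial above only scores individual images, while the assignment targets videos. As a small extension (our own addition, not part of the original article), the sketch below samples frames from a video file with OpenCV, scores each frame with the trained model, and averages the per-frame fake probabilities; "sample.mp4" is a placeholder path, and in practice one would also crop faces before prediction, as the detection papers discussed earlier do.

import cv2
import numpy as np

def predict_video(path, model, n_frames=30, size=(224, 224)):
    # Sample n_frames evenly spaced frames, preprocess them as in training,
    # and return the mean predicted probability that the video is fake.
    cap = cv2.VideoCapture(path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    scores = []
    for idx in np.linspace(0, max(total - 1, 0), n_frames, dtype=int):
        cap.set(cv2.CAP_PROP_POS_FRAMES, int(idx))
        ok, frame = cap.read()
        if not ok:
            continue
        frame = cv2.cvtColor(cv2.resize(frame, size), cv2.COLOR_BGR2RGB) / 255.0
        scores.append(float(model.predict(frame[np.newaxis], verbose=0)[0][0]))
    cap.release()
    return float(np.mean(scores)) if scores else 0.5  # 0.5 = undecided

# Example: flag a clip as FAKE if the average fake probability exceeds 0.5.
p_fake = predict_video("sample.mp4", model)
print("FAKE" if p_fake > 0.5 else "REAL", f"(p_fake = {p_fake:.3f})")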