So, it's nice to see a nice crowd after two years of pandemic. You're beautiful.

Today we're going to talk about similarity detection and how we use it for integrity, as a way to ensure that the website is a safe place, that it maintains its integrity.

The outline of the presentation is as follows. We're going to outline the problem, then how we use automation and similarity detection to achieve what we want; the current technology that we use for images, which is vector search; then we'll discuss in depth the actual technology, the vector embedding, that makes it possible to transform a picture into an element of search; the platform offering that Meta provides to allow other people to crowdsource their findings into a centralized place; and last but not least, what we have that is open and free, which you can deploy on your own site to benefit from all these technological findings.

So the problem is that any big platform bears the responsibility to ensure it's a safe place to surf. No matter what the law says, you have to make sure that, whatever a user posts, you are ultimately responsible for ensuring that everybody is not exposed to things that violate your community guidelines.

Meta has almost three billion users, a sizable share of the world's population. And although the vast majority of our users follow the rules, some fringe bad actors will always be present. And at that scale, fringe means tens of millions of bad actors creating a lot of problems. And when I say issues, problems, I mean child exploitation imagery; non-consensual intimate imagery, which is a way to say revenge porn; adult sexual exploitation, people forced to perform sexual acts in front of a camera against their will; terrorism; violence; and so on.

Just to give you a couple of numbers, Meta publishes a quarterly transparency report about what we do to ensure the platform stays safe. In the second quarter of 2022, we took down 38 million pieces of adult sexual exploitation content. And that's just this one category; child exploitation is not as large, thank God, but there are also others, like violence and terrorism. That accounted for about 0.04% of viewed content worldwide. And in case you were asking, 97% of this content was proactively taken down, before people could even see it. The remaining 2.8% came from user reports, like "I found this."
And we take that down too, and we also add it to the data banks, just to make sure we don't forget about it. Sometimes there are false positives, because that's just unavoidable, and half a million pieces were restored upon user appeal. We restore mostly accounts, and the pictures they were banned for.

It goes without saying that the sheer volume of content, the huge scale of the problem we are facing, requires both automation and human review, to ensure both accuracy and consistency. It would be a problem if we had a million people clicking and making decisions, where what is violating for one is not for the other, and vice versa. And we cannot employ automation alone either, because otherwise we would have this very powerful system banning everybody, including innocent users.

So, the role of automation and similarity detection. The thing is that a lot of what happens online is repeated; these are things that have already occurred in the past. People post a picture of some mass shooting, for example Buffalo or Christchurch, it gets taken down, and ten more accounts spawn and post the same thing. So it's very efficient to reason in terms of: let's just redo the things that we already found out worked.

We employ automation to handle the scale of the problem, of course, and to consistently repeat a decision that a human reviewer has already vetted in the past. So we tie a piece of violating content to a decision ("let's act upon this"), and we tie the decision to actions: let's just repeat this action every time we meet a piece of content that triggers the same decision. We do that for videos, for pictures, and also for text. Today we'll mostly be talking about images, because the techniques for videos and pictures are very similar; text has a completely different array of techniques that we will not be presenting today.

So, if you want to achieve similarity detection, you first have to come up with a way to measure similarity. How do we compare two pictures? Of course, we are not going to do a pixel-by-pixel comparison; we want to be much faster. One way would be: let's just MD5-hash or SHA1-hash all the pictures and store the hashes in an indexing system. Whenever a new picture comes in, we recompute the hash, and if it matches, we ban, right?
Well, that doesn't work very well, because cryptographic hashes are not resistant to resizing, rotation, or one-pixel alterations: the hash changes altogether. Instead, we can really benefit from locality-sensitive hashing, because it allows for similarity measurement: you change one small portion of the image slightly, and the hash changes a little, but not completely. Then you can reason in terms of distance between two hashes. So you have to find a way to turn an image into a vector, and then you perform a vector search. Whenever two vectors are very close, within a certain threshold, it's probably a match.

And just in case you're asking, this is the basic architecture. More or less all the architectures share these four stages. Observation: an image has been generated, usually a push event like a user uploading something. Then the representation phase, in which you hash the image into a compact representation. If you're indexing, you store that into your index; if instead you are at inference time, like an event where someone uploaded something, you search the index you have built with these representations. In case you have a match, you act: you decide what to do with the match you got. Usually the idea is: this is very close to an image that I already saw in the past, which was banned, and the account was taken down too; do the same to this user.

So first, for indexing pieces of content, Facebook has released a library called FAISS, the Facebook AI Similarity Search library. It does similarity search over collections of dense vectors, vectors of floats or integers, for example. You can think of it as a C++ version of Lucene: you index stuff, it puts it in a very big space, and you can search in this space very fast. It supports CUDA, so you can use your GPUs to search. It's basically an index on steroids; it's C++ but has Python bindings available, and it scales almost linearly. You can really index 100 million pieces on a single machine, and it handles them without saturating all the memory, so it has very good optimization properties that make it a very good tool, and you can go and download it on GitHub.

Today we are mostly referring to perceptual hashing. This means that we are reasoning in terms of colors and shapes in the image; we are not reasoning about what's happening inside the image. That would be semantic hashing, which we are not going to talk about today.
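To make the indexing and inference stages above concrete, here is a minimal FAISS sketch. The vectors are random placeholders standing in for whatever representation your hashing step produces, and the threshold value is an illustrative assumption:

```python
import faiss          # Facebook AI Similarity Search
import numpy as np

d = 256                                    # dimensionality of the image representation
index = faiss.IndexFlatL2(d)               # exact L2 index; FAISS also offers approximate ones

# Indexing phase: store the representations of previously actioned images.
known_hashes = np.random.rand(100_000, d).astype("float32")   # placeholder vectors
index.add(known_hashes)

# Inference phase: a user uploads something, we hash it and search the index.
query = np.random.rand(1, d).astype("float32")                # placeholder query vector
distances, ids = index.search(query, 5)                       # 5 nearest neighbours

THRESHOLD = 0.1   # hypothetical distance threshold, tuned per hash type
if distances[0][0] < THRESHOLD:
    print(f"probable match with banked item {ids[0][0]}, route it to the action step")
```

The same two-phase pattern, add banked representations once and then search each new upload, is what all the hashing schemes described below plug into.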
Perceptual hashing just captures visual similarity, and it's very nice for our use case because that is exactly the job we need done.

You might think that we're all talking about machine learning systems that come up with very clever representations of our pictures, and you may be asking: do we really need a convnet for that? Do we really need to employ GPUs? I already said FAISS runs on CUDA, so perhaps that's a hint, but absolutely not. Most of this technology is hashing technology: it just computes a mathematical transformation over the image, and it's really fast, really cheap, and can be executed almost anywhere.

A little bit of history. The first very notable example comes from a source nobody would have expected: Microsoft, in 2009. Microsoft invented PhotoDNA. PhotoDNA was the first algorithm employed in the fight against exploitative images of children. It transforms a picture into a hash of 144 unsigned 8-bit integers. It's proprietary: Microsoft licenses it to any non-profit or organization that wants to fight exploitative images of children. It gives you a license, you can use it for that and nothing else, and I cannot disclose the details of how it works. It can be used only for that, but Microsoft donated PhotoDNA to the National Center for Missing and Exploited Children, NCMEC, the American non-profit that basically acts as a coordination center in the global fight against this phenomenon and shares the library with anyone who wants to integrate it. I cannot talk about how it works; this is the only moment in which I will say something like that.

But we can talk about an open-source counterpart. Almost ten years later, Facebook released PDQ. PDQ stands for Perceptual hashing using a Discrete cosine transform, with a Quality metric. It's a very, very bad acronym, but we needed a three-letter acronym, so that's it. It produces a 256-bit hash and uses Hamming distance to compare hashes. It's really fast: the compute overhead is negligible compared to decoding the image. It can tolerate some level of adversariality; this means that if you change the image because you want to fool the system into thinking it's not something well-known, PDQ can resist some of these manipulations, but not all of them. It's used in stopncii.org.
It's a website where, in case you have a fight with your ex-fiancé and he's threatening to publish your intimate imagery, you go to stopncii.org and upload your intimate images; fingerprints are taken, the original images are deleted right away of course, and these fingerprints are shared with partner platforms that say: if I see these fingerprints on my website, my platform, I'm going to take them down. So it's a crowdsourced effort, and it uses PDQ for images.

How does it work? PDQ hashing goes like this. Optionally, you scale the image down to a square. Then you compute the luminance. The idea of luminance is that, for each pixel, you take the weighted contribution of the RGB channels; instead of converting to plain black and white, you use the luminance. It's just another procedure, and the idea is that luminance gives you better information about which channel contributed most to the light in that place.

Then you downsample to 64 by 64 using a blur filter. The idea of the blur filter, or tent filter, is that it keeps the most significant value in each region, because if you keep convolving a pixel with its neighborhood, what you end up with is the dominant value. So you obtain a representation which is compact and retains the most significant information.

Then you divide the image into 16 by 16 boxes, each 4 by 4 pixels, and you calculate a discrete cosine transform of each box. On the slide, the grid of wobbly patterns is the discrete cosine transform basis. The idea is that any image, any signal, can be represented as a sum of cosine signals. You only take the most significant ones, so it's actually a form of compression, and you keep the most significant coefficients for the largest cosines you have.

Then, for each coefficient, if it's above the median it becomes a one, otherwise a zero. So you get an array of 256 zeros and ones, depending on whether each position had high or low luminance. The DCT provides a spectral hashing property: it identifies which points of the image contribute more or less. You have a hashing space of 2 to the power of 128, because by construction half of the bits are always 0 and half are always 1.

To search, you just do a vector search again over what you've just created.

If we want, we can partially reuse the same technology to do video hashing; this comes from almost the same paper.
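Before moving to video, here is a loose sketch of the image pipeline just described, assuming a NumPy RGB array as input. This is an illustration of the idea, not the reference PDQ implementation, which differs in the filtering and DCT details:

```python
import numpy as np
from scipy.fft import dct

def toy_perceptual_hash(rgb: np.ndarray) -> np.ndarray:
    """Toy PDQ-like hash: luminance -> 64x64 downsample -> DCT -> 256 bits.
    Assumes an (H, W, 3) array with H and W at least 64."""
    # 1. Luminance: weight the RGB channels by how much they contribute to perceived light.
    luma = 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]

    # 2. Downsample to 64x64 by block averaging (a crude stand-in for the tent/blur filter).
    h, w = luma.shape
    bh, bw = h // 64, w // 64
    small = luma[: bh * 64, : bw * 64].reshape(64, bh, 64, bw).mean(axis=(1, 3))

    # 3. Keep a low-frequency 16x16 block of the 2D discrete cosine transform.
    freq = dct(dct(small, axis=0, norm="ortho"), axis=1, norm="ortho")[:16, :16]

    # 4. Threshold against the median: 256 bits, roughly half zeros and half ones.
    return (freq > np.median(freq)).astype(np.uint8).ravel()

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    """Number of differing bits; a small distance means visually similar images."""
    return int(np.count_nonzero(a != b))
```

Two near-duplicate images then match when the Hamming distance between their hashes falls below a tuned threshold, on the order of a few tens of bits out of 256.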
TMK, the Temporal Matching Kernel, is a way to use the PDQ construction as a video similarity detection algorithm. It produces fixed-length video hashes, so your hash stays the same size, which is something like 256 kilobytes if I'm not wrong, whether your video lasts three hours or thirty seconds. It just produces a fixed length, so it's really nice.

What you do is resample the video to 15 frames per second, then compute PDQ without the 0/1 quantization, so you keep the float values; that's why it's called PDQF, PDQ float. Then you compute averages of all the descriptors, weighted with various periods of cosine and sine. Why do we add the cosine curves? Because a cosine or a sine adds this wobbly modulation that tells you whether a frame comes earlier or later within its near neighborhood of frames. If you have ten pictures and you add this cosine signal, you know this picture comes before that one, because you see the cosine curve going up and down; adding a cosine is a nice way to get a unique, fingerprint-like time signature. You compute the averages of the PDQF of all the frames, with various periods of sine and cosine, then you pack them all together, and those five or six averages are your TMK/PDQF embedding.

Matching works like this: you first compare vector zero, which is the plain average of all the frames and carries no temporal signature; then, if there is a match, you also compare all the other vectors at the different periods, which act as a second-level check on the time signature. So you can really be sure the videos are the same, because if you find the same averages with the same periods, it must be the same video. It's nice that it's resistant to resampling, because you always resample first: if you vary the frame rate, the file changes and an MD5 hash changes, but this one is not fooled. Hashing is really slow, because you have to transcode the whole video first and then read all the frames and compute PDQ for every frame. But search is actually very fast.

Another nice hashing technique we have is video MD5. I said earlier that we would not be using cryptographic hashes; well, we do use them, but just for videos. That's because taking the MD5 of a video to find exact copies is really cheap, and a lot of bad actors just repost unmodified content. They don't really go through the hassle of re-encoding just to try to fool the systems.
They just try to post it again. So MD5 actually works, and it can even be done with vector search if we use the bytes of the MD5 hash as the vector. It's also used widely in stopncii.org.

In 2022, Facebook released videoPDQ, which is a different algorithm from the former one. Hashing consists of hashing every frame to a PDQ hash and just packing the list. It's much bigger, and it's not slower than the other one, but it has a nice property: we just have to search for individual frames. So we treat the problem with a bag-of-words approach: we put all these frame hashes inside an index library, then we search, take all the candidates, and do a pairwise comparison. If the pairwise comparison succeeds beyond a certain threshold, it's a match. And this you also get for free; it's released along with PDQ and with TMK/PDQF. All of this is available on Facebook's GitHub repositories.

What do you do once you have all these hashes? Your platform is computing them, and it's the first time you see this content, but perhaps other actors have already seen it too. Well, you upload the hashes to the ThreatExchange platform. NCMEC shares the PhotoDNA hashes, as I told you, with all the companies that ask for them, so you can ask: can you please tell me whether this picture that someone uploaded matches something in NCMEC's data? Then I already know this is something for which I should call law enforcement. Meta does the equivalent, but for PDQ, because PDQ has much less friction to adopt than PhotoDNA. There's a team, Internet Safety Engineering, that builds and operates these services, where anyone can upload fingerprints, so you can crowdsource a big graph of matches. There's a REST API to access and post new data, there are multi-language clients, it uses PDQ, and users can also download the data: you are not forced to stay online, stay connected; you can just request a dump of the database and search it locally. You'll find all the data and all the APIs on the GitHub page.

In 2020, Facebook also released its most advanced algorithm for spotting similar images, SimSearchNet++. This is a neural network, and it is capable of facing adversarial manipulations that the other embeddings simply cannot handle. Unfortunately, SimSearchNet++ is proprietary, so I cannot really talk about it, but we have a cousin product, SSCD, SimSearch Copy Detection, which is open source and free, so I can talk about that.
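Before getting to SSCD, a quick aside on the videoPDQ matching just described: below is a minimal sketch of the bag-of-frames idea. The helper names and the thresholds are hypothetical placeholders, and Facebook's actual videoPDQ implementation is more involved; this just illustrates the frame-level search plus pairwise comparison.

```python
from typing import Dict, List
import numpy as np

FRAME_DISTANCE = 31   # hypothetical per-frame Hamming threshold
MATCH_RATIO = 0.8     # hypothetical fraction of query frames that must match

def hamming(a: np.ndarray, b: np.ndarray) -> int:
    return int(np.count_nonzero(a != b))

def video_matches(query_frames: List[np.ndarray],
                  banked_videos: Dict[str, List[np.ndarray]]) -> List[str]:
    """Return ids of banked videos that contain (most of) the query frames.
    Each frame is represented by a 256-bit perceptual hash (a 0/1 array)."""
    matches = []
    for video_id, frames in banked_videos.items():
        # Pairwise comparison: count how many query frames find a close banked frame.
        hits = sum(
            any(hamming(q, f) <= FRAME_DISTANCE for f in frames)
            for q in query_frames
        )
        if hits / max(len(query_frames), 1) >= MATCH_RATIO:
            matches.append(video_id)
    return matches
```

In a real deployment the banked frame hashes would sit in an index such as FAISS, and only the candidate videos returned by the index would go through this pairwise comparison.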
SSCD is somewhat related to SimSearchNet++ in some of its technological principles, so it gives a good picture of the approach. It's a PyTorch-based model.

The problem this state-of-the-art work tries to solve is: what happens if I take a picture and put a caption on it, altering many pixels all over the image? A PDQ or PhotoDNA hash would change dramatically. Is there anything we can do to teach a computer to just ignore the captions, the rotations, the jitter, the cropping of the image? Yes, there is. A person is able to do that, so we can teach a computer to do it too. Models and code are available; what is not available is the training data that was used to create the model, of course.

For those of you who are into deep learning, it's a ResNet-50 convolutional neural network, and the novelty of the approach is that it's based on R-MAC vocabularies, regional maximum activations of convolutions. How many of you know how a convolutional network works? Raise your hand. Okay, fine, very good. For the others: it's a network that looks at the image, at portions of the image. Each neuron looks at a different portion and then passes what it has understood to a higher-level series of neurons, higher and higher, until the last layer of neurons has a very wide overview of the whole picture.

In this case, we use the maximum activation across all the channels we have. We take note of which regions of our feature maps, across all the different channels, have the maximum activation. If you have ten channels and a region has maximum activation across all of them, that means that area is an area of interest. So we use these areas of interest like words in a vocabulary. It's exactly like doing cosine similarity search over documents: you take all the words, you index them, you say this document has these words, so it's like a vector of words, and then you look for the vectors that have the most words in common and put them in the same place. We do the same thing, but for portions of the image. That's why we use R-MAC.

The idea is that it's also a self-supervised system. It's trained to recognize augmented input: it's trained to match an input to its augmented version. So what we do is take the training set and apply a lot of augmentations: we add captions, we rotate, we flip, we alter the colors, all at random.
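As a minimal sketch of this "match an image to its augmented copy" idea, the snippet below uses an off-the-shelf ImageNet ResNet-50 trunk from torchvision as a stand-in for the trained SSCD model; the file name, the particular augmentations, and the trunk itself are illustrative assumptions, not the actual SSCD pipeline:

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Stand-in trunk: an ImageNet ResNet-50 with the classifier removed.
# A trained copy-detection checkpoint would be loaded here instead.
trunk = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
trunk.fc = torch.nn.Identity()
trunk.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

# A crude "manipulation": flip the image and jitter its colors.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(p=1.0),
    transforms.ColorJitter(brightness=0.4, contrast=0.4),
])

@torch.no_grad()
def embed(img: Image.Image) -> torch.Tensor:
    """L2-normalized descriptor of an image; copies should land close together."""
    x = preprocess(img).unsqueeze(0)
    return F.normalize(trunk(x), dim=1)

original = Image.open("example.jpg").convert("RGB")   # hypothetical input file
e1, e2 = embed(original), embed(augment(original))
print("cosine similarity, original vs augmented copy:", float((e1 * e2).sum()))
```

With a properly trained copy-detection model, the two descriptors stay close even under captions, crops, and jitter, which is exactly the property being trained for here.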
For example, if you brighten an image by one level, adding one to every pixel, you are altering all the pixels. But a PDQ hash can handle that: it's a very weak form of adversarial attack, because PDQ essentially computes differences between regions, so it's not going to be fooled. But you can be much more aggressive and put a splash of color somewhere, and PDQ will be fooled by that.

Then you run the image through the CNN and do a thing called GeM pooling, generalized mean pooling, a generalization of average pooling, in case you were wondering. And at the end you use an entropy-oriented loss function. This means we want to encourage the network to spread the representations of the training data across the whole space, because we want to maximize the distance between the examples in the training set. So you get a nice, uniform search space.

At inference time you do the same thing with the CNN and obtain a vector, which is the representation of an image, and the idea is that you can compute a distance against your dataset of reference images. You can also subtract a background dataset, generally the one used to augment the images, but the point is that, in the end, the score of the augmented image is almost the same as that of the non-augmented version, because the network simply learns to ignore the parts that are not organic to the image.

SSCD is freely available; you can download it and start playing. You'll find both code and models, as I already said, but not the training data. And by the way, Facebook has also announced an image similarity challenge: you have to determine whether a query image is a modified copy of any image in a reference corpus of one million. It's very similar to the Netflix recommendation challenge, when you had to recommend movies and beat Netflix's algorithm. That's the image similarity challenge, and there's also the Meta AI video similarity challenge, which has two tracks: generate a useful vector representation for a video, and find a reference video inside a very big corpus. And you don't have to find only a whole video: you have to find a clip, a sub-portion of a video, inside a very big corpus.

And last but not least, since the last course of a dinner is the tastiest one, we have a turnkey open-source solution that you can install on your own premises: the Hasher-Matcher-Actioner.
HMA is an open-source, turnkey safety solution. You just download it, install it, and it starts working right away. What it does is scan the images that you push to it. It has an index that is updated with all the hashes coming from ThreatExchange, but also with yours, and it is able to tie banks of hashes to different verticals of violations. You might have non-severe violations and very severe violations, and you might decide that for a non-severe violation you just delete the content and send a warning, while for a high-severity violation you immediately delete the content, shut down the account of the poster, and also report it to law enforcement. You can do that: you can configure actions in a backend, tied to the content that you have banked in your HMA platform. You can pull violating seeds from the Facebook ThreatExchange API. It works on AWS only, because we wanted to make something very easy to use and also something that doesn't make your bill higher. It's built on AWS Lambda, so it doesn't cost anything until it runs; then it spawns a Lambda instance, goes down again, and you only pay for the seconds it actually runs. But it's very fast. And there's a Terraform module available, thanks to the lovely folks of Internet Safety Engineering.

This is how you deploy it: in your infrastructure, you collocate HMA next to your platform. For example, you might own a platform where people chat or post pictures. Whenever new content comes in, the web server asks the hasher: have you seen this? The hasher goes to the matcher, the matcher goes to the index and asks: do I know this? And in case there's a match, the actioner module calls your platform: you have to define a callback API on your side, so that whenever the actioner calls, you kill this content in your own backend. And of course you can fetch new content from the external ThreatExchange API.

So, wrapping up: automation is necessary to be effective, but you will lose precision, of course, because automation doesn't really think; it just does whatever you have configured, blindly. Human support is always needed, for appeals and also to establish the ground truth: what is actually violating and what is not. Do expect false positives, because they will happen, and you should put in place an appeal process to allow your users to restore their content.
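As a sketch of what that action wiring and appeal handling can look like on the platform side, here is a minimal example; all the names are hypothetical, and this is not the actual HMA callback contract, just the soft-delete / restore pattern described above:

```python
from dataclasses import dataclass

# In-memory stand-in for the platform's content store.
CONTENT_VISIBLE = {"post-123": True}

@dataclass
class Match:
    content_id: str
    bank: str        # which bank of hashes matched, e.g. a hypothetical "demo-bank"
    severity: str    # "low" or "high", as configured per bank on the platform

def soft_delete(content_id: str) -> None:
    """Hide the content but keep it, so an appeal can restore it later."""
    CONTENT_VISIBLE[content_id] = False

def undelete(content_id: str) -> None:
    CONTENT_VISIBLE[content_id] = True

def on_match(match: Match) -> None:
    """Callback the matcher would invoke; routes severity to configured actions."""
    soft_delete(match.content_id)
    if match.severity == "high":
        print(f"{match.content_id}: escalate to human review and further action")
    else:
        print(f"{match.content_id}: send a warning to the poster")

def on_appeal_upheld(content_id: str) -> None:
    """A reviewer agreed the match was a false positive: restore the content."""
    undelete(content_id)

on_match(Match("post-123", bank="demo-bank", severity="low"))
on_appeal_upheld("post-123")
```

The important property is that a match only hides content; permanent deletion is deferred, so false positives can be undone through the appeal path.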
PDQ, videoPDQ, MD5 and SSCD provide you with ways to obtain compact representations of high-dimensional content like pictures and videos. HMA provides you with a turnkey solution that you can install on your own premises to search and enforce your integrity policies on your platform. And ThreatExchange provides you with a platform for exchanging representations with other big actors, like Meta itself, for example.

That was all from me. Thank you very much for listening. Any questions?

You mentioned, for the challenge, finding a clip of a video. Can PDQ actually do that?

Oh, louder, please.

So, can PDQ find clips of videos? That's my question, actually.

So the question is whether PDQ can find clips of videos; you say that, perhaps, you've heard YouTube already does something like that, and the challenge is about finding clips of videos. Yeah, in general it's possible, of course. The videoPDQ algorithm hashes every frame, so in case you send a very small sub-portion of a video, you will have, say, 100 frames, and these 100 frames are treated as a bag of words. You search the index, you find the video that contains all of these "words", so you get a match of all your query frames inside the index against the very long video that contains them, and so it's a match. That's how we do it. Of course, there are more clever ways to do it. Thanks.

Hello. Not a technical question, but let's see. I was thinking that if you're using such a system to try to prevent digital crimes and things like that, from an ethical perspective, I was just wondering: I suppose you have such images in order to compare them. How do you process them, how do you make the decisions?

So, I repeat the question. From the ethical perspective, the idea is that, of course, we have to see the images in order to know what's happening, right?

Yeah, see them and, of course, save them and, I don't know, process them; how do you handle this?

This is not the kind of question I can really answer, because it relates to internal procedures. Now, if we have to compute the fingerprint of an image, there must be a moment in which the image is on our surface. But agencies like NCMEC share hashes.
So you might have a hash for which you don't have a picture, and you have to trust that this hash comes from a trusted source that has already vetted whether it corresponds to nasty stuff or not. That's how we actually avoid heavily sanctioning innocent people: there is a collaboration with trusted entities for this. And when you receive those hashes from an external agent, if the corresponding images are on your platform, you already know what you have seen. Thank you.

Can you hear me despite the mask? Can you hear me? Thank you. So I have a question, but first a thanks, because I have worked on this kind of thing, and NCMEC doesn't share any useful data, the IWF doesn't share any useful data, Pharos doesn't share any useful data. So I will definitely take a look at the ThreatExchange platform and hope that it's much more useful. Thanks for that. But I have a question anyway. If I were an attacker, I could download data from the ThreatExchange platform and automatically run as many filters as I like until I find something that is not matched by PDQ, videoPDQ, et cetera. What's the way to counter that?

Oh, you're asking whether adversarial attacks are possible on PDQ? Yeah, of course. PDQ is a very naive algorithm that just detects patches of colors, so it is actually possible to create adversarial attacks. If you alter many pixels in the image, perceptually, for us, nothing changes, but you might end up changing the parts that matter most for the DCT, which will produce a completely different hash in the end. Someone has also demonstrated a reverse-engineering attack on PhotoDNA: there's a project called Ribosome, a neural network that reconstructs a very blurry picture from a hash. So it is actually possible, but PDQ is a very simple and fast algorithm. If you really want to seriously combat adversarial engineering, you need neural networks like SSCD, because they capture so many relations between different parts of the image that they are much harder to fool. I'm not saying it's impossible, because of course it's possible; sooner or later someone will find a way. But it's the usual arms race between attackers and defenders, and this is no exception. Thank you for your question.

Hello. Hi. First, thank you for the presentation; I think it's a very interesting topic.
I wanted to link this to something that has been a bit of a buzz in the past few weeks, generative AI, especially ChatGPT. When you use that kind of algorithm and you scan an image and detect something, is there a level of confidence attached to the result, and can you detect when an image is potentially a fake?

There's a lot of echo, so I cannot really hear; can you speak louder, please? It's hard to understand from here.

Hello. Okay. Is it better? Okay. Yeah, so I said thank you, but I wanted to link this to generative AI. When you run that kind of algorithm to detect violence or child abuse or anything else, can you also attach a level of confidence to the response, to define whether it's potentially a fake picture, or is there an extension to the algorithm that links it with generative AI?

I'm not sure about the answer, sorry. We can go for a beer and discuss it in more detail, and let's see.

Yeah, you have a question. Hi, thank you for the talk, it was very interesting. One more question: do you run SSCD in production as well, the deep learning network?

Whether we use SSCD in production, I can't reply to that. We use SimSearchNet++; I can say this because we have written a blog post about it, so I can confirm that we use SimSearchNet++. I can neither confirm nor deny anything about SSCD, but those are related technologies, so I could talk about them.

What does the production stack for SimSearchNet++ look like? How do you serve it? It must be pretty hard to deal with the GPUs.

This is not a question I can answer, I'm sorry; I cannot talk about the production setups. I'm sorry. Okay, any question nearby? Thank you. Of course, you can imagine that we do not operate in a vacuum, so if you think about how we might serve results from a neural network, it is perhaps similar to what you would do if you had to put a model behind an API.

So I kind of have two questions. The first question is, to what extent do... I think there are potentially two problems: intentional mismatches and unintentional mismatches.
So, situations where perhaps an image has been recompressed or cropped, or is perhaps another image of the same situation, versus situations where people have deliberately deformed the image to try to get around these kinds of systems. Do you have any idea of how performant it is against those two scenarios, accidental or unintentional mismatches versus intentional attempts to avoid it?

So, it is of course possible to have unintentional mismatches, and I have seen images that were adversarially engineered to give the same embedding. Those are absolutely possible, again, with PDQ, PhotoDNA and all the perceptual hashes, which are just mathematical transformations: you just have to find a way to make the input look the same to the algorithm. For the neural network approaches, it depends. You can study the code, you can study how it's done, if you can; it is absolutely possible sooner or later, because adversarial attacks on convnets are a reality. I have seen some mismatches, but usually between two perceptual hashes. Usually, the more refined the technique, the harder it is to attack; otherwise we would just stay with MD5, because it would be enough. Crops: PDQ is resistant to crops, SSCD is very resistant to crops. For rotations, I believe PDQ is also resistant to rotations, like flips, but you cannot ask much more than that.

Other questions? Yeah.

Do you have any information about the speed difference between SSCD and PDQ?

So the question is whether I have speed benchmarks for the difference in performance between PDQ and SSCD at inference time. PDQ is faster than the time it takes to read the image from disk, so it's negligible; it just computes a mathematical transformation on the pixels. The neural network requires dedicated hardware: if you run it on CPU it will take seconds, also because the model is big enough. It's not as big as GPT, but it's a 50-layer convnet, a ResNet-50. So it's of course slower and requires dedicated hardware, but it's more precise: SSCD finds anything that PDQ is able to find, and much more. So if you are very conscientious, if you say, I have to scan this stuff just to make sure it doesn't come from an ill source, you might want to set up an async process that takes longer but batch-processes all your content. If you need something super fast, PDQ will not really weigh on your server. Thank you.
Any other questions? Hi. First of all, great question from my former colleague David, I think, down there. Not even looking this way. But: what happens if you get a false positive match? How do you disregard it in the future without potentially disregarding a real match?

So, if we get a false positive match, how do we restore? You mean at Meta, or in general?

Just anywhere, as a concept.

For Meta, I cannot really say. With the Hasher-Matcher-Actioner, you should provide a capability in your own platform for soft-deleting the image: you have to expose an API in your platform that HMA will call, where it says, soft-delete this picture. So you make it unavailable, but you do not really delete it, in case the user wants to appeal. So you need to provide soft delete and undelete. This is the simplest and most effective way to deal with a false positive: whoops, I made a mistake, I want to restore the content.

Sure, but if you have an image that someone wants to upload, say a popular image that a lot of people are going to upload, but it matches the pattern of another bad image, is there a good way to make a more precise hash and exclude it, to say this one is a false positive, it doesn't match what you think it does, so you don't have to keep undoing the action?

Okay, so, partly: if the image is popular, so we have many examples of an image which is not bad, and then a bad image comes along, can we use the fact that it's very widespread to improve our precision? Is that the question? Well, really, there's nothing in this presentation that does that, because once you train the network, it's trained, you start serving, and the network will give you the same answers to the same query. PDQ or any other perceptual algorithm is just a mathematical function, so it will not change; there's nothing to train. So to fix a deficiency of your model, you have to retrain. You can do a better retraining, and sometimes models are retrained, like anything that is still under maintenance: we get new data, for example, and we might want to retrain, as with any other model; it's the same for spam filters.

Do we have more room for questions?
I think we're done. Thank you so much, you've been a wonderful audience. Thank you.