Okay, so now we can start.
Thank you very much for coming to the Python Dev Room and getting up early on Sunday morning
with this cool weather outside.
So now we are going to have a very, very nice talk by Pierre Denis, who is a long-time
Python user.
He's also the creator of Liya, and he's going to talk about Liya in this talk.
Liya is a Python module for helping to calculate probabilities on situations presenting uncertainties.
And what that means, I hope he's going to explain to us now.
Thank you.
So welcome everybody.
So we are here about something serious, a sad story.
I'm not a good storyteller.
I'm afraid, but okay, Dr. Black has been killed last night.
Maybe you have heard about that.
And okay, we have three suspects that have been identified with given probability to
be the killer.
And it seems that colonel Mustard is the most likely, is most likely the killer with 40%
to be the killer.
Then we have Mrs. Peacock, 25%.
Mrs. White, 10%, and Professor Plum, 25%.
Okay, then these are prior probabilities, but we have the help of a profiler, a segment,
the profiler.
This guy is very smart.
And he can tell, for example, that if Mrs. White is the killer, she'll be absent for
the investigation with 95%.
Otherwise, if she's innocent, she'll be absent with only a probability of 20%.
And the profiler tells you several statements like this with probabilities.
So when you see this kind of situation, you see, okay, it's quite complex.
How can I use this information?
Because nothing is certain.
Okay.
So the investigator is Leah.
Here, Leah is not a person, as you have understood.
It's a module dedicated to probabilities.
So okay, I have several statements here.
In other presentation, I elaborate on this, but this time I prefer to show you Leah in
action so you can better understand what it is about.
My claim is that Leah is something quite easy to use, quite intuitive.
You know probably that there are several packages dedicated to probability or statistics.
The core feature of Leah is to be easy to understand and probably well suited for education.
Okay.
Let's start.
First, I import Leah, which is here in version 401B.
Anyway, so first of all, I want to define a fair coin with head and tail.
I do that.
So Leah can work with any Python object here.
I define probabilities on strings, but you can define probabilities on numbers on any
Python object.
Okay.
Here for education, I prefer to switch to fractions.
You know that Python has fraction included.
So I've switched the display to have fractions.
So if I want to create a biased coin, I can define several values and here it means that
tails will be three times likely to go that then head.
So I have a new probability distribution.
So what I'm doing here is a crash course on Leah because we want to be acquainted to it
before doing the investigation.
I can also use a probability mass function to define probability in a fraction.
So Matplutlib is integrated so you can display your histogram about any probability distribution.
Okay.
So I want to make 100 throws.
So I use my B coin variable, my probability distribution to calculate to make 100 random
coins, throws.
So you see in this random throws that there is more tail than head.
But okay, how can I be sure that it follows the probability that I given?
Simply you can use Leah, the same function as before, Leah, VALS.
You provide the values and this time it will use the random sample as it will be a frequency
counter and you see that here more or less it conforms to the probability distribution
that I provided for the biased coin.
So what is interesting on this kind of object is that you can use many of what you do usually
on your Python object.
For example, you can index.
If I ask for zero, it will take the first letter, head or tail, H or T.
I can chain with the Python lower method and I have lower case H or T.
I can map Python function here.
This means that it count the number of characters which is four, head and tail, four characters
each.
So we have a certain four.
And as you could expect also, all the operators are overloaded.
So if I concatenate my distribution B coin with fixed string, I have a new distribution
that follows what has been defined.
Okay and here it's something a bit funny.
What happens if you multiply a dice with a coin?
You get that.
Okay.
Let's now throw two coins.
So the new method allows you to define new event with the same probability.
Here I define two coins which are biased together.
If I add them together, I have all the combination possible with associated probabilities.
So we will see that this is very important.
We are able to calculate conditional probability with the given method.
So here I try to see, okay, assuming that I know that the first coin is tail, what
is the combination of the two coins?
So here we see that the previous result has been filtered out to get just the two remaining
possibilities.
So it's a common feature of LIA is that when you define variable, there is a kind of lazy
evaluation.
They remain linked together in a network that define the relationship, the dependencies
between the random variables.
Okay.
And you can also define Boolean events like, okay, what is the probability to be?
I define it at 140 seconds.
And then I can use operator like to be or not to be.
And the result is it's true, it's certain true.
Because okay, to be it's either true or false and not to be it's the contrary.
So together it's certainly true.
Okay.
And there is also a dedicated function in LIA which is P. So you can extract the probability
of true.
So you get really a real probability like this.
Okay.
Let's go on.
So here it's an excerpt of a book that it's three centuries old from Abraham de Moivre.
It's probably one of the first problem solved by de Moivre here.
Okay.
Let's ask to find the probability of throwing a nace in three throws given a fair dive.
This is how to calculate in LIA.
So here I define a dive.
I create three instances which are independent, which are assigned to the 1, 2, 3.
And then I ask for the probability of any of one of these dives is a nace.
The result is 91, 216th as calculated three centuries ago by de Moivre.
So far so good.
Okay.
No, I don't know if you like playing a role-playing game.
So there's a small example that where you can use LIA.
So imagine that you have here this dwarf which fights against a troll.
Okay.
I first define a new kind of display with percentage because it's more convenient here.
I define two different kind of dice.
Okay.
Imagine that your attack hole is d25 plus 4.
Okay.
What is the probability to ever hit?
You see, okay, it's easy to calculate with inequality.
So you have to be greater or equal that the troll armor class.
You get this probability.
So the damage of the magic axe is to d6 plus 5.
Here is the result.
But this damage is only applied if the dwarf can hit the troll.
So for that we have a special construction, LIA if underscore underscore to avoid collision
with the Python if.
And okay, this means if there is a hit, then I apply the magic axe.
Otherwise, the damage is zero.
And here is the new histogram.
So this is the probability, the actual damage that is done to the troll.
And then from this data you can answer the, okay, assuming that the troll has 20 health
points remaining was the probability to kill him in four rounds or less.
You see it's deadly simple with this formula to calculate.
We find it's 40%, something like that.
Okay.
Okay.
You follow?
So I will, I have many, many, many examples.
But by lack of time I will drop maybe some of these examples.
Boys or girls paradox, something very funny also that you can find on Wikipedia.
So the chance to be a boy or girl are even.
So okay, boy, one half, girl, one half.
Mr Smith asked two children, at least one of them is a boy.
What is the probability that both children are boys?
Many people and including myself, the first time I heard this I think, okay, the information
give me no clues.
It's one half.
But if you calculate like this with Leah, so you define children as two, a joint of
two children and that you count the number of boys, calculate the conditional probability,
the answer is one third actually.
And what is interesting with Leah, you can understand why this is the answer by asking
Leah to show you all the combinations.
So here I show you the gender of all the children, the number of boys and given that
the number of boys is greater or equal to one.
And we see here the answer is here and we understand better why it is one third.
Okay.
It's a bit fast but you can do it at your own pace later.
Okay.
What happens if you have more elaborate problem?
Like here we have several children.
The eldest is a boy and he's got three brothers at least.
What is the probability that all children are boys?
Okay.
You can model this like this.
Here I create seven children.
And I put, so you see when you read this expression, it's quite close to the initial problem.
Of course you have to understand the elements of Leah to do that but after that it's quite
easy to model.
The answer is one forty second.
Again it's possible to ask why it is so and here by joining you see that's okay, seven
children is this part and the other are this.
So you can better understand why it is so.
Okay.
I will drop this Monte Hall problem which is well known.
You can read that after the session offline.
Okay.
Let's go back to the initial problem.
So okay.
First I change the display options.
So the, first we define the pure probabilities like that.
So here I ask Leah to display the probability in one line because it's more convenient in
this case and as a percentage.
Okay.
So we have like this and we see, okay, colon and mustard.
Our priority is the killer, the most likely the killer.
Okay.
Let's now try to write down the different information we have.
So if Mrs. White is the killer she'll be absent with probability ninety percent.
So I define here a variable.
Mrs. White is absent using the if as we've seen before.
I put the condition if the killer is Mrs. White then she'll be absent with ninety five
percent else twenty five, twenty percent.
Sorry.
Okay.
This is the percentage that Mrs. White is absent.
But it's not very interesting because we, we, we are more interested about who is the
killer but we will see what will happen later.
And then we can continue and define other rules like this.
If Mrs. Peacock is innocent she knows who's the killer with probability seventy five percent.
So you see here there is a missing information which is the else part but we assume that
Mrs. Peacock is not insane and if she's the killer then she knows who's the killer hopefully.
So I put here the else part as one hundred percent.
Okay.
And then we can elaborate on more complex information like this one.
Okay.
I will not detail but you see again it's quite, when you see the statement, the
tradition in LIA it's quite straightforward.
And the last one is here.
So what have we done here is to define what we call the Bayesian network which put the
relation between different random variables.
What is interesting with this kind of network is that if you get evidence about something
you can go backwards and refine the probability to be the killer.
So for that, okay, I define a list of evidence here.
So first of all it's empty and the conditional probability is the same as before because
I have no new evidence.
So imagine now that Mrs. White is absent.
I can add it to the evidence and define a new conditional probability.
So you see it change a bit.
Evidence two added to the previous one.
Mrs. Peacock is drunk.
Okay.
I add this information and I get new probability and so on.
Professor Plum accuses Colin and Mustard.
And finally we know that the killer is a woman.
So for that I use here the Python start with Mrs.
because it's a handy way to say given the suspects that the killer is a woman, I add
it to the evidence like that and you see, okay, there is a new probability.
So there are just two suspects remaining, two women and Mrs.
White is likely the killer.
Okay.
Yeah.
Maybe you can consider this as a game but sometimes probability can play a very important
role in some trials.
So long time ago there was the Dreyfus Affair.
There was a big flow of a so-called expert that makes a mistake in this affair.
And also more recently, Selik Larch case where also there is a bad reasoning about probability.
Okay.
So I want to mention also that Leah is able to do symbolic calculation.
So by using the SIMPY module that maybe you know, so it's very easy.
It's the same interface.
So instead of number you put variable name between quotes like this and you have probability
defined with formula.
So you can redo all the same exercise and you will get formulas to be the killer, etc.
So a small example here.
Okay.
I don't detail here.
It's a binomial function here with P and here I calculate a conditional probability
and it displays me a nice formula.
So you can check offline if you want that it is correct.
Okay.
I want just to finish about my bullshit generator which was made 15 years ago.
So here the goal is to produce sentences at random based on a list of words and a list
of grammar rules like this.
Then you see that I put here a probability on each grammar rule so that the most simple
rule are used preferably to avoid to get to long sentences.
So yeah.
Okay.
I get...
So it has produced...
I don't know why it's...
Okay.
So maybe I don't know what happens but...
Okay.
I restart my kernel.
Normally it's supposed to speak and to write down sentences but...
Okay.
Anyway.
You can play that also.
The Python code is really small so you can try it at yourself.
Oh yeah.
Okay.
Of course I didn't import LIA.
Okay.
That's it.
But anyway, sorry for the small interruptions but I think we don't have time for questions
or...
Maybe one question.
Maybe one question.
Okay.
Thank you for the presentation.
I have indeed one question which is about performance.
So do you have information about performance, your libraries compared to other libraries
or yeah, what are your insights on that?
Yeah.
It's a good question.
So okay.
It's not really the concern.
So here as you have seen the results are exact.
So okay.
As you have seen also it's quite fast.
So there are several optimizations.
I have no figures but okay.
As you expect there are many problems which are very complex and for that LIA provides
Monte Carlo several Monte Carlo algorithm that gives approximate results in a fair time.
Yeah.
But I have no figures.
Yeah.
Okay.
Thank you.
Thank you very much.
Thank you very much.