doc/en_US.ISO8859-1/captions/2009/asiabsdcon/rao-kernellocking-1.sbv

0:00:00.000,0:00:02.740
My name is Attilio Rao and

0:00:02.740,0:00:05.960
I think that we are in time for the presentation

0:00:05.960,0:00:10.870
I want to apologize for my English, because it's not really British English, but I will

0:00:10.870,0:00:12.480
try to make this


0:00:12.480,0:00:16.359
a little bit uncomfortable

0:00:16.359,0:00:21.300
Better?

0:00:21.300,0:00:24.609
OK. Thank you. So we are going to speak about


0:00:24.609,0:00:28.639
the locking infrastructure in the FreeBSD kernel
which

0:00:28.639,0:00:33.440
is quite an interesting topic because

0:00:33.440,0:00:38.890
it's going to be, with time, very widely discussed on our mailing lists, not only

0:00:38.890,0:00:43.100
from a developer's perspective but also from a user's perspective,

0:00:43.100,0:00:49.470
and we will see why later

0:00:49.470,0:00:52.990
In this presentation we will specifically see what

0:00:52.990,0:00:55.100
was the situation

0:00:55.100,0:00:57.010
of the first

0:00:57.010,0:00:59.150
FreeBSD implementations

0:00:59.150,0:01:01.120
and what changed from that

0:01:01.120,0:01:06.690
specifically in what's called the SMPng era

0:01:06.690,0:01:07.639
and what

0:01:07.639,0:01:10.500
we had prior to that

0:01:10.500,0:01:12.780
 we are going to discuss

0:01:12.780,0:01:13.579
specifically

0:01:13.579,0:01:19.160
the locking primitives that have been introduced over time, up to now

0:01:19.160,0:01:20.910
 and

0:01:20.910,0:01:24.730
problems linked to

0:01:24.730,0:01:27.620
parallelism in general and how we solve them in

0:01:27.620,0:01:30.950
the FreeBSD kernel

0:01:30.950,0:01:36.200
You can see a table of contents, a little bit more detailed,

0:01:36.200,0:01:39.850
listing precisely what we

0:01:39.850,0:01:43.210
some problems like

0:01:43.210,0:01:46.159
Priority Inheritance

0:01:46.159,0:01:53.159
and Adaptive Spinning that we are going to discuss fruitfully.

0:01:53.370,0:01:58.890
Mostly until FreeBSD 4.x

0:01:58.890,0:02:00.830
we had already moved to multitasking,

0:02:00.830,0:02:05.210
so the slide is a little bit confusing, but
a multitasking and preemptive system

0:02:05.210,0:02:06.360
since

0:02:06.360,0:02:10.379
that transition was not very

0:02:10.379,0:02:14.180
was not very difficult to implement in such systems
 because

0:02:14.180,0:02:17.479
if you think about a uniprocessor machine

0:02:17.479,0:02:18.929
you can get that

0:02:18.929,0:02:20.019
well

0:02:20.019,0:02:24.029
the sequential execution was
just

0:02:24.029,0:02:25.699
stopped by

0:02:25.699,0:02:26.309
preemption

0:02:26.309,0:02:29.400
and by arrival of interrupts

0:02:29.400,0:02:33.969
so you just had to protect the consistency of data structures

0:02:33.969,0:02:36.289
against these two issues

0:02:36.289,0:02:37.079
more precisely

0:02:37.079,0:02:39.079
we were handling

0:02:39.079,0:02:41.779
the interrupts and transitions through

0:02:41.779,0:02:43.370
a mechanism

0:02:43.370,0:02:45.779
called SPL

0:02:45.779,0:02:50.769
and for kernel threads, threads running in the kernel we were disabling

0:02:50.769,0:02:51.379
preemption

0:02:51.379,0:02:53.019
in order to avoid

0:02:53.019,0:02:55.809
the corruption of the data structure

0:02:55.809,0:02:57.519
This approach, while it was

0:02:57.519,0:03:00.629
pretty good on uniprocessor machines

0:03:00.629,0:03:02.269
was actually

0:03:02.269,0:03:04.270
impracticable for

0:03:04.270,0:03:06.219
the SMP environments

0:03:06.219,0:03:10.199
more precisely because we had more cores, each of which

0:03:10.199,0:03:12.959
was running one thread at a time

0:03:12.959,0:03:13.909
and so

0:03:13.909,0:03:14.980
parallel

0:03:14.980,0:03:19.309
accesses to the data structures were possible

0:03:19.309,0:03:21.290
in order to

0:03:21.290,0:03:22.469
to avoid

0:03:22.469,0:03:24.149
big problems in the kernel

0:03:24.149,0:03:25.799
we had to just

0:03:25.799,0:03:26.739
allow

0:03:26.739,0:03:28.989
the entering of

0:03:28.989,0:03:32.309
one thread at a time into the kernel

0:03:32.309,0:03:35.379
while that was a pretty good approach

0:03:35.379,0:03:39.049
for workloads that were mostly user space,

0:03:39.049,0:03:40.969
for workloads

0:03:40.969,0:03:45.619
requiring a lot of I/O, for example, it was wasteful because they weren't

0:03:45.619,0:03:47.839
getting any advantage from the new

0:03:47.839,0:03:49.819
SMP architecture

0:03:49.819,0:03:52.749
like the parallelism was basically zero

0:03:52.749,0:03:55.189
at least in the kernel

0:03:55.189,0:03:55.949
in order

0:03:55.949,0:04:00.650
to fix that a new project was created
called SMP

0:04:00.650,0:04:01.470
Next Generation


0:04:01.470,0:04:05.169
or NG

0:04:05.169,0:04:07.309
as you can see it from the slide

0:04:07.309,0:04:10.329
entry into the kernel was protected

0:04:10.329,0:04:12.569
by using a big lock,

0:04:12.569,0:04:19.569
called BKL basically

0:04:23.199,0:04:28.109
With FreeBSD 5.x we had the SMP Next Generation project;

0:04:28.109,0:04:30.110
basically it was

0:04:30.110,0:04:31.509
a sanitization of the

0:04:31.509,0:04:34.539
of all

0:04:34.539,0:04:40.039
our kernel and the re-engineering of a lot of mechanisms inside our kernel. We could see

0:04:40.039,0:04:44.709
FreeBSD 4.x and FreeBSD 5.x as mainly two different kernels

0:04:44.709,0:04:45.550
because

0:04:45.550,0:04:50.150
substantial subsystems were rewritten and

0:04:50.150,0:04:51.830
were written with the

0:04:51.830,0:04:56.949
idea of using and implementing real parallelism in mind.

0:04:56.949,0:05:02.610
We can say that basically it was a major task, a very big task,

0:05:02.610,0:05:04.029
and that it required

0:05:04.029,0:05:06.669
a lot of years to be brought

0:05:06.669,0:05:08.900
into good shape, at least.

0:05:08.900,0:05:11.379
Initially, people made

0:05:11.379,0:05:13.069
a lot of

0:05:13.069,0:05:16.350
complaints about the

0:05:16.350,0:05:20.430
lack of robustness of FreeBSD 5.x, but

0:05:20.430,0:05:22.249
probably that's because they couldn't even

0:05:22.249,0:05:28.929
see that the changes were really really important and really huge

0:05:28.929,0:05:34.490
however, FreeBSD 5.x based this initial SMP system

0:05:34.490,0:05:37.070
on code inherited from BSD/OS,

0:05:37.070,0:05:39.309
which kindly

0:05:39.309,0:05:42.699
released this code

0:05:42.699,0:05:44.009
and the


0:05:44.009,0:05:46.579
the process was broken up into

0:05:46.579,0:05:51.069
some precise tasks, at least initially

0:05:51.069,0:05:55.429
Mainly, the first thing was introducing into the kernel

0:05:55.429,0:06:00.180
a new set of atomic instructions and locking primitives

0:06:00.180,0:06:01.520
Then introducing

0:06:01.520,0:06:05.380
an abstraction called interrupt threads that we are going to discuss

0:06:05.380,0:06:06.929
later, but

0:06:06.929,0:06:12.319
it basically reworked completely the interrupt mechanism that was in FreeBSD 4.x

0:06:14.439,0:06:16.490
the BKL

0:06:16.490,0:06:19.210
lock was moved to a real

0:06:19.210,0:06:20.679
mutex called Giant

0:06:20.679,0:06:23.180
that still exists in our kernel

0:06:23.180,0:06:26.660
and some threading primitives were introduced

0:06:26.660,0:06:28.019
the

0:06:28.019,0:06:30.499
like M:N

0:06:30.499,0:06:32.280
 threading primitives

0:06:32.280,0:06:34.319
called also KSE

0:06:34.319,0:06:37.009
which are actually never used in our kernel

0:06:37.009,0:06:41.620
and that have been taken out in the past year

0:06:41.620,0:06:43.409
and the

0:06:43.409,0:06:45.259
slowly, the porting of

0:06:45.259,0:06:50.459
all the older subsystems to finer-grained locking was started

0:06:50.459,0:06:55.919
I have to say this task is still not completed; it's still going on, but

0:06:55.919,0:06:58.889
we are in really good shape with that

0:06:58.889,0:07:02.429
just a few subsystems remain that are still Giant-protected

0:07:02.429,0:07:05.939
and with the new release that we're going to ship this year, I think that we made

0:07:05.939,0:07:10.220
a very huge step forward in this direction

0:07:12.319,0:07:18.599
really, SMPng has been considered closed since around the end of

0:07:18.599,0:07:20.600
2007

0:07:20.600,0:07:22.579
but the

0:07:22.579,0:07:23.819
the

0:07:23.819,0:07:27.539
the important parts were these initial moves

0:07:27.539,0:07:32.669
Another thing that's not listed here, but that I can tell you, is that

0:07:32.669,0:07:38.279
even though Giant was preventing any initial parallelism,

0:07:38.279,0:07:43.219
a new kernel memory allocator was imported that was

0:07:43.219,0:07:45.009
that I discovered

0:07:45.009,0:07:48.439
and the scheduler was moved under a separate lock

0:07:48.439,0:07:50.449
in order to

0:07:50.449,0:07:52.080
start getting some

0:07:52.080,0:07:54.699
a little bit of concurrency

0:07:54.699,0:07:59.099
a real concurrency

0:07:59.099,0:08:01.520
the

0:08:01.520,0:08:06.280
before speaking about FreeBSD specifics, we can start digging into

0:08:06.280,0:08:08.219
what kind of

0:08:08.219,0:08:12.729
of locking primitives you can find in our kernel.

0:08:12.729,0:08:15.780
from a more historical point of view

0:08:15.780,0:08:19.710
we have some versions of mutex which

0:08:19.710,0:08:20.919
I assume

0:08:20.919,0:08:24.809
people here know about that, but I'm going to give a little explanation

0:08:24.809,0:08:26.449
for people that don't know

0:08:26.449,0:08:28.939
a mutex is basically

0:08:28.939,0:08:30.739
a lock allowing access

0:08:30.739,0:08:36.700
to some protected data to just one thread at a time

0:08:36.700,0:08:38.150
so if a thread

0:08:38.150,0:08:39.690
owns the lock,

0:08:39.690,0:08:40.760
owns the mutex

0:08:40.760,0:08:42.539
other threads

0:08:42.539,0:08:44.039
won't be able to

0:08:44.039,0:08:46.090
to access it until

0:08:46.090,0:08:48.730
this lock is released

0:08:48.730,0:08:50.430
we also offer

0:08:50.430,0:08:54.890
a kind of lock called R/W lock, Read/Write lock,

0:08:54.890,0:08:57.920
which is basically a

0:08:57.920,0:09:03.050
lock that can be acquired in two different modes

0:09:03.050,0:09:04.060
one mode

0:09:04.060,0:09:07.980
is the write mode, which is the same as the mutex: just one thread

0:09:07.980,0:09:10.010
in the protected path at a time

0:09:10.010,0:09:13.860
and the other one is the read mode, which basically

0:09:13.860,0:09:15.100
allows

0:09:15.100,0:09:18.410
all the threads willing to acquire it in read mode to

0:09:18.410,0:09:23.699
concurrently access the structure, but prevents threads from

0:09:23.699,0:09:25.390
writing to the protected path

0:09:25.390,0:09:28.890
while there are readers

0:09:28.890,0:09:30.280
then we also have

0:09:30.280,0:09:33.030
the locks called Read-Mostly,

0:09:33.030,0:09:37.570
which are basically the same as Read/Write locks, but

0:09:37.570,0:09:42.500
they have some optimizations in order to make the read

0:09:42.500,0:09:44.180
path be really fast

0:09:44.180,0:09:46.930
and to have like

0:09:46.930,0:09:48.180
zero overhead

0:09:48.180,0:09:51.410
zero overhead kind of lock

0:09:51.410,0:09:53.350
from the read path while

0:09:53.350,0:09:55.590
probably the write path is even

0:09:55.590,0:09:59.210
heavier than the other one but if you think about cases that

0:09:59.210,0:10:01.710
just


0:10:01.710,0:10:02.750
where

0:10:02.750,0:10:06.980
there are a lot of reader cases and very few writer cases, you can find that a

0:10:06.980,0:10:08.220
very useful

0:10:08.220,0:10:11.070
very useful primitive

0:10:11.070,0:10:11.850
then we have

0:10:11.850,0:10:14.360
some form of Wait channels

0:10:14.360,0:10:16.030
Wait channels

0:10:16.030,0:10:17.140
basically are what

0:10:17.140,0:10:21.700
generalizations of what other people can call, like,

0:10:22.470,0:10:24.240
condition variable and

0:10:24.240,0:10:28.240
they basically let a thread sleep

0:10:28.240,0:10:30.870
under some conditions that are

0:10:30.870,0:10:35.200
that are previously started with some

0:10:35.200,0:10:36.610
some variables

0:10:36.610,0:10:37.150
usually

0:10:37.150,0:10:39.500
having a Wait channel means that its

0:10:39.500,0:10:45.080
races are controlled through another locking primitive like a mutex

0:10:45.080,0:10:46.640
or R/W lock

0:10:46.640,0:10:52.010
and so often the Wait channel is associated to its

0:10:52.010,0:10:53.620
to its locking primitive

0:10:53.620,0:11:00.140
usually, if you need to use a wait channel without a primitive,

0:11:00.140,0:11:04.150
a locking primitive you probably have bad code

0:11:04.150,0:11:06.830
but there are some edge cases

0:11:06.830,0:11:09.660
where that seems possible

0:11:09.660,0:11:13.550
As a last thing, FreeBSD supports a primitive counting semaphore,

0:11:13.550,0:11:15.290
even if that's considered not featured

0:11:15.290,0:11:17.710
as we are going to see I think they're going to see it and

0:11:17.710,0:11:23.570
its usage is pretty much discouraged

0:11:23.570,0:11:28.320
basically, in FreeBSD you can consider the locking primitives divided into three classes

0:11:28.320,0:11:31.250
three classes of

0:11:31.250,0:11:32.450
of locking

0:11:32.450,0:11:34.090
based mainly in

0:11:34.090,0:11:35.600
particular

0:11:35.600,0:11:37.340
from an outside perspective

0:11:37.340,0:11:38.690
based on the behavior

0:11:38.690,0:11:42.680
of the contending threads with regard to the lock

0:11:42.680,0:11:48.100
for example, in the case of a mutex you can get that

0:11:48.100,0:11:53.360
spinning and blocking mutexes do very different things with the contenders

0:11:53.360,0:11:59.680
as we are going to see more of this later

0:11:59.680,0:12:03.410
usually in the traditional literature,

0:12:03.410,0:12:05.430
there are just two

0:12:05.430,0:12:07.280
lock classes; mainly

0:12:07.280,0:12:08.620
you will find the

0:12:08.620,0:12:11.200
spinning lock and the blocking lock

0:12:11.200,0:12:14.370
or what is called the sleeping lock

0:12:14.370,0:12:16.670
the I think that

0:12:16.670,0:12:21.020
as we're going to see why we have three types I think that things will be clear but

0:12:21.020,0:12:27.100
if you have any questions please ask us. That's not a problem

0:12:27.100,0:12:29.930
Spinning primitives, as I told you,

0:12:29.930,0:12:32.810
allow the contending threads to

0:12:32.810,0:12:36.120
to check the status of the lock periodically

0:12:36.120,0:12:37.590
and the

0:12:37.590,0:12:40.420
and they just do busy waiting around

0:12:40.420,0:12:41.890
 the locking variable

0:12:41.890,0:12:46.400
as a spinning primitive, FreeBSD just offers the mutex

0:12:46.400,0:12:50.689
What are the problems linked with this kind of, with this class

0:12:50.689,0:12:53.869
of locks? Mainly it's that the CPU

0:12:53.869,0:12:58.130
remains busy without doing anything really useful

0:12:58.130,0:12:59.740
it happens

0:12:59.740,0:13:03.620
that if several threads contend on the

0:13:03.620,0:13:04.870
on the locks

0:13:04.870,0:13:08.210
basically they share the same cache line where the lock is

0:13:08.210,0:13:10.220
where the lock is

0:13:10.220,0:13:12.400
that means that

0:13:12.400,0:13:17.470
contending on or sharing a cache line means a lot of underlying activity

0:13:17.470,0:13:20.150
on a lot of architectures like for example

0:13:20.150,0:13:23.660
having a lot of snoop messages between CPUs

0:13:23.660,0:13:26.450
and some bus

0:13:26.450,0:13:28.120
some bus traffic

0:13:28.120,0:13:31.980
which means a variety of operations

0:13:31.980,0:13:35.740
and the last thing, even the most important you can note, is that interrupts

0:13:35.740,0:13:37.120
are disabled

0:13:37.120,0:13:39.330
while spin locks are held

0:13:39.330,0:13:40.810
that was

0:13:40.810,0:13:42.979
that happens mainly because there are

0:13:42.979,0:13:45.140
there were identified in the past some

0:13:45.140,0:13:47.970
kinds of possible deadlocks

0:13:47.970,0:13:50.180
if you were going to leave

0:13:50.180,0:13:51.710
the spin locks

0:13:51.710,0:13:55.900
the interrupts enabled while holding a spin lock. In particular

0:13:55.900,0:13:58.180
you could find that there are

0:13:58.180,0:14:02.530
some problems with the interrupt handling code in the bottom half that was

0:14:02.530,0:14:05.040
going to deadlock

0:14:05.040,0:14:10.250
It's not a very simple thing to understand, so I've left it out

0:14:10.250,0:14:12.360
but if you want to know

0:14:12.360,0:14:15.990
we could speak later probably

0:14:17.820,0:14:21.320
along with spinning primitives we also have blocking primitives

0:14:21.320,0:14:22.890
blocking primitives

0:14:25.260,0:14:26.860
allow the

0:14:26.860,0:14:28.440
basically the contenders to be

0:14:28.440,0:14:30.980
descheduled from the runqueue

0:14:30.980,0:14:35.790
to be put on  another kind of container

0:14:35.790,0:14:38.000
put on another kind of container

0:14:38.000,0:14:40.489
and  basically

0:14:40.489,0:14:41.399
context switch immediately

0:14:41.399,0:14:44.360
immediately.

0:14:44.360,0:14:49.440
then they are put again on the runqueue of the scheduler just when the owner

0:14:49.440,0:14:51.570
 is going to release the lock

0:14:51.570,0:14:53.260
and it will be the owner

0:14:53.260,0:14:56.930
the owner that was going to

0:14:56.930,0:15:00.310
do all the operations about that

0:15:00.310,0:15:05.550
we have several primitives implemented as blocking primitives like mutexes

0:15:05.550,0:15:10.470
R/W locks and R-M locks

0:15:11.430,0:15:13.140
with

0:15:13.140,0:15:16.890
basically with

0:15:16.890,0:15:21.780
blocking primitives we have a lot of advantages over the spinning mutex

0:15:21.780,0:15:24.650
like having the contenders

0:15:24.650,0:15:26.560
that

0:15:26.560,0:15:27.590
that sleep

0:15:27.590,0:15:31.840
or that block avoids keeping the CPU busy

0:15:31.840,0:15:34.660
and mainly we can leave the

0:15:34.660,0:15:37.150
we can leave the

0:15:37.150,0:15:42.040
we can basically leave the interrupts enabled

0:15:42.040,0:15:45.760
that happens mainly because the interrupt code is only allowed

0:15:45.760,0:15:50.710
at least the bottom half is only allowed

0:15:50.710,0:15:52.070
to use spin locks

0:15:52.070,0:15:56.049
probably if it was going to use blocking primitives

0:15:56.049,0:16:01.060
we wouldn't have been able to disable interrupts here

0:16:01.060,0:16:02.239
There are however some

0:16:02.239,0:16:04.790
big drawbacks that as you will see

0:16:04.790,0:16:07.210
we handle in FreeBSD

0:16:07.210,0:16:11.280
in order to make the blocking primitives our

0:16:11.280,0:16:13.540
how could I say

0:16:13.540,0:16:16.440
the first choice in terms of locking

0:16:16.440,0:16:19.690
we have the problem called Priority Inversion

0:16:19.690,0:16:21.899
 and we have

0:16:21.899,0:16:27.589
the problem that context switches are very heavy in particular

0:16:27.589,0:16:30.209
on the machines that FreeBSD uses as reference,

0:16:30.209,0:16:33.500
like i386 and amd64

0:16:33.500,0:16:37.940
but as you're going to see we've used two techniques in order to

0:16:37.940,0:16:40.020
to cope with that

0:16:42.020,0:16:45.830
another thing is that, since you can't

0:16:45.830,0:16:47.920
allow

0:16:47.920,0:16:50.089
context switches while having

0:16:50.089,0:16:52.570
while holding a spin lock

0:16:52.570,0:16:55.249
it's obvious you can't

0:16:55.249,0:16:59.580
acquire a blocking primitive while holding a spin lock

0:16:59.580,0:17:02.110
that's an important rule in FreeBSD

0:17:02.110,0:17:06.089
that is sometimes confused and often is not

0:17:06.089,0:17:07.470
observed

0:17:07.470,0:17:09.929
that leads to block refusal

0:17:12.170,0:17:16.610
usually you will always prefer a blocking primitive over a spin lock

0:17:16.610,0:17:22.159
except in some very particular conditions, like what

0:17:22.159,0:17:25.010
I already said about the interrupts and even

0:17:25.010,0:17:26.090
about the

0:17:28.160,0:17:30.570
some paths that are very, very short

0:17:30.570,0:17:33.629
we should have some examples in the kernel, even if I can't

0:17:33.629,0:17:35.390
I can't tell you one right now

0:17:35.390,0:17:38.770
I have no idea actually

0:17:38.770,0:17:39.500
so that

0:17:39.500,0:17:43.740
we're going to see the problems linked with the blocking primitives; the first one is

0:17:43.740,0:17:45.679
called Priority Inversion

0:17:45.679,0:17:46.389
basically

0:17:46.389,0:17:49.130
it could happen that like a thread A

0:17:49.130,0:17:51.410
which has a priority

0:17:51.410,0:17:55.380
owns a lock, call it L for example

0:17:55.380,0:17:58.710
then another thread with a higher priority than this one

0:17:58.710,0:18:00.690
blocks on this lock

0:18:00.690,0:18:03.299
what happens is that the second thread

0:18:03.299,0:18:04.120
the thread B

0:18:04.120,0:18:05.870
for example

0:18:05.870,0:18:08.920
will need to wait for a lower priority thread

0:18:08.920,0:18:13.070
to finish its work

0:18:13.070,0:18:15.120
we

0:18:15.120,0:18:17.780
solve this problem actually in the

0:18:17.780,0:18:21.170
kernel using a technique called priority propagation

0:18:21.170,0:18:22.020
basically

0:18:22.020,0:18:24.620
what happens is that priority of thread B

0:18:25.760,0:18:27.880
is lent to thread A

0:18:27.880,0:18:31.460
until it releases the lock

0:18:31.460,0:18:34.760
and it's implemented directly in the container,

0:18:34.760,0:18:36.180
the turnstiles

0:18:37.870,0:18:39.530
while that could be done

0:18:39.530,0:18:44.290
even in the primitive, it has been much more convenient to use the container for

0:18:44.290,0:18:45.190
that

0:18:45.190,0:18:45.990
because

0:18:45.990,0:18:52.990
it was going to offer some advantage we are going to see right now

0:18:53.030,0:18:54.240
just note that

0:18:54.240,0:18:56.090
Read locks

0:18:56.090,0:18:57.310
cannot support

0:18:57.310,0:19:03.430
priority propagation; that happens because

0:19:03.430,0:19:07.290
the turnstile would have to keep track of all the readers

0:19:07.290,0:19:11.100
and this would be very, very expensive from

0:19:11.100,0:19:12.880
from a

0:19:12.880,0:19:15.540
from a point of view of the overhead

0:19:15.540,0:19:19.800
and even I think I've tried to do something in this regard and I

0:19:19.800,0:19:24.050
saw that there were some races that required

0:19:24.050,0:19:29.390
acquiring a spin lock even in the fast path, so it was

0:19:29.390,0:19:31.320
an impracticable way,

0:19:31.320,0:19:32.380
I would say,

0:19:32.380,0:19:37.200
at least for what we found so far

0:19:37.200,0:19:37.630
basically

0:19:37.630,0:19:39.070
 what happens

0:19:39.070,0:19:42.150
about the priority propagation is that the

0:19:42.150,0:19:44.830
the threads and the turnstiles

0:19:44.830,0:19:47.000
are chained together

0:19:47.000,0:19:48.350
the thread

0:19:48.350,0:19:50.970
owns a pointer

0:19:50.970,0:19:53.710
to the turnstile it is sleeping on

0:19:53.710,0:19:58.540
and the turnstile owns a pointer to

0:19:58.540,0:20:00.549
the owner of the lock

0:20:00.549,0:20:04.620
what happens is that for example in this case we have

0:20:05.080,0:20:08.070
a sleeper which is going to sleep on a turnstile

0:20:08.070,0:20:08.990
the first lock

0:20:08.990,0:20:13.470
which has a priority of one hundred and twenty eight

0:20:14.120,0:20:15.520
the turnstile

0:20:15.520,0:20:18.370
through the pointer

0:20:18.370,0:20:20.570
ts_owner knows which is its owner

0:20:20.570,0:20:26.150
and this owner has a priority of two hundred and fifty six

0:20:26.150,0:20:31.120
well, as you know, a higher value means lower priority, so this is

0:20:31.120,0:20:34.960
a suitable case for priority propagation

0:20:34.960,0:20:40.820
but what happens is that this owner is actually sleeping on another turnstile

0:20:40.820,0:20:43.419
and the other owner

0:20:43.419,0:20:48.820
of the second turnstile has the same priority as its sleeper

0:20:48.820,0:20:50.750
so

0:20:50.750,0:20:55.530
just propagating priority to the first owner would be useless because the first

0:20:55.530,0:20:56.340
one

0:20:56.340,0:20:57.320
could

0:20:57.320,0:20:58.760
still

0:20:58.760,0:21:00.580
 keep the chain to a

0:21:00.580,0:21:04.820
lower priority, so it is going to be propagated to the first

0:21:04.820,0:21:07.679
 actually running

0:21:07.679,0:21:09.870
owner of the chain

0:21:09.870,0:21:14.670
this is the situation after the propagation; as you can see, all the threads in the chain

0:21:14.670,0:21:16.559
have the same priority,

0:21:16.559,0:21:17.950
the highest possible,

0:21:17.950,0:21:24.480
in this case that of the last one arriving
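
The chain walk just described can be sketched as a small user-space simulation. This is not kernel code: only the field name ts_owner comes from the talk, while the Thread and Turnstile classes, the blocked_on field and the function propagate_priority are hypothetical names chosen for illustration. It assumes, as stated above, that a lower value means a higher priority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    name: str
    priority: int                                # lower value = higher priority
    blocked_on: Optional["Turnstile"] = None     # turnstile this thread sleeps on


@dataclass
class Turnstile:
    ts_owner: Thread                             # thread that currently owns the lock


def propagate_priority(sleeper: Thread) -> None:
    """Lend the sleeper's priority to every lock owner along the chain."""
    ts = sleeper.blocked_on
    while ts is not None:
        owner = ts.ts_owner
        if owner.priority <= sleeper.priority:
            break                                # owner already runs at least this high
        owner.priority = sleeper.priority
        ts = owner.blocked_on                    # the owner may itself be blocked


# The chain from the talk: a sleeper at 128 and two owners at 256.
running = Thread("second owner", 256)
first_owner = Thread("first owner", 256, blocked_on=Turnstile(running))
sleeper = Thread("sleeper", 128, blocked_on=Turnstile(first_owner))

propagate_priority(sleeper)
print([t.priority for t in (sleeper, first_owner, running)])
```

After the call every thread in the chain runs at the sleeper's priority, which is the situation shown after the propagation.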

0:21:25.750,0:21:31.720
are there questions about that?

0:21:31.720,0:21:34.780
no?

0:21:34.780,0:21:36.760
yeah when the

0:21:36.760,0:21:39.720
when the for example the third owner

0:21:39.720,0:21:41.679
the second owner there

0:21:41.679,0:21:43.659
when it goes to release the lock

0:21:43.659,0:21:47.010
it basically brings back the priority to the

0:21:47.010,0:21:49.340
to the

0:21:49.340,0:21:52.490
two hundred and fifty-six for all the chain

0:21:52.490,0:21:54.650
he is responsible for

0:21:54.650,0:22:01.179
so it just happens at locking operation

0:22:01.179,0:22:04.159
and that is what we do about the Priority Inversion

0:22:04.159,0:22:09.970
in order to fix, instead, the overhead given by the

0:22:09.970,0:22:14.030
large number of context switches, we use another technique called adaptive spinning

0:22:14.030,0:22:16.030
basically

0:22:16.030,0:22:20.260
as the context switch brings a lot of overhead

0:22:22.310,0:22:26.090
we prefer to not do

0:22:26.090,0:22:27.770
completely a context switch

0:22:27.770,0:22:30.760
in the case the lock owner is still running

0:22:30.760,0:22:32.190
on a runqueue

0:22:32.190,0:22:38.340
because there is a very good chance that the owner is going to release the lock very soon

0:22:40.440,0:22:43.990
that means that for example

0:22:43.990,0:22:46.070
we choose just to spin

0:22:46.070,0:22:49.149
in order to wait until the state of the

0:22:49.149,0:22:52.240
lock changes or the state of the owner

0:22:52.240,0:22:57.660
changes, like the owner going to sleep on another turnstile

0:22:57.660,0:22:59.140
and the

0:22:59.140,0:23:03.270
basically, there have been very extensive measurements, even in

0:23:03.270,0:23:07.510
another operating system like Solaris,

0:23:07.510,0:23:12.300
where I think this approach was brought in for the first time,

0:23:12.300,0:23:16.430
that were showing

0:23:16.430,0:23:23.430
a very big improvement in performance from this technique
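
The decision behind adaptive spinning can be sketched as follows. This is a user-space model under stated assumptions, not the kernel implementation: the names Thread, Lock, on_cpu and contend are hypothetical, and the max_polls bound exists only so the sketch terminates.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Thread:
    name: str
    on_cpu: bool = True      # True while the thread is actually executing


@dataclass
class Lock:
    owner: Optional[Thread] = None


def contend(lock: Lock, me: Thread, max_polls: int = 1000) -> str:
    """Try to take `lock`; report whether we 'acquired' it or 'blocked'."""
    for _ in range(max_polls):
        if lock.owner is None:
            lock.owner = me          # real code would use an atomic compare-and-set
            return "acquired"
        if not lock.owner.on_cpu:
            return "blocked"         # owner is off-CPU: stop spinning, context switch
        # owner is still running: poll again rather than pay for a context switch
    return "blocked"                 # polling budget exhausted (assumption of this sketch)


me = Thread("contender")
print(contend(Lock(), me))                                     # free lock
print(contend(Lock(owner=Thread("owner", on_cpu=False)), me))  # sleeping owner
```

The point of the design is that spinning is only chosen while the owner is running, which is when the lock is likely to be released soon.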

0:23:25.790,0:23:30.640
apart from the two types of primitives, there are sleeping primitives

0:23:30.640,0:23:36.120
now there is a consideration we have to make about that

0:23:36.120,0:23:38.110
basically sleeping primitives

0:23:38.110,0:23:42.320
should be in theory just the

0:23:42.320,0:23:44.340
the wait channels

0:23:44.340,0:23:49.170
wait channels should have been the only ones implemented using the

0:23:49.170,0:23:50.630
container called

0:23:50.630,0:23:52.760
sleepqueue

0:23:52.760,0:23:53.910
but

0:23:53.910,0:23:56.170
due to some legacy

0:23:56.170,0:24:01.000
actually the sleepqueues were used to implement other

0:24:01.000,0:24:03.290
kinds of locks, like the

0:24:03.290,0:24:04.219
lockmgr

0:24:04.219,0:24:08.080
and the sx locks and the

0:24:08.080,0:24:11.100
basically the

0:24:11.100,0:24:13.679
semaphores and condvars too

0:24:13.679,0:24:16.010
that has been this is

0:24:16.010,0:24:18.809
going to give some problems actually

0:24:18.809,0:24:19.350
because

0:24:20.450,0:24:24.820
as we're going to see

0:24:24.820,0:24:26.889
and as you can see on the slide too

0:24:26.889,0:24:27.929
in FreeBSD,

0:24:27.929,0:24:31.600
while sleeping, threads should not hold any kind of lock,

0:24:31.600,0:24:33.809
neither blocking nor spinning

0:24:33.809,0:24:36.770
that's a simple thing to explain

0:24:36.770,0:24:40.200
we just want to enforce very

0:24:40.200,0:24:43.490
we just want to enforce

0:24:43.490,0:24:46.060
correct semantics of locking

0:24:46.060,0:24:47.880
so imagine keeping a lock,

0:24:47.880,0:24:50.190
a blocking primitive while

0:24:50.190,0:24:50.729
sleeping

0:24:50.729,0:24:53.010
it's going to waste a lot of time

0:24:53.010,0:24:56.530
because all the contenders are going to

0:24:56.530,0:24:58.760
are going to stall on the

0:24:58.760,0:25:01.400
lock owner which is sleeping

0:25:01.400,0:25:03.120
basically in fact what

0:25:03.120,0:25:07.169
as you should know condition variables do usually is to drop the lock

0:25:07.169,0:25:11.070
once it has been passed to the primitive

0:25:11.070,0:25:12.380
in this case

0:25:14.170,0:25:18.249
basically we just don't allow that; this means that

0:25:18.249,0:25:23.160
the same condition applies even to the other kinds of locks,

0:25:23.160,0:25:25.540
lockmgr and the sx lock

0:25:25.540,0:25:26.860
so you can't hold

0:25:26.860,0:25:29.410
a mutex for example

0:25:29.410,0:25:33.640
a blocking mutex or an R/W lock, while trying to acquire

0:25:33.640,0:25:38.559
a lockmgr or sx

0:25:38.559,0:25:41.850
this is going to create some problems because

0:25:41.850,0:25:46.830
in some paths that is unavoidable, so you have to drop the lock, for example, and try

0:25:46.830,0:25:48.190
to acquire

0:25:48.190,0:25:49.770
the other primitive

0:25:49.770,0:25:51.320
which is going to

0:25:53.400,0:25:59.110
and so can create some race problems

0:26:00.130,0:26:04.779
as the sleepqueues were born just to serve wait channels

0:26:04.779,0:26:09.190
they don't track the owner either, so they don't care about priority propagation and the priority inversion problem

0:26:09.190,0:26:14.430
just because sleepqueues entirely should not have work

0:26:14.430,0:26:20.150
so, for example, lockmgr and sx have no priority propagation

0:26:20.150,0:26:22.360
systems and the

0:26:22.360,0:26:29.360
so their use is discouraged, mainly for this reason

0:26:31.590,0:26:34.930
sure

0:26:36.780,0:26:39.000
it's you mean why it's not

0:26:39.000,0:26:41.790
why doesn't blocking primitives exist yeah?

0:26:41.790,0:26:44.250
so imagine that for example the

0:26:44.250,0:26:45.570
you have a wait channel,

0:26:45.570,0:26:47.679
a condvar, a condition variable,

0:26:47.679,0:26:50.950
or msleep,

0:26:50.950,0:26:52.090
msleep,

0:26:52.090,0:26:54.910
the primitive that allows you to sleep on

0:26:54.910,0:26:57.850
a condition variable for example

0:26:57.850,0:26:58.870
however

0:27:00.510,0:27:02.270
the you are

0:27:02.270,0:27:03.350
using the blocking

0:27:03.350,0:27:06.930
using the turnstile, you will always go through

0:27:06.930,0:27:12.110
the mechanism of priority propagation and priority inversion handling. It's

0:27:12.110,0:27:13.760
not very

0:27:13.760,0:27:14.970
it's pretty

0:27:14.970,0:27:17.320
it's not a simple operation

0:27:17.320,0:27:20.219
it even acquires some kind of spin lock

0:27:20.219,0:27:22.650
in order to avoid some races

0:27:22.650,0:27:23.340
and so

0:27:23.340,0:27:24.289
it

0:27:24.289,0:27:26.590
so it has an overhead

0:27:26.590,0:27:31.770
in this case it would not be useful, it would be completely useless to have

0:27:31.770,0:27:34.159
a mechanism like that so

0:27:34.159,0:27:37.410
in theory, if you just used

0:27:37.410,0:27:41.320
the sleepqueue for wait channels,

0:27:41.320,0:27:42.990
you would get a

0:27:42.990,0:27:46.640
bigger performance boost than just using the turnstile

0:27:46.640,0:27:49.330
for the same problem

0:27:49.330,0:27:51.310
in theory

0:27:51.310,0:27:54.780
but what happened is that other locks are implementedo

0:27:54.780,0:27:55.839
using this sleepqueue

0:27:55.839,0:27:58.070
that should not have happened

0:27:58.070,0:27:59.260
in principle

0:27:59.260,0:28:02.960
really I'm not sure who introduced the sx lock

0:28:02.960,0:28:04.440
I'm actually not sure

0:28:04.440,0:28:06.280
and even the lockmgr

0:28:06.280,0:28:09.870
but

0:28:09.870,0:28:12.340
however

0:28:12.340,0:28:17.669
as you could have seen before the three containers create a hierarchy that

0:28:17.669,0:28:20.090
should not be broken like

0:28:20.090,0:28:21.639
you have spinqueues

0:28:21.639,0:28:26.900
you have spin locks you have blocking primitives and sleeping primitives and

0:28:26.900,0:28:31.470
you cannot acquire you cannot mix them there are precise rules like

0:28:31.470,0:28:33.710
on the top the sleeping primitive

0:28:33.710,0:28:37.690
in the middle the blocking primitive and at the end the spinning primitive

0:28:38.900,0:28:44.440
the main choice will be to use blocking primitives always

0:28:44.440,0:28:48.240
because as you can see we handled a lot of problems that they have

0:28:48.240,0:28:49.659
and in practice

0:28:49.659,0:28:52.229
they have proven to be very

0:28:52.229,0:28:53.799
very helpful

0:28:53.799,0:28:54.999
but sometimes

0:28:56.789,0:28:58.790
some nasty conditions can happen

0:28:58.790,0:29:02.900
for example one of the most widespread is the

0:29:02.900,0:29:06.350
using a malloc with the flag M_WAITOK

0:29:06.350,0:29:11.240
in FreeBSD that means that if the allocator is pretty busy it is going to

0:29:11.240,0:29:12.680
to sleep

0:29:12.680,0:29:15.760
in order to retrieve your memory

0:29:15.760,0:29:17.890
and if you do that with a lock held

0:29:17.890,0:29:22.080
you're going to violate one of our rules and it's not

0:29:22.080,0:29:23.440
possible

0:29:23.440,0:29:25.320
another one is just we just

0:29:25.320,0:29:28.299
said before like call a sleeping lock while

0:29:28.299,0:29:32.090
holding a blocking primitive

0:29:33.390,0:29:37.530
in the next example in the next I'm going to show you a way to

0:29:37.530,0:29:41.140
to handle for example the malloc case

0:29:41.140,0:29:42.520
and similar

0:29:42.520,0:29:45.000
but the that usually

0:29:46.830,0:29:47.620
usually that

0:29:47.620,0:29:49.980
are not very common cases

0:29:49.980,0:29:52.920
at least for simple parts

0:29:52.920,0:29:56.280
you should even try to avoid the

0:29:56.280,0:30:03.280
the

0:30:04.620,0:30:06.180
yes

0:30:06.180,0:30:07.050
even in the

0:30:07.050,0:30:09.120
in the

0:30:09.120,0:30:10.220
wait channel

0:30:10.220,0:30:14.530
as in the FreeBSD you can differentiate between the condition variables and

0:30:14.530,0:30:15.720
msleep

0:30:15.720,0:30:17.510
usually msleep was

0:30:17.510,0:30:22.210
really msleep was introduced as the first primitive

0:30:22.210,0:30:26.190
but it has an interface very very difficult to

0:30:26.190,0:30:28.460
to make saner and to understand

0:30:28.460,0:30:30.470
at least for

0:30:30.470,0:30:31.220
for people

0:30:31.220,0:30:32.120
which are

0:30:32.120,0:30:34.960
comfortable with

0:30:34.960,0:30:39.260
with the interface of condition variables that we all saw but they are

0:30:39.260,0:30:40.649
newer primitive

0:30:40.649,0:30:42.660
mainly there is

0:30:42.660,0:30:44.400
so for the newer code

0:30:44.400,0:30:46.960
what you should do is just to

0:30:46.960,0:30:49.000
use condition variables

0:30:49.000,0:30:50.659
and not msleep

0:30:50.659,0:30:51.630
basically

0:30:51.630,0:30:56.220
msleep should be dropped but it has a very nice feature which

0:30:56.220,0:31:02.669
is the possibility to specify a wake up priority on the sleeping threads

0:31:02.669,0:31:04.740
once they are asleep

0:31:04.740,0:31:07.470
that condvar still doesn't

0:31:07.470,0:31:12.430
maybe if we could port these features to the condition variables we will be able

0:31:12.430,0:31:13.659
to completely drop msleep

0:31:13.659,0:31:18.529
from the work arena

0:31:18.529,0:31:20.450
this is a

0:31:20.450,0:31:25.580
simple case that it's going to show a way to

0:31:26.620,0:31:30.670
a simple way to deal with the for example

0:31:30.670,0:31:34.100
condition I told before, the malloc willing to

0:31:34.100,0:31:35.390
to sleep

0:31:35.390,0:31:38.260
and doing that while holding a lock

0:31:38.260,0:31:45.070
as you see we have some fake C as some members like flags

0:31:45.070,0:31:47.659
and an object called instructful

0:31:47.659,0:31:49.940
which needs to be allocated

0:31:49.940,0:31:54.400
and that they are protected by an internal lock

0:31:54.400,0:31:58.810
you imagine that for example the fake C create

0:31:58.810,0:32:02.269
holds lock of the object and does some things

0:32:02.269,0:32:04.460
which are not important

0:32:04.460,0:32:07.650
then in the end for example it's going to

0:32:07.650,0:32:09.170
to allocate

0:32:09.170,0:32:14.110
the FC object and that should be protected in

0:32:14.110,0:32:16.470
an atomic part

0:32:16.470,0:32:20.030
something you can do is just to set the flag

0:32:20.030,0:32:22.160
for that

0:32:22.160,0:32:22.730
saying

0:32:22.730,0:32:28.460
the allocation is going to happen, if you access this structure concurrently

0:32:28.460,0:32:29.899
just skip the allocation

0:32:29.899,0:32:31.500
and that's what we do

0:32:31.500,0:32:32.919
we check for this flag

0:32:32.919,0:32:37.969
and if it's present it means that another thread is still

0:32:37.969,0:32:40.149
is already allocating and we just skip

0:32:40.149,0:32:46.360
so otherwise we set it and then we unlock the mutex

0:32:46.360,0:32:49.100
then we allocate the memory for the

0:32:49.100,0:32:50.610
for the object

0:32:50.610,0:32:52.450
acquire again the lock

0:32:52.450,0:32:54.860
and we simply have seen

0:32:54.860,0:33:00.200
please note that I've used a temporary storage for that in order to make

0:33:00.200,0:33:01.830
some assertion

0:33:01.830,0:33:03.280
like the MS

0:33:03.280,0:33:04.180
about the

0:33:04.180,0:33:05.500
the pointer

0:33:05.500,0:33:10.700
it was just a tricky note that you verify that really the structure was not

0:33:10.700,0:33:14.330
really allocated

0:33:14.330,0:33:16.600
and so that we can get some

0:33:16.600,0:33:21.870
kind of assertion about that

0:33:22.640,0:33:26.340
one of the biggest innovations that was brought to FreeBSD

0:33:26.340,0:33:30.120
about the locking primitive about the locking primitives

0:33:30.120,0:33:33.770
are the interrupts that

0:33:34.640,0:33:36.850
mainly

0:33:36.850,0:33:40.820
this is pretty simple to explain maybe

0:33:40.820,0:33:44.070
As the top half remains basically the same

0:33:44.070,0:33:49.790
and was going to handle the ISR for the interrupt line for example

0:33:49.790,0:33:54.330
the bottom half changed: instead of running the interrupt

0:33:54.330,0:33:58.700
handlers directly on that line as traditionally happened

0:33:58.700,0:34:02.140
it was going just to schedule a thread

0:34:02.140,0:34:04.980
that was going to run the

0:34:04.980,0:34:06.940
the interrupt handler in a

0:34:06.940,0:34:12.389
--- context and not the kind of --it was going to happen

0:34:12.389,0:34:15.509
traditionally in a lot of Unix systems

0:34:16.699,0:34:23.179
this has the big advantage that in using your own context you can

0:34:23.179,0:34:24.429
basically

0:34:24.990,0:34:29.889
you're not forced to use spin locks and you can do a lot of other fancy things

0:34:29.889,0:34:32.209
this necessity came up because

0:34:32.209,0:34:33.149
often

0:34:33.149,0:34:38.529
interrupt handlers need to access some

0:34:38.529,0:34:42.589
need to access some subsystem locks and the

0:34:42.589,0:34:45.799
as we were going to use blocking ---around

0:34:45.799,0:34:50.379
we had the necessity to support the

0:34:50.379,0:34:52.589
the locking of the

0:34:52.589,0:34:57.119
the possibilities of wide mutex actually

0:34:57.559,0:35:01.759
A similar thing was implemented using taskqueues

0:35:01.759,0:35:02.879
previously

0:35:02.879,0:35:04.010
and the sometimes it

0:35:04.010,0:35:05.740
I think I saw Linux too

0:35:05.740,0:35:08.439
using taskqueues maybe

0:35:08.439,0:35:10.029
but the

0:35:10.029,0:35:14.709
it was basically something similar but not exactly in this way

0:35:14.709,0:35:16.809
actually FreeBSD

0:35:16.809,0:35:20.559
from the release seven

0:35:20.559,0:35:22.579
the interrupt threads

0:35:22.579,0:35:24.659
are this model is a little bit changed

0:35:24.659,0:35:26.499
in order to include the

0:35:26.499,0:35:29.739
a new mechanism called the filtering

0:35:29.739,0:35:36.249
we have interrupt filters that basically, instead of directly

0:35:36.249,0:35:39.809
directly

0:35:39.809,0:35:40.879
schedule the thread

0:35:40.879,0:35:43.209
linked to the parked line

0:35:43.209,0:35:46.619
they just check for

0:35:46.619,0:35:50.939
they just let run some new thing in the kernel or context

0:35:50.939,0:35:52.449
that will decide if

0:35:52.449,0:35:56.709
handle directly the request or just schedule the kernel

0:35:56.709,0:35:59.739
it's like if you have the old bottom handler

0:35:59.739,0:36:04.529
that add the possibility to register a handler

0:36:04.529,0:36:08.869
still running in interrupt context and at the same time

0:36:08.869,0:36:12.009
decide if scheduled or not

0:36:12.009,0:36:14.499
so that it's no

0:36:14.499,0:36:18.579
no more mandatory

0:36:18.579,0:36:22.919
So I think that the first part is going to finish so if you have some questions we can

0:36:22.919,0:36:23.430
handle

0:36:23.430,0:36:28.699
right now

0:36:28.699,0:36:35.699
this should be material for the second part actually

0:36:45.279,0:36:48.529
newbus for example

0:36:48.529,0:36:51.259
some

0:36:51.259,0:36:55.769
some drivers that are kind of frequently used, I'm not sure which ones, but all

0:36:55.769,0:37:00.049
the big ones are converted to finer locking

0:37:00.049,0:37:03.109
um

0:37:03.109,0:37:07.479
actually the problem is not which parts are under Giant

0:37:07.479,0:37:08.530
well how we could

0:37:08.530,0:37:12.380
optimize the locking of some subsystems because

0:37:12.380,0:37:15.079
for example we have the virtual memory

0:37:15.079,0:37:17.910
which is not under Giant but it's

0:37:17.910,0:37:19.719
not locked

0:37:19.719,0:37:24.400
optimally and it's going to bring a lot of contention

0:37:24.400,0:37:26.230
so

0:37:26.230,0:37:30.329
it's not under Giant but it should be optimized

0:37:30.329,0:37:37.329
because the parts under Giant are very tiny. Newbus for example

0:37:37.599,0:37:44.599
some parts relating to the VFS on the mounting
but yet very short parts

0:37:44.979,0:37:51.979
I'm not sure about others

0:37:57.479,0:37:59.170
sorry

0:38:02.069,0:38:08.549
well usually it should be moved completely but

0:38:08.549,0:38:11.019
yes

0:38:11.019,0:38:12.539
it could

0:38:32.909,0:38:34.809
okay although

0:38:34.809,0:38:38.289
in the kernel we have a basically

0:38:38.289,0:38:39.450
um

0:38:39.450,0:38:43.019
as you should know we already imported DTrace for example

0:38:43.019,0:38:47.839
and I have wondered, I have submitted by developed

0:38:47.839,0:38:48.669
my country

0:38:48.669,0:38:51.479
called ---some patches that brings the

0:38:51.479,0:38:54.689
the ----- directly in our locking

0:38:54.689,0:38:55.699
in order to

0:38:55.699,0:38:58.890
allow it to be tracked with DTrace.

0:38:58.890,0:39:02.009
which is very nice but it's still not completed

0:39:02.009,0:39:03.310
we are reviewing

0:39:03.310,0:39:08.309
above that we have another very useful tool called lock profiling

0:39:08.309,0:39:12.039
that has been very helpful in the past in order to

0:39:12.039,0:39:14.110
find the most contended lock

0:39:14.110,0:39:17.469
and to try to propose them for finer locking

0:39:17.469,0:39:20.589
so at least for the kernel we have such mechanism

0:39:20.589,0:39:22.719
I'm not sure what should

0:39:22.719,0:39:26.640
have been the user space. I'm sure we've not something similar

0:39:26.640,0:39:28.310
but maybe other systems

0:39:28.310,0:39:29.469
have

0:39:29.469,0:39:30.749
similar tools

0:39:30.749,0:39:36.039
I don't know I just know FreeBSD so

0:39:58.479,0:39:59.220
not sure

0:39:59.220,0:39:59.919
would you repeat

0:39:59.919,0:40:03.879
some voice please. No I can't hear

0:40:03.879,0:40:05.509
It seems to me that

0:40:05.509,0:40:08.269
you don't have to do all the work that you do with locking

0:40:08.269,0:40:11.469
well if you're not on SMP right?

0:40:11.469,0:40:13.029
well no

0:40:13.029,0:40:15.259
it's not right because the

0:40:15.259,0:40:20.210
you have to protect even against some mechanism like preemption

0:40:20.210,0:40:25.989
which is going to be tricky. It is implemented differently than FreeBSD 4.x so

0:40:25.989,0:40:28.909
it's going to be with preemption it's like

0:40:28.909,0:40:30.099
from

0:40:30.099,0:40:34.479
it's like if you have a real SMP system from our technical point of view

0:40:34.479,0:40:35.809
so you have to handle

0:40:35.809,0:40:38.339
problems typical of that

0:40:38.339,0:40:43.249
really in the kernel we have other kind of synchronization like atomics

0:40:43.249,0:40:45.500
I don't, I should have had

0:40:45.500,0:40:50.609
a slide about that but it disappeared so I can tell you by voice

0:40:50.609,0:40:55.170
it's well like we have the possibility to use atomic instructions in the

0:40:55.170,0:40:57.369
in FreeBSD kernel directly

0:40:57.369,0:40:59.249
but the

0:40:59.249,0:41:03.119
to use even memory barriers linked with them

0:41:03.119,0:41:08.869
the only pitfall is that you cannot really trust about the

0:41:08.869,0:41:10.469
cache coherency

0:41:10.469,0:41:14.339
because as long as it's MD specific you can just

0:41:14.339,0:41:16.989
you can just trust

0:41:16.989,0:41:21.879
what happens in your CPU where you use the atomic and where you use the memory barrier

0:41:21.879,0:41:26.349
you cannot make assumptions about what happens, about whether other CPUs

0:41:26.349,0:41:29.289
can see your modifications or not

0:41:29.289,0:41:31.640
and if the cache can handle that

0:41:31.640,0:41:37.119
we have specific primitives in order to for example disable preemption

0:41:37.119,0:41:39.379
which are the critical sections

0:41:39.379,0:41:42.179
critical_enter and critical_exit

0:41:42.179,0:41:45.309
that when you call them you are not to

0:41:45.309,0:41:48.219
the preemption is simply disallowed

0:41:48.219,0:41:54.749
it's that's a very fast primitive so there is not much overhead

0:41:54.749,0:41:56.049
so there's not much overhead

0:41:56.049,0:42:00.679
we also have a way to disable interrupts which is unofficial. I will tell

0:42:00.679,0:42:03.079
that

0:42:03.079,0:42:07.720
because you can do that in a machine dependent way

0:42:07.720,0:42:10.619
with spinlock_enter and spinlock_exit

0:42:10.619,0:42:14.989
and then

0:42:14.989,0:42:16.049
yeah that you can

0:42:16.049,0:42:17.389
even disable

0:42:17.389,0:42:19.479
some thread migration

0:42:19.479,0:42:22.940
using sched primitives

0:42:22.940,0:42:25.319
that are very useful

0:42:25.319,0:42:29.779
when you are going to access for example per-CPU data

0:42:29.779,0:42:33.270
and you have several accesses and you don't want the CPU migrate

0:42:33.270,0:42:34.200
from that

0:42:34.200,0:42:36.619
thread migrate from that CPU

0:42:36.619,0:42:38.729
because you could read different

0:42:38.729,0:42:45.369
values from different CPU then

0:42:45.369,0:42:46.479
I'm not sure

0:42:46.479,0:42:52.079
if there is something else okay

0:42:52.079,0:42:57.229
questions? no?

0:42:57.229,0:42:58.189
so I'll see you later