WEBVTT
00:00:03.040 --> 00:00:08.960
hello and welcome to the session on deep
00:00:05.920 --> 00:00:11.599
learning my name is mohan and in this
00:00:08.960 --> 00:00:14.080
video we are going to talk about what
00:00:11.599 --> 00:00:16.160
deep learning is all about some of you
00:00:14.080 --> 00:00:19.520
may already be familiar with image
00:00:16.160 --> 00:00:22.640
recognition how does image recognition
00:00:19.520 --> 00:00:25.680
work you can train this application or
00:00:22.640 --> 00:00:28.080
your machine to recognize whether a
00:00:25.680 --> 00:00:30.080
given image is a cat or a dog and this
00:00:28.080 --> 00:00:32.239
is how it works at a very high level it
00:00:30.080 --> 00:00:34.480
uses artificial neural network it is
00:00:32.239 --> 00:00:36.559
trained with some known images and
00:00:34.480 --> 00:00:38.960
during the training it is told if it is
00:00:36.559 --> 00:00:41.040
recognizing correctly or not and then
00:00:38.960 --> 00:00:42.960
when new images are submitted it
00:00:41.040 --> 00:00:45.680
recognizes correctly based on the
00:00:42.960 --> 00:00:47.600
accuracy of course so a little quick
00:00:45.680 --> 00:00:50.480
understanding about artificial neural
00:00:47.600 --> 00:00:53.600
networks so the way it works is
00:00:50.480 --> 00:00:56.000
you provide a lot of training data also
00:00:53.600 --> 00:00:59.359
known as labeled data for example in
00:00:56.000 --> 00:01:02.640
this case these are the images of dogs
00:00:59.359 --> 00:01:05.600
and the network extracts some features
00:01:02.640 --> 00:01:08.640
that makes a dog a dog right so that is
00:01:05.600 --> 00:01:11.760
known as feature extraction and based on
00:01:08.640 --> 00:01:13.760
that when you submit a new image of a dog
00:01:11.760 --> 00:01:15.119
the basic features remain pretty much
00:01:13.760 --> 00:01:17.759
the same it may be a completely
00:01:15.119 --> 00:01:21.280
different image but the features of a
00:01:17.759 --> 00:01:23.200
dog still remain pretty much the same in
00:01:21.280 --> 00:01:25.680
various different images let's say
00:01:23.200 --> 00:01:28.000
compared to a cat and that's the way
00:01:25.680 --> 00:01:30.479
artificial neural network works we'll go
00:01:28.000 --> 00:01:32.240
into details of this uh very shortly and
00:01:30.479 --> 00:01:35.119
once the training is done with training
00:01:32.240 --> 00:01:37.439
data we then test it with some test data
00:01:35.119 --> 00:01:39.840
too which is basically completely new
00:01:37.439 --> 00:01:42.240
data which the system has not seen
00:01:39.840 --> 00:01:43.920
before unlike the training data and then
00:01:42.240 --> 00:01:46.560
we find out whether it is predicting
00:01:43.920 --> 00:01:49.280
correctly or not thereby we know whether
00:01:46.560 --> 00:01:50.799
the training is complete or it needs
00:01:49.280 --> 00:01:53.119
more training so that's at a very high
00:01:50.799 --> 00:01:55.040
level how an artificial neural network works
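A minimal sketch of that train-then-test workflow, assuming Python with scikit-learn and a synthetic stand-in for labeled images rather than the video's actual setup:

```python
# a minimal sketch of the train-then-test workflow described above,
# assuming scikit-learn and synthetic stand-in data rather than real images
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# labeled training data: feature vectors standing in for known images
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# hold back some test data the network has not seen before
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# a small artificial neural network, trained on the labeled data
model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=500, random_state=0)
model.fit(X_train, y_train)

# check predictions on unseen data to judge whether more training is needed
print("test accuracy:", model.score(X_test, y_test))
```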
00:01:53.119 --> 00:01:57.119
this is what we are going to talk about
00:01:55.040 --> 00:01:59.200
today our agenda looks something like
00:01:57.119 --> 00:02:00.799
this what is deep learning why do we
00:01:59.200 --> 00:02:03.040
need deep learning and then what are the
00:02:00.799 --> 00:02:05.920
applications of deep learning one of the
00:02:03.040 --> 00:02:08.239
main components the secret sauce in deep
00:02:05.920 --> 00:02:09.599
learning is neural networks so we're
00:02:08.239 --> 00:02:10.879
going to talk about what is neural
00:02:09.599 --> 00:02:12.879
network and
00:02:10.879 --> 00:02:15.520
how it works and some of its components
00:02:12.879 --> 00:02:17.440
like for example the activation function
00:02:15.520 --> 00:02:20.160
the gradient descent and so on and so
00:02:17.440 --> 00:02:21.680
forth so as a part of the working of a
00:02:20.160 --> 00:02:23.520
neural network we will go into little
00:02:21.680 --> 00:02:26.720
bit more details how this whole thing
00:02:23.520 --> 00:02:29.360
works so without much further ado let's
00:02:26.720 --> 00:02:31.520
get started so deep learning is
00:02:29.360 --> 00:02:34.080
considered to be a part of machine
00:02:31.520 --> 00:02:36.720
learning so this diagram very nicely
00:02:34.080 --> 00:02:39.599
depicts what deep learning is at a very
00:02:36.720 --> 00:02:42.480
high level you have the all-encompassing
00:02:39.599 --> 00:02:45.840
artificial intelligence which is more a
00:02:42.480 --> 00:02:47.680
concept rather than a technology or a
00:02:45.840 --> 00:02:49.280
technical concept right so it's
00:02:47.680 --> 00:02:51.440
more of a concept at a very high level
00:02:49.280 --> 00:02:53.200
artificial intelligence under the hood
00:02:51.440 --> 00:02:55.360
is actually machine learning and deep
00:02:53.200 --> 00:02:58.560
learning and machine learning is a
00:02:55.360 --> 00:03:01.840
broader concept you can say or a broader
00:02:58.560 --> 00:03:03.280
technology and deep learning is a subset
00:03:01.840 --> 00:03:05.440
of machine learning the primary
00:03:03.280 --> 00:03:07.920
difference between machine learning and
00:03:05.440 --> 00:03:11.519
deep learning is that deep learning uses
00:03:07.920 --> 00:03:14.080
neural networks and it is suitable for
00:03:11.519 --> 00:03:16.480
handling large amounts of unstructured
00:03:14.080 --> 00:03:18.080
data and the last but not least one of
00:03:16.480 --> 00:03:19.599
the major differences between machine
00:03:18.080 --> 00:03:22.080
learning and deep learning is that in
00:03:19.599 --> 00:03:24.640
machine learning the feature extraction
00:03:22.080 --> 00:03:26.959
or the feature engineering is done by
00:03:24.640 --> 00:03:29.280
the data scientists manually but in deep
00:03:26.959 --> 00:03:30.799
learning since we use neural networks
00:03:29.280 --> 00:03:32.720
the feature engineering happens
00:03:30.799 --> 00:03:34.720
automatically so that's a little bit of
00:03:32.720 --> 00:03:36.000
a quick difference between machine
00:03:34.720 --> 00:03:38.159
learning and deep learning and this
00:03:36.000 --> 00:03:40.000
diagram very nicely depicts the relation
00:03:38.159 --> 00:03:42.239
between artificial intelligence machine
00:03:40.000 --> 00:03:44.319
learning and deep learning now why do we
00:03:42.239 --> 00:03:47.040
need deep learning machine learning was
00:03:44.319 --> 00:03:49.120
there for quite some time and it can do
00:03:47.040 --> 00:03:51.599
a lot of the stuff that deep
00:03:49.120 --> 00:03:53.680
learning can do but it's not very good
00:03:51.599 --> 00:03:57.200
at handling large amounts of
00:03:53.680 --> 00:03:59.920
unstructured data like images voice or
00:03:57.200 --> 00:04:01.920
even text for that matter so traditional
00:03:59.920 --> 00:04:03.519
machine learning is not very good
00:04:01.920 --> 00:04:05.040
at doing this traditional machine
00:04:03.519 --> 00:04:07.040
learning can handle large amounts of
00:04:05.040 --> 00:04:09.120
structured data but when it comes to
00:04:07.040 --> 00:04:10.480
unstructured data it's a big challenge
00:04:09.120 --> 00:04:12.560
so that is one of the key
00:04:10.480 --> 00:04:15.519
differentiators for deep learning so
00:04:12.560 --> 00:04:18.320
that is number one and increasingly for
00:04:15.519 --> 00:04:20.400
artificial intelligence we need image
00:04:18.320 --> 00:04:22.320
recognition and we need to process
00:04:20.400 --> 00:04:23.680
analyze images and voice that's the
00:04:22.320 --> 00:04:25.520
reason deep learning is required
00:04:23.680 --> 00:04:27.840
compared to let's say traditional
00:04:25.520 --> 00:04:31.199
machine learning it can also perform
00:04:27.840 --> 00:04:33.120
more complex algorithms than
00:04:31.199 --> 00:04:35.919
let's say what machine learning can do
00:04:33.120 --> 00:04:38.000
and it can achieve best performance with
00:04:35.919 --> 00:04:39.919
the large amounts of data so the more
00:04:38.000 --> 00:04:42.800
data you have let's say reference
00:04:39.919 --> 00:04:44.639
data or labeled data the better the system
00:04:42.800 --> 00:04:46.960
will do because the training process
00:04:44.639 --> 00:04:49.040
will be that much better and last but
00:04:46.960 --> 00:04:51.600
not least with deep learning you can
00:04:49.040 --> 00:04:53.360
really avoid the manual process of
00:04:51.600 --> 00:04:55.280
feature extraction those are some of the
00:04:53.360 --> 00:04:57.120
reasons why we need deep learning some
00:04:55.280 --> 00:05:00.160
of the applications of deep learning
00:04:57.120 --> 00:05:02.960
deep learning has made major inroads and
00:05:00.160 --> 00:05:05.440
a major area in which deep
00:05:02.960 --> 00:05:08.880
learning is applied is healthcare and
00:05:05.440 --> 00:05:12.080
within healthcare particularly oncology
00:05:08.880 --> 00:05:15.199
which is basically cancer related stuff
00:05:12.080 --> 00:05:17.919
one of the issues with cancer is that a
00:05:15.199 --> 00:05:20.960
lot of cancers today are curable they
00:05:17.919 --> 00:05:23.360
can be cured if they are detected early on
00:05:20.960 --> 00:05:25.600
and the challenge with that is when a
00:05:23.360 --> 00:05:28.080
diagnostic is performed let's say an
00:05:25.600 --> 00:05:30.320
image has been taken of a patient to
00:05:28.080 --> 00:05:33.120
detect whether there is cancer or not
00:05:30.320 --> 00:05:35.120
you need a specialist to look at the
00:05:33.120 --> 00:05:38.080
image and determine whether the
00:05:35.120 --> 00:05:41.199
patient is fine or there is any onset of
00:05:38.080 --> 00:05:44.160
cancer and the number of specialists is
00:05:41.199 --> 00:05:46.639
limited so if we use deep learning if we
00:05:44.160 --> 00:05:48.880
use automation here or if we use
00:05:46.639 --> 00:05:52.000
artificial intelligence here then the
00:05:48.880 --> 00:05:54.639
system can with a
00:05:52.000 --> 00:05:57.520
good amount of accuracy determine
00:05:54.639 --> 00:06:00.000
whether a particular patient has
00:05:57.520 --> 00:06:02.960
cancer or not so the prediction or the
00:06:00.000 --> 00:06:05.919
detection process of a disease like
00:06:02.960 --> 00:06:08.160
cancer can be expedited
00:06:05.919 --> 00:06:10.800
it can be faster
00:06:08.160 --> 00:06:13.600
without really waiting for a specialist
00:06:10.800 --> 00:06:15.919
obviously then once the
00:06:13.600 --> 00:06:18.479
application or the artificial
00:06:15.919 --> 00:06:20.800
intelligence detects or predicts that
00:06:18.479 --> 00:06:23.120
there is an onset of a cancer this can
00:06:20.800 --> 00:06:25.680
be cross-checked by a doctor but at
00:06:23.120 --> 00:06:27.520
least the initial screening process can
00:06:25.680 --> 00:06:29.919
be automated and that is where the
00:06:27.520 --> 00:06:32.160
current focus is with respect to deep
00:06:29.919 --> 00:06:34.560
learning in healthcare what else
00:06:32.160 --> 00:06:38.319
robotics is another area deep learning
00:06:34.560 --> 00:06:40.880
is majorly used in robotics and you must
00:06:38.319 --> 00:06:43.199
have seen nowadays robots are everywhere
00:06:40.880 --> 00:06:45.120
humanoids the industrial robots which
00:06:43.199 --> 00:06:48.080
are used for manufacturing process you
00:06:45.120 --> 00:06:50.639
must have heard about sophia who got
00:06:48.080 --> 00:06:53.360
citizenship from saudi arabia and so on
00:06:50.639 --> 00:06:55.840
there are multiple such robots which are
00:06:53.360 --> 00:06:58.880
knowledge oriented but there are also
00:06:55.840 --> 00:07:00.639
industrial robots used in industries
00:06:58.880 --> 00:07:03.120
in the manufacturing process and
00:07:00.639 --> 00:07:05.440
increasingly in security and also in
00:07:03.120 --> 00:07:07.840
defense for example image processing
00:07:05.440 --> 00:07:10.080
video is fed to them and they need to be
00:07:07.840 --> 00:07:11.599
able to detect objects obstacles and so
00:07:10.080 --> 00:07:13.520
on and so forth so that's where deep
00:07:11.599 --> 00:07:15.599
learning is used they need to be able to
00:07:13.520 --> 00:07:17.520
hear and make sense of the sounds that
00:07:15.599 --> 00:07:20.400
they are hearing that needs deep
00:07:17.520 --> 00:07:22.800
learning as well so robotics is a major
00:07:20.400 --> 00:07:25.680
area where deep learning is applied then
00:07:22.800 --> 00:07:27.919
we have self-driving cars or autonomous
00:07:25.680 --> 00:07:30.960
cars you must have heard of google's
00:07:27.919 --> 00:07:33.759
autonomous car which has been tested for
00:07:30.960 --> 00:07:35.440
millions of miles and pretty much
00:07:33.759 --> 00:07:37.120
incident free there were of course a
00:07:35.440 --> 00:07:39.759
couple of incidents here and there but
00:07:37.120 --> 00:07:42.880
it is uh considered to be fairly safe
00:07:39.759 --> 00:07:45.120
and there are today a lot of automotive
00:07:42.880 --> 00:07:47.520
companies in fact pretty much every
00:07:45.120 --> 00:07:49.919
automotive company worth its name is
00:07:47.520 --> 00:07:52.080
investing in self-driving cars or
00:07:49.919 --> 00:07:54.560
autonomous cars and it is predicted that
00:07:52.080 --> 00:07:56.160
in the next probably 10 to 15 years
00:07:54.560 --> 00:07:59.120
these will be in production and they
00:07:56.160 --> 00:08:01.039
will be used extensively in real life
00:07:59.120 --> 00:08:03.039
right now they are all in r&d and in
00:08:01.039 --> 00:08:05.360
test phases but pretty soon these will
00:08:03.039 --> 00:08:07.280
be on the road so this is another area
00:08:05.360 --> 00:08:08.960
where deep learning is used and how is
00:08:07.280 --> 00:08:11.759
it used where is it used within
00:08:08.960 --> 00:08:14.960
autonomous driving the car actually is
00:08:11.759 --> 00:08:17.039
fed with video of surroundings and it is
00:08:14.960 --> 00:08:18.879
supposed to process that information
00:08:17.039 --> 00:08:20.800
process that video and determine if
00:08:18.879 --> 00:08:23.039
there are any obstacles it has to
00:08:20.800 --> 00:08:25.759
determine if there are any cars in
00:08:23.039 --> 00:08:28.160
sight it will detect whether it is driving
00:08:25.759 --> 00:08:31.759
in the lane also it has to determine
00:08:28.160 --> 00:08:34.159
whether the signal is green or red so
00:08:31.759 --> 00:08:37.760
that accordingly it can move forward or
00:08:34.159 --> 00:08:39.599
wait so for all this video analysis
00:08:37.760 --> 00:08:41.919
deep learning is used in addition to
00:08:39.599 --> 00:08:44.720
that the overall training to
00:08:41.919 --> 00:08:47.200
drive the car happens in a deep learning
00:08:44.720 --> 00:08:48.720
environment so again a lot of scope here
00:08:47.200 --> 00:08:51.120
to use deep learning a couple of other
00:08:48.720 --> 00:08:54.880
applications are machine translation
00:08:51.120 --> 00:08:57.760
today we have a lot of information and
00:08:54.880 --> 00:08:59.519
very often this information is in one
00:08:57.760 --> 00:09:03.120
particular language and more
00:08:59.519 --> 00:09:05.519
specifically in english and people need
00:09:03.120 --> 00:09:08.560
information in various parts of the
00:09:05.519 --> 00:09:11.120
world it is pretty difficult for human
00:09:08.560 --> 00:09:13.519
beings to translate each and every piece
00:09:11.120 --> 00:09:15.279
of information or every document into
00:09:13.519 --> 00:09:17.440
all possible languages there are
00:09:15.279 --> 00:09:19.600
probably at least hundreds of languages
00:09:17.440 --> 00:09:22.720
if not more to translate each and
00:09:19.600 --> 00:09:25.920
every document into every language is
00:09:22.720 --> 00:09:28.560
pretty difficult therefore we can use
00:09:25.920 --> 00:09:31.440
deep learning to do pretty much like a
00:09:28.560 --> 00:09:33.200
real-time translation mechanism so we
00:09:31.440 --> 00:09:36.160
don't have to translate everything and
00:09:33.200 --> 00:09:38.640
keep it ready but we train applications
00:09:36.160 --> 00:09:41.519
or artificial intelligence systems that
00:09:38.640 --> 00:09:44.560
will do the translation on the fly for
00:09:41.519 --> 00:09:46.320
example you go to somewhere like china
00:09:44.560 --> 00:09:48.480
and you want to know what is written on
00:09:46.320 --> 00:09:50.800
a signboard now it is impossible for
00:09:48.480 --> 00:09:52.800
somebody to translate that and put it on
00:09:50.800 --> 00:09:55.440
the web or something like that so you
00:09:52.800 --> 00:09:57.920
have an application which is trained to
00:09:55.440 --> 00:10:00.000
translate stuff on the fly so
00:09:57.920 --> 00:10:02.240
this can probably be running on your
00:10:00.000 --> 00:10:05.200
mobile phone on your smartphone you scan
00:10:02.240 --> 00:10:07.440
this the application will instantly
00:10:05.200 --> 00:10:10.240
translate that from chinese to english
00:10:07.440 --> 00:10:11.760
that is one then there could be web
00:10:10.240 --> 00:10:14.399
applications where there may be a
00:10:11.760 --> 00:10:16.640
research document which is all in maybe
00:10:14.399 --> 00:10:19.839
chinese or japanese and you want to
00:10:16.640 --> 00:10:22.000
study that document
00:10:19.839 --> 00:10:23.839
in that case you need to translate it so
00:10:22.000 --> 00:10:26.160
therefore deep learning is used in such
00:10:23.839 --> 00:10:28.160
situations as well and that is again on
00:10:26.160 --> 00:10:30.240
demand so it is not like you have to
00:10:28.160 --> 00:10:31.920
translate all these documents from other
00:10:30.240 --> 00:10:34.000
languages into english and one shot and
00:10:31.920 --> 00:10:36.480
keep it somewhere that is again pretty
00:10:34.000 --> 00:10:38.160
much an impossible task but on a need
00:10:36.480 --> 00:10:40.399
basis so you have systems that are
00:10:38.160 --> 00:10:42.000
trained to translate on the fly so
00:10:40.399 --> 00:10:43.600
machine translation is another major
00:10:42.000 --> 00:10:45.920
area where deep learning is used then
00:10:43.600 --> 00:10:48.800
there are a few other upcoming areas
00:10:45.920 --> 00:10:51.279
where synthesizing is done by neural
00:10:48.800 --> 00:10:53.680
nets for example music composition and
00:10:51.279 --> 00:10:56.880
generation of music so you can train a
00:10:53.680 --> 00:10:59.680
neural net to produce music even to
00:10:56.880 --> 00:11:02.000
compose music so this is a fun thing
00:10:59.680 --> 00:11:04.720
this is still upcoming it needs a lot of
00:11:02.000 --> 00:11:06.640
effort to train such a neural net but it has
00:11:04.720 --> 00:11:09.120
been proved that it is possible so this
00:11:06.640 --> 00:11:11.760
is a relatively new area and on the same
00:11:09.120 --> 00:11:13.920
lines colorization of images so of these
00:11:11.760 --> 00:11:15.839
two images the one on the left hand side is a
00:11:13.920 --> 00:11:18.720
grayscale image or a black and white
00:11:15.839 --> 00:11:20.480
image this was colored by a neural net
00:11:18.720 --> 00:11:22.959
or a deep learning application as you
00:11:20.480 --> 00:11:25.040
can see it's done a very good job of
00:11:22.959 --> 00:11:28.000
applying the colors and obviously this
00:11:25.040 --> 00:11:30.320
was trained to do this colorization but
00:11:28.000 --> 00:11:33.360
yes this is one more application of deep
00:11:30.320 --> 00:11:37.279
learning now one of the major secret
00:11:33.360 --> 00:11:40.160
sauce of deep learning is neural network
00:11:37.279 --> 00:11:42.240
deep learning works on neural network or
00:11:40.160 --> 00:11:45.279
consists of neural network so let us see
00:11:42.240 --> 00:11:49.040
what is neural network neural network or
00:11:45.279 --> 00:11:53.360
artificial neural network is designed
00:11:49.040 --> 00:11:56.880
based on the human brain now human brain
00:11:53.360 --> 00:11:59.519
consists of billions of small cells that
00:11:56.880 --> 00:12:03.120
are known as neurons artificial neural
00:11:59.519 --> 00:12:05.519
networks is in a way trying to simulate
00:12:03.120 --> 00:12:07.839
the human brain so this is a quick
00:12:05.519 --> 00:12:10.399
diagram of biological neuron a
00:12:07.839 --> 00:12:12.959
biological neuron consists of the major
00:12:10.399 --> 00:12:16.079
part which is the cell nucleus and then
00:12:12.959 --> 00:12:18.240
it has some tentacles kind of stuff on
00:12:16.079 --> 00:12:20.160
the top called dendrites and then there
00:12:18.240 --> 00:12:22.399
is like a long tail which is known as
00:12:20.160 --> 00:12:24.240
the axon further again at the end of
00:12:22.399 --> 00:12:27.680
this axon are what are known as
00:12:24.240 --> 00:12:30.880
synapses these in turn are connected to
00:12:27.680 --> 00:12:33.680
the dendrites of the next neuron and all
00:12:30.880 --> 00:12:35.440
these neurons are interconnected with
00:12:33.680 --> 00:12:37.519
each other therefore they are like
00:12:35.440 --> 00:12:39.440
billions of them sitting in our brain
00:12:37.519 --> 00:12:42.000
and they're all active they're working
00:12:39.440 --> 00:12:45.360
they receive
00:12:42.000 --> 00:12:47.920
signals as inputs from other neurons or
00:12:45.360 --> 00:12:50.639
maybe from other parts of the body and
00:12:47.920 --> 00:12:52.720
based on certain criteria they send
00:12:50.639 --> 00:12:54.800
signals to the neurons at the other end
00:12:52.720 --> 00:12:56.880
so they either get activated or
00:12:54.800 --> 00:12:59.760
they don't get activated so it
00:12:56.880 --> 00:13:02.480
is like a binary gate they get
00:12:59.760 --> 00:13:04.800
activated or not activated based on the
00:13:02.480 --> 00:13:06.399
inputs that they receive and so on so we
00:13:04.800 --> 00:13:08.720
will see a little bit of those details
00:13:06.399 --> 00:13:10.880
as we move forward in our artificial
00:13:08.720 --> 00:13:12.320
neuron but this is a biological neuron
00:13:10.880 --> 00:13:15.200
this is the structure of a biological
00:13:12.320 --> 00:13:17.680
neuron and artificial neural network is
00:13:15.200 --> 00:13:20.320
based on the human brain the smallest
00:13:17.680 --> 00:13:23.440
component of artificial neural network
00:13:20.320 --> 00:13:25.839
is an artificial neuron as shown here
00:13:23.440 --> 00:13:28.000
sometimes it is also referred to as a
00:13:25.839 --> 00:13:30.240
perceptron now this is a very high level
00:13:28.000 --> 00:13:32.800
diagram the artificial neuron has a
00:13:30.240 --> 00:13:35.760
small central unit which will receive
00:13:32.800 --> 00:13:38.320
the input if it is doing let's say image
00:13:35.760 --> 00:13:41.040
processing the inputs could be pixel
00:13:38.320 --> 00:13:44.480
values of the image which is represented
00:13:41.040 --> 00:13:47.680
here as x1 x2 and so on each of the
00:13:44.480 --> 00:13:50.320
inputs are multiplied by what is known
00:13:47.680 --> 00:13:53.200
as weights which are represented as w1
00:13:50.320 --> 00:13:56.240
w2 and so on there is in the central
00:13:53.200 --> 00:13:59.600
unit basically there is a summation of
00:13:56.240 --> 00:14:03.279
these weighted inputs which is like x1
00:13:59.600 --> 00:14:06.160
times w1 plus x2 times w2 and so on the
00:14:03.279 --> 00:14:08.079
products are then added and then there
00:14:06.160 --> 00:14:10.720
is a bias that is added to that in the
00:14:08.079 --> 00:14:12.959
next slide we will see that passes
00:14:10.720 --> 00:14:16.160
through an activation function and the
00:14:12.959 --> 00:14:18.720
output comes out as y
00:14:16.160 --> 00:14:20.880
and based on certain criteria the cell
00:14:18.720 --> 00:14:23.519
gets either activated or not activated
00:14:20.880 --> 00:14:26.959
so this output would be like a zero or a
00:14:23.519 --> 00:14:28.639
one binary format okay so we will see
00:14:26.959 --> 00:14:30.639
that in a little bit more detail but
00:14:28.639 --> 00:14:33.040
let's do a quick comparison between
00:14:30.639 --> 00:14:35.040
biological and artificial neurons just
00:14:33.040 --> 00:14:36.639
like a biological neuron there are
00:14:35.040 --> 00:14:39.600
dendrites and then there is a cell
00:14:36.639 --> 00:14:42.880
nucleus and synapse and an axon
00:14:39.600 --> 00:14:45.920
we have in the artificial neuron as well
00:14:42.880 --> 00:14:48.160
these inputs come in and act like
00:14:45.920 --> 00:14:50.320
the dendrites if you will there is
00:14:48.160 --> 00:14:52.880
like a central unit which performs the
00:14:50.320 --> 00:14:56.160
summation of these weighted inputs
00:14:52.880 --> 00:14:58.880
which is basically w1 x1 w2 x2 and so on
00:14:56.160 --> 00:15:00.639
and then our bias is added here and then
00:14:58.880 --> 00:15:02.880
that passes through what is known as an
00:15:00.639 --> 00:15:04.639
activation function okay so these are
00:15:02.880 --> 00:15:06.880
known as the weights w1 w2 and then
00:15:04.639 --> 00:15:09.519
there is a bias which will come out here
00:15:06.880 --> 00:15:11.600
and that is added the bias is by the way
00:15:09.519 --> 00:15:14.320
common for a particular neuron so there
00:15:11.600 --> 00:15:16.800
won't be like b1 b2 b3 and so on only
00:15:14.320 --> 00:15:19.440
the weights will be one per input the bias
00:15:16.800 --> 00:15:22.639
is common for the entire neuron it is
00:15:19.440 --> 00:15:25.360
also common or rather the value of the bias
00:15:22.639 --> 00:15:28.000
remains the same for all the neurons in
00:15:25.360 --> 00:15:29.920
a particular layer we will also see this
00:15:28.000 --> 00:15:31.600
as we move forward and we see deep
00:15:29.920 --> 00:15:34.160
neural network where there are multiple
00:15:31.600 --> 00:15:37.920
neurons so that's the output now the
00:15:34.160 --> 00:15:41.519
whole exercise of training the neuron is
00:15:37.920 --> 00:15:43.519
about changing these weights and biases
00:15:41.519 --> 00:15:46.000
as i mentioned artificial neural network
00:15:43.519 --> 00:15:48.560
will consist of several such neurons and
00:15:46.000 --> 00:15:50.880
as a part of the training process these
00:15:48.560 --> 00:15:53.120
weights keep changing initially they are
00:15:50.880 --> 00:15:55.360
assigned some random values through the
00:15:53.120 --> 00:15:57.279
training process the whole
00:15:55.360 --> 00:16:00.880
process of training is to come up with
00:15:57.279 --> 00:16:02.959
the optimum values of w1 w2 and wn and
00:16:00.880 --> 00:16:05.519
then the b or the bias for this
00:16:02.959 --> 00:16:08.399
particular neuron such that it gives an
00:16:05.519 --> 00:16:11.040
accurate output as required so let's see
00:16:08.399 --> 00:16:13.440
what exactly that means so the training
00:16:11.040 --> 00:16:16.720
process this is how it happens it takes
00:16:13.440 --> 00:16:19.040
the inputs each input is multiplied by a
00:16:16.720 --> 00:16:20.639
weight and these weights during training
00:16:19.040 --> 00:16:23.440
keep changing so initially they are
00:16:20.639 --> 00:16:25.519
assigned some random values and based on
00:16:23.440 --> 00:16:27.519
the output whether it is correct or
00:16:25.519 --> 00:16:29.759
wrong there is a feedback coming back
00:16:27.519 --> 00:16:33.120
and that will basically change these
00:16:29.759 --> 00:16:36.320
weights until it starts giving the right
00:16:33.120 --> 00:16:39.199
output that is represented here as
00:16:36.320 --> 00:16:42.320
sigma i going from 1 to n if there are n
00:16:39.199 --> 00:16:46.160
inputs wi times xi so this is the
00:16:42.320 --> 00:16:49.920
sum of w1 x1 plus w2 x2 and so on right
00:16:46.160 --> 00:16:52.959
and there is a bias that gets added here
00:16:49.920 --> 00:16:55.360
and that entire thing goes to what is
00:16:52.959 --> 00:16:59.120
known as an activation function so
00:16:55.360 --> 00:17:02.160
essentially this is sigma of w i x i
00:16:59.120 --> 00:17:05.360
plus a value of bias which is a b so
00:17:02.160 --> 00:17:07.919
that entire thing goes as an input to an
00:17:05.360 --> 00:17:10.480
activation function now this activation
00:17:07.919 --> 00:17:13.520
function takes this as an input gives
00:17:10.480 --> 00:17:15.439
the output as a binary output it could
00:17:13.520 --> 00:17:17.439
be a zero or a one there are of course
00:17:15.439 --> 00:17:18.959
to start with let's assume it's a binary
00:17:17.439 --> 00:17:20.799
output later we will see that there are
00:17:18.959 --> 00:17:23.120
different types of activation functions
00:17:20.799 --> 00:17:25.439
so it need not always be binary output
00:17:23.120 --> 00:17:28.160
but to start with let's keep simple so
00:17:25.439 --> 00:17:30.799
it decides whether the neuron should be
00:17:28.160 --> 00:17:33.280
fired or not so that is the output like
00:17:30.799 --> 00:17:35.280
a binary output 0 or 1. all right so
00:17:33.280 --> 00:17:36.960
again let me summarize this so it takes
00:17:35.280 --> 00:17:39.280
the inputs so if you're processing an
00:17:36.960 --> 00:17:42.559
image for example the inputs are the
00:17:39.280 --> 00:17:44.559
pixel values of the image x1 x2 up to xn
00:17:42.559 --> 00:17:46.480
there could be hundreds of these so all
00:17:44.559 --> 00:17:48.559
of those are fed in these are some
00:17:46.480 --> 00:17:51.200
values and these pixel values again can
00:17:48.559 --> 00:17:54.400
be from 0 to 255 each of those pixel
00:17:51.200 --> 00:17:56.160
values are then multiplied with what is
00:17:54.400 --> 00:17:58.160
known as a weight this is a numeric
00:17:56.160 --> 00:18:01.360
value it can be any value so this is a
00:17:58.160 --> 00:18:03.679
number w1 similarly w2 is a number so
00:18:01.360 --> 00:18:05.600
initially some random values will be
00:18:03.679 --> 00:18:07.520
assigned and each of these weights are
00:18:05.600 --> 00:18:09.919
multiplied with the input value and
00:18:07.520 --> 00:18:12.320
their sum this is known as the weighted
00:18:09.919 --> 00:18:14.960
sum so that is performed in
00:18:12.320 --> 00:18:17.440
the central unit and then a bias is
00:18:14.960 --> 00:18:20.080
added remember the bias is common for
00:18:17.440 --> 00:18:21.760
each neuron so the bias
00:18:20.080 --> 00:18:24.559
value is not one
00:18:21.760 --> 00:18:26.640
bias value per input so just keep
00:18:24.559 --> 00:18:28.640
that in mind there is one
00:18:26.640 --> 00:18:31.360
bias per neuron so it is like this
00:18:28.640 --> 00:18:33.200
summation plus bias is the output from
00:18:31.360 --> 00:18:34.880
this section this is not the complete
00:18:33.200 --> 00:18:37.600
output of the neuron but this is the
00:18:34.880 --> 00:18:39.200
output for step one that goes
00:18:37.600 --> 00:18:41.520
as an input to what is known as
00:18:39.200 --> 00:18:44.320
activation function and that activation
00:18:41.520 --> 00:18:46.720
function results in an output usually a
00:18:44.320 --> 00:18:49.440
binary output like a zero or a one which
00:18:46.720 --> 00:18:51.919
is known as the firing of the neuron
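A minimal sketch of that single-neuron computation, with made-up input, weight, and bias values purely for illustration:

```python
import numpy as np

# a minimal sketch of a single artificial neuron (perceptron); the input,
# weight, and bias values here are made up purely for illustration
x = np.array([0.5, 0.3, 0.8])    # inputs, e.g. scaled pixel values
w = np.array([0.4, -0.6, 0.9])   # one weight per input, random to start with
b = 0.1                          # one bias for the whole neuron

# weighted sum: w1*x1 + w2*x2 + ... plus the bias
weighted_sum = np.dot(w, x) + b

# a simple threshold activation: the neuron fires (1) or does not fire (0)
output = 1 if weighted_sum >= 0 else 0
print(weighted_sum, output)
```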
00:18:49.440 --> 00:18:53.840
okay good so we talked about activation
00:18:51.919 --> 00:18:55.760
function so what is an activation
00:18:53.840 --> 00:18:58.880
function an activation function
00:18:55.760 --> 00:19:02.640
basically takes the weighted sum which
00:18:58.880 --> 00:19:05.520
is as we saw w1 x1 w2 x2 the sum of all
00:19:02.640 --> 00:19:08.799
that plus the bias so it takes that as
00:19:05.520 --> 00:19:10.640
an input and it generates a certain
00:19:08.799 --> 00:19:12.640
output now there are different types of
00:19:10.640 --> 00:19:14.160
activation functions and the output is
00:19:12.640 --> 00:19:16.720
different for different types of
00:19:14.160 --> 00:19:18.720
activation functions moreover why is an
00:19:16.720 --> 00:19:20.960
activation function required it is
00:19:18.720 --> 00:19:23.520
basically required to bring in
00:19:20.960 --> 00:19:25.760
non-linearity that's the main reason why
00:19:23.520 --> 00:19:26.880
an activation function is required so
00:19:25.760 --> 00:19:28.720
what are the different types of
00:19:26.880 --> 00:19:30.720
activation functions there are several
00:19:28.720 --> 00:19:32.720
types of activation functions but these
00:19:30.720 --> 00:19:35.200
are the most common ones these are the
00:19:32.720 --> 00:19:37.600
ones that are currently in use sigmoid
00:19:35.200 --> 00:19:41.440
function was one of the early activation
00:19:37.600 --> 00:19:44.400
functions but today relu has kind of
00:19:41.440 --> 00:19:46.960
taken over so relu is by far the most
00:19:44.400 --> 00:19:49.600
popular activation function that is used
00:19:46.960 --> 00:19:52.320
today but still sigmoid function is
00:19:49.600 --> 00:19:54.160
still used in many situations these
00:19:52.320 --> 00:19:56.400
different types of activation functions
00:19:54.160 --> 00:19:58.080
are used in different situations based
00:19:56.400 --> 00:20:00.000
on the kind of problem we are trying to
00:19:58.080 --> 00:20:01.840
solve so what exactly is the difference
00:20:00.000 --> 00:20:03.919
between these two with sigmoid the
00:20:01.840 --> 00:20:06.799
values of the output will be between 0
00:20:03.919 --> 00:20:07.760
and 1. with the threshold function the value
00:20:06.799 --> 00:20:10.240
will be
00:20:07.760 --> 00:20:12.400
0 up to a certain value and beyond that
00:20:10.240 --> 00:20:14.960
it will be 1 this is also known as a step
00:20:12.400 --> 00:20:17.600
function in case of
00:20:14.960 --> 00:20:19.520
sigmoid there is a gradual increase but
00:20:17.600 --> 00:20:22.000
in case of threshold also
00:20:19.520 --> 00:20:24.400
known as a step function there's a rapid
00:20:22.000 --> 00:20:26.080
or instantaneous change from zero to one
00:20:24.400 --> 00:20:28.400
whereas in sigmoid we will see in the
00:20:26.080 --> 00:20:30.640
next slide there is a gradual increase
00:20:28.400 --> 00:20:33.200
but the value in this case is between
00:20:30.640 --> 00:20:35.600
zero and one as well now relu function
00:20:33.200 --> 00:20:38.880
on the other hand works like this
00:20:35.600 --> 00:20:42.960
basically if the input is 0 or less than
00:20:38.880 --> 00:20:46.000
0 then the output is 0 whereas if the
00:20:42.960 --> 00:20:48.000
input is greater than 0 then the output
00:20:46.000 --> 00:20:49.919
is equal to the input i know it's a
00:20:48.000 --> 00:20:52.400
little confusing but in the next slides
00:20:49.919 --> 00:20:54.720
where we show the relu function it will
00:20:52.400 --> 00:20:57.679
become clear similarly hyperbolic
00:20:54.720 --> 00:21:00.159
tangent this is similar to sigmoid in
00:20:57.679 --> 00:21:03.360
terms of the shape of the function
00:21:00.159 --> 00:21:06.400
however while sigmoid goes from 0 to 1
00:21:03.360 --> 00:21:09.520
hyperbolic tangent goes from -1 to 1 and
00:21:06.400 --> 00:21:13.760
here again the increase or the change
00:21:09.520 --> 00:21:15.760
from -1 to 1 is gradual and not like
00:21:13.760 --> 00:21:18.080
threshold or step function where it
00:21:15.760 --> 00:21:20.159
happens instantaneously so let's take a
00:21:18.080 --> 00:21:21.919
little detailed look at some of these
00:21:20.159 --> 00:21:23.919
functions so let's start with the
00:21:21.919 --> 00:21:26.559
sigmoid function so this is the equation
00:21:23.919 --> 00:21:29.679
of a sigmoid function which is 1 by 1
00:21:26.559 --> 00:21:32.799
plus e to the power of minus x so x is
00:21:29.679 --> 00:21:36.880
the value that is the input it goes from
00:21:32.799 --> 00:21:40.000
0 to 1 so this is the sigmoid function the
00:21:36.880 --> 00:21:42.640
equation is phi x is equal to 1 by 1
00:21:40.000 --> 00:21:44.400
plus e to the power of minus x and as
00:21:42.640 --> 00:21:47.520
you can see here this is the input on
00:21:44.400 --> 00:21:49.600
the x-axis as x is where the value is
00:21:47.520 --> 00:21:51.440
coming from in fact it can also go
00:21:49.600 --> 00:21:53.200
negative this is negative actually so
00:21:51.440 --> 00:21:55.520
this is the zero so this is the negative
00:21:53.200 --> 00:21:58.720
value of x so as x is coming from
00:21:55.520 --> 00:22:02.080
negative value towards zero the value of
00:21:58.720 --> 00:22:05.120
the output slowly as it is approaching
00:22:02.080 --> 00:22:08.320
zero it slowly and very gently
00:22:05.120 --> 00:22:11.600
increases and actually at the point let
00:22:08.320 --> 00:22:15.919
me just use a pen at the point here
00:22:11.600 --> 00:22:19.039
it is actually 0.5 okay and
00:22:15.919 --> 00:22:21.440
slowly gradually it increases to 1 as
00:22:19.039 --> 00:22:24.400
the value of x increases but then as the
00:22:21.440 --> 00:22:27.360
value of x increases it tapers off it
00:22:24.400 --> 00:22:29.840
doesn't go beyond one so that is the
00:22:27.360 --> 00:22:32.320
speciality of sigmoid function so the
00:22:29.840 --> 00:22:34.960
output value will remain between zero
00:22:32.320 --> 00:22:37.360
and one it will never go below zero or
00:22:34.960 --> 00:22:39.679
above one okay then so that is sigmoid
00:22:37.360 --> 00:22:42.000
function now this is threshold function
00:22:39.679 --> 00:22:44.880
or this is also referred to as a step
00:22:42.000 --> 00:22:46.640
function and here we can also set the
00:22:44.880 --> 00:22:48.240
threshold that's why
00:22:46.640 --> 00:22:50.720
it's called the threshold function
00:22:48.240 --> 00:22:52.559
normally it is 0 but you can also set a
00:22:50.720 --> 00:22:54.240
different value for the threshold now
00:22:52.559 --> 00:22:57.120
the difference between this and the
00:22:54.240 --> 00:22:59.840
sigmoid is that here the change is rapid
00:22:57.120 --> 00:23:02.799
or instantaneous as the x value comes
00:22:59.840 --> 00:23:06.240
from negative up to zero it remains zero
00:23:02.799 --> 00:23:08.640
and at zero it pretty much immediately
00:23:06.240 --> 00:23:11.280
increases to 1 okay so this is a
00:23:08.640 --> 00:23:13.919
mathematical representation of threshold
00:23:11.280 --> 00:23:16.799
function phi x is equal to 1 if x is
00:23:13.919 --> 00:23:18.799
greater than or equal to 0 and 0 if x is
00:23:16.799 --> 00:23:20.640
less than 0. so for all negative values
00:23:18.799 --> 00:23:23.120
it is 0 since we have set the
00:23:20.640 --> 00:23:25.679
threshold to be 0 so as soon as it
00:23:23.120 --> 00:23:28.640
reaches 0 it becomes 1. you see the
00:23:25.679 --> 00:23:31.520
difference between this and the previous
00:23:28.640 --> 00:23:34.720
one which is basically the sigmoid where
00:23:31.520 --> 00:23:37.120
the increase from 0 to 1 is gradual and
00:23:34.720 --> 00:23:39.200
here it is instantaneous and that's why
00:23:37.120 --> 00:23:41.440
this is also known as a step function
00:23:39.200 --> 00:23:43.679
threshold function or step function this
00:23:41.440 --> 00:23:46.159
is a relu a relu is one of the most
00:23:43.679 --> 00:23:48.799
popular activation functions today this
00:23:46.159 --> 00:23:51.679
is the definition of relu phi x is equal
00:23:48.799 --> 00:23:54.400
to max(x, 0) what it says is
00:23:51.679 --> 00:23:55.679
if the value of x is less than zero then
00:23:54.400 --> 00:23:58.880
phi x is
00:23:55.679 --> 00:24:03.600
zero the moment it goes beyond
00:23:58.880 --> 00:24:06.720
zero the value of phi x is equal to x so
00:24:03.600 --> 00:24:08.799
it doesn't stop at one actually it goes
00:24:06.720 --> 00:24:10.720
all the way so as the value of x
00:24:08.799 --> 00:24:13.440
increases the value of y will also
00:24:10.720 --> 00:24:17.760
increase infinitely so there is no limit
00:24:13.440 --> 00:24:19.760
here unlike your sigmoid or threshold or
00:24:17.760 --> 00:24:22.559
the next one which is basically
00:24:19.760 --> 00:24:25.200
hyperbolic tangent okay so in case of
00:24:22.559 --> 00:24:28.080
relu remember there is no upper limit
00:24:25.200 --> 00:24:31.039
the output is equal to either 0 in case
00:24:28.080 --> 00:24:34.240
the value of x is negative or it is
00:24:31.039 --> 00:24:37.039
equal to the value of x so for example
00:24:34.240 --> 00:24:39.840
here if the value of x is 10 then the
00:24:37.039 --> 00:24:42.960
value of y is also 10 right okay so that
00:24:39.840 --> 00:24:45.679
is relu and there are several advantages
00:24:42.960 --> 00:24:48.159
of relu and it is much more efficient
00:24:45.679 --> 00:24:49.840
and provides much more accuracy compared
00:24:48.159 --> 00:24:51.679
to other activation functions like
00:24:49.840 --> 00:24:54.320
sigmoid and so on so that's the reason
00:24:51.679 --> 00:24:56.640
it is very popular all right so this is
00:24:54.320 --> 00:24:58.640
hyperbolic tangent activation function
00:24:56.640 --> 00:25:01.279
the function looks similar to sigmoid
00:24:58.640 --> 00:25:03.360
function the curve if you see the shape
00:25:01.279 --> 00:25:05.279
it looks similar to sigmoid function but
00:25:03.360 --> 00:25:08.080
the difference between hyperbolic
00:25:05.279 --> 00:25:10.799
tangent and sigmoid function is that in
00:25:08.080 --> 00:25:13.200
case of sigmoid the output goes from
00:25:10.799 --> 00:25:16.960
zero to one whereas in case of
00:25:13.200 --> 00:25:18.559
hyperbolic tangent it goes from -1 to 1
00:25:16.960 --> 00:25:21.360
so that is the difference between
00:25:18.559 --> 00:25:23.840
hyperbolic tangent and sigmoid function
00:25:21.360 --> 00:25:26.799
otherwise the shape looks very similar
00:25:23.840 --> 00:25:29.279
there is a gradual increase unlike the
00:25:26.799 --> 00:25:31.840
step function where there was an instant
00:25:29.279 --> 00:25:34.159
increase or instant change here again
00:25:31.840 --> 00:25:37.679
very similar to sigmoid function the
00:25:34.159 --> 00:25:40.080
value changes gradually from -1 to 1. so
00:25:37.679 --> 00:25:42.720
this is the equation of hyperbolic
00:25:40.080 --> 00:25:44.799
tangent activation function
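Minimal sketches of these four activation functions, assuming Python with numpy; the formulas match the slides, the code itself is illustrative:

```python
import numpy as np

# minimal sketches of the four activation functions just discussed
def sigmoid(x):
    return 1 / (1 + np.exp(-x))     # gradual, output between 0 and 1

def threshold(x):
    return np.where(x >= 0, 1, 0)   # step function: instant change at 0

def relu(x):
    return np.maximum(x, 0)         # max(x, 0): no upper limit

def tanh(x):
    return np.tanh(x)               # gradual, output between -1 and 1

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])
for f in (sigmoid, threshold, relu, tanh):
    print(f.__name__, f(x))
```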
00:25:42.720 --> 00:25:47.200
let's move on this is a diagrammatic
00:25:44.799 --> 00:25:50.880
representation of the activation
00:25:47.200 --> 00:25:53.440
function and how
00:25:50.880 --> 00:25:55.840
the overall progression happens from
00:25:53.440 --> 00:25:57.679
input to the output so we get the input
00:25:55.840 --> 00:25:59.919
from the input layer by the way the
00:25:57.679 --> 00:26:01.440
neural network has three layers
00:25:59.919 --> 00:26:03.120
typically
00:26:01.440 --> 00:26:04.880
there is an input layer there is an
00:26:03.120 --> 00:26:07.600
output layer and then you have the
00:26:04.880 --> 00:26:10.240
hidden layer so the inputs come from the
00:26:07.600 --> 00:26:12.240
input layer and they get processed in
00:26:10.240 --> 00:26:14.400
the hidden layer and then you get the
00:26:12.240 --> 00:26:16.960
output in the output layer
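A minimal sketch of that input, hidden, and output layering, assuming TensorFlow/Keras and a made-up input size of 20 features per example rather than the actual network from the video:

```python
# a minimal sketch of the input -> hidden -> output layering, assuming
# TensorFlow/Keras and a made-up input size of 20 features per example
from tensorflow import keras

model = keras.Sequential([
    keras.layers.Input(shape=(20,)),             # input layer
    keras.layers.Dense(16, activation="relu"),   # hidden layer
    keras.layers.Dense(1, activation="sigmoid"), # output layer: cat or dog
])
model.summary()
```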
00:26:14.400 --> 00:26:19.840
so let's take a little bit of a detailed look into the
00:26:16.960 --> 00:26:22.880
working of a neural network so let's say
00:26:19.840 --> 00:26:25.679
we want to classify some images between
00:26:22.880 --> 00:26:28.400
dogs and cats how do we do this this is
00:26:25.679 --> 00:26:30.159
known as a classification process and we
00:26:28.400 --> 00:26:31.600
are trying to use neural networks and
00:26:30.159 --> 00:26:33.520
deep learning to implement this
00:26:31.600 --> 00:26:37.440
classification so how do we do that so
00:26:33.520 --> 00:26:40.159
this is how it works you have a four
00:26:37.440 --> 00:26:42.559
layer neural network there is an input
00:26:40.159 --> 00:26:45.440
layer there is an output layer and then
00:26:42.559 --> 00:26:49.440
there are two hidden layers and what we
00:26:45.440 --> 00:26:52.080
do is we provide labeled training data
00:26:49.440 --> 00:26:54.640
which means these images are fed to the
00:26:52.080 --> 00:26:57.120
network with the label saying that okay
00:26:54.640 --> 00:27:00.159
this is a cat the neural network is
00:26:57.120 --> 00:27:02.480
allowed to process it and come up with a
00:27:00.159 --> 00:27:05.039
prediction saying whether it is a cat or
00:27:02.480 --> 00:27:07.200
a dog and obviously in the beginning
00:27:05.039 --> 00:27:09.760
there may be mistakes a cat may be
00:27:07.200 --> 00:27:12.080
classified as a dog so we then say that
00:27:09.760 --> 00:27:14.000
okay this is wrong this output is wrong
00:27:12.080 --> 00:27:16.559
but every time it predicts correctly we
00:27:14.000 --> 00:27:19.120
say yes this output is correct that is
00:27:16.559 --> 00:27:21.760
the learning process it will go back make
00:27:19.120 --> 00:27:24.720
some changes to its weights and biases
00:27:21.760 --> 00:27:26.799
we again feed these inputs and it will
00:27:24.720 --> 00:27:28.799
give us the output we will check whether
00:27:26.799 --> 00:27:31.360
it is correct or not and so on so this
00:27:28.799 --> 00:27:34.320
is a iterative process which is known as
00:27:31.360 --> 00:27:36.880
the training process so we are training
00:27:34.320 --> 00:27:39.440
the neural network and what happens in
00:27:36.880 --> 00:27:41.760
the training process these weights and
00:27:39.440 --> 00:27:45.600
biases you remember there were weights
00:27:41.760 --> 00:27:48.880
like w1 w2 and so on so these weights
00:27:45.600 --> 00:27:51.679
and biases keep changing every time you
00:27:48.880 --> 00:27:53.760
feed these which is known as an epoch so
00:27:51.679 --> 00:27:56.159
there are multiple iterations every
00:27:53.760 --> 00:27:58.960
iteration is known as an epoch and each
00:27:56.159 --> 00:28:01.279
time the weights are updated to make sure
00:27:58.960 --> 00:28:03.679
that the maximum number of images are
00:28:01.279 --> 00:28:06.080
classified correctly so once again what
00:28:03.679 --> 00:28:09.600
is the input this input could be like
00:28:06.080 --> 00:28:12.159
1000 images of cats and dogs and they
00:28:09.600 --> 00:28:14.559
are labeled because we know which is a
00:28:12.159 --> 00:28:17.039
cat and which is a dog and we feed those
00:28:14.559 --> 00:28:18.960
thousand images the neural network will
00:28:17.039 --> 00:28:20.799
initially assign some weights and biases
00:28:18.960 --> 00:28:23.120
for each neuron and it will try to
00:28:20.799 --> 00:28:25.120
process and extract the features from the
00:28:23.120 --> 00:28:27.279
images and it will try to come up with a
00:28:25.120 --> 00:28:29.679
prediction for each image and that
00:28:27.279 --> 00:28:32.240
prediction that is calculated by the
00:28:29.679 --> 00:28:34.240
network is compared with the actual
00:28:32.240 --> 00:28:36.399
value whether it is a cat or a dog and
00:28:34.240 --> 00:28:38.559
that's how the error is calculated so
00:28:36.399 --> 00:28:41.279
let's say there are a thousand images
00:28:38.559 --> 00:28:43.200
and in the first run only 500 of them
00:28:41.279 --> 00:28:45.440
have been correctly classified that
00:28:43.200 --> 00:28:47.440
means we are getting only 50 percent accuracy so
00:28:45.440 --> 00:28:49.760
we feed that information back to the
00:28:47.440 --> 00:28:51.919
network to further update these weights and
00:28:49.760 --> 00:28:54.480
biases for each of the neurons and we
00:28:51.919 --> 00:28:56.320
run these inputs once again it will
00:28:54.480 --> 00:28:58.000
try to extract the features
00:28:56.320 --> 00:28:59.840
and it will try to predict which of
00:28:58.000 --> 00:29:02.399
these is cats and dogs and this time
00:28:59.840 --> 00:29:04.480
let's say out of a thousand 700 of them
00:29:02.399 --> 00:29:06.720
have been predicted correctly so that
00:29:04.480 --> 00:29:09.679
means in the second iteration the
00:29:06.720 --> 00:29:12.559
accuracy has increased from 50 to 70
00:29:09.679 --> 00:29:15.039
percent all right then we go back again
00:29:12.559 --> 00:29:17.760
we feed this maybe for a third iteration
00:29:15.039 --> 00:29:20.799
fourth iteration and so on and slowly
00:29:17.760 --> 00:29:23.360
and steadily the accuracy of this
00:29:20.799 --> 00:29:26.080
network will keep increasing and it may
00:29:23.360 --> 00:29:28.240
reach probably you never know 90 or 95
00:29:26.080 --> 00:29:30.240
percent and there are several parameters
00:29:28.240 --> 00:29:32.720
that are known as hyper parameters that
00:29:30.240 --> 00:29:34.880
need to be changed and tweaked and that
00:29:32.720 --> 00:29:37.760
is the overall training process and
00:29:34.880 --> 00:29:39.200
ultimately at some point we say okay you
00:29:37.760 --> 00:29:42.080
will probably never reach hundred
00:29:39.200 --> 00:29:44.159
percent accuracy but then we set a limit
00:29:42.080 --> 00:29:46.080
saying that okay if we achieve 95
00:29:44.159 --> 00:29:48.399
percent accuracy that is good enough for
00:29:46.080 --> 00:29:50.320
our application and then we say okay our
00:29:48.399 --> 00:29:53.120
training process is done so that is the
00:29:50.320 --> 00:29:55.760
way training happens and once the
00:29:53.120 --> 00:29:58.399
training is done now with the training
00:29:55.760 --> 00:30:01.039
data set the system has let's say seen
00:29:58.399 --> 00:30:03.760
all these thousand images therefore what
00:30:01.039 --> 00:30:05.840
we do is the next step like in any
00:30:03.760 --> 00:30:08.399
normal machine learning process we do
00:30:05.840 --> 00:30:10.799
the testing where we take a fresh set of
00:30:08.399 --> 00:30:13.039
images and we feed it to the network the
00:30:10.799 --> 00:30:14.880
fresh set which it has not seen before
00:30:13.039 --> 00:30:16.559
as a part of the training process and
00:30:14.880 --> 00:30:18.159
this is again nothing new in deep
00:30:16.559 --> 00:30:20.720
learning this was there in machine
00:30:18.159 --> 00:30:23.440
learning as well so you feed the test
00:30:20.720 --> 00:30:25.520
images and then find out whether we are
00:30:23.440 --> 00:30:27.600
getting a similar accuracy or not so
00:30:25.520 --> 00:30:29.520
maybe that accuracy may reduce a little
00:30:27.600 --> 00:30:31.840
bit while training you may get 98
00:30:29.520 --> 00:30:33.760
percent and then for test you may get 95
00:30:31.840 --> 00:30:36.480
percent but there shouldn't be a drastic
00:30:33.760 --> 00:30:38.880
drop like for example you get 98 in
00:30:36.480 --> 00:30:40.799
training and then you get 50 or 40
00:30:38.880 --> 00:30:43.279
percent with the test that means your
00:30:40.799 --> 00:30:46.320
network has not learned you may have to
00:30:43.279 --> 00:30:47.919
retrain your network so that is the way
00:30:46.320 --> 00:30:50.799
neural network training works and
00:30:47.919 --> 00:30:53.279
remember the whole process is about
00:30:50.799 --> 00:30:55.679
changing these weights and biases and
00:30:53.279 --> 00:30:57.520
coming up with the optimal values of
00:30:55.679 --> 00:31:00.240
these weights and biases so that the
00:30:57.520 --> 00:31:02.960
accuracy is the maximum possible
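A minimal sketch of that epoch-by-epoch training and the follow-up test, assuming TensorFlow/Keras with random stand-ins for the labeled images; the data sizes and labels here are made up purely for illustration:

```python
import numpy as np
from tensorflow import keras

# random stand-ins for 1000 labeled training images and 200 unseen
# test images; purely illustrative, not real image data
X_train = np.random.rand(1000, 20)
y_train = np.random.randint(0, 2, size=1000)   # 0 = dog, 1 = cat
X_test = np.random.rand(200, 20)
y_test = np.random.randint(0, 2, size=200)

model = keras.Sequential([
    keras.layers.Input(shape=(20,)),
    keras.layers.Dense(16, activation="relu"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])

# each pass over the full training set is one epoch; the weights and
# biases are updated each time and accuracy should rise epoch by epoch
model.fit(X_train, y_train, epochs=10)

# test with data the network has not seen; a drastic drop here means
# the network has not really learned and may need retraining
loss, accuracy = model.evaluate(X_test, y_test)
print("test accuracy:", accuracy)
```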
00:31:00.240 --> 00:31:04.960
all right so a little bit more detail about
00:31:02.960 --> 00:31:07.520
how this whole thing works so this is
00:31:04.960 --> 00:31:09.840
known as forward propagation which is
00:31:07.520 --> 00:31:12.320
the data or the information going in
00:31:09.840 --> 00:31:15.279
the forward direction the inputs are
00:31:12.320 --> 00:31:18.399
taken weighted summation is done bias is
00:31:15.279 --> 00:31:21.039
added here and then that is fed to the
00:31:18.399 --> 00:31:23.200
activation function and then that is
00:31:21.039 --> 00:31:25.360
that comes out as an output so that is
00:31:23.200 --> 00:31:27.360
forward propagation and the output is
00:31:25.360 --> 00:31:29.039
compared with the actual value and that
00:31:27.360 --> 00:31:31.200
will give us the error the difference
00:31:29.039 --> 00:31:33.679
between them is the error and in
00:31:31.200 --> 00:31:36.720
technical terms that is also known as
00:31:33.679 --> 00:31:38.880
our cost function and this is what we
00:31:36.720 --> 00:31:40.559
would like to minimize there are
00:31:38.880 --> 00:31:44.000
different ways of defining the cost
00:31:40.559 --> 00:31:47.200
function but one of the simplest ways is
00:31:44.000 --> 00:31:49.120
mean square error so it is nothing but
00:31:47.200 --> 00:31:51.919
the mean of the squares of the
00:31:49.120 --> 00:31:53.679
differences between the predicted and
00:31:51.919 --> 00:31:56.240
actual values and this is
00:31:53.679 --> 00:31:57.760
also nothing new if
00:31:56.240 --> 00:31:59.760
you're familiar with machine learning
00:31:57.760 --> 00:32:02.159
you must have come across this mean
00:31:59.760 --> 00:32:04.320
square error now there are different ways of
00:32:02.159 --> 00:32:06.240
defining cost function it need not
00:32:04.320 --> 00:32:08.720
always be the mean square error but the
00:32:06.240 --> 00:32:11.760
most common one is this so you define
00:32:08.720 --> 00:32:15.200
this cost function and you ask the
00:32:11.760 --> 00:32:17.600
system to minimize this error so we use
00:32:15.200 --> 00:32:21.039
what is known as an optimization
00:32:17.600 --> 00:32:23.519
function to minimize this error and the
00:32:21.039 --> 00:32:25.840
error itself is sent back to the system as
00:32:23.519 --> 00:32:27.600
feedback and that is known as back
00:32:25.840 --> 00:32:30.080
propagation
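A minimal sketch of that mean squared error cost; the predicted and actual values here are made up purely for illustration:

```python
import numpy as np

# a minimal sketch of the mean squared error cost function; the predicted
# and actual values here are made up purely for illustration
def mse(actual, predicted):
    # mean of the squared differences between actual and predicted values
    return np.mean((actual - predicted) ** 2)

actual = np.array([1.0, 0.0, 1.0, 1.0])      # true labels
predicted = np.array([0.9, 0.2, 0.8, 0.4])   # network outputs
print(mse(actual, predicted))                # the error to be minimized
```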
00:32:27.600 --> 00:32:32.880
so this is the cost function and how do we optimize the cost
00:32:30.080 --> 00:32:35.919
function we use what is known as
00:32:32.880 --> 00:32:39.519
gradient descent so the gradient descent
00:32:35.919 --> 00:32:42.480
mechanism identifies how to change the
00:32:39.519 --> 00:32:45.760
weights and biases so that the cost
00:32:42.480 --> 00:32:47.919
function is minimized and there is also
00:32:45.760 --> 00:32:50.159
what is known as the rate or the
00:32:47.919 --> 00:32:53.120
learning rate that is what is shown here
00:32:50.159 --> 00:32:55.919
as slower and faster so you need to
00:32:53.120 --> 00:32:59.360
specify what should be the learning rate
00:32:55.919 --> 00:33:02.480
now if the learning rate is very small
00:32:59.360 --> 00:33:04.480
then it will probably take very long to
00:33:02.480 --> 00:33:07.279
train whereas if the learning rate is
00:33:04.480 --> 00:33:09.840
very high then it will appear to be
00:33:07.279 --> 00:33:12.159
faster but then it will probably never
00:33:09.840 --> 00:33:14.480
what is known as converge now what is
00:33:12.159 --> 00:33:17.760
convergence now we are talking about a
00:33:14.480 --> 00:33:20.159
few terms here convergence is like this
00:33:17.760 --> 00:33:24.000
this is a representation of convergence
00:33:20.159 --> 00:33:26.240
so the whole idea of gradient descent is
00:33:24.000 --> 00:33:28.640
to optimize the cost function or
00:33:26.240 --> 00:33:30.880
minimize the cost function in order to
00:33:28.640 --> 00:33:34.000
do that we need to represent the cost
00:33:30.880 --> 00:33:36.480
function as this curve we need to come
00:33:34.000 --> 00:33:38.960
to this minimum value that is what is
00:33:36.480 --> 00:33:41.840
known as the minimization of the cost
00:33:38.960 --> 00:33:44.720
function now what happens if we have the
00:33:41.840 --> 00:33:48.000
learning rate very small is that it will
00:33:44.720 --> 00:33:51.200
take very long to come to this point on
00:33:48.000 --> 00:33:53.279
the other hand if you have a higher
00:33:51.200 --> 00:33:56.159
learning rate what will happen is
00:33:53.279 --> 00:33:58.559
instead of stopping here it will cross
00:33:56.159 --> 00:34:01.279
over because the learning rate is high
00:33:58.559 --> 00:34:03.440
and then it has to come back so it will
00:34:01.279 --> 00:34:05.440
result in what is known as like an
00:34:03.440 --> 00:34:07.760
oscillation so it will never come to
00:34:05.440 --> 00:34:10.639
this point which is known as convergence
00:34:07.760 --> 00:34:13.040
instead it will go back and forth so
00:34:10.639 --> 00:34:14.960
these are known as hyper parameters the
00:34:13.040 --> 00:34:17.520
learning rate and so on and these
00:34:14.960 --> 00:34:20.639
numbers or values we
00:34:17.520 --> 00:34:23.040
can determine typically using trial and
00:34:20.639 --> 00:34:25.359
error out of experience we try to
00:34:23.040 --> 00:34:28.639
find out these values so that is the
00:34:25.359 --> 00:34:30.560
gradient descent mechanism to optimize
00:34:28.639 --> 00:34:34.399
the cost function and that is what is
00:34:30.560 --> 00:34:36.560
used to train our neural network
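A minimal sketch of gradient descent on a toy cost curve c(w) = w squared, whose minimum is at w = 0, showing the learning rate trade-off just described; the values are made up purely for illustration:

```python
# a minimal sketch of gradient descent on a toy cost curve c(w) = w**2,
# whose minimum is at w = 0; the values are made up purely for illustration
def train(learning_rate, steps=20, w=5.0):
    for _ in range(steps):
        gradient = 2 * w                  # derivative of w**2 at the current w
        w = w - learning_rate * gradient  # step against the gradient
    return w

print(train(0.01))  # very small rate: moves toward 0 but very slowly
print(train(0.1))   # moderate rate: converges close to the minimum
print(train(1.1))   # too large: overshoots and oscillates, never converges
```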
00:34:34.399 --> 00:34:38.720
this is another representation of how the
00:34:36.560 --> 00:34:41.200
training process works and here in this
00:34:38.720 --> 00:34:44.320
example we are trying to classify these
00:34:41.200 --> 00:34:46.960
images whether they are cats or dogs and
00:34:44.320 --> 00:34:49.599
as you can see actually each image is
00:34:46.960 --> 00:34:54.000
fed in each time one image is fed rather
00:34:49.599 --> 00:34:56.960
and these values of x1 x2 up to xn are
00:34:54.000 --> 00:34:59.280
the pixel values within this image okay
00:34:56.960 --> 00:35:01.920
so those values are then taken and for
00:34:59.280 --> 00:35:04.320
each of those values a weight is
00:35:01.920 --> 00:35:06.079
multiplied and then it goes to the next
00:35:04.320 --> 00:35:08.480
layer and then to the next layer and so
00:35:06.079 --> 00:35:10.880
on ultimately it comes as the output
00:35:08.480 --> 00:35:13.839
layer and it gives an output as whether
00:35:10.880 --> 00:35:16.720
it is a dog or a cat remember the output
00:35:13.839 --> 00:35:19.520
will never be a named output so these
00:35:16.720 --> 00:35:22.400
would be like a zero or a one and we say
00:35:19.520 --> 00:35:24.400
okay zero corresponds to dogs and one
00:35:22.400 --> 00:35:26.640
corresponds to cats so that is the way
00:35:24.400 --> 00:35:28.800
it typically happens this is a binary
00:35:26.640 --> 00:35:31.280
classification we have similar
00:35:28.800 --> 00:35:32.960
situations where there can be multiple
00:35:31.280 --> 00:35:34.960
classes which means that there will be
00:35:32.960 --> 00:35:38.160
more neurons in the output
00:35:34.960 --> 00:35:39.920
layer
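A minimal sketch of mapping those raw outputs to class names, assuming the 0 = dog, 1 = cat convention above and a hypothetical third class for the multi-class case:

```python
import numpy as np

# a minimal sketch of mapping raw outputs to class names, assuming the
# 0 = dog, 1 = cat convention from the example above
class_names = ["dog", "cat"]
binary_output = 1
print(class_names[binary_output])   # -> cat

# with multiple classes there is one output neuron per class and we pick
# the class whose neuron gives the highest value
scores = np.array([0.1, 0.7, 0.2])  # e.g. dog, cat, rabbit (hypothetical)
print(np.argmax(scores))            # -> 1, i.e. cat
```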
00:35:38.160 --> 00:35:41.839
okay so this is once again a quick representation of how the forward
00:35:39.920 --> 00:35:44.400
propagation and the backward propagation
00:35:41.839 --> 00:35:46.640
works so the information is going
00:35:44.400 --> 00:35:49.119
in this direction which is basically the
00:35:46.640 --> 00:35:50.079
forward propagation and at the output
00:35:49.119 --> 00:35:53.200
level
00:35:50.079 --> 00:35:56.480
we find out what is the cost function
00:35:53.200 --> 00:35:58.560
the difference is basically sent back as
00:35:56.480 --> 00:36:01.040
part of the backward propagation and
00:35:58.560 --> 00:36:03.520
gradient descent then adjusts the weights
00:36:01.040 --> 00:36:06.160
and biases for the next iteration this
00:36:03.520 --> 00:36:09.280
happens iteratively till the cost
00:36:06.160 --> 00:36:11.680
function is minimized and that is when
00:36:09.280 --> 00:36:13.760
we say the network has
00:36:11.680 --> 00:36:16.160
converged or the training process has
00:36:13.760 --> 00:36:18.880
converged and there can be situations
00:36:16.160 --> 00:36:21.599
where convergence may not happen in rare
00:36:18.880 --> 00:36:24.000
cases but by and large the network will
00:36:21.599 --> 00:36:26.320
converge and after maybe a few
00:36:24.000 --> 00:36:28.160
iterations it could be tens of
00:36:26.320 --> 00:36:30.160
iterations or hundreds of iterations
00:36:28.160 --> 00:36:32.800
depending on the situation the number of
00:36:30.160 --> 00:36:35.599
iterations can vary and then we say okay
00:36:32.800 --> 00:36:38.079
we are getting a certain accuracy and we
00:36:35.599 --> 00:36:40.800
say that is our threshold maybe 90 percent
00:36:38.079 --> 00:36:42.880
accuracy we stop at that and we say that
00:36:40.800 --> 00:36:44.640
the system is trained the trained model
00:36:42.880 --> 00:36:47.440
is then deployed for production and so
00:36:44.640 --> 00:36:49.920
on so that is the way the neural network
00:36:47.440 --> 00:36:53.200
training happens okay so that is the way
00:36:49.920 --> 00:36:56.079
classification works in deep learning
00:36:53.200 --> 00:36:59.280
using a neural network
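Tying it together, a minimal end-to-end sketch of forward propagation, the cost function, backward propagation, and gradient descent, assuming Python with numpy and trained on the tiny made-up XOR problem rather than real images:

```python
import numpy as np

# a minimal end-to-end sketch of forward propagation, the cost function,
# and backward propagation with gradient descent; it trains a tiny network
# on the made-up XOR problem rather than real images
np.random.seed(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# random initial weights, and one bias per neuron in each layer
W1, b1 = np.random.randn(2, 4), np.zeros((1, 4))
W2, b2 = np.random.randn(4, 1), np.zeros((1, 1))
learning_rate = 1.0

for epoch in range(5000):
    # forward propagation: weighted sums, biases, activation functions
    hidden = sigmoid(X @ W1 + b1)
    output = sigmoid(hidden @ W2 + b2)
    cost = np.mean((output - y) ** 2)   # mean squared error cost

    # backward propagation: send the error back to get the gradients
    d_output = (output - y) * output * (1 - output)
    d_hidden = (d_output @ W2.T) * hidden * (1 - hidden)

    # gradient descent: adjust weights and biases for the next iteration
    W2 -= learning_rate * hidden.T @ d_output
    b2 -= learning_rate * d_output.sum(axis=0, keepdims=True)
    W1 -= learning_rate * X.T @ d_hidden
    b1 -= learning_rate * d_hidden.sum(axis=0, keepdims=True)

print(cost, output.round(2))
```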
00:36:56.079 --> 00:37:01.520
and this slide is an animation of this whole process as
00:36:59.280 --> 00:37:04.079
you can see the forward propagation the
00:37:01.520 --> 00:37:06.160
data is going forward from the input
00:37:04.079 --> 00:37:07.359
layer to the output layer and there is
00:37:06.160 --> 00:37:10.000
an output
00:37:07.359 --> 00:37:12.960
and the error is calculated the cost
00:37:10.000 --> 00:37:15.359
function is calculated and that is fed
00:37:12.960 --> 00:37:18.320
back as a part of backward propagation
00:37:15.359 --> 00:37:20.800
and that whole process repeats once
00:37:18.320 --> 00:37:23.359
again okay so remember in neural
00:37:20.800 --> 00:37:27.520
networks the training process is nothing
00:37:23.359 --> 00:37:29.760
but finding the best values of the
00:37:27.520 --> 00:37:32.400
weights and biases for each and every
00:37:29.760 --> 00:37:34.960
neuron in the network that's all
00:37:32.400 --> 00:37:37.760
training of a neural network consists of
00:37:34.960 --> 00:37:40.960
finding the optimal values of the
00:37:37.760 --> 00:37:44.800
weights and biases so that the accuracy
00:37:40.960 --> 00:37:47.040
is maximum all right so with that we
00:37:44.800 --> 00:37:51.800
come to the end of the session we hope you
00:37:47.040 --> 00:37:51.800
have a great day thank you very much
00:37:53.839 --> 00:37:57.359
hi there if you like this video
00:37:55.680 --> 00:38:00.000
subscribe to the simplilearn youtube
00:37:57.359 --> 00:38:02.160
channel and click here to watch similar
00:38:00.000 --> 00:38:05.480
videos to nerd up and get certified
00:38:02.160 --> 00:38:05.480
click here