Spaces:

NIKKI77
/

ks-version-1-1

Sleeping

App Files Files Community

ks-version-1-1 / data /subtitles /Deep Learning.vtt

NIKKI77

Deploy: GPU-ready HF Space (Docker)

903b444 3 months ago

raw

history blame contribute delete

63.4 kB

	WEBVTT - Subtitles by: DownloadYoutubeSubtitles.com

	00:00:03.040 --> 00:00:08.960
	hello and welcome to the session on deep

	00:00:05.920 --> 00:00:11.599
	learning my name is mohan and in this

	00:00:08.960 --> 00:00:14.080
	video we are going to talk about what

	00:00:11.599 --> 00:00:16.160
	deep learning is all about some of you

	00:00:14.080 --> 00:00:19.520
	may be already familiar with the image

	00:00:16.160 --> 00:00:22.640
	recognition how does image recognition

	00:00:19.520 --> 00:00:25.680
	work you can train this application or

	00:00:22.640 --> 00:00:28.080
	your machine to recognize whether a

	00:00:25.680 --> 00:00:30.080
	given image is a cat or a dog and this

	00:00:28.080 --> 00:00:32.239
	is how it works at a very high level it

	00:00:30.080 --> 00:00:34.480
	uses artificial neural network it is

	00:00:32.239 --> 00:00:36.559
	trained with some known images and

	00:00:34.480 --> 00:00:38.960
	during the training it is told if it is

	00:00:36.559 --> 00:00:41.040
	recognizing correctly or not and then

	00:00:38.960 --> 00:00:42.960
	when new images are submitted it

	00:00:41.040 --> 00:00:45.680
	recognizes correctly based on the

	00:00:42.960 --> 00:00:47.600
	accuracy of course so a little quick

	00:00:45.680 --> 00:00:50.480
	understanding about artificial neural

	00:00:47.600 --> 00:00:53.600
	networks so this is the way it does is

	00:00:50.480 --> 00:00:56.000
	you provide a lot of training data also

	00:00:53.600 --> 00:00:59.359
	known as labeled data for example in

	00:00:56.000 --> 00:01:02.640
	this case these are the images of dogs

	00:00:59.359 --> 00:01:05.600
	and the network extracts some features

	00:01:02.640 --> 00:01:08.640
	that makes a dog a dog right so that is

	00:01:05.600 --> 00:01:11.760
	known as feature extraction and based on

	00:01:08.640 --> 00:01:13.760
	that when you submit a new image of dog

	00:01:11.760 --> 00:01:15.119
	the basic features remain pretty much

	00:01:13.760 --> 00:01:17.759
	the same it may be a completely

	00:01:15.119 --> 00:01:21.280
	different image but the features of a

	00:01:17.759 --> 00:01:23.200
	dog still remain pretty much the same in

	00:01:21.280 --> 00:01:25.680
	various different images let's say

	00:01:23.200 --> 00:01:28.000
	compared to a cat and that's the way

	00:01:25.680 --> 00:01:30.479
	artificial neural network works we'll go

	00:01:28.000 --> 00:01:32.240
	into details of this uh very shortly and

	00:01:30.479 --> 00:01:35.119
	once the training is done with training

	00:01:32.240 --> 00:01:37.439
	data we then test it with some test data

	00:01:35.119 --> 00:01:39.840
	too which is basically completely new

	00:01:37.439 --> 00:01:42.240
	data which the system has not seen

	00:01:39.840 --> 00:01:43.920
	before unlike the training data and then

	00:01:42.240 --> 00:01:46.560
	we find out whether it is predicting

	00:01:43.920 --> 00:01:49.280
	correctly or not thereby we know whether

	00:01:46.560 --> 00:01:50.799
	the training is complete or it needs

	00:01:49.280 --> 00:01:53.119
	more training so that's not a very high

	00:01:50.799 --> 00:01:55.040
	level artificial neural network works so

	00:01:53.119 --> 00:01:57.119
	this is what we are going to talk about

	00:01:55.040 --> 00:01:59.200
	today our agenda looks something like

	00:01:57.119 --> 00:02:00.799
	this what is deep learning why do we

	00:01:59.200 --> 00:02:03.040
	need deep learning and then what are the

	00:02:00.799 --> 00:02:05.920
	applications of deep learning one of the

	00:02:03.040 --> 00:02:08.239
	main components the secret sauce in deep

	00:02:05.920 --> 00:02:09.599
	learning is neural networks so we're

	00:02:08.239 --> 00:02:10.879
	going to talk about what is neural

	00:02:09.599 --> 00:02:12.879
	network and

	00:02:10.879 --> 00:02:15.520
	how it works and some of its components

	00:02:12.879 --> 00:02:17.440
	like for example the activation function

	00:02:15.520 --> 00:02:20.160
	the gradient descent and so on and so

	00:02:17.440 --> 00:02:21.680
	forth so that as a part of working of a

	00:02:20.160 --> 00:02:23.520
	neural network we will go into little

	00:02:21.680 --> 00:02:26.720
	bit more details how this whole thing

	00:02:23.520 --> 00:02:29.360
	works so without much further ado let's

	00:02:26.720 --> 00:02:31.520
	get started so deep learning is

	00:02:29.360 --> 00:02:34.080
	considered to be a part of machine

	00:02:31.520 --> 00:02:36.720
	learning so this diagram very nicely

	00:02:34.080 --> 00:02:39.599
	depicts what deep learning is at a very

	00:02:36.720 --> 00:02:42.480
	high level you have the all-encompassing

	00:02:39.599 --> 00:02:45.840
	artificial intelligence which is more a

	00:02:42.480 --> 00:02:47.680
	concept rather than a technology or a

	00:02:45.840 --> 00:02:49.280
	technical concept right so it is it's

	00:02:47.680 --> 00:02:51.440
	more of a concept at a very high level

	00:02:49.280 --> 00:02:53.200
	artificial intelligence under the herd

	00:02:51.440 --> 00:02:55.360
	is actually machine learning and deep

	00:02:53.200 --> 00:02:58.560
	learning and machine learning is a

	00:02:55.360 --> 00:03:01.840
	broader concept you can say or a broader

	00:02:58.560 --> 00:03:03.280
	technology and deep learning is a subset

	00:03:01.840 --> 00:03:05.440
	of machine learning the primary

	00:03:03.280 --> 00:03:07.920
	difference between machine learning and

	00:03:05.440 --> 00:03:11.519
	deep learning is that deep learning uses

	00:03:07.920 --> 00:03:14.080
	neural networks and it is suitable for

	00:03:11.519 --> 00:03:16.480
	handling large amounts of unstructured

	00:03:14.080 --> 00:03:18.080
	data and the last but not least one of

	00:03:16.480 --> 00:03:19.599
	the major differences between machine

	00:03:18.080 --> 00:03:22.080
	learning and deep learning is that in

	00:03:19.599 --> 00:03:24.640
	machine learning the feature extraction

	00:03:22.080 --> 00:03:26.959
	or the feature engineering is done by

	00:03:24.640 --> 00:03:29.280
	the data scientists manually but in deep

	00:03:26.959 --> 00:03:30.799
	learning since we use neural networks

	00:03:29.280 --> 00:03:32.720
	the feature engineering happens

	00:03:30.799 --> 00:03:34.720
	automatically so that's a little bit of

	00:03:32.720 --> 00:03:36.000
	a quick difference between machine

	00:03:34.720 --> 00:03:38.159
	learning and deep learning and this

	00:03:36.000 --> 00:03:40.000
	diagram very nicely depicts the relation

	00:03:38.159 --> 00:03:42.239
	between artificial intelligence machine

	00:03:40.000 --> 00:03:44.319
	learning and deep learning now why do we

	00:03:42.239 --> 00:03:47.040
	need deep learning machine learning was

	00:03:44.319 --> 00:03:49.120
	there for quite some time and it can do

	00:03:47.040 --> 00:03:51.599
	a lot of stuff that probably what deep

	00:03:49.120 --> 00:03:53.680
	learning can do but it's not very good

	00:03:51.599 --> 00:03:57.200
	at handling large amounts of

	00:03:53.680 --> 00:03:59.920
	unstructured data like images voice or

	00:03:57.200 --> 00:04:01.920
	even text for that matter so traditional

	00:03:59.920 --> 00:04:03.519
	machine learning is not that very good

	00:04:01.920 --> 00:04:05.040
	at doing this traditional machine

	00:04:03.519 --> 00:04:07.040
	learning can handle large amounts of

	00:04:05.040 --> 00:04:09.120
	structured data but when it comes to

	00:04:07.040 --> 00:04:10.480
	unstructured data it's a big challenge

	00:04:09.120 --> 00:04:12.560
	so that is one of the key

	00:04:10.480 --> 00:04:15.519
	differentiators for deep learning so

	00:04:12.560 --> 00:04:18.320
	that is number one and increasingly for

	00:04:15.519 --> 00:04:20.400
	artificial intelligence we need image

	00:04:18.320 --> 00:04:22.320
	recognition and we need to process

	00:04:20.400 --> 00:04:23.680
	analyze images and voice that's the

	00:04:22.320 --> 00:04:25.520
	reason deep learning is required

	00:04:23.680 --> 00:04:27.840
	compared to let's say traditional

	00:04:25.520 --> 00:04:31.199
	machine learning it can also perform

	00:04:27.840 --> 00:04:33.120
	complex algorithms more complex than

	00:04:31.199 --> 00:04:35.919
	let's say what machine learning can do

	00:04:33.120 --> 00:04:38.000
	and it can achieve best performance with

	00:04:35.919 --> 00:04:39.919
	the large amounts of data so the more

	00:04:38.000 --> 00:04:42.800
	you have the data let's say reference

	00:04:39.919 --> 00:04:44.639
	data or label data the better the system

	00:04:42.800 --> 00:04:46.960
	will do because the training process

	00:04:44.639 --> 00:04:49.040
	will be that much better and last but

	00:04:46.960 --> 00:04:51.600
	not least with deep learning you can

	00:04:49.040 --> 00:04:53.360
	really avoid the manual process of

	00:04:51.600 --> 00:04:55.280
	feature extraction those are some of the

	00:04:53.360 --> 00:04:57.120
	reasons why we need deep learning some

	00:04:55.280 --> 00:05:00.160
	of the applications of deep learning

	00:04:57.120 --> 00:05:02.960
	deep learning has made major inroads and

	00:05:00.160 --> 00:05:05.440
	it is a major area in which deep

	00:05:02.960 --> 00:05:08.880
	learning is applied is healthcare and

	00:05:05.440 --> 00:05:12.080
	within healthcare particularly oncology

	00:05:08.880 --> 00:05:15.199
	which is basically cancer related stuff

	00:05:12.080 --> 00:05:17.919
	one of the issues with cancer is that a

	00:05:15.199 --> 00:05:20.960
	lot of cancers today are curable they

	00:05:17.919 --> 00:05:23.360
	can be cured they are detected early on

	00:05:20.960 --> 00:05:25.600
	and the challenge with that is when a

	00:05:23.360 --> 00:05:28.080
	diagnostics is performed let's say an

	00:05:25.600 --> 00:05:30.320
	image has been taken of a patient to

	00:05:28.080 --> 00:05:33.120
	detect whether there is cancer or not

	00:05:30.320 --> 00:05:35.120
	you need a specialist to look at the

	00:05:33.120 --> 00:05:38.080
	image and determine whether it is the

	00:05:35.120 --> 00:05:41.199
	patient is fine or there is any onset of

	00:05:38.080 --> 00:05:44.160
	cancer and the number of specialists are

	00:05:41.199 --> 00:05:46.639
	limited so if we use deep learning if we

	00:05:44.160 --> 00:05:48.880
	use automation here or if we use

	00:05:46.639 --> 00:05:52.000
	artificial intelligence here then the

	00:05:48.880 --> 00:05:54.639
	system can with a certain amount of the

	00:05:52.000 --> 00:05:57.520
	good amount of accuracy determine

	00:05:54.639 --> 00:06:00.000
	whether a particular patient is having

	00:05:57.520 --> 00:06:02.960
	cancer or not so the prediction or the

	00:06:00.000 --> 00:06:05.919
	detection process of a disease like

	00:06:02.960 --> 00:06:08.160
	cancer can be expedited the detection

	00:06:05.919 --> 00:06:10.800
	process can be expedited can be faster

	00:06:08.160 --> 00:06:13.600
	without really waiting for a specialist

	00:06:10.800 --> 00:06:15.919
	we can obviously then once the

	00:06:13.600 --> 00:06:18.479
	application once the artificial

	00:06:15.919 --> 00:06:20.800
	intelligence detects or predicts that

	00:06:18.479 --> 00:06:23.120
	there is an onset of a cancer this can

	00:06:20.800 --> 00:06:25.680
	be cross-checked by a doctor but at

	00:06:23.120 --> 00:06:27.520
	least the initial screening process can

	00:06:25.680 --> 00:06:29.919
	be automated and that is where the

	00:06:27.520 --> 00:06:32.160
	current focus is with respect to deep

	00:06:29.919 --> 00:06:34.560
	learning in healthcare what else

	00:06:32.160 --> 00:06:38.319
	robotics is another area deep learning

	00:06:34.560 --> 00:06:40.880
	is majorly used in robotics and you must

	00:06:38.319 --> 00:06:43.199
	have seen nowadays robots are everywhere

	00:06:40.880 --> 00:06:45.120
	humanoids the industrial robots which

	00:06:43.199 --> 00:06:48.080
	are used for manufacturing process you

	00:06:45.120 --> 00:06:50.639
	must have heard about sofia who got

	00:06:48.080 --> 00:06:53.360
	citizenship with saudi arabia and so on

	00:06:50.639 --> 00:06:55.840
	there are multiple such robots which are

	00:06:53.360 --> 00:06:58.880
	knowledge oriented but there are also

	00:06:55.840 --> 00:07:00.639
	industrial robots are used in industries

	00:06:58.880 --> 00:07:03.120
	in the manufacturing process and

	00:07:00.639 --> 00:07:05.440
	increasingly in security and also in

	00:07:03.120 --> 00:07:07.840
	defense for example image processing

	00:07:05.440 --> 00:07:10.080
	video is fed to them and they need to be

	00:07:07.840 --> 00:07:11.599
	able to detect objects obstacles and so

	00:07:10.080 --> 00:07:13.520
	on and so forth so that's where deep

	00:07:11.599 --> 00:07:15.599
	learning is used they need to be able to

	00:07:13.520 --> 00:07:17.520
	hear and make sense of the sounds that

	00:07:15.599 --> 00:07:20.400
	they are hearing that needs deep

	00:07:17.520 --> 00:07:22.800
	learning as well so robotics is a major

	00:07:20.400 --> 00:07:25.680
	area where deep learning is applied then

	00:07:22.800 --> 00:07:27.919
	we have self-driving cars or autonomous

	00:07:25.680 --> 00:07:30.960
	cars you must have heard of google's

	00:07:27.919 --> 00:07:33.759
	autonomous car which has been tested for

	00:07:30.960 --> 00:07:35.440
	millions of miles and pretty much

	00:07:33.759 --> 00:07:37.120
	incident free there were of course a

	00:07:35.440 --> 00:07:39.759
	couple of incidents here and there but

	00:07:37.120 --> 00:07:42.880
	it is uh considered to be fairly safe

	00:07:39.759 --> 00:07:45.120
	and there are today a lot of automotive

	00:07:42.880 --> 00:07:47.520
	companies in fact pretty much every

	00:07:45.120 --> 00:07:49.919
	automotive company worth its name is

	00:07:47.520 --> 00:07:52.080
	investing in self-driving cars or

	00:07:49.919 --> 00:07:54.560
	autonomous cars and it is predicted that

	00:07:52.080 --> 00:07:56.160
	in the next probably 10 to 15 years

	00:07:54.560 --> 00:07:59.120
	these will be in production and they

	00:07:56.160 --> 00:08:01.039
	will be used extensively in real life

	00:07:59.120 --> 00:08:03.039
	right now they are all in rnd and in

	00:08:01.039 --> 00:08:05.360
	test phases but pretty soon these will

	00:08:03.039 --> 00:08:07.280
	be on the road so this is another area

	00:08:05.360 --> 00:08:08.960
	where deep learning is used and how is

	00:08:07.280 --> 00:08:11.759
	it used where is it used within

	00:08:08.960 --> 00:08:14.960
	autonomous driving the car actually is

	00:08:11.759 --> 00:08:17.039
	fed with video of surroundings and it is

	00:08:14.960 --> 00:08:18.879
	supposed to process that information

	00:08:17.039 --> 00:08:20.800
	process that video and determine if

	00:08:18.879 --> 00:08:23.039
	there are any obstacles it has to

	00:08:20.800 --> 00:08:25.759
	determine if there are any cars in the

	00:08:23.039 --> 00:08:28.160
	site will detect whether it is driving

	00:08:25.759 --> 00:08:31.759
	in the lane also it has to determine

	00:08:28.160 --> 00:08:34.159
	whether the signal is green or red so

	00:08:31.759 --> 00:08:37.760
	that accordingly it can move forward or

	00:08:34.159 --> 00:08:39.599
	wait so for all these video analysis

	00:08:37.760 --> 00:08:41.919
	deep learning is used in addition to

	00:08:39.599 --> 00:08:44.720
	that the training overall training to

	00:08:41.919 --> 00:08:47.200
	drive the car happens in a deep learning

	00:08:44.720 --> 00:08:48.720
	environment so again a lot of scope here

	00:08:47.200 --> 00:08:51.120
	to use deep learning a couple of other

	00:08:48.720 --> 00:08:54.880
	applications are mission translations

	00:08:51.120 --> 00:08:57.760
	today we have a lot of information and

	00:08:54.880 --> 00:08:59.519
	very often this information is in one

	00:08:57.760 --> 00:09:03.120
	particular language and more

	00:08:59.519 --> 00:09:05.519
	specifically in english and people need

	00:09:03.120 --> 00:09:08.560
	information in various parts of the

	00:09:05.519 --> 00:09:11.120
	world it is pretty difficult for human

	00:09:08.560 --> 00:09:13.519
	beings to translate each and every piece

	00:09:11.120 --> 00:09:15.279
	of information or every document into

	00:09:13.519 --> 00:09:17.440
	all possible languages there are

	00:09:15.279 --> 00:09:19.600
	probably at least hundreds of languages

	00:09:17.440 --> 00:09:22.720
	or if not more to translate each and

	00:09:19.600 --> 00:09:25.920
	every document into every language is

	00:09:22.720 --> 00:09:28.560
	pretty difficult therefore we can use

	00:09:25.920 --> 00:09:31.440
	deep learning to do pretty much like a

	00:09:28.560 --> 00:09:33.200
	real-time translation mechanism so we

	00:09:31.440 --> 00:09:36.160
	don't have to translate everything and

	00:09:33.200 --> 00:09:38.640
	keep it ready but we train applications

	00:09:36.160 --> 00:09:41.519
	or artificial intelligence systems that

	00:09:38.640 --> 00:09:44.560
	will do the translation on the fly for

	00:09:41.519 --> 00:09:46.320
	example you go to somewhere like china

	00:09:44.560 --> 00:09:48.480
	and you want to know what is written on

	00:09:46.320 --> 00:09:50.800
	a signboard now it is impossible for

	00:09:48.480 --> 00:09:52.800
	somebody to translate that and put it on

	00:09:50.800 --> 00:09:55.440
	the web or something like that so you

	00:09:52.800 --> 00:09:57.920
	have an application which is trained to

	00:09:55.440 --> 00:10:00.000
	translate stuff on the fly so you

	00:09:57.920 --> 00:10:02.240
	probably this can be running on your

	00:10:00.000 --> 00:10:05.200
	mobile phone on your smartphone you scan

	00:10:02.240 --> 00:10:07.440
	this the application will instantly

	00:10:05.200 --> 00:10:10.240
	translate that from chinese to english

	00:10:07.440 --> 00:10:11.760
	that is one then there could be web

	00:10:10.240 --> 00:10:14.399
	applications where there may be a

	00:10:11.760 --> 00:10:16.640
	research document which is all in maybe

	00:10:14.399 --> 00:10:19.839
	chinese or japanese and you want to

	00:10:16.640 --> 00:10:22.000
	translate that to study that document or

	00:10:19.839 --> 00:10:23.839
	in that case you need to translate so

	00:10:22.000 --> 00:10:26.160
	therefore deep learning is used in such

	00:10:23.839 --> 00:10:28.160
	situations as well and that is again on

	00:10:26.160 --> 00:10:30.240
	demand so it is not like you have to

	00:10:28.160 --> 00:10:31.920
	translate all these documents from other

	00:10:30.240 --> 00:10:34.000
	languages into english and one shot and

	00:10:31.920 --> 00:10:36.480
	keep it somewhere that is again pretty

	00:10:34.000 --> 00:10:38.160
	much an impossible task but on a neat

	00:10:36.480 --> 00:10:40.399
	basis so you have systems that are

	00:10:38.160 --> 00:10:42.000
	trained to translate on the fly so

	00:10:40.399 --> 00:10:43.600
	mission translation is another major

	00:10:42.000 --> 00:10:45.920
	area where deep learning is used then

	00:10:43.600 --> 00:10:48.800
	there are a few other upcoming areas

	00:10:45.920 --> 00:10:51.279
	where synthesizing is done by neural

	00:10:48.800 --> 00:10:53.680
	nets for example music composition and

	00:10:51.279 --> 00:10:56.880
	generation of music so you can train a

	00:10:53.680 --> 00:10:59.680
	neural net to produce music even to

	00:10:56.880 --> 00:11:02.000
	compose music so this is a fun thing

	00:10:59.680 --> 00:11:04.720
	this is still upcoming it needs a lot of

	00:11:02.000 --> 00:11:06.640
	effort to train such neural net it has

	00:11:04.720 --> 00:11:09.120
	been proved that it is possible so this

	00:11:06.640 --> 00:11:11.760
	is a relatively new area and on the same

	00:11:09.120 --> 00:11:13.920
	lines colorization of images so these

	00:11:11.760 --> 00:11:15.839
	two images on the left hand side is a

	00:11:13.920 --> 00:11:18.720
	grayscale image or a black and white

	00:11:15.839 --> 00:11:20.480
	image this was colored by a neural net

	00:11:18.720 --> 00:11:22.959
	or a deep learning application as you

	00:11:20.480 --> 00:11:25.040
	can see it's done a very good job of

	00:11:22.959 --> 00:11:28.000
	applying the colors and obviously this

	00:11:25.040 --> 00:11:30.320
	was trained to do this colorization but

	00:11:28.000 --> 00:11:33.360
	yes this is one more application of deep

	00:11:30.320 --> 00:11:37.279
	learning now one of the major secret

	00:11:33.360 --> 00:11:40.160
	sauce of deep learning is neural network

	00:11:37.279 --> 00:11:42.240
	deep learning works on neural network or

	00:11:40.160 --> 00:11:45.279
	consists of neural network so let us see

	00:11:42.240 --> 00:11:49.040
	what is neural network neural network or

	00:11:45.279 --> 00:11:53.360
	artificial neural network is designed or

	00:11:49.040 --> 00:11:56.880
	based on the human brain now human brain

	00:11:53.360 --> 00:11:59.519
	consists of billions of small cells that

	00:11:56.880 --> 00:12:03.120
	are known as neurons artificial neural

	00:11:59.519 --> 00:12:05.519
	networks is in a way trying to simulate

	00:12:03.120 --> 00:12:07.839
	the human brain so this is a quick

	00:12:05.519 --> 00:12:10.399
	diagram of biological neuron a

	00:12:07.839 --> 00:12:12.959
	biological neuron consists of the major

	00:12:10.399 --> 00:12:16.079
	part which is the cell nucleus and then

	00:12:12.959 --> 00:12:18.240
	it has some tentacles kind of stuff on

	00:12:16.079 --> 00:12:20.160
	the top called dendrite and then there

	00:12:18.240 --> 00:12:22.399
	is like a long tail which is known as

	00:12:20.160 --> 00:12:24.240
	the axon further again at the end of

	00:12:22.399 --> 00:12:27.680
	this action are what are known as

	00:12:24.240 --> 00:12:30.880
	synapses these in turn are connected to

	00:12:27.680 --> 00:12:33.680
	the dendrites of the next neuron and all

	00:12:30.880 --> 00:12:35.440
	these neurons are interconnected with

	00:12:33.680 --> 00:12:37.519
	each other therefore they are like

	00:12:35.440 --> 00:12:39.440
	billions of them sitting in our brain

	00:12:37.519 --> 00:12:42.000
	and they're all active they're working

	00:12:39.440 --> 00:12:45.360
	they based on the signals they receive

	00:12:42.000 --> 00:12:47.920
	signals as inputs from other neurons or

	00:12:45.360 --> 00:12:50.639
	maybe from other parts of the body and

	00:12:47.920 --> 00:12:52.720
	based on certain criteria they send

	00:12:50.639 --> 00:12:54.800
	signals to the neurons at the other end

	00:12:52.720 --> 00:12:56.880
	so they they get either activated or

	00:12:54.800 --> 00:12:59.760
	they don't get activated based on so it

	00:12:56.880 --> 00:13:02.480
	is like a binary gates so they get

	00:12:59.760 --> 00:13:04.800
	activated or not activated based on the

	00:13:02.480 --> 00:13:06.399
	inputs that they receive and so on so we

	00:13:04.800 --> 00:13:08.720
	will see a little bit of those details

	00:13:06.399 --> 00:13:10.880
	as we move forward in our artificial

	00:13:08.720 --> 00:13:12.320
	neuron but this is a biological neuron

	00:13:10.880 --> 00:13:15.200
	this is the structure of a biological

	00:13:12.320 --> 00:13:17.680
	neuron and artificial neural network is

	00:13:15.200 --> 00:13:20.320
	based on the human brain the smallest

	00:13:17.680 --> 00:13:23.440
	component of artificial neural network

	00:13:20.320 --> 00:13:25.839
	is an artificial neuron as shown here

	00:13:23.440 --> 00:13:28.000
	sometimes is also referred to as

	00:13:25.839 --> 00:13:30.240
	perceptron now this is a very high level

	00:13:28.000 --> 00:13:32.800
	diagram the artificial neuron has a

	00:13:30.240 --> 00:13:35.760
	small central unit which will receive

	00:13:32.800 --> 00:13:38.320
	the input if it is doing let's say image

	00:13:35.760 --> 00:13:41.040
	processing the inputs could be pixel

	00:13:38.320 --> 00:13:44.480
	values of the image which is represented

	00:13:41.040 --> 00:13:47.680
	here as x1 x2 and so on each of the

	00:13:44.480 --> 00:13:50.320
	inputs are multiplied by what is known

	00:13:47.680 --> 00:13:53.200
	as weights which are represented as w1

	00:13:50.320 --> 00:13:56.240
	w2 and so on there is in the central

	00:13:53.200 --> 00:13:59.600
	unit basically there is a summation of

	00:13:56.240 --> 00:14:03.279
	these weighted inputs which is like x1

	00:13:59.600 --> 00:14:06.160
	into w1 plus x2 into w2 and so on the

	00:14:03.279 --> 00:14:08.079
	products are then added and then there

	00:14:06.160 --> 00:14:10.720
	is a bias that is added to that in the

	00:14:08.079 --> 00:14:12.959
	next slide we will see that passes

	00:14:10.720 --> 00:14:16.160
	through an activation function and the

	00:14:12.959 --> 00:14:18.720
	output comes as a y which is the output

	00:14:16.160 --> 00:14:20.880
	and based on certain criteria the cell

	00:14:18.720 --> 00:14:23.519
	gets either activated or not activated

	00:14:20.880 --> 00:14:26.959
	so this output would be like a zero or a

	00:14:23.519 --> 00:14:28.639
	one binary format okay so we will see

	00:14:26.959 --> 00:14:30.639
	that in a little bit more detail but

	00:14:28.639 --> 00:14:33.040
	let's do a quick comparison between

	00:14:30.639 --> 00:14:35.040
	biological and artificial neurons just

	00:14:33.040 --> 00:14:36.639
	like a biological neuron there are

	00:14:35.040 --> 00:14:39.600
	dendrites and then there is a cell

	00:14:36.639 --> 00:14:42.880
	nucleus and synapse and an axon

	00:14:39.600 --> 00:14:45.920
	we have in the artificial neuron as well

	00:14:42.880 --> 00:14:48.160
	these inputs come like the dead right if

	00:14:45.920 --> 00:14:50.320
	you will act like the dendrites there is

	00:14:48.160 --> 00:14:52.880
	a like a central unit which performs the

	00:14:50.320 --> 00:14:56.160
	summation of these uh weighted inputs

	00:14:52.880 --> 00:14:58.880
	which is basically w1 x1 w2 x2 and so on

	00:14:56.160 --> 00:15:00.639
	and then our bias is added here and then

	00:14:58.880 --> 00:15:02.880
	that passes through what is known as an

	00:15:00.639 --> 00:15:04.639
	activation function okay so these are

	00:15:02.880 --> 00:15:06.880
	known as the weights w1 w2 and then

	00:15:04.639 --> 00:15:09.519
	there is a bias which will come out here

	00:15:06.880 --> 00:15:11.600
	and that is added the bias is by the way

	00:15:09.519 --> 00:15:14.320
	common for a particular neuron so there

	00:15:11.600 --> 00:15:16.800
	won't be like b1 b2 b3 and so on only

	00:15:14.320 --> 00:15:19.440
	weights will be one per input the bias

	00:15:16.800 --> 00:15:22.639
	is common for the entire neuron it is

	00:15:19.440 --> 00:15:25.360
	also common for or the value of the bias

	00:15:22.639 --> 00:15:28.000
	remains the same for all the neurons in

	00:15:25.360 --> 00:15:29.920
	a particular layer we will also see this

	00:15:28.000 --> 00:15:31.600
	as we move forward and we see deep

	00:15:29.920 --> 00:15:34.160
	neural network where there are multiple

	00:15:31.600 --> 00:15:37.920
	neurons so that's the output now the

	00:15:34.160 --> 00:15:41.519
	whole exercise of training the neuron is

	00:15:37.920 --> 00:15:43.519
	about changing these weights and biases

	00:15:41.519 --> 00:15:46.000
	as i mentioned artificial neural network

	00:15:43.519 --> 00:15:48.560
	will consist of several such neurons and

	00:15:46.000 --> 00:15:50.880
	as a part of the training process these

	00:15:48.560 --> 00:15:53.120
	weights keep changing initially they are

	00:15:50.880 --> 00:15:55.360
	assigned some random values through the

	00:15:53.120 --> 00:15:57.279
	training process the weights the whole

	00:15:55.360 --> 00:16:00.880
	process of training is to come up with

	00:15:57.279 --> 00:16:02.959
	the optimum values of w1 w2 and wn and

	00:16:00.880 --> 00:16:05.519
	then the b4 or the bias for this

	00:16:02.959 --> 00:16:08.399
	particular neuron such that it gives an

	00:16:05.519 --> 00:16:11.040
	accurate output as required so let's see

	00:16:08.399 --> 00:16:13.440
	what exactly that means so the training

	00:16:11.040 --> 00:16:16.720
	process this is how it happens it takes

	00:16:13.440 --> 00:16:19.040
	the inputs each input is multiplied by a

	00:16:16.720 --> 00:16:20.639
	weight and these weights during training

	00:16:19.040 --> 00:16:23.440
	keep changing so initially they are

	00:16:20.639 --> 00:16:25.519
	assigned some random values and based on

	00:16:23.440 --> 00:16:27.519
	the output whether it is correct or

	00:16:25.519 --> 00:16:29.759
	wrong there is a feedback coming back

	00:16:27.519 --> 00:16:33.120
	and that will basically change these

	00:16:29.759 --> 00:16:36.320
	weights until it starts giving the right

	00:16:33.120 --> 00:16:39.199
	output that is represented in here as

	00:16:36.320 --> 00:16:42.320
	sigma i going from 1 to n if there are n

	00:16:39.199 --> 00:16:46.160
	inputs wi into x i so this is the

	00:16:42.320 --> 00:16:49.920
	product of w1 x1 w2 x2 and so on right

	00:16:46.160 --> 00:16:52.959
	and there is a bias that gets added here

	00:16:49.920 --> 00:16:55.360
	and that entire thing goes to what is

	00:16:52.959 --> 00:16:59.120
	known as an activation function so

	00:16:55.360 --> 00:17:02.160
	essentially this is sigma of w i x i

	00:16:59.120 --> 00:17:05.360
	plus a value of bias which is a b so

	00:17:02.160 --> 00:17:07.919
	that entire thing goes as an input to an

	00:17:05.360 --> 00:17:10.480
	activation function now this activation

	00:17:07.919 --> 00:17:13.520
	function takes this as an input gives

	00:17:10.480 --> 00:17:15.439
	the output as a binary output it could

	00:17:13.520 --> 00:17:17.439
	be a zero or a one there are of course

	00:17:15.439 --> 00:17:18.959
	to start with let's assume it's a binary

	00:17:17.439 --> 00:17:20.799
	output later we will see that there are

	00:17:18.959 --> 00:17:23.120
	different types of activation functions

	00:17:20.799 --> 00:17:25.439
	so it need not always be binary output

	00:17:23.120 --> 00:17:28.160
	but to start with let's keep simple so

	00:17:25.439 --> 00:17:30.799
	it decides whether the neuron should be

	00:17:28.160 --> 00:17:33.280
	fired or not so that is the output like

	00:17:30.799 --> 00:17:35.280
	a binary output 0 or 1. all right so

	00:17:33.280 --> 00:17:36.960
	again let me summarize this so it takes

	00:17:35.280 --> 00:17:39.280
	the inputs so if you're processing an

	00:17:36.960 --> 00:17:42.559
	image for example the inputs are the

	00:17:39.280 --> 00:17:44.559
	pixel values of the image x1 x2 up to xn

	00:17:42.559 --> 00:17:46.480
	there could be hundreds of these so all

	00:17:44.559 --> 00:17:48.559
	of those are fed as so these are some

	00:17:46.480 --> 00:17:51.200
	values and these pixel values again can

	00:17:48.559 --> 00:17:54.400
	be from 0 to 56 each of those pixel

	00:17:51.200 --> 00:17:56.160
	values are then multiplied with what is

	00:17:54.400 --> 00:17:58.160
	known as a weight this is a numeric

	00:17:56.160 --> 00:18:01.360
	value can be any value so this is a

	00:17:58.160 --> 00:18:03.679
	number w1 similarly w2 is a number so

	00:18:01.360 --> 00:18:05.600
	initially some random values will be

	00:18:03.679 --> 00:18:07.520
	assigned and each of these weights are

	00:18:05.600 --> 00:18:09.919
	multiplied with the input value and

	00:18:07.520 --> 00:18:12.320
	their sum this is known as the weighted

	00:18:09.919 --> 00:18:14.960
	sum so that is performed in this kind of

	00:18:12.320 --> 00:18:17.440
	the central unit and then a bias is

	00:18:14.960 --> 00:18:20.080
	added remember the bias is common for

	00:18:17.440 --> 00:18:21.760
	each neuron so this is not the bias

	00:18:20.080 --> 00:18:24.559
	value is not one

	00:18:21.760 --> 00:18:26.640
	bias value for per input so just keep

	00:18:24.559 --> 00:18:28.640
	that in mind the bias value there is one

	00:18:26.640 --> 00:18:31.360
	bias per neuron so it is like this

	00:18:28.640 --> 00:18:33.200
	summation plus bias is the output from

	00:18:31.360 --> 00:18:34.880
	the section this is not the complete

	00:18:33.200 --> 00:18:37.600
	output of the neuron but this is the

	00:18:34.880 --> 00:18:39.200
	bias for output for step one that goes

	00:18:37.600 --> 00:18:41.520
	as an input to what is known as

	00:18:39.200 --> 00:18:44.320
	activation function and that activation

	00:18:41.520 --> 00:18:46.720
	function results in an output usually a

	00:18:44.320 --> 00:18:49.440
	binary output like a zero or a one which

	00:18:46.720 --> 00:18:51.919
	is known as the firing of the neuron

	00:18:49.440 --> 00:18:53.840
	okay good so we talked about activation

	00:18:51.919 --> 00:18:55.760
	function so what is an activation

	00:18:53.840 --> 00:18:58.880
	function an activation function

	00:18:55.760 --> 00:19:02.640
	basically takes the weighted sum which

	00:18:58.880 --> 00:19:05.520
	is we saw w1 x1 w2 x2 the sum of all

	00:19:02.640 --> 00:19:08.799
	that plus the bias so it takes that as

	00:19:05.520 --> 00:19:10.640
	an input and it generates a certain

	00:19:08.799 --> 00:19:12.640
	output now there are different types of

	00:19:10.640 --> 00:19:14.160
	activation functions and the output is

	00:19:12.640 --> 00:19:16.720
	different for different types of

	00:19:14.160 --> 00:19:18.720
	activation functions moreover why is an

	00:19:16.720 --> 00:19:20.960
	activation function required it is

	00:19:18.720 --> 00:19:23.520
	basically required to bring in

	00:19:20.960 --> 00:19:25.760
	non-linearity that's the main reason why

	00:19:23.520 --> 00:19:26.880
	an activation function is required so

	00:19:25.760 --> 00:19:28.720
	what are the different types of

	00:19:26.880 --> 00:19:30.720
	activation functions there are several

	00:19:28.720 --> 00:19:32.720
	types of activation functions but these

	00:19:30.720 --> 00:19:35.200
	are the most common ones these are the

	00:19:32.720 --> 00:19:37.600
	ones that are currently in use sigmoid

	00:19:35.200 --> 00:19:41.440
	function was one of the early activation

	00:19:37.600 --> 00:19:44.400
	functions but today relu has kind of

	00:19:41.440 --> 00:19:46.960
	taken over so relu is by far the most

	00:19:44.400 --> 00:19:49.600
	popular activation function that is used

	00:19:46.960 --> 00:19:52.320
	today but still sigmoid function is

	00:19:49.600 --> 00:19:54.160
	still used in many situations these

	00:19:52.320 --> 00:19:56.400
	different types of activation functions

	00:19:54.160 --> 00:19:58.080
	are used in different situations based

	00:19:56.400 --> 00:20:00.000
	on the kind of problem we are trying to

	00:19:58.080 --> 00:20:01.840
	solve so what exactly is the difference

	00:20:00.000 --> 00:20:03.919
	between these two sigmoid gives the

	00:20:01.840 --> 00:20:06.799
	values of the output will be between 0

	00:20:03.919 --> 00:20:07.760
	and 1. threshold function is the value

	00:20:06.799 --> 00:20:10.240
	will be

	00:20:07.760 --> 00:20:12.400
	0 up to a certain value and beyond that

	00:20:10.240 --> 00:20:14.960
	this is also known as a step function

	00:20:12.400 --> 00:20:17.600
	and beyond that it will be 1. in case of

	00:20:14.960 --> 00:20:19.520
	sigmoid there is a gradual increase but

	00:20:17.600 --> 00:20:22.000
	in case of threshold it's like also

	00:20:19.520 --> 00:20:24.400
	known as a step function there's a rapid

	00:20:22.000 --> 00:20:26.080
	or instantaneous change from zero to one

	00:20:24.400 --> 00:20:28.400
	whereas in sigmoid we will see in the

	00:20:26.080 --> 00:20:30.640
	next slide there is a gradual increase

	00:20:28.400 --> 00:20:33.200
	but the value in this case is between

	00:20:30.640 --> 00:20:35.600
	zero and one as well now relu function

	00:20:33.200 --> 00:20:38.880
	on the other hand it is equal to

	00:20:35.600 --> 00:20:42.960
	basically if the input is 0 or less than

	00:20:38.880 --> 00:20:46.000
	0 then the output is 0 whereas if the

	00:20:42.960 --> 00:20:48.000
	input is greater than 0 then the output

	00:20:46.000 --> 00:20:49.919
	is equal to the input i know it's a

	00:20:48.000 --> 00:20:52.400
	little confusing but in the next slides

	00:20:49.919 --> 00:20:54.720
	where we show the relu function it will

	00:20:52.400 --> 00:20:57.679
	become clear similarly hyperbolic

	00:20:54.720 --> 00:21:00.159
	tangent this is similar to sigmoid in

	00:20:57.679 --> 00:21:03.360
	terms of the shape of the function

	00:21:00.159 --> 00:21:06.400
	however while sigmoid goes from 0 to 1

	00:21:03.360 --> 00:21:09.520
	hyperbolic tangent goes from -1 to 1 and

	00:21:06.400 --> 00:21:13.760
	here again the increase or the change

	00:21:09.520 --> 00:21:15.760
	from -1 to 1 is gradual and not like

	00:21:13.760 --> 00:21:18.080
	threshold or step function where it

	00:21:15.760 --> 00:21:20.159
	happens instantaneously so let's take a

	00:21:18.080 --> 00:21:21.919
	little detailed look at some of these

	00:21:20.159 --> 00:21:23.919
	functions so let's start with the

	00:21:21.919 --> 00:21:26.559
	sigmoid function so this is the equation

	00:21:23.919 --> 00:21:29.679
	of a sigmoid function which is 1 by 1

	00:21:26.559 --> 00:21:32.799
	plus e to the power of minus x so x is

	00:21:29.679 --> 00:21:36.880
	the value that is the input it goes from

	00:21:32.799 --> 00:21:40.000
	0 to -1 so this is sigmoid function the

	00:21:36.880 --> 00:21:42.640
	equation is phi x is equal to 1 by 1

	00:21:40.000 --> 00:21:44.400
	plus e to the power of minus x and as

	00:21:42.640 --> 00:21:47.520
	you can see here this is the input on

	00:21:44.400 --> 00:21:49.600
	the x-axis as x is where the value is

	00:21:47.520 --> 00:21:51.440
	coming from in fact it can also go

	00:21:49.600 --> 00:21:53.200
	negative this is negative actually so

	00:21:51.440 --> 00:21:55.520
	this is the zero so this is the negative

	00:21:53.200 --> 00:21:58.720
	value of x so as x is coming from

	00:21:55.520 --> 00:22:02.080
	negative value towards zero the value of

	00:21:58.720 --> 00:22:05.120
	the output slowly as it is approaching

	00:22:02.080 --> 00:22:08.320
	zero it it slowly and very gently

	00:22:05.120 --> 00:22:11.600
	increases and actually at the point let

	00:22:08.320 --> 00:22:15.919
	me just use a pen at the point here it

	00:22:11.600 --> 00:22:19.039
	is it is 0.5 it is actually 0.5 okay and

	00:22:15.919 --> 00:22:21.440
	slowly gradually it increases to 1 as

	00:22:19.039 --> 00:22:24.400
	the value of x increases but then as the

	00:22:21.440 --> 00:22:27.360
	value of x increases it tapers off it

	00:22:24.400 --> 00:22:29.840
	doesn't go beyond one so that is the

	00:22:27.360 --> 00:22:32.320
	speciality of sigmoid function so the

	00:22:29.840 --> 00:22:34.960
	output value will remain between zero

	00:22:32.320 --> 00:22:37.360
	and one it will never go below zero or

	00:22:34.960 --> 00:22:39.679
	above one okay then so that is sigmoid

	00:22:37.360 --> 00:22:42.000
	function now this is threshold function

	00:22:39.679 --> 00:22:44.880
	or this is also referred to as a step

	00:22:42.000 --> 00:22:46.640
	function and here we can also set the

	00:22:44.880 --> 00:22:48.240
	threshold in this case it is that's why

	00:22:46.640 --> 00:22:50.720
	it's called the threshold function

	00:22:48.240 --> 00:22:52.559
	normally it is 0 but you can also set a

	00:22:50.720 --> 00:22:54.240
	different value for the threshold now

	00:22:52.559 --> 00:22:57.120
	the difference between this and the

	00:22:54.240 --> 00:22:59.840
	sigmoid is that here the change is rapid

	00:22:57.120 --> 00:23:02.799
	or instantaneous as the x value comes

	00:22:59.840 --> 00:23:06.240
	from negative up to zero it remains zero

	00:23:02.799 --> 00:23:08.640
	and at zero it pretty much immediately

	00:23:06.240 --> 00:23:11.280
	increases to 1 okay so this is a

	00:23:08.640 --> 00:23:13.919
	mathematical representation of threshold

	00:23:11.280 --> 00:23:16.799
	function phi x is equal to 1 if x is

	00:23:13.919 --> 00:23:18.799
	greater than equal to 0 and 0 if x is

	00:23:16.799 --> 00:23:20.640
	less than 0. so for all negative values

	00:23:18.799 --> 00:23:23.120
	it is 0 which since we have set the

	00:23:20.640 --> 00:23:25.679
	threshold to be 0 so as soon as it

	00:23:23.120 --> 00:23:28.640
	reaches 0 it becomes 1. you see the

	00:23:25.679 --> 00:23:31.520
	difference between this and the previous

	00:23:28.640 --> 00:23:34.720
	one which is basically the sigmoid where

	00:23:31.520 --> 00:23:37.120
	the increase from 0 to 1 is gradual and

	00:23:34.720 --> 00:23:39.200
	here it is instantaneous and that's why

	00:23:37.120 --> 00:23:41.440
	this is also known as a step function

	00:23:39.200 --> 00:23:43.679
	threshold function or step function this

	00:23:41.440 --> 00:23:46.159
	is a relu a relu is one of the most

	00:23:43.679 --> 00:23:48.799
	popular activation functions today this

	00:23:46.159 --> 00:23:51.679
	is the definition of relu phi x is equal

	00:23:48.799 --> 00:23:54.400
	to max of x comma zero what it says is

	00:23:51.679 --> 00:23:55.679
	if the value of x is less than zero then

	00:23:54.400 --> 00:23:58.880
	phi x is

	00:23:55.679 --> 00:24:03.600
	zero the moment it increases goes beyond

	00:23:58.880 --> 00:24:06.720
	zero the value of phi x is equal to x so

	00:24:03.600 --> 00:24:08.799
	it doesn't stop at one actually it goes

	00:24:06.720 --> 00:24:10.720
	all the way so as the value of x

	00:24:08.799 --> 00:24:13.440
	increases the value of y will also

	00:24:10.720 --> 00:24:17.760
	increase infinitely so there is no limit

	00:24:13.440 --> 00:24:19.760
	here unlike your sigmoid or threshold or

	00:24:17.760 --> 00:24:22.559
	the next one which is basically

	00:24:19.760 --> 00:24:25.200
	hyperbolic tangent okay so in case of

	00:24:22.559 --> 00:24:28.080
	relu remember there is no upper limit

	00:24:25.200 --> 00:24:31.039
	the output is equal to either 0 in case

	00:24:28.080 --> 00:24:34.240
	the value of x is negative or it is

	00:24:31.039 --> 00:24:37.039
	equal to the value of x so for example

	00:24:34.240 --> 00:24:39.840
	here if the value of x is 10 then the

	00:24:37.039 --> 00:24:42.960
	value of y is also 10 right okay so that

	00:24:39.840 --> 00:24:45.679
	is relu and there are several advantages

	00:24:42.960 --> 00:24:48.159
	of relu and it is much more efficient

	00:24:45.679 --> 00:24:49.840
	and provides much more accuracy compared

	00:24:48.159 --> 00:24:51.679
	to other activation functions like

	00:24:49.840 --> 00:24:54.320
	sigmoid and so on so that's the reason

	00:24:51.679 --> 00:24:56.640
	it is very popular all right so this is

	00:24:54.320 --> 00:24:58.640
	hyperbolic tangent activation function

	00:24:56.640 --> 00:25:01.279
	the function looks similar to sigmoid

	00:24:58.640 --> 00:25:03.360
	function the curve if you see the shape

	00:25:01.279 --> 00:25:05.279
	it looks similar to sigmoid function but

	00:25:03.360 --> 00:25:08.080
	the difference between hyperbolic

	00:25:05.279 --> 00:25:10.799
	tangent and sigmoid function is that in

	00:25:08.080 --> 00:25:13.200
	case of sigmoid the output goes from

	00:25:10.799 --> 00:25:16.960
	zero to one whereas in case of

	00:25:13.200 --> 00:25:18.559
	hyperbolic tangent it goes from -1 to 1

	00:25:16.960 --> 00:25:21.360
	so that is the difference between

	00:25:18.559 --> 00:25:23.840
	hyperbolic tangent and sigmoid function

	00:25:21.360 --> 00:25:26.799
	otherwise the shape looks very similar

	00:25:23.840 --> 00:25:29.279
	there is a gradual increase unlike the

	00:25:26.799 --> 00:25:31.840
	step function where there was an instant

	00:25:29.279 --> 00:25:34.159
	increase or instant change here again

	00:25:31.840 --> 00:25:37.679
	very similar to sigmoid function the

	00:25:34.159 --> 00:25:40.080
	value changes gradually from -1 to 1. so

	00:25:37.679 --> 00:25:42.720
	this is the equation of hyperbolic

	00:25:40.080 --> 00:25:44.799
	tangent activation function yeah so then

	00:25:42.720 --> 00:25:47.200
	let's move on this is a diagrammatic

	00:25:44.799 --> 00:25:50.880
	representation of the activation

	00:25:47.200 --> 00:25:53.440
	function and how the overall data how

	00:25:50.880 --> 00:25:55.840
	the overall progression happens from

	00:25:53.440 --> 00:25:57.679
	input to the output so we get the input

	00:25:55.840 --> 00:25:59.919
	from the input layer by the way the

	00:25:57.679 --> 00:26:01.440
	neural network has three layers

	00:25:59.919 --> 00:26:03.120
	typically there will be three layers

	00:26:01.440 --> 00:26:04.880
	there is an input layer there is an

	00:26:03.120 --> 00:26:07.600
	output layer and then you have the

	00:26:04.880 --> 00:26:10.240
	hidden layer so the inputs come from the

	00:26:07.600 --> 00:26:12.240
	input layer and they get processed in

	00:26:10.240 --> 00:26:14.400
	the hidden layer and then you get the

	00:26:12.240 --> 00:26:16.960
	output in the output layer so let's take

	00:26:14.400 --> 00:26:19.840
	a little bit of a detailed look into the

	00:26:16.960 --> 00:26:22.880
	working of a neural network so let's say

	00:26:19.840 --> 00:26:25.679
	we want to classify some images between

	00:26:22.880 --> 00:26:28.400
	dogs and cats how do we do this this is

	00:26:25.679 --> 00:26:30.159
	known as a classification process and we

	00:26:28.400 --> 00:26:31.600
	are trying to use neural networks and

	00:26:30.159 --> 00:26:33.520
	deep learning to implement this

	00:26:31.600 --> 00:26:37.440
	classification so how do we do that so

	00:26:33.520 --> 00:26:40.159
	this is how it works so you have four

	00:26:37.440 --> 00:26:42.559
	layer neural network there is an input

	00:26:40.159 --> 00:26:45.440
	layer there is an output layer and then

	00:26:42.559 --> 00:26:49.440
	there are two hidden layers and what we

	00:26:45.440 --> 00:26:52.080
	do is we provide labeled training data

	00:26:49.440 --> 00:26:54.640
	which means these images are fed to the

	00:26:52.080 --> 00:26:57.120
	network with the label saying that okay

	00:26:54.640 --> 00:27:00.159
	this is a cat the neural network is

	00:26:57.120 --> 00:27:02.480
	allowed to process it and come up with a

	00:27:00.159 --> 00:27:05.039
	prediction saying whether it is a cat or

	00:27:02.480 --> 00:27:07.200
	a dog and obviously in the beginning

	00:27:05.039 --> 00:27:09.760
	there may be mistakes a cat may be

	00:27:07.200 --> 00:27:12.080
	classified as a dog so we then say that

	00:27:09.760 --> 00:27:14.000
	okay this is wrong this output is wrong

	00:27:12.080 --> 00:27:16.559
	but every time it predicts correctly we

	00:27:14.000 --> 00:27:19.120
	say yes this output is correct so that

	00:27:16.559 --> 00:27:21.760
	learning process so it will go back make

	00:27:19.120 --> 00:27:24.720
	some changes to its weights and biases

	00:27:21.760 --> 00:27:26.799
	we again feed these inputs and it will

	00:27:24.720 --> 00:27:28.799
	give us the output we will check whether

	00:27:26.799 --> 00:27:31.360
	it is correct or not and so on so this

	00:27:28.799 --> 00:27:34.320
	is a iterative process which is known as

	00:27:31.360 --> 00:27:36.880
	the training process so we are training

	00:27:34.320 --> 00:27:39.440
	the neural network and what happens in

	00:27:36.880 --> 00:27:41.760
	the training process these weights and

	00:27:39.440 --> 00:27:45.600
	biases you remember there were weights

	00:27:41.760 --> 00:27:48.880
	like w1 w2 and so on so these weights

	00:27:45.600 --> 00:27:51.679
	and biases keep changing every time you

	00:27:48.880 --> 00:27:53.760
	feed these which is known as an epoch so

	00:27:51.679 --> 00:27:56.159
	there are multiple iterations every

	00:27:53.760 --> 00:27:58.960
	iteration is known as an epoch and each

	00:27:56.159 --> 00:28:01.279
	time the weights are dated to make sure

	00:27:58.960 --> 00:28:03.679
	that the maximum number of images are

	00:28:01.279 --> 00:28:06.080
	classified correctly so once again what

	00:28:03.679 --> 00:28:09.600
	is the input this input could be like

	00:28:06.080 --> 00:28:12.159
	1000 images of cats and dogs and they

	00:28:09.600 --> 00:28:14.559
	are labeled because we know which is a

	00:28:12.159 --> 00:28:17.039
	cat and which is a dog and we feed those

	00:28:14.559 --> 00:28:18.960
	thousand images the neural network will

	00:28:17.039 --> 00:28:20.799
	initially assign some weights and biases

	00:28:18.960 --> 00:28:23.120
	for each neuron and it will try to

	00:28:20.799 --> 00:28:25.120
	process extract the features from the

	00:28:23.120 --> 00:28:27.279
	images and it will try to come up with a

	00:28:25.120 --> 00:28:29.679
	prediction for each image and that

	00:28:27.279 --> 00:28:32.240
	prediction that is calculated by the

	00:28:29.679 --> 00:28:34.240
	network is compared with the actual

	00:28:32.240 --> 00:28:36.399
	value whether it is a cat or a dog and

	00:28:34.240 --> 00:28:38.559
	that's how the error is calculated so

	00:28:36.399 --> 00:28:41.279
	let's say there are a thousand images

	00:28:38.559 --> 00:28:43.200
	and in the first run only 500 of them

	00:28:41.279 --> 00:28:45.440
	have been correctly classified that

	00:28:43.200 --> 00:28:47.440
	means we are getting only 50 accuracy so

	00:28:45.440 --> 00:28:49.760
	we feed that information back to the

	00:28:47.440 --> 00:28:51.919
	network further update these weights and

	00:28:49.760 --> 00:28:54.480
	biases for each of the neurons and we

	00:28:51.919 --> 00:28:56.320
	run this these inputs once again it will

	00:28:54.480 --> 00:28:58.000
	try to calculate extract the features

	00:28:56.320 --> 00:28:59.840
	and it will try to predict which of

	00:28:58.000 --> 00:29:02.399
	these is cats and dogs and this time

	00:28:59.840 --> 00:29:04.480
	let's say out of thousand 700 of them

	00:29:02.399 --> 00:29:06.720
	have been predicted correctly so that

	00:29:04.480 --> 00:29:09.679
	means in the second iteration the

	00:29:06.720 --> 00:29:12.559
	accuracy has increased from 50 to 70

	00:29:09.679 --> 00:29:15.039
	percent all right then we go back again

	00:29:12.559 --> 00:29:17.760
	we feed this maybe for a third iteration

	00:29:15.039 --> 00:29:20.799
	fourth iteration and so on and slowly

	00:29:17.760 --> 00:29:23.360
	and steadily the accuracy of this

	00:29:20.799 --> 00:29:26.080
	network will keep increasing and it may

	00:29:23.360 --> 00:29:28.240
	reach probably you never know 90 95

	00:29:26.080 --> 00:29:30.240
	percent and there are several parameters

	00:29:28.240 --> 00:29:32.720
	that are known as hyper parameters that

	00:29:30.240 --> 00:29:34.880
	need to be changed and tweaked and that

	00:29:32.720 --> 00:29:37.760
	is the overall training process and

	00:29:34.880 --> 00:29:39.200
	ultimately at some point we say okay you

	00:29:37.760 --> 00:29:42.080
	will probably never reach hundred

	00:29:39.200 --> 00:29:44.159
	percent accuracy but then we set a limit

	00:29:42.080 --> 00:29:46.080
	saying that okay if we receive 95

	00:29:44.159 --> 00:29:48.399
	percent accuracy that is good enough for

	00:29:46.080 --> 00:29:50.320
	our application and then we say okay our

	00:29:48.399 --> 00:29:53.120
	training process is done so that is the

	00:29:50.320 --> 00:29:55.760
	way training happens and once the

	00:29:53.120 --> 00:29:58.399
	training is done now with the training

	00:29:55.760 --> 00:30:01.039
	data set the system has let's say seen

	00:29:58.399 --> 00:30:03.760
	all these thousand images therefore what

	00:30:01.039 --> 00:30:05.840
	we do is the next step like in any

	00:30:03.760 --> 00:30:08.399
	normal machine learning process we do

	00:30:05.840 --> 00:30:10.799
	the testing where we take a fresh set of

	00:30:08.399 --> 00:30:13.039
	images and we feed it to the network the

	00:30:10.799 --> 00:30:14.880
	fresh set which it has not seen before

	00:30:13.039 --> 00:30:16.559
	as a part of the training process and

	00:30:14.880 --> 00:30:18.159
	this is again nothing new in deep

	00:30:16.559 --> 00:30:20.720
	learning this was there in machine

	00:30:18.159 --> 00:30:23.440
	learning as well so you feed the test

	00:30:20.720 --> 00:30:25.520
	images and then find out whether we are

	00:30:23.440 --> 00:30:27.600
	getting a similar accuracy or not so

	00:30:25.520 --> 00:30:29.520
	maybe that accuracy may reduce a little

	00:30:27.600 --> 00:30:31.840
	bit while training you may get 98

	00:30:29.520 --> 00:30:33.760
	percent and then for test you may get 95

	00:30:31.840 --> 00:30:36.480
	percent but there shouldn't be a drastic

	00:30:33.760 --> 00:30:38.880
	drop like for example you get 98 in

	00:30:36.480 --> 00:30:40.799
	training and then you get 50 or 40

	00:30:38.880 --> 00:30:43.279
	percent with the test that means your

	00:30:40.799 --> 00:30:46.320
	network has not learned you may have to

	00:30:43.279 --> 00:30:47.919
	retrain your network so that is the way

	00:30:46.320 --> 00:30:50.799
	neural network training works and

	00:30:47.919 --> 00:30:53.279
	remember the whole process is about

	00:30:50.799 --> 00:30:55.679
	changing these weights and biases and

	00:30:53.279 --> 00:30:57.520
	coming up with the optimal values of

	00:30:55.679 --> 00:31:00.240
	these weights and biases so that the

	00:30:57.520 --> 00:31:02.960
	accuracy is the maximum possible all

	00:31:00.240 --> 00:31:04.960
	right so a little bit more detail about

	00:31:02.960 --> 00:31:07.520
	how this whole thing works so this is

	00:31:04.960 --> 00:31:09.840
	known as forward propagation which is

	00:31:07.520 --> 00:31:12.320
	the data or the information is going in

	00:31:09.840 --> 00:31:15.279
	the forward direction the inputs are

	00:31:12.320 --> 00:31:18.399
	taken weighted summation is done bias is

	00:31:15.279 --> 00:31:21.039
	added here and then that is fed to the

	00:31:18.399 --> 00:31:23.200
	activation function and then that is

	00:31:21.039 --> 00:31:25.360
	that comes out as an output so that is

	00:31:23.200 --> 00:31:27.360
	forward propagation and the output is

	00:31:25.360 --> 00:31:29.039
	compared with the actual value and that

	00:31:27.360 --> 00:31:31.200
	will give us the error the difference

	00:31:29.039 --> 00:31:33.679
	between them is the error and in

	00:31:31.200 --> 00:31:36.720
	technical terms that is also known as

	00:31:33.679 --> 00:31:38.880
	our cost function and this is what we

	00:31:36.720 --> 00:31:40.559
	would like to minimize there are

	00:31:38.880 --> 00:31:44.000
	different ways of defining the cost

	00:31:40.559 --> 00:31:47.200
	function but one of the simplest ways is

	00:31:44.000 --> 00:31:49.120
	mean square error so it is nothing but

	00:31:47.200 --> 00:31:51.919
	the square of the difference of the

	00:31:49.120 --> 00:31:53.679
	errors or the sum of the squares of the

	00:31:51.919 --> 00:31:56.240
	difference of the errors and this is

	00:31:53.679 --> 00:31:57.760
	also nothing new we have probably if

	00:31:56.240 --> 00:31:59.760
	you're familiar with machine learning

	00:31:57.760 --> 00:32:02.159
	you must have come across this mean

	00:31:59.760 --> 00:32:04.320
	square now there are different ways of

	00:32:02.159 --> 00:32:06.240
	defining cost function it need not

	00:32:04.320 --> 00:32:08.720
	always be the mean square error but the

	00:32:06.240 --> 00:32:11.760
	most common one is this so you define

	00:32:08.720 --> 00:32:15.200
	this cost function and you ask the

	00:32:11.760 --> 00:32:17.600
	system to minimize this error so we use

	00:32:15.200 --> 00:32:21.039
	what is known as an optimization

	00:32:17.600 --> 00:32:23.519
	function to minimize this error and the

	00:32:21.039 --> 00:32:25.840
	error itself sent back to the system as

	00:32:23.519 --> 00:32:27.600
	feedback and that is known as back

	00:32:25.840 --> 00:32:30.080
	propagation and so this is the cost

	00:32:27.600 --> 00:32:32.880
	function and how do we optimize the cost

	00:32:30.080 --> 00:32:35.919
	function we use what is known as

	00:32:32.880 --> 00:32:39.519
	gradient descent so the gradient descent

	00:32:35.919 --> 00:32:42.480
	mechanism identifies how to change the

	00:32:39.519 --> 00:32:45.760
	weights and biases so that the cost

	00:32:42.480 --> 00:32:47.919
	function is minimized and there is also

	00:32:45.760 --> 00:32:50.159
	what is known as the rate or the

	00:32:47.919 --> 00:32:53.120
	learning rate that is what is shown here

	00:32:50.159 --> 00:32:55.919
	as slower and faster so you need to

	00:32:53.120 --> 00:32:59.360
	specify what should be the learning rate

	00:32:55.919 --> 00:33:02.480
	now if the learning rate is very small

	00:32:59.360 --> 00:33:04.480
	then it will probably take very long to

	00:33:02.480 --> 00:33:07.279
	train whereas if the learning rate is

	00:33:04.480 --> 00:33:09.840
	very high then it will appear to be

	00:33:07.279 --> 00:33:12.159
	faster but then it will probably never

	00:33:09.840 --> 00:33:14.480
	what is known as converge now what is

	00:33:12.159 --> 00:33:17.760
	convergence now we are talking about a

	00:33:14.480 --> 00:33:20.159
	few terms here convergence is like this

	00:33:17.760 --> 00:33:24.000
	this is a representation of convergence

	00:33:20.159 --> 00:33:26.240
	so the whole idea of gradient descent is

	00:33:24.000 --> 00:33:28.640
	to optimize the cost function or

	00:33:26.240 --> 00:33:30.880
	minimize the cost function in order to

	00:33:28.640 --> 00:33:34.000
	do that we need to represent the cost

	00:33:30.880 --> 00:33:36.480
	function as this curve we need to come

	00:33:34.000 --> 00:33:38.960
	to this minimum value that is what is

	00:33:36.480 --> 00:33:41.840
	known as the minimization of the cost

	00:33:38.960 --> 00:33:44.720
	function now what happens if we have the

	00:33:41.840 --> 00:33:48.000
	learning rate very small is that it will

	00:33:44.720 --> 00:33:51.200
	take very long to come to this point on

	00:33:48.000 --> 00:33:53.279
	the other hand if you have large higher

	00:33:51.200 --> 00:33:56.159
	learning rate what will happen is

	00:33:53.279 --> 00:33:58.559
	instead of stopping here it will cross

	00:33:56.159 --> 00:34:01.279
	over because the learning rate is high

	00:33:58.559 --> 00:34:03.440
	and then it has to come back so it will

	00:34:01.279 --> 00:34:05.440
	result in what is known as like an

	00:34:03.440 --> 00:34:07.760
	oscillation so it will never come to

	00:34:05.440 --> 00:34:10.639
	this point which is known as convergence

	00:34:07.760 --> 00:34:13.040
	instead it will go back and forth so

	00:34:10.639 --> 00:34:14.960
	these are known as hyper parameters the

	00:34:13.040 --> 00:34:17.520
	learning rate and so on and these have

	00:34:14.960 --> 00:34:20.639
	to be those numbers or those values we

	00:34:17.520 --> 00:34:23.040
	can determine typically using trial and

	00:34:20.639 --> 00:34:25.359
	error out of experience we we try to

	00:34:23.040 --> 00:34:28.639
	find out these values so that is the

	00:34:25.359 --> 00:34:30.560
	gradient descent mechanism to optimize

	00:34:28.639 --> 00:34:34.399
	the cost function and that is what is

	00:34:30.560 --> 00:34:36.560
	used to train our neural network this is

	00:34:34.399 --> 00:34:38.720
	another representation of how the

	00:34:36.560 --> 00:34:41.200
	training process works and here in this

	00:34:38.720 --> 00:34:44.320
	example we are trying to classify these

	00:34:41.200 --> 00:34:46.960
	images whether they are cats or dogs and

	00:34:44.320 --> 00:34:49.599
	as you can see actually each image is

	00:34:46.960 --> 00:34:54.000
	fed in each time one image is fed rather

	00:34:49.599 --> 00:34:56.960
	and these values of x1 x2 up to xn are

	00:34:54.000 --> 00:34:59.280
	the pixel values within this image okay

	00:34:56.960 --> 00:35:01.920
	so those values are then taken and for

	00:34:59.280 --> 00:35:04.320
	each of those values a weight is

	00:35:01.920 --> 00:35:06.079
	multiplied and then it goes to the next

	00:35:04.320 --> 00:35:08.480
	layer and then to the next layer and so

	00:35:06.079 --> 00:35:10.880
	on ultimately it comes as the output

	00:35:08.480 --> 00:35:13.839
	layer and it gives an output as whether

	00:35:10.880 --> 00:35:16.720
	it is a dog or a cat remember the output

	00:35:13.839 --> 00:35:19.520
	will never be a named output so these

	00:35:16.720 --> 00:35:22.400
	would be like a zero or a one and we say

	00:35:19.520 --> 00:35:24.400
	okay zero corresponds to dogs and one

	00:35:22.400 --> 00:35:26.640
	corresponds to catch so that is the way

	00:35:24.400 --> 00:35:28.800
	it typically happens this is a binary

	00:35:26.640 --> 00:35:31.280
	classification we have similar

	00:35:28.800 --> 00:35:32.960
	situations where there can be multiple

	00:35:31.280 --> 00:35:34.960
	classes which means that there will be

	00:35:32.960 --> 00:35:38.160
	multiple more neurons in the output

	00:35:34.960 --> 00:35:39.920
	layer okay so this is once again a quick

	00:35:38.160 --> 00:35:41.839
	representation of how the forward

	00:35:39.920 --> 00:35:44.400
	propagation and the backward propagation

	00:35:41.839 --> 00:35:46.640
	works so the information is going

	00:35:44.400 --> 00:35:49.119
	in this direction which is basically the

	00:35:46.640 --> 00:35:50.079
	forward propagation and at the output

	00:35:49.119 --> 00:35:53.200
	level

	00:35:50.079 --> 00:35:56.480
	we find out what is the cost function

	00:35:53.200 --> 00:35:58.560
	the difference is basically sent back as

	00:35:56.480 --> 00:36:01.040
	part of the backward propagation and

	00:35:58.560 --> 00:36:03.520
	gradient descent then adjust the weights

	00:36:01.040 --> 00:36:06.160
	and biases for the next iteration this

	00:36:03.520 --> 00:36:09.280
	happens iteratively till the cost

	00:36:06.160 --> 00:36:11.680
	function is minimized and that is when

	00:36:09.280 --> 00:36:13.760
	we say the whole the network has

	00:36:11.680 --> 00:36:16.160
	converged or the training process has

	00:36:13.760 --> 00:36:18.880
	converged and there can be situations

	00:36:16.160 --> 00:36:21.599
	where convergence may not happen in rare

	00:36:18.880 --> 00:36:24.000
	cases but by and large the network will

	00:36:21.599 --> 00:36:26.320
	converge and after maybe a few

	00:36:24.000 --> 00:36:28.160
	iterations it could be tens of

	00:36:26.320 --> 00:36:30.160
	iterations or hundreds of iterations

	00:36:28.160 --> 00:36:32.800
	depending on what exactly the number of

	00:36:30.160 --> 00:36:35.599
	iterations can vary and then we say okay

	00:36:32.800 --> 00:36:38.079
	we are getting a certain accuracy and we

	00:36:35.599 --> 00:36:40.800
	say that is our threshold maybe 90

	00:36:38.079 --> 00:36:42.880
	accuracy we stop at that and we say that

	00:36:40.800 --> 00:36:44.640
	the system is trained the trained model

	00:36:42.880 --> 00:36:47.440
	is then deployed for production and so

	00:36:44.640 --> 00:36:49.920
	on so that is the way the neural network

	00:36:47.440 --> 00:36:53.200
	training happens okay so that is the way

	00:36:49.920 --> 00:36:56.079
	classification works in deep learning

	00:36:53.200 --> 00:36:59.280
	using neural network and this slide is

	00:36:56.079 --> 00:37:01.520
	an animation of this whole process as

	00:36:59.280 --> 00:37:04.079
	you can see the forward propagation the

	00:37:01.520 --> 00:37:06.160
	data is going forward from the input

	00:37:04.079 --> 00:37:07.359
	layer to the output layer and there is

	00:37:06.160 --> 00:37:10.000
	an output

	00:37:07.359 --> 00:37:12.960
	and the error is calculated the cost

	00:37:10.000 --> 00:37:15.359
	function is calculated and that is fed

	00:37:12.960 --> 00:37:18.320
	back as a part of backward propagation

	00:37:15.359 --> 00:37:20.800
	and that whole process repeats once

	00:37:18.320 --> 00:37:23.359
	again okay so remember in neural

	00:37:20.800 --> 00:37:27.520
	networks the training process is nothing

	00:37:23.359 --> 00:37:29.760
	but the finding the best values of the

	00:37:27.520 --> 00:37:32.400
	weights and biases for each and every

	00:37:29.760 --> 00:37:34.960
	neuron in the network that's all

	00:37:32.400 --> 00:37:37.760
	training of neural network consists of

	00:37:34.960 --> 00:37:40.960
	finding the optimal values of the

	00:37:37.760 --> 00:37:44.800
	weights and biases so that the accuracy

	00:37:40.960 --> 00:37:47.040
	is maximum all right so with that we

	00:37:44.800 --> 00:37:51.800
	come to the end of the session we all

	00:37:47.040 --> 00:37:51.800
	have a great day thank you very much

	00:37:53.839 --> 00:37:57.359
	hi there if you like this video

	00:37:55.680 --> 00:38:00.000
	subscribe to the simply learn youtube

	00:37:57.359 --> 00:38:02.160
	channel and click here to watch similar

	00:38:00.000 --> 00:38:05.480
	videos turn it up and get certified

	00:38:02.160 --> 00:38:05.480
	click here