
  • >> [Narrator] Live from New York, it's The Cube

  • covering the IBM Machine Learning Launch Event

  • brought to you by IBM.

  • Here are your hosts, Dave Vellante and Stu Miniman.

  • >> Good morning everybody, welcome to the Waldorf Astoria.

  • Stu Miniman and I are here in New York City,

  • the Big Apple,

  • for IBM's Machine Learning Event #IBMML.

  • We're fresh off Spark Summit, Stu,

  • where we had The Cube, this by the way is The Cube,

  • the worldwide leader in live tech coverage.

  • We were at Spark Summit last week,

  • George Gilbert and I,

  • watching the evolution of so-called big data.

  • Let me frame, Stu, where we're at

  • and bring you into the conversation.

  • The early days of big data were all about

  • offloading the data warehouse and reducing the cost

  • of the data warehouse.

  • I often joke that the ROI of big data

  • is reduction on investment, right?

  • There's these big, expensive data warehouses.

  • It was quite successful in that regard.

  • What then happened is we started to throw

  • all this data into the data lake.

  • People would joke it became a data swamp,

  • and you had a lot of tooling

  • to try to clean the data lake

  • and a lot of transforming and loading

  • and the ETL vendors started to participate there

  • in a bigger way.

  • Then you saw the extension of these data pipelines

  • to try to do more with that data.

  • The Cloud guys have now entered in a big way.

  • We're now entering the Cognitive Era,

  • as IBM likes to refer to it.

  • Others talk about AI and machine learning

  • and deep learning,

  • and that's really the big topic here today.

  • What we can tell you is that the news goes out

  • at 9:00am this morning, and it was well known

  • that IBM's bringing machine learning

  • to its z mainframe.

  • Two years ago, Stu, IBM announced the z13,

  • which was really designed to bring

  • analytic and transaction processing together

  • on a single platform.

  • Clearly IBM is extending the useful life

  • of the mainframe by bringing things like Spark,

  • certainly what it did with Linux

  • and now machine learning into z.

  • I want to talk about Cloud, the importance of Cloud,

  • and how that has really taken over the world of big data.

  • Virtually every customer you talk to now

  • is doing work on the Cloud.

  • It's interesting to see now

  • IBM unlocking its transaction base,

  • its mission-critical data,

  • to this machine learning world.

  • What are you seeing around Cloud and big data?

  • >> We've been digging into this big data space

  • since before it was called big data.

  • One of the early things that really got me

  • interested and excited about it is,

  • from the infrastructure standpoint,

  • storage has always been one of those costs

  • that we had to have,

  • and with the massive amounts of data,

  • the digital explosion we talked about,

  • keeping all that information

  • or managing all that information

  • was a huge challenge.

  • Big data was really that bit flip.

  • How do we take all that information

  • and make it an opportunity?

  • How do we get new revenue streams?

  • Dave, IBM has been at the center of this

  • and looking at the higher-level pieces

  • of not just storing data, but leveraging it.

  • Obviously huge in analytics, lots of focus

  • on everything from Hadoop and Spark and newer technologies,

  • but digging in to how they can leverage up the stack,

  • which is where IBM has done a lot of acquisitions

  • in that space and leveraging that

  • and wants to make sure that they have a strong position

  • both in Cloud, which was renamed;

  • SoftLayer is now IBM Bluemix

  • with a lot of services

  • including a machine learning service

  • that leverages the Watson technology

  • and of course on-prem they've got the z

  • and the Power solutions

  • that you and I have covered for many years

  • at the IBM Edge show.

  • >> Machine learning obviously heavily leverages models.

  • We've seen in the early days of big data,

  • the data scientists would build models

  • and machine learning allows those models

  • to be perfected over time.

  • So there's this continuous process.

  • We're familiar with the world of Batch

  • and then the minicomputer brought in

  • the world of interactive,

  • so we're familiar with those types of workloads.

  • Now we're talking about a new emergent workload

  • which is continuous.

  • Continuous apps where you're streaming data in,

  • what Spark is all about.

  • The models that data scientists are building

  • can constantly be improved.

  • The key is automation, right?

  • Being able to automate that whole process,

  • and being able to collaborate

  • between the data scientist, the data quality engineers,

  • even the application developers

  • that's something that IBM really tried to address

  • in its last big announcement in this area,

  • which was in October of last year:

  • the Watson Data Platform,

  • what they called DataWorks at the time.

  • So really trying to bring together

  • those different personas

  • in a way that they can collaborate together

  • and improve models on a continuous basis.
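That continuous-improvement loop, where a model is refined with every arriving batch rather than retrained from scratch, can be sketched in plain Python. This is a hypothetical illustration, not IBM's or Spark's implementation; the `OnlineModel` class and `make_batch` stream are invented for the example.

```python
# Minimal sketch of the "continuous" workload described above: a tiny
# logistic-regression model whose weights are refined by each incoming
# batch of (simulated) streaming data.
import math
import random

def sigmoid(z):
    # Logistic function squashing a raw score into a probability.
    return 1.0 / (1.0 + math.exp(-z))

class OnlineModel:
    """A toy logistic-regression model updated one batch at a time."""
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def predict_proba(self, x):
        return sigmoid(sum(wi * xi for wi, xi in zip(self.w, x)) + self.b)

    def update(self, batch):
        # One SGD pass: each arriving batch refines the same weights,
        # instead of retraining the model from scratch.
        for x, y in batch:
            err = self.predict_proba(x) - y
            self.w = [wi - self.lr * err * xi for wi, xi in zip(self.w, x)]
            self.b -= self.lr * err

rng = random.Random(0)

def make_batch(n=100):
    # Simulated stream: the label is 1 when the feature sum is positive.
    xs = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n)]
    return [(x, 1 if x[0] + x[1] > 0 else 0) for x in xs]

model = OnlineModel(n_features=2)
for _ in range(200):  # 200 batches arriving over time
    model.update(make_batch())

test = make_batch(1000)
acc = sum((model.predict_proba(x) > 0.5) == (y == 1) for x, y in test) / len(test)
print(f"accuracy after continuous updates: {acc:.2f}")
```

The point of the design is the `update` method: the data scientist's model keeps improving in place as the stream flows, which is the workload pattern the conversation contrasts with batch and interactive.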

  • The use cases that you often hear in big data

  • and certainly initially in machine learning

  • are things like fraud detection.

  • Obviously ad serving has been a big data application

  • for quite some time.

  • In financial services, identifying good targets,

  • identifying risk.

  • What I'm seeing, Stu, is that the phase that we're in now

  • of this so-called big data and analytics world,

  • and now bringing in machine learning and deep learning,

  • is to really improve on some of those use cases.

  • For example, fraud's gotten much, much better.

  • Ten years ago, let's say, it took many, many months,

  • if you ever detected fraud.

  • Now you get it in seconds, or sometimes minutes,

  • but you also get a lot of false positives.

  • Oops, sorry, the transaction didn't go through.

  • Did you do this transaction?

  • Yes, I did.

  • Oh, sorry, you're going to have to redo it

  • because it didn't go through.

  • It's very frustrating for a lot of users.

  • That will get better and better and better.
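The false-positive frustration described above comes down to a threshold decision. Here is a hedged, purely illustrative sketch (the score distributions and `flag_rates` helper are invented, not any vendor's fraud system) showing the trade-off: a lower threshold catches more fraud but declines more legitimate transactions.

```python
# Hypothetical illustration of the fraud false-positive trade-off:
# a single "risk score" threshold decides which transactions are flagged.
import random

rng = random.Random(42)

# Simulated risk scores: legitimate transactions cluster low, fraud high,
# with overlap -- that overlap is where false positives come from.
legit = [rng.gauss(0.3, 0.15) for _ in range(10000)]
fraud = [rng.gauss(0.7, 0.15) for _ in range(100)]

def flag_rates(threshold):
    # Fraction of fraud caught vs. fraction of legit customers declined.
    false_pos = sum(s >= threshold for s in legit) / len(legit)
    caught = sum(s >= threshold for s in fraud) / len(fraud)
    return caught, false_pos

for t in (0.4, 0.5, 0.6):
    caught, fp = flag_rates(t)
    print(f"threshold {t}: fraud caught {caught:.0%}, legit declined {fp:.0%}")
```

Better models shrink the overlap between the two score distributions, which is why machine learning on richer transaction data can cut false positives without letting more fraud through.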

  • We've all experienced retargeting from ads,

  • and we know how crappy they are.

  • That will continue to get better.

  • The big question that people have,

  • and it goes back to Jeff Hammerbacher's line,

  • "The best minds of my generation

  • are thinking about how to make people click ads."

  • When will we see big data really start

  • to affect our lives in different ways

  • like patient outcomes?

  • We're going to hear some of that today

  • from folks in health care and pharma.

  • Again, these are the things that people are waiting for.

  • The other piece is, of course, IoT.

  • What are you seeing, in terms of IoT,

  • in the whole data flow?

  • >> Yes, a big question we have, Dave, is

  • where's the data?

  • And therefore, where does it make sense

  • to be able to do that processing?

  • In big data we talked about how you've got

  • massive amounts of data;

  • can we move the processing to that data?

  • With IoT, we've heard CTOs talk about how

  • there's going to be massive amounts of data at the edge

  • and I don't have the time or the bandwidth

  • or the need necessarily

  • to pull that back to some kind of central repository.

  • I want to be able to work on it there.

  • Therefore there's going to be a lot of data

  • worked at the edge.
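The "work on the data where it lives" idea can be sketched simply. This is a hypothetical example (the edge nodes, readings, and `aggregate_at_edge` helper are invented): each edge node reduces its raw stream locally and ships only a small aggregate to the central repository.

```python
# Hedged sketch of edge processing: instead of sending every raw reading
# to a central repository, each edge node summarizes locally and sends
# only a few numbers upstream.
import random

rng = random.Random(1)

def edge_readings(n=10000):
    # Raw sensor stream that never leaves the edge node.
    return [rng.gauss(20.0, 2.0) for _ in range(n)]

def aggregate_at_edge(readings):
    # Only these few numbers cross the network.
    return {
        "count": len(readings),
        "mean": sum(readings) / len(readings),
        "max": max(readings),
    }

# Three edge nodes each reduce 10,000 readings to one small record.
central = [aggregate_at_edge(edge_readings()) for _ in range(3)]
raw_points = sum(a["count"] for a in central)
print(f"central store holds {len(central)} aggregates "
      f"summarizing {raw_points} raw readings")
```

The bandwidth argument in the conversation is exactly this ratio: thousands of raw points stay at the edge while a handful of aggregates travel back.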

  • Peter Levine did a whole video talking about how,

  • "Oh, Public Cloud is dead, it's all going to the edge."

  • A little bit hyperbolic of a statement;

  • we understand that there are plenty of use cases

  • for both Public Cloud and for the edge.

  • In fact we see Google pushing machine learning

  • in a big way with TensorFlow,

  • one of those machine learning frameworks out there

  • that we expect a lot of people to be working on.

  • Amazon is putting effort into the MXNet framework,

  • which is once again an open-source effort.

  • One of the things I'm looking at the space,

  • and I think IBM can provide some leadership here

  • is to what frameworks are going to become popular

  • across multiple scenarios?

  • How many winners can there be for these frameworks?

  • We already have multiple programming languages,

  • multiple Clouds.

  • How much of it is just API compatibility?

  • How much work is there,

  • and where are the repositories of data going to be,

  • and where does it make sense to do that

  • predictive analytics, that advanced processing?

  • >> You bring up a good point.

  • Last year, last October, at Big Data NYC,

  • we had a special segment

  • with a data scientist panel.

  • It was great.

  • We had some rockstar data scientists on there

  • like Dez Blanchfield and Joe Caserta,

  • and a number of others.

  • They echoed what you always hear

  • when you talk to data scientists.

  • "We spend 80% of our time messing with the data,

  • "trying to clean the data, figuring out the data quality,

  • "and precious little time on the models

  • “and improving the models

  • "and actually getting outcomes from those models."

  • So things like Spark have simplified that whole process

  • and unified a lot of the tooling

  • around so-called big data.

  • We're seeing Spark adoption increase.

  • George Gilbert in our part one and part two last week

  • in the big data forecast from Wikibon

  • showed that we're still not on the steep part

  • of the S-curve, in terms of Spark adoption.

  • Generically, we're talking about streaming as well

  • included in that forecast,

  • but it's forecasting that increasingly those applications

  • are going to become more and more important.

  • It brings you back to what IBM's trying to do

  • is bring machine learning

  • into this critical transaction data.

  • Again, to me, it's an extension of the vision

  • that they put forth two years ago,

  • bringing analytic and transaction data together,

  • actually processing within that Private Cloud complex,

  • which is what essentially this mainframe is,

  • it's the original Private Cloud, right?

  • You were saying off-camera,

  • it's the original converged infrastructure.

  • It's the original Private Cloud.

  • >> The mainframe's still here, lots of Linux on it.

  • We've covered for many years,

  • you want your cool Linux, Docker, containerized,

  • machine learning stuff,

  • I can do that on the z.

  • You want Python and Spark and R and Java,

  • and all the popular programming languages.

  • It makes sense.

  • It's not like a huge growth platform,

  • it's kind of flat, down, up in the product cycle

  • but it's alive and well

  • and a lot of companies run their businesses

  • obviously on the z.

  • We're going to be unpacking that all day.

  • Some of the questions we have is,

  • what about Cloud?

  • Where does it fit?

  • What about Hybrid Cloud?