Running Llama3 model inference on Intel CPU and Intel GPU (1)

Introduction

On April 18, 2024, Meta released Llama 3, its latest and most capable open-source large language model (LLM) and a major leap over the previous Llama 2. The release features pretrained and instruction-fine-tuned language models with 8B and 70B parameters. According to Meta, the 8B and 70B models are only the beginning of Llama 3, with a lot more to come, including its largest models, with over 400B parameters, which were still training at the time of the release.

Meta has made the Llama 3 models available for download at the Llama 3 website and provided a Getting Started Guide with the latest list of all available platforms. After reading through this information, I started trying out the models. One piece of great news is that on the same day (4/18/2024), Intel announced that its CPUs and GPUs had been validated to support the Llama 3 8B and 70B models (refer to Llama 3 with Intel AI Solutions). As my initial experiment with these latest LLMs, I had the opportunity to run Meta Llama 3 8B inference on Intel CPUs and on an Intel® Arc™ GPU using IPEX-LLM. IPEX-LLM is a PyTorch library for running LLMs on Intel CPUs and GPUs with very low latency.

Run the Llama3 8B inference on Intel CPUs

We can use the IPEX-LLM optimize_model API to accelerate Llama 3 models on CPU. Here are the steps:

  1. Install IPEX-LLM and set the environment variables on Linux using the init script that IPEX-LLM provides:

$ git clone https://github.com/intel-analytics/ipex-llm.git

$ pip install ipex-llm

$ source ipex-llm-init

  2. Create a conda environment to manage the Python environment:

  • Download the latest 64-bit Miniconda installer, run it, and then remove the installer script:

$ mkdir -p ~/miniconda3

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda3/miniconda.sh

$ bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3

$ rm -rf ~/miniconda3/miniconda.sh

  • Initialize the newly installed Miniconda by running the following commands for the bash and zsh shells:

$ ~/miniconda3/bin/conda init bash
$ ~/miniconda3/bin/conda init zsh

  • Create a Python environment for IPEX-LLM (Python 3.11 is recommended):

$ conda create -n llm python=3.11
$ conda activate llm

  • Install the latest ipex-llm nightly build with the 'all' option:

$ pip install --pre --upgrade ipex-llm[all]

  • Install transformers 4.37.0 (transformers>=4.33.0 is required for Llama3 with IPEX-LLM optimizations):

$ pip install transformers==4.37.0
  • Run the inference example for a Llama3 model to predict the next N tokens using the generate() API, with IPEX-LLM INT4 optimizations. For example, we can give the prompt 'What is AI?' to this Llama 3 model:

$ python ./generate.py --prompt 'What is AI?'
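For orientation, here is a minimal Python sketch of what a script along the lines of generate.py might do. It is not the generate.py shipped with IPEX-LLM: the function names build_parser and run, the --n-predict flag name, and the default model id are assumptions made for illustration, and any prompt formatting for the instruction-tuned model is left out. The sketch assumes ipex-llm, transformers, and torch are installed as in the steps above.

```python
import argparse
import time


def build_parser() -> argparse.ArgumentParser:
    """Command-line options for the sketch; --n-predict is an assumed flag name."""
    parser = argparse.ArgumentParser(description="Llama 3 generation sketch")
    parser.add_argument("--repo-id-or-model-path", type=str,
                        default="meta-llama/Meta-Llama-3-8B-Instruct",  # assumed default
                        help="Hugging Face repo id or local model directory")
    parser.add_argument("--prompt", type=str, default="What is AI?")
    parser.add_argument("--n-predict", type=int, default=32,
                        help="maximum number of new tokens to generate")
    return parser


def run(args: argparse.Namespace) -> None:
    """Load the model, apply low-bit optimization, and time one generate() call."""
    # Heavy imports are deferred so the parser works without them installed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from ipex_llm import optimize_model

    model = AutoModelForCausalLM.from_pretrained(
        args.repo_id_or_model_path, torch_dtype="auto")
    # Convert the model to a low-bit format (symmetric INT4 here).
    model = optimize_model(model, low_bit="sym_int4")
    tokenizer = AutoTokenizer.from_pretrained(args.repo_id_or_model_path)

    with torch.inference_mode():
        input_ids = tokenizer.encode(args.prompt, return_tensors="pt")
        start = time.time()
        output = model.generate(input_ids, max_new_tokens=args.n_predict)
        elapsed = time.time() - start

    print(f"Inference time: {elapsed} s")
    print(tokenizer.decode(output[0], skip_special_tokens=True))
```

Calling run(build_parser().parse_args()) from a script entry point would mirror the command line used above; the timing wraps only the generate() call, so model loading is excluded.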

With the default setting of 32 output tokens, generate.py produced output like that shown in the following screenshot:

We can also run the inference with a different prompt using this Llama 3 model.

$ python ./generate.py --prompt 'What is llama3?'

With the default setting of 32 tokens of output, the output is like this:

As part of its output, the program reports the inference time for the 32 tokens (the default value). In the example above, the inference time is about 2.944019079208374 seconds.
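To put that number in perspective, the reported time can be turned into a rough throughput figure. The short Python sketch below assumes that all 32 tokens were actually generated and that the reported time covers the whole generate() call, including prompt processing; the helper name tokens_per_second is made up for illustration.

```python
def tokens_per_second(num_tokens: int, seconds: float) -> float:
    """Average generation rate over one run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_tokens / seconds


# Figures from the CPU run above: 32 tokens in about 2.944 seconds.
cpu_rate = tokens_per_second(32, 2.944019079208374)
print(f"{cpu_rate:.2f} tokens/s")  # prints "10.87 tokens/s"
```

This is an average over a single run, so it should be read as a ballpark figure rather than a benchmark result.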

Run the Llama3 8B inference on Intel Arc A770 GPU

I ran the Llama3 8B inference on a system with Intel® Arc™ A770 Graphics, which has 16 GB of memory and 32 Xe-cores.

Here are the steps:

  • Create a conda virtual environment:

$ conda create -n llama3-test python=3.11
$ conda activate llama3-test

$ cd llama3

  • Install IPEX-LLM. The pip command below installs intel_extension_for_pytorch==2.1.10+xpu by default:

$ git clone https://github.com/intel-analytics/ipex-llm.git

$ pip install --pre --upgrade ipex-llm[xpu] --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/

  • Install transformers 4.37.0 (transformers>=4.33.0 is required for Llama3 with IPEX-LLM optimizations):

$ pip install transformers==4.37.0

  • Set the environment variables:

$ source /opt/intel/oneapi/setvars.sh

$ export LD_PRELOAD=${LD_PRELOAD}:${CONDA_PREFIX}/lib/libtcmalloc.so
$ export SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS=1
$ export SYCL_CACHE_PERSISTENT=1
$ export ENABLE_SDP_FUSION=1

  • Run the inference example for a Llama3 model to predict the next N tokens using the generate() API, with IPEX-LLM INT4 optimizations:

$ python ./generate.py --repo-id-or-model-path /mnt/disk1/models/llama3-8b-instruction-hf --prompt 'What is AI?'
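If the run fails or is unexpectedly slow, one thing worth checking is whether the variables from the previous step are still set in the current shell, since exports do not carry over to a new terminal session. The Python sketch below is a hypothetical pre-flight helper, not part of IPEX-LLM; the names EXPECTED and missing_settings are made up for illustration, and the expected values are simply the ones exported above.

```python
import os

# Values taken from the export commands in the environment-variable step.
EXPECTED = {
    "SYCL_PI_LEVEL_ZERO_USE_IMMEDIATE_COMMANDLISTS": "1",
    "SYCL_CACHE_PERSISTENT": "1",
    "ENABLE_SDP_FUSION": "1",
}


def missing_settings(env, expected=EXPECTED):
    """Return the names of variables that are unset or hold a different value."""
    return [name for name, value in expected.items() if env.get(name) != value]


problems = missing_settings(os.environ)
print("Unset or different:", ", ".join(problems) if problems else "none")
```

The function takes the environment as an argument so it can be exercised with a plain dictionary as well as with os.environ.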

As part of its output, generate.py reports the inference time. In the example above, the inference time for the 32 tokens (the default value) is about 0.474730 seconds.
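As a rough comparison with the CPU run earlier in this post, the two reported times can be put side by side. The Python sketch below only does the arithmetic on the two figures quoted in this post; it assumes both runs generated the full 32 tokens, and since the prompts and systems may differ it is an indication rather than a controlled benchmark. The helper names are made up for illustration.

```python
TOKENS = 32
CPU_SECONDS = 2.944019079208374  # CPU run reported earlier in the post
GPU_SECONDS = 0.474730           # Arc A770 run reported above


def throughput(tokens: int, seconds: float) -> float:
    """Average tokens generated per second."""
    return tokens / seconds


def speedup(baseline_seconds: float, new_seconds: float) -> float:
    """How many times faster the new run is than the baseline."""
    return baseline_seconds / new_seconds


print(f"GPU throughput: {throughput(TOKENS, GPU_SECONDS):.2f} tokens/s")  # 67.41
print(f"Speedup over the CPU run: {speedup(CPU_SECONDS, GPU_SECONDS):.2f}x")  # 6.20x
```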

Posted April 28, 2024 by kyuoracleblog in Uncategorized


COLLABORATE 20 Conference Converted to Digital Virtual Conferences

This year’s COLLABORATE conference of the Oracle technology user groups has been cancelled due to the evolving COVID-19 situation. As a result, the conference has been converted into multiple digital virtual conferences.

  • OATUG (Oracle Applications Technology Users Group) will hold the OATUG Online Forum on April 20-May 1, 2020, with 200+ sessions online. Refer to the OATUG Forum Online page here. For this online forum, I will present my topic, “Replicating Oracle EBS Database with Snapshot for DEV/Test,” on Tue, Apr 28, 2020, 3:30 PM – 4:30 PM EDT.
  • Quest Oracle Community will hold several Quest Forum Digital Events: Innovation Week on April 20-April 24, 2020, and Community Education Weeks on May 27-June 18. Check this link for details.
  • Innovation Week will consist of keynote speakers, Oracle roadmap updates, strategic insights, and customer case studies. This link has more details.
  • Community Education Weeks continue with a series of comprehensive virtual events for each Quest Community, such as Database & Technology Week on June 15-18, 2020. I will be giving three presentations during Database & Technology Week:
    • Leveraging Oracle Autonomous Data Warehouse for Machine Learning, June 15, 2020, 11:00am-12:00pm.
    • Database Cloud: Architecture components, Deployment and Migration, June 15, 2020, 3:45pm-4:45pm.
    • Achieving Extreme Scalability & Total Fault Isolation with Oracle Sharding 19c/20c, June 16, 2020, 11:00am-12:00pm.

 

Posted April 19, 2020 by kyuoracleblog in Uncategorized

My Presentations at German Oracle User Group Conference (DOAG2019)

Next week I will be presenting the following two topics at the German Oracle User Group Conference (DOAG2019):

Cloning Oracle EBS Database With Snapshot for DEV/Test

Abstract: For many complex database applications such as Oracle E-Business Suite, it is quite common for a production environment to need 10 or more copies for various development and test purposes. The database cloning process that creates these copies of the production database can be very challenging and costly, especially when the database grows to multiple terabytes (TBs) or even beyond 10 TBs. This session will discuss two snapshot-based methods that can help IT simplify the database cloning process and reduce its time and resource cost. The first method is Oracle gDBClone, which is based on Oracle ACFS snapshots. The second method is based on database storage volume snapshots provided by the storage array. Using a real example from our IT department, we will discuss the implementation details of these two methods and their pros and cons.

Oracle ADW for Advanced Analytic and Machine Learning

Abstract: Oracle Autonomous Data Warehouse (ADW) provides fully automated, data-warehouse-specific features to deliver outstanding query performance. As a fully managed cloud service running on high-performance Oracle Cloud Infrastructure (OCI), ADW also comes with a new set of tools, such as Zeppelin-based SQL notebooks, for advanced analytics and Oracle Machine Learning. This session will examine the architecture of Oracle ADW and its collaborative environment. Then we will discuss how application developers and data scientists can leverage this environment: how to import and export large sets of data to and from an ADW database instance, and how to build, evaluate, and apply machine learning models with SQL notebooks. Through several examples, I will share some troubleshooting experiences and tips for building such advanced analytics and Oracle Machine Learning applications in Oracle ADW.

The presentation slide deck links are listed in the presentations section of this blog.

My Presentation Topic at Oracle OpenWorld 2019

I will be speaking at Oracle OpenWorld 2019. Here is the information about my session. I look forward to seeing friends and colleagues in the Oracle community and discussing this interesting topic with them.

Title: Achieving Massive Scalability and Total Fault Isolation with Oracle Sharding 19c.  Session Code: BUS1988

Session time and location: Tuesday, September 17, 04:15 PM – 05:00 PM | Moscone West – Room 3020A

Session abstract:

Oracle Sharding was introduced in Oracle Database 12cR2, and was further improved in Oracle Database 18c and 19c for linear scalability and complete fault isolation for OLTP workloads. With Oracle Sharding, data is horizontally partitioned across discrete Oracle Databases (shards) that collectively form a single logical database. Come to this session to learn about the latest sharding improvements in Oracle Database 18c and 19c, sharding on Oracle Database Cloud, and leveraging sharding for your business. Hear about the experience of using Oracle Sharding in Dell IT’s global geo-distribution application, including which apps are the best fit for sharding, deployment of sharded databases with replication for massive scalability, and complete fault isolation. Lessons learned are also covered.

Session link in the Oracle OpenWorld conference catalog: https://events.rainfocus.com/widget/oracle/oow19/catalogow19?search=BUS1988

Posted August 23, 2019 by kyuoracleblog in Uncategorized

I will be speaking at Ohio Oracle Users Group Meeting

I have been invited as a guest speaker to present at the upcoming Ohio Oracle Users Group meeting on March 24, 2017.

At the meeting I will be speaking in the following presentation sessions:

10:00 – 11:00 A.M.: Oracle DBAs, let’s have a taste of Oracle Database Cloud Service
11:00 A.M. – 12:00 P.M.: Leveraging Oracle Database In-Memory to accelerate Business Analytic Applications
01:00 – 02:00 P.M.: Optimize Oracle Database Storage with Storage Tiering
02:00 – 03:00 P.M.: Oracle ASM Cluster File System for Private Cloud storage

For details about the meeting, please refer to the Ohio Oracle Users Group link: http://www.ooug.org/

 

Posted March 21, 2017 by kyuoracleblog in Uncategorized

IOUG Oracle Database 12c R2 Expert Panel tomorrow December 6, 2016, 12:00pm – 1:00pm Central

I will join three other Oracle ACEs/ACE Directors, Jim Czuprynski, Anuj Mohan, and Rich Niemiec, to present the IOUG Oracle Database 12cR2 Expert Panel. We will share our experience of attending a week-long Oracle 12cR2 beta program on behalf of IOUG in the Oracle lab at Oracle HQ. We will also discuss some of the major Oracle Database features introduced in Oracle 12cR2, such as those in Multitenant, Database In-Memory, Oracle Sharding, and many more. To attend this IOUG webinar, please get the registration information from this link:
http://www.ioug.org/page/oracle-webcasts-online-events

 

 

Thanks IOUG for Accepting My Presentations for Collaborate 17

I just received great news from IOUG: three of my presentations were accepted for Collaborate 17.

• Session #724: Oracle DBAs, It is time to have a first taste of Oracle Database Cloud Service

Abstract: As an important part of the Oracle Cloud offering, Oracle Database Cloud Service provides an easy-to-use, enterprise-proven, highly available, and scalable database service. Oracle Database Cloud Service can be the first “going to cloud” experience that you, as an Oracle DBA, may have. This session helps you get a first taste of Oracle Database Cloud Service. It will examine the choices of database cloud services and help you determine which one fits your business requirements. It will also discuss the technical aspects of each of the cloud services. Through a few examples, the session will discuss the DBA tasks and how you would perform them in the new database environment, ranging from creating a database cloud service to managing the databases.

• Session #487: Leveraging Oracle Database In-Memory to accelerate Business Analytic Applications

Abstract: By introducing the In-Memory column store, Oracle Database In-Memory (DBIM) significantly improves performance for analytic queries as well as mixed workloads. Come to this session to learn about Oracle Database In-Memory under the hood: the dual-format memory architecture and configuration, how data is populated into the In-Memory column store, and how it helps query performance. This presentation also covers further enhancements of DBIM in Oracle Database 12cR2: join groups, In-Memory expressions, In-Memory Active Data Guard support, etc. Through some case studies of business analytics projects, this presentation covers the practices of leveraging DBIM to improve the query performance of business analytics applications and how to use the In-Memory Advisor to determine which objects need to be loaded into the In-Memory column store. We will also present some analysis of the performance gains from using In-Memory features, and when and how these gains can be achieved.

• Session #437: Get ready to upgrade your Oracle databases to 12c: tools, methods and paths

Abstract: In this session we will share some of our experience upgrading different versions of Oracle databases to the latest Oracle 12c. By attending this session, you will learn the upgrade tools and various upgrade paths that help you plan your upcoming database upgrades. This will include upgrade methods for multitenant databases. Another focus area of this session is Oracle RAC database upgrades, which include the Oracle Grid Infrastructure upgrade and the Oracle RAC database upgrade. You will also learn how to use rolling upgrades for Oracle Clusterware, ASM, and Oracle RAC databases to reduce database downtime.

It will also be my great honor to join my fellow Oracle ACE Directors and Oracle ACEs as a co-speaker for their sessions:

• Session #779: Oracle VM, OEM13c and Cloud Computing – Expert Panel by Tariq Farooq, Charles Kim, and Kai Yu
• Session #518: Oracle 12c R2 Expert Panel by Anuj Mohan, Jim Czuprynski, Kai Yu, Seth Miller

Last but certainly not least, I am waiting for the IOUG Collaborate 17 conference committee to accept one important session in the Cloud area, the IOUG Cloud SIG session: Session #711, Cloud Experience - To the Cloud and Back - Hands on Workshop, by Steve Lemme, Charles Kim, Try Logon, Erik Benner, Kai Yu, Seth Miller.

See you at Collaborate 17 @ Vegas next April!

 

Validated Infrastructure for Oracle Linux and Oracle VM

If you are thinking about building Oracle Linux or Oracle VM based systems for your Oracle databases, for example building a private-cloud type of system to consolidate your databases, have you thought about the hardware infrastructure that the rest of your platform will be built upon? One thing you need to make sure of is that the infrastructure stack is certified and validated for the software platform stack, such as Oracle Linux/Oracle VM and the Oracle database.

The nice thing is that, as a customer, you do not need to spend your resources on this complex and time-consuming certification and validation process, because various hardware vendors partner with Oracle to establish the validated configurations. But you do need to make sure that you select the proper infrastructure and OS based on the validated configurations. Since I started working on Oracle Linux and Oracle VM in 2009, it has been my team’s responsibility to work with the Oracle Linux and Oracle VM teams to certify and validate infrastructure such as Dell servers, storage, and networking components with Oracle VM and the Oracle Linux Unbreakable Enterprise Kernel (UEK). We post the validated configurations on the Oracle Linux and Oracle VM Validated Configurations (OVC) website: http://www.oracle.com/technetwork/topics/linux/validated-configurations-085828.html .

This website documents pre-tested, validated configurations and best practices for running Oracle Linux and Oracle VM on various infrastructure, such as Dell PowerEdge servers, Dell storage, and Dell networking. With each validated configuration, customers can run their production Oracle databases with either Oracle Linux or Oracle VM as the database platform.

You can get a complete list of the configurations at this link: http://linux.oracle.com/pls/apex/f?p=102:1. There, you can type a hardware vendor name to get a list of the validated configurations for various hardware. For example, if you type Dell to search, it comes back with 55 configurations as of today (10/19/2016). The list includes the latest servers, such as the Dell PowerEdge R930, and various Dell storage products.

dell_ovc_list

This is what a specific configuration looks like:

dell_ovc_list_1

dell_ovc_list_2

OOW2016 Part 2: What I presented and what I learned

Work in progress; this post will be up in 2-3 days.

Posted October 4, 2016 by kyuoracleblog in Uncategorized

OOW2016 Part 1: Preconference Oracle ACED Product Briefing at Oracle HQ

The annual Oracle technology event, Oracle OpenWorld (OOW), was held on Sept 18-22 in San Francisco. As usual, my trip to this conference really consisted of two parts: the preconference Oracle ACE Director product briefing (my 7th product briefing) on Sept 15-16 and the OOW conference (my 11th OOW) on Sept 18-22. I am writing these two blog posts to share my experience of this seven-day conference.

The preconference Oracle ACE Director product briefing was held at the Oracle Conference Center at Oracle HQ in Redwood Shores, California. The agenda consisted of two full days of presentations by Oracle executives and product managers. The product briefing started at 9:00am on Sept 15 with the executive presentation by Thomas Kurian, the president of Oracle product development. In his one-hour presentation, Thomas Kurian gave a very broad summary of Oracle’s IaaS/Cloud vision, strategy, and roadmap, covering a good number of new developments that would be announced at Oracle OpenWorld 2016 in the following few days. Thomas’ talk was followed by two IaaS presentations covering Oracle’s latest IaaS cloud service: 1) the IaaS architecture, by Mark Cavage, Vice President of Engineering at Oracle; 2) the engineering overview of the IaaS implementation, by Deepak Patil, Vice President, Development at Oracle. The new Oracle IaaS service has some high-performance features built in, such as high bandwidth (Tb/s) and low round-trip time (< 1 ms) between Oracle Cloud data centers. It uses software-defined networking (SDN) technologies, which provide network flexibility in the same way that a hypervisor provides compute flexibility. In the compute cloud service, the IaaS offers bare-metal, virtual-machine-based, and container-based compute services.

dsc04935

(Oracle President for product development Thomas Kurian in Oracle ACED product briefing )

One of the interesting presentations was given by Navin R. Thadani, Products at Ravello Systems at Oracle. Navin introduced one of the new products recently acquired by Oracle: Ravello. This product helps customers lift applications running on VMware from their on-premises data centers to the Oracle public cloud without any changes. Another presentation was about the Oracle Database 12c Release 2 cloud-first release, which was later officially announced by Larry Ellison at Oracle OpenWorld 2016.

In the afternoon, I also attended a special session on the Oracle Linux and virtualization update (my favorite subject), presented by my good friend Honglin Su, Sr. Director of Product Management for Oracle Linux and Virtualization. Many new features of the Oracle UEK kernel and Oracle VM were presented, such as support for Ksplice, OpenStack for Oracle Linux, Docker, and the Oracle Cloud Container Registry in UEK R4 QU2. Here is the photo taken after the session:

with_honglin

(With the Oracle Linux and Virtualization team: Honglin Su (right), his colleague (left), and myself (middle))

On the second day, there were more Oracle cloud related presentations, which covered a variety of Oracle public cloud topics: DevOps in the Oracle public cloud, big data in the Oracle cloud, Oracle GoldenGate in the cloud, Oracle PaaS cloud, Oracle Management Cloud service, etc.

In summary, this Oracle product briefing showed how much focus Oracle has put on Oracle Cloud in the past year. I was expecting to get more details about Oracle Cloud from the rest of the Oracle OpenWorld conference.

The Oracle ACE Director product briefing not only gives us exposure to Oracle’s latest technology developments, it is also a great networking opportunity for us. This is the annual event where more than 70 Oracle ACE Directors get together to discuss their favorite technologies and build professional networks and friendships. The following photo shows all the Oracle ACE Directors who attended this preconference product briefing at the Oracle Conference Center before Oracle OpenWorld 2016.

dsc05296

 

(Oracle ACE directors at Oracle conference center)

The networking opportunity extended to Oracle ACEs and Oracle ACE Associates (for more details about Oracle ACE Directors, Oracle ACEs, and Oracle ACE Associates, refer to the Oracle ACE program link: http://www.oracle.com/technetwork/community/oracle-ace/index.html ).

The following photo of Oracle ACE Directors, Oracle ACEs, and Oracle ACE Associates was taken at San Francisco’s Pier 39, where the Oracle ACE program hosted the annual Oracle ACE dinner for all the Oracle ACE Directors, ACEs, and ACE Associates who attended Oracle OpenWorld 2016.

oracle-aces_at_pier39

 

 

Posted October 4, 2016 by kyuoracleblog in Uncategorized