Category Archives: ESXI

What is RSC and why does it matter in VMware

Got latency on your VMs? It might be RSC

Let’s say we have two VMs with the following specs:

  • Windows Server 2012 R2 (latest patches)
  • VMXNET3 network adapter
  • Same VLAN
  • Same ESXi build (3029758)
  • Both HW version 11
  • Both have the latest VMware Tools for ESXi build 3029758

Basic host diagram

You wouldn’t expect any new latency between these two machines, considering they are both in the same physical location and within the same topology, which has been in place for months, almost untouched.

After vMotioning both to the same host, the latency goes away. That is expected, since VMs located on the same host use the host’s internal networking for VM-to-VM traffic, bypassing the need to go out to the VDS or vSwitch, as shown below.

Local host diagram

 

So let’s go back to the title of the post. What is RSC and why does it matter to VMware?


RSC (Receive Segment Coalescing) is a technology that reduces CPU utilization on a server. It does this by taking work off the CPU and giving it to the network adapter, in our case the VMware VMXNET3 adapter. RSC strips the headers from incoming packets, combines their payloads into one larger packet, then passes that packet on to the right destination. Without RSC the receiver would have to process 4-5 separate packets, but with RSC enabled the receiver only has to process a single packet with the information from all 5 packets stuffed inside.
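To make the coalescing idea concrete, here is a minimal sketch in Python. It is a toy model only, not how VMXNET3 or Windows actually implement RSC: the `Segment` class and the `coalesce` function are made-up names for illustration, and real RSC also looks at TCP flags, timestamps and size limits.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    seq: int        # sequence number of the first payload byte
    payload: bytes  # TCP payload carried by this segment (headers left out)


def coalesce(segments):
    """Merge runs of in-order, contiguous segments into larger segments."""
    merged = []
    for seg in segments:
        last = merged[-1] if merged else None
        if last is not None and last.seq + len(last.payload) == seg.seq:
            # This segment starts exactly where the previous one ended,
            # so its payload is appended and only one "header" is kept.
            merged[-1] = Segment(last.seq, last.payload + seg.payload)
        else:
            # Gap or first segment: start a new coalesced segment.
            merged.append(Segment(seg.seq, seg.payload))
    return merged


if __name__ == "__main__":
    incoming = [Segment(1000 + i * 4, b"data") for i in range(5)]
    result = coalesce(incoming)
    print(len(incoming), "segments in ->", len(result), "segment(s) out")
```

Five back-to-back segments come out as one, which is the saving in receive-side processing that RSC is after.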

VM hardware version 11 introduced a bug that caused ESXi to handle the data improperly if the PSH flag (PSH flag explanation) was not set on the first packet but was set on the packets that followed. If you read the PSH flag article, it gives an awesome example of why this flag is useful.

Imagine you are walking in a line of 5 friends. Friend 1 doesn’t have a pass to get in the gate, but friends 2-5 do. Being a gentleman, friend 1 lets friends 2-5 go through while he buys his ticket. But then friends 2-5 are stuck waiting for him even though they are already in the park. It’s a pretty similar concept with the PSH flags.

While packets 2-5 have the PSH flag that grants them permission to go up to the application, ESXi has a hiccup while waiting for the PSH flag on packet 1, so it waits before the data can be sent on and the full information is received. The VMware KB article listed in the references at the end of this post highlights the problem.

What is the fix?


The fix for this problem is pretty simple: on the OS side you can disable RSC. If you do this, keep track of the CPU and memory use on that box and on the box that received the bulk of those packets, since the CPU now has to process every packet itself.

Running the command below will show you the current TCP global settings:

netsh int tcp show global

Receive Segment Coalescing State is the line we are looking for. If it shows as enabled, run the command below to disable it:

netsh int tcp set global rsc=disabled

After running netsh int tcp show global again, Receive Segment Coalescing State should show as disabled.
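If you have a lot of machines to check, the same lookup can be scripted. Below is a rough sketch in Python that pulls the RSC state out of the `netsh int tcp show global` output. The function names and the `SAMPLE_OUTPUT` text are my own illustration (the sample is abbreviated and not captured from a real server), so treat the exact layout as an assumption.

```python
import subprocess


def rsc_state(netsh_output):
    """Return the RSC state ('enabled', 'disabled', ...) or None if not found."""
    for line in netsh_output.splitlines():
        name, sep, value = line.partition(":")
        # Match on the setting name so column spacing does not matter.
        if sep and "segment coalescing" in name.lower():
            return value.strip().lower()
    return None


def read_tcp_globals():
    """Run the command from this post (Windows only) and return its text output."""
    result = subprocess.run(
        ["netsh", "int", "tcp", "show", "global"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Illustrative, abbreviated sample; real output has more lines.
SAMPLE_OUTPUT = """\
TCP Global Parameters
----------------------------------------------
Receive-Side Scaling State          : enabled
Receive Segment Coalescing State    : enabled
"""

if __name__ == "__main__":
    print("RSC state:", rsc_state(SAMPLE_OUTPUT))
```

On an actual Windows guest you would pass `read_tcp_globals()` to `rsc_state` instead of the sample text.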

 

What is affected?


Currently, this affects people running ESXi 6.0 build 3568940 or below with Windows Server 2012 or later guests (RSC was introduced in Windows Server 2012 and Windows 8, so older releases such as 2008 R2 do not have it). The problem can be solved in one of two ways: by updating to ESXi 6.0 Update 2 (build 3620759) or above, or by running the above command on the machines affected by the problem.
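As a quick sanity check, the build numbers above can be turned into a small helper. This is only a sketch based on the two build numbers quoted in this post; the function name is made up, it only makes sense for ESXi 6.0 hosts, and builds that fall between the two numbers are reported as unknown rather than guessed at.

```python
LAST_AFFECTED_BUILD = 3568940  # "build 3568940 or below", as quoted in this post
FIRST_FIXED_BUILD = 3620759    # ESXi 6.0 Update 2, as quoted in this post


def rsc_bug_status(esxi_60_build):
    """Classify an ESXi 6.0 build number against the builds named in this post."""
    if esxi_60_build <= LAST_AFFECTED_BUILD:
        return "affected"
    if esxi_60_build >= FIRST_FIXED_BUILD:
        return "fixed"
    # The post does not say anything about builds in between.
    return "unknown"


if __name__ == "__main__":
    for build in (3029758, 3620759):
        print(build, rsc_bug_status(build))
```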

 


 

KB articles and other sources, listed below for reference:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2129176

https://communities.vmware.com/thread/524842?start=0&tstart=0

https://technet.microsoft.com/en-us/library/hh997024(v=ws.11).aspx

A general system error occurred: Connection refused


Logging into my dev environment today, I had an issue where I couldn’t start VMs. This was happening across clusters and different hosts.

To get the VMs up in the meantime, you can log in directly to the ESXi host using the vSphere Client.

The issue is that the workflow service has either hung or stopped on the vCenter Server. To fix the workflow service, you will need to check its status and take the proper measures. Take a look below.


1. Check the status with the command below. The line beginning with INFO:root: is where the service status is shown.

service-control --status vmware-vpx-workflow

2. Once the status is found, use one of the two commands below to either stop or start the service.

service-control --stop vmware-vpx-workflow

service-control --start vmware-vpx-workflow

3. Try to power on a VM now and see if it worked!
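If you end up doing this more than once, the steps above can be wrapped in a small script. Here is a rough Python sketch that builds the same `service-control` commands and runs them in order (status, stop, start, status). The function names are made up for illustration, and actually running it assumes you are in a shell on the vCenter Server appliance where `service-control` is on the path.

```python
import subprocess

SERVICE = "vmware-vpx-workflow"


def service_control_cmd(action, service=SERVICE):
    """Build the service-control command line for one of the actions used above."""
    if action not in ("status", "stop", "start"):
        raise ValueError("unsupported action: " + action)
    return ["service-control", "--" + action, service]


def restart_workflow_service(run=subprocess.run):
    """Check status, stop, start, then check status again.

    Returns a list of (command, return code) pairs. The runner is a
    parameter so the sequence can be exercised without touching a server.
    """
    results = []
    for action in ("status", "stop", "start", "status"):
        cmd = service_control_cmd(action)
        completed = run(cmd)
        results.append((cmd, completed.returncode))
    return results


if __name__ == "__main__":
    # Only print the commands here; call restart_workflow_service() on the appliance.
    for action in ("status", "stop", "start"):
        print(" ".join(service_control_cmd(action)))
```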


There’s currently no KB article from VMware on this, but there are plenty of resources on the topic.


 

Oracle licensing and VMware

I was talking to a DBA the other day about Oracle licensing, so I decided to write a post about it. Not only because it is confusing as hell, but because it’s good to know for any potential Oracle implementations in the future. Before we begin, there are a couple of terms we need to know and understand for any of this to make even the slightest sense.

We are going to assume enterprise licensing will be used in this case.


Soft Partitioning-

Oracle definition- “Soft partitioning segments the operating system using OS resource managers. The operating system limits the number of CPUs where an Oracle database is running by creating areas where CPU resources are allocated to applications within the same operating system. This is a flexible way of managing data processing resources since the CPU capacity can be changed fairly easily, as additional resource is needed.” – As stated in http://www.oracle.com/us/corporate/pricing/partitioning-070609.pdf

 


Hard Partitioning-

Oracle definition- “Hard partitioning physically segments a server, by taking a single large server and separating it into distinct smaller systems. Each separated system acts as a physically independent, self-contained server, typically with its own CPUs, operating system, separate boot area, memory, input/output subsystem and network resources.” – As stated in http://www.oracle.com/us/corporate/pricing/partitioning-070609.pdf


Now, if you had a virtual machine running an Oracle database, you would assume that the partitioning method in use is soft partitioning, because ESXi is software that carves software partitions out of a physical socket.

BUT this is not the case. There is a key phrase in that definition that really stands out.

“The operating system limits the number of CPUs where an Oracle database is running by creating areas where CPU resources are allocated to applications within the same operating system.”

Since assigning vCPUs and vCores to a machine doesn’t limit that particular VM to only those physical resources, that’s one reason why Oracle does not consider VMware a soft partitioning product (unless you have CPU affinity enabled).

If you had a box with 4 physical sockets and 8 cores per socket, that VM would be able to use any of the 32 cores. Oracle has a simple formula that offers good insight into how much licensing will cost for a database. It looks something like this:

# of sockets * # of cores per socket * core factor * license price

So let’s take a 4-socket Intel Xeon system with 8 cores per socket and enterprise licensing. The core factor table can be found here: http://www.oracle.com/us/corporate/contracts/processor-core-factor-table-070634.pdf

4 * 8 * 0.5 * $47,500 = $760,000 per VM, per host.

That is a lot of money! But it gets more complicated than that. Say this particular VM is in a 3-host cluster. When you introduce VMware technologies like DRS and vMotion, are those considered partitioning technology? Does the whole cluster need to be licensed? There has been more discussion on this in the IT community than there has been about aliens in Area 51. But the short answer is yes. If you have a 3-node cluster with the same specs stated above, you will need to have all of those physical machines licensed. Any CPU that the Oracle server can touch (or potentially touch) needs to be licensed under the Oracle license agreement.
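To see how quickly the numbers grow, here is the same arithmetic as a small Python sketch, extended to a whole cluster. The function name and parameters are my own, the 0.5 core factor and $47,500 price are the figures used in this post, and the result is a rough estimate only, not an official Oracle quote.

```python
def oracle_license_cost(sockets, cores_per_socket, core_factor,
                        price_per_license, hosts=1):
    """Estimate license cost with the formula from this post, times the host count."""
    licenses_per_host = sockets * cores_per_socket * core_factor
    return licenses_per_host * price_per_license * hosts


if __name__ == "__main__":
    one_host = oracle_license_cost(4, 8, 0.5, 47_500)
    cluster = oracle_license_cost(4, 8, 0.5, 47_500, hosts=3)
    print(f"One host:       ${one_host:,.0f}")
    print(f"3-host cluster: ${cluster:,.0f}")
```

For the 3-node cluster described above, the single-host figure of $760,000 simply triples to $2,280,000.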

Here are a few possible solutions to this issue.

The first solution is to build a dedicated cluster for Oracle databases. Make it a 2-node cluster so that, in the event of a host failure, the VM can move to the other node. Spec out the physical hosts to match exactly what you need and don’t overpower them. If you need a 2-socket, 4-core setup, then just buy 2 of those. Don’t overprovision!

The second solution might be to set up CPU affinity. Set the database to only use one particular CPU in the host and make sure it stays there. There are obvious downsides to this; for one, failover would not happen. There is a lot of debate surrounding this practice, so definitely use it at your own risk. There have been mixed results from Oracle and the team of lawyers they employ.

And the third solution is probably the most painful. Pay Oracle. It’s simple, and it feels like you need a shot of penicillin afterwards, but it is the easiest solution.

-Resources and sources-

http://www.oracle.com/us/corporate/pricing/partitioning-070609.pdf

http://www.oracle.com/us/corporate/pricing/technology-price-list-070617.pdf

Thanks for reading!

I try my best to give the most accurate information possible! But I don’t know everything. If you find something I said that isn’t accurate, let me know!

 

 

VMFS deprecated bug in ESXi 6.0 Update 1

So if you recently added a new datastore to your vCenter, you might have gotten a nice little alarm on the ESXi hosts you added the datastore to, something like below.
Deprecated VMFS

This little gem is a bug included in ESXi 6.0 Update 1. While it’s a pain, the fix is pretty easy, and the alarm doesn’t actually mean you have deprecated VMFS volumes on your host. But check anyway, just to make sure this bug applies to you.

The solution is pretty simple: restart the two management agents on each affected ESXi host and the error will clear, at least until you add a new datastore, at which point chances are the error will reappear. If you have a massive number of ESXi hosts and updating to ESXi 6.0 U2 is an option, I personally would update instead of restarting these agents on all your hosts.


Step 1: Log in to your ESXi host via SSH or the DCUI

Step 2: Run the following command to restart the management agents on the box

services.sh restart


This command will restart the management agents on the ESXi host and clear the deprecated VMFS alarm that has plagued your vCenter.


The KB article below has more from VMware.

VMFS deprecated datastore 2109735