The memory API
==============

The memory API models the memory and I/O buses and controllers of a QEMU
machine. It attempts to allow modelling of:

- ordinary RAM
- memory-mapped I/O (MMIO)
- memory controllers that can dynamically reroute physical memory regions
  to different destinations

The memory model provides support for the following (a sketch of the
corresponding calls appears after the list):

- tracking RAM changes by the guest
- setting up coalesced memory for kvm
- setting up ioeventfd regions for kvm
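
As a hedged illustration of these three facilities, the calls might look
like this; the region and notifier variables are placeholders, not part of
any particular device:

    /* Track guest writes to the region (here for the VGA dirty bitmap). */
    memory_region_set_log(&mr, true, DIRTY_MEMORY_VGA);

    /* Let KVM coalesce writes to this MMIO range and deliver them later. */
    memory_region_add_coalescing(&mr, 0, 0x1000);

    /* Have a 4-byte write of 0x1 at offset 0x40 signal an eventfd instead
     * of invoking the MMIO callbacks. */
    memory_region_add_eventfd(&mr, 0x40, 4, true, 0x1, &notifier);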

Memory is modelled as an acyclic graph of MemoryRegion objects. Sinks
(leaves) are RAM and MMIO regions, while other nodes represent
buses, memory controllers, and memory regions that have been rerouted.

In addition to MemoryRegion objects, the memory API provides AddressSpace
objects for every root and possibly for intermediate MemoryRegions too.
These represent memory as seen from the CPU or a device's viewpoint.
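
For example, a device with its own view of memory (say, for DMA) might wrap
a root region in an AddressSpace like this (a sketch; the owner and names
are placeholders):

    MemoryRegion root;
    AddressSpace as;

    memory_region_init(&root, owner, "mydev-dma-root", UINT64_MAX);
    address_space_init(&as, &root, "mydev-dma");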

Types of regions
----------------

There are four types of memory regions (all represented by a single C type
MemoryRegion); a sketch creating one region of each type follows this list:

- RAM: a RAM region is simply a range of host memory that can be made available
  to the guest.

- MMIO: a range of guest memory that is implemented by host callbacks;
  each read or write causes a callback to be called on the host.

- container: a container simply includes other memory regions, each at
  a different offset. Containers are useful for grouping several regions
  into one unit. For example, a PCI BAR may be composed of a RAM region
  and an MMIO region.

  A container's subregions are usually non-overlapping. In some cases it is
  useful to have overlapping regions; for example a memory controller that
  can overlay a subregion of RAM with MMIO or ROM, or a PCI controller
  that does not prevent cards from claiming overlapping BARs.

- alias: a subsection of another region. Aliases allow a region to be
  split apart into discontiguous regions. Examples of uses are memory banks
  used when the guest address space is smaller than the amount of RAM
  addressed, or a memory controller that splits main memory to expose a "PCI
  hole". Aliases may point to any type of region, including other aliases,
  but an alias may not point back to itself, directly or indirectly.

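A hedged sketch of creating one region of each type; the owner, the sizes,
the mydev_ops callback table and the opaque pointer s are placeholders, and
exact signatures may differ slightly between QEMU versions:

    MemoryRegion ram, mmio, cont, alias;

    /* RAM: backed by host memory */
    memory_region_init_ram(&ram, owner, "mydev.ram", 0x10000, &error_fatal);

    /* MMIO: reads and writes invoke the callbacks in mydev_ops */
    memory_region_init_io(&mmio, owner, &mydev_ops, s, "mydev.mmio", 0x1000);

    /* container: groups the two regions above into one unit, e.g. a BAR */
    memory_region_init(&cont, owner, "mydev.bar0", 0x20000);
    memory_region_add_subregion(&cont, 0x00000, &ram);
    memory_region_add_subregion(&cont, 0x10000, &mmio);

    /* alias: a window covering the first 32K of the RAM region */
    memory_region_init_alias(&alias, owner, "mydev.ram-lo", &ram, 0, 0x8000);
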
It is valid to add subregions to a region which is not a pure container
(that is, to an MMIO, RAM or ROM region). This means that the region
will act like a container, except that any addresses within the container's
region which are not claimed by any subregion are handled by the
container itself (ie by its MMIO callbacks or RAM backing). However
it is generally possible to achieve the same effect with a pure container
one of whose subregions is a low priority "background" region covering
the whole address range; this is often clearer and is preferred.
Subregions cannot be added to an alias region.
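
The preferred pattern might be sketched as follows (a pure container with a
low priority background region; the names, the dev_ops table and the opaque
pointer s are placeholders):

    /* The background claims the whole range at the lowest priority... */
    memory_region_init(&cont, owner, "dev.bar", 0x10000);
    memory_region_init_io(&background, owner, &dev_ops, s, "dev.background",
                          0x10000);
    memory_region_add_subregion_overlap(&cont, 0, &background, -1);

    /* ...and higher priority subregions claim parts of it. */
    memory_region_add_subregion(&cont, 0x4000, &some_subregion);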

Region names
------------

Regions are assigned names by the constructor. For most regions these are
only used for debugging purposes, but RAM regions also use the name to identify
live migration sections. This means that RAM region names need to have ABI
stability.

Region lifecycle
----------------

A region is created by one of the memory_region_init*() functions and
attached to an object, which acts as its owner or parent. QEMU ensures
that the owner object remains alive as long as the region is visible to
the guest, or as long as the region is in use by a virtual CPU or another
device. For example, the owner object will not die between an
address_space_map operation and the corresponding address_space_unmap.
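
A sketch of the map/unmap pairing referred to above (the address space,
address and length are placeholders; note that newer QEMU versions add a
MemTxAttrs argument to address_space_map):

    hwaddr len = size;
    void *buf = address_space_map(as, addr, &len, true /* is_write */);
    if (buf) {
        /* While the mapping is live, the owner of the region backing
         * 'buf' is guaranteed to stay alive. */
        /* ... fill buf ... */
        address_space_unmap(as, buf, len, true, len /* access_len */);
    }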

After creation, a region can be added to an address space or a
container with memory_region_add_subregion(), and removed using
memory_region_del_subregion().

Various region attributes (read-only, dirty logging, coalesced mmio,
ioeventfd) can be changed during the region lifecycle. They take effect
as soon as the region is made visible. This can be immediately, later,
or never.

Destruction of a memory region happens automatically when the owner
object dies.

If however the memory region is part of a dynamically allocated data
structure, you should call object_unparent() to destroy the memory region
before the data structure is freed. For an example see VFIOMSIXInfo
and VFIOQuirk in hw/vfio/pci.c.

You must not destroy a memory region as long as it may be in use by a
device or CPU. To ensure this, as a general rule do not create or
destroy memory regions dynamically during a device's lifetime, and only
call object_unparent() in the memory region owner's instance_finalize
callback. The dynamically allocated data structure that contains the
memory region should then be freed in the instance_finalize callback
as well.
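
A sketch of that rule; the MyQuirk structure, the MYDEV() cast macro and the
field names are hypothetical, loosely modelled on the VFIOQuirk example
mentioned above:

    typedef struct MyQuirk {
        MemoryRegion mem;       /* region embedded in heap-allocated data */
    } MyQuirk;

    typedef struct MyDevState {
        /* ... */
        MyQuirk *quirk;         /* dynamically allocated */
    } MyDevState;

    static void mydev_instance_finalize(Object *obj)
    {
        MyDevState *s = MYDEV(obj);

        /* Only now is the region guaranteed to be unused; destroy it
         * before freeing the structure that embeds it. */
        object_unparent(OBJECT(&s->quirk->mem));
        g_free(s->quirk);
    }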

If you break this rule, the following situation can happen:

- the memory region's owner had a reference taken via memory_region_ref
  (for example by address_space_map)

- the region is unparented, and has no owner anymore

- when address_space_unmap is called, the reference to the memory region's
  owner is leaked.

There is an exception to the above rule: it is okay to call
object_unparent at any time for an alias or a container region. It is
therefore also okay to create or destroy alias and container regions
dynamically during a device's lifetime.

This exceptional usage is valid because aliases and containers only help
QEMU build the guest's memory map; they are never accessed directly.
memory_region_ref and memory_region_unref are never called on aliases
or containers, and the above situation then cannot happen. Exploiting
this exception is rarely necessary, and therefore it is discouraged,
but nevertheless it is used in a few places.

For regions that "have no owner" (NULL is passed at creation time), the
machine object is actually used as the owner. Since instance_finalize is
never called for the machine object, you must never call object_unparent
on regions that have no owner, unless they are aliases or containers.

Overlapping regions and priority
--------------------------------
Usually, regions may not overlap each other; a memory address decodes into
exactly one target. In some cases it is useful to allow regions to overlap,
and sometimes to control which of the overlapping regions is visible to the
guest. This is done with memory_region_add_subregion_overlap(), which
allows the region to overlap any other region in the same container, and
specifies a priority that allows the core to decide which of two regions at
the same address is visible (highest wins).
Priority values are signed, and the default value is zero. This means that
you can use memory_region_add_subregion_overlap() both to specify a region
that must sit 'above' any others (with a positive priority) and also a
background region that sits 'below' others (with a negative priority).

If the higher priority region in an overlap is a container or alias, then
the lower priority region will appear in any "holes" that the higher priority
region has left by not mapping subregions to that area of its address range.
(This applies recursively -- if the subregions are themselves containers or
aliases that leave holes then the lower priority region will appear in these
holes too.)

For example, suppose we have a container A of size 0x8000 with two subregions
B and C. B is a container mapped at 0x2000, size 0x4000, priority 2; C is
an MMIO region mapped at 0x0, size 0x6000, priority 1. B currently has two
of its own subregions: D of size 0x1000 at offset 0 and E of size 0x1000 at
offset 0x2000. As a diagram:

       0      1000   2000   3000   4000   5000   6000   7000   8000
       |------|------|------|------|------|------|------|------|
    A: [                                                       ]
    C: [CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC]
    B:               [                           ]
    D:               [DDDDDD]
    E:                             [EEEEEE]

The regions that will be seen within this address range then are:

    [CCCCCCCCCCCC][DDDDD][CCCCC][EEEEE][CCCCC]

Since B has higher priority than C, its subregions appear in the flat map
even where they overlap with C. In ranges where B has not mapped anything
C's region appears.

If B had provided its own MMIO operations (ie it was not a pure container)
then these would be used for any addresses in its range not handled by
D or E, and the result would be:

    [CCCCCCCCCCCC][DDDDD][BBBBB][EEEEE][BBBBB]

Priority values are local to a container, because the priorities of two
regions are only compared when they are both children of the same container.
This means that the device in charge of the container (typically modelling
a bus or a memory controller) can use them to manage the interaction of
its child regions without any side effects on other parts of the system.
In the example above, the priorities of D and E are unimportant because
they do not overlap each other. It is the relative priority of B and C
that causes D and E to appear on top of C: D and E's priorities are never
compared against the priority of C.
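
A sketch of how this example could be constructed with the API (the owner,
the *_ops callback tables and the opaque pointer s are placeholders):

    MemoryRegion a, b, c, d, e;

    memory_region_init(&a, owner, "A", 0x8000);       /* pure container */
    memory_region_init(&b, owner, "B", 0x4000);       /* pure container */
    memory_region_init_io(&c, owner, &c_ops, s, "C", 0x6000);
    memory_region_init_io(&d, owner, &d_ops, s, "D", 0x1000);
    memory_region_init_io(&e, owner, &e_ops, s, "E", 0x1000);

    memory_region_add_subregion(&b, 0x0000, &d);
    memory_region_add_subregion(&b, 0x2000, &e);

    memory_region_add_subregion_overlap(&a, 0x2000, &b, 2);  /* priority 2 */
    memory_region_add_subregion_overlap(&a, 0x0000, &c, 1);  /* priority 1 */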

Visibility
----------
The memory core uses the following rules to select a memory region when the
guest accesses an address; a simplified sketch of the algorithm in C follows
the list:

- all direct subregions of the root region are matched against the address, in
  descending priority order
- if the address lies outside the region offset/size, the subregion is
  discarded
- if the subregion is a leaf (RAM or MMIO), the search terminates, returning
  this leaf region
- if the subregion is a container, the same algorithm is used within the
  subregion (after the address is adjusted by the subregion offset)
- if the subregion is an alias, the search is continued at the alias target
  (after the address is adjusted by the subregion offset and alias offset)
- if a recursive search within a container or alias subregion does not
  find a match (because of a "hole" in the container's coverage of its
  address range), then if this is a container with its own MMIO or RAM
  backing the search terminates, returning the container itself. Otherwise
  we continue with the next subregion in priority order
- if none of the subregions match the address then the search terminates
  with no match found
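
The following is an illustrative sketch of these rules, not the actual QEMU
implementation; the Region type is a simplified stand-in for MemoryRegion:

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct Region Region;
    struct Region {
        uint64_t offset;        /* offset within the parent container */
        uint64_t size;
        bool is_leaf;           /* RAM or MMIO */
        bool has_backing;       /* non-pure container: own RAM/MMIO backing */
        Region *alias_target;   /* non-NULL for aliases */
        uint64_t alias_offset;
        Region **subregions;    /* sorted by descending priority */
        size_t num_subregions;
    };

    /* Resolve 'addr' (relative to the start of 'r') to a terminating
     * region; NULL means "hole", i.e. no match within 'r'. */
    static Region *resolve(Region *r, uint64_t addr)
    {
        if (r->alias_target) {
            /* continue the search at the alias target */
            return resolve(r->alias_target, addr + r->alias_offset);
        }
        for (size_t i = 0; i < r->num_subregions; i++) {
            Region *sub = r->subregions[i];

            /* discard subregions that do not cover the address */
            if (addr < sub->offset || addr - sub->offset >= sub->size) {
                continue;
            }
            Region *found = resolve(sub, addr - sub->offset);
            if (found) {
                return found;
            }
            /* hole in a pure container/alias: try the next subregion */
        }
        /* a leaf terminates the search; so does a container with its
         * own RAM/MMIO backing */
        return (r->is_leaf || r->has_backing) ? r : NULL;
    }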

Example memory map
------------------

system_memory: container@0-2^48-1
 |
 +---- lomem: alias@0-0xdfffffff ---> #ram (0-0xdfffffff)
 |
 +---- himem: alias@0x100000000-0x11fffffff ---> #ram (0xe0000000-0xffffffff)
 |
 +---- vga-window: alias@0xa0000-0xbffff ---> #pci (0xa0000-0xbffff)
 |      (prio 1)
 |
 +---- pci-hole: alias@0xe0000000-0xffffffff ---> #pci (0xe0000000-0xffffffff)

pci (0-2^32-1)
 |
 +--- vga-area: container@0xa0000-0xbffff
 |    |
 |    +--- alias@0x00000-0x7fff ---> #vram (0x010000-0x017fff)
 |    |
 |    +--- alias@0x08000-0xffff ---> #vram (0x020000-0x027fff)
 |
 +---- vram: ram@0xe1000000-0xe1ffffff
 |
 +---- vga-mmio: mmio@0xe2000000-0xe200ffff

ram: ram@0x00000000-0xffffffff

This is a (simplified) PC memory map. The 4GB RAM block is mapped into the
system address space via two aliases: "lomem" is a 1:1 mapping of the first
3.5GB; "himem" maps the last 0.5GB at address 4GB. This leaves 0.5GB for the
so-called PCI hole, which allows a 32-bit PCI bus to exist in a system with
4GB of memory.

The memory controller diverts addresses in the range 640K-768K to the PCI
address space. This is modelled using the "vga-window" alias, mapped at a
higher priority so it obscures the RAM at the same addresses. The vga window
can be removed by programming the memory controller; this is modelled by
removing the alias and exposing the RAM underneath.

The pci address space is not a direct child of the system address space, since
we only want parts of it to be visible (we accomplish this using aliases).
It has three subregions: vga-area models the legacy vga window and is occupied
by two 32K memory banks pointing at two sections of the framebuffer; the vram
is mapped as a BAR at address 0xe1000000; and an additional BAR containing
MMIO registers is mapped after it.

Note that if the guest maps a BAR outside the PCI hole, it will not be
visible, as the pci-hole alias clips it to a 0.5GB range.
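
A sketch of how the system-level aliases above could be built (the variables
are placeholders; "pci" is the root region of the PCI address space and
"system_memory" the system root):

    memory_region_init_alias(&lomem, NULL, "lomem", ram, 0x0, 0xe0000000);
    memory_region_add_subregion(system_memory, 0x0, &lomem);

    memory_region_init_alias(&himem, NULL, "himem", ram,
                             0xe0000000, 0x20000000);
    memory_region_add_subregion(system_memory, 0x100000000ULL, &himem);

    memory_region_init_alias(&vga_window, NULL, "vga-window", pci,
                             0xa0000, 0x20000);
    /* priority 1: obscures the RAM mapped underneath by lomem */
    memory_region_add_subregion_overlap(system_memory, 0xa0000,
                                        &vga_window, 1);

    memory_region_init_alias(&pci_hole, NULL, "pci-hole", pci,
                             0xe0000000, 0x20000000);
    memory_region_add_subregion(system_memory, 0xe0000000, &pci_hole);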

MMIO Operations
---------------

MMIO regions are provided with ->read() and ->write() callbacks; in addition
various constraints can be supplied to control how these callbacks are called
(a sketch of a complete MemoryRegionOps follows the list):

- .valid.min_access_size, .valid.max_access_size define the access sizes
  (in bytes) which the device accepts; accesses outside this range will
  have device and bus specific behaviour (ignored, or machine check)
- .valid.unaligned specifies that the device being modelled supports
  unaligned accesses; if false, unaligned accesses invoke device and bus
  specific behaviour
- .impl.min_access_size, .impl.max_access_size define the access sizes
  (in bytes) supported by the *implementation*; other access sizes will be
  emulated using the ones available. For example a 4-byte write will be
  emulated using four 1-byte writes, if .impl.max_access_size = 1.
- .impl.unaligned specifies that the *implementation* supports unaligned
  accesses; if false, unaligned accesses will be emulated by two aligned
  accesses.
- .old_mmio can be used to ease porting from code using
  cpu_register_io_memory(). It should not be used in new code.
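
A hedged sketch of a device's MemoryRegionOps using these constraints (the
mydev_* names and the register layout are hypothetical):

    static uint64_t mydev_read(void *opaque, hwaddr addr, unsigned size)
    {
        MyDevState *s = opaque;

        switch (addr) {
        case 0x0:
            return s->reg0;
        default:
            return 0;           /* unimplemented registers read as zero */
        }
    }

    static void mydev_write(void *opaque, hwaddr addr,
                            uint64_t data, unsigned size)
    {
        MyDevState *s = opaque;

        if (addr == 0x0) {
            s->reg0 = data;
        }
    }

    static const MemoryRegionOps mydev_ops = {
        .read = mydev_read,
        .write = mydev_write,
        .endianness = DEVICE_NATIVE_ENDIAN,
        .valid = {
            .min_access_size = 1,   /* the device accepts 1- to 4-byte... */
            .max_access_size = 4,   /* ...accesses... */
            .unaligned = false,     /* ...but only naturally aligned ones */
        },
        .impl = {
            .min_access_size = 4,   /* the callbacks only implement 32-bit
                                     * accesses; smaller ones are emulated */
            .max_access_size = 4,
        },
    };

    /* Typically registered with something like:
     * memory_region_init_io(&s->iomem, OBJECT(s), &mydev_ops, s,
     *                       "mydev-mmio", 0x1000);
     */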